CloudWatch Agent Deployment: Manual Installation

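Why deploy the CloudWatch Agent? By default, the CloudWatch console monitors an EC2 instance's CPU, disk, and network metrics. Memory, however, is treated as an operating-system-level metric, so monitoring memory requires deploying the Agent to collect the data.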

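There are three ways to deploy the Agent:

  • Manual installation: suited to a small number of servers; the process is long and tedious. (This walkthrough.)
  • Systems Manager: requires the Systems Manager Agent to be deployed first; suited to large fleets of long-lived servers.
  • CloudFormation: deployment as code, similar to Terraform; requires familiarity with its syntax.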

The demo environment uses Ubuntu 24.

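  • Open the IAM console -> Users -> Create user.
  • User name: cloudwatch_agent
  • Permissions options: "Attach policies directly"
  • Permissions policy: select CloudWatchAgentServerPolicy, then click Next.
  • Create the user.
  • Select the cloudwatch_agent user you just created -> Security credentials -> Create access key.
  • Use case: "Third-party service"
  • Open the IAM console -> Roles -> Create role.
  • Trusted entity type: AWS service
  • Use case: EC2
  • Permissions policy: CloudWatchAgentServerPolicy
  • Role name: CloudWatchAgentServerPolicy
  • Create the role.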

Go to the EC2 console -> select the instance -> Actions -> Security -> Modify IAM role -> CloudWatchAgentServerPolicy
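The console steps for the role can also be scripted. The following is a minimal sketch with the AWS CLI, assuming the CLI is installed and configured with sufficient IAM permissions. The instance ID is a placeholder, and reusing the role name as the instance profile name is an assumption that mirrors what the console does when it creates an EC2 role. By default the script only prints the commands (DRY_RUN=1).

```shell
#!/usr/bin/env bash
# Sketch: create the EC2 role from the console steps with the AWS CLI.
# INSTANCE_ID is a placeholder; DRY_RUN=1 (default) only prints the commands.
ROLE_NAME="CloudWatchAgentServerPolicy"   # role name used in this article
POLICY_ARN="arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"
DRY_RUN="${DRY_RUN:-1}"
TRUST_FILE="${TMPDIR:-/tmp}/cwagent-trust.json"

# Trust policy that lets EC2 assume the role (trusted entity: AWS service, EC2).
trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_agent_role() {
  trust_policy > "$TRUST_FILE"
  run aws iam create-role --role-name "$ROLE_NAME" \
    --assume-role-policy-document "file://$TRUST_FILE"
  run aws iam attach-role-policy --role-name "$ROLE_NAME" --policy-arn "$POLICY_ARN"
  run aws iam create-instance-profile --instance-profile-name "$ROLE_NAME"
  run aws iam add-role-to-instance-profile \
    --instance-profile-name "$ROLE_NAME" --role-name "$ROLE_NAME"
  run aws ec2 associate-iam-instance-profile --instance-id "$INSTANCE_ID" \
    --iam-instance-profile "Name=$ROLE_NAME"
}

create_agent_role
```

To apply the changes for real, run it with DRY_RUN=0 and INSTANCE_ID set to the target instance.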

Official documentation

| Architecture | Platform | Download link |
| --- | --- | --- |
| x86-64 | Amazon Linux 2023 and Amazon Linux 2 | https://amazoncloudwatch-agent.s3.amazonaws.com/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm |
| x86-64 | CentOS | https://amazoncloudwatch-agent.s3.amazonaws.com/centos/amd64/latest/amazon-cloudwatch-agent.rpm |
| x86-64 | Red Hat | https://amazoncloudwatch-agent.s3.amazonaws.com/redhat/amd64/latest/amazon-cloudwatch-agent.rpm |
| x86-64 | SUSE | https://amazoncloudwatch-agent.s3.amazonaws.com/suse/amd64/latest/amazon-cloudwatch-agent.rpm |
| x86-64 | Debian | https://amazoncloudwatch-agent.s3.amazonaws.com/debian/amd64/latest/amazon-cloudwatch-agent.deb |
| x86-64 | Ubuntu | https://amazoncloudwatch-agent.s3.amazonaws.com/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb |
| x86-64 | Oracle Linux | https://amazoncloudwatch-agent.s3.amazonaws.com/oracle_linux/amd64/latest/amazon-cloudwatch-agent.rpm |
| x86-64 | Windows | https://amazoncloudwatch-agent.s3.amazonaws.com/windows/amd64/latest/amazon-cloudwatch-agent.msi |

Download and install the Ubuntu package:

sudo apt update
curl -O https://amazoncloudwatch-agent.s3.amazonaws.com/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
sudo apt install -y ./amazon-cloudwatch-agent.deb
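The download links above all follow the same pattern, so the URL can be derived from the platform name. The sketch below illustrates that pattern; agent_url is a hypothetical helper, and it only covers the amd64 rows listed in the table.

```shell
#!/usr/bin/env bash
# Sketch: build the download URL for a platform from the table above.
# Only the amd64 rows from the table are covered.
BASE_URL="https://amazoncloudwatch-agent.s3.amazonaws.com"

agent_url() {
  platform="$1"
  case "$platform" in
    amazon_linux|centos|redhat|suse|oracle_linux) ext="rpm" ;;
    debian|ubuntu) ext="deb" ;;
    windows) ext="msi" ;;
    *) echo "unsupported platform: $platform" >&2; return 1 ;;
  esac
  echo "$BASE_URL/$platform/amd64/latest/amazon-cloudwatch-agent.$ext"
}

agent_url ubuntu
```

A download would then look like curl -O "$(agent_url ubuntu)".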

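One or more configuration files can be created either with the wizard or by editing them by hand. The wizard is the recommended way to generate a configuration file, while manual editing gives more precise control over the parameters.

The CloudWatch agent uses a configuration file named common-config.toml for common settings such as credentials and proxy. On Linux, the file is in the /opt/aws/amazon-cloudwatch-agent/etc directory. On Windows, it is in the C:\ProgramData\Amazon\AmazonCloudWatchAgent directory.

The wizard writes its output to bin/config.json.

Run the agent configuration wizard: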

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard

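!!! warning
    - The wizard works through a series of questions. If you choose wrongly or want to start over, press ++Ctrl+c++ to interrupt it, then run it again.
    - This demo uses only the CloudWatch Agent and skips collectD and other integrations.
    - Every step has a default option. Press Enter if the default is what you want; otherwise type the corresponding number.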

Select your operating system. Option 3 (darwin) is macOS.

================================================================
= Welcome to the Amazon CloudWatch Agent Configuration Manager =
= =
= CloudWatch Agent allows you to collect metrics and logs from =
= your host and send them to CloudWatch. Additional CloudWatch =
= charges may apply. =
================================================================
On which OS are you planning to use the agent?
1. linux
2. windows
3. darwin
default choice: [1]:

1 is an EC2 instance; 2 is an on-premises host (not an AWS instance).

Trying to fetch the default region based on ec2 metadata...
I! imds retry client will retry 1 times
Are you using EC2 or On-Premises hosts?
1. EC2
2. On-Premises
default choice: [1]:

Which user will run the Agent? This demo chooses 2 (root).

Which user are you planning to run the agent?
1. cwagent
2. root
3. others
default choice: [1]:2

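Whether to turn on the StatsD daemon, which lets the Agent receive custom metrics sent with the StatsD protocol. Keeping the default is recommended.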

Do you want to turn on StatsD daemon?
1. yes
2. no
default choice: [1]:

Choose the port the StatsD daemon listens on. If a firewall is enabled on this host, allow this port (StatsD uses UDP).

Which port do you want StatsD daemon to listen to?
default choice: [8125]

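The collection interval. Choose it based on the server's role: if the server runs critical workloads, option 1 is reasonable; otherwise option 2 or 3 is recommended. Collecting data too frequently also adds network and CPU load.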

What is the collect interval for StatsD daemon?
1. 10s
2. 30s
3. 60s
default choice: [1]:

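Choose the aggregation interval, that is, how often the metrics collected by StatsD are aggregated before being reported.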

What is the aggregation interval for metrics collected by StatsD daemon?
1. Do not aggregate
2. 10s
3. 30s
4. 60s
default choice: [4]:

Whether to monitor metrics from CollectD. CollectD must already be installed, otherwise the Agent will fail to start. Choose no.

Do you want to monitor metrics from CollectD? WARNING: CollectD must be installed or the Agent will fail to start
1. yes
2. no
default choice: [1]:2

Do you want to monitor host metrics such as CPU and memory?

Do you want to monitor any host metrics? e.g. CPU, memory, etc.
1. yes
2. no
default choice: [1]:

Do you want to monitor metrics for each CPU core?

Do you want to monitor cpu metrics per core?
1. yes
2. no
default choice: [1]:

Do you want to add the EC2 dimensions (image ID, instance ID, instance type, Auto Scaling group name) to all metrics?

Do you want to add ec2 dimensions (ImageId, InstanceId, InstanceType, AutoScalingGroupName) into all of your metrics if the info is available?
1. yes
2. no
default choice: [1]:

Do you want to aggregate by instance ID?

Do you want to aggregate ec2 dimensions (InstanceId)?
1. yes
2. no
default choice: [1]:

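Do you want to collect metrics at high resolution (sub-minute)? The choice applies to all metrics, but individual metrics can be customized afterwards in the JSON file.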

Would you like to collect your metrics at high resolution (sub-minute resolution)? This enables sub-minute resolution for all metrics, but you can customize for specific metrics in the output json file.
1. 1s
2. 10s
3. 30s
4. 60s
default choice: [4]:

Which default metrics set do you want?

Which default metrics config do you want?
1. Basic
2. Standard
3. Advanced
4. None
default choice: [1]:

Are you satisfied with the configuration above? It can still be edited by hand after the wizard finishes.

# ...
Are you satisfied with the above config? Note: it can be manually customized after the wizard completes to add additional items.
1. yes
2. no
default choice: [1]:

Do you have an existing CloudWatch Log Agent configuration file to import for migration? Choose no.

Do you have any existing CloudWatch Log Agent (http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html) configuration file to import for migration?
1. yes
2. no
default choice: [2]:

Do you want to monitor any log files? Choose no.

Do you want to monitor any log files?
1. yes
2. no
default choice: [1]:2

Do you want the CloudWatch agent to also retrieve X-Ray traces? Choose no.

Do you want the CloudWatch agent to also retrieve X-ray traces?
1. yes
2. no
default choice: [1]:2

Do you have an existing X-Ray Daemon configuration file to import for migration? Choose no.

Do you have an existing X-Ray Daemon configuration file to import for migration?
1. yes
2. no
default choice: [1]:2

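Review the configuration shown above. The file is saved at /opt/aws/amazon-cloudwatch-agent/bin/config.json and can be edited by hand after the wizard finishes if needed.

Do you want to store this configuration in the SSM Parameter Store? Choose 2 to finish.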

Please check the above content of the config.
The config file is also located at /opt/aws/amazon-cloudwatch-agent/bin/config.json.
Edit it manually if needed.
Do you want to store the config in the SSM parameter store?
1. yes
2. no
default choice: [1]:2

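At this point the Agent has not been started yet. If other hosts need to send StatsD metrics to this instance, open port 8125 (UDP) in the security group.

The config.json file generated by the wizard is stored in the bin/ directory; it needs to be copied to the etc/ directory under a new name.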

sudo cp -v /opt/aws/amazon-cloudwatch-agent/bin/config.json /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
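
  • If you are the AWS root user, simply create an access key;
  • If you are an IAM user (sub-account), choose the fourth use case, "Third-party service".

sudo tee /opt/aws/amazon-cloudwatch-agent/etc/credentials > /dev/null <<EOF
[AmazonCloudWatchAgent]
aws_access_key_id = <your access key ID>
aws_secret_access_key = <your secret access key>
EOF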
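Because the credentials file holds a secret key, it is worth restricting it to its owner. The sketch below writes the same file layout with mode 600; write_credentials is a hypothetical helper, the key values are placeholders, and the demo writes to a temporary file rather than to the agent's etc/ directory (which requires root).

```shell
#!/usr/bin/env bash
# Sketch: write an agent credentials file that only its owner can read.
# write_credentials is a hypothetical helper; the key values are placeholders.
write_credentials() {
  target="$1"; key_id="$2"; secret="$3"
  (
    umask 077   # a newly created file gets mode 600
    printf '[AmazonCloudWatchAgent]\naws_access_key_id = %s\naws_secret_access_key = %s\n' \
      "$key_id" "$secret" > "$target"
  )
  chmod 600 "$target"   # also tighten a file that already existed
}

demo_file="$(mktemp)"
write_credentials "$demo_file" "AKIAEXAMPLEKEYID" "exampleSecretKey"
cat "$demo_file"
```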

Edit etc/common-config.toml: enable shared_credential_file under the [credentials] section and set it to the path of the credentials file.

cd /opt/aws/amazon-cloudwatch-agent/etc/
printf '[credentials]\n  shared_credential_file = "/opt/aws/amazon-cloudwatch-agent/etc/credentials"\n' | sudo tee -a common-config.toml > /dev/null
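Appending to common-config.toml is not idempotent: running the step twice leaves duplicate entries. The sketch below adds the setting only when no uncommented shared_credential_file entry exists. It assumes the key belongs under a [credentials] section, as in the commented template shipped in that file; enable_shared_credentials is a hypothetical helper, and the demo works on a temporary copy.

```shell
#!/usr/bin/env bash
# Sketch: add the [credentials] section to common-config.toml only once.
# enable_shared_credentials is a hypothetical helper, not part of the agent.
enable_shared_credentials() {
  toml="$1"; cred_file="$2"
  # Skip if an uncommented shared_credential_file entry already exists.
  if grep -Eq '^[[:space:]]*shared_credential_file[[:space:]]*=' "$toml"; then
    return 0
  fi
  printf '[credentials]\n  shared_credential_file = "%s"\n' "$cred_file" >> "$toml"
}

# Demo on a temporary file that mimics the commented template.
demo_toml="$(mktemp)"
printf '# [credentials]\n#    shared_credential_file = "{file_name}"\n' > "$demo_toml"
enable_shared_credentials "$demo_toml" /opt/aws/amazon-cloudwatch-agent/etc/credentials
enable_shared_credentials "$demo_toml" /opt/aws/amazon-cloudwatch-agent/etc/credentials
cat "$demo_toml"
```

The second call is a no-op, so the section appears once in the output.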

Start the Agent and check its status.

for value in enable restart status; do sudo systemctl ${value} amazon-cloudwatch-agent; done
sudo tail -f /var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log

Under normal conditions, the log looks like this.

2025-06-20T04:20:04Z I! {"caller":"ec2tagger/ec2tagger.go:533","msg":"ec2tagger: Initial retries succeeded","kind":"processor","name":"ec2tagger","pipeline":"metrics/hostCustomMetrics"}
2025-06-20T04:20:04Z I! {"caller":"ec2tagger/ec2tagger.go:444","msg":"ec2tagger: EC2 tagger hasfinished initial retrieval of tags and Volumes","kind":"processor","name":"ec2tagger","pipeline/hostCustomMetrics"}
2025-06-20T04:24:36Z I! {"caller":"ec2tagger/ec2tagger.go:251","msg":"ec2tagger: Refresh for voo longer needed, stop refreshTicker.","kind":"processor","name":"ec2tagger","pipeline":"metrics
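Rather than reading the whole log, a quick count per level can show whether anything went wrong. This is a minimal sketch: it assumes the second field of each line is the level marker, as with the "I!" in the sample above, and that errors and warnings use "E!" and "W!" in the same position. count_level is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: count agent log lines by level. The E! and W! markers are assumed
# by analogy with the "I!" prefix seen in the sample log.
LOG_FILE="${LOG_FILE:-/var/log/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.log}"

count_level() {
  level="$1"; file="$2"
  # Field 2 of each line is the level marker, e.g. "I!".
  awk -v lvl="$level" '$2 == lvl { n++ } END { print n + 0 }' "$file"
}

# Only report when the log exists and is readable by the current user.
if [ -r "$LOG_FILE" ]; then
  echo "errors:   $(count_level 'E!' "$LOG_FILE")"
  echo "warnings: $(count_level 'W!' "$LOG_FILE")"
fi
```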

This completes the manual deployment of the Agent on an EC2 instance. In the CloudWatch console -> Metrics -> All metrics, a custom namespace named CWAgent now appears.

!!! note
    The CWAgent namespace cannot be deleted manually. If no new data arrives, AWS removes it automatically after 15 days.
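With the Agent running, the StatsD listener enabled in the wizard (port 8125) can be exercised with a hand-written datagram. The sketch below formats a StatsD line and, optionally, sends it over UDP. The metric name and value are made up for illustration, and /dev/udp is a bash-only feature, so send_statsd is defined but not called here.

```shell
#!/usr/bin/env bash
# Sketch: send a test metric to the agent's StatsD listener.
# Metric name and value are made up; /dev/udp requires bash.
STATSD_HOST="${STATSD_HOST:-127.0.0.1}"
STATSD_PORT="${STATSD_PORT:-8125}"

# Format one StatsD datagram: <name>:<value>|<type> (c=counter, g=gauge, ms=timer)
statsd_line() {
  printf '%s:%s|%s' "$1" "$2" "$3"
}

# Fire-and-forget over UDP; the listener sends no acknowledgement.
send_statsd() {
  statsd_line "$@" > "/dev/udp/$STATSD_HOST/$STATSD_PORT"
}

statsd_line demo.requests 1 c; echo
# On the instance, with the agent running:
#   send_statsd demo.requests 1 c
```

After roughly one aggregation interval, the metric should show up in the CloudWatch console alongside the host metrics.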

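The wizard only needs to be run once; the resulting configuration file can be reused on other servers.

Configuration file contents: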

{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "metrics": {
    "aggregation_dimensions": [
      [
        "InstanceId"
      ]
    ],
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "ImageId": "${aws:ImageId}",
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}"
    },
    "metrics_collected": {
      "disk": {
        "measurement": [
          "used_percent"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ],
        "metrics_collection_interval": 60
      },
      "statsd": {
        "metrics_aggregation_interval": 60,
        "metrics_collection_interval": 10,
        "service_address": ":8125"
      }
    }
  }
}
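Since the original motivation was memory monitoring, the generated file can be trimmed by hand. The following sketch writes a reduced configuration that keeps only the mem section and the InstanceId dimension from the file above. write_mem_only_config is a hypothetical helper, and the demo writes to a temporary file rather than to the agent's etc/ directory.

```shell
#!/usr/bin/env bash
# Sketch: a reduced config that keeps only the memory metric from the file above.
# write_mem_only_config is a hypothetical helper; the output path is up to you.
write_mem_only_config() {
  # The quoted 'EOF' keeps ${aws:InstanceId} literal instead of expanding it.
  cat > "$1" <<'EOF'
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      }
    }
  }
}
EOF
}

demo_config="$(mktemp)"
write_mem_only_config "$demo_config"
cat "$demo_config"
```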
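
  1. Make sure collectD is installed and running;
  2. Make sure port 8125 is open. If you answered no to these options in the wizard, skip steps 1 and 2.

On Ubuntu, for example, install collectD: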

sudo apt install -y collectd

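Although collectD's main job is data collection, it also ships with a simple graphing front end. The front end is built on CGI scripts, so it can be served with Apache2. On Ubuntu/Debian, the web application is in /usr/share/doc/collectd/examples/collection3; that directory even includes PHP programs for displaying the graph data.

Install Apache2 and configure it: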

sudo apt install apache2

Copy the graphing application to the web directory.

sudo cp -r /usr/share/doc/collectd/examples/collection3 /var/www/
sudo chown -R www-data:www-data /var/www/collection3/bin
sudo chmod -R 775 /var/www/collection3
# The CGI scripts need execute permission
sudo chmod +x /var/www/collection3/bin/*.cgi
# Enable Apache2's cgi module
sudo a2enmod cgi

Edit the file /etc/apache2/sites-available/000-default.conf

Add the following inside the <VirtualHost *:80> section:

ScriptAlias /cgi-bin/ /var/www/collection3/bin/
<Directory "/var/www/collection3/bin">
Options +ExecCGI
AddHandler cgi-script .cgi .pl .py
Require all granted
</Directory>

Change the value of DocumentRoot so that it looks like this after editing:

DocumentRoot /var/www/collection3/

The CGI scripts are written in Perl, so the required Perl packages must be installed:

sudo apt update
sudo apt install librrds-perl libconfig-general-perl libhtml-parser-perl libregexp-common-perl libcgi-pm-perl

Verify the installation:

perl -MCGI -e 'print "CGI module is installed\n";'

If it prints CGI module is installed without errors, the installation succeeded.
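The same check can be extended to the other Perl packages. The sketch below reports any module that fails to load. missing_perl_modules is a hypothetical helper, and the module names are assumed to be the ones provided by the apt packages installed above (for example RRDs from librrds-perl).

```shell
#!/usr/bin/env bash
# Sketch: report which Perl modules fail to load.
# Module names are assumed to correspond to the apt packages installed above.
missing_perl_modules() {
  for module in "$@"; do
    # "perl -MName -e 1" exits non-zero when the module cannot be loaded.
    perl "-M$module" -e 1 2>/dev/null || echo "$module"
  done
}

missing_perl_modules CGI RRDs Config::General HTML::Entities Regexp::Common
```

No output means every listed module was found.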

Start Apache2:

sudo systemctl enable apache2 && sudo systemctl start apache2

Test whether the page is reachable:

curl -I localhost/bin/index.cgi
# Output like the following means it is working
HTTP/1.1 200 OK
Date: Thu, 19 Jun 2025 09:13:13 GMT
Server: Apache/2.4.58 (Ubuntu)
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8
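For scripted checks, only the status code matters. The sketch below extracts it from curl -I style headers; status_from_headers and check_cgi are hypothetical helpers, and the default URL is the one tested above.

```shell
#!/usr/bin/env bash
# Sketch: pull the status code out of "curl -I" style response headers.
status_from_headers() {
  # The first line looks like "HTTP/1.1 200 OK"; field 2 is the status code.
  # tr strips the carriage returns that HTTP header lines end with.
  tr -d '\r' | awk 'NR == 1 { print $2 }'
}

check_cgi() {
  url="${1:-http://localhost/bin/index.cgi}"
  code="$(curl -sI "$url" | status_from_headers)"
  if [ "$code" = "200" ]; then
    echo "OK: $url"
  else
    echo "FAILED: $url returned '${code}'" >&2
    return 1
  fi
}

# Demo on canned headers; on the server, run: check_cgi
printf 'HTTP/1.1 200 OK\r\nServer: Apache\r\n' | status_from_headers
```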

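If the expected page still does not appear after everything is set up, and the Apache2 log shows an error like the following (CGI module not found), then the libcgi-pm-perl package was not installed correctly and the module is not being recognized.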

Can't locate CGI.pm in @INC (you may need to install the CGI module) (@INC entries checked: /var/www/collection3/lib /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.38.2 /usr/local/share/perl/5.38.2 /usr/lib/x86_64-linux-gnu/perl5/5.38 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl-base /usr/lib/x86_64-linux-gnu/perl/5.38 /usr/share/perl/5.38 /usr/local/lib/site_perl) at /var/www/collection3/bin/index.cgi line 40.

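At this point, the page is reachable at http://{ip}/bin/index.cgi.

By default, collectD already monitors CPU, network interfaces, load, and memory. Its configuration files are in the /etc/collectd/ directory. For more installation details, see the collectd wiki on GitHub.

The configuration file shows which plugins are enabled: each LoadPlugin line is an enabled plugin.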

grep -Ev '^$|#' /etc/collectd/collectd.conf
# Output
FQDNLookup true
LoadPlugin syslog
<Plugin syslog>
LogLevel info
</Plugin>
LoadPlugin battery
LoadPlugin cpu
LoadPlugin df
LoadPlugin disk
LoadPlugin entropy
LoadPlugin interface
LoadPlugin irq
LoadPlugin load
LoadPlugin memory
LoadPlugin processes
LoadPlugin rrdtool
LoadPlugin swap
LoadPlugin users
<Plugin df>
FSType rootfs
FSType sysfs
FSType proc
FSType devtmpfs
FSType devpts
FSType tmpfs
FSType fusectl
FSType cgroup
IgnoreSelected true
</Plugin>
<Plugin rrdtool>
DataDir "/var/lib/collectd/rrd"
</Plugin>
<Include "/etc/collectd/collectd.conf.d">
Filter "*.conf"
</Include>
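To see only the plugin names, the LoadPlugin lines can be filtered out of the file. This is a minimal sketch: enabled_plugins is a hypothetical helper, and it assumes plugin names are written unquoted, as in the output above.

```shell
#!/usr/bin/env bash
# Sketch: list the plugins enabled by uncommented LoadPlugin lines.
enabled_plugins() {
  # Commented lines begin with "#", so their first field is never "LoadPlugin".
  awk '$1 == "LoadPlugin" { print $2 }' "$1" | sort
}

conf="${COLLECTD_CONF:-/etc/collectd/collectd.conf}"
if [ -r "$conf" ]; then
  enabled_plugins "$conf"
fi
```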

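If SELinux is enabled on the operating system, follow the AWS documentation on setting up the CloudWatch agent with Security-Enhanced Linux (SELinux) to complete the configuration.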