Simplifying EC2 Memory Monitoring Setup with AWS Systems Manager

October 14, 2023

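Cloud providers in China usually install an in-OS monitoring agent on customers' servers and use it to provide CPU, disk, and memory metrics. AWS does not do this by default. User privacy and information security carry great weight outside China, and security is AWS's top priority, so AWS does not install a metrics-collection agent inside a server's OS or proactively collect such metrics without the user's explicit permission.

By default, CloudWatch, the AWS monitoring service, does not monitor the total memory or memory usage of an EC2 instance. Memory is information inside the guest operating system, and in AWS's product design everything inside the OS is treated as the user's private property and information. CloudWatch therefore does not collect it unless you explicitly opt in by installing the CloudWatch agent inside the instance's OS.

In real projects, system-level and application-level monitoring, with memory monitoring as the typical example, is a very important part of overall monitoring. AWS therefore provides the CloudWatch agent to help users send system-level information from EC2 instances to CloudWatch. The agent collects more than memory information: it also gathers other system-level metrics such as CPU active/idle time, disk I/O time, and network packet counts. Compared with the default EC2 metrics in CloudWatch, it offers more detailed and more varied monitoring.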
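As an illustration of those extra metrics, here is a minimal Python sketch that builds a CloudWatch agent configuration covering memory, CPU, disk I/O, and network packets. The function name build_agent_config is hypothetical, and the chosen measurement names (cpu_usage_active, cpu_usage_idle, io_time, packets_sent, packets_recv) are my selection rather than something the template in this article uses, so check them against the agent documentation before relying on them.

```python
import json


def build_agent_config(interval_seconds=60):
    """Return a CloudWatch agent config dict (hypothetical helper).

    The "mem" section mirrors the template used in this article; the
    "cpu", "diskio" and "net" sections are illustrative additions.
    """
    return {
        "agent": {
            "metrics_collection_interval": interval_seconds,
            "run_as_user": "cwagent",
        },
        "metrics": {
            # Tag every metric with the instance ID, as the template does.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "cpu": {
                    "measurement": ["cpu_usage_active", "cpu_usage_idle"],
                    "resources": ["*"],
                    "totalcpu": True,
                },
                "diskio": {"measurement": ["io_time"], "resources": ["*"]},
                "net": {
                    "measurement": ["packets_sent", "packets_recv"],
                    "resources": ["*"],
                },
            },
        },
    }


if __name__ == "__main__":
    # Print the JSON document that would be stored as the agent config.
    print(json.dumps(build_agent_config(), indent=4))
```

Every additional measurement is published as a custom metric, so it is reasonable to start with memory only and add the others when they are actually needed.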

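To monitor a single server, we can install the metrics-collection agent on it by following Monitor memory and disk metrics for Amazon EC2 Linux instances. That procedure involves quite a few steps, and with many servers to monitor, repeating it on each one is tedious. How can we perform these steps automatically? The answer is to automate them with AWS Systems Manager.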

AWS Systems Manager setup

First download the CloudFormation template mem-metrics-cfn-temp.yaml, either in a browser or with wget:

wget https://d2908q01vomqb2.cloudfront.net/artifacts/MTBlog/cloudops-1223/mem-metrics-cfn-temp.yaml

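Note: it is worth modifying the template so that it resolves the EC2 role more accurately, because with containers and in some other scenarios the instance profile name is not always the same as the role name.

For example, on my EKS nodes the role name and the instance profile name differ.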
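The lookup can be sketched in plain Python as follows. This is a sketch under assumptions: the sample data and the names resolve_role_name and fake_get_instance_profile are hypothetical. The dictionaries only mimic the shape of the ec2 describe_instances and iam get_instance_profile responses, and in real use the second argument would be iam_client.get_instance_profile.

```python
def instance_profile_name_from_arn(profile_arn):
    """Return the last path segment of an instance profile ARN."""
    return profile_arn.split("/")[-1]


def resolve_role_name(describe_instances_response, get_instance_profile):
    """Resolve the IAM role name behind an instance's instance profile.

    Returns None when the instance has no instance profile or the
    profile has no role, instead of raising KeyError/IndexError.
    """
    instance = describe_instances_response["Reservations"][0]["Instances"][0]
    profile = instance.get("IamInstanceProfile")
    if profile is None:
        return None
    details = get_instance_profile(
        InstanceProfileName=instance_profile_name_from_arn(profile["Arn"])
    )
    roles = details["InstanceProfile"]["Roles"]
    return roles[0]["RoleName"] if roles else None


# Hypothetical sample data: the profile name and the role name differ.
SAMPLE_INSTANCES = {
    "Reservations": [{"Instances": [{"IamInstanceProfile": {
        "Arn": "arn:aws:iam::123456789012:instance-profile/eks-1a2b3c"
    }}]}]
}


def fake_get_instance_profile(InstanceProfileName):
    """Stand-in for iam_client.get_instance_profile (assumption)."""
    assert InstanceProfileName == "eks-1a2b3c"
    return {"InstanceProfile": {"Roles": [{"RoleName": "my-eks-node-role"}]}}


print(resolve_role_name(SAMPLE_INSTANCES, fake_get_instance_profile))
# prints: my-eks-node-role
```

Using the profile name directly as the role name would have produced eks-1a2b3c here, and attaching a policy to a role of that name would fail.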

The complete template is as follows:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: A sample template to create a AWS Systems Manager Automation Document that installs Amazon CloudWatch agent, sets up necessary permissions and configures CloudWatch agent to publish memory metrics
  to CloudWatch
Resources:
  SsmMemMetricsAutomationRole:
    Type: AWS::IAM::Role
    Properties:
      Description: AWS IAM role for AWS Systems Manager to execute automation document
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ssm.amazonaws.com
            Action: sts:AssumeRole
            Condition:
              StringEquals:
                aws:SourceAccount: !Sub ${AWS::AccountId}
              ArnLike:
                aws:SourceArn: !Sub arn:${AWS::Partition}:ssm:*:${AWS::AccountId}:automation-execution/*
      ManagedPolicyArns:
        - !Sub arn:${AWS::Partition}:iam::aws:policy/service-role/AmazonSSMAutomationRole
      Path: /
      Policies:
        - PolicyName: SsmMemMetricIamPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - iam:GetRole
                  - iam:GetInstanceProfile
                  - iam:GetPolicy
                  - iam:AttachRolePolicy
                  - iam:ListInstanceProfiles
                  - ec2:DescribeInstances
                Resource: '*'
  CloudWatchAgentConfigFile:
    Type: AWS::SSM::Parameter
    Properties:
      Name: CloudwatchAgentConfigForMemoryMetricsLinux.json
      Description: Store CloudWatch Agent configuration file as AWS Systems Manager Parameter
      Type: String
      Value: |
        {
            "agent": {
                    "metrics_collection_interval": 60,
                    "run_as_user": "cwagent"
            },
            "metrics": {
                    "append_dimensions": {
                        "InstanceId": "${aws:InstanceId}"
                    },
                    "metrics_collected": {
                    "mem": {
                            "measurement": [
                                "mem_used_percent"
                            ],
                            "metrics_collection_interval": 60
                    }
                }
            }
        }
  MemoryMetricsRunbook:
    Type: AWS::SSM::Document
    Properties:
      DocumentFormat: YAML
      DocumentType: Automation
      Name: ConfigureMemoryMetricsOnEC2Linux
      Content:
        description: Install CloudWatch Agent, Add permissions to target instances and configure CloudWatch agent to publish metrics
        schemaVersion: '0.3'
        assumeRole: '{{AutomationAssumeRole}}'
        parameters:
          InstanceId:
            type: String
            description: Select instances
          AutomationAssumeRole:
            type: String
            description: (Optional) The ARN of the role that allows Automation to perform the actions on your behalf.
            default: ''
            allowedPattern: ^arn:aws(-cn|-us-gov)?:iam::\d{12}:role/[\w+=,.@_/-]+|^$
        mainSteps:
          - name: AttachCloudWatchAgentServerPolicy
            action: aws:executeScript
            onFailure: Abort
            isCritical: true
            timeoutSeconds: 600
            description: |
              ## Find the attached role, attach CloudWatchAgentServer managed policy to the role
            inputs:
              Runtime: python3.8
              Handler: attach_cloudwatch_agent_managed_policy
              InputPayload:
                InstanceIds: '{{InstanceId}}'
              Script: |
                import boto3
                ec2_client = boto3.client('ec2')
                iam_client = boto3.client('iam')
                current_session = boto3.session.Session()
                current_region = current_session.region_name
                partition = current_session.get_partition_for_region(current_region)
                cloudwatchagent_policy_arn = f'arn:{partition}:iam::aws:policy/CloudWatchAgentServerPolicy'
                # instances_id = event['InstanceIds']
                def attach_cloudwatch_agent_managed_policy(event,context):
                  # Define the instance ID for which you want to find the IAM role
                  instance_id = event['InstanceIds']

                  # Use the describe_instances() method to get information about the instance
                  response = ec2_client.describe_instances(InstanceIds=[instance_id])

                  # Get the IAM role from the response
                  # The following part was modified to resolve the EC2 role more accurately

                  # Get the ARN of the instance profile attached to the EC2 instance
                  iam_instance_profile_arn = response['Reservations'][0]['Instances'][0]['IamInstanceProfile']['Arn']
                  # Extract the instance profile name from the ARN
                  ec2_instance_profile_name = iam_instance_profile_arn.split('/')[-1]
                  # Look up the instance profile details by name
                  iam_instance_profile = iam_client.get_instance_profile(
                      InstanceProfileName=ec2_instance_profile_name
                  )
                  # Read the EC2 role name from the instance profile details
                  ec2_iam_role_name = iam_instance_profile['InstanceProfile']['Roles'][0]['RoleName']
                  iam_client.attach_role_policy(RoleName=ec2_iam_role_name, PolicyArn=cloudwatchagent_policy_arn)
          - name: installCWAgent
            action: aws:runCommand
            onFailure: Abort
            inputs:
              Parameters:
                action:
                  - Install
                installationType:
                  - Uninstall and reinstall
                name:
                  - AmazonCloudWatchAgent
              DocumentName: AWS-ConfigureAWSPackage
              InstanceIds:
                - '{{InstanceId}}'
          - name: configureCWAgent
            action: aws:runCommand
            inputs:
              DocumentName: AmazonCloudWatch-ManageAgent
              InstanceIds:
                - '{{InstanceId}}'
              Parameters:
                action: configure
                mode: ec2
                optionalConfigurationSource: ssm
                optionalConfigurationLocation: CloudwatchAgentConfigForMemoryMetricsLinux.json
                optionalRestart: 'yes'
Outputs:
  SsmMemMetricsAutomationRoleName:
    Description: Name of the SSM Automation IAM Role
    Value: !Ref SsmMemMetricsAutomationRole

Create the CloudFormation stack from the modified YAML file:

aws cloudformation create-stack --stack-name MemoryMetricsAutomation --template-body file://mem-metrics-cfn-temp.yaml --capabilities CAPABILITY_NAMED_IAM

Wait for the CloudFormation stack to finish creating:

aws cloudformation wait stack-create-complete --stack-name MemoryMetricsAutomation

Get the SsmMemMetricsAutomationRoleName output, which the AWS Systems Manager automation needs:

aws cloudformation describe-stacks --stack-name MemoryMetricsAutomation --query 'Stacks[0].Outputs[?OutputKey==`SsmMemMetricsAutomationRoleName`].OutputValue'
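The stack output is a role name, while the runbook's AutomationAssumeRole parameter expects a role ARN. The sketch below reads the output from a describe-stacks style response, builds the ARN, and checks it against the runbook's allowedPattern. The account ID and role suffix are made up, the helper names are hypothetical, and the pattern is written with \d and \w on the assumption that this is what the template intends.

```python
import re

# The runbook's allowedPattern, assuming \d and \w character classes.
ROLE_ARN_PATTERN = re.compile(
    r"^arn:aws(-cn|-us-gov)?:iam::\d{12}:role/[\w+=,.@_/-]+|^$"
)


def output_value(describe_stacks_response, key):
    """Return the value of one stack output, or None if it is missing."""
    outputs = describe_stacks_response["Stacks"][0].get("Outputs", [])
    for output in outputs:
        if output["OutputKey"] == key:
            return output["OutputValue"]
    return None


def role_arn(role_name, account_id, partition="aws"):
    """Build a role ARN for a role created with Path '/'."""
    return f"arn:{partition}:iam::{account_id}:role/{role_name}"


# Hypothetical response shaped like `aws cloudformation describe-stacks`.
SAMPLE_STACKS = {"Stacks": [{"Outputs": [{
    "OutputKey": "SsmMemMetricsAutomationRoleName",
    "OutputValue": "MemoryMetricsAutomation-SsmMemMetricsAutomationRole-ABC123",
}]}]}

name = output_value(SAMPLE_STACKS, "SsmMemMetricsAutomationRoleName")
arn = role_arn(name, "123456789012")  # made-up account ID
print(arn, bool(ROLE_ARN_PATTERN.fullmatch(arn)))
```

The trailing |^$ in the pattern is what lets the parameter stay empty, which matches its default of ''.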


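Installing the agent with AWS Systems Manager

Open AWS Systems Manager -> Change Management -> Automation.

Next, click Execute automation and select the runbook we created earlier. Here I choose Owned by me -> ConfigureMemoryMetricsOnEC2Linux.

Then choose Simple execution and select the EC2 instance that needs monitoring. Finally, select the role created earlier, whose name is in the SsmMemMetricsAutomationRoleName output (usually of the form MemoryMetricsAutomation-SsmMemMetricsAutomationRole-xxxx), and submit.

After submission, the runbook runs the installation steps in order.

Wait for the execution to complete.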
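The same execution can probably be started from code instead of the console. The following is a minimal sketch, assuming an SSM client created by the caller with boto3.client("ssm"). The helper names are hypothetical, while start_automation_execution and get_automation_execution are boto3 SSM client methods.

```python
def automation_request(instance_id, assume_role_arn,
                       document_name="ConfigureMemoryMetricsOnEC2Linux"):
    """Build the arguments for start_automation_execution.

    Automation parameters are passed as lists of strings, even for
    single-valued parameters such as InstanceId.
    """
    return {
        "DocumentName": document_name,
        "Parameters": {
            "InstanceId": [instance_id],
            "AutomationAssumeRole": [assume_role_arn],
        },
    }


def start_runbook(ssm_client, instance_id, assume_role_arn):
    """Start the runbook and return the automation execution ID."""
    response = ssm_client.start_automation_execution(
        **automation_request(instance_id, assume_role_arn)
    )
    return response["AutomationExecutionId"]


def execution_status(ssm_client, execution_id):
    """Return the current status string, for example 'InProgress' or 'Success'."""
    response = ssm_client.get_automation_execution(
        AutomationExecutionId=execution_id
    )
    return response["AutomationExecution"]["AutomationExecutionStatus"]


# Usage (needs AWS credentials; the IDs below are placeholders):
#   import boto3
#   ssm = boto3.client("ssm")
#   execution_id = start_runbook(
#       ssm, "i-0123456789abcdef0",
#       "arn:aws:iam::123456789012:role/<automation role name>")
#   print(execution_status(ssm, execution_id))
```

Looping over a list of instance IDs with start_runbook is one simple way to cover several instances, since the runbook's InstanceId parameter takes a single instance.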

After a few minutes the memory metrics are reported to CloudWatch, where we can view memory usage.
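To check the metric without the console, the query can be sketched as below. This assumes the agent publishes to its default CWAgent namespace and that InstanceId is the only dimension, as in the template's append_dimensions. The helper names are hypothetical, and the resulting dictionary is meant to be passed to the boto3 CloudWatch client's get_metric_statistics.

```python
from datetime import datetime, timedelta, timezone


def memory_metric_query(instance_id, minutes=30, now=None):
    """Build get_metric_statistics arguments for mem_used_percent."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "CWAgent",  # agent default namespace (assumption)
        "MetricName": "mem_used_percent",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # matches the 60-second collection interval
        "Statistics": ["Average"],
    }


def latest_average(datapoints):
    """Return the Average of the most recent datapoint, or None if empty."""
    if not datapoints:
        return None
    return max(datapoints, key=lambda point: point["Timestamp"])["Average"]


# Usage (needs AWS credentials; the instance ID is a placeholder):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   result = cloudwatch.get_metric_statistics(
#       **memory_metric_query("i-0123456789abcdef0"))
#   print(latest_average(result["Datapoints"]))
```

An empty result a few minutes after the runbook finishes usually points at the namespace, the dimension set, or the instance role's permissions, in that order.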

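Summary

  • An EC2 instance must have an IAM role attached before memory monitoring is installed, so that the CloudWatch agent has the permissions it needs to send metrics to CloudWatch.
  • To set up monitoring on new EC2 instances later, we can select them in bulk in AWS Systems Manager -> Change Management -> Automation, which simplifies installation.
  • Newly created EC2 instances usually do not have the CloudWatch agent installed by default. Besides AWS Systems Manager, we can have an EC2 instance run a custom script automatically at launch to perform similar steps. This is done with instance user data, for example by putting the script in the user data of a launch template so that instances in an EC2 Auto Scaling group install the agent at launch.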
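A user data script for the last point might be generated as follows. This is a sketch under several assumptions: an Amazon Linux image where yum can install the amazon-cloudwatch-agent package, an instance role that already has CloudWatchAgentServerPolicy and is allowed to read the SSM parameter, and the parameter name from the template above. The helper names are hypothetical.

```python
import base64


def build_user_data(
    ssm_parameter_name="CloudwatchAgentConfigForMemoryMetricsLinux.json",
):
    """Return a bash user data script that installs and starts the agent."""
    lines = [
        "#!/bin/bash",
        # Install the agent package (assumes an Amazon Linux yum repository).
        "yum install -y amazon-cloudwatch-agent",
        # Fetch the config from SSM Parameter Store and start the agent.
        "/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl "
        f"-a fetch-config -m ec2 -s -c ssm:{ssm_parameter_name}",
    ]
    return "\n".join(lines) + "\n"


def encode_for_launch_template(user_data):
    """Base64-encode user data, as launch template API calls expect."""
    return base64.b64encode(user_data.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    script = build_user_data()
    print(script)
    print(encode_for_launch_template(script))
```

Unlike the runbook, this approach does not attach the policy to the role, so the role used by the launch template has to carry the permissions beforehand.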
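References

  • Setup memory metrics for Amazon EC2 instances using AWS Systems Manager
  • 使用Cloudwatch Agent在Cloudwatch中收集、展现EC2的内存及系统、应用日志 (Using the CloudWatch Agent to collect and display EC2 memory, system, and application logs in CloudWatch)
  • Monitor memory and disk metrics for Amazon EC2 Linux instances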
