Collecting Pod Crash and OOM Event Logs with Robusta

July 15, 2023

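Robusta does far more than this article covers. It monitors Kubernetes and provides observability, connects to Prometheus as a second processing stage for alerts, performs automated remediation, and also offers a timeline of events.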

Previously I used Alibaba's kube-eventer. kube-eventer is only a forwarder, so the only problem it solves is sending a notification when an event fires.

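Of course, if Robusta stopped there as well, there would be little reason to adopt it. It offers another very useful feature: event alerting. When Robusta detects a matching event, it sends the pod state defined in the playbook, together with the pod's most recent logs, to Slack. That is the main reason for writing this article.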

Prerequisites

Python 3.7 or later is required, so we upgrade Python first.

Upgrading Python

wget https://www.python.org/ftp/python/3.9.16/Python-3.9.16.tar.xz
yum install gcc zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libffi-devel -y
yum install libffi-devel -y
yum install zlib* -y

tar xf Python-3.9.16.tar.xz
cd Python-3.9.16
./configure --with-ssl --prefix=/usr/local/python3  
make 
make install
rm -rf  /usr/bin/python3 /usr/bin/pip3
ln -s /usr/local/python3/bin/python3 /usr/bin/python3
ln -s /usr/local/python3/bin/pip3 /usr/bin/pip3
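After relinking, it is worth confirming that the python3 now on PATH meets the 3.7 minimum before installing robusta-cli. A minimal sketch: `version_ge` and `check_python3` are hypothetical helper names, and `version_ge` relies on GNU `sort -V`, which CentOS provides through coreutils.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed (exit 0) if version string A >= version string B.
# GNU `sort -V` orders versions numerically, so 3.10 sorts after 3.9.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_python3 [MIN]: report whether python3 on PATH is at least MIN (default 3.7).
check_python3() {
  local min="${1:-3.7}" current
  # platform.python_version() prints a plain version such as 3.9.16
  current="$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null)" || {
    echo "python3 not found on PATH" >&2
    return 1
  }
  if version_ge "$current" "$min"; then
    echo "python3 $current OK (>= $min)"
  else
    echo "python3 $current is older than $min" >&2
    return 1
  fi
}
```

Source the file and run `check_python3`; it returns non-zero when the interpreter is missing or too old, so it can guard the `pip3 install` step in a script.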

Configuring a PyPI mirror in China

mkdir -p ~/.pip/
cat > ~/.pip/pip.conf << EOF
[global]
trusted-host =  mirrors.aliyun.com
index-url = http://mirrors.aliyun.com/pypi/simple
EOF
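If you would rather not change pip's global configuration, the same mirror can be passed per command with pip's `--index-url` and `--trusted-host` options. `--trusted-host` is needed because this mirror is served over plain HTTP. A sketch: `pip_mirror` is a hypothetical wrapper name, and the `PIP` variable exists only so the pip binary can be swapped out.

```shell
#!/usr/bin/env bash
# Mirror settings matching the ~/.pip/pip.conf above (Aliyun PyPI mirror over HTTP).
PIP_MIRROR_URL="http://mirrors.aliyun.com/pypi/simple"
PIP_MIRROR_HOST="mirrors.aliyun.com"

# pip_mirror ARGS...: run pip with the mirror given explicitly on the command line.
# Intended for subcommands that accept index options, such as install or download.
# Set PIP to choose the binary; it defaults to pip3.
pip_mirror() {
  "${PIP:-pip3}" "$@" --index-url "$PIP_MIRROR_URL" --trusted-host "$PIP_MIRROR_HOST"
}
```

For example, `pip_mirror install -U robusta-cli --no-cache`. pip also reads any option from a `PIP_<OPTION>` environment variable, so exporting `PIP_INDEX_URL` and `PIP_TRUSTED_HOST` is another way to get the same effect for a single shell session.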

robusta.dev

Following the official documentation, install the CLI and generate the configuration:

pip3 install -U robusta-cli --no-cache
robusta gen-config

Because of network problems, I ran the configuration step through Docker instead:

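  • Before you start, be sure to pull the image I mirrored, then tag it with the upstream name so it is available locally under that name (docker tag takes the existing source image first, then the new name):

    docker pull registry.cn-zhangjiakou.aliyuncs.com/marksugar-k8s/robusta-cli:latest
    docker tag registry.cn-zhangjiakou.aliyuncs.com/marksugar-k8s/robusta-cli:latest us-central1-docker.pkg.dev/genuine-flight-317411/devel/robusta-cli:latest

curl -fsSL -o robusta https://docs.robusta.dev/master/_static/robusta
chmod +x robusta
./robusta gen-config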
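The same pull-then-retag pattern works for any image that is slow to fetch directly. A minimal sketch, assuming the mirror keeps the upstream `name:tag` under the `marksugar-k8s` namespace (true for robusta-cli here, but check before relying on it for other images). `mirror_name` and `pull_via_mirror` are hypothetical helpers, and `DOCKER` only exists so the binary can be swapped out.

```shell
#!/usr/bin/env bash
# Registry namespace that holds the mirrored images (taken from this article).
MIRROR_PREFIX="registry.cn-zhangjiakou.aliyuncs.com/marksugar-k8s"

# mirror_name UPSTREAM: map an upstream image reference to its mirror copy by
# keeping only the last path component (name:tag) and prefixing MIRROR_PREFIX.
mirror_name() {
  printf '%s/%s\n' "$MIRROR_PREFIX" "${1##*/}"
}

# pull_via_mirror UPSTREAM: pull the mirror copy, then tag it with the upstream
# name. `docker tag` takes SOURCE then TARGET, and SOURCE must exist locally.
# Set DOCKER to choose the binary; it defaults to docker.
pull_via_mirror() {
  local upstream="$1" mirror
  mirror="$(mirror_name "$upstream")"
  "${DOCKER:-docker}" pull "$mirror" &&
    "${DOCKER:-docker}" tag "$mirror" "$upstream"
}
```

Usage: `pull_via_mirror us-central1-docker.pkg.dev/genuine-flight-317411/devel/robusta-cli:latest` runs exactly the two commands above.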
[root@master1 opt]# ./robusta  gen-config
Robusta reports its findings to external destinations (we call them "sinks").
We'll define some of them now.

Configure Slack integration? This is HIGHLY recommended. [Y/n]: y
# Configuring Slack is strongly recommended
If your browser does not automatically launch, open the below url:
https://api.robusta.dev/integrations/slack?id=64a3ee7c-5691-466f-80da-85e8ece80359
# open this URL in a browser
======================================================================
Error getting slack token ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
======================================================================
======================================================================
Error getting slack token HTTPSConnectionPool(host='api.robusta.dev', port=443): Max retries exceeded with url: /integrations/slack/get-token?id=64a3ee7c-5691-466f-80da-85e8ece80359 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f50b1f18cd0>: Failed to establish a new connection: [Errno 110] Connection timed out'))
======================================================================
You've just connected Robusta to the Slack of: crow as a cock
Which slack channel should I send notifications to? #

As the prompt says, if your browser does not open automatically, open https://api.robusta.dev/integrations/slack?id=64a3ee7c-5691-466f-80da-85e8ece80359 yourself:

image-20230202103534363.png

Click Allow:

image-20230202103642681.png

The result:

image-20230202104705985.png

The Robusta app now appears in Slack:

image-20230202103836328.png

Back in the terminal, enter the channel at the prompt:

Which slack channel should I send notifications to? # devops

A message is then posted to that channel:

image-20230202105302497.png

The complete run looks like this:

[root@master1 opt]# ./robusta  gen-config
Robusta reports its findings to external destinations (we call them "sinks").
We'll define some of them now.

Configure Slack integration? This is HIGHLY recommended. [Y/n]: y
If your browser does not automatically launch, open the below url:
https://api.robusta.dev/integrations/slack?id=d1fcbb13-5174-4027-a176-a3dcab10c27a
======================================================================
Error getting slack token HTTPSConnectionPool(host='api.robusta.dev', port=443): Max retries exceeded with url: /integrations/slack/get-token?id=d1fcbb13-5174-4027-a176-a3dcab10c27a (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f0ec508eee0>: Failed to establish a new connection: [Errno 110] Connection timed out'))
======================================================================
You've just connected Robusta to the Slack of: crow as a cock
Which slack channel should I send notifications to? # devops
Configure MsTeams integration? [y/N]: n
Configure Robusta UI sink? This is HIGHLY recommended. [Y/n]: y
Enter your Gmail/Google address. This will be used to login: user@gmail.com
Choose your account name (e.g your organization name): marksugar
Successfully registered.

Robusta can use Prometheus as an alert source.
If you haven't installed it yet, Robusta can install a pre-configured Prometheus.
Would you like to do so? [y/N]: y
Please read and approve our End User License Agreement: https://api.robusta.dev/eula.html
Do you accept our End User License Agreement? [y/N]: y
Last question! Would you like to help us improve Robusta by sending exception reports? [y/N]: n

Saved configuration to ./generated_values.yaml - save this file for future use!
Finish installing with Helm (see the Robusta docs). Then login to Robusta UI at https://platform.robusta.dev

By the way, we'll send you some messages later to get feedback. (We don't store your API key, so we scheduled future messages using Slack'sAPI)

When this finishes, a generated_values.yaml file has been created:

globalConfig:
  signing_key: 92a8195-a3fa879b3f88
  account_id: 79efaf9c433294
sinksConfig:
- slack_sink:
    name: main_slack_sink
    slack_channel: devops
    api_key: xoxb-4715825756487-4749501ZZylPy1f
- robusta_sink:
    name: robusta_ui_sink
    token: eyJhY2NvjIn0=
enablePrometheusStack: true
enablePlatformPlaybooks: true
runner:
  sendAdditionalTelemetry: false
rsa:
  private: LS0tLS1CRUdJTiBRCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
  public: LS0tLS1CRUdJTiBQTElDIEtFWS0tLS0tCg==

helm

Next, install using the YAML file generated above, with a few adjustments.

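There are many kinds of triggers; see the Robusta documentation sections example-triggers, java-troubleshooting, event-enrichment, miscellaneous and kubernetes-triggers. Triggers can be filtered so that only a specific group of pods or a specific namespace is monitored.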

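I picked a few of these playbooks to test and added them to generated_values.yaml. I also set enablePrometheusStack to false and pointed the runner and kubewatch images at my mirror: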

globalConfig:
  signing_key: 92a8195-a3fa879b3f88
  account_id: 79efaf9c433294
sinksConfig:
- slack_sink:
    name: main_slack_sink
    slack_channel: devops
    api_key: xoxb-4715825756487-4749501ZZylPy1f
- robusta_sink:
    name: robusta_ui_sink
    token: eyJhY2NvjIn0=
enablePrometheusStack: false
enablePlatformPlaybooks: true
runner:
  sendAdditionalTelemetry: false
  image: registry.cn-zhangjiakou.aliyuncs.com/marksugar-k8s/robusta-runner:0.10.10
  imagePullPolicy: IfNotPresent
rsa:
  private: LS0tLS1CRUdJTiBRCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
  public: LS0tLS1CRUdJTiBQTElDIEtFWS0tLS0tCg==

customPlaybooks:
- triggers:
    - on_deployment_update: {}
  actions:
    - resource_babysitter:
        omitted_fields: []
        fields_to_monitor: ["spec.replicas"]

- triggers:
  - on_pod_crash_loop:
      restart_reason: "CrashLoopBackOff"
      restart_count: 1
      rate_limit: 3600
  actions:
  - report_crash_loop: {}

- triggers:
  - on_pod_oom_killed:
      rate_limit: 900
      exclude:
        - name: "oomkilled-pod"
          namespace: "default"
  actions:
  - pod_graph_enricher:
      resource_type: Memory
      display_limits: true

- triggers:
  - on_container_oom_killed:
      rate_limit: 900
      exclude:
        - name: "oomkilled-container"
          namespace: "default"
  actions:
  - oomkilled_container_graph_enricher:
      resource_type: Memory

- triggers:
  - on_job_failure:
      namespace_prefix: robusta
  actions:
  - create_finding:
      title: "Job $name on namespace $namespace failed"
      aggregation_key: "Job Failure"
  - job_events_enricher: {}

kubewatch:
  image: registry.cn-zhangjiakou.aliyuncs.com/marksugar-k8s/kubewatch:v2.0
  imagePullPolicy: IfNotPresent
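Hand-merging snippets into generated_values.yaml makes it easy to declare the same top-level key twice, for example two separate `runner:` blocks. Depending on the YAML parser, a repeated key either fails to parse or silently overrides the earlier block, so a quick check before installing helps. A rough sketch using awk: `dup_top_keys` is a hypothetical helper, and because it only inspects unindented `key:` lines it is no substitute for a real YAML parser.

```shell
#!/usr/bin/env bash
# dup_top_keys FILE: print every top-level YAML key that appears more than once.
# Only lines starting in column 0 with `key:` match; comments (#), list items (-)
# and document markers (---) do not match the pattern and are ignored.
dup_top_keys() {
  awk '
    /^[A-Za-z_][A-Za-z0-9_.-]*:/ {
      key = $0
      sub(/:.*/, "", key)   # keep only the text before the first colon
      count[key]++
    }
    END {
      for (k in count) if (count[k] > 1) print k
    }
  ' "$1"
}
```

`dup_top_keys generated_values.yaml` prints nothing when every top-level key is unique, and prints `runner` if the file still contains two `runner:` blocks.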

Now install with Helm:

helm repo add robusta https://robusta-charts.storage.googleapis.com && helm repo update
helm upgrade --install robusta --namespace robusta --create-namespace robusta/robusta -f ./generated_values.yaml \
  --set clusterName=test

To debug, add --dry-run to render the release without installing anything:
helm upgrade --install robusta --namespace robusta robusta/robusta -f ./generated_values.yaml --set clusterName=test --dry-run
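Wrapping the install in a small function keeps `--set clusterName=...` attached to the command, so it cannot be lost to a missing line continuation, and lets you pass extra Helm flags such as `--dry-run`. A sketch: `install_robusta` is a hypothetical name, and `HELM` exists only so the binary can be swapped out.

```shell
#!/usr/bin/env bash
# install_robusta CLUSTER VALUES_FILE [EXTRA_HELM_ARGS...]
# Installs or upgrades the robusta/robusta chart as release "robusta" in the
# robusta namespace (created if missing). Extra arguments go straight to helm.
# Set HELM to choose the binary; it defaults to helm.
install_robusta() {
  if [ "$#" -lt 2 ]; then
    echo "usage: install_robusta CLUSTER VALUES_FILE [extra helm args...]" >&2
    return 2
  fi
  local cluster="$1" values="$2"
  shift 2
  "${HELM:-helm}" upgrade --install robusta robusta/robusta \
    --namespace robusta --create-namespace \
    -f "$values" \
    --set clusterName="$cluster" \
    "$@"
}
```

For example, `install_robusta test ./generated_values.yaml --dry-run` previews the release, and dropping `--dry-run` performs the install.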

Output:

[root@master1 opt]# helm upgrade --install robusta --namespace robusta  --create-namespace  robusta/robusta -f ./generated_values.yaml \
> --set clusterName=test
Release "robusta" does not exist. Installing it now.
NAME: robusta
LAST DEPLOYED: Thu Feb  2 15:58:32 2023
NAMESPACE: robusta
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing Robusta 0.10.10

As an open source project, we collect general usage statistics.
This data is extremely limited and contains only general metadata to help us understand usage patterns.
If you are willing to share additional data, please do so! It really help us improve Robusta.

You can set sendAdditionalTelemetry: true as a Helm value to send exception reports and additional data.
This is disabled by default.

To opt-out of telemetry entirely, set a ENABLE_TELEMETRY=false environment variable on the robusta-runner deployment.


Visit the web UI at: https://platform.robusta.dev/

Wait for the pods to become ready:

[root@master1 opt]# kubectl -n robusta get pod -w
NAME                                 READY   STATUS              RESTARTS   AGE
robusta-forwarder-78964b4455-vnt77   1/1     Running             0          2m55s
robusta-runner-758cf9c986-87l4x      0/1     ContainerCreating   0          2m55s
robusta-runner-758cf9c986-87l4x      1/1     Running             0          7m6s
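Instead of watching with `-w`, a script can block until both deployments have rolled out. A sketch, assuming the deployment names `robusta-forwarder` and `robusta-runner` (inferred from the pod names above). The runner took about 7 minutes here, mostly spent in ContainerCreating, hence the generous default timeout. `wait_robusta` is a hypothetical name, and `KUBECTL` exists only so the binary can be swapped out.

```shell
#!/usr/bin/env bash
# wait_robusta [TIMEOUT]: wait until the robusta-forwarder and robusta-runner
# deployments finish rolling out; fail if either exceeds TIMEOUT (default 10m).
# Set KUBECTL to choose the binary; it defaults to kubectl.
wait_robusta() {
  local timeout="${1:-10m}" deploy
  for deploy in robusta-forwarder robusta-runner; do
    "${KUBECTL:-kubectl}" -n robusta rollout status "deployment/$deploy" --timeout="$timeout" || return 1
  done
}
```

Usage: `wait_robusta && echo "robusta is ready"`.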

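From now on, when a pod in the cluster crashes or enters an abnormal state, Robusta sends its logs to Slack before the pod is deleted. The logs already show up in Slack: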

image-20230202160816942.png

Click to expand the inline content and see the full details:

image-20230202160852978.png
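To confirm the crash-loop and OOM playbooks end to end without waiting for a real failure, you can deliberately create two misbehaving pods and watch Slack. A hedged sketch: the pod names and helper functions are hypothetical, the OOM manifest borrows the `polinux/stress` example from the Kubernetes "Assign Memory Resources" documentation, and both images must be pullable from your cluster (you may need to mirror them as well). Because the crash-loop playbook sets `rate_limit: 3600`, repeating the test for the same pod within an hour may not produce a second message.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test for the crash-loop and OOM playbooks configured above.
# Test pod names must NOT be "oomkilled-pod"/"oomkilled-container" in "default",
# because those names are excluded in the playbook config.
# Set KUBECTL to choose the binary; it defaults to kubectl.
NS="${NS:-default}"

# crashloop_pod: a busybox container that exits 1 immediately; with restart
# policy Always the kubelet keeps restarting it until it is in CrashLoopBackOff.
crashloop_pod() {
  "${KUBECTL:-kubectl}" -n "$NS" run robusta-crash-test --image=busybox \
    --restart=Always -- sh -c 'echo "simulated crash"; exit 1'
}

# oom_pod: allocate 250M inside a container limited to 100Mi so it is OOMKilled.
oom_pod() {
  "${KUBECTL:-kubectl}" -n "$NS" apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: robusta-oom-test
spec:
  containers:
  - name: stress
    image: polinux/stress
    resources:
      limits:
        memory: "100Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
EOF
}

# cleanup_test_pods: remove both test pods once the Slack messages arrive.
cleanup_test_pods() {
  "${KUBECTL:-kubectl}" -n "$NS" delete pod robusta-crash-test robusta-oom-test --ignore-not-found
}
```

Run `crashloop_pod` and `oom_pod`, wait a few minutes for the messages in the configured Slack channel, then run `cleanup_test_pods`.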
