8.9.0 was the last feature release of the SkyWalking 8.x line. Since 2018, SkyWalking has grown from agent-based tracing that monitored the dependencies and topology among services, endpoints and instances into a full-stack observability platform covering logs, traces, metrics and events. It has also added VM, Kubernetes and service mesh monitoring, and introduced new ways of observing such as eBPF.
The 8.x releases used the concept of a group to deal with this mix, but the most important concept in the v9 core is the LAYER.
A layer represents an abstraction level in computer science, for example the operating system (VM layer), Kubernetes (k8s layer) or a service mesh (typically the Istio+Envoy layer). A layer owns all of the services detected by a particular technology. Compared with the v8 group, the new layer concept is clearly a better fit. The group concept is kept as well: within each layer, groups let end users organize their own services, and a service with no group falls into the default group.
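Concretely, a group is derived from the service name reported by the agent: everything before a double colon becomes the group. A minimal sketch of the agent environment (the naming convention is covered again in the agent section below; mark and test1 are placeholder names):
- name: SW_AGENT_NAME
  value: mark::test1   # group "mark", logical service "test1" within its layer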
In the visualization UI, these layers are already presented as dashboards in the style of an admin console.
Visualization (UI)
- databases
- kubernetes
- service mesh
- general service
- browser
As a result, SkyWalking 9.0.0 looks like a full-stack APM system; see the upstream discussion for more.
Deployment
SkyWalking 9.0
- Create the namespace
apiVersion: v1
kind: Namespace
metadata:
name: skywalking
- Create a PVC for Elasticsearch
apiVersion: apps/v1
kind: Deployment
metadata:
name: nfs-client-provisioner
labels:
app: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: nfs-client-provisioner
template:
metadata:
labels:
app: nfs-client-provisioner
spec:
serviceAccountName: nfs-client-provisioner
containers:
- name: nfs-client-provisioner
image: quay.io/external_storage/nfs-client-provisioner:latest
imagePullPolicy: IfNotPresent
volumeMounts:
- name: nfs-client-root
mountPath: /persistentvolumes
env:
- name: PROVISIONER_NAME
value: fuseim.pri/ifs
- name: NFS_SERVER
value: 192.168.3.19
- name: NFS_PATH
value: /data/nfs-k8s
volumes:
- name: nfs-client-root
nfs:
server: 192.168.3.19
path: /data/nfs-k8s
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: nfs-client-provisioner-runner
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: run-nfs-client-provisioner
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
roleRef:
kind: ClusterRole
name: nfs-client-provisioner-runner
apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: leader-locking-nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
rules:
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: leader-locking-nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
roleRef:
kind: Role
name: leader-locking-nfs-client-provisioner
apiGroup: rbac.authorization.k8s.io
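Assuming the provisioner is deployed in the default namespace as above, a quick sanity check that it is running (a sketch):
kubectl -n default get pods -l app=nfs-client-provisioner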
Create the StorageClass and PVC
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nfs-storage
namespace: default
provisioner: fuseim.pri/ifs # or choose another name, must match the Deployment's env PROVISIONER_NAME
parameters:
archiveOnDelete: "false"
# Supported policies: Delete, Retain; default is Delete
reclaimPolicy: Retain
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: pvc-skywalking
namespace: skywalking
spec:
accessModes:
- ReadWriteMany
storageClassName: nfs-storage
resources:
requests:
storage: 10Gi
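After applying these, the StorageClass should exist and the claim should eventually become Bound (a quick check, as a sketch):
kubectl get storageclass nfs-storage
kubectl -n skywalking get pvc pvc-skywalking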
- Create the Elasticsearch Pod
# Source: skywalking/charts/elasticsearch/templates/statefulset.yaml
apiVersion: v1
kind: Service
metadata:
name: elasticsearch
namespace: skywalking
labels:
app: elasticsearch
spec:
type: ClusterIP
ports:
- name: elasticsearch
port: 9200
protocol: TCP
selector:
app: elasticsearch
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: elasticsearch
namespace: skywalking
labels:
app: elasticsearch
spec:
selector:
matchLabels:
app: elasticsearch
replicas: 1
template:
metadata:
name: elasticsearch
labels:
app: elasticsearch
spec:
initContainers:
- name: configure-sysctl
securityContext:
runAsUser: 0
privileged: true
image: "docker.elastic.co/elasticsearch/elasticsearch:6.8.6"
imagePullPolicy: "IfNotPresent"
# command: ["sysctl", "-w", "vm.max_map_count=262144"]
command: ["/bin/sh"]
args: ["-c", "sysctl -w DefaultLimitNOFILE=65536; sysctl -w DefaultLimitMEMLOCK=infinity; sysctl -w DefaultLimitNPROC=32000; sysctl -w vm.max_map_count=262144"]
resources:
{}
containers:
- name: "elasticsearch"
securityContext:
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000
image: "docker.elastic.co/elasticsearch/elasticsearch:6.8.6"
imagePullPolicy: "IfNotPresent"
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 30
periodSeconds: 2
successThreshold: 1
tcpSocket:
port: 9300
timeoutSeconds: 2
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 30
periodSeconds: 2
successThreshold: 2
tcpSocket:
port: 9300
timeoutSeconds: 2
ports:
- name: http
containerPort: 9200
- name: transport
containerPort: 9300
resources:
limits:
cpu: 1000m
memory: 2Gi
requests:
cpu: 100m
memory: 2Gi
env:
- name: node.name
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: cluster.name
value: "elasticsearch"
- name: network.host
value: "0.0.0.0"
- name: ES_JAVA_OPTS
value: "-Xmx1g -Xms1g -Duser.timezone=Asia/Shanghai"
- name: discovery.type
value: single-node
# value: "-Xmx1g -Xms1g -Duser.timezone=Asia/Shanghai MAX_OPEN_FILES=655350 MAX_LOCKED_MEMORY=unlimited"
# - name: node.data
# value: "true"
# - name: node.ingest
# value: "true"
# - name: node.master
# value: "true"
# - name: http.cors.enabled
# value: "true"
# - name: http.cors.allow-origin
# value: "*"
# - name: http.cors.allow-headers
# value: "X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization"
# - name: bootstrap.memory_lock
# value: "true"
volumeMounts:
- mountPath: /usr/share/elasticsearch/data
name: elasticsearch-data
restartPolicy: Always
volumes:
- name: elasticsearch-data
persistentVolumeClaim:
claimName: pvc-skywalking
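Before moving on, it is worth confirming that Elasticsearch answers on port 9200, for example with a port-forward from the workstation (a sketch):
kubectl -n skywalking port-forward svc/elasticsearch 9200:9200
curl 'http://localhost:9200/_cluster/health?pretty'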
- Create Kibana
Kibana is used here to manage Elasticsearch; any comparable tooling you already have will work as well.
apiVersion: v1
kind: Service
metadata:
labels:
app: kibana
name: kibana
namespace: skywalking
spec:
ports:
- name: http
port: 5601
protocol: TCP
targetPort: 5601
selector:
app: kibana
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: kibana-ui
namespace: skywalking
spec:
ingressClassName: nginx
rules:
- host: local.kabana.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: kibana
port:
number: 5601
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: kibana
name: kibana
namespace: skywalking
spec:
replicas: 1
selector:
matchLabels:
app: kibana
template:
metadata:
labels:
app: kibana
spec:
containers:
- env:
- name: ELASTICSEARCH_HOSTS
value: http://elasticsearch:9200
image: kibana:6.8.6
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 2
successThreshold: 1
tcpSocket:
port: 5601
timeoutSeconds: 2
name: kibana
ports:
- containerPort: 5601
name: http
protocol: TCP
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 2
successThreshold: 2
tcpSocket:
port: 5601
timeoutSeconds: 2
resources:
limits:
cpu: "2"
memory: 512Mi
requests:
cpu: 100m
memory: 128Mi
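The Ingress above uses the host local.kabana.com (and the SkyWalking UI Ingress further down uses local.skywalking.com), so those names must resolve to the NGINX ingress controller; for a local test a hosts-file entry is enough (the IP below is a placeholder for your ingress address):
# /etc/hosts
<ingress-ip>   local.kabana.com local.skywalking.com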
- Create the alarm ConfigMap
Configure a simple set of alarm rules and a DingTalk webhook template in it:
apiVersion: v1
kind: ConfigMap
metadata:
name: alarm-configmap
namespace: skywalking
data:
alarm-settings.yml: |-
rules:
# Rule unique name, must be ended with `_rule`.
service_resp_time_rule:
metrics-name: service_resp_time
op: ">"
threshold: 1000
period: 5
count: 3
silence-period: 5
message: Response time of service {name} is more than 1000ms in 3 minutes of last 10 minutes.
service_sla_rule:
# Metrics value need to be long, double or int
metrics-name: service_sla
op: "<"
threshold: 8000
# The length of time to evaluate the metrics
period: 10
# How many times after the metrics match the condition, will trigger alarm
count: 2
# How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
silence-period: 3
message: Successful rate of service {name} is lower than 80% in 2 minutes of last 10 minutes
service_resp_time_percentile_rule:
# Metrics value need to be long, double or int
metrics-name: service_percentile
op: ">"
threshold: 1000,1000,1000,1000,1000
period: 10
count: 3
silence-period: 5
message: Percentile response time of service {name} alarm in 3 minutes of last 10 minutes, due to more than one condition of p50 > 1000, p75 > 1000, p90 > 1000, p95 > 1000, p99 > 1000
service_instance_resp_time_rule:
metrics-name: service_instance_resp_time
op: ">"
threshold: 1000
period: 10
count: 2
silence-period: 5
message: Response time of service instance {name} is more than 1000ms in 2 minutes of last 10 minutes
database_access_resp_time_rule:
metrics-name: database_access_resp_time
threshold: 1000
op: ">"
period: 10
count: 2
message: Response time of database access {name} is more than 1000ms in 2 minutes of last 10 minutes
endpoint_relation_resp_time_rule:
metrics-name: endpoint_relation_resp_time
threshold: 1000
op: ">"
period: 10
count: 2
message: Response time of endpoint relation {name} is more than 1000ms in 2 minutes of last 10 minutes
dingtalkHooks:
textTemplate: |-
{
"msgtype": "text",
"text": {
"content": "Apache SkyWalking Alarm: n %s."
}
}
webhooks:
- url: https://oapi.dingtalk.com/robot/send?access_token=0ca06927f1cd962ed8b47086
secret: SEC4c70c124f6148869de3285
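The OAP Deployment in the next step mounts this ConfigMap over /skywalking/config/alarm-settings.yml, replacing the default rules shipped in the image. The rendered content can be double-checked with (a sketch):
kubectl -n skywalking get configmap alarm-configmap -o yaml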
- Configure SkyWalking
# ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
app: skywalking
name: skywalking-oap
namespace: skywalking
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: skywalking
namespace: skywalking
labels:
app: skywalking
rules:
- apiGroups: [""]
resources: ["pods","configmaps"]
verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: skywalking
namespace: skywalking
labels:
app: skywalking
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: skywalking
subjects:
- kind: ServiceAccount
name: skywalking-oap
namespace: skywalking
---
# es
# apiVersion: v1
# kind: Service
# metadata:
# name: elasticsearch-master
# namespace: skywalking
# labels:
# app: "elasticsearch-master"
# spec:
# type: ClusterIP
# ports:
# - name: elasticsearch-master
# port: 9200
# protocol: TCP
# ---
# apiVersion: v1
# kind: Endpoints
# metadata:
# name: elasticsearch-master
# namespace: skywalking
# labels:
# app: "elasticsearch-master"
# subsets:
# - addresses:
# - ip: 192.168.0.13
# ports:
# - name: elasticsearch-master
# port: 9200
# protocol: TCP
---
# oap
apiVersion: v1
kind: Service
metadata:
name: skywalking-oap
namespace: skywalking
labels:
app: skywalking-oap
spec:
type: ClusterIP
ports:
- port: 11800
name: grpc
- port: 12800
name: rest
selector:
app: skywalking-oap
chart: skywalking-4.2.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: skywalking-oap
name: skywalking-oap
namespace: skywalking
spec:
replicas: 1
selector:
matchLabels:
app: skywalking-oap
template:
metadata:
labels:
app: skywalking-oap
chart: skywalking-4.2.0
spec:
serviceAccountName: skywalking-oap
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
podAffinityTerm:
topologyKey: kubernetes.io/hostname
labelSelector:
matchLabels:
app: "skywalking"
release: "skywalking"
component: "oap"
initContainers:
- name: wait-for-elasticsearch
image: busybox:1.30
imagePullPolicy: IfNotPresent
command: ['sh', '-c', 'for i in $(seq 1 60); do nc -z -w3 elasticsearch 9200 && exit 0 || sleep 5; done; exit 1']
containers:
- name: oap
image: skywalking.docker.scarf.sh/apache/skywalking-oap-server:9.0.0
# docker pull apache/skywalking-oap-server:8.8.1
imagePullPolicy: IfNotPresent
livenessProbe:
tcpSocket:
port: 12800
initialDelaySeconds: 15
periodSeconds: 20
readinessProbe:
tcpSocket:
port: 12800
initialDelaySeconds: 15
periodSeconds: 20
ports:
- containerPort: 11800
name: grpc
- containerPort: 12800
name: rest
env:
- name: JAVA_OPTS
value: "-Dmode=no-init -Xmx2g -Xms2g"
- name: SW_CLUSTER
value: kubernetes
- name: SW_CLUSTER_K8S_NAMESPACE
value: "default"
- name: SW_CLUSTER_K8S_LABEL
value: "app=skywalking,release=skywalking,component=oap"
# TTL for record data (days)
- name: SW_CORE_RECORD_DATA_TTL
value: "2"
# TTL for metrics data (days)
- name: SW_CORE_METRICS_DATA_TTL
value: "2"
- name: SKYWALKING_COLLECTOR_UID
valueFrom:
fieldRef:
fieldPath: metadata.uid
- name: SW_STORAGE
value: elasticsearch
- name: SW_STORAGE_ES_CLUSTER_NODES
value: "elasticsearch:9200"
volumeMounts:
- name: alarm-settings
mountPath: /skywalking/config/alarm-settings.yml
subPath: alarm-settings.yml
volumes:
- configMap:
name: alarm-configmap
name: alarm-settings
---
# ui
apiVersion: v1
kind: Service
metadata:
labels:
app: skywalking-ui
name: skywalking-ui
namespace: skywalking
spec:
type: ClusterIP
ports:
- port: 80
targetPort: 8080
protocol: TCP
selector:
app: skywalking-ui
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: skywalking-ui
namespace: skywalking
labels:
app: skywalking-ui
spec:
replicas: 1
selector:
matchLabels:
app: skywalking-ui
template:
metadata:
labels:
app: skywalking-ui
spec:
affinity:
containers:
- name: ui
image: skywalking.docker.scarf.sh/apache/skywalking-ui:9.0.0
# docker pull apache/skywalking-ui:9.0.0
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
name: page
env:
- name: SW_OAP_ADDRESS
value: http://skywalking-oap:12800
---
# job
apiVersion: batch/v1
kind: Job
metadata:
name: "skywalking-es-init"
namespace: skywalking
labels:
app: skywalking-job
spec:
template:
metadata:
name: "skywalking-es-init"
labels:
app: skywalking-job
spec:
serviceAccountName: skywalking-oap
restartPolicy: Never
initContainers:
- name: wait-for-elasticsearch
image: busybox:1.30
imagePullPolicy: IfNotPresent
command: ['sh', '-c', 'for i in $(seq 1 60); do nc -z -w3 elasticsearch 9200 && exit 0 || sleep 5; done; exit 1']
containers:
- name: oap
image: skywalking.docker.scarf.sh/apache/skywalking-oap-server:9.0.0
# docker pull apache/skywalking-oap-server:9.0.0
imagePullPolicy: IfNotPresent
env:
- name: JAVA_OPTS
value: "-Xmx2g -Xms2g -Dmode=init"
- name: SW_STORAGE
value: elasticsearch
- name: SW_STORAGE_ES_CLUSTER_NODES
value: "elasticsearch:9200"
# TTL for record data (days)
# - name: SW_CORE_RECORD_DATA_TTL
# value: "2"
# TTL for metrics data (days)
# - name: SW_CORE_METRICS_DATA_TTL
# value: "2"
volumeMounts:
volumes:
# ---
# apiVersion: v1
# kind: Pod
# metadata:
# name: "skywalking-qyouc-test"
# annotations:
# "helm.sh/hook": test-success
# spec:
# containers:
# - name: "skywalking-ggmnx-test"
# image: "docker.elastic.co/elasticsearch/elasticsearch:6.8.6"
# command:
# - "sh"
# - "-c"
# - |
# #!/usr/bin/env bash -e
# curl -XGET --fail 'elasticsearch-master:9200/_cluster/health?wait_for_status=green&timeout=1s'
# restartPolicy: Never
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: skywalking-ui
namespace: skywalking
spec:
ingressClassName: nginx
rules:
- host: local.skywalking.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: skywalking-ui
port:
number: 80
- Create the SkyWalking service stack by applying the manifests
kubectl apply -f ns.yaml
kubectl apply -f nas-to-es.yaml
kubectl apply -f es.yaml
kubectl apply -f kabana.yaml
kubectl apply -f alarm.yaml
kubectl apply -f 9.0.yaml
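Once everything is applied, a quick way to verify the stack (a sketch; the skywalking-es-init job has to complete before the OAP pod, which runs with -Dmode=no-init, becomes ready):
kubectl -n skywalking get pods
kubectl -n skywalking get jobs
kubectl -n skywalking logs deploy/skywalking-oap
# then browse http://local.skywalking.com/ (UI) and http://local.kabana.com/ (Kibana)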
Injecting the Java agent into Pods
On the SkyWalking download page, choose Agent -> Java agent and download, for example, version 8.10.0:
https://www.apache.org/dyn/closer.cgi/skywalking/java-agent/8.10.0/apache-skywalking-java-agent-8.10.0.tgz
Unpack the agent, add it to the Dockerfile, and start the JVM with the agent attached and the jar location specified, for example:
.....
COPY ./skywalking-agent /devops/skywalking-agent
....
CMD java -javaagent:/devops/skywalking-agent/skywalking-agent.jar .....
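For reference, a minimal sketch of such a Dockerfile, assuming a fat jar named app.jar and an OpenJDK-based image (both are placeholders, not taken from the original setup):
FROM eclipse-temurin:11-jre
# agent directory unpacked from apache-skywalking-java-agent-8.10.0.tgz
COPY ./skywalking-agent /devops/skywalking-agent
COPY ./target/app.jar /devops/app.jar
# attach the agent; the service name and collector address come from Pod env vars
CMD ["sh", "-c", "java -javaagent:/devops/skywalking-agent/skywalking-agent.jar -jar /devops/app.jar"]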
Then configure the required parameters through the Pod's environment variables.
Services are grouped automatically according to ${service name} = [${group name}::]${logical name}: as soon as the service name contains a double colon (::), the literal string before the colons is treated as the group name. In recent GraphQL queries the group name is provided as an optional parameter. With value: mark::test1, mark is the group and test1 is the application name:
- name: SW_AGENT_NAME
value: mark::test1
- name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
value: skywalking-oap.skywalking:11800
For the second application the value becomes mark::test2; test1 and test2 then both belong to the mark group and are displayed as such in the UI.
For the other agent environment variables, see the Table of Agent Configuration Properties.
Ignoring URLs
Not every URL is worth tracing, so configuring ignore rules is often necessary.
There are two ways to configure ignore patterns; the setting from the system environment has the higher priority.
To configure it via the environment, add skywalking.trace.ignore_path to the system variables, with the paths to ignore as the value, separated by commas. To configure it via file, copy /agent/optional-plugins/apm-trace-ignore-plugin/apm-trace-ignore-plugin.config into the /agent/config/ directory and add a rule to filter traces, e.g. trace.ignore_path=/your/path/1/**,/your/path/2/**
In practice, it is enough to copy apm-trace-ignore-plugin-8.10.0.jar from optional-plugins into plugins:
# cp optional-plugins/apm-trace-ignore-plugin-8.10.0.jar plugins/
- Use either the config file or the environment variable; the environment variable takes precedence.
Add the config file:
# cat config/apm-trace-ignore-plugin.config
trace.ignore_path=${SW_AGENT_TRACE_IGNORE_PATH:GET:/health,/eureka/**}
Add the environment variable
This ignores the GET:/health and /eureka/** paths:
- name: SW_AGENT_TRACE_IGNORE_PATH
value: GET:/health,/eureka/**
Note, however, that ignored URLs do not necessarily take effect immediately; the change is very likely to be applied with some delay.
References
Ignore patterns