Deploying and Using SkyWalking 9.0 on Kubernetes

July 15, 2023

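8.9.0 was the last feature release SkyWalking published before v9. Since 2018, SkyWalking has grown from agent-based tracing, centered on the dependencies and topology between services, endpoints, and instances, into a full-stack platform covering logs, traces, metrics, and events. It has also added more monitoring targets, such as VMs, Kubernetes, and service mesh, and has introduced new ways to observe systems, such as eBPF.

The 8.x releases used the group concept to handle this mix of service types, but the most important concept in the v9 core is the LAYER.

A layer represents an abstract framework in computer science, such as the operating system (VM layer), Kubernetes (k8s layer), or a service mesh (typically the Istio+Envoy layer). A layer owns the services detected through a given technology. Compared with the v8 group, the new layer concept is clearly a better fit. The group concept is retained inside each layer: groups are meant for end users to organize their own services internally. A service with no group is placed in the default group.

In the UI, these layers now appear as dashboards in the style of an admin console.

Visualization (UI)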

  • databases
  • kubernetes
  • service mesh
  • general service
  • browser

As a result, SkyWalking 9.0.0 looks like a full-stack APM system.

image-20220515180658116.png

Deployment

SkyWalking 9.0

  • Create the namespace
apiVersion: v1
kind: Namespace
metadata:
  name: skywalking
  • Create a PVC for Elasticsearch (deploy the NFS client provisioner first)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: quay.io/external_storage/nfs-client-provisioner:latest
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: fuseim.pri/ifs
            - name: NFS_SERVER
              value: 192.168.3.19
            - name: NFS_PATH
              value: /data/nfs-k8s
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.3.19
            path: /data/nfs-k8s
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io

Create the StorageClass and PVC

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-storage
  namespace: default 
provisioner: fuseim.pri/ifs # or choose another name; must match the Deployment's PROVISIONER_NAME env
parameters:
  archiveOnDelete: "false"
# Supported policies: Delete, Retain; default is Delete
reclaimPolicy: Retain
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata: 
  name: pvc-skywalking
  namespace: skywalking
spec:
  accessModes:
  - ReadWriteMany 
  storageClassName: nfs-storage
  resources: 
    requests:
      storage: 10Gi
  • Create the Elasticsearch pod
# Source: skywalking/charts/elasticsearch/templates/statefulset.yaml
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: skywalking  
  labels:
    app: elasticsearch
spec:
  type: ClusterIP
  ports:
  - name: elasticsearch
    port: 9200
    protocol: TCP
  selector:
    app: elasticsearch 
---    
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
  namespace: skywalking
  labels:
    app: elasticsearch
spec:
  selector:
    matchLabels:
      app:  elasticsearch
  replicas: 1
  template:
    metadata:
      name: elasticsearch
      labels:
        app: elasticsearch
    spec:
      initContainers:
      - name: configure-sysctl
        securityContext:
          runAsUser: 0
          privileged: true
        image: "docker.elastic.co/elasticsearch/elasticsearch:6.8.6"
        imagePullPolicy: "IfNotPresent"
        # DefaultLimitNOFILE, DefaultLimitMEMLOCK and DefaultLimitNPROC are systemd settings,
        # not kernel sysctl keys, so only vm.max_map_count is set here.
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        resources:
          {}
      containers:
      - name: "elasticsearch"
        securityContext:
          capabilities:
            drop:
            - ALL
          runAsNonRoot: true
          runAsUser: 1000
        image: "docker.elastic.co/elasticsearch/elasticsearch:6.8.6"       
        imagePullPolicy: "IfNotPresent"      
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 2
          successThreshold: 1
          tcpSocket:
            port: 9300
          timeoutSeconds: 2
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 2
          successThreshold: 2
          tcpSocket:
            port: 9300
          timeoutSeconds: 2                  
        ports:
        - name: http
          containerPort: 9200
        - name: transport
          containerPort: 9300
        resources:
          limits:
            cpu: 1000m
            memory: 2Gi
          requests:
            cpu: 100m
            memory: 2Gi
        env:
          - name: node.name
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: cluster.name
            value: "elasticsearch"
          - name: network.host
            value: "0.0.0.0"
          - name: ES_JAVA_OPTS
            value: "-Xmx1g -Xms1g -Duser.timezone=Asia/Shanghai"
          - name: discovery.type
            value: single-node
          # value: "-Xmx1g -Xms1g -Duser.timezone=Asia/Shanghai MAX_OPEN_FILES=655350 MAX_LOCKED_MEMORY=unlimited"
          # - name: node.data
          #   value: "true"
          # - name: node.ingest
          #   value: "true"
          # - name: node.master
          #   value: "true"
          # - name: http.cors.enabled
          #   value: "true"
          # - name: http.cors.allow-origin
          #   value: "*"
          # - name: http.cors.allow-headers
          #   value: "X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization"             
          # - name: bootstrap.memory_lock
          #   value: "true"
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: elasticsearch-data
      restartPolicy: Always
      volumes:
      - name: elasticsearch-data
        persistentVolumeClaim:
          claimName: pvc-skywalking
  • Create Kibana

Kibana is used here to manage Elasticsearch; any other tool can be used instead if you have one.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: kibana
  name: kibana
  namespace: skywalking
spec:
  ports:
  - name: http
    port: 5601
    protocol: TCP
    targetPort: 5601
  selector:
    app: kibana
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kibana-ui
  namespace: skywalking
spec:
  ingressClassName: nginx
  rules:
  - host: local.kabana.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: kibana
            port:
              number: 5601  
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: kibana
  name: kibana
  namespace: skywalking
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - env:
        - name: ELASTICSEARCH_HOSTS
          value: http://elasticsearch:9200
        image: kibana:6.8.6
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 2
          successThreshold: 1
          tcpSocket:
            port: 5601
          timeoutSeconds: 2
        name: kibana
        ports:
        - containerPort: 5601
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 2
          successThreshold: 2
          tcpSocket:
            port: 5601
          timeoutSeconds: 2
        resources:
          limits:
            cpu: "2"
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 128Mi
  • Create the alarm ConfigMap

It also configures a simple alarm template.

apiVersion: v1
kind: ConfigMap
metadata:
  name: alarm-configmap
  namespace: skywalking
data:
  alarm-settings.yml: |-
    rules:
      # Rule unique name, must be ended with `_rule`.
      service_resp_time_rule:
        metrics-name: service_resp_time
        op: ">"
        threshold: 1000
        period: 5
        count: 3
        silence-period: 5
        message: Response time of service {name} is more than 1000ms in 3 minutes of last 10 minutes.
      service_sla_rule:
        # Metrics value need to be long, double or int
        metrics-name: service_sla
        op: "<"
        threshold: 8000
        # The length of time to evaluate the metrics
        period: 10
        # How many times after the metrics match the condition, will trigger alarm
        count: 2
        # How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
        silence-period: 3
        message: Successful rate of service {name} is lower than 80% in 2 minutes of last 10 minutes
      service_resp_time_percentile_rule:
        # Metrics value need to be long, double or int
        metrics-name: service_percentile
        op: ">"
        threshold: 1000,1000,1000,1000,1000
        period: 10
        count: 3
        silence-period: 5
        message: Percentile response time of service {name} alarm in 3 minutes of last 10 minutes, due to more than one condition of p50 > 1000, p75 > 1000, p90 > 1000, p95 > 1000, p99 > 1000
      service_instance_resp_time_rule:
        metrics-name: service_instance_resp_time
        op: ">"
        threshold: 1000
        period: 10
        count: 2
        silence-period: 5
        message: Response time of service instance {name} is more than 1000ms in 2 minutes of last 10 minutes
      database_access_resp_time_rule:
        metrics-name: database_access_resp_time
        threshold: 1000
        op: ">"
        period: 10
        count: 2
        message: Response time of database access {name} is more than 1000ms in 2 minutes of last 10 minutes
      endpoint_relation_resp_time_rule:
        metrics-name: endpoint_relation_resp_time
        threshold: 1000
        op: ">"
        period: 10
        count: 2
        message: Response time of endpoint relation {name} is more than 1000ms in 2 minutes of last 10 minutes
    dingtalkHooks:
      textTemplate: |-
        {
          "msgtype": "text",
          "text": {
            "content": "Apache SkyWalking Alarm: \n %s."
          }
        }
      webhooks:
        - url: https://oapi.dingtalk.com/robot/send?access_token=0ca06927f1cd962ed8b47086
          secret: SEC4c70c124f6148869de3285
  • Configure SkyWalking
#ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: skywalking
  name: skywalking-oap
  namespace: skywalking
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: skywalking
  namespace: skywalking  
  labels:
    app: skywalking
rules:
  - apiGroups: [""]
    resources: ["pods","configmaps"]
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: skywalking
  namespace: skywalking  
  labels:
    app: skywalking
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: skywalking
subjects:
  - kind: ServiceAccount
    name: skywalking-oap
    namespace: skywalking
---    
# es
# apiVersion: v1
# kind: Service
# metadata:
#   name: elasticsearch-master
#   namespace: skywalking  
#   labels:
#     app: "elasticsearch-master"
# spec:
#   type: ClusterIP
#   ports:
#   - name: elasticsearch-master
#     port: 9200
#     protocol: TCP    
# ---
# apiVersion: v1
# kind: Endpoints
# metadata:
#   name: elasticsearch-master
#   namespace: skywalking  
#   labels:
#     app: "elasticsearch-master"
# subsets:
# - addresses:
#   - ip: 192.168.0.13
#   ports:
#   - name: elasticsearch-master
#     port: 9200
#     protocol: TCP
---
# oap
apiVersion: v1
kind: Service
metadata:
  name: skywalking-oap
  namespace: skywalking  
  labels:
    app: skywalking-oap
spec:
  type: ClusterIP
  ports:
  - port: 11800
    name: grpc
  - port: 12800
    name: rest
  selector:
    app: skywalking-oap
    chart: skywalking-4.2.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: skywalking-oap
  name: skywalking-oap
  namespace: skywalking  
spec:
  replicas: 1
  selector:
    matchLabels:
      app: skywalking-oap
  template:
    metadata:
      labels:
        app: skywalking-oap
        chart: skywalking-4.2.0
    spec:
      serviceAccountName: skywalking-oap
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  app: "skywalking"
                  release: "skywalking"
                  component: "oap"
      initContainers:
      - name: wait-for-elasticsearch
        image: busybox:1.30
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'for i in $(seq 1 60); do nc -z -w3 elasticsearch 9200 && exit 0 || sleep 5; done; exit 1']
      containers:
      - name: oap
        image: skywalking.docker.scarf.sh/apache/skywalking-oap-server:9.0.0
        # docker pull apache/skywalking-oap-server:8.8.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          tcpSocket:
            port: 12800
          initialDelaySeconds: 15
          periodSeconds: 20
        readinessProbe:
          tcpSocket:
            port: 12800
          initialDelaySeconds: 15
          periodSeconds: 20
        ports:
        - containerPort: 11800
          name: grpc
        - containerPort: 12800
          name: rest
        env:
        - name: JAVA_OPTS
          value: "-Dmode=no-init -Xmx2g -Xms2g"
        - name: SW_CLUSTER
          value: kubernetes
        # Namespace and label selector used to discover OAP pods; they must match this Deployment
        - name: SW_CLUSTER_K8S_NAMESPACE
          value: "skywalking"
        - name: SW_CLUSTER_K8S_LABEL
          value: "app=skywalking-oap"
        # TTL for record data, in days
        - name: SW_CORE_RECORD_DATA_TTL
          value: "2"
        # TTL for metrics data, in days
        - name: SW_CORE_METRICS_DATA_TTL
          value: "2"
        - name: SKYWALKING_COLLECTOR_UID
          valueFrom:
            fieldRef:
              fieldPath: metadata.uid
        - name: SW_STORAGE
          value: elasticsearch
        - name: SW_STORAGE_ES_CLUSTER_NODES
          value: "elasticsearch:9200"
        volumeMounts:
          - name: alarm-settings
            mountPath: /skywalking/config/alarm-settings.yml
            subPath: alarm-settings.yml
      volumes:
      - configMap:
          name: alarm-configmap
        name: alarm-settings
---
# ui
apiVersion: v1
kind: Service
metadata:
  labels:
    app: skywalking-ui
  name: skywalking-ui
  namespace: skywalking  
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  selector:
    app: skywalking-ui
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: skywalking-ui
  namespace: skywalking  
  labels:
    app: skywalking-ui
spec:
  replicas: 1
  selector:
    matchLabels:
        app: skywalking-ui
  template:
    metadata:
      labels:
        app: skywalking-ui
    spec:
      affinity:
      containers:
      - name: ui
        image: skywalking.docker.scarf.sh/apache/skywalking-ui:9.0.0
        # docker pull apache/skywalking-ui:9.0.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          name: page
        env:
        - name: SW_OAP_ADDRESS
          value: http://skywalking-oap:12800    
---
# job
apiVersion: batch/v1
kind: Job
metadata:
  name: "skywalking-es-init"
  namespace: skywalking  
  labels:
    app: skywalking-job
spec:
  template:
    metadata:
      name: "skywalking-es-init"
      labels:
        app: skywalking-job
    spec:
      serviceAccountName: skywalking-oap
      restartPolicy: Never
      initContainers:
      - name: wait-for-elasticsearch
        image: busybox:1.30
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'for i in $(seq 1 60); do nc -z -w3 elasticsearch 9200 && exit 0 || sleep 5; done; exit 1']
      containers:
      - name: oap
        image: skywalking.docker.scarf.sh/apache/skywalking-oap-server:9.0.0
         # docker pull apache/skywalking-oap-server:9.0.0
        imagePullPolicy: IfNotPresent
        env:
        - name: JAVA_OPTS
          value: "-Xmx2g -Xms2g -Dmode=init"
        - name: SW_STORAGE
          value: elasticsearch
        - name: SW_STORAGE_ES_CLUSTER_NODES
          value: "elasticsearch:9200"
        # TTL for record data, in days
        # - name: SW_CORE_RECORD_DATA_TTL
        #   value: "2"
        # TTL for metrics data, in days
        # - name: SW_CORE_METRICS_DATA_TTL
        #   value: "2"
        volumeMounts:
      volumes:
# ---
# apiVersion: v1
# kind: Pod
# metadata:
#   name: "skywalking-qyouc-test"
#   annotations:
#     "helm.sh/hook": test-success
# spec:
#   containers:
#   - name: "skywalking-ggmnx-test"
#     image: "docker.elastic.co/elasticsearch/elasticsearch:6.8.6"
#     command:
#       - "sh"
#       - "-c"
#       - |
#         #!/usr/bin/env bash -e
#         curl -XGET --fail 'elasticsearch-master:9200/_cluster/health?wait_for_status=green&timeout=1s'
#   restartPolicy: Never
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: skywalking-ui
  namespace: skywalking
spec:
  ingressClassName: nginx
  rules:
  - host: local.skywalking.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: skywalking-ui
            port:
              number: 80
  • Create the SkyWalking resources
kubectl apply -f ns.yaml
kubectl apply -f nas-to-es.yaml
kubectl apply -f es.yaml
kubectl apply -f kabana.yaml
kubectl apply -f alarm.yaml
kubectl apply -f 9.0.yaml

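Instrumenting pods

On the SkyWalking download page, choose Agent -> Java Agent. For example, download 8.10.0:

https://www.apache.org/dyn/closer.cgi/skywalking/java-agent/8.10.0/apache-skywalking-java-agent-8.10.0.tgz

Unpack the agent, add it to the Dockerfile, and point the startup command at the agent jar, as follows: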

.....
COPY  ./skywalking-agent /devops/skywalking-agent
....
CMD java -javaagent:/devops/skywalking-agent/skywalking-agent.jar .....

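Then set the required parameters in the pod's environment variables.

Services are grouped automatically according to ${service name} = [${group name}::]${logic name}. Once the service name contains a double colon (::), the literal string before the colons is treated as the group name. In the latest GraphQL query, the group name is provided as an optional parameter.

With value: mark::test1, mark is the group and test1 is the application name.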

        - name: SW_AGENT_NAME
          value: mark::test1
        - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
          value: skywalking-oap.skywalking:11800     

For the second application this becomes mark::test2.

This way both test1 and test2 belong to the mark group and are shown together in the UI.
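
To make the grouping concrete, here is a minimal sketch of what the second application's Deployment might look like. Only the two SW_* environment variables and the mark::test2 naming come from the text above. The Deployment name, namespace, labels, and image are hypothetical placeholders, and the sketch assumes the image already bundles the agent and starts Java with -javaagent as in the Dockerfile snippet.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test2                  # hypothetical name
  namespace: default           # hypothetical namespace of the application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test2
  template:
    metadata:
      labels:
        app: test2
    spec:
      containers:
      - name: test2
        image: registry.example.com/test2:latest   # hypothetical image with the agent baked in
        env:
        # "mark" is the group, "test2" is the logical service name
        - name: SW_AGENT_NAME
          value: mark::test2
        # gRPC port of the OAP Service in the skywalking namespace
        - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
          value: skywalking-oap.skywalking:11800
```

The first application would differ only in its names and in SW_AGENT_NAME, which would be mark::test1.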

For other agent environment variables, see the Table of Agent Configuration Properties.

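Ignoring URLs

Not every URL is worth tracing, so for various reasons it makes sense to configure ignore rules.

There are two ways to configure ignore patterns. Settings made through the system environment take precedence.

  • Set it through the system environment: add skywalking.trace.ignore_path to the system variables, with the paths to ignore as its value; separate multiple paths with a comma (,).
  • Copy /agent/optional-plugins/apm-trace-ignore-plugin/apm-trace-ignore-plugin.config to the /agent/config/ directory and add a rule to filter traces: trace.ignore_path=/your/path/1/**,/your/path/2/**

In practice, copying apm-trace-ignore-plugin-8.10.0.jar from optional-plugins into plugins is enough to enable the plugin.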

# cp optional-plugins/apm-trace-ignore-plugin-8.10.0.jar plugins/
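  • Use either the configuration file or the environment variable; the environment variable takes precedence

Add the configuration file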

# cat config/apm-trace-ignore-plugin.config
trace.ignore_path=${SW_AGENT_TRACE_IGNORE_PATH:GET:/health,/eureka/**}
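
If rebuilding the image for every rule change is inconvenient, one option is to deliver this file through a ConfigMap, the same way alarm-settings.yml is mounted into the OAP pod earlier. The following is only a sketch under assumptions: the ConfigMap name, the namespace, and the container name are hypothetical, and the mount path assumes the agent lives in /devops/skywalking-agent as in the Dockerfile snippet.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: apm-trace-ignore        # hypothetical name
  namespace: default            # hypothetical: the namespace of the application
data:
  apm-trace-ignore-plugin.config: |-
    trace.ignore_path=${SW_AGENT_TRACE_IGNORE_PATH:GET:/health,/eureka/**}
---
# Fragment of the application's pod template (spec.template.spec), not a complete manifest
containers:
- name: test1                   # hypothetical container name
  volumeMounts:
  - name: trace-ignore
    # Mount a single file into the agent's config directory without hiding the other files
    mountPath: /devops/skywalking-agent/config/apm-trace-ignore-plugin.config
    subPath: apm-trace-ignore-plugin.config
volumes:
- name: trace-ignore
  configMap:
    name: apm-trace-ignore
```

In the config line, the text after the first colon inside ${...} is the default that applies when SW_AGENT_TRACE_IGNORE_PATH is not set.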

Add the environment variable

Ignore the GET:/health and /eureka/** paths

        - name: SW_AGENT_TRACE_IGNORE_PATH
          value: GET:/health,/eureka/**
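
The patterns follow Ant-style path matching, so a few wildcard forms can be combined in one value. As an illustration only, the paths below are hypothetical and not taken from the setup above:

```yaml
        - name: SW_AGENT_TRACE_IGNORE_PATH
          # Comma-separated Ant-style patterns (all paths are hypothetical examples):
          #   /actuator/**   anything at any depth below /actuator
          #   /static/*      a single path segment below /static
          #   /ping?         /ping followed by exactly one character
          value: /actuator/**,/static/*,/ping?
```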

Note, however, that ignored URLs do not necessarily take effect immediately; the change will most likely be delayed.

