8.9.0 was the last feature release of the SkyWalking 8.x line. Since 2018, SkyWalking has grown from agent-based tracing that monitored the dependencies and topology among services, endpoints and instances into a full-stack observability platform covering logs, traces, metrics and events. It has also added VM, Kubernetes and service mesh monitoring, and introduced new ways of observing such as eBPF.
The 8.x releases used the concept of a group to deal with this mix, but the most important concept in the v9 core is the LAYER.
A layer represents an abstraction level in computer science, for example the operating system (VM layer), Kubernetes (k8s layer) or a service mesh (typically the Istio+Envoy layer). A layer owns all of the services detected by a particular technology. Compared with the v8 group, the new layer concept is clearly a better fit. The group concept is kept as well: within each layer, groups let end users organize their own services, and a service with no group falls into the default group.
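Concretely, a group is derived from the service name reported by the agent: everything before a double colon becomes the group. A minimal sketch of the agent environment (the naming convention is covered again in the agent section below; mark and test1 are placeholder names):
- name: SW_AGENT_NAME
  value: mark::test1   # group "mark", logical service "test1" within its layer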
In the visualization UI, these layers are already presented as dashboards in the style of an admin console.
Visualization (UI)
- databases
- kubernetes
- service mesh
- general service
- browser
As a result, SkyWalking 9.0.0 looks like a full-stack APM system; see the upstream discussion for more.
Deployment
SkyWalking 9.0
- Create the namespace
apiVersion: v1
kind: Namespace
metadata:
name: skywalking
- Create a PVC for Elasticsearch
apiVersion: apps/v1
kind: Deployment
metadata:
name: nfs-client-provisioner
labels:
app: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: nfs-client-provisioner
template:
metadata:
labels:
app: nfs-client-provisioner
spec:
serviceAccountName: nfs-client-provisioner
containers:
- name: nfs-client-provisioner
image: quay.io/external_storage/nfs-client-provisioner:latest
imagePullPolicy: IfNotPresent
volumeMounts:
- name: nfs-client-root
mountPath: /persistentvolumes
env:
- name: PROVISIONER_NAME
value: fuseim.pri/ifs
- name: NFS_SERVER
value: 192.168.3.19
- name: NFS_PATH
value: /data/nfs-k8s
volumes:
- name: nfs-client-root
nfs:
server: 192.168.3.19
path: /data/nfs-k8s
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: nfs-client-provisioner-runner
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: run-nfs-client-provisioner
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
roleRef:
kind: ClusterRole
name: nfs-client-provisioner-runner
apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: leader-locking-nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
rules:
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: leader-locking-nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
subjects:
- kind: ServiceAccount
name: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
roleRef:
kind: Role
name: leader-locking-nfs-client-provisioner
apiGroup: rbac.authorization.k8s.io
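Assuming the provisioner is deployed in the default namespace as above, a quick sanity check that it is running (a sketch):
kubectl -n default get pods -l app=nfs-client-provisioner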
Create the StorageClass and PVC
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: nfs-storage
namespace: default
provisioner: fuseim.pri/ifs # or choose another name, must match the Deployment's env PROVISIONER_NAME
parameters:
archiveOnDelete: "false"
# Supported policies: Delete, Retain; default is Delete
reclaimPolicy: Retain
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: pvc-skywalking
namespace: skywalking
spec:
accessModes:
- ReadWriteMany
storageClassName: nfs-storage
resources:
requests:
storage: 10Gi
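After applying these, the StorageClass should exist and the claim should eventually become Bound (a quick check, as a sketch):
kubectl get storageclass nfs-storage
kubectl -n skywalking get pvc pvc-skywalking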
- Create the Elasticsearch Pod
# Source: skywalking/charts/elasticsearch/templates/statefulset.yaml
apiVersion: v1
kind: Service
metadata:
name: elasticsearch
namespace: skywalking
labels:
app: elasticsearch
spec:
type: ClusterIP
ports:
- name: elasticsearch
port: 9200
protocol: TCP
selector:
app: elasticsearch
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: elasticsearch
namespace: skywalking
labels:
app: elasticsearch
spec:
selector:
matchLabels:
app: elasticsearch
replicas: 1
template:
metadata:
name: elasticsearch
labels:
app: elasticsearch
spec:
initContainers:
- name: configure-sysctl
securityContext:
runAsUser: 0
privileged: true
image: "docker.elastic.co/elasticsearch/elasticsearch:6.8.6"
imagePullPolicy: "IfNotPresent"
# command: ["sysctl", "-w", "vm.max_map_count=262144"]
command: ["/bin/sh"]
args: ["-c", "sysctl -w DefaultLimitNOFILE=65536; sysctl -w DefaultLimitMEMLOCK=infinity; sysctl -w DefaultLimitNPROC=32000; sysctl -w vm.max_map_count=262144"]
resources:
{}
containers:
- name: "elasticsearch"
securityContext:
capabilities:
drop:
- ALL
runAsNonRoot: true
runAsUser: 1000
image: "docker.elastic.co/elasticsearch/elasticsearch:6.8.6"
imagePullPolicy: "IfNotPresent"
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 30
periodSeconds: 2
successThreshold: 1
tcpSocket:
port: 9300
timeoutSeconds: 2
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 30
periodSeconds: 2
successThreshold: 2
tcpSocket:
port: 9300
timeoutSeconds: 2
ports:
- name: http
containerPort: 9200
- name: transport
containerPort: 9300
resources:
limits:
cpu: 1000m
memory: 2Gi
requests:
cpu: 100m
memory: 2Gi
env:
- name: node.name
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: cluster.name
value: "elasticsearch"
- name: network.host
value: "0.0.0.0"
- name: ES_JAVA_OPTS
value: "-Xmx1g -Xms1g -Duser.timezone=Asia/Shanghai"
- name: discovery.type
value: single-node
# value: "-Xmx1g -Xms1g -Duser.timezone=Asia/Shanghai MAX_OPEN_FILES=655350 MAX_LOCKED_MEMORY=unlimited"
# - name: node.data
# value: "true"
# - name: node.ingest
# value: "true"
# - name: node.master
# value: "true"
# - name: http.cors.enabled
# value: "true"
# - name: http.cors.allow-origin
# value: "*"
# - name: http.cors.allow-headers
# value: "X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization"
# - name: bootstrap.memory_lock
# value: "true"
volumeMounts:
- mountPath: /usr/share/elasticsearch/data
name: elasticsearch-data
restartPolicy: Always
volumes:
- name: elasticsearch-data
persistentVolumeClaim:
claimName: pvc-skywalking
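Before moving on, it is worth confirming that Elasticsearch answers on port 9200, for example with a port-forward from the workstation (a sketch):
kubectl -n skywalking port-forward svc/elasticsearch 9200:9200
curl 'http://localhost:9200/_cluster/health?pretty'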
- Create Kibana
Kibana is used here to manage Elasticsearch; any comparable tooling you already have will work as well.
apiVersion: v1
kind: Service
metadata:
labels:
app: kibana
name: kibana
namespace: skywalking
spec:
ports:
- name: http
port: 5601
protocol: TCP
targetPort: 5601
selector:
app: kibana
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: kibana-ui
namespace: skywalking
spec:
ingressClassName: nginx
rules:
- host: local.kabana.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: kibana
port:
number: 5601
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: kibana
name: kibana
namespace: skywalking
spec:
replicas: 1
selector:
matchLabels:
app: kibana
template:
metadata:
labels:
app: kibana
spec:
containers:
- env:
- name: ELASTICSEARCH_HOSTS
value: http://elasticsearch:9200
image: kibana:6.8.6
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 2
successThreshold: 1
tcpSocket:
port: 5601
timeoutSeconds: 2
name: kibana
ports:
- containerPort: 5601
name: http
protocol: TCP
readinessProbe:
failureThreshold: 3
initialDelaySeconds: 10
periodSeconds: 2
successThreshold: 2
tcpSocket:
port: 5601
timeoutSeconds: 2
resources:
limits:
cpu: "2"
memory: 512Mi
requests:
cpu: 100m
memory: 128Mi
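The Ingress above uses the host local.kabana.com (and the SkyWalking UI Ingress further down uses local.skywalking.com), so those names must resolve to the NGINX ingress controller; for a local test a hosts-file entry is enough (the IP below is a placeholder for your ingress address):
# /etc/hosts
<ingress-ip>   local.kabana.com local.skywalking.com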
- Create the alarm ConfigMap
Configure a simple set of alarm rules and a DingTalk webhook template in it:
apiVersion: v1
kind: ConfigMap
metadata:
name: alarm-configmap
namespace: skywalking
data:
alarm-settings.yml: |-
rules:
# Rule unique name, must be ended with `_rule`.
service_resp_time_rule:
metrics-name: service_resp_time
op: ">"
threshold: 1000
period: 5
count: 3
silence-period: 5
message: Response time of service {name} is more than 1000ms in 3 minutes of last 10 minutes.
service_sla_rule:
# Metrics value need to be long, double or int
metrics-name: service_sla
op: "<"
threshold: 8000
# The length of time to evaluate the metrics
period: 10
# How many times after the metrics match the condition, will trigger alarm
count: 2
# How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
silence-period: 3
message: Successful rate of service {name} is lower than 80% in 2 minutes of last 10 minutes
service_resp_time_percentile_rule:
# Metrics value need to be long, double or int
metrics-name: service_percentile
op: ">"
threshold: 1000,1000,1000,1000,1000
period: 10
count: 3
silence-period: 5
message: Percentile response time of service {name} alarm in 3 minutes of last 10 minutes, due to more than one condition of p50 > 1000, p75 > 1000, p90 > 1000, p95 > 1000, p99 > 1000
service_instance_resp_time_rule:
metrics-name: service_instance_resp_time
op: ">"
threshold: 1000
period: 10
count: 2
silence-period: 5
message: Response time of service instance {name} is more than 1000ms in 2 minutes of last 10 minutes
database_access_resp_time_rule:
metrics-name: database_access_resp_time
threshold: 1000
op: ">"
period: 10
count: 2
message: Response time of database access {name} is more than 1000ms in 2 minutes of last 10 minutes
endpoint_relation_resp_time_rule:
metrics-name: endpoint_relation_resp_time
threshold: 1000
op: ">"
period: 10
count: 2
message: Response time of endpoint relation {name} is more than 1000ms in 2 minutes of last 10 minutes
dingtalkHooks:
textTemplate: |-
{
"msgtype": "text",
"text": {
"content": "Apache SkyWalking Alarm: n %s."
}
}
webhooks:
- url: https://oapi.dingtalk.com/robot/send?access_token=0ca06927f1cd962ed8b47086
secret: SEC4c70c124f6148869de3285
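The OAP Deployment in the next step mounts this ConfigMap over /skywalking/config/alarm-settings.yml, replacing the default rules shipped in the image. The rendered content can be double-checked with (a sketch):
kubectl -n skywalking get configmap alarm-configmap -o yaml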
- Configure SkyWalking
# ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
app: skywalking
name: skywalking-oap
namespace: skywalking
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: skywalking
namespace: skywalking
labels:
app: skywalking
rules:
- apiGroups: [""]
resources: ["pods","configmaps"]
verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: skywalking
namespace: skywalking
labels:
app: skywalking
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: skywalking
subjects:
- kind: ServiceAccount
name: skywalking-oap
namespace: skywalking
---
# es
# apiVersion: v1
# kind: Service
# metadata:
# name: elasticsearch-master
# namespace: skywalking
# labels:
# app: "elasticsearch-master"
# spec:
# type: ClusterIP
# ports:
# - name: elasticsearch-master
# port: 9200
# protocol: TCP
# ---
# apiVersion: v1
# kind: Endpoints
# metadata:
# name: elasticsearch-master
# namespace: skywalking
# labels:
# app: "elasticsearch-master"
# subsets:
# - addresses:
# - ip: 192.168.0.13
# ports:
# - name: elasticsearch-master
# port: 9200
# protocol: TCP
---
# oap
apiVersion: v1
kind: Service
metadata:
name: skywalking-oap
namespace: skywalking
labels:
app: skywalking-oap
spec:
type: ClusterIP
ports:
- port: 11800
name: grpc
- port: 12800
name: rest
selector:
app: skywalking-oap
chart: skywalking-4.2.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: skywalking-oap
name: skywalking-oap
namespace: skywalking
spec:
replicas: 1
selector:
matchLabels:
app: skywalking-oap
template:
metadata:
labels:
app: skywalking-oap
chart: skywalking-4.2.0
spec:
serviceAccountName: skywalking-oap
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
podAffinityTerm:
topologyKey: kubernetes.io/hostname
labelSelector:
matchLabels:
app: "skywalking"
release: "skywalking"
component: "oap"
initContainers:
- name: wait-for-elasticsearch
image: busybox:1.30
imagePullPolicy: IfNotPresent
command: ['sh', '-c', 'for i in $(seq 1 60); do nc -z -w3 elasticsearch 9200 && exit 0 || sleep 5; done; exit 1']
containers:
- name: oap
image: skywalking.docker.scarf.sh/apache/skywalking-oap-server:9.0.0
# docker pull apache/skywalking-oap-server:8.8.1
imagePullPolicy: IfNotPresent
livenessProbe:
tcpSocket:
port: 12800
initialDelaySeconds: 15
periodSeconds: 20
readinessProbe:
tcpSocket:
port: 12800
initialDelaySeconds: 15
periodSeconds: 20
ports:
- containerPort: 11800
name: grpc
- containerPort: 12800
name: rest
env:
- name: JAVA_OPTS
value: "-Dmode=no-init -Xmx2g -Xms2g"
- name: SW_CLUSTER
value: kubernetes
- name: SW_CLUSTER_K8S_NAMESPACE
value: "default"
- name: SW_CLUSTER_K8S_LABEL
value: "app=skywalking,release=skywalking,component=oap"
# TTL for record data (days)
- name: SW_CORE_RECORD_DATA_TTL
value: "2"
# TTL for metrics data (days)
- name: SW_CORE_METRICS_DATA_TTL
value: "2"
- name: SKYWALKING_COLLECTOR_UID
valueFrom:
fieldRef:
fieldPath: metadata.uid
- name: SW_STORAGE
value: elasticsearch
- name: SW_STORAGE_ES_CLUSTER_NODES
value: "elasticsearch:9200"
volumeMounts:
- name: alarm-settings
mountPath: /skywalking/config/alarm-settings.yml
subPath: alarm-settings.yml
volumes:
- configMap:
name: alarm-configmap
name: alarm-settings
---
# ui
apiVersion: v1
kind: Service
metadata:
labels:
app: skywalking-ui
name: skywalking-ui
namespace: skywalking
spec:
type: ClusterIP
ports:
- port: 80
targetPort: 8080
protocol: TCP
selector:
app: skywalking-ui
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: skywalking-ui
namespace: skywalking
labels:
app: skywalking-ui
spec:
replicas: 1
selector:
matchLabels:
app: skywalking-ui
template:
metadata:
labels:
app: skywalking-ui
spec:
affinity:
containers:
- name: ui
image: skywalking.docker.scarf.sh/apache/skywalking-ui:9.0.0
# docker pull apache/skywalking-ui:9.0.0
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
name: page
env:
- name: SW_OAP_ADDRESS
value: http://skywalking-oap:12800
---
# job
apiVersion: batch/v1
kind: Job
metadata:
name: "skywalking-es-init"
namespace: skywalking
labels:
app: skywalking-job
spec:
template:
metadata:
name: "skywalking-es-init"
labels:
app: skywalking-job
spec:
serviceAccountName: skywalking-oap
restartPolicy: Never
initContainers:
- name: wait-for-elasticsearch
image: busybox:1.30
imagePullPolicy: IfNotPresent
command: ['sh', '-c', 'for i in $(seq 1 60); do nc -z -w3 elasticsearch 9200 && exit 0 || sleep 5; done; exit 1']
containers:
- name: oap
image: skywalking.docker.scarf.sh/apache/skywalking-oap-server:9.0.0
# docker pull apache/skywalking-oap-server:9.0.0
imagePullPolicy: IfNotPresent
env:
- name: JAVA_OPTS
value: "-Xmx2g -Xms2g -Dmode=init"
- name: SW_STORAGE
value: elasticsearch
- name: SW_STORAGE_ES_CLUSTER_NODES
value: "elasticsearch:9200"
# TTL for record data (days)
# - name: SW_CORE_RECORD_DATA_TTL
# value: "2"
# TTL for metrics data (days)
# - name: SW_CORE_METRICS_DATA_TTL
# value: "2"
volumeMounts:
volumes:
# ---
# apiVersion: v1
# kind: Pod
# metadata:
# name: "skywalking-qyouc-test"
# annotations:
# "helm.sh/hook": test-success
# spec:
# containers:
# - name: "skywalking-ggmnx-test"
# image: "docker.elastic.co/elasticsearch/elasticsearch:6.8.6"
# command:
# - "sh"
# - "-c"
# - |
# #!/usr/bin/env bash -e
# curl -XGET --fail 'elasticsearch-master:9200/_cluster/health?wait_for_status=green&timeout=1s'
# restartPolicy: Never
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: skywalking-ui
namespace: skywalking
spec:
ingressClassName: nginx
rules:
- host: local.skywalking.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: skywalking-ui
port:
number: 80
- Create the SkyWalking service stack by applying the manifests
kubectl apply -f ns.yaml
kubectl apply -f nas-to-es.yaml
kubectl apply -f es.yaml
kubectl apply -f kabana.yaml
kubectl apply -f alarm.yaml
kubectl apply -f 9.0.yaml
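Once everything is applied, a quick way to verify the stack (a sketch; the skywalking-es-init job has to complete before the OAP pod, which runs with -Dmode=no-init, becomes ready):
kubectl -n skywalking get pods
kubectl -n skywalking get jobs
kubectl -n skywalking logs deploy/skywalking-oap
# then browse http://local.skywalking.com/ (UI) and http://local.kabana.com/ (Kibana)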
Injecting the Java agent into Pods
On the SkyWalking download page, choose Agent -> Java agent and download, for example, version 8.10.0:
https://www.apache.org/dyn/closer.cgi/skywalking/java-agent/8.10.0/apache-skywalking-java-agent-8.10.0.tgz
Unpack the agent, add it to the Dockerfile, and start the JVM with the agent attached and the jar location specified, for example:
.....
COPY ./skywalking-agent /devops/skywalking-agent
....
CMD java -javaagent:/devops/skywalking-agent/skywalking-agent.jar .....
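For reference, a minimal sketch of such a Dockerfile, assuming a fat jar named app.jar and an OpenJDK-based image (both are placeholders, not taken from the original setup):
FROM eclipse-temurin:11-jre
# agent directory unpacked from apache-skywalking-java-agent-8.10.0.tgz
COPY ./skywalking-agent /devops/skywalking-agent
COPY ./target/app.jar /devops/app.jar
# attach the agent; the service name and collector address come from Pod env vars
CMD ["sh", "-c", "java -javaagent:/devops/skywalking-agent/skywalking-agent.jar -jar /devops/app.jar"]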
Then configure the required parameters through the Pod's environment variables.
Services are grouped automatically according to ${service name} = [${group name}::]${logical name}: as soon as the service name contains a double colon (::), the literal string before the colons is treated as the group name. In recent GraphQL queries the group name is provided as an optional parameter. With value: mark::test1, mark is the group and test1 is the application name:
- name: SW_AGENT_NAME
value: mark::test1
- name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
value: skywalking-oap.skywalking:11800
For the second application the value becomes mark::test2; test1 and test2 then both belong to the mark group and are displayed as such in the UI.
For the other agent environment variables, see the Table of Agent Configuration Properties.
Ignoring URLs
Not every URL is worth tracing, so configuring ignore rules is often necessary.
There are two ways to configure ignore patterns; the setting from the system environment has the higher priority.
To configure it via the environment, add skywalking.trace.ignore_path to the system variables, with the paths to ignore as the value, separated by commas. To configure it via file, copy /agent/optional-plugins/apm-trace-ignore-plugin/apm-trace-ignore-plugin.config into the /agent/config/ directory and add a rule to filter traces, e.g. trace.ignore_path=/your/path/1/**,/your/path/2/**
In practice, it is enough to copy apm-trace-ignore-plugin-8.10.0.jar from optional-plugins into plugins:
# cp optional-plugins/apm-trace-ignore-plugin-8.10.0.jar plugins/
- Use either the config file or the environment variable; the environment variable takes precedence.
Add the config file:
# cat config/apm-trace-ignore-plugin.config
trace.ignore_path=${SW_AGENT_TRACE_IGNORE_PATH:GET:/health,/eureka/**}
Add the environment variable
This ignores the GET:/health and /eureka/** paths:
- name: SW_AGENT_TRACE_IGNORE_PATH
value: GET:/health,/eureka/**
Note, however, that ignored URLs do not necessarily take effect immediately; the change is very likely to be applied with some delay.
References
Ignore patterns