Best Practices for Filebeat in a Kubernetes Cluster

1 Background

Before Kubernetes took off, almost every application was deployed on standalone machines. When load increased, an IDC architecture could only scale out the server fleet to add compute. With the rise of cloud computing, existing servers could be resized on the fly to absorb load, or auto scaling could adjust the number of backend servers dynamically based on business volume, cluster load, monitoring data, and so on. Logs, as part of any application system, are usually what you turn to when the system misbehaves, to troubleshoot and find the root cause; traditionally they were analyzed with grep and other common Linux text tools.

To support faster development and iteration, in recent years we began containerizing our workloads, embracing the Kubernetes ecosystem, and moving all business systems onto the cloud. At this stage, logs grew explosively in both volume and variety, and the demand for digitized, intelligent log analysis kept rising, so a unified logging platform was born.

2 Challenges of Log Collection on Kubernetes

There are plenty of solutions for standalone logging systems, and they are relatively mature, so I won't rehash them here; this post only discusses building a logging system on Kubernetes. Logging on Kubernetes differs a great deal from our earlier physical-machine and VM based setups, for example:

  • Logs take more complex forms: besides files on physical machines/VMs, there are container stdout/stderr streams, files inside containers, container events, Kubernetes events, and more to collect;
  • The environment is much more dynamic: in Kubernetes, node failures, nodes going offline or online, Pod deletion, and scaling out/in are all routine, so logs are ephemeral (for example, once a Pod is deleted, its logs are no longer visible). Log data must therefore be shipped to the server side in real time, and collection has to keep up with this highly dynamic environment;
  • There are more kinds of logs: in a typical Kubernetes architecture, a request from a client traverses CDN, Ingress, Service Mesh, Pods, and other components across several layers of infrastructure, adding many log types along the way, such as Kubernetes system component logs, audit logs, Service Mesh logs, Ingress logs, and so on;

3 Kubernetes Log Files Explained

For log collection on Kubernetes, Filebeat is deployed as a DaemonSet. Logs are grouped by the cluster's namespaces, and a separate Kafka topic is created per namespace name.

By default, Kubernetes creates symlinks to the container log files under /var/log/containers and /var/log/pods.

Inside /var/log/containers you will find the logs of every container on that host, with files named as follows:

[podName]_[nameSpace]_[containerName]-[containerId].log
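
For example, this log path, taken from an event shown later in this post, follows exactly that pattern:

metrics-server-5549c7694f-7vb66_kube-system_metrics-server-9108765e17c7e325abd665fb0f53c8f4b3077c698cb88392099dfbafb0475709.log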

The example above is the naming for a Deployment-managed pod; pods from other controllers (DaemonSet, StatefulSet, and so on) differ slightly, but they all share one common pattern:

*_[nameSpace]_*.log

Knowing this naming convention, we can move on to deploying and configuring Filebeat.

4 Filebeat

4.1 Deployment

The deployment uses a DaemonSet:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: log
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: container
      enabled: true
      paths:
        - /var/log/containers/*_default_*.log
      fields:
        namespace: default
        env: dev
        k8s: cluster-dev
    - type: container
      enabled: true
      paths:
        - /var/log/containers/*_kube-system_*.log
      fields:
        namespace: kube-system
        env: dev
        k8s: cluster-dev

    filebeat.config.modules:
      path: ${path.config}/modules.d/*.yml
      reload.enabled: false

    output.kafka:
      hosts: ["175.27.159.78:9092","175.27.159.78:9093","175.27.159.78:9094"]
      topic: '%{[fields.k8s]}-%{[fields.namespace]}'
      partition.round_robin:
        reachable_only: true
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: log
  labels:
    k8s-app: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
  template:
    metadata:
      labels:
        k8s-app: filebeat
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.12.0
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          runAsUser: 0
          # If using Red Hat OpenShift uncomment this:
          #privileged: true
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: data
          mountPath: /usr/share/filebeat/data
        - name: varlibdockercontainers
          mountPath: /data/docker/containers
          readOnly: true
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: config
        configMap:
          defaultMode: 0640
          name: filebeat-config
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /data/docker/containers
      # data folder stores a registry of read status for all files, so we don't send everything again on a Filebeat pod restart
      - name: data
        hostPath:
          # When filebeat runs as non-root user, this directory needs to be writable by group (g+w).
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: log
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    k8s-app: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
  - namespaces
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
- apiGroups: ["apps"]
  resources:
  - replicasets
  verbs: ["get", "list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: log
  labels:
    k8s-app: filebeat
[root@master filebeat]# kubectl apply -f filebeat-daemonset.yaml 
configmap/filebeat-config created
daemonset.apps/filebeat created
clusterrolebinding.rbac.authorization.k8s.io/filebeat created
clusterrole.rbac.authorization.k8s.io/filebeat created
serviceaccount/filebeat created
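
One caveat on scheduling: a DaemonSet only lands on nodes whose taints its pods tolerate. If logs from tainted nodes (control-plane nodes, for instance) are needed as well, the pod spec above would need matching tolerations — a sketch, assuming the default master taint:

spec:
  template:
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule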

4.2 Filebeat Configuration File Overview

Let's start with a quick look at how a Filebeat configuration file is structured:

filebeat.inputs:

filebeat.config.modules:

processors:

output.xxxxx:

That is the rough shape of it. The complete data flow is simple: the inputs read events, the processors enrich and trim them, and the output ships them.
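
Concretely, a minimal filebeat.yml wires the four sections together like this (the broker address and topic below are placeholders, not part of the real setup):

filebeat.inputs:              # 1. where events are read from
- type: container
  paths:
    - /var/log/containers/*.log

filebeat.config.modules:      # 2. optional module configs loaded from disk
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

processors:                   # 3. enrich/trim each event before shipping
  - add_kubernetes_metadata:
      host: ${NODE_NAME}

output.kafka:                 # 4. where events are sent
  hosts: ["kafka:9092"]       # placeholder
  topic: 'example-topic'      # placeholder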

4.3 inputs

Classification is done by namespace: every namespace gets its own topic. When collecting from multiple clusters, classification is still per namespace, except the topic name gains the cluster name as a prefix so topics are easy to tell apart. Since logs are picked up per namespace, the inputs need a glob that selects the log files of a given namespace, for example:

filebeat.inputs:
- type: container
  enabled: true
  paths:
    - /var/log/containers/*_default_*.log
  fields:
    namespace: default
    env: dev
    k8s: cluster-dev
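
Incidentally, the container input can also be narrowed to one output stream. A small sketch that would collect only stderr from the same namespace:

filebeat.inputs:
- type: container
  enabled: true
  stream: stderr              # one of: all (default), stdout, stderr
  paths:
    - /var/log/containers/*_default_*.log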

The first example above covers a single namespace; for multiple namespaces, just list one input per namespace:

filebeat.inputs:
- type: container
  enabled: true
  paths:
    - /var/log/containers/*_default_*.log
  fields:
    namespace: default
    env: dev
    k8s: cluster-dev
- type: container
  enabled: true
  paths:
    - /var/log/containers/*_kube-system_*.log
  fields:
    namespace: kube-system
    env: dev
    k8s: cluster-dev

As said above, topics are created per namespace; the custom field namespace I added becomes the topic name. But with many namespaces, how are topics created dynamically at output time? By referencing the field in the topic setting:

output.kafka:
  hosts: ["10.0.105.74:9092","10.0.105.76:9092","10.0.105.96:9092"]
  topic: '%{[fields.namespace]}'
  partition.round_robin:
    reachable_only: true
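
The Kafka output also exposes delivery-tuning options. A sketch with a few commonly adjusted knobs — the values here are illustrative, not recommendations:

output.kafka:
  hosts: ["10.0.105.74:9092"]
  topic: '%{[fields.namespace]}'
  partition.round_robin:
    reachable_only: true
  required_acks: 1           # wait for the partition leader to ack each batch
  compression: gzip          # compress batches on the wire
  max_message_bytes: 1000000 # events larger than this are dropped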

With that, the complete configuration so far looks like this. Given the topic pattern '%{[fields.k8s]}-%{[fields.namespace]}', logs from the default namespace of cluster-dev, for example, end up in a topic named cluster-dev-default:

apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: log
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: container
      enabled: true
      paths:
        - /var/log/containers/*_default_*.log
      fields:
        namespace: default
        env: dev
        k8s: cluster-dev
    - type: container
      enabled: true
      paths:
        - /var/log/containers/*_kube-system_*.log
      fields:
        namespace: kube-system
        env: dev
        k8s: cluster-dev

    filebeat.config.modules:
      path: ${path.config}/modules.d/*.yml
      reload.enabled: false

    output.kafka:
      hosts: ["175.27.159.78:9092","175.27.159.78:9093","175.27.159.78:9094"]
      topic: '%{[fields.k8s]}-%{[fields.namespace]}'
      partition.round_robin:
        reachable_only: true

4.4 processors

If the logs needed no processing at all, we could stop here, but viewing them like this still feels incomplete: we know the log content and which namespace it came from, but not which service or pod produced it, let alone details such as the image address. None of that is present with the configuration above, so we need to build it up further.

This is where a configuration section called processors comes in.

4.4.1 Adding basic Kubernetes metadata

With the configuration shown so far, collected logs carry no pod information, such as:

  • Pod Name
  • Pod UID
  • Namespace
  • Labels

A sample log event before enrichment:

{
  "@timestamp": "2021-05-06T02:47:09.256Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.12.0"
  },
  "log": {
    "file": {
      "path": "/var/log/containers/metrics-server-5549c7694f-7vb66_kube-system_metrics-server-9108765e17c7e325abd665fb0f53c8f4b3077c698cb88392099dfbafb0475709.log"
    },
    "offset": 15842
  },
  "stream": "stderr",
  "message": "E0506 02:47:09.254911 1 reststorage.go:160] unable to fetch pod metrics for pod log/filebeat-s67ds: no metrics known for pod",
  "input": {
    "type": "container"
  },
  "fields": {
    "env": "dev",
    "k8s": "cluster-dev",
    "namespace": "kube-system"
  },
  "ecs": {
    "version": "1.8.0"
  },
  "host": {
    "name": "node-03"
  },
  "agent": {
    "hostname": "node-03",
    "ephemeral_id": "1c87559a-cfca-4708-8f28-e4fc6441943c",
    "id": "f9cf0cd4-eccf-4d8b-bd24-2bff25b4083b",
    "name": "node-03",
    "type": "filebeat",
    "version": "7.12.0"
  }
}

To add this information, we use a processor called add_kubernetes_metadata, which, as the name says, enriches events with Kubernetes metadata. Here is an example of how to use it:

processors:
- add_kubernetes_metadata:
    host: ${NODE_NAME}
    matchers:
    - logs_path:
        logs_path: "/var/log/containers/"

host: the node Filebeat is running on, in case it cannot be detected reliably, for example when Filebeat is running in host network mode.

matchers: matchers are used to build the lookup keys that are matched against the identifiers created by the indexers.

logs_path: the base path of container logs; if not set, the default log path of the platform Filebeat runs on is used.

Note that add_kubernetes_metadata pulls this metadata from the Kubernetes API, which is exactly why the ClusterRole in the DaemonSet manifest grants get/watch/list on pods, namespaces, and nodes (plus replicasets, so deployment names can be resolved).
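
By default the processor ships with a set of indexers and matchers enabled. If you need full control, the defaults can be disabled and declared explicitly — a sketch based on the Beats reference, not something the setup above requires:

processors:
- add_kubernetes_metadata:
    host: ${NODE_NAME}
    default_indexers.enabled: false
    default_matchers.enabled: false
    indexers:
      - container:
    matchers:
      - logs_path:
          logs_path: "/var/log/containers/"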

With the Kubernetes metadata processor in place, the events now include Kubernetes details. Here is what a log event looks like after enrichment:

{
  "@timestamp": "2021-05-06T03:01:58.512Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.12.0"
  },
  "agent": {
    "hostname": "node-03",
    "ephemeral_id": "c0f94fc0-b128-4eb9-b9a3-387f4cae44b7",
    "id": "f9cf0cd4-eccf-4d8b-bd24-2bff25b4083b",
    "name": "node-03",
    "type": "filebeat",
    "version": "7.12.0"
  },
  "ecs": {
    "version": "1.8.0"
  },
  "stream": "stdout",
  "input": {
    "type": "container"
  },
  "host": {
    "name": "node-03"
  },
  "container": {
    "id": "6791d22d210507becd7306ead1eeda9a4c558b5ca0630ed5af4f8b1b220fb4a7",
    "runtime": "docker",
    "image": {
      "name": "nginx:1.10"
    }
  },
  "kubernetes": {
    "namespace": "default",
    "replicaset": {
      "name": "nginx-5b946576d4"
    },
    "labels": {
      "app": "nginx",
      "pod-template-hash": "5b946576d4"
    },
    "container": {
      "name": "nginx",
      "image": "nginx:1.10"
    },
    "deployment": {
      "name": "nginx"
    },
    "node": {
      "name": "node-03",
      "uid": "4340750b-1bb4-4d61-a9aa-4715c7326988",
      "labels": {
        "kubernetes_io/arch": "amd64",
        "kubernetes_io/hostname": "node-03",
        "kubernetes_io/os": "linux",
        "beta_kubernetes_io/arch": "amd64",
        "beta_kubernetes_io/os": "linux"
      },
      "hostname": "node-03"
    },
    "namespace_uid": "8d1dad4b-bea0-469d-9858-51147822de79",
    "pod": {
      "name": "nginx-5b946576d4-6kftk",
      "uid": "cc8c943a-919c-4e15-9cde-05358b8588c1"
    }
  },
  "log": {
    "offset": 2039,
    "file": {
      "path": "/var/log/containers/nginx-5b946576d4-6kftk_default_nginx-6791d22d210507becd7306ead1eeda9a4c558b5ca0630ed5af4f8b1b220fb4a7.log"
    }
  },
  "message": "2021-05-06 11:01:58 10.234.2.11 - - \"GET / HTTP/1.1\" 200 612 \"-\" \"curl/7.29.0\" \"-\"",
  "fields": {
    "k8s": "cluster-dev",
    "namespace": "default",
    "env": "dev"
  }
}

As you can see, the kubernetes key now carries pod information, node information, namespace details, and more: essentially all the key Kubernetes metadata, and a lot of it.

But that creates a new problem: the event is now rather bloated, and more than half of it is information we don't want, so the next step is to drop the fields that are of no use to us.

4.4.2 Dropping unnecessary fields

processors:
- drop_fields:
    # redundant fields to drop
    fields:
      - host
      - ecs
      - log
      - agent
      - input
      - stream
      - container
    ignore_missing: true
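
Processors also accept a when condition, so fields can be dropped selectively. A small sketch, assuming you only want to trim events tagged with the dev environment:

processors:
- drop_fields:
    when:
      equals:
        fields.env: "dev"
    fields:
      - host
      - agent
    ignore_missing: true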

4.4.3 Adding the log's own timestamp

Looking at the events above, there is no dedicated field for the log's own time. There is @timestamp, but it is not Beijing time, and what we want is the timestamp from the log itself. The message does contain a time, so how do we extract it into its own field? That is a job for the script processor: a small JavaScript snippet does the parsing and replacement.

processors:
- script:
    lang: javascript
    id: format_time
    tag: enable
    source: >
      function process(event) {
        var str = event.Get("message");
        var time = str.split(" ").slice(0, 2).join(" ");
        event.Put("time", time);
      }
- timestamp:
    field: time
    timezone: Asia/Shanghai
    layouts:
      - '2006-01-02 15:04:05'
      - '2006-01-02 15:04:05.999'
    test:
      - '2019-06-22 16:33:51'

Once this is added, events carry an extra time field that can be used further down the pipeline.

{
  "@timestamp": "2021-05-06T04:32:10.560Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.12.0"
  },
  "message": "2021-05-06 11:32:10 10.234.2.11 - - \"GET / HTTP/1.1\" 200 612 \"-\" \"curl/7.29.0\" \"-\"",
  "fields": {
    "k8s": "cluster-dev",
    "namespace": "default",
    "env": "dev"
  },
  "time": "2021-05-06 11:32:10",
  "kubernetes": {
    "replicaset": {
      "name": "nginx-deployment-6c4b886b"
    },
    "labels": {
      "app": "nginx-deployment",
      "pod-template-hash": "6c4b886b"
    },
    "container": {
      "name": "nginx",
      "image": "nginx:1.19.5"
    },
    "deployment": {
      "name": "nginx-deployment"
    },
    "node": {
      "uid": "07d8a1a4-e10f-4331-adf0-2fd7d5817c2d",
      "labels": {
        "beta_kubernetes_io/os": "linux",
        "kubernetes_io/arch": "amd64",
        "kubernetes_io/hostname": "node-02",
        "kubernetes_io/os": "linux",
        "beta_kubernetes_io/arch": "amd64"
      },
      "hostname": "node-02",
      "name": "node-02"
    },
    "namespace_uid": "8d1dad4b-bea0-469d-9858-51147822de79",
    "pod": {
      "name": "nginx-deployment-6c4b886b-6rbhw",
      "uid": "78a28548-3d34-4df6-9a76-c651b39ff934"
    },
    "namespace": "default"
  }
}
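
One caveat: the script assumes every message begins with a date and a time separated by a space. Messages that don't match (stack traces, for example) will fail to parse. The timestamp processor supports ignore_missing and ignore_failure flags to keep such events flowing — a sketch:

- timestamp:
    field: time
    timezone: Asia/Shanghai
    layouts:
      - '2006-01-02 15:04:05'
    ignore_missing: true     # skip events without a time field
    ignore_failure: true     # keep events whose time field fails to parse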

4.4.4 Restructuring the Kubernetes metadata

Here a dedicated k8s field is created containing only the key information: podName, nameSpace, imageAddr, hostName, after which the original kubernetes field is simply dropped (see the drop_fields sketch right after the script below). Note that the script fills imageAddr from kubernetes.container.name; if you want the actual image address, kubernetes.container.image is the field to use. The script looks like this:

processors:
- script:
    lang: javascript
    id: format_k8s
    tag: enable
    source: >
      function process(event) {
        var k8s = event.Get("kubernetes");
        var newK8s = {
          podName: k8s.pod.name,
          nameSpace: k8s.namespace,
          imageAddr: k8s.container.name,
          hostName: k8s.node.hostname
        };
        event.Put("k8s", newK8s);
      }
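
As mentioned above, once the compact k8s field exists, the original kubernetes field can be dropped by appending one more processor to the same chain:

- drop_fields:
    fields:
      - kubernetes
    ignore_missing: true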

The resulting log event:

{
  "@timestamp": "2021-05-06T05:33:25.351Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.12.0"
  },
  "fields": {
    "k8s": "cluster-dev",
    "namespace": "default",
    "env": "dev"
  },
  "k8s": {
    "hostName": "node-02",
    "podName": "nginx-deployment-6c4b886b-6rbhw",
    "nameSpace": "default",
    "imageAddr": "nginx"
  },
  "message": "06/May/2021:05:33:25 +0000 10.234.2.11 - - \"GET / HTTP/1.1\" 200 612 \"-\" \"curl/7.29.0\" \"-\""
}
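
Putting it all together, here is the full processors section once every step above is combined. This is assembled from the snippets in this post; order matters, since each processor sees the output of the previous one:

processors:
- add_kubernetes_metadata:       # 1. enrich events with Kubernetes metadata
    host: ${NODE_NAME}
    matchers:
    - logs_path:
        logs_path: "/var/log/containers/"
- script:                        # 2. pull the log's own timestamp out of message
    lang: javascript
    id: format_time
    tag: enable
    source: >
      function process(event) {
        var str = event.Get("message");
        var time = str.split(" ").slice(0, 2).join(" ");
        event.Put("time", time);
      }
- timestamp:
    field: time
    timezone: Asia/Shanghai
    layouts:
      - '2006-01-02 15:04:05'
      - '2006-01-02 15:04:05.999'
- script:                        # 3. keep only the Kubernetes fields we care about
    lang: javascript
    id: format_k8s
    tag: enable
    source: >
      function process(event) {
        var k8s = event.Get("kubernetes");
        var newK8s = {
          podName: k8s.pod.name,
          nameSpace: k8s.namespace,
          imageAddr: k8s.container.name,
          hostName: k8s.node.hostname
        };
        event.Put("k8s", newK8s);
      }
- drop_fields:                   # 4. trim everything we no longer need
    fields:
      - host
      - ecs
      - log
      - agent
      - input
      - stream
      - container
      - kubernetes
    ignore_missing: true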


