An earlier post mentioned the idea of using redis as a forwarding layer, and the previous two posts covered the ELK installation itself. This post walks through some filebeat log collection and processing issues on 6.3.2, using nginx as the example. There may or may not be follow-ups.
Filebeat installation and configuration
Filebeat ships the logs to redis. There are a few configuration tricks along the way, explained alongside the configuration file below.
Download and install
[root@linuxea-VM_Node-113 ~]# wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.3.2-x86_64.rpm -O $PWD/filebeat-6.3.2-x86_64.rpm
[root@linuxea-VM_Node_113 ~]# yum localinstall $PWD/filebeat-6.3.2-x86_64.rpm -y
Start
[root@linuxea-VM_Node-113 /etc/filebeat/modules.d]# systemctl start filebeat.service
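If you also want filebeat to come back after a reboot, enabling the unit is enough:
[root@linuxea-VM_Node-113 ~]# systemctl enable filebeat.service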
Check the logs
[root@linuxea-VM_Node-113 /etc/filebeat/modules.d]# tail -f /var/log/filebeat/filebeat
2018-08-03T03:13:32.716-0400 INFO pipeline/module.go:81 Beat name: linuxea-VM-Node43_241_158_113.cluster.com
2018-08-03T03:13:32.717-0400 INFO instance/beat.go:315 filebeat start running.
2018-08-03T03:13:32.717-0400 INFO [monitoring] log/log.go:97 Starting metrics logging every 30s
2018-08-03T03:13:32.717-0400 INFO registrar/registrar.go:80 No registry file found under: /var/lib/filebeat/registry. Creating a new registry file.
2018-08-03T03:13:32.745-0400 INFO registrar/registrar.go:117 Loading registrar data from /var/lib/filebeat/registry
2018-08-03T03:13:32.745-0400 INFO registrar/registrar.go:124 States Loaded from registrar: 0
2018-08-03T03:13:32.745-0400 INFO crawler/crawler.go:48 Loading Inputs: 1
2018-08-03T03:13:32.745-0400 INFO crawler/crawler.go:82 Loading and starting Inputs completed. Enabled inputs: 0
2018-08-03T03:13:32.746-0400 INFO cfgfile/reload.go:122 Config reloader started
2018-08-03T03:13:32.746-0400 INFO cfgfile/reload.go:214 Loading of config files completed.
2018-08-03T03:14:02.719-0400 INFO [monitoring] log/log.go:124 Non-zero metrics in the last 30s
Configuration file
In this configuration, paths holds the log file paths. Wildcards can be used, but using a wildcard means all logs under that directory are written under a single fields id; that id is pushed to redis, then passed on to logstash, and finally reaches kibana as that one id. For this test, two separate inputs are used, as follows:
filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /data/wwwlogs/1015.log
  fields:
    list_id: 113_1015_nginx_access
- type: log
  paths:
    - /data/wwwlogs/1023.log
  fields:
    list_id: 113_1023_nginx_access

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 3

output.redis:
  hosts: ["IP:PORT"]
  password: "OTdmOWI4ZTM4NTY1M2M4OTZh"
  db: 2
  timeout: 5
  key: "%{[fields.list_id]:unknow}"
The key: "%{[fields.list_id]:unknow}" in the output means: if [fields.list_id] has a value it is used as the key, otherwise it falls back to unknow. That key is what ends up in redis.
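Before starting to ship anything, the file can be sanity-checked; a small sketch, assuming the configuration above is saved as /etc/filebeat/filebeat.yml:
# verify the configuration syntax, then restart so the new inputs are picked up
[root@linuxea-VM_Node-113 ~]# filebeat test config -c /etc/filebeat/filebeat.yml
[root@linuxea-VM_Node-113 ~]# systemctl restart filebeat.service
Depending on the output type, filebeat test output can also be used to check that the configured output is reachable.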
Redis installation
In the setup I have in mind, redis is there to forward the data; it can be a cluster or a single node, depending on the data volume. In keeping with my usual tricks, redis of course runs in docker. Run the following command to install it:
curl -Lks4 https://raw.githubusercontent.com/LinuxEA-Mark/docker-alpine-Redis/master/Sentinel/install_redis.sh|bash
When the installation finishes there is a docker-compose.yaml file under /data/rds, as follows:
[root@iZ /data/rds]# cat docker-compose.yaml
version: '2'
services:
  redis:
    build:
      context: https://raw.githubusercontent.com/LinuxEA-Mark/docker-alpine-Redis/master/Sentinel/Dockerfile
    container_name: redis
    restart: always
    network_mode: "host"
    privileged: true
    environment:
      - REQUIREPASSWD=OTdmOWI4ZTM4NTY1M2M4OTZh
      - MASTERAUTHPAD=OTdmOWI4ZTM4NTY1M2M4OTZh
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /data/redis-data:/data/redis:Z
      - /data/logs:/data/logs
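The install script normally builds and starts the container on its own; if you ever need to bring it up by hand, something like the following should do (assuming docker-compose is installed):
[root@iZ /data/rds]# docker-compose up -d --build
[root@iZ /data/rds]# docker ps --filter name=redis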
Check what is being written into redis
[root@iZ /etc/logstash/conf.d]# redis-cli -h 127.0.0.1 -a OTdmOWI4ZTM4NTY1M2M4OTZh
127.0.0.1:6379> select 2
OK
127.0.0.1:6379[2]> keys *
1) "113_1015_nginx_access"
2) "113_1023_nginx_access"
127.0.0.1:6379[2]> lrange 113_1023_nginx_access 0 -1
1) "{"@timestamp":"2018-08-04T04:36:26.075Z","@metadata":{"beat":"","type":"doc","version":"6.3.2"},"beat":{"name":"linuxea-VM-Node43_13.cluster.com","hostname":"linuxea-VM-Node43_23.cluster.com","version":"6.3.2"},"host":{"name":"linuxea-VM-Node43_23.cluster.com"},"offset":863464,"message":"IP - [xexe9x9797xb4:0.005 [200] [200] xe5x9b4:[0.005] \"IP:51023\"","source":"/data/wwwlogs/1023.log","fields":{"list_id":"113_1023_nginx_access"}}"
Logstash installation and configuration
Logstash is installed and configured on the internal network; it pulls the data from the public-facing redis, and once the data is local it is sent to es and then viewed in kibana.
[root@linuxea-VM-Node117 ~]# curl -Lk https://artifacts.elastic.co/downloads/logstash/logstash-6.3.2.tar.gz|tar xz -C /usr/local && useradd elk && cd /usr/local/ && ln -s logstash-6.3.2 logstash && mkdir /data/logstash/{db,logs} -p && chown -R elk.elk /data/logstash/ /usr/local/logstash-6.3.2 && cd logstash/config/ && mv logstash.yml logstash.yml.bak
Configuration file
Before writing this configuration file, download the IP database; it is used for the map and will be wired into the configuration shortly.
- Preparation
Install GeoLite2-City
[root@linuxea-VM-Node117 ~]# curl -Lk http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.tar.gz|tar xz -C /usr/local/logstash-6.3.2/config/
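Note that the tarball usually unpacks into a dated directory (the exact name varies by release); if so, move the .mmdb up one level so that the database path referenced in the configuration below actually exists:
[root@linuxea-VM-Node117 ~]# mv /usr/local/logstash-6.3.2/config/GeoLite2-City_*/GeoLite2-City.mmdb /usr/local/logstash-6.3.2/config/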
nginx log formatting was also covered back in the 5.5 version; that post can be used as a direct reference.
grok
Prepare the nginx log_format
log_format upstream2 '$proxy_add_x_forwarded_for $remote_user [$time_local] "$request" $http_host'
'[$body_bytes_sent] $request_body "$http_referer" "$http_user_agent" [$ssl_protocol] [$ssl_cipher]'
'[$request_time] [$status] [$upstream_status] [$upstream_response_time] [$upstream_addr]';
Prepare the nginx patterns. You can check a log line together with the patterns in the kibana Grok Debugger, or try grokdebug; note that on 6.3.2 the two do not give identical results.
[root@linuxea-VM-Node117 /usr/local/logstash-6.3.2/config]# cat patterns.d/nginx
NGUSERNAME [a-zA-Z\.\@\-\+_\%]+
NGUSER %{NGUSERNAME}
NGINXACCESS %{IP:clent_ip} (?:-|%{USER:ident}) \[%{HTTPDATE:log_date}\] "%{WORD:http_verb} (?:%{PATH:baseurl}\?%{NOTSPACE:params}(?: HTTP/%{NUMBER:http_version})?|%{DATA:raw_http_request})" (%{IPORHOST:url_domain}|%{URIHOST:ur_domain}|-)\[(%{BASE16FLOAT:request_time}|-)\] %{NOTSPACE:request_body} %{QS:referrer_rul} %{GREEDYDATA:User_Agent} \[%{GREEDYDATA:ssl_protocol}\] \[(?:%{GREEDYDATA:ssl_cipher}|-)\]\[%{NUMBER:time_duration}\] \[%{NUMBER:http_status_code}\] \[(%{BASE10NUM:upstream_status}|-)\] \[(%{NUMBER:upstream_response_time}|-)\] \[(%{URIHOST:upstream_addr}|-)\]
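If you want a quick offline check of the pattern before wiring it into the pipeline, a throwaway logstash run works; a rough sketch, assuming the pattern file above is already in place on the logstash host:
[root@linuxea-VM-Node117 /usr/local/logstash-6.3.2]# ./bin/logstash -e '
filter {
  grok {
    patterns_dir => ["/usr/local/logstash-6.3.2/config/patterns.d/"]
    match => { "message" => "%{NGINXACCESS}" }
  }
}'
With -e and no input or output given, logstash reads stdin and prints to stdout, so paste one access-log line and inspect the parsed fields.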
The configuration file is below. The key in the input section is the key in redis; on the filebeat side the key was "%{[fields.list_id]:unknow}", so here we match on [fields.list_id]. In the filter this shows up as if [fields][list_id]: when it equals 113_1023_nginx_access, the event is processed. The grok section applies the nginx patterns; in geoip, database must point at the GeoLite2 file and source at clent_ip; the useragent is parsed as well. In output, fill in the user and password used to connect to es. Of course, unless you have cracked x-pack or are running a licensed copy, you cannot use authentication, but you can refer to the x-pack crack.
input {
  redis {
    host => "47"
    port => "6379"
    key => "113_1023_nginx_access"
    data_type => "list"
    password => "I4ZTM4NTY1M2M4OTZh"
    threads => "5"
    db => "2"
  }
}
filter {
  if [fields][list_id] == "113_1023_nginx_access" {
    grok {
      patterns_dir => [ "/usr/local/logstash-6.3.2/config/patterns.d/" ]
      match => { "message" => "%{NGINXACCESS}" }
      overwrite => [ "message" ]
    }
    geoip {
      source => "clent_ip"
      target => "geoip"
      database => "/usr/local/logstash-6.3.2/config/GeoLite2-City.mmdb"
    }
    useragent {
      source => "User_Agent"
      target => "userAgent"
    }
    urldecode {
      all_fields => true
    }
    mutate {
      gsub => ["User_Agent", "[\"]", ""]   # strip the double quotes from User_Agent
      convert => [ "response","integer" ]
      convert => [ "body_bytes_sent","integer" ]
      convert => [ "bytes_sent","integer" ]
      convert => [ "upstream_response_time","float" ]
      convert => [ "upstream_status","integer" ]
      convert => [ "request_time","float" ]
      convert => [ "port","integer" ]
    }
    date {
      # the grok pattern above stores the request time in log_date
      match => [ "log_date" , "dd/MMM/YYYY:HH:mm:ss Z" ]
    }
  }
}
output {
  if [fields][list_id] == "113_1023_nginx_access" {
    elasticsearch {
      hosts => ["10.10.240.113:9200","10.10.240.114:9200"]
      index => "logstash-113_1023_nginx_access-%{+YYYY.MM.dd}"
      user => "elastic"
      password => "linuxea"
    }
  }
  stdout {codec => rubydebug}
}
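Before starting it for real, the configuration can be checked without consuming anything from redis; a small sketch, assuming the file above is saved under /usr/local/logstash-6.3.2/config/conf.d/:
[root@linuxea-VM-Node117 ~]# sudo -u elk /usr/local/logstash-6.3.2/bin/logstash -f /usr/local/logstash-6.3.2/config/conf.d/ --config.test_and_exit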
json
But that is still not slick enough, so this time a json log format gets added as well, like this:
log_format json '{"@timestamp":"$time_iso8601",'
'"clent_ip":"$proxy_add_x_forwarded_for",'
'"user-agent":"$http_user_agent",'
'"host":"$server_name",'
'"status":"$status",'
'"method":"$request_method",'
'"domain":"$host",'
'"domain2":"$http_host",'
'"url":"$request_uri",'
'"url2":"$uri",'
'"args":"$args",'
'"referer":"$http_referer",'
'"ssl-type":"$ssl_protocol",'
'"ssl-key":"$ssl_cipher",'
'"body_bytes_sent":"$body_bytes_sent",'
'"request_length":"$request_length",'
'"request_body":"$request_body",'
'"responsetime":"$request_time",'
'"upstreamname":"$upstream_http_name",'
'"upstreamaddr":"$upstream_addr",'
'"upstreamresptime":"$upstream_response_time",'
'"upstreamstatus":"$upstream_status"}';
After adding this to nginx.conf, switch the server blocks over to it. The downside is that the raw log becomes less readable; the upside is that logstash performance improves, because it no longer has to run grok and simply forwards the collected logs to es. I should point out that I did not end up using json, because I could not get the useragent handled properly this way and have not found a workable approach; if you know one, tell me. What you can do instead is ship everything with *.log into redis and all the way through to kibana, then use kibana to group and display it.
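For reference, if you do go the json route, the grok part can in principle be replaced by a json filter applied to the message field that filebeat ships; a minimal sketch, not the configuration I actually ran:
filter {
  if [fields][list_id] == "113_1023_nginx_access" {
    # the nginx json line arrives inside filebeat's "message" field;
    # parse it in place instead of running grok
    json {
      source => "message"
    }
  }
}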
Start it up:
nohup sudo -u elk /usr/local/logstash-6.3.2/bin/logstash -f ./conf.d/*.yml >./nohup.out 2>&1 &
If nothing goes wrong, you will see an index named logstash-113_1023_nginx_access-%{+YYYY.MM.dd} in kibana.
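You can also confirm from the es side; a quick check, assuming the elastic credentials from the output section above:
[root@linuxea-VM-Node117 ~]# curl -u elastic:linuxea 'http://10.10.240.113:9200/_cat/indices/logstash-113_1023_nginx_access-*?v'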