ElastAlert: configuring alerts and exposing metrics to Prometheus

ElastAlert is an alerting tool for Elasticsearch written in Python. This post covers some basic configuration and how to expose metrics to Prometheus.

Quick Start

Start an Elasticsearch instance:

```sh
docker run -p 9200:9200 -d -e ES_JAVA_OPTS="-Xms256m -Xmx256m" elasticsearch:5.5.0-alpine
```

This uses Docker to spin up an Elasticsearch database quickly; for how to install Docker, click here.

Create a folder named rules and add the file example.yaml:

```yaml
name: esa_twitter_message_timeout
type: frequency
index: ppp
num_events: 10
timeframe:
  hours: 4
query_delay:
  minutes: 5
filter:
- term:
    user_p: "1.0"
ignore_email: true
alert:
- "email"
email:
- "elastalert@example.com"
```

query_delay is set here so that logs have time to reach the Elasticsearch database before the query runs. ignore_email skips sending the e-mail.

Start ElastAlert:

```sh
docker run -p 8000:8000 \
  -e ELASTICSEARCH_HOST=192.168.99.100 \
  -e ELASTALERT_INDEX=s102 \
  -v $(pwd)/rules:/opt/rules \
  hand/elastalert_prometheus:v0.1
```

ELASTALERT_INDEX specifies the index in which ElastAlert stores its own state. …
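Once the container is up, the mapped port 8000 should serve the metrics for Prometheus to scrape. A minimal smoke test, assuming the hand/elastalert_prometheus image exposes a standard /metrics HTTP endpoint on that port (the path is an assumption, not stated in the post):

```sh
# Fetch the exporter output; substitute your Docker host's address.
# /metrics is the conventional Prometheus path and an assumption here.
curl -s http://192.168.99.100:8000/metrics | head -n 20
```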


Fluentd in practice

Fluentd is widely used for log collection and log buffering, and the Fluentd ecosystem has many components. Here are a few examples focused on the components of a microservice architecture. For installing Fluentd and aggregating logs directly into metrics for Prometheus, see here.

Send a record to Fluentd manually:

```sh
echo '{"message":"hello"}' | fluent-cat debug.log --host testserver --port 24225
```

Read a log file directly with tail and declare the data type of a captured key:

```
<source>
  @type tail
  path /a.log
  pos_file /a.pos
  tag a.log
  format /^(?<count>\d+)$/
  types count:integer
  read_from_head true
</source>
```

The types available here are integer ("int" would NOT work!), string, bool, float, time, and array. read_from_head tells Fluentd to read the file from the beginning.

Copy one tag into several different tags so each copy can be processed separately. Install rewrite_tag_filter:

```sh
gem install fluent-plugin-rewrite-tag-filter
```

The configuration file looks like this:

```
<match a.*.*>
  @type copy
  <store>
    @type rewrite_tag_filter
    <rule>
      key message
      tag aaa.${tag}
      pattern .*
    </rule>
  </store>
  <store>
    @type rewrite_tag_filter
    <rule>
      key message
      tag bbb.
```
…
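To try the tail source above, append lines to /a.log while Fluentd runs with that configuration (a sketch; it assumes some match section, e.g. stdout, routes the a.log tag somewhere visible):

```sh
# Matches /^(?<count>\d+)$/, so Fluentd emits a record like {"count":42},
# with count typed as an integer thanks to `types count:integer`.
echo 42 >> /a.log

# A line that does not match the regex triggers a "pattern not match"
# warning in the Fluentd log instead of producing a record.
echo hello >> /a.log
```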


Aggregating data with Fluentd to expose metrics to Prometheus

Sometimes you need to count how many WARN log lines appeared within one minute. The third-party plugin fluent-plugin-grepcounter provides this; once the count is available as a metric event, collecting and processing it with Prometheus is much more convenient, and Fluentd likewise has a third-party Prometheus plugin, fluent-plugin-prometheus.

First install Fluentd (on Alpine):

```sh
apk update
apk add gcc
apk add ruby ruby-dev build-base ruby-irb
gem install fluentd -v "~> 0.12.0" --no-ri --no-rdoc
```

Install the required plugins:

```sh
gem install fluent-plugin-grepcounter
gem install fluent-plugin-prometheus
```

Add the Fluentd configuration file /fld.conf:

```
<source>
  @type prometheus
</source>

<source>
  @type tail
  path /a.log
  pos_file /a.pos
  tag a.log
  format none
</source>

<match warn.count.**>
  @type prometheus
  <metric>
    name message_warn_counter
    type gauge
    desc The total number of count in message in 10s.
```
…
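The @type prometheus source starts an HTTP endpoint for Prometheus to scrape. Since the excerpt does not override the plugin's defaults, the endpoint should be reachable on port 24231 at /metrics (the plugin's documented defaults):

```sh
# Once warn.count.** events start flowing, the gauge defined above
# should appear in the scrape output.
curl -s http://localhost:24231/metrics | grep message_warn_counter
```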

Prometheus exporters for JVM and JSON metrics

jmx-exporter

When a Java application exposes a JMX port, jmx_exporter can provide metrics to Prometheus without restarting the application.

Download jmx_exporter:

```sh
wget https://share.vinkdong.com/_category/ops/java/jmx_exporter_1.0.tar.gz
```

Unpack jmx_exporter_1.0.tar.gz:

```sh
tar -xzf jmx_exporter_1.0.tar.gz
cd jmx_exporter/
```

Edit config.yml, changing hostPort to the JMX address of your application and replacing the javaapp value with that application's name:

```yaml
---
hostPort: 127.0.0.1:9902
username:
password:
lowercaseOutputName: true
blacklistObjectNames: ["com.alibaba.druid:type=DruidDataSourceStat"]
whitelistObjectNames: ["java.lang:type=Memory","java.lang:type=ClassLoading","java.lang:type=Threading","java.lang:type=OperatingSystem"]
rules:
- pattern: 'java.lang<type=(Memory|ClassLoading|Threading|OperatingSystem)><.*>(\w+):'
  name: framework_jvm_$1
  labels:
    javaapp: uatosp
    name: $2
```

Edit run.sh; 5557 can be changed to any suitable port:

```sh
#!/usr/bin/env bash
# Script to run a java application for testing jmx4prometheus.
# Note: You can use localhost:5556 instead of 5556 for configuring socket hostname.
```
…
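run.sh is cut off in the excerpt. With the standard standalone jmx_prometheus_httpserver jar (the exact jar name inside this packaged tarball is an assumption), starting the exporter and checking its output would look roughly like this:

```sh
# Standalone HTTP server variant of jmx_exporter: it connects to the
# JMX address from config.yml and serves metrics on the given port.
# The jar filename is an assumption about the tarball contents.
java -jar jmx_prometheus_httpserver-jar-with-dependencies.jar 5557 config.yml &

# Verify that the rewritten framework_jvm_* metrics are exposed.
curl -s http://localhost:5557/metrics | grep framework_jvm
```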

Common Prometheus Alert configurations

Alert configuration

First, a handy address for testing routing rules: https://prometheus.io/webtools/alerting/routing-tree-editor/

SMS

This is an adaptation built on top of Alertmanager's webhook mechanism (source code is on GitHub). When an alert fires, a POST request of the following form is sent to the configured address; if Extra is present, each value is substituted into receiver and the message is sent once per receiver:

```json
{
  "receiver": "msg_18888888888",
  "status": "firing",
  "alerts": [{
    "status": "firing",
    "labels": {
      "alertname": "NodeCPUUsage",
      "app": "prometheus",
      "cluster": "default",
      "component": "node-exporter",
      "cpu": "cpu4",
      "instance": "172.0.0.2:9100",
      "job": "kubernetes-endpoints",
      "kubernetes_name": "prometheus-node-exporter",
      "kubernetes_namespace": "monitoring",
      "mode": "idle",
      "node_name": "ci",
      "severity": "page"
    },
    "annotations": {
      "DESCRIPTION": ": CPU usage is above 75% (current value is: )",
      "SUMMARY": ": High CPU usage detected"
    },
    "startsAt": "2017-10-17T07:10:47.804Z",
    "endsAt": "0001-01-01T00:00:00Z",
    "generatorURL": "http://prometheus.
```
…
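On the Alertmanager side, pointing alerts at such an SMS webhook takes only a short receiver definition. A sketch of the relevant alertmanager.yml fragment; the receiver name matches the payload above, but the gateway URL is a placeholder:

```yaml
# Route firing alerts to the SMS webhook service; Alertmanager
# POSTs the JSON payload shown above to this URL.
route:
  receiver: msg_18888888888
receivers:
- name: msg_18888888888
  webhook_configs:
  - url: http://sms-gateway.example.com:8080/send  # placeholder address
```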