• Pushing metrics
    • Java batch job example

    Pushing metrics


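    Occasionally you need to monitor components that cannot be scraped. They may sit behind a firewall, or they may be too short-lived to expose data reliably through the pull model. The Prometheus Pushgateway lets you push time series from these components to an intermediary job that Prometheus can then scrape. Combined with Prometheus's simple text-based exposition format, this makes it easy to instrument even shell scripts without a client library.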

    • For usage from a shell, see the Readme.
    • For Java, see the PushGateway class.
    • For Go, see Push and PushAdd.
    • For Python, see Pushgateway.
    • For Ruby, see Pushgateway.
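
To make the "no client library" point concrete, here is a minimal sketch of what a push consists of: an HTTP request to the Pushgateway path /metrics/job/<job name> whose body is in the text exposition format. The class and method names (TextPushSketch, pushUrl, gaugeBody) are hypothetical helpers invented for illustration, and the sketch only builds the URL and body rather than sending them. It also assumes the job name needs no URL escaping.

```java
// Hypothetical helper for illustration; not part of any Prometheus client library.
class TextPushSketch {
    // The Pushgateway accepts pushes at /metrics/job/<job name>.
    // Assumption: the job name contains no characters that need URL escaping.
    static String pushUrl(String address, String job) {
        return "http://" + address + "/metrics/job/" + job;
    }

    // One gauge sample in the text exposition format:
    // a "# TYPE" line followed by a "<name> <value>" line.
    static String gaugeBody(String name, double value) {
        return "# TYPE " + name + " gauge\n" + name + " " + value + "\n";
    }

    public static void main(String[] args) {
        String url = pushUrl("127.0.0.1:9091", "my_batch_job");
        String body = gaugeBody("my_batch_job_duration_seconds", 12.5);
        // A real push would send this body to this URL over HTTP.
        System.out.println("POST " + url);
        System.out.print(body);
    }
}
```

Any tool that can issue an HTTP request with such a body can push metrics, which is why a shell script is enough.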

    Java batch job example

    This example shows how to instrument a batch job and alert when it has not succeeded recently.

    If you use Maven, add the following to pom.xml:

        <dependency>
          <groupId>io.prometheus</groupId>
          <artifactId>simpleclient</artifactId>
          <version>0.0.10</version>
        </dependency>
        <dependency>
          <groupId>io.prometheus</groupId>
          <artifactId>simpleclient_pushgateway</artifactId>
          <version>0.0.10</version>
        </dependency>

    The code to run the batch job:

        import io.prometheus.client.CollectorRegistry;
        import io.prometheus.client.Gauge;
        import io.prometheus.client.exporter.PushGateway;

        void executeBatchJob() throws Exception {
          CollectorRegistry registry = new CollectorRegistry();
          Gauge duration = Gauge.build()
              .name("my_batch_job_duration_seconds")
              .help("Duration of my batch job in seconds.")
              .register(registry);
          Gauge.Timer durationTimer = duration.startTimer();
          try {
            // Your code here.

            // This is only added to the registry after success,
            // so that a previous success in the Pushgateway is not overwritten on failure.
            Gauge lastSuccess = Gauge.build()
                .name("my_batch_job_last_success_unixtime")
                .help("Last time my batch job succeeded, in unixtime.")
                .register(registry);
            lastSuccess.setToCurrentTime();
          } finally {
            // Record the duration and push whether or not the job succeeded.
            durationTimer.setDuration();
            PushGateway pg = new PushGateway("127.0.0.1:9091");
            pg.pushAdd(registry, "my_batch_job");
          }
        }
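
The comment in the code above relies on how pushAdd behaves: it replaces only metrics with the same name, whereas push replaces every metric in the group. The following is a toy in-memory model of that difference, meant as a sketch only. PushGroupSketch is a hypothetical class, a plain map stands in for one Pushgateway group, and nothing here reflects how the Pushgateway is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a single Pushgateway group; hypothetical, for illustration only.
class PushGroupSketch {
    final Map<String, Double> metrics = new HashMap<>();

    // Models push: the pushed metrics replace the whole group.
    void push(Map<String, Double> pushed) {
        metrics.clear();
        metrics.putAll(pushed);
    }

    // Models pushAdd: only metrics with the same name are replaced.
    void pushAdd(Map<String, Double> pushed) {
        metrics.putAll(pushed);
    }

    public static void main(String[] args) {
        PushGroupSketch group = new PushGroupSketch();

        // A successful run registers both gauges before pushing.
        Map<String, Double> successfulRun = new HashMap<>();
        successfulRun.put("my_batch_job_duration_seconds", 30.0);
        successfulRun.put("my_batch_job_last_success_unixtime", 1000.0);
        group.pushAdd(successfulRun);

        // A failed run never registers the last-success gauge.
        Map<String, Double> failedRun = new HashMap<>();
        failedRun.put("my_batch_job_duration_seconds", 5.0);
        group.pushAdd(failedRun);

        // The earlier success timestamp is still present; the duration is updated.
        System.out.println(group.metrics.get("my_batch_job_last_success_unixtime"));
        System.out.println(group.metrics.get("my_batch_job_duration_seconds"));
    }
}
```

In this model the failed run leaves the last-success timestamp at its earlier value, which is what lets an alert measure how long ago the job last succeeded. Calling push instead would drop that timestamp from the group.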

    Set up a Pushgateway, and change the host and port in the code above if needed.

    Create an alert to the Alertmanager for when the job has not run recently. Add the following to the rules of a Prometheus server that scrapes the Pushgateway:

        ALERT MyBatchJobNotCompleted
          IF min(time() - my_batch_job_last_success_unixtime{job="my_batch_job"}) > 60 * 60
          FOR 5m
          WITH { severity="page" }
          SUMMARY "MyBatchJob has not completed successfully in over an hour"
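
The arithmetic in the rule's IF clause can be sketched in plain Java. This is only a model of the condition, with a hypothetical class name (StaleJobAlertSketch). Prometheus evaluates the real rule, and the FOR 5m clause additionally requires the condition to keep holding for five minutes before the alert fires, which the sketch does not model.

```java
// Hypothetical model of the rule's IF clause; for illustration only.
class StaleJobAlertSketch {
    static final long THRESHOLD_SECONDS = 60 * 60;

    // Mirrors min(time() - last_success) > 60 * 60:
    // true only if even the most recent success is more than an hour old.
    static boolean conditionHolds(long nowSeconds, long... lastSuccessSeconds) {
        if (lastSuccessSeconds.length == 0) {
            return false; // no series matched, so the expression yields nothing
        }
        long minAge = Long.MAX_VALUE;
        for (long t : lastSuccessSeconds) {
            minAge = Math.min(minAge, nowSeconds - t);
        }
        return minAge > THRESHOLD_SECONDS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000;
        // Last success two hours ago: the condition holds.
        System.out.println(conditionHolds(now, now - 2 * 60 * 60));
        // Last success ten minutes ago: the condition does not hold.
        System.out.println(conditionHolds(now, now - 10 * 60));
    }
}
```

Note the empty case: if the job has never pushed a success timestamp, the expression matches no series, so this rule on its own stays silent.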