prometheus_monitoring
Services
Technologies: nagios, prometheus
basic nagios:
- if everything is ok, return: OK
- else, return: ERROR message
- the message should describe the error and, where possible, a solution strategy
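The convention above can be sketched as a small helper. This is a minimal sketch, not the deployed check: the function names `format_result` and `run_check` are hypothetical, and mapping every error to exit code 2 (CRITICAL in the usual Nagios plugin convention) is an assumption.

```python
# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def format_result(error=None, hint=None):
    """Return (exit_code, message) for one check.

    error: None when everything is fine, else a short description.
    hint:  optional solution strategy appended to the message.
    """
    if error is None:
        return OK, "OK"
    message = f"ERROR: {error}"
    if hint:
        message += f" (try: {hint})"
    return CRITICAL, message


def run_check(check):
    """Run a check callable that returns (error, hint).

    An exception inside the check is reported as an error too,
    so a crashing check never looks healthy.
    """
    try:
        error, hint = check()
    except Exception as exc:
        error, hint = f"check failed: {exc}", None
    return format_result(error, hint)


# Hypothetical check that reports a missing queue consumer.
code, message = run_check(lambda: ("host missing in rabbitmq", "restart the container"))
print(code, message)
```

A real plugin would print the message and pass the code to `sys.exit`, so that Nagios can pick up the state.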
TODO: alert channel for slack
V3 LillySyncService
- recipe: go to the server and restart the docker container “frontend_instance1”
- check: http://index-prelive.picalike.corpex-kunden.de:15672/#/queues to see whether the host appears in rabbitmq
- check: http://sg01.picalike.corpex-kunden.de:5002/by_service where the host should show OK after about 1 to 2 minutes
In case of errors, a notification is sent to the Slack alert channel with the prefix 'LillySyncService'.
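A prefixed notification like this could be sent roughly as follows. This is only a sketch: it assumes the alert channel is fed through a Slack incoming webhook (which accepts a JSON body with a `text` field), and the names `build_alert` and `send_alert` as well as the webhook URL are hypothetical.

```python
import json
import urllib.request


def build_alert(service, message):
    """Build the JSON body for a Slack incoming webhook, prefixed with the service name."""
    return json.dumps({"text": f"{service}: {message}"}).encode("utf-8")


def send_alert(webhook_url, service, message, opener=urllib.request.urlopen):
    """POST the alert and return the HTTP status.

    `opener` is injectable so the function can be exercised without network access.
    """
    request = urllib.request.Request(
        webhook_url,
        data=build_alert(service, message),  # a request with data is sent as POST
        headers={"Content-Type": "application/json"},
    )
    with opener(request, timeout=10) as response:
        return response.status


# The payload for the service named above; the message text is made up.
print(build_alert("LillySyncService", "host does not appear in rabbitmq"))
```

Keeping the service name as a fixed prefix makes it easy to search or filter the alert channel per service.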
feature extractor
top viewed (JL)
- health endpoints:
- frontend05-hpc:5001/service/health
- frontend05-hpc:5002/health
top looks (JL)
- implemented
- health endpoints:
- frontend05-hpc:8012/health
- frontend05-hpc:8013/health
also viewed (JL)
- frontend05-hpc:5004/service/health
get cat trends
- frontend05-hpc:5003/service/health
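The health endpoints listed for top viewed, top looks, also viewed and get cat trends could be polled with a loop like the one below. This is a sketch under assumptions: the `http://` scheme and the rule "HTTP 200 means healthy" are not stated on this page, and `probe` and `check_all` are hypothetical names.

```python
import urllib.error
import urllib.request

# Endpoints copied from the lists above; the http:// scheme is an assumption.
HEALTH_ENDPOINTS = {
    "top viewed": [
        "http://frontend05-hpc:5001/service/health",
        "http://frontend05-hpc:5002/health",
    ],
    "top looks": [
        "http://frontend05-hpc:8012/health",
        "http://frontend05-hpc:8013/health",
    ],
    "also viewed": ["http://frontend05-hpc:5004/service/health"],
    "get cat trends": ["http://frontend05-hpc:5003/service/health"],
}


def probe(url, timeout=5):
    """Return None if the endpoint answers with HTTP 200, else an error string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            if response.status == 200:
                return None
            return f"{url} returned HTTP {response.status}"
    except (urllib.error.URLError, OSError) as exc:
        # Covers HTTP error statuses, refused connections and timeouts.
        return f"{url} unreachable: {exc}"


def check_all(endpoints, probe_fn=probe):
    """Return {service: [error, ...]} containing only the failing services."""
    failures = {}
    for service, urls in endpoints.items():
        errors = [err for err in map(probe_fn, urls) if err is not None]
        if errors:
            failures[service] = errors
    return failures
```

Calling `check_all(HEALTH_ENDPOINTS)` from a host that can resolve frontend05-hpc returns an empty dict when every endpoint is healthy; the error strings are suitable as the body of an ERROR message.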
settings provider (JO)
netcup postgresql (TS)
- a basic monitoring script that reports errors to the 'alert' channel is ready and deployed at dev01 (~/bin/v5_slack_monitoring.py)
- implemented as https://git.picalike.corpex-kunden.de/-/snippets/12
shop-conveyor-belt (BZ)
- is this service running
- number of feeds per state
- are all services in zookeeper
- feed_import_fastapi
- solr_updater
- …
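A count such as "number of feeds per state" maps naturally onto a Prometheus gauge with one label. The sketch below renders such a gauge in the Prometheus text exposition format using only the standard library; the metric name `feeds_per_state`, the function `render_gauge` and the state names are hypothetical, and how the counts are obtained is left open.

```python
def render_gauge(name, help_text, samples, label):
    """Render {label_value: number} as one gauge in the Prometheus text format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for label_value, value in sorted(samples.items()):
        # Label values must escape backslash, double quote and newline.
        escaped = (
            label_value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        lines.append(f'{name}{{{label}="{escaped}"}} {value}')
    return "\n".join(lines) + "\n"


# Made-up counts, for illustration only.
print(render_gauge(
    "feeds_per_state",
    "Number of feeds per state.",
    {"imported": 3, "failed": 1},
    "state",
))
```

The returned text is what a `/metrics` endpoint would serve for Prometheus to scrape.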
krawla2feed (HG)
- is this service running
- implemented
- how many products were found by the crawler
- how many products were written into the feed
feed-import (BZ/JL)
- is this service running
- how many feeds were imported
- how many products were imported
osa-report-api (JO)
- running?
- implemented
- pg up?
- implemented
- last updates ok?
solr_updater (*hopefully soon deprecated*) (BZ)
- implemented
trend_analyzer (*soon deprecated?*)
visualytics_notification_api (mostly for sending emails)
similarity_api (fridtjof implementation) (TS)
get_trend_description_solr
product_trend_calculator
image-cloud (JO)
onsight-analytics (python middleware to php frontend) (JO)
- is this service running?
osa-cluster (BZ)
mapping-service (BZ)
witt-reports
Sketches API
- health check via dev01 script is active
prometheus_monitoring.txt · Last modified: 2024/04/11 14:23 by 127.0.0.1