Daily QA/Operations Task for the V5/PSQL
Cron output on dev02 is sent to the local inbox and can be viewed with mutt.
The machine should be checked at least twice a day:
- At the beginning of your workday
- In the early afternoon, around 15:00
to verify that no action is needed.
Delete mails as soon as they have been analyzed; at the end of the workday there should be no 'busy' mails left in the inbox.
There should always be a QA Trello ticket that contains all issues needing further investigation.
TODO: add the missing steps you do every day
Daily
Check for processes that have been blocked (not started) due to database locking (mutt).
refresh daily live/prelive
- import times, num prods in db, obvious failures
- errors:
grep ERROR /var/mail/picalike
- warnings:
grep WARNING /var/mail/picalike
→ list shops with: many missing cat maps, truncated prices, …
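The two mailbox greps above can be folded into one helper that prints both counts at once. This is only a minimal sketch, not an existing script: the function name mail_summary is hypothetical, and it assumes the mailbox is a plain-text file readable by the current user.

```shell
# Hypothetical helper: count ERROR and WARNING lines in a mailbox file.
# Defaults to the mailbox path used in the checks above.
mail_summary() {
    mbox="${1:-/var/mail/picalike}"
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true"
    errors=$(grep -c 'ERROR' "$mbox" || true)
    warnings=$(grep -c 'WARNING' "$mbox" || true)
    echo "errors=${errors} warnings=${warnings}"
}
```

Usage: run mail_summary without arguments for the default mailbox, or pass another file as the first argument. The counts only tell you whether a closer look with mutt is needed; they do not replace reading the mails.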
v5 extractor
there is a script for error checking: /home/picalike/v5/scripts/check_extractor_logs.sh
check that the v5_extractor is up and running and that there were recent enrichments for any shop:
docker logs -t --tail 10000 v5_extractor_container 2>&1 | grep "times: img/net/store"
important check:
docker logs -t --tail 30000 v5_extractor_container 2>&1 | grep "error rate above"
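The two log checks above could be combined into a single pass over the container log. The sketch below is an illustration under assumptions: the function name extractor_health is hypothetical, and it only counts lines containing the two strings that the greps above look for.

```shell
# Hypothetical helper: read extractor log lines from stdin and count
# enrichment timing lines and "error rate above" alerts.
extractor_health() {
    awk '
        /times: img\/net\/store/ { enrich++ }
        /error rate above/       { alerts++ }
        END { printf "enrichments=%d error_rate_alerts=%d\n", enrich, alerts }
    '
}
```

Usage: docker logs -t --tail 30000 v5_extractor_container 2>&1 | extractor_health. Zero enrichments suggests the extractor is not processing any shop; a non-zero alert count means the matching lines should be inspected with the greps above.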
check the summary of all shop enrichments:
curl --silent "http://localhost:1115/list_history?non_empty=1"| python3 -m json.tool
if a shop has errors >> 0 or processed_images << total_images, check the logs:
docker logs -t --tail 20000 v5_extractor_container 2>&1 | grep SHOP_ID
Image Blocking
Recently, some shops have blocked the image cloud even though a user-agent is sent. To find the shops for which this happened, check the tinyproxy log:
ssh picalike@v22019026221283999.supersrv.de
grep ": CONNECT" /home/picalike/proxy/tinyproxy.log |cut -d':' -f5|cut -d' ' -f3|sort -u
grep "Dec 21" /home/picalike/proxy/tinyproxy.log | grep Timeout | wc -l
For an overview of timeouts per domain, use dump_timeouts.py:
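The proxy log checks can be wrapped in two small functions so that the date and log path become parameters. This is a sketch under assumptions: the function names proxied_hosts and timeouts_on are hypothetical, the field positions are taken unchanged from the cut pipeline above, and the functions are meant to run on the proxy host itself.

```shell
# Hypothetical helper: list the unique hosts that appear in CONNECT requests,
# using the same field positions as the cut pipeline above.
proxied_hosts() {
    log="${1:-/home/picalike/proxy/tinyproxy.log}"
    grep ': CONNECT' "$log" | cut -d':' -f5 | cut -d' ' -f3 | sort -u
}

# Hypothetical helper: count Timeout lines for one day, e.g. "Dec 21".
timeouts_on() {
    day="$1"
    log="${2:-/home/picalike/proxy/tinyproxy.log}"
    # grep -c exits non-zero on a count of 0, hence "|| true"
    grep -F -- "$day" "$log" | grep -c 'Timeout' || true
}
```

Usage: timeouts_on "Dec 21" prints the number of timeouts for that day, which makes it easy to compare several days in a row.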
https://git.picalike.corpex-kunden.de/incubator/swiss-army-knife/-/blob/master/tinyproxy/dump_timeouts.py
A domain is listed there if at least one image request returned status_code 403 (forbidden/access denied). Once this has happened, all further requests for this domain are routed through the proxy!
Misc
- push_attrs: check that there are uploads and that the upload times are not unusually high
- upload_int_scores: check that uploads have been performed to the reports engine (multiple times per day) and also to pre-live (at least once a day)
- grep for ERRORs and WARNINGs in refresh_daily_[pre]live logs to see if there are new (not in weekly logs) problems
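The comparison against the weekly logs in the last item could be sketched as follows. Everything here is an assumption made for illustration: the function name new_problems is hypothetical, both log paths must be passed in because they are not given above, and the comparison is verbatim per line, so it only works if the log lines carry no timestamps or the timestamps were stripped beforehand.

```shell
# Hypothetical helper: print ERROR/WARNING lines from a daily log that do not
# appear verbatim in a weekly log.
# $1 = daily log file, $2 = weekly log file (both paths are assumptions)
new_problems() {
    daily="$1"
    weekly="$2"
    # -x: whole-line match, -F: fixed strings, -f: patterns read from the weekly log
    grep -E 'ERROR|WARNING' "$daily" | grep -vxF -f "$weekly" | sort -u || true
}
```

Usage: new_problems DAILY_LOG WEEKLY_LOG prints nothing when every daily problem is already known from the weekly log.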
Keywords: devops v5 operations qa daily mutt import ops