====== V5 Image Picker (AKA Kat Doctor) ======
The long-term goal is to replace the v3 image picker with new v5 services. As a first step, the 'size table' detection for Sheego is being replaced to get rid of the old 'semi_net_inception_2' features.
===== Overview =====
git: https://git.picalike.corpex-kunden.de/incubator/v5-image-picker\\
The sheego service is running on tower01: http://tower01.picalike.corpex-kunden.de:8042/docs
===== Size Table Detection =====
Feed: index03:/mnt/storage/var/data/feeds/1110_Wednesday.csv
===== Services =====
For the size table detection, a new service provides the following API:\\
''%% [POST] input: urls List[str] output: result List[bool] %%''\\
A value of True means that the corresponding URL contains a size table object.
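A minimal client sketch for this API, using only the standard library. It assumes the request body is a JSON object with a ''urls'' key and the response a JSON object with a ''result'' key (one bool per URL, in input order). The route ''/size_table'' is a hypothetical placeholder; check the service's ''/docs'' page on tower01 for the real path.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# Host and port are from this page; the path "/size_table" is a hypothetical guess.
SIZE_TABLE_ENDPOINT = "http://tower01.picalike.corpex-kunden.de:8042/size_table"


def post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST `payload` as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def detect_size_tables(
    urls: List[str],
    endpoint: str = SIZE_TABLE_ENDPOINT,
    post: Callable[[str, dict], dict] = post_json,
) -> Dict[str, bool]:
    """Map each image URL to True if the service found a size table in it.

    Assumes the body is {"urls": [...]} and the reply is {"result": [...]}
    with one bool per input URL, in input order. Duplicate URLs collapse
    into a single dict entry.
    """
    result = post(endpoint, {"urls": urls})["result"]
    if len(result) != len(urls):
        raise ValueError(f"expected {len(urls)} results, got {len(result)}")
    return dict(zip(urls, result))
```

Passing a fake ''post'' callable lets the response handling be exercised without sending traffic to the paid live service.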
===== Internal Service =====
A version of the picker for internal use only (!) is available on sandy with the same settings.
===== Sheego =====
The 'size table' detection is a live service paid for by customers. Please be careful with the tower01 services and coordinate deployments.
===== Models =====
For historical reasons, the models are saved on sg01: /var/www/picalike.corpex-kunden.de/sg01/htdocs/ml_data\\
The models are now stored in a PostgreSQL database:\\
postgresql://picalike:picalike@cloud01.picalike.corpex-kunden.de:5401/features\\
The relation is named 'models', and ModelDB from db/models.py can be used to access it.
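For a quick look at the ''models'' relation outside of ModelDB, a raw DB-API 2.0 query is enough. This sketch does not reproduce ModelDB; since the column layout is not documented on this page, the column names are read from ''cursor.description''. It assumes a DB-API driver such as psycopg2 is used for the PostgreSQL connection string above.

```python
from typing import Any, Dict, List

# Connection string from this page. ModelDB (db/models.py) remains the
# intended accessor; this is only a raw inspection helper.
MODELS_DSN = "postgresql://picalike:picalike@cloud01.picalike.corpex-kunden.de:5401/features"


def peek_models(conn, limit: int = 5) -> List[Dict[str, Any]]:
    """Return up to `limit` rows of the 'models' relation as dicts.

    Works with any DB-API 2.0 connection (e.g. psycopg2). Column names are
    taken from cursor.description because the schema is not documented here.
    """
    cur = conn.cursor()
    try:
        cur.execute("SELECT * FROM models")
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchmany(limit)]
    finally:
        cur.close()
```

With psycopg2 installed, ''peek_models(psycopg2.connect(MODELS_DSN))'' would return the first rows, since libpq accepts ''postgresql://'' URIs as the DSN.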
===== Image Picker =====
API request:\\
''%%requests.post(URL, json={'urls': image_url_list, 'category': 'shoes__boots__ankle boots'})%%''
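A stdlib-only sketch of the picker request above. The payload keys come from the example; ''picker_url'' stands in for ''URL''. The ''__'' separator in the hypothetical helper ''build_category'' is inferred from the single example 'shoes__boots__ankle boots' and may not cover every category. The response format is not documented on this page, so ''pick_images'' returns the decoded JSON unchanged.

```python
import json
import urllib.request
from typing import Any, Callable, Iterable


def post_json(url: str, payload: dict, timeout: float = 60.0) -> Any:
    """POST `payload` as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def build_category(*levels: str) -> str:
    """Join category levels with '__' (assumed from the example above)."""
    if not levels or any(not level or "__" in level for level in levels):
        raise ValueError(f"invalid category levels: {levels!r}")
    return "__".join(levels)


def pick_images(
    picker_url: str,
    image_urls: Iterable[str],
    category: str,
    post: Callable[[str, dict], Any] = post_json,
) -> Any:
    """Send one picking request; the response is returned as-is."""
    payload = {"urls": list(image_urls), "category": category}
    return post(picker_url, payload)
```

For example, ''pick_images(URL, image_url_list, build_category("shoes", "boots", "ankle boots"))'' sends the same payload as the ''requests.post'' call above.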
See sandy: /home/picalike/missing_deit_features_worker/start_missing_worker.sh for a list of active instances.\\
Logs can be found in:\\
/home/picalike/missing_deit_features_worker/nohup.out
==== Client ====
The cron job that triggers the weekly picking process is located on dev02:\\
/home/picalike/v5/v5_image_picker/scripts\\
The script is a thin wrapper with some logic to trigger the endpoints on sandy. The cron output is meant to serve as a basic form of QA.
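A sketch of how such a wrapper can turn its cron output into a QA report: trigger each endpoint, print one line per endpoint, and return a non-zero exit code on failure (cron typically mails any output to the crontab owner). The endpoint URLs, the use of plain GET requests, and the function names are assumptions; the real script in /home/picalike/v5/v5_image_picker/scripts may work differently.

```python
import urllib.request
from typing import Callable, Iterable, List, Tuple


def fetch_status(url: str, timeout: float = 600.0) -> int:
    """GET `url` and return the HTTP status (4xx/5xx raise HTTPError)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def trigger_all(
    endpoints: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> List[Tuple[str, str]]:
    """Trigger every endpoint in order and record a short outcome per URL."""
    report = []
    for url in endpoints:
        try:
            status = fetch(url)
            outcome = "OK" if 200 <= status < 300 else f"HTTP {status}"
        except Exception as exc:  # HTTPError, URLError, timeouts, ...
            outcome = f"FAILED: {exc}"
        report.append((url, outcome))
    return report


def summarize(report: List[Tuple[str, str]]) -> int:
    """Print one line per endpoint (cron captures stdout) and return an exit code."""
    for url, outcome in report:
        print(f"{outcome:<12} {url}")
    return 0 if all(outcome == "OK" for _, outcome in report) else 1
```

Returning an exit code instead of raising keeps the report complete even when one endpoint on sandy is down.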
===== Missing Worker =====
sandy: /home/picalike/missing_deit_features_worker
HINT: the v5_image_picker (Kat Doctor) schedules unknown features in the missing table to avoid slow on-the-fly feature extraction.
The script emulates local v5_image_picker instances by using ssh port forwarding to the list of frontends. The bash script is generated by generate_missing_worker_script.py.\\
The worker is an infinite loop that scans the missing table for unmodified URLs and creates work packages for the individual services. The worker itself only distributes packages to hosts and does no other work.
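The distribution step can be pictured as splitting the pending URLs into fixed-size packages and handing them out to the hosts in turn. This is a sketch of the idea only: the package size, the round-robin policy and the host names are assumptions, not taken from the worker's code, and the table scan and the HTTP dispatch to the forwarded ports are left out.

```python
from itertools import cycle
from typing import Dict, List, Sequence


def make_packages(urls: Sequence[str], package_size: int) -> List[List[str]]:
    """Split the missing URLs into work packages of at most `package_size`."""
    if package_size < 1:
        raise ValueError("package_size must be >= 1")
    return [list(urls[i:i + package_size]) for i in range(0, len(urls), package_size)]


def assign_round_robin(
    packages: List[List[str]], hosts: Sequence[str]
) -> Dict[str, List[List[str]]]:
    """Distribute packages over hosts in round-robin order."""
    if not hosts:
        raise ValueError("need at least one host")
    plan: Dict[str, List[List[str]]] = {host: [] for host in hosts}
    # zip stops once all packages are assigned; cycle repeats the host list.
    for host, package in zip(cycle(hosts), packages):
        plan[host].append(package)
    return plan
```

Round-robin is the simplest fair policy when all hosts are comparable; a load-aware scheduler would need feedback from the hosts, which this sketch does not model.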
===== Worker Mapping =====
^ Local port ^ netcup host ^
| 10000 | picalike@v22019026221283998.hotsrv.de |
| 10001 | picalike@v22019026221283999.supersrv.de |
| 10002 | picalike@v22019026221284000.happysrv.de |
| 10003 | picalike@v22019026221284001.happysrv.de |
| 10004 | picalike@v22019036221284368.ultrasrv.de |
| 10005 | picalike@v22019036221284369.happysrv.de |
| 10006 | picalike@v22019036221284366.powersrv.de |
| 10007 | picalike@v22019036221284367.megasrv.de |
| 10008 | picalike@v22019036221284365.hotsrv.de |
| 10009 | picalike@v220201062212128886.quicksrv.de |
| 10010 | picalike@v22019026221283951.powersrv.de |
The remote port is sandy:2000*.\\
A quick health check for the netcup servers:\\
http://sandy.picalike.corpex-kunden.de:20004/health\\
Replace the last digit of 20004 with 0-9 to check the other servers.\\
If there is a problem, log in to the node and check the container 'v5_image_picker_container'.
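The manual check can be scripted by probing all forwarded ports on sandy in one go. This sketch assumes a healthy picker answers ''/health'' with a 2xx status; the port range 20000-20009 follows the "0-9" hint above, and ''count'' can be raised if more ports are forwarded.

```python
import http.client
import urllib.request
from typing import Callable, Dict

SANDY_HOST = "sandy.picalike.corpex-kunden.de"


def health_urls(first_port: int = 20000, count: int = 10) -> Dict[int, str]:
    """Map each forwarded port on sandy to its /health URL."""
    return {
        port: f"http://{SANDY_HOST}:{port}/health"
        for port in range(first_port, first_port + count)
    }


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the health endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError and socket timeouts are OSError subclasses.
        return False


def check_all(probe_fn: Callable[[str], bool] = probe, **kwargs) -> Dict[int, bool]:
    """Probe every forwarded picker and return {port: healthy}."""
    return {port: probe_fn(url) for port, url in health_urls(**kwargs).items()}
```

Any port reported as ''False'' points to the node whose 'v5_image_picker_container' should be checked.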
===== Kat Doctor Models =====
sg02: /mnt/storage/var/v5_models
This directory acts as a backup and as the source for forward rsyncs to some servers, since sg02 is allowed to copy from local to any remote server.
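A sketch of such a forward push from sg02, assuming plain rsync over ssh. The target host and directory are hypothetical placeholders, and the command defaults to ''--dry-run'' so nothing is copied by accident.

```python
import shlex
import subprocess
from typing import List

# Trailing slash: rsync copies the directory contents, not the directory itself.
MODELS_DIR = "/mnt/storage/var/v5_models/"


def rsync_command(target: str, target_dir: str, dry_run: bool = True) -> List[str]:
    """Build the argv for pushing the model directory to `target:target_dir`."""
    cmd = ["rsync", "-az"]  # -a: archive mode (recursive, keep metadata), -z: compress
    if dry_run:
        cmd.append("--dry-run")  # only list what would be transferred
    cmd += [MODELS_DIR, f"{target}:{target_dir}"]
    return cmd


def push_models(target: str, target_dir: str, dry_run: bool = True) -> int:
    """Run the rsync push from sg02 and return rsync's exit code."""
    cmd = rsync_command(target, target_dir, dry_run)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

Building the argv as a list avoids shell quoting issues; review the dry-run output before calling ''push_models(..., dry_run=False)''.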