====== V5 Image Picker (AKA Kat Doctor) ======

The long-term goal is to replace the v3 image picker with new v5 services. As a first step, the 'size table' detection for Sheego is replaced to get rid of the old 'semi_net_inception_2' features.

===== Overview =====

git: https://git.picalike.corpex-kunden.de/incubator/v5-image-picker\\
The Sheego service is running on tower01: http://tower01.picalike.corpex-kunden.de:8042/docs

===== Size Table Detection =====

Feed: index03:/mnt/storage/var/data/feeds/1110_Wednesday.csv

===== Services =====

For the size table detection there will be a new service with the following API:\\
''%% [POST] input: urls List[str] output: result List[bool] %%''

A value of True means that the corresponding URL contains a size table object.

===== Internal Service =====

A version of the picker for internal use only(!) is available on sandy with the same settings.

===== Sheego =====

The 'size table' detection is a live service that customers pay for. Please be careful with the tower01 services and coordinate deployments.

===== Models =====

For historic reasons the models are saved here:\\
sg01: /var/www/picalike.corpex-kunden.de/sg01/htdocs/ml_data

A PostgreSQL database is now used to store the models:\\
postgresql://picalike:picalike@cloud01.picalike.corpex-kunden.de:5401/features

The name of the relation is 'models', and ModelDB from db/models.py can be used to access it.

===== Image Picker =====

API-Request:\\
''%%requests.post(URL, json={'urls': image_url_list, 'category': 'shoes__boots__ankle boots'})%%''

For a list of active instances, see sandy: /home/picalike/missing_deit_features_worker/start_missing_worker.sh\\
Logs can be found in /home/picalike/missing_deit_features_worker/nohup.out

==== Client ====

The cronjob that triggers the weekly picking process is located on dev02:\\
/home/picalike/v5/v5_image_picker/scripts

The script is just a wrapper with some logic to trigger the endpoints on sandy. The idea is to use the cron output as a kind of QA.
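The API request shown under Image Picker can be sketched as a small client function. This is only an illustration, not the actual wrapper script on dev02: the function names and the ''endpoint'' parameter are hypothetical, the real URL is not given on this page, and the response is assumed to be JSON. The standard-library ''urllib'' is used here in place of ''requests'' so that the sketch runs without extra packages.

```python
import json
import urllib.request


def build_picker_payload(image_urls, category):
    """Build the JSON body in the shape shown in the API request on this page."""
    return {"urls": list(image_urls), "category": category}


def call_picker(endpoint, image_urls, category, timeout=60):
    """POST the payload to a picker endpoint and return the decoded reply.

    `endpoint` is a placeholder; the real URL is not documented here.
    The reply is assumed (not confirmed) to be a JSON document.
    """
    body = json.dumps(build_picker_payload(image_urls, category)).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A cron wrapper could print the value returned by ''call_picker'' so that the cron mail doubles as a QA report, in the spirit of the Client section.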
===== Missing Worker =====

sandy: /home/picalike/missing_deit_features_worker

HINT: the v5_image_picker / Kat Doctor schedules unknown features in the missing table to avoid on-the-fly feature extraction, which is slow.

The script emulates local v5 image pickers by using ssh port forwarding to the list of frontends. The bash script is generated by generate_missing_worker_script.py\\
The worker is an infinite loop that scans the missing table for non-modified URLs and creates work packages for the individual services. The worker itself only schedules packages to hosts; apart from the distribution it does not do any of the work itself.

===== Worker Mapping =====

^ local port ^ netcup ^
| 10000 | picalike@v22019026221283998.hotsrv.de |
| 10001 | picalike@v22019026221283999.supersrv.de |
| 10002 | picalike@v22019026221284000.happysrv.de |
| 10003 | picalike@v22019026221284001.happysrv.de |
| 10004 | picalike@v22019036221284368.ultrasrv.de |
| 10005 | picalike@v22019036221284369.happysrv.de |
| 10006 | picalike@v22019036221284366.powersrv.de |
| 10007 | picalike@v22019036221284367.megasrv.de |
| 10008 | picalike@v22019036221284365.hotsrv.de |
| 10009 | picalike@v220201062212128886.quicksrv.de |
| 10010 | picalike@v22019026221283951.powersrv.de |

The remote ports are sandy:2000*.

A quick health check for the netcup servers: http://sandy.picalike.corpex-kunden.de:20004/health (replace the final digit of 20004 with 0-9 to reach the other ports). If there is a problem, log in to the node and check the container 'v5_image_picker_container'.

===== Kat Doctor Models =====

sg02: /mnt/storage/var/v5_models

This directory acts as a backup and as the source for a forward rsync to some servers, since sg02 is allowed to copy from local to any remote server.
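The quick health check from the Worker Mapping section can be scripted. The sketch below builds the ten ''/health'' URLs for sandy:20000 to sandy:20009 and reports the ones that do not answer. The helper names are hypothetical, and treating "HTTP 200" as healthy is an assumption, since this page does not describe what the endpoint returns.

```python
import http.client
import urllib.request

SANDY_HOST = "sandy.picalike.corpex-kunden.de"
BASE_PORT = 20000  # "sandy:2000*" in the Worker Mapping section


def health_urls(host=SANDY_HOST, base_port=BASE_PORT, count=10):
    """Return the /health URLs for base_port .. base_port + count - 1."""
    return [f"http://{host}:{base_port + i}/health" for i in range(count)]


def check_health(url, timeout=5):
    """Return True if the endpoint answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def failing_endpoints(urls, checker=check_health):
    """Return the URLs whose check failed, in the order given."""
    return [url for url in urls if not checker(url)]
```

Calling ''failing_endpoints(health_urls())'' from a host that can reach sandy returns the ports to look at. For each of them the next step is still the manual one described above: log in to the node and check the container 'v5_image_picker_container'.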