V5 Image Picker (AKA Kat Doctor)
The long-term goal is to replace the v3 image picker with new v5 services. As a first step, the 'size table' detection for Sheego is being replaced to get rid of the old 'semi_net_inception_2' features.
Overview
git: https://git.picalike.corpex-kunden.de/incubator/v5-image-picker
The sheego service is running on tower01: http://tower01.picalike.corpex-kunden.de:8042/docs
Size Table Detection
Feed: index03:/mnt/storage/var/data/feeds/1110_Wednesday.csv
Services
For the size table detection there will be a new service with the following API:
[POST] input: urls List[str] output: result List[bool]
A value of True means that the corresponding URL contains a size table object
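A minimal client sketch for the API described above. It assumes the request body is JSON of the form {"urls": [...]} and the response is JSON of the form {"result": [...]}; neither the body shape nor the route is confirmed here, so check the /docs page of the service for the actual values. The names post_json and detect_size_tables are hypothetical helpers, not part of the repository.

```python
import json
from typing import Callable, Dict, List
from urllib import request


def post_json(endpoint: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response (standard library only)."""
    req = request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def detect_size_tables(
    endpoint: str,
    urls: List[str],
    post: Callable[[str, dict], dict] = post_json,
) -> Dict[str, bool]:
    """Map each URL to True if the service reports a size table for it.

    Assumption: the service answers with {"result": [bool, ...]} in the
    same order as the submitted URLs.
    """
    response = post(endpoint, {"urls": urls})
    flags = response["result"]
    if len(flags) != len(urls):
        raise ValueError("service returned a different number of results than URLs")
    return dict(zip(urls, flags))
```

Passing the endpoint explicitly makes it easy to point the same code at the internal instance instead of the live tower01 service while testing.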
Internal Service
A version of the picker for internal use only(!) is available on sandy with the same settings.
Sheego
The 'size table' detection is a live service that customers pay for. Please be careful with the tower01 services and coordinate any deployments.
Models
For historical reasons, the models are saved here: sg01: /var/www/picalike.corpex-kunden.de/sg01/htdocs/ml_data
However, a PostgreSQL database is now used to store the models:
postgresql://picalike:picalike@cloud01.picalike.corpex-kunden.de:5401/features
The name of the relation is 'models' and ModelDB from db/models.py can be used to access it.
Image Picker
API-Request:
requests.post(URL, json={'urls': image_url_list, 'category': 'shoes__boots__ankle boots'})
See sandy: /home/picalike/missing_deit_features_worker/start_missing_worker.sh for a list of active instances
Logs can be found in
/home/picalike/missing_deit_features_worker/nohup.out
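The request above can be prepared with the standard library as well. This is only a sketch: the "__" separator between category levels is inferred from the single example 'shoes__boots__ankle boots', the endpoint is left as a parameter because the route is not stated here, and build_picker_payload and build_picker_request are hypothetical helper names.

```python
import json
from typing import Sequence
from urllib import request


def build_picker_payload(image_urls: Sequence[str], category_levels: Sequence[str]) -> dict:
    """Build the JSON body for the image picker.

    Assumption: category levels are joined with a double underscore,
    as in 'shoes__boots__ankle boots'.
    """
    if not image_urls:
        raise ValueError("at least one image URL is required")
    return {"urls": list(image_urls), "category": "__".join(category_levels)}


def build_picker_request(endpoint: str, payload: dict) -> request.Request:
    """Wrap the payload in a POST request; send it with urllib.request.urlopen."""
    return request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Keeping payload construction separate from sending allows the body to be inspected or logged before anything reaches a service.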
Client
The cronjob that triggers the weekly picking process is located on dev02:
/home/picalike/v5/v5_image_picker/scripts
The script is just a wrapper with some logic to trigger the endpoints on sandy. The idea is to use the cron output as a form of QA.
Missing Worker
sandy: /home/picalike/missing_deit_features_worker
HINT: The v5_image_picker / Kat Doctor schedules unknown features in the missing table to avoid on-the-fly feature extraction, which is slow.
The script emulates local v5_image_picker instances by using SSH port forwarding to the list of frontends. The bash script is generated by generate_missing_worker_script.py.
The worker is an infinite loop that scans the missing table for non-modified URLs and creates work packages for the individual services. The worker itself only schedules packages to hosts and does not do any of the work itself, apart from the distribution.
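One iteration of such a scheduling loop might look like the sketch below. It is not the actual worker code: the package size, the round-robin assignment and the helper names make_packages and schedule are assumptions made for illustration, and reading from the missing table is left out entirely.

```python
from itertools import cycle
from typing import List, Sequence, Tuple


def make_packages(urls: Sequence[str], package_size: int) -> List[List[str]]:
    """Split the pending URLs into packages of at most package_size entries."""
    if package_size < 1:
        raise ValueError("package_size must be at least 1")
    return [list(urls[i:i + package_size]) for i in range(0, len(urls), package_size)]


def schedule(packages: Sequence[List[str]], hosts: Sequence[str]) -> List[Tuple[str, List[str]]]:
    """Assign packages to hosts in round-robin order (assumed strategy).

    The scheduler only decides which host gets which package;
    the feature extraction itself happens on the hosts.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    host_cycle = cycle(hosts)
    return [(next(host_cycle), package) for package in packages]


# Example: five pending URLs, two forwarded local ports.
pending = ["u1", "u2", "u3", "u4", "u5"]
hosts = ["http://localhost:10000", "http://localhost:10001"]
plan = schedule(make_packages(pending, package_size=2), hosts)
```

In this example the first and third package go to port 10000 and the second to port 10001.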
Worker Mapping
local port | netcup
10000 | picalike@v22019026221283998.hotsrv.de
10001 | picalike@v22019026221283999.supersrv.de
10002 | picalike@v22019026221284000.happysrv.de
10003 | picalike@v22019026221284001.happysrv.de
10004 | picalike@v22019036221284368.ultrasrv.de
10005 | picalike@v22019036221284369.happysrv.de
10006 | picalike@v22019036221284366.powersrv.de
10007 | picalike@v22019036221284367.megasrv.de
10008 | picalike@v22019036221284365.hotsrv.de
10010 | picalike@v22019026221283951.powersrv.de
10009 | picalike@v220201062212128886.quicksrv.de
The remote port is sandy:2000*.
A quick health check for the netcup servers:
http://sandy.picalike.corpex-kunden.de:20004/health
Replace the last digit of the port (20004) with 0-9 to check the other servers.
If there is a problem, log in to the node and check the container 'v5_image_picker_container'.
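The manual check can be scripted. The sketch below generates the health URLs for ports 20000 to 20009 and probes each one; treating an HTTP 200 response as "healthy" is an assumption, since the response format of /health is not described here, and the function names are hypothetical.

```python
from typing import Callable, Dict, List
from urllib import request

SANDY = "http://sandy.picalike.corpex-kunden.de"


def health_urls(base: str = SANDY, first_port: int = 20000, count: int = 10) -> List[str]:
    """Return the /health URLs for ports first_port .. first_port + count - 1."""
    return [f"{base}:{first_port + i}/health" for i in range(count)]


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts and HTTP error responses,
        # since urllib's URLError and HTTPError derive from OSError.
        return False


def check_all(urls: List[str], probe: Callable[[str], bool] = http_ok) -> Dict[str, bool]:
    """Probe every URL and report which ones look healthy."""
    return {url: probe(url) for url in urls}
```

Calling check_all(health_urls()) returns a dictionary of URL to status, so the nodes that need a closer look are the ones mapped to False.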
Kat Doctor Models
sg02: /mnt/storage/var/v5_models
This directory acts as a backup and as the source for a forward rsync to other servers, since sg02 is allowed to copy from local to any remote server.