V5 Image Picker (AKA Kat Doctor)

The long-term goal is to replace the v3 image picker with new v5 services. As a first step, the 'size table' detection for Sheego is replaced in order to retire the old 'semi_net_inception_2' features.

Overview

Size Table Detection

Feed: index03:/mnt/storage/var/data/feeds/1110_Wednesday.csv

Services

For the size table detection there will be a new service with the following API:
[POST]
input: urls List[str]
output: result List[bool]
A value of True means that the corresponding URL contains a size table object.
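A minimal client sketch for this API. It assumes the service accepts a JSON body of the form {"urls": [...]} and answers with {"result": [...]} in input order; the exact wire format is not specified on this page. The endpoint URL and the function names are hypothetical. The HTTP call is injectable so the pairing logic can be tried without a running service.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# Hypothetical endpoint; the real host and path are not documented here.
SIZE_TABLE_ENDPOINT = "http://localhost:8000/size_table"


def http_post_json(endpoint: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON response (stdlib only)."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def detect_size_tables(
    urls: List[str],
    endpoint: str = SIZE_TABLE_ENDPOINT,
    post: Callable[[str, dict], dict] = http_post_json,
) -> Dict[str, bool]:
    """Map each URL to True if the service reports a size table for it.

    Duplicate URLs collapse into a single dictionary entry.
    """
    if not urls:
        return {}
    # Assumed response shape: {"result": [bool, ...]}, aligned with the input.
    result = post(endpoint, {"urls": urls})["result"]
    if len(result) != len(urls):
        raise ValueError(
            "service returned %d results for %d urls" % (len(result), len(urls))
        )
    return dict(zip(urls, (bool(flag) for flag in result)))
```

Checking the length of the result before zipping guards against silently misaligned answers, which matters because the flags are only meaningful by position.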

Internal Service

A version of the picker for internal use only (!) is available on sandy with the same settings.

Sheego

The 'size table' detection is a live service that customers pay for. Please be careful with the tower01 services and coordinate deployments.

Models

For historic reasons the models are saved here: sg01: /var/www/picalike.corpex-kunden.de/sg01/htdocs/ml_data

Now a PostgreSQL database is used to store the models:

postgresql://picalike:picalike@cloud01.picalike.corpex-kunden.de:5401/features

The name of the relation is 'models' and ModelDB from db/models.py can be used to access it.

Image Picker

API-Request:

requests.post(URL, json={'urls': image_url_list, 'category': 'shoes__boots__ankle boots'})

See sandy: /home/picalike/missing_deit_features_worker/start_missing_worker.sh for a list of active instances

Logs can be found in

/home/picalike/missing_deit_features_worker/nohup.out

Client

The cronjob that triggers the weekly picking process is located on dev02:

/home/picalike/v5/v5_image_picker/scripts

The script is just a wrapper with some logic to trigger the endpoints on sandy. The idea is to use the cron output as a kind of QA.

Missing Worker

sandy: /home/picalike/missing_deit_features_worker

HINT: the v5_image_picker / Kat Doctor schedules unknown features in the missing table to avoid on-the-fly feature extraction, which is slow.

The script emulates local v5_image_picker instances by using SSH port forwarding to the list of frontends. The bash script is generated by generate_missing_worker_script.py.

The worker is an infinite loop that scans the missing table for non-modified URLs and creates work packages for the individual services. The worker only schedules packages to hosts; apart from this distribution, it does no work itself.
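The distribution step can be sketched as follows. This is not the actual worker code: the round-robin policy, the package size, and the function names are assumptions made for illustration, and the real worker reads its URLs from the missing table rather than from a list.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def make_packages(urls: List[str], package_size: int) -> Iterator[List[str]]:
    """Split the URL list into consecutive packages of at most package_size."""
    if package_size <= 0:
        raise ValueError("package_size must be positive")
    for start in range(0, len(urls), package_size):
        yield urls[start:start + package_size]


def schedule_packages(
    urls: List[str],
    hosts: List[str],
    package_size: int = 100,  # hypothetical default, not taken from the worker
) -> Dict[str, List[List[str]]]:
    """Assign packages to hosts in round-robin order (assumed policy)."""
    if not hosts:
        raise ValueError("at least one host is required")
    plan: Dict[str, List[List[str]]] = {host: [] for host in hosts}
    # cycle() is infinite, so zip() stops as soon as the packages run out.
    for host, package in zip(cycle(hosts), make_packages(urls, package_size)):
        plan[host].append(package)
    return plan


if __name__ == "__main__":
    # The hosts are the locally forwarded ports from the worker mapping.
    demo_hosts = ["localhost:10000", "localhost:10001"]
    demo_urls = ["http://example.com/%d.jpg" % i for i in range(5)]
    for host, packages in schedule_packages(demo_urls, demo_hosts, 2).items():
        print(host, packages)
```

Keeping the scheduler free of any extraction logic mirrors the description above: it only decides which host receives which package.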

Worker Mapping

local port | netcup
10000 | picalike@v22019026221283998.hotsrv.de
10001 | picalike@v22019026221283999.supersrv.de
10002 | picalike@v22019026221284000.happysrv.de
10003 | picalike@v22019026221284001.happysrv.de
10004 | picalike@v22019036221284368.ultrasrv.de
10005 | picalike@v22019036221284369.happysrv.de
10006 | picalike@v22019036221284366.powersrv.de
10007 | picalike@v22019036221284367.megasrv.de
10008 | picalike@v22019036221284365.hotsrv.de
10009 | picalike@v220201062212128886.quicksrv.de
10010 | picalike@v22019026221283951.powersrv.de

The remote port is sandy:2000*.

A quick health check for the netcup servers:

http://sandy.picalike.corpex-kunden.de:20004/health

Replace the last digit of 20004 with 0-9 to check the other servers.
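The check can be scripted over all ten ports. A minimal sketch, assuming that an HTTP 200 answer from /health means the node is healthy (the response body is not documented here); the function names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List

BASE_URL = "http://sandy.picalike.corpex-kunden.de"


def health_urls(first_port: int = 20000, count: int = 10) -> List[str]:
    """Build the health URLs for ports 20000 to 20009."""
    return ["%s:%d/health" % (BASE_URL, first_port + i) for i in range(count)]


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 (assumed criterion)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


def check_all(probe: Callable[[str], bool] = is_healthy) -> Dict[str, bool]:
    """Probe every health URL and report the result per URL."""
    return {url: probe(url) for url in health_urls()}


if __name__ == "__main__":
    for url, ok in check_all().items():
        print("OK  " if ok else "FAIL", url)
```

Any URL reported as FAIL identifies the forwarded port, and therefore the netcup node, to look at first.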

If there is a problem, log in to the node and check the container 'v5_image_picker_container'.

Kat Doctor Models

sg02: /mnt/storage/var/v5_models

This directory acts as a backup and as the source for a forward rsync to some servers, since sg02 is allowed to copy from local to any remote server.

v5_image_picker.txt · Last modified: 2024/04/11 14:23 by 127.0.0.1