====== V5 ML Pipeline ======

The aim is to create a flexible yet easy-to-use machine learning pipeline that lets us train arbitrary classifiers from a standardized input format.


====== Overview ======

We use a pre-trained network (ResNet34) in a PyTorch environment.


====== Feature Storage ======

For now, psql01 is the primary feature storage: ''%%/home/picalike/v5/features%%''

This directory holds a single SQLite database that contains the compressed conv features of shape (1, 512, 47). Each row consists of (image_url, features).

The code for processing the features can be found in ''store.py'' in this repository:

https://git.picalike.corpex-kunden.de/hackathon/awesome_ai
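The row layout described above can be illustrated with a minimal sketch. It is not the code from ''store.py''. The table name ''features'', the column names, the use of zlib as the compression codec and the float32 encoding are all assumptions made for illustration. Only the (image_url, features) row layout and the (1, 512, 47) shape come from this page.

```python
import sqlite3
import zlib
from array import array

# Shape stated on this page; the flat value count follows from it.
FEATURE_SHAPE = (1, 512, 47)
N_VALUES = FEATURE_SHAPE[0] * FEATURE_SHAPE[1] * FEATURE_SHAPE[2]


def open_store(path=":memory:"):
    """Open (or create) the feature database.

    The table and column names are hypothetical, not taken from store.py.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        "image_url TEXT PRIMARY KEY, features BLOB NOT NULL)"
    )
    return conn


def put_features(conn, image_url, values):
    """Store one flattened feature tensor as a compressed float32 blob."""
    if len(values) != N_VALUES:
        raise ValueError(f"expected {N_VALUES} values, got {len(values)}")
    # Assumption: float32 in native byte order, compressed with zlib.
    blob = zlib.compress(array("f", values).tobytes())
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO features (image_url, features) VALUES (?, ?)",
            (image_url, blob),
        )


def get_features(conn, image_url):
    """Return the flattened features for a URL, or None if it is unknown."""
    row = conn.execute(
        "SELECT features FROM features WHERE image_url = ?", (image_url,)
    ).fetchone()
    if row is None:
        return None
    values = array("f")
    values.frombytes(zlib.decompress(row[0]))
    return values
```

Keying the table on ''image_url'' keeps lookups simple and makes a repeated insert overwrite the old row. Whether the real database behaves this way should be checked against ''store.py''.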


====== Product Classifier Interface ======

The interface from the product classifier to the category doctor is described in the PDF. [[dokuwiki/lib/exe/fetch.php?media=product_classifier.pdf|product_classifier.pdf]]

The input is a shop_id, a list of image URLs, and a set of label candidates. The output is a dictionary that maps each image URL to the input labels, each annotated with a score.

<code>
rank_label_candidates(shop_id, url_list, label_list) -> {url: [label, score]}
</code>
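To make the shape of the result concrete, here is a minimal sketch of a function with this signature. It assumes that each URL maps to a list of (label, score) pairs and that the pairs are sorted by descending score. Both points are interpretations of the signature above, so the PDF remains the authority. The ''scorer'' parameter and ''placeholder_scorer'' are hypothetical stand-ins for the real classifier.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer type: (shop_id, url, label) -> score.
Scorer = Callable[[str, str, str], float]


def placeholder_scorer(shop_id: str, url: str, label: str) -> float:
    """Stand-in for the real classifier: 1.0 if the label occurs in the URL."""
    return 1.0 if label.lower() in url.lower() else 0.0


def rank_label_candidates(
    shop_id: str,
    url_list: List[str],
    label_list: List[str],
    scorer: Scorer = placeholder_scorer,  # assumption: not part of the real interface
) -> Dict[str, List[Tuple[str, float]]]:
    """Score every candidate label for every URL and return them ranked."""
    ranked: Dict[str, List[Tuple[str, float]]] = {}
    for url in url_list:
        scored = [(label, scorer(shop_id, url, label)) for label in label_list]
        # Highest score first; the sort is stable, so ties keep input order.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        ranked[url] = scored
    return ranked
```

A call such as ''rank_label_candidates("shop_1", ["http://example.com/red-dress.jpg"], ["shoes", "dress"])'' returns one entry per URL, with every input label present exactly once next to its score.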

====== Rest ======

keywords: kat doctor category