====== V5 ML Pipeline ======

The aim is to create a flexible yet easy-to-use machine learning pipeline that can train arbitrary classifiers from a standardized input format.

====== Overview ======

We use a pre-trained network (ResNet34) in a PyTorch environment.

====== Feature Storage ======

For now, psql01 is the primary feature storage: ''%%/home/picalike/v5/features%%''

This directory holds a single SQLite database that contains the compressed conv features (1, 512, 47). Each row consists of (image_url, features).

The code for processing the features can be found in ''store.py'' here: https://git.picalike.corpex-kunden.de/hackathon/awesome_ai

====== Product Classifier Interface ======

The interface from the product classifier to the category doctor is described in the PDF: [[dokuwiki/lib/exe/fetch.php?media=product_classifier.pdf|product_classifier.pdf]]

The input is a shop_id, a list of image URLs and a set of label candidates. The output is a dictionary that maps each image URL to the input labels, each enhanced with a score.

''%%rank_label_candidates(shop_id, url_list, label_list) -> {url: [label, score]}%%''

====== Rest ======

keywords: kat doctor category
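
To make the shape of the ''rank_label_candidates'' interface from the Product Classifier Interface section more concrete, here is a minimal sketch in Python. It is not the actual implementation. The ''scorer'' parameter, the ''dummy_scorer'' helper, the example shop_id and URL, and the choice to sort labels by descending score are all assumptions made for illustration. A real scorer would presumably look up the stored conv features for each URL and run a trained classifier on them.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical scorer signature: (shop_id, url, label) -> score.
# This is an assumption for the sketch, not part of the documented interface.
Scorer = Callable[[Any, str, str], float]


def dummy_scorer(shop_id: Any, url: str, label: str) -> float:
    """Placeholder score: fraction of the label's characters found in the URL.

    This only keeps the sketch runnable; it is not a real classifier.
    """
    if not label:
        return 0.0
    url_lower = url.lower()
    hits = sum(1 for ch in label.lower() if ch in url_lower)
    return hits / len(label)


def rank_label_candidates(
    shop_id: Any,
    url_list: List[str],
    label_list: List[str],
    scorer: Scorer = dummy_scorer,
) -> Dict[str, List[Tuple[str, float]]]:
    """Map every image URL to its label candidates, each paired with a score.

    Assumption: labels are returned sorted by descending score.
    """
    ranked: Dict[str, List[Tuple[str, float]]] = {}
    for url in url_list:
        scored = [(label, scorer(shop_id, url, label)) for label in label_list]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        ranked[url] = scored
    return ranked


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    result = rank_label_candidates(
        "demo_shop",
        ["https://example.com/img/shoes_1.jpg"],
        ["dress", "shoes"],
    )
    for url, candidates in result.items():
        print(url, candidates)
```

Passing the scorer as a parameter keeps the interface itself independent of any particular classifier, so a trained model can be swapped in without changing the callers.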