The aim is to create a flexible yet easy-to-use machine learning pipeline that makes it possible to train arbitrary classifiers on a standardized input format.
We use a pre-trained network (ResNet34), and our environment is set up with PyTorch.
For now, psql01 is the primary feature storage: /home/picalike/v5/features
This directory holds a single SQLite database that contains the compressed conv features of shape (1, 512, 47). Each row consists of (image_url, features).
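The row layout above can be illustrated with a minimal sketch of writing and reading one feature row. It is not the code in 'store.py'; the table name `features`, the column names, zlib compression and the float32 dtype are all hypothetical assumptions chosen for illustration.

```python
import sqlite3
import zlib

import numpy as np

# Hypothetical layout: table "features" with (image_url TEXT, features BLOB),
# where the BLOB is zlib-compressed raw float32 bytes. Not confirmed by store.py.
FEATURE_SHAPE = (1, 512, 47)


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the assumed feature table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features "
        "(image_url TEXT PRIMARY KEY, features BLOB)"
    )
    return conn


def put_features(conn: sqlite3.Connection, image_url: str, feats: np.ndarray) -> None:
    """Compress a (1, 512, 47) conv feature array and store it under its URL."""
    if feats.shape != FEATURE_SHAPE:
        raise ValueError(f"expected shape {FEATURE_SHAPE}, got {feats.shape}")
    blob = zlib.compress(feats.astype(np.float32).tobytes())
    conn.execute(
        "INSERT OR REPLACE INTO features (image_url, features) VALUES (?, ?)",
        (image_url, blob),
    )
    conn.commit()


def get_features(conn: sqlite3.Connection, image_url: str):
    """Return the decompressed feature array, or None if the URL is unknown."""
    row = conn.execute(
        "SELECT features FROM features WHERE image_url = ?", (image_url,)
    ).fetchone()
    if row is None:
        return None
    raw = zlib.decompress(row[0])
    return np.frombuffer(raw, dtype=np.float32).reshape(FEATURE_SHAPE)
```

For a quick local check, `open_store(":memory:")` gives a throwaway database. Note that `np.frombuffer` returns a read-only view, so copy the array before modifying it.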
The code for processing the features can be found in 'store.py' here:
The interface between the product classifier and the category doctor is described in the PDF product_classifier.pdf.
The input is a shop_id, a list of image URLs, and a set of label candidates. The output is a dictionary that maps each image URL to the input labels, each paired with a score.
rank_label_candidates(shop_id, url_list, label_list) -> {url: [label, score]}
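The interface above can be made concrete with a minimal sketch. It reads `{url: [label, score]}` as "a list of (label, score) pairs per URL", which is an interpretation. The scoring is a stand-in, not the actual classifier: features are average-pooled to a 512-dim vector and scored against hypothetical per-shop, per-label weight vectors, followed by a softmax over the given candidates. The `lookup` and `weights_by_shop` parameters are hypothetical additions so the sketch can run on its own.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

import numpy as np

# Hypothetical: maps an image URL to its (1, 512, 47) conv features, or None.
FeatureLookup = Callable[[str], Optional[np.ndarray]]


def rank_label_candidates(
    shop_id: str,
    url_list: Sequence[str],
    label_list: Sequence[str],
    lookup: FeatureLookup,
    weights_by_shop: Dict[str, Dict[str, np.ndarray]],
) -> Dict[str, List[Tuple[str, float]]]:
    """Score each candidate label for each URL; highest score first."""
    label_weights = weights_by_shop[shop_id]  # hypothetical per-shop linear head
    result: Dict[str, List[Tuple[str, float]]] = {}
    for url in url_list:
        feats = lookup(url)
        if feats is None:
            continue  # assumption: URLs without stored features are skipped
        pooled = feats[0].mean(axis=-1)  # (1, 512, 47) -> (512,)
        logits = np.array([label_weights[label] @ pooled for label in label_list])
        # Numerically stable softmax over the candidate labels only.
        exp = np.exp(logits - logits.max())
        scores = exp / exp.sum()
        ranked = sorted(zip(label_list, scores.tolist()), key=lambda p: p[1], reverse=True)
        result[url] = ranked
    return result
```

Because the softmax runs only over the supplied candidates, the scores are relative to that candidate set and always sum to 1 per URL; an independent per-label sigmoid would be the alternative if labels are not mutually exclusive.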
keywords: kat doctor category