V5 ML Pipeline

The aim is to create a flexible yet easy-to-use machine learning pipeline that can train arbitrary classifiers from a standardized input format.

Overview

We use a pre-trained network (ResNet34) in a PyTorch environment.

Feature Storage

For now, psql01 is the primary feature storage. The features are located under /home/picalike/v5/features

That directory holds a single SQLite database containing the compressed conv features of shape (1, 512, 47). Each row is a pair (image_url, features).

The code for processing the features can be found in 'store.py' here:

https://git.picalike.corpex-kunden.de/hackathon/awesome_ai
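
The actual implementation is in 'store.py'. Purely as an illustration of the (image_url, features) layout described above, a minimal sketch could look like the following. The table name, column types, zlib compression and float32 encoding are all assumptions made for this example and may differ from what 'store.py' really does.

```python
import sqlite3
import zlib
from array import array

# Shape stated in the text; everything else below is a hypothetical layout.
FEATURE_SHAPE = (1, 512, 47)
NUM_VALUES = FEATURE_SHAPE[0] * FEATURE_SHAPE[1] * FEATURE_SHAPE[2]


def open_store(path=":memory:"):
    """Open (or create) a SQLite file with one row per image URL."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        "image_url TEXT PRIMARY KEY, features BLOB NOT NULL)"
    )
    return conn


def put_features(conn, image_url, values):
    """Store a flat sequence of floats as a zlib-compressed float32 blob."""
    if len(values) != NUM_VALUES:
        raise ValueError(f"expected {NUM_VALUES} values, got {len(values)}")
    blob = zlib.compress(array("f", values).tobytes())
    conn.execute(
        "INSERT OR REPLACE INTO features (image_url, features) VALUES (?, ?)",
        (image_url, blob),
    )
    conn.commit()


def get_features(conn, image_url):
    """Return the flat float32 array for a URL, or None if it is not stored."""
    row = conn.execute(
        "SELECT features FROM features WHERE image_url = ?", (image_url,)
    ).fetchone()
    if row is None:
        return None
    values = array("f")
    values.frombytes(zlib.decompress(row[0]))
    return values
```

The sketch keeps the features as a flat array; reshaping to (1, 512, 47) would be left to the caller, for example with NumPy or PyTorch.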

Product Classifier Interface

The interface between the product classifier and the category doctor is described in the PDF product_classifier.pdf.

The input is a shop_id, a list of image URLs and a set of label candidates. The output is a dictionary that maps each image URL to the input labels, each annotated with a score.

rank_label_candidates(shop_id, url_list, label_list) -> {url: [label, score]}
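
To make the shape of the call and its result concrete, here is a minimal sketch of a function with that signature. It assumes that each URL maps to a list of (label, score) pairs sorted by descending score; the PDF remains the authoritative description. The score_fn parameter and the placeholder scorer are hypothetical and only stand in for the real classifier.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical scorer type: (shop_id, url, label) -> score.
ScoreFn = Callable[[Any, str, str], float]


def _placeholder_score(shop_id: Any, url: str, label: str) -> float:
    # Stand-in for the real classifier; always returns a neutral score.
    return 0.0


def rank_label_candidates(
    shop_id: Any,
    url_list: Iterable[str],
    label_list: Iterable[str],
    *,
    score_fn: ScoreFn = _placeholder_score,  # assumption, not part of the documented interface
) -> Dict[str, List[Tuple[str, float]]]:
    """Score every label candidate for every image URL.

    Returns {url: [(label, score), ...]} with the best-scoring label first.
    """
    labels = list(label_list)  # materialize once so it can be reused per URL
    result: Dict[str, List[Tuple[str, float]]] = {}
    for url in url_list:
        scored = [(label, float(score_fn(shop_id, url, label))) for label in labels]
        # The sort is stable, so labels with equal scores keep their input order.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[url] = scored
    return result
```

A caller such as the category doctor would pass its candidate labels and read the top entry per URL, for example `ranked[url][0]`.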

Rest

keywords: kat doctor category