git: https://git.picalike.corpex-kunden.de/smessuti/witt_customers
database: you_api on sandy
A visual vocabulary is learned from all products in the witt feed using a Gaussian mixture model (GMM). The results are saved in fv_data as gmm.dat for 128 Gaussians and gmm16.dat for 16 Gaussians.
The visual vocabulary quantizes the feature space into different cells, and image features are soft-assigned to these cells. The assignments are then aggregated over the images of all the purchased products to obtain a customer representation.
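The soft assignment described above can be sketched as the posterior probability of each Gaussian given a feature vector. This is a minimal NumPy sketch, not the code from the repository: the function name and the argument layout (weights, means, variances as plain arrays) are assumptions, and diagonal covariances are assumed as stated further down the page.

```python
import numpy as np

def soft_assign(x, weights, means, variances):
    """Posterior probability of each Gaussian for each feature vector.

    x: (N, d) features; weights: (K,); means, variances: (K, d).
    Diagonal covariances are assumed. Returns (N, K), rows sum to 1.
    """
    diff = x[:, None, :] - means[None, :, :]                  # (N, K, d)
    log_pdf = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                      + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_pdf                      # (N, K)
    log_post -= log_post.max(axis=1, keepdims=True)           # avoid underflow in exp
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

# Toy vocabulary with two cells in a 2-dimensional feature space.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [10.0, 10.0]])
variances = np.ones((2, 2))
features = np.array([[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]])
posteriors = soft_assign(features, weights, means, variances)
```

A feature that lies on a mean is assigned almost entirely to that cell, while the feature halfway between the two means is split evenly between them.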
The Fisher vector is calculated as follows: for each product we take its semi_net_inception_3 features from the feature_db and compute the gradient of the log-likelihood of the image with respect to the mixing weights, the means and the covariances of the GMM (using diagonal covariance matrices).
Averaging these gradients over all products purchased by a customer yields a customer descriptor of size K(1 + 2d), where d (= 100) is the dimension of the feature space and K is the number of Gaussians. The vector is then normalized and saved in the database.
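The computation can be sketched as follows. This is a hedged illustration in the spirit of the gist listed under references, not the code in fv/test_2: the function name is hypothetical, the statistics are proportional to the gradients only up to per-component scaling factors, and plain L2 normalization is an assumption (the page only says the vector is "normalized").

```python
import numpy as np

def fisher_vector(x, weights, means, variances):
    """Fisher-vector style descriptor for a set of feature vectors.

    x: (N, d) features, one row per purchased product (assumption).
    weights: (K,); means, variances: (K, d), diagonal covariances.
    Returns a vector of length K * (1 + 2 * d).
    """
    x = np.atleast_2d(x)
    n = x.shape[0]

    # Soft assignment of every feature to every Gaussian.
    diff = x[:, None, :] - means[None, :, :]
    log_p = np.log(weights) - 0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_p -= log_p.max(axis=1, keepdims=True)
    q = np.exp(log_p)
    q /= q.sum(axis=1, keepdims=True)                  # (N, K)

    # Posterior-weighted statistics, averaged over the N features.
    q_sum = q.sum(axis=0) / n                          # (K,)
    q_x = q.T @ x / n                                  # (K, d)
    q_x2 = q.T @ (x ** 2) / n                          # (K, d)

    d_pi = q_sum - weights                             # mixing weights
    d_mu = q_x - q_sum[:, None] * means                # means
    d_var = (q_x2 - 2.0 * q_x * means
             + q_sum[:, None] * (means ** 2 - variances))  # variances

    fv = np.concatenate([d_pi, d_mu.ravel(), d_var.ravel()])
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv               # L2 normalization (assumption)

# Random toy vocabulary with K = 16 and d = 100, five purchased products.
rng = np.random.default_rng(0)
K, d = 16, 100
customer_fv = fisher_vector(rng.normal(size=(5, d)),
                            np.full(K, 1.0 / K),
                            rng.normal(size=(K, d)),
                            np.ones((K, d)))
```

Passing a single product's features instead of a customer's purchase history gives the product descriptor used for matching.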
To match new products to customers we calculate the Fisher vector of the product image and its Euclidean distance to all customer vectors. The distance is then weighted according to the gender and the category of the product.
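A minimal sketch of this matching step is shown below. The page does not say how the gender and category weights are defined or combined, so the sketch assumes one hypothetical multiplicative factor per customer (larger means a worse match); the function and parameter names are illustrative only.

```python
import numpy as np

def rank_customers(product_fv, customer_fvs, penalty):
    """Rank customers for one product by weighted Euclidean distance.

    product_fv: (D,); customer_fvs: (C, D).
    penalty: (C,) hypothetical gender/category factor per customer.
    Returns (customer indices sorted best first, weighted distances).
    """
    dist = np.linalg.norm(customer_fvs - product_fv, axis=1)
    weighted = dist * penalty
    return np.argsort(weighted), weighted

customers = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
product = np.array([1.0, 0.0])
# The third customer matches gender and category well, so its distance is scaled down.
order, weighted = rank_customers(product, customers, np.array([1.0, 1.0, 0.1]))
```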
For K = 128 the Fisher vectors have 128 · 201 = 25728 dimensions, and for K = 16 they have 16 · 201 = 3216. To speed up the search we apply product quantization to the K = 16 Fisher vectors.
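The sizes follow directly from K(1 + 2d) with d = 100. A small check, with a hypothetical helper name:

```python
def fisher_vector_size(num_gaussians, feature_dim):
    """One weight entry plus d mean and d variance entries per Gaussian."""
    return num_gaussians * (1 + 2 * feature_dim)

size_128 = fisher_vector_size(128, 100)  # 25728
size_16 = fisher_vector_size(16, 100)    # 3216
```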
The product quantization is learned over all witt products and the lookup table is saved in fv_data as pq16.dat. Each new product vector is encoded and saved in the database.
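Learning the quantizer and encoding a vector can be sketched as below. This is an illustration under assumptions, not the code in fv/test_2.1: the number of sub-vectors m, the number of centroids k per sub-vector, the plain Lloyd k-means and all function names are hypothetical, and the layout of pq16.dat is not described on this page.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (k, dim) centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):                 # keep old centroid if cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def train_pq(vectors, m, k):
    """Learn one codebook per sub-vector; returns (m, k, sub_dim)."""
    assert vectors.shape[1] % m == 0, "dimension must be divisible by m"
    sub_dim = vectors.shape[1] // m
    return np.stack([
        kmeans(vectors[:, i * sub_dim:(i + 1) * sub_dim], k, seed=i)
        for i in range(m)
    ])

def encode_pq(vectors, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    m, k, sub_dim = codebooks.shape
    assert k <= 256, "codes are stored as uint8"
    codes = np.empty((len(vectors), m), dtype=np.uint8)
    for i in range(m):
        sub = vectors[:, i * sub_dim:(i + 1) * sub_dim]
        d2 = ((sub[:, None, :] - codebooks[i][None, :, :]) ** 2).sum(axis=2)
        codes[:, i] = d2.argmin(axis=1)
    return codes

# Toy data: 200 vectors of dimension 8, split into 4 sub-vectors of dimension 2.
rng = np.random.default_rng(0)
vecs = rng.normal(size=(200, 8))
codebooks = train_pq(vecs, m=4, k=16)
codes = encode_pq(vecs, codebooks)
```

Each encoded vector then takes m bytes instead of one float per dimension, which is what makes the search over new products cheaper.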
The query is the Fisher vector fv16 of a customer. The asymmetric distance between this unquantized vector and the pq16-encoded vectors of the new products is calculated and weighted according to gender and category. The nearest products are returned.
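The asymmetric distance can be sketched as a table lookup: the query stays unquantized, its squared distance to every centroid is computed once per sub-vector, and each encoded product then only needs m lookups. The codebook layout (m, k, sub_dim) and the function name are assumptions for illustration; the gender and category weighting is left out because its form is not specified here.

```python
import numpy as np

def asymmetric_distances(query, codes, codebooks):
    """Squared asymmetric distances between a raw query and PQ codes.

    query: (D,) unquantized vector, D = m * sub_dim.
    codes: (n, m) integer centroid indices of the encoded vectors.
    codebooks: (m, k, sub_dim) centroids per sub-vector.
    Returns (n,) squared distances to the reconstructed vectors.
    """
    m, k, sub_dim = codebooks.shape
    q = query.reshape(m, 1, sub_dim)
    table = ((q - codebooks) ** 2).sum(axis=2)        # (m, k) lookup table
    return table[np.arange(m), codes].sum(axis=1)     # sum the m lookups per code

# Two sub-vectors of dimension 1, two centroids each.
toy_codebooks = np.array([[[0.0], [1.0]],
                          [[0.0], [2.0]]])
toy_codes = np.array([[0, 0],    # reconstructs to (0, 0)
                      [1, 1]])   # reconstructs to (1, 2)
toy_dist = asymmetric_distances(np.array([1.0, 2.0]), toy_codes, toy_codebooks)
nearest = np.argsort(toy_dist)   # indices of products, closest first
```

Squared distances are returned because they give the same ranking as the distances themselves and avoid the square root.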
Results of these experiments with 100 products and 100 customers can be seen at http://frontend04-hpc.picalike.corpex-kunden.de:5000/fv_test by clicking on a customer id. The code is contained in fv/test_2 for fisher vectors and in fv/test_2.1 for product quantization. The code for the web page is in the you_api git (http://dokuwiki.picalike.corpex-kunden.de/you_api) in src/fv.py.
references:
https://hal.inria.fr/file/index/docid/619403/filename/final.r1.pdf
https://gist.github.com/danoneata/9927923
https://bitbucket.org/doneata/fv4a/src/9cd355701c1657eff11a71f8ce4cc42ddd381113/evaluate.py#lines-50:85