====== Product-twins ======
GIT: https://git.picalike.corpex-kunden.de/incubator/product_twins
This service fetches product metadata from MongoDB ('meta_db_collection') for two given shops. For each product it takes the first image listed under the 'images' metadata link and searches for very similar products of the same brand, keeping only matches whose similarity is above a threshold. These products, the twins, are then stored in an internal database and can be retrieved when needed. The idea is to find the same product in different shops.
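A minimal sketch of the matching step described above, not the service's actual implementation. It assumes each product already has an image feature vector and that similarity is measured with cosine similarity. The field names ('id', 'brand', 'features') and the function names are hypothetical and chosen only for illustration. Whether the threshold comparison is inclusive is also an assumption.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_twins(shop_a, shop_b, threshold=0.95):
    """Return (id_a, id_b, score) for same-brand pairs at or above the threshold.

    The record layout (id, brand, features) is hypothetical.
    """
    twins = []
    for pa in shop_a:
        for pb in shop_b:
            # Only products of the same brand can be twins.
            if pa["brand"].lower() != pb["brand"].lower():
                continue
            score = cosine(pa["features"], pb["features"])
            if score >= threshold:
                twins.append((pa["id"], pb["id"], round(score, 4)))
    return twins


# Made-up example data for two shops.
shop_a = [
    {"id": "a1", "brand": "Acme", "features": [1.0, 0.0, 0.2]},
    {"id": "a2", "brand": "Other", "features": [1.0, 0.0, 0.2]},
]
shop_b = [
    {"id": "b1", "brand": "acme", "features": [1.0, 0.0, 0.2]},
    {"id": "b2", "brand": "Acme", "features": [0.0, 1.0, 0.0]},
]

twin_pairs = [(a, b) for a, b, _ in find_twins(shop_a, shop_b)]
```

In this toy data only 'a1' and 'b1' share a brand and have near-identical features, so they form the single twin pair. The brute-force double loop is only meant to show the idea; a real service would most likely use an index for the similarity search.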
===== Operations =====
This project was developed as a custom solution for a customer who is no longer active. It is still used by endpoints of v5-sim-api (git https://git.picalike.corpex-kunden.de/incubator/v5-sim-api) and is called once a week via cron.
URL: http://report-db01.picalike.corpex-kunden.de:9811/docs
===== API =====
The main API consists of four endpoints:
* health: check required backend database connections.
* update_data: fetch up-to-date data from MongoDB for two given shops.
* generate_twins: go through the internal data and find twins above the specified threshold (default: 0.95).
* retrieve_twins: retrieve latest twins for a pair of shops. Results can be filtered by category, but the default is to retrieve all categories.
The service is live on http://report-db01.picalike.corpex-kunden.de:9811/
===== Overlap =====
If a shop wants to find out the self-similarity of each category, the API can be used with specific settings:
http://report-db01.picalike.corpex-kunden.de:9811/generate_self_twins\\
requests.post('http://report-db01.picalike.corpex-kunden.de:9811/generate_self_twins', json={'ref_shop': shop_id})
Once the near-duplicate search has finished, the summary for a shop can be fetched with this call:
http://report-db01.picalike.corpex-kunden.de:9811/simtwins/overlap_category\\
shop_id = 'bonprix_de_feed'
service_url = 'http://report-db01.picalike.corpex-kunden.de:9811/simtwins/overlap_category'
requests.post(service_url, json={'ref_shop_id': shop_id, 'comp_shop_ids': [shop_id]})
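The two calls above can be wrapped in small helpers, sketched here under a few assumptions. The sketch uses the standard library's 'urllib' instead of 'requests' so that it has no third-party dependency. The helper names are hypothetical, and it is an assumption that the service answers with a JSON body. The endpoint paths and payload keys are taken from the calls above.

```python
import json
import urllib.request

BASE_URL = "http://report-db01.picalike.corpex-kunden.de:9811"


def build_post(path, payload):
    """Build (but do not send) a JSON POST request for the given endpoint path."""
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def self_twins_request(shop_id):
    """Request that starts the near-duplicate search within a single shop."""
    return build_post("generate_self_twins", {"ref_shop": shop_id})


def overlap_request(shop_id):
    """Request for the per-category overlap summary of a shop against itself."""
    return build_post(
        "simtwins/overlap_category",
        {"ref_shop_id": shop_id, "comp_shop_ids": [shop_id]},
    )


def send(request, timeout=30):
    """Send a prepared request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Prepare both requests; nothing is sent until send() is called.
start_req = self_twins_request("bonprix_de_feed")
summary_req = overlap_request("bonprix_de_feed")
```

Calling send(start_req) and, after the search has finished, send(summary_req) would perform the actual requests. Building the request separately from sending it makes it easy to inspect the URL and payload first.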
===== Related =====
Product twins and simtwins are related: both produce the same output, only the procedure differs. We may eventually migrate to the new solution entirely.