GIT: https://git.picalike.corpex-kunden.de/incubator/product_twins
This service fetches product metadata from MongoDB ('meta_db_collection') for two given shops and uses the first image provided in the 'images' metadata link to search for very similar products (similarity above a threshold) of the same brand. These products, the twins, are then stored in an internal database and can be retrieved when needed. The idea is to find the same product in different shops.
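The selection rule described above (first image, same brand, similarity above a threshold) can be sketched as follows. This is only an illustration, not the service's actual code: the function names, the 'brand' and 'score' field names, and the default threshold of 0.9 are assumptions made for the example. Only the 'images' field comes from the description above.

```python
from typing import Dict, List, Optional


def first_image(product_meta: Dict) -> Optional[str]:
    """Return the first entry of the 'images' metadata field, or None if it is missing or empty."""
    images = product_meta.get("images") or []
    return images[0] if images else None


def select_twins(ref_product: Dict, candidates: List[Dict], threshold: float = 0.9) -> List[Dict]:
    """Keep candidates of the same brand whose similarity score is above the threshold.

    'brand' and 'score' are hypothetical field names; the real schema may differ.
    """
    return [
        candidate
        for candidate in candidates
        if candidate["brand"] == ref_product["brand"] and candidate["score"] > threshold
    ]
```

In this sketch, the similarity score is assumed to have been computed already by an image search on the first image of the reference product.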
This project was developed as a customer-specific solution that is no longer active. However, it is still used by endpoints of v5-sim-api (git https://git.picalike.corpex-kunden.de/incubator/v5-sim-api). It is called once a week via cron.
There are 4 endpoints.
The service is live on http://report-db01.picalike.corpex-kunden.de:9811/
If a shop wants to find out the self-similarity of each category, the API can be used with specific settings:
http://report-db01.picalike.corpex-kunden.de:9811/generate_self_twins
requests.post('http://report-db01.picalike.corpex-kunden.de:9811/generate_self_twins', json={'ref_shop': shop_id})
After the process to find near duplicates has finished, it is possible to fetch the summary of a shop with this call:
http://report-db01.picalike.corpex-kunden.de:9811/simtwins/overlap_category
shop_id = 'bonprix_de_feed'
service_url = 'http://report-db01.picalike.corpex-kunden.de:9811/simtwins/overlap_category'
requests.post(service_url, json={'ref_shop_id': shop_id, 'comp_shop_ids': [shop_id]})
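The two calls above (starting the self-twin generation and fetching the category overlap) can be wrapped in small helpers. This is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes the service answers with a JSON body, which is not confirmed here. The URLs and payload keys are taken from the calls above.

```python
import json
import urllib.request

BASE_URL = "http://report-db01.picalike.corpex-kunden.de:9811"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the given service path."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def self_twins_request(shop_id: str) -> urllib.request.Request:
    """Request that starts the near-duplicate search of a shop against itself."""
    return build_request("/generate_self_twins", {"ref_shop": shop_id})


def overlap_request(shop_id: str) -> urllib.request.Request:
    """Request that fetches the per-category summary, comparing the shop with itself."""
    return build_request(
        "/simtwins/overlap_category",
        {"ref_shop_id": shop_id, "comp_shop_ids": [shop_id]},
    )


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would be send(self_twins_request('bonprix_de_feed')) and, once the generation has finished, send(overlap_request('bonprix_de_feed')). How to tell that the generation has finished is not covered by this sketch.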
The product twins are related to the simtwins in the sense that the output is the same; only the procedure differs. We may eventually migrate to the new solution altogether.