Feed Export
This application extracts product information for the s24 feed products from Solr and uploads it as the file “s24_export_[session].jsonl” to the folder “picalike_export” on the S24 FTP server.
Host
Git
S24 FTP Login
- User: s24-csv-picalike
- PW: VLSKzCdSXRfgHgz7
Description of the input
To start the export, send a POST request with the shop_id “s24_de_feed”, the update_id and the session. Besides being used for monitoring, the session (a timestamp) is also part of the output file name of our trend export.
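The request described above could be built roughly as follows. This is only a sketch: the endpoint URL, the route and the JSON body encoding are assumptions, and the helper names are hypothetical. Only the three field names and the file name pattern come from this page.

```python
import json
import urllib.request

# Hypothetical endpoint: the real host and route are not documented on this page.
EXPORT_URL = "http://localhost:8000/export"


def build_export_request(update_id: str, session: str,
                         shop_id: str = "s24_de_feed") -> urllib.request.Request:
    """Build (but do not send) the POST request that starts the export."""
    payload = {"shop_id": shop_id, "update_id": update_id, "session": session}
    return urllib.request.Request(
        EXPORT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def output_file_name(session: str) -> str:
    """The session doubles as the suffix of the exported file name."""
    return f"s24_export_{session}.jsonl"
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out so the sketch has no side effects.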
The required Solr fields are id, trend, popularity, keywords, label and category. The id is the product_id and the trend is the cluster_trend. The popularity is currently mapped to null. The label contains solely the attributes of each trend cluster; likewise, the keywords are the keywords collected from each trend cluster. The category is the picalike category that received the highest similarity score in the feature extraction. It is read from the Solr field “top_cat_in_common_images_text”.
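The field mapping described above can be sketched as a small function. The Solr key names “product_id”, “cluster_trend”, “keywords” and “label” are assumptions about how the documents are stored; only “top_cat_in_common_images_text” is named explicitly on this page. The function name is hypothetical.

```python
from typing import Any, Dict


def to_export_record(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Map one Solr document to one line of the export (assumed key names)."""
    return {
        "id": doc["product_id"],                 # assumed Solr key
        "trend": doc.get("cluster_trend"),       # assumed Solr key
        "popularity": None,                      # currently always null
        "keywords": doc.get("keywords", []),     # assumed Solr key
        "label": doc.get("label", []),           # assumed Solr key
        "category": doc.get("top_cat_in_common_images_text"),
    }
```

When serialized with `json.dumps`, the Python value `None` becomes JSON `null`, which matches the current popularity mapping.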
Description of the output
The feed export writes a JSONL file (one JSON object per line) inside the container, uploads it to the FTP server and then deletes the local file from the container. The file is named “s24_export_[session].jsonl” and is uploaded to the folder “picalike_export” on the S24 FTP server.
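The write, upload and delete sequence could look roughly like this with the standard library's ftplib. It is a sketch, not the actual implementation: the function names are hypothetical, and the caller is assumed to pass in an already connected and logged-in FTP object.

```python
import json
import os
from ftplib import FTP
from typing import Any, Dict, Iterable


def write_jsonl(path: str, records: Iterable[Dict[str, Any]]) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def upload_and_cleanup(ftp: FTP, local_path: str,
                       remote_dir: str = "picalike_export") -> None:
    """Upload the file to remote_dir, then delete the local copy."""
    ftp.cwd(remote_dir)
    with open(local_path, "rb") as fh:
        ftp.storbinary(f"STOR {os.path.basename(local_path)}", fh)
    # Only reached if the upload did not raise, so a failed upload keeps the file.
    os.remove(local_path)
```

Deleting the local file only after a successful upload is a design choice of this sketch; it keeps the file available for a retry if the transfer fails.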
Description of the script
Works with FastAPI and Celery, so the API can be tested locally via …/docs. Multiple requests are allowed, since Celery manages the queue in the Redis DB. A network shared by the Redis DB and the API must be set up for the export to work. The script uses ZooKeeper to locate the Solr DB and the solr_client from the picalike_v5 library to download the data. Data is downloaded in batches of 1000 products. The script counts the products with and without cluster_trends and reports the counters to the Shop Conveyor Belt and in the logs.
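The batching and counting logic can be illustrated independently of the real client. In the sketch below, `fetch_batch(start, rows)` is a hypothetical stand-in for whatever the picalike_v5 solr_client provides, and the key name “cluster_trend” is an assumption; only the batch size of 1000 and the two counters come from this page.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

BATCH_SIZE = 1000  # batch size stated in the description

Doc = Dict[str, Any]


def iter_batches(fetch_batch: Callable[[int, int], List[Doc]],
                 batch_size: int = BATCH_SIZE) -> Iterator[List[Doc]]:
    """Yield batches until the (hypothetical) fetch function runs out of data."""
    start = 0
    while True:
        batch = fetch_batch(start, batch_size)
        if not batch:
            return
        yield batch
        if len(batch) < batch_size:  # a short batch must be the last one
            return
        start += batch_size


def count_trends(batches: Iterable[List[Doc]]) -> Tuple[int, int]:
    """Return (products with a cluster_trend, products without one)."""
    with_trend = without_trend = 0
    for batch in batches:
        for doc in batch:
            if doc.get("cluster_trend") is not None:
                with_trend += 1
            else:
                without_trend += 1
    return with_trend, without_trend
```

For a quick local check, `fetch_batch` can be a lambda that slices an in-memory list, for example `lambda start, rows: docs[start:start + rows]`.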
TODO
Currently the fail message is sent to the SCB only if Solr has no data to offer. Internal errors are not caught. More failure cases should be detected and reported.
Remove the possibility to test the API with crawler products. It still uploads results under the file name “s24_export_TEST.jsonl”, and that file contains the data from the given crawler shop.
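For the first item, one possible approach is to wrap the whole export task so that any unexpected exception triggers a fail message. This is only a sketch of the idea: `notify_fail` is a hypothetical callback standing in for whatever sends the fail message to the SCB, and the message format is an assumption.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_failure_report(task: Callable[[], T],
                            notify_fail: Callable[[str], None]) -> T:
    """Run task; on any exception, report it via notify_fail and re-raise."""
    try:
        return task()
    except Exception as exc:
        # Report the error type and message, then let the caller see the failure.
        notify_fail(f"{type(exc).__name__}: {exc}")
        raise
```

Re-raising after the notification keeps the error visible to Celery and in the logs instead of silently swallowing it.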
Customer Requirements
S24 wants to improve the ranking system of their search engine in order to improve their search results. In addition to their existing ranking criteria, they want to add our trend cluster score to make their search results more trend-relevant.
They need a JSONL data export on their FTP server regularly (daily). Their feed can also be downloaded there. About 30% of their feed changes every day, so our trend export needs to run daily.
S24's customer base is relatively broad, and so is their fashion portfolio. They carry neither low- nor high-priced products; most products come from the Otto Group. Importantly, S24 is not an online shop but rather an intermediary (“Vermittler”) that earns its money through affiliate links: every customer click on one of their product links leads to another shop, and that shop then pays for each referral to its shop page.
Required Output Fields
- id
- popularity (currently set to null)
- trend
- cluster keywords
- label (sorted list of most trendy attributes)
- category (list of categories sorted by highest similarity score)
Trendsetters defined by S24
- jackjones_de_crawler
- about_you_de_crawler
- baur_de_crawler
- heine_de_crawler
- hm_de_crawler
- meinfischer_de_crawler
- otto_de_crawler
- peterhahn_de_crawler
- schwab_de_crawler
- zara_de_crawler