Feed Export

This application extracts product information for the s24 feed products from solr and uploads it as the file “s24_export_[session].jsonl” to the folder “picalike_export” on the S24 FTP server.

Host

http://pci01.picalike.corpex-kunden.de:8007/start

Git

https://git.picalike.corpex-kunden.de/picalike/feed_export

S24 FTP Login

URL: ftp://transfer.s24.com
User: s24-csv-picalike
PW: VLSKzCdSXRfgHgz7

Description of the input

To start the export, send a POST request with the shop_id “s24_de_feed”, the update_id and the session. The session is a timestamp; besides monitoring, it is also used in the output file name of our trend export.
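A minimal sketch of such a start request, using only the Python standard library. It assumes the endpoint accepts a JSON body (it may instead expect form or query parameters) and that the session is a Unix timestamp sent as a string; the function names and the update_id value are placeholders, not part of the service.

```python
import json
import time
import urllib.request

START_URL = "http://pci01.picalike.corpex-kunden.de:8007/start"


def build_start_request(update_id, session=None):
    """Build (but do not send) the POST request that starts the export."""
    if session is None:
        # Assumption: the session is a Unix timestamp rendered as a string.
        session = str(int(time.time()))
    payload = {
        "shop_id": "s24_de_feed",
        "update_id": update_id,
        "session": session,
    }
    # Assumption: the API reads the three fields from a JSON body.
    return urllib.request.Request(
        START_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_start_request(update_id, session=None, timeout=30):
    """Send the start request and return (status code, response text)."""
    request = build_start_request(update_id, session)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Calling send_start_request("example_update_id") would queue one export run; with the session left out, the current time is used.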

The fields required from solr are id, trend, popularity, keywords, label and category. The id is the product_id and the trend is the cluster_trend. The popularity is currently mapped to null. The label contains only the attributes of each trend cluster, and the keywords are the keywords collected from each trend cluster. The category is the picalike category with the highest similarity from the feature extraction; it is read from the solr field “top_cat_in_common_images_text”.
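The field mapping could look roughly like the sketch below. Only “cluster_trend” and “top_cat_in_common_images_text” are solr names taken from this page; “product_id”, “cluster_keywords” and “cluster_attributes” are assumed names for illustration, as are the function names.

```python
import json


def to_export_record(solr_doc):
    """Map one solr document to the six fields of an export line."""
    return {
        # Assumption: the product id is stored under "product_id".
        "id": solr_doc["product_id"],
        "trend": solr_doc.get("cluster_trend"),
        # Popularity is currently always exported as null.
        "popularity": None,
        # Assumption: hypothetical solr names for keywords and attributes.
        "keywords": solr_doc.get("cluster_keywords", []),
        "label": solr_doc.get("cluster_attributes", []),
        "category": solr_doc.get("top_cat_in_common_images_text", []),
    }


def to_jsonl_line(record):
    """Serialize one record as a single JSON line (None becomes null)."""
    return json.dumps(record, ensure_ascii=False)
```

A product without a trend cluster still yields a complete line; its trend is null and its keywords and label are empty lists.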

Description of the output

The feed export writes a jsonl file, consisting of one JSON object per line, inside the container, uploads it to the FTP server and then deletes the local file from the container. The file is named “s24_export_[session].jsonl” and is uploaded to the folder “picalike_export” on the S24 FTP server.
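The write, upload and delete steps can be sketched with the standard library as follows. This is an illustration under assumptions, not the actual implementation: it uses plain FTP via ftplib, the function names are made up, and the local file is removed only after a successful upload.

```python
import json
import os
from ftplib import FTP


def export_filename(session):
    """Return the output file name for a given session."""
    return f"s24_export_{session}.jsonl"


def write_jsonl(records, path):
    """Write one JSON object per line to the local file."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def upload_and_remove(path, host, user, password, remote_dir="picalike_export"):
    """Upload the file to the FTP folder, then delete the local copy."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        ftp.cwd(remote_dir)
        with open(path, "rb") as handle:
            ftp.storbinary(f"STOR {os.path.basename(path)}", handle)
    # Assumption: the local file is only removed once the upload succeeded.
    os.remove(path)
```

Credentials should be passed in from the environment or a secret store rather than hard-coded in the script.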

Description of the script

The service is built with FastAPI and Celery, so the API can be tested locally via …/docs. Multiple requests are allowed, since Celery manages the queue in the redis DB. A network connecting the redis DB and the API must be set up for the export to work. The script uses zookeeper to locate the solr DB and the solr_client from the picalike_v5 library to download the data. Data is downloaded in batches of 1000 products. The script counts the products with and without cluster_trends and reports the counters to the Shop Conveyor Belt and in the logs.
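The batching and counting logic can be illustrated in isolation. In this sketch an in-memory iterable stands in for the solr download (the real script pages through solr with the solr_client), and a product counts as having a trend when its “cluster_trend” value is not null; both the function names and that rule are assumptions.

```python
from itertools import islice

BATCH_SIZE = 1000


def iter_batches(products, batch_size=BATCH_SIZE):
    """Yield lists of at most batch_size products."""
    iterator = iter(products)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def count_trends(products, batch_size=BATCH_SIZE):
    """Count products with and without a cluster_trend, batch by batch."""
    counts = {"with_trend": 0, "without_trend": 0}
    for batch in iter_batches(products, batch_size):
        for product in batch:
            if product.get("cluster_trend") is not None:
                counts["with_trend"] += 1
            else:
                counts["without_trend"] += 1
    return counts
```

The resulting dictionary is the kind of counter that would be logged and reported back after the export finishes.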

TODO

Currently the fail message is only sent to the scb if solr has no data to offer. Internal errors are not caught. More failure cases should be handled and reported.

Remove the option to test the API with crawler products. It still uploads results under the file name “s24_export_TEST.jsonl”, containing the data of the given crawler shop.

Customer Requirements

S24 wants to improve the ranking system of their search engine in order to improve their search results. In addition to their existing ranking criteria, they want to add our trend cluster score to make their search results more trend-relevant.

They need a jsonl data export on their FTP server regularly (daily). Their feed can also be downloaded there. About 30% of their feed changes every day, which is why our trend export needs to run daily.

S24's customer base is relatively broad, and so is their fashion portfolio. They sell neither low-priced nor high-priced products, mostly products from the Otto Group. Importantly, S24 is not an online shop but rather an intermediary (“Vermittler”) that earns its money through affiliate links: every click on one of their product links leads to another shop, and that shop pays for each referral to its shop page.

Required Output Fields

id
popularity (currently set to null)
trend
cluster keywords
label (sorted list of most trendy attributes)
category (list of categories sorted by highest similarity score)

S24-defined Trendsetters

jackjones_de_crawler
about_you_de_crawler
baur_de_crawler
heine_de_crawler
hm_de_crawler
meinfischer_de_crawler
otto_de_crawler
peterhahn_de_crawler
schwab_de_crawler
zara_de_crawler
