====== DEPRECATED ======

The new Feed Import can be found at: [[v5-feed-import|V5 Feed Import]]

====== Feed Import ======

Imports metadata, history, images, and features from feeds.

===== Host =====

http://pci01.picalike.corpex-kunden.de:9900

===== Git =====

Feed reader and processor: https://git.picalike.corpex-kunden.de/picalike/feed_import_fastapi\\
DB writers: https://git.picalike.corpex-kunden.de/picalike/feed_writers\\
Feed objects: https://git.picalike.corpex-kunden.de/picalike/feed_objects

===== Database =====

Host: **osa_data**

=== meta_db ===

  * **meta_db_collection** contains metadata for all products in all shops; unique key: (''%%shop_id%%'', ''%%prod_id%%'') or ''%%picalike_id%%''.
  * **categories** contains the number of products for every category in every shop and session; unique key: (''%%shop_id%%'', ''%%path%%'', ''%%session%%'').
  * **history** contains historical data for all products in all shops in all sessions; unique key: (''%%shop_id%%'', ''%%prod_id%%'', ''%%session%%'') or (''%%picalike_id%%'', ''%%session%%'').

=== feature_db ===

  * **feature_db_collection** contains features for all images in all shops; unique key: ''%%url%%''.

===== Usage =====

The feed import receives commands from and sends stats to the [[shop_conveyor_belt|Shop Conveyor Belt]] through port 9900. In exceptional cases, some other commands can be sent through the FastAPI interface at http://pci01.picalike.corpex-kunden.de:9900/docs

===== Feed Import =====

The feed import module extracts the relevant information from feeds and stores it in persistent queues that are later processed by the feed writers. It receives ''%%start%%'' commands from the [[shop_conveyor_belt|shop conveyor belt]] through a FastAPI interface and is currently configured to process 5 feeds in parallel, in as many Celery tasks.\\
Each feed is parsed using a feed object as an iterator, and its status is recorded in an SQLite table.
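The reading phase can be sketched roughly as below. This is an illustrative simplification, not the actual implementation: the function and column names, the field selection, and the plain-list queues are all assumptions (the real system uses persistent queues and one Celery task per feed).

```python
import sqlite3
from collections import defaultdict

def read_feed(shop_id, feed_items, db=":memory:"):
    """Parse one feed and fan its records out to per-writer queues.

    `feed_items` stands in for a feed object: an iterable of product dicts.
    The queues here are plain lists; in the real system they are persistent
    queues consumed later by the feed writers.
    """
    conn = sqlite3.connect(db)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feed_status "
        "(shop_id TEXT PRIMARY KEY, state TEXT, items INTEGER)"
    )
    queues = defaultdict(list)  # one queue per writer: meta, history, ...
    count = 0
    for item in feed_items:  # the feed object is used as an iterator
        # each writer gets only the fields it cares about (illustrative split)
        queues["meta"].append({k: item[k] for k in ("shop_id", "prod_id", "title")})
        queues["history"].append({k: item[k] for k in ("shop_id", "prod_id", "price")})
        count += 1
        conn.execute(
            "INSERT OR REPLACE INTO feed_status VALUES (?, ?, ?)",
            (shop_id, "reading", count),
        )
    conn.execute("UPDATE feed_status SET state = 'done' WHERE shop_id = ?", (shop_id,))
    conn.commit()
    state, items = conn.execute(
        "SELECT state, items FROM feed_status WHERE shop_id = ?", (shop_id,)
    ).fetchone()
    return queues, state, items
```
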
When the reading phase is complete, a ''%%start%%'' command is issued to all writers and their responses are collected in the SQLite table. When all writers are done, the feed import notifies the [[shop_conveyor_belt|shop conveyor belt]] of the outcome, along with the related stats and errors.

===== Feed Writers =====

Each feed writer independently reads data from its queue and writes it to the database. Currently, there are writers for categories, metadata, history, and features. All of them use ''%%base_writer%%'' as their base image and specialize only the ''%%writer.py%%'' file, which should reference the appropriate client in the ''%%picalike_v5%%'' library. Writers are implemented as FastAPI services with a Celery task manager, currently configured to process 5 feeds in parallel.\\
Feed writers interact only with the ''%%feed_import%%'' module (i.e., the [[shop_conveyor_belt|shop conveyor belt]] is not aware of their existence): they receive commands such as ''%%start%%'' and ''%%cancel%%'' and send out status information such as ''%%started%%'', ''%%failed%%'', and ''%%done%%'', along with statistics and errors.

===== Feed Objects =====

Feed objects are responsible for actually reading the feed. There are three general feed objects that can be applied depending on the file type (csv, jsons, zipped csv), and they behave according to the feed settings. For special needs, custom feed objects can be implemented (e.g., Witt).\\
More information on how fields are processed and how to configure feed settings can be found here: http://pci01.picalike.corpex-kunden.de:9900/conf_docs
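A general feed object is essentially an iterator that yields one normalized product record per feed entry. The following is a minimal sketch of what a CSV feed object might look like; the class name, constructor arguments, and field mapping are illustrative assumptions, and the real feed objects additionally apply the per-shop feed settings documented at the link above.

```python
import csv
import io

class CsvFeedObject:
    """Minimal sketch of a CSV feed object (names are illustrative).

    Iterates over a CSV feed and yields normalized product dicts; the
    real feed objects also apply feed settings such as delimiters,
    field mappings, and filters configured per shop.
    """

    def __init__(self, fileobj, delimiter=";", field_map=None):
        self.fileobj = fileobj
        self.delimiter = delimiter
        # maps feed column names to internal field names (illustrative)
        self.field_map = field_map or {}

    def __iter__(self):
        reader = csv.DictReader(self.fileobj, delimiter=self.delimiter)
        for row in reader:
            # rename columns via the field map and strip stray whitespace
            yield {self.field_map.get(k, k): v.strip() for k, v in row.items()}

# usage: the feed import consumes the object purely as an iterator
feed = io.StringIO("id;name\n1; Shoe \n2;Hat\n")
items = list(CsvFeedObject(feed, field_map={"id": "prod_id"}))
```

The jsonl and zipped-csv variants would differ only in how the underlying file is opened and decoded, which is why the iterator interface is the part the feed import depends on.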