
DEPRECATED

New Feed Import can be found at: V5 Feed Import

Feed Import

Imports metadata, history, images, and features from feeds.

Host

Git

Database

Host: osa_data

meta_db

  • meta_db_collection contains metadata for all products in all shops; unique key: (shop_id, prod_id) or picalike_id.
  • categories contains number of products for all categories in all shops in all sessions; unique key: (shop_id, path, session).
  • history contains historical data for all products in all shops in all sessions; unique key: (shop_id, prod_id, session) or (picalike_id, session).

feature_db

  • feature_db_collection contains features for all images in all shops; unique key: url.
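The unique keys above imply upsert semantics: writing a record with an existing key updates the stored document instead of creating a duplicate. A minimal sketch of this behaviour in plain Python, using an in-memory dict in place of the real database (the helper name is hypothetical):

```python
# Sketch of the upsert semantics implied by the unique keys above,
# with a dict standing in for the database collection.

def upsert_meta(collection, record):
    """Insert or update a metadata record keyed by (shop_id, prod_id)."""
    key = (record["shop_id"], record["prod_id"])
    collection[key] = {**collection.get(key, {}), **record}

meta_db_collection = {}
upsert_meta(meta_db_collection, {"shop_id": 1, "prod_id": "a1", "title": "Red shoe"})
upsert_meta(meta_db_collection, {"shop_id": 1, "prod_id": "a1", "price": 49.9})

# Both calls collapse into a single document for key (1, "a1").
assert len(meta_db_collection) == 1
```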

Usage

The feed import receives commands from, and sends stats to, the Shop Conveyor Belt through port 9900. In exceptional cases, other commands can be sent through the FastAPI interface at http://pci01.picalike.corpex-kunden.de:9900/docs
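A sketch of issuing a command to that FastAPI interface; the endpoint name and payload fields here are assumptions, not the documented API (the /docs page above lists the real schema):

```python
import json
from urllib.parse import urljoin

# Base URL of the feed import service (from the docs above).
BASE_URL = "http://pci01.picalike.corpex-kunden.de:9900"

def build_start_request(shop_id: int):
    """Build a hypothetical start command; endpoint and fields are assumed."""
    url = urljoin(BASE_URL, "/start")          # assumed endpoint name
    payload = json.dumps({"shop_id": shop_id})  # assumed payload shape
    return url, payload

url, payload = build_start_request(42)
# An actual call would POST `payload` to `url`, e.g. via urllib.request.
```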

Feed Import

The feed import module extracts relevant information from feeds and stores it in persistent queues that are later processed by the feed writers. It receives start commands from the shop conveyor belt through a FastAPI interface and is currently configured to process 5 feeds in parallel, one per Celery task.
Each feed is parsed using a feed object as an iterator, and its status is recorded in an SQLite table. When the reading phase is complete, a start command is issued to all writers and their responses are collected in the SQLite table. When all writers are done, the feed import notifies the shop conveyor belt of the outcome, along with the related stats and errors.
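The SQLite bookkeeping described above can be sketched as a small status table keyed by feed and component; table and column names here are illustrative assumptions, not the actual schema:

```python
import sqlite3

# In-memory stand-in for the feed import's SQLite status table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feed_status ("
    "  shop_id INTEGER, component TEXT, status TEXT,"
    "  PRIMARY KEY (shop_id, component))"
)

def set_status(shop_id, component, status):
    """Record the latest status reported by the reader or a writer."""
    conn.execute(
        "INSERT INTO feed_status VALUES (?, ?, ?) "
        "ON CONFLICT(shop_id, component) DO UPDATE SET status = excluded.status",
        (shop_id, component, status),
    )

def all_done(shop_id):
    """True once every recorded component reports 'done'."""
    rows = conn.execute(
        "SELECT status FROM feed_status WHERE shop_id = ?", (shop_id,)
    ).fetchall()
    return bool(rows) and all(s == "done" for (s,) in rows)

set_status(7, "reader", "done")
set_status(7, "meta_writer", "started")
set_status(7, "meta_writer", "done")
```

Once `all_done` holds for a feed, the import would notify the shop conveyor belt with the collected stats and errors.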

Feed Writers

Each feed writer independently reads data from its queue and writes it to the database. Currently, there are writers for categories, metadata, history, and features. All of them use base_writer as their base image and specialize only the writer.py file, which should reference the appropriate client in the picalike_v5 library. Writers are implemented as FastAPI services with a Celery task manager, currently configured to process 5 feeds in parallel.
Feed writers interact only with the feed_import module (i.e., the shop conveyor belt is not aware of their existence): they receive commands such as start and cancel, and send out status information such as started, failed, and done, along with statistics and errors.
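The base_writer / writer.py split can be sketched as a base class that owns the queue loop while each writer overrides only the write step; class and method names here are illustrative assumptions:

```python
from collections import deque

class BaseWriter:
    """Queue-draining loop shared by all writers (stands in for base_writer)."""

    def __init__(self, queue):
        self.queue = queue
        self.stats = {"written": 0}

    def run(self):
        while self.queue:
            self.write(self.queue.popleft())
        return "done", self.stats

    def write(self, item):  # specialized in each writer's writer.py
        raise NotImplementedError

class MetaWriter(BaseWriter):
    """Example specialization; `db` stands in for the picalike_v5 client."""

    def __init__(self, queue, db):
        super().__init__(queue)
        self.db = db

    def write(self, item):
        self.db[(item["shop_id"], item["prod_id"])] = item
        self.stats["written"] += 1

db = {}
status, stats = MetaWriter(deque([{"shop_id": 1, "prod_id": "a1"}]), db).run()
```

The `done` status and the stats dict mirror the status information and statistics that writers report back to the feed_import module.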

Feed Objects

Feed objects are responsible for actually reading the feed. There are three general feed objects, one per supported file type (csv, jsons, zipped csv), whose behaviour is driven by the feed settings. For special needs, custom feed objects can be implemented (e.g., Witt).
More information on how fields are processed and how to configure feed settings can be found here: http://pci01.picalike.corpex-kunden.de:9900/conf_docs
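A sketch of a feed object for the csv case: it behaves as an iterator over the feed's rows and maps raw columns to internal fields according to the feed settings. The settings structure shown here is an assumption; the real format is described at the conf_docs link above:

```python
import csv
import io

class CsvFeed:
    """Iterates over a csv feed, renaming columns per the feed settings."""

    def __init__(self, fileobj, settings):
        self.reader = csv.DictReader(fileobj, delimiter=settings["delimiter"])
        self.field_map = settings["field_map"]  # feed column -> internal field

    def __iter__(self):
        for row in self.reader:
            yield {ours: row[theirs] for theirs, ours in self.field_map.items()}

raw = io.StringIO("id;name\n42;Red shoe\n")
feed = CsvFeed(raw, {"delimiter": ";", "field_map": {"id": "prod_id", "name": "title"}})
items = list(feed)
```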

feed_import.txt · Last modified: 2024/04/11 14:23 by 127.0.0.1