====== Migration of v3 sim/trend Customers ======
This page describes the current status of migrating the v3 sim to the new v5 world.
**THIS IS WORK IN PROGRESS AND SUBJECT TO CHANGE AT ANY TIME**
===== Git repositories =====
* https://git.picalike.corpex-kunden.de/incubator/v5-color-extractor
* https://git.picalike.corpex-kunden.de/incubator/v5-backend
* https://git.picalike.corpex-kunden.de/incubator/v5-sim-api
* https://git.picalike.corpex-kunden.de/incubator/shop-extractor
===== Alpha Pipeline =====
The pipeline does not operate on CSV files but on feeds stored in the old v3 MongoDB:
host: mongodb0{1, 2}.live.picalike.corpex-kunden.de:27017
database: picalike3
username: picalike3
password: [get me someplace else]
connection string: mongodb://picalike3:
mkdir -p /tmp/feeds; python3 ./v3_v5_feed.py --output-path /tmp/feeds/
The resulting JSON files in the folder can be used directly by the import_data script.
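The export format can be sketched as follows. This is a hypothetical minimal version (the real export lives in v3_v5_feed.py, and the actual document fields come from the v3 Mongo feed); it only illustrates the one-JSON-object-per-line layout that the import script consumes:

```python
import json
from pathlib import Path

def write_feed(docs, output_path, shop_id):
    """Write feed documents in the one-JSON-per-line format read by import_data.

    Sketch only: field names and the .jsons suffix are assumptions.
    """
    out_file = Path(output_path) / f"{shop_id}.jsons"
    with out_file.open("w", encoding="utf-8") as fh:
        for doc in docs:
            # One complete JSON object per line, no pretty-printing.
            fh.write(json.dumps(doc, ensure_ascii=False) + "\n")
    return out_file
```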
==== Shop Extractor ====
The goal of the shop extractor is to send extraction requests to different extractor instances and to communicate the extraction status to the shop conveyor belt. It also ensures that all ‘recent’ shop images have features for a given shop. This is a work in progress and has never been fully tested or deployed.
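The intended dispatch logic can be sketched roughly as follows. All names here are hypothetical (the real, untested service lives in the shop-extractor repo); the sketch only shows the idea of spreading requests over instances and reporting a per-image status:

```python
def dispatch_extractions(image_urls, extractors, report_status):
    """Spread extraction requests round-robin over extractor instances and
    report each result to the conveyor belt.

    extractors: list of callables url -> bool (True on success)
    report_status: callable (url, status) invoked once per image
    """
    for i, url in enumerate(image_urls):
        extractor = extractors[i % len(extractors)]
        try:
            ok = extractor(url)
        except Exception:
            # A crashed extractor instance counts as a failed extraction.
            ok = False
        report_status(url, "done" if ok else "failed")
```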
==== Grapex Environment ====
Link: [[gpu_machines|GPU machines]]
The base folder is here:
/home/picalike/v5
The environment is a Python 3.8 environment that is managed via
pip3 install --user
But be careful: the environment is shared with other users. Furthermore, access to grapex is limited.
==== [optional] Feature Enrichment: v3 color / v5 shape ====
The v5_extractor service does not support v3 feeds yet, so a dedicated bulk extractor is used. The situation is further complicated because, due to a lack of hardware, grapex is used for the extraction.
Color:
ssh grapex
cd /home/picalike/v5/v3_colors
./color_enrichment.sh
which expects color_urls.distinct in the v3_mongo folder\\
Shape:
ssh grapex
cd /home/picalike/v5/v5_extractor
./export_features.sh
which expects urls.jsons in the same folder\\
Both files are generated with the v3_image_urls.py script from v5_color_extractor
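The shape of those URL files can be illustrated with a small sketch. This is not the real v3_image_urls.py; the image_url field name is an assumption, and the only point is that each distinct image URL ends up on its own line:

```python
def write_distinct_urls(products, out_file):
    """Write each distinct image URL on its own line, keeping first-seen order.

    Sketch with an assumed 'image_url' field; the real logic is in v3_image_urls.py.
    """
    seen = set()
    with open(out_file, "w", encoding="utf-8") as fh:
        for product in products:
            url = product.get("image_url")
            if url and url not in seen:
                seen.add(url)
                fh.write(url + "\n")
    return len(seen)
```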
==== Import v3 feeds into v5 ====
The import_data.py script is located in the v5_backend:
python3 scripts/import_data.py --db_uri postgresql://docker:pgsql@localhost:5401/products --source_uri /tmp/feeds --shop_ids ags_v3_de_feed hirmergg_v3_de_feed madeleine_v3_de_feed sheego_v3_de_feed sportscheck_v3_de_feed witt_v3_de_feed
The call uses a local backend database and one-JSON-per-line files as input. All v3 feed ids contain the marker _v3_.
HINT: only products with existing shape features are imported. The lookup is done via --replica_uri; that database is filled by the feature extraction step.
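The effect of that filter can be sketched like this (hypothetical: the replica lookup is modeled here as a plain set of product ids that already have shape features):

```python
def filter_importable(products, ids_with_shape_features):
    """Keep only products whose id already has a shape feature in the replica DB.

    Sketch: the real import resolves this via the --replica_uri database.
    """
    return [p for p in products if p["id"] in ids_with_shape_features]
```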
And now the v3 color features:
python3 scripts/import_v3_color.py --db-uri postgresql://docker:pgsql@localhost:5401/products --shop-ids ags_v3_de_feed hirmergg_v3_de_feed madeleine_v3_de_feed sheego_v3_de_feed sportscheck_v3_de_feed witt_v3_de_feed
==== Create V3 SIM (materialized) view ====
cd /home/$USER/repos/v5_backend
export PYTHONPATH=.
python3 scripts/dump_schema.py --v3-only | psql postgresql://docker:pgsql@localhost:5401/products
This step allows the v5_sim service to access and use the data.
After imports, you need to refresh the view:
REFRESH MATERIALIZED VIEW CONCURRENTLY v3_sim;