Date: 2020-11-10
Host: index01-hpc
Component: v3-billing
Script: bin/distributer.py
Problem: messages could not be sent to the collection endpoints on frontend05-hpc. This created a growing backlog of unsent messages and ultimately ended in out-of-memory (OOM). This was documented in /var/log/syslog (or older files).
Solution: added a semaphore with size 50
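A minimal sketch of the idea behind the fix (illustrative only, not the actual distributer.py code), assuming an asyncio-based sender:
import asyncio

async def main():
    send_limit = asyncio.Semaphore(50)        # at most 50 sends in flight at once

    async def send_message(payload):
        async with send_limit:                # waits here instead of growing an unbounded backlog
            await asyncio.sleep(0.1)          # placeholder for the real send to the collection endpoint

    await asyncio.gather(*(send_message(i) for i in range(1000)))

asyncio.run(main())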
Date: 2020-11-10
Host: frontend05-hpc
docker containers: top_viewed_api, also_viewed_api, get_cat_trends
Problem: containers could not connect to the local mongo (top_viewed_mongo). This was the first time this problem was encountered.
Solution: placed all containers in a docker network (top_viewed_network)
Cause: unknown, may be related to docker storage engine changes (Corpex-Ticket: 6609)
Date: 2020-11-27
Host: all frontends
Problem: strange sim api results, especially different results for different frontends
Solution: call lilly sync from sg02 (check history for sync call)
Cause: unknown, maybe related to vpn problems over the last days
Date: 2020-12-01
Host: all frontends
Problem: The Look API did not yield results for some products of a look (seen first for Otto AT, but later on for many shops).
Cause: Incorrect handling of category names. Usually a category should be saved as is into the corresponding mongo collections (ic_<uid>, ic_7510, pci_styles), but for some reason special characters (Umlaute, <, >, etc.) were substituted by whitespace for some products (in some shops). While handling these categories in the Look API and inside Lilly, they were not mapped by the category ID (which is also saved in the mongo) for each product, but encoded in Lilly (crc32 function), which then searched for a wrong, non-existing ID and yielded "category not existing". It is still an open question why/how the category is saved in this wrong way at all.
Solution: We manually replaced the wrong categories by correct categories in ic_7510 and pci_styles and then changed the lilly_data_from_mongo script on sg02 (the former version is named lilly_data_from_mongo_20201202, check changes with diff command) to not use the wrong categories when exporting ic_905 (special solution for one of the affected customers) but the correct data given for each product.
Open/ToDo: the special solution has to be replaced by importing the categories correctly. This must also include the fix for all other shops with this error.
Additional Solution: for the corresponding shops, we added the entry "strip_chars": false to the mongo user collection, which should fix this issue on import (for both the ic_* collections and the pci_styles collection, not for ic_7510, which is not used any more).
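Hedged illustration of why the mangled names break the lookup, assuming the ID is the crc32 of the category name as described above (the category string is only an example):
import re, zlib

def category_id(name):
    # assumption: the ID is the crc32 of the UTF-8 encoded category name
    return zlib.crc32(name.encode("utf-8"))

original = "Kleider & Röcke"                          # example category with special characters
stripped = re.sub(r"[^A-Za-z0-9 ]", " ", original)    # roughly what the broken import did

print(category_id(original))   # ID referenced for the product
print(category_id(stripped))   # ID computed from the mangled name -> "category not existing"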
Date: 2020-12-19
Host: report-engine
Problem: slow answers from report-engine
Cause: running view refresh + lack of disk space
Solution: add more disk space
Date: 2021-01-04
Host: index04
Problem: V3 Feed-Imports did not finish since 2021-01-03
Cause: redis_export on index04 was missing.
Log: index04:/home/picalike/var/log/kpo_collector.log
2021-01-04 09:06:25,709 ERROR [kpo] {index04.picalike.corpex-kunden.de} No label exporter of type _v4-labels-jrpc._tcp.local. with service level live found!
2021-01-04 09:06:30,712 ERROR [kpo] {index04.picalike.corpex-kunden.de} failed to send label notifications, will retry in 300.0000 seconds
Solution: start redis_export (as described in TODO.restart)
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/mnt/storage/var/live/lilly_sync_data/lib/picapika/rabbitclient.py", line 232, in background_thread
self._connection.processEvents()
File "/mnt/storage/var/live/lilly_sync_data/lib/picapika/rabbitclient.py", line 111, in processEvents
self._connection.process_data_events(0)
File "/home/picalike/.local/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 650, in process_data_events
self._flush_output(timer.is_ready, common_terminator)
File "/home/picalike/.local/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 411, in _flush_output
self._impl.ioloop.process_timeouts()
File "/home/picalike/.local/lib/python2.7/site-packages/pika/adapters/select_connection.py", line 283, in process_timeouts
timer['callback']()
File "/home/picalike/.local/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 93, in signal_once
assert not self._ready, '_CallbackResult was already set'
AssertionError: _CallbackResult was already set
Date: 2021-01-18, 2021-04-14
Host: frontend-hpc02, v22019026221283998.hotsrv.de
Problem: heartbeat_monitor.py on sg01 says: LillySyncService:lilly.frontend02-hpc.picalike.corpex-kunden.de is down
Mitigation: check the lilly directory on the machine (netcup: ls --sort=time -l /home/picalike/lilly-data/ | head -n10, corpex: ls --sort=time -l /mnt/storage/var/lilly-data/ | head -n10) for recently changed directories. To check the logs, do NOT use the docker logs, but ~/log/sync_data.log. If not fresh or in doubt: docker restart frontend_instance1. Additionally, make sure that the frontend is responsive (Same-O-Same-O mail).
[2021-01-18 16:24:49 +0000] [19] [CRITICAL] WORKER TIMEOUT (pid:335)
2021-01-18 16:31:27,127 CRITICAL -- Connection close detected
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/local/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/mnt/storage/var/live/lilly_sync_data/lib/picapika/rabbitclient.py", line 232, in background_thread
self._connection.processEvents()
File "/mnt/storage/var/live/lilly_sync_data/lib/picapika/rabbitclient.py", line 111, in processEvents
self._connection.process_data_events(0)
File "/usr/local/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 650, in process_data_events
self._flush_output(timer.is_ready, common_terminator)
File "/usr/local/lib/python2.7/site-packages/pika/adapters/blocking_connection.py", line 426, in _flush_output
raise exceptions.ConnectionClosed()
ConnectionClosed
#######################################################################################################
Date: 2021-01-21
Host: frontend06-hpc
Component: new-api
Problem: On frontend06-hpc an old Flask application named 'new_api' is running. From time to time the partition /mnt/storage reaches 100% usage, because new_api logs a lot and has no log rotation.
Solution:
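No solution was recorded. One possible mitigation, sketched here under the assumption that new_api uses Python's standard logging, would be a size-capped rotating handler:
import logging
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler(
    "/mnt/storage/var/log/new_api.log",   # path taken from the 2021-03-29 entry below
    maxBytes=100 * 1024 * 1024,           # keep each file below 100 MB
    backupCount=3,                        # and at most 3 rotated copies
)
logging.getLogger().addHandler(handler)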
Date: 2021-03-14
Host: pci01
Component: all on pci01
Problem: VM was unresponsive and had to be restarted (maybe problem in feed_import or memory problem in witt_reports?)
Solution: had to restart witt_reports and visualytics_notification_api. The billing service also went down as collateral (it needs pci on port 8090 to function). TODO_restart on pci01 was updated.
Date: 2021-03-18
Host: frontend05-hpc
Component: newly started docker services
Problem: The uptime of the machine was > 700 days. We deployed a new container there and started the service, but we could not connect to the public TCP port. The service itself was initialized correctly, because internally the port was reachable. Thus, we suspected a problem with docker connecting outside→inside.
Solution: The whole server was restarted; restarting dockerd alone might have sufficed, but we do not know this for sure. After the restart, the service worked as expected.
Date: 2021-03-29
Host: frontend06-hpc
Component: new_api
Problem: logging filled /mnt/storage to the max
Further Information: Widgets on madeleine.de stopped working!
Solution: stopping new_api (echo "q" > /run/shm/new_api.ctrl), deleting the log (rm /mnt/storage/var/log/new_api.log), starting new_api (uwsgi --ini /mnt/storage/var/etc/new_api.ini &)
Date: 2021-04-23, 2021-05-05
Host: dev02
Component: v5_extractor
Problem: the refresh procedure sometimes seems to be in a deadlock, which means scheduled updates do not finish. This can also be seen by sorting the feature stores by time (ls -l --sort=time) and finding no updates within the last 24h. A symptom is that push_attrs.py did not transfer any attributes for a day.
Further Information: It can be seen that the docker logs are mostly warnings and no actual processing steps. Without new features for products, no OSA queries or int scores are possible.
Solution: Stop the service. Check if there are any journal files in /home/picalike/v5/v5_backend/feeds. If so, execute sqlite3 <shop_id>.db vacuum, which applies the journal, and afterwards restart the service. Finally, refresh feed + crawler manually with the curl requests from the cronjob (dev02).
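A minimal sketch of the journal check and vacuum step, assuming rollback journals named <shop_id>.db-journal next to the .db files; run it only after stopping the service:
import glob, sqlite3

FEED_DIR = "/home/picalike/v5/v5_backend/feeds"              # path from the entry above

for journal in glob.glob(FEED_DIR + "/*.db-journal"):
    db_path = journal[:-len("-journal")]                     # e.g. <shop_id>.db
    conn = sqlite3.connect(db_path, isolation_level=None)    # autocommit, VACUUM needs no open transaction
    conn.execute("VACUUM")                                   # equivalent of: sqlite3 <shop_id>.db vacuum
    conn.close()
    print("vacuumed", db_path)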
Date: 2021-04-29
Host: dev02
Component: v5_extractor / push_attrs
Problem: push_attrs was executed while v5_extractor worked on features. The database was locked and other components did not handle it correctly.
Solution: A bugfix was deployed; as a fallback, docker restart.
Date: 2021-10-03
Host: tegraboards
Component: v4 feature extraction
Problem: some tegraboards are not responding. (Sometimes tegraboards just respond too late to the monitoring script; in this case no further steps are required.)
Solution:
check if gpu_extractor is still working (are there new entries in the log under ~/var/log?)
if the gpu_extractor is stalled, kill the current instance (ps aux | grep gpu_extractor) and get the command line to restart the service from "history | grep gpu_extractor"
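A small sketch of the "fresh log entries" check, assuming the log file names contain 'gpu_extractor' and treating anything without writes for 30 minutes as stalled (both assumptions):
import glob, os, time

STALE_AFTER = 30 * 60   # assumption: treat a log without writes for 30 minutes as stalled

for log in glob.glob(os.path.expanduser("~/var/log/*gpu_extractor*")):
    age = time.time() - os.path.getmtime(log)
    state = "stalled?" if age > STALE_AFTER else "ok"
    print(f"{log}: last write {age / 60:.0f} min ago -> {state}")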
Date: 2021-05-10
Host: tegraboards
Component: v4 feature extraction
Problem: some tegraboards are not working
Solution: (how to swap tegra board usage of feature calculation)
adjust /mnt/storage/var/etc/v3/kpo_iterware.conf
check the kpo log (less /mnt/storage/var/log/kpo_collector.log)
kill all kpo_collector processes (ps aux | grep kpo_collector, kill …)
check TODO.restart to start the kpo_collector again
check that the start worked; it can happen that the service cannot connect to the redis labels export and therefore terminates (in that case, start it again)
index03: re-enable the feed import
Date: 2021-05-20
Host: frontends/sg02
Component: look api
Problem: there was no result for a Look API query given a product that is included in a look. In the logs one saw the message "No styles found: no look fulfills the constraints" and, checking further, that the product in the style collection ic_7510 was not similar enough to itself (in the product collection).
Solution: we checked the features in the v4 image cloud and found that they are not equal for the product and the product as part of the style (product collection vs 7510 collection); then we found strange behavior in the Lilly: different products had been mapped to the same features, indicating that they have the same PHash. We confirmed this by checking the phash mongo field of both products. To resolve the issue, we then deactivated deduplication via the shop settings in the mongo, inserting "deduplication": false into the settings field (compare uid: 3530 in the user collection).
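A hedged sketch of that settings change with pymongo; the connection string and database name are assumptions, uid 3530 is the reference from above:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumption: connection string
users = client["picalike"]["user"]                  # assumption: database name; 'user' collection as above

# switch off PHash-based deduplication for the affected shop (compare uid 3530)
users.update_one({"uid": 3530}, {"$set": {"settings.deduplication": False}})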
Date: 2021-05-31, 2021-09-15
Host: frontends
Component: frontend_instance1 container
Problem: heartbeat_monitor.py on sg01 says: LillySyncService:lilly.v22019026221284001.happysrv.de is down. In the logs 'CRITICAL – Connection close detected'
Solution: go to the affected frontend, execute curl "http://localhost:9000/disable", wait until queries to lilly stop (check the log in ~/log/lilly.log), then execute docker restart frontend_instance1
-
Date: 2021-06-28, 2022-12-22
Host: redis01
Component: redis (symptom is that results differ between look.php?… vs look.php?…&redis=off)
Problem: redis cache contains old data
Solution: ssh redis01, then use redis-cli -n 2 --scan --pattern '*<apikey>*' to show all entries. To actually delete them, pipe this into xargs: redis-cli -n 2 --scan --pattern '*<apikey>*' | xargs redis-cli -n 2 del
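The same cleanup as a Python sketch (redis-py), in case it should run from a script; host and DB index are taken from the entry, the api key is a placeholder:
import redis

r = redis.Redis(host="redis01", db=2)          # same DB index as in the redis-cli call
apikey = "REPLACE_ME"                          # placeholder for the affected api key

deleted = 0
for key in r.scan_iter(match=f"*{apikey}*"):   # same pattern as --scan --pattern
    deleted += r.delete(key)
print("deleted", deleted, "stale cache entries")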
Date: 2021-07-22
Host: netcup psql01
Component: v5_cat_top_trends, v5_sim [all components that are using psql01]
Problem: possible I/O problems and temporarily not reachable, which triggered the monitoring and led to stalled v5 imports
Solution: none, find better hosting provider [corpex ticket #10430]
Date: 2021-08-10
Host: psql02
Component: Host/Net
Problem: was not reachable via ping at 03:30 in the morning. But this might just be a symptom of a bigger issue.
Solution: none; it only happened once and never before, which is why I created this incident entry.
Date: 2021-08-12
Host: psql02
Component: refresh_trend_data
Problem: the database was in an inconsistent state (primary key violations)
Solution: none; it only happened once and the cause is not known
Date: 2021-08
Host: most netcup servers
Component: v5_image_picker
Problem: the DB connection to cloud01:psql was gone
Solution: restart, but unclear why it happened
Date: 2021-08-20
Host: shop conveyor belt (?)
Component: feed import
Problem: All feed shops were stalled and did not finish, which is why there was old data in the v5 psql backends. Might or might not be related to the migration of the report engine.
Solution: re-import triggered
Date: 2021-08-24
Host: pg02 (v220201062212128885.bestsrv.de)
Component: live postgres host
Problem: From 01:00-06:00 we got a lot of alerts due to high ping times to the host. At this time the daily live import was running and was maybe partly responsible for the high ping times. But since a full import is done every day (on Fridays also with all jobs), this does not explain why the pattern occurred just now.
Solution: None for now
Date: 2021-08-23 19:00
Host: cloud01
Component: various, like image-cloud
Problem: a lot of timeouts '[2021-08-23 19:02:20,248 ERROR/v5_extractor]: 140645008144128: HTTPConnectionPool(host='cloud01.picalike.corpex-kunden.de', port=5000): Read timed out. (read timeout=5)', probably due to a very high load. Cross reference: a lot of stalled shops were refreshed and might have been terminated at this time.
Solution: Fixed itself
Date: 2021-09-14 12:42
Host: multiple netcup servers (frontend)
Problem: fatal uwsgi errors related to the health check: Command '['uwsgi', '--socket', '/tmp/sim_api.sock', '--nagios']' returned non-zero exit status 2
Solution: None. restarted
Date: 2021-09-28
Host: frontend05-hpc
-
Solution: the update script /home/picalike/docker_bin/top_looks_fill_db/update_collections.py was not running any more, as one could see by checking /mnt/storage/var/log/top_looks_update.log. Started it with python3 update_collections.py &> /mnt/storage/var/log/top_looks_update.log & (compare the docker restart script), waited till it finished the updates (took some minutes) and deleted the cache (mongo collection count_cache, !!CAREFUL - DON'T DELETE ANOTHER COLLECTION!!; this is optional, otherwise wait one hour).
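Because of the warning above, a guarded drop can help; a sketch with pymongo, where the connection string and database name are assumptions:
from pymongo import MongoClient

TARGET = "count_cache"                              # the only collection that may be dropped

client = MongoClient("mongodb://localhost:27017")   # assumption: connection string
db = client["top_looks"]                            # assumption: database name

assert TARGET in db.list_collection_names(), f"{TARGET} not found, aborting"
db.drop_collection(TARGET)                          # deliberately hard-coded, never parametrized
print("dropped", TARGET)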
Date: 2021-09-28 12:42
Host: dev02
Problem: very few products for a feed shop where we expect many products
Solution: a preprocessor didn't run correctly due to RAM issues on sg01. Increased the machine's RAM size. Found the problem by looking at the postgres tables and meta_db_collection using different timestamps.
Traceback (most recent call last):
File "/app/tasks/feed_reader.py", line 42, in read_feed
failed, stats = process_feed(shop_id, feed_object, session, n_items=-1)
File "/app/feed_processor.py", line 129, in process_feed
for item_nr, data in enumerate(feed_object):
File "/app/feed_objects/FeedObject.py", line 53, in __next__
row = self.reader.__next__()
_csv.Error: field larger than field limit (131072) (from docker logs after retrying the shop)
Solution: inform customer about broken feed
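The traceback above is Python's csv module hitting its default 131072-byte field limit. A hedged sketch (not the actual FeedObject code) of how a reader can raise the limit or surface the broken feed cleanly; the delimiter is an assumption:
import csv

csv.field_size_limit(10 * 1024 * 1024)     # raise the default 131072-byte limit to 10 MB

def rows(path, delimiter=";"):             # illustrative reader, not the actual FeedObject code
    with open(path, newline="") as fh:
        try:
            for row in csv.reader(fh, delimiter=delimiter):
                yield row
        except csv.Error as exc:           # oversized or broken fields still surface here
            raise ValueError(f"broken feed {path}: {exc}") from exc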
Date: 2021-11-16
Host: sg01
Component: pci_style_updater
Problem: in alerts channel: heartbeat_monitor.py on sg01 says: style updater:sg01.picalike.corpex-kunden.de is down
Solution: check if the style_updater is still running by checking the log: "tail style_updater.log" in the home directory on sg01 (are there current entries? → if yes, everything is fine). If the style_updater is really down, the v3 feed imports should start to pile up, because they send every product to the style updater.
Date: 2021-11-24, 2021-12-14
Host: v22019026221284000 (netcup)
Problem: reboot after migration / container restart → corpex monitoring failed (connection check port 9000)
Solution: check if openvpn needs to be restarted on the server and do so (ssh as root, then (1) curl "http://dev02.picalike.corpex-kunden.de:8006/health"; if not ok, (2) kill the vpn process and use openvpn --log /root/vpn.log --daemon --config config.ovpn); restart the ssh port forward on sandy (compare /home/picalike/missing_deit_features_worker/start_missing_worker.sh); docker restart the image picker and frontend on the netcup server
Date: 2021-12-08
Host: dev02 (core problem at image-cloud)
Problem: the v5_feed_extractor logs showed that image downloads failed for hm_de_crawler and asos_de_crawler with error code 403 (forbidden). Manually checking the urls on the local machine worked, also on a netcup machine, but did not work on any corpex machine (even changing the user-agent had no effect) → probably an ip range block by 'akamai', which is the image host for both hm_de_crawler and asos_de_crawler (and also for other shops)
Solution: image-cloud needs to use a tinyproxy (but only for shops hosted by akamai)
ERROR http://pci01.picalike.corpex-kunden.de:8320 down: HTTPConnectionPool(host='pci01.picalike.corpex-kunden.de', port=8320): Max retries exceeded with url: /health (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f536f306e48>: Failed to establish a new connection: [Errno 111] Connection refused',))
Date: 2022-01-01
Host: a lot
Problem: Juggling with dates. Instead of using deltas, it was a mix of missing leading zeros ('W' instead of 'WW') and a combination of the current year with the last week, like '202252'. As a result, views contained no data or wrong data.
Solution: use proper PSQL functions instead of hard coding / manual combinations.
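For illustration, the same rules in Python terms (the actual fix used PSQL date functions): zero-pad the week and step weeks with deltas instead of decrementing the week number:
from datetime import date, timedelta

def year_week(d):
    iso = d.isocalendar()                        # ISO year + ISO week avoid the year/week mismatch
    return f"{iso[0]}{iso[1]:02d}"               # zero-padded week ('WW'), e.g. '202053'

today = date(2021, 1, 2)
print(year_week(today))                          # -> 202053 (still ISO week 53 of 2020)
print(year_week(today - timedelta(weeks=1)))     # previous week via a delta, not week_number - 1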
Date: 2022-01-13
Host: dev01 / psql
Component: osa sim
Problem: The product 5176524#withfeed_de_feed has no int score, but osa sim calls return candidates. The problem is that int scores use the new cluster scheme, while the sim calls do not. And with the clustering, some top-k results won't be returned.
Solution: none for the moment. without the new clustering queries are slow, but with them, some nearest neighbors won't be found.
Date: 2022-01-13
Host: Corpex-Frontends
Component: V3
Problem: We get a lot of requests (probably due to a newsletter from witt), many of them from 66.249.81.99 with User-Agent "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 (via ggpht.com GoogleImageProxy)".
Hint: tail -f /var/log/apache2/access-frontend01.picalike.corpex-kunden.de.log, vimail-stuff is synced from frontend04-hpc to the other frontend[01-03]-hpc
Solution: None
Date: 2022-01-14
Host: dev02
Component: Mongo / Feed Import / Top Trends / psql 02 live
Problem: for at least two runs, no shop data could be read from the MongoDB. It is unclear what happened and where.
Solution: None
Date: 2022-01-19
Host: pci01
Component: PCI01 Server overall, noticed through Feed Import
Problem: the storage was full (Feed Import Alert said "feed_import_import failed for 'universal_at_feed' session '…': <class 'asyncpg.exceptions.DiskFullError'>") → the problem was the /mnt/storage/ directory, which hosts both docker and the upci service. While inserting data into the feed import postgres, the storage went full.
Hints: upci operations:
upci
Solution: docker system prune to immediately get some space, then reached out to corpex for more disk space. But the main problem was that the upci log files grew too large, probably a problem with the log rotation. Still open: limit the log size in the service and truncate the logs.
Date: 2022-01-19
Host: pci01
Component: Feed Import (export), Mapping Service
Problem: too many calls of the mapping endpoints to insert genders (also possible with brands or categories) led to a blocking of the mapping service (in this case 300k 'different' genders for one shop). If other shops then get imported, they cannot finish: they wait and at some point raise asyncio.exceptions.TimeoutError. The blocking shop at some point raises aiohttp.client_exceptions.ServerDisconnectedError.
Solution: We had to restart the mapping service to work again (~/docker_bin/mapping-service/ice_tea_run.sh). Still open: fixing the problem in the import / mapping service
Date: 2022-02-03
Host: sandy:8042
Component: Image Picker
Problem: in some cases the health check will block → timeout, log alert
Solution: none right now, but the service is still working apart from the blocking
python /mnt/storage/var/live/indexer/scripts/update/feedUpdateList.py /mnt/storage/var/etc/v3/feedUpdate.json
will return a list where the V4 field no longer decreases, e.g.:
c30d54b60ac625b74e2bb0c9ae2662b2 - 376, 0 of 5 packages pending (V4: 3381 urls to go), 01-Feb 07:22, export started: None
the number 3381 will never go down and there are more feeds queued waiting for the previous feed to finish
maybe fs is read-only → reboot as user ubuntu, restart service as user picalike (use command history)
disk full → kill process, rm log, restart service (takes a while)
only start shape extractor or color extractor except on t1of1, use /mnt/storage/var/etc/v3/kpo_iterware.conf to check which service needs to run
services on worker01:
make sure the redis_export is running
restart the kpo_collector (make sure the kpo_collector finds the redis_exporter in the log file… zeroconf)
restart all feed imports that have waiting features (cancel, retrigger update)
Date: 2021-02-24
Host: sandy / netcup
Component: v5 features
Script: image_picker/missing_worker.py
Problem: netcup services responded with 500 since the postgresql connection could not be recovered
Solution: restarted the docker containers
Date: 2022-03-11
Host: pci01
Component: v5 feed import (export) & Shop Conveyor Belt
Problem: (many) shops are hanging in the feed_import_export stage for a long time (which can be seen in the SCB)
Solution: restarted the docker containers via /home/picalike/docker_bin/feed-import/run.sh. Then, for each shop: (1) in the SCB set the busy state to false, (2) go to http://pci01.picalike.corpex-kunden.de:1337/docs and use the /delete_shop_session endpoint with the session from the SCB and delete_stats set to true, (3) restart the shop from the SCB (all stages starting with feed_import_import).
Date: 2022-05-19 (?)
Host: report-db01 → netcup, sandy
Component: postgresql (?)
Problem: all connections to the feature DB were 'lost'. This broke the missing_worker.py on sandy and all the worker (v5_image_picker) nodes.
Solution: restart of all docker containers plus the script
Date: 2022-05-25
Host: v3 frontends
Component: lilly
Problem: '2022-05-25T12:56 UWSGI CRITICAL: could not connect() to workers Connection refused'. It seemed that the connection to corpex was partly unreliable, but the VPN service was started. The containers also had a 'created' timestamp of ~10 min: auto restart by 'mini_health_check.sh' (sg01).
Solution: restarted docker
Date: 2022-04-13 (detected: 2022-05-30!)
Host: index04
Component: redis_export
Problem: after an exception "ResponseError: OOM command not allowed when used memory > 'maxmemory'" (File "/home/picalike/.local/lib/python2.7/site-packages/collector/service.py", line 125, in run: missing = self.proc.push([item])), no new log entries were written. The process is likely in an error state.
Solution: restarted service according to TODO.restart, after killing it
Date: 2022-07-26
Host: report-engine
Component: report-engine
-
Solution: Not sure …
Date: 2022-08-15
Host: sandy
Component: deit features (extractor)
Problem: The feature DB container was restarted, likely by a system update, and some services did not recover: 'psycopg2.errors.AdminShutdown: terminating connection due to administrator command'. An alert is usually triggered for port 2000*.
Solution: Check that the DB is up and running, then restart missing_worker.py and, the more painful task, restart all worker nodes.
Date: 2022-09-01
Host: ic*
Component: image cloud
Problem: For the domain image1.lacoste.com we see various errors. Some images are blocked (403), some lead to bad request (400), while some can be downloaded normally (200). And even after some images were blocked, others could still be downloaded.
Solution: use shifter if all requests are blocked
Date: 2022-09-05
Host: netcup servers
Component: missing_worker, v5_image_picker (worker)
Problem: the mongo connection was lost and all workers 'crashed': [sandy.picalike.corpex-kunden.de]: http://localhost:10007: health check failed with code 500
Solution: docker restart v5_image_picker on all frontends. Check sandy:~/image_picker/nohup.out; if there are still 500 errors, kill and restart (see TODO_restart)
Date: 2022-09-12
Host: dev01, sandy
Component: v5_image_picker, color_extractor
Problem: the color extractor relies on person detection, and that service (v5_image_picker) was down, so the health check generated 'not ok' because of the 500 response. Why the 500 response: the DB was restarted and the DB connection was not re-established.
Solution: None yet. A restart of the image picker solved the problem, but is no solution!
Date: 2022-09-15
Host: sandy
Component: unclear / system
Problem: the ssh port forwards 1000-10005 were somehow terminated and alerts for 20000-20005 (socat public ports) were triggered
Solution: restarted the ssh connections via ~/image_picker/start_missing_worker.sh
Date: 2022-09-28 (2022-09-30 detected)
Host: index04
Component: redis_export
Problem: redis_export.log was too old; the reason was an exception "ResponseError: OOM command not allowed when used memory > 'maxmemory'"
Solution: kill process and restart according to TODO restart
Date: 2022-09-29 (2022-09-30)
Host: sg01 (?)
Component: SCB
Problem: all madeleine shops were hanging at the mapping service. The scheduling seemed broken, since the mapping service itself worked.
Solution: set busy to false and restarted all services in the shop conveyor belt
Date: 2022-10-11
Host: pci
Component: feed_import_export
Problem: feed_import_export failed for 'zalando_de_crawler' session '4164411c17d548ff8b42e30c9a9bd1bb': <class 'asyncio.exceptions.TimeoutError'>
Solution: check that busy is false and restarted all services from feed import export onwards (no QA …)
Date: 2022-10-11
Host: pci01
Component: mapping_service
Problem: mapping service log: [2022-10-11 15:01: INFO/USER] get_category_lookup_v3 - loaded 4846 picalike categories in 815770 ms, i.e. ~14 min instead of the usual ~100 ms.
Solution: None so far. The feed import also timed out twice.
Date: 2022-10-13
Host: pci01
Component: feed_import
Problem: the mongo insert into the history / metadb took about 15 min per batch and the speed never increased
Solution: if possible wait for the other import/exports to finish, then restart the container and run swiss_army_knife/v5/restart_feed_import_export.py to restart the pending shops.
Date: 2022-10-13
Host: pci01
Component: feed_import, mapping_service
Problem: the feed import sent single requests per brand/gender/category, which overloaded the mapping service. Due to time-outs, zalando constantly failed to import.
Solution: Both components were modified to allow batching.
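A hedged sketch of the batching idea; the endpoint, port and payload shape are assumptions, not the real mapping-service API:
import requests

MAPPING_URL = "http://pci01.picalike.corpex-kunden.de:9999/map_genders"   # hypothetical endpoint and port

def chunks(items, size=500):
    for i in range(0, len(items), size):
        yield items[i:i + size]

def map_genders(genders):
    mapping = {}
    for batch in chunks(sorted(set(genders))):      # deduplicate first, then send bounded batches
        resp = requests.post(MAPPING_URL, json={"values": batch}, timeout=30)
        resp.raise_for_status()
        mapping.update(resp.json())                 # hypothetical response shape: {raw: mapped}
    # one request per 500 values instead of one request per product keeps the service responsive
    return mapping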