====== Interest Scores V1 ======

The interest scores are stored in the psql2 live database and uploaded to the reporting DB each week. The procedure is described in the v5_sim_api git and for now is done with two scripts:
/home/picalike/v5/v5_extractor/sim_hash
and is triggered by the ''%%/home/picalike/v5/scripts/refresh_daily_prelive.sh%%'' script, since the step needs a finished import for both live and prelive. The script determines all 'unhashed' products and performs the cluster step for them. The result goes into the relation ''%%sim_clusters%%''.
[1] https://git.picalike.corpex-kunden.de/incubator/v5-extractor/-/blob/master/sim_hash/scripts/simi_hash.py
===== Calculation =====
The new method uses the relation 'simshot' instead of 'oneshot'. The relation is now a materialized view that is refreshed after the import is done, also by the refresh_daily_prelive.sh script.
To perform a full int score calculation for all feeds with competitors, just call:
python3 scripts/db_intscores.py --db-uri postgresql://docker:live_sfHjZ0i6GYKc2hIh@v220201062212128885.bestsrv.de:5401/products --full
The actual procedure is very similar to the old one and in most cases fully backward compatible. The only filters used are:
* picalike_category
* picalike_gender
* updated_weeks is set to the previous week
* similarity threshold of 0.85 (of 512 bits)
The results are stored in the table ''%%new_intscores%%'', which is 100% compatible with the old table ''%%interest_score%%''.
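To make the similarity filter above concrete, here is a minimal sketch of a threshold check on 512-bit hashes. It assumes that similarity is the fraction of matching bits (Hamming similarity); this page does not state the exact formula, and the names ''%%hash_similarity%%'' and ''%%is_similar%%'' are hypothetical and not taken from the v5 code.

```python
HASH_BITS = 512    # hash length mentioned in the filter list
THRESHOLD = 0.85   # similarity threshold mentioned in the filter list


def hash_similarity(a: int, b: int, bits: int = HASH_BITS) -> float:
    """Fraction of bit positions in which two hashes agree (assumed metric)."""
    mask = (1 << bits) - 1
    differing = bin((a ^ b) & mask).count("1")  # Hamming distance
    return 1.0 - differing / bits


def is_similar(a: int, b: int, threshold: float = THRESHOLD) -> bool:
    """True if the two hashes reach the similarity threshold."""
    return hash_similarity(a, b) >= threshold
```

Under this assumption, 0.85 of 512 bits means that two hashes may differ in at most 76 bit positions to still count as similar.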
====== Database ======
The table to store the values is interest_score.
====== Backup ======
Just in case, a simple dump script is available on psql02 to export the scores from the database as a sequence of COPY commands.
python3 scripts/dump_intscores.py > backup/intscores_16dec2020.sql
====== Cronjob ======
dev02 → crontab -l
The interest scores are calculated daily after the live import. Usually this yields only a small number of new interest scores, as most of them are calculated on Monday in the weekly refresh script (a new calendar week means a new timespan for interest scores, so new interest scores are calculated for all products here).
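The weekly timespan can be illustrated with a short sketch that determines the previous calendar week for a given day. It assumes ISO week numbering (weeks start on Monday), which fits the Monday refresh but is not confirmed by this page; ''%%previous_calendar_week%%'' is a hypothetical helper, not a function from the scripts.

```python
from datetime import date, timedelta
from typing import Tuple


def previous_calendar_week(today: date) -> Tuple[int, int]:
    """Return (ISO year, ISO week) of the week before the one containing `today`."""
    # Monday of the current week (weekday() is 0 for Monday).
    monday = today - timedelta(days=today.weekday())
    # Any day in the previous week works; take its Monday.
    previous_monday = monday - timedelta(days=7)
    iso = previous_monday.isocalendar()
    return iso[0], iso[1]
```

Any run between Monday and Sunday of the same week returns the same result, which is consistent with the daily runs adding only few new scores after the Monday refresh.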
scripts:
/home/picalike/v5/scripts/refresh_weekly.sh
/home/picalike/v5/scripts/refresh_daily_live.sh
The upload script checks for new int scores every x hours and uploads them on demand.
/home/picalike/v5/scripts/upload_int_scores.sh
====== Possible Issues ======
If the upload of int scores is cancelled like in the case below, nothing has to be done, since the next upload will automatically import the missing scores.
Traceback (most recent call last):
File "scripts/push_intscores.py", line 169, in <module>
main()
File "scripts/push_intscores.py", line 155, in main
num_inserted = incremental_transfer(args.source_uri, target_uri, args.report_date, verbose=args.verbose)
File "scripts/push_intscores.py", line 65, in incremental_transfer
target_cur.executemany("INSERT INTO interest_scores_import (picalike_id,report_date,interest_score) VALUES(%s, %s, %s) ON CONFLICT DO NOTHING", inserts)
psycopg2.errors.AdminShutdown: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The calculation of interest scores is different: its next scheduled run is only on the following day, so if the calculation itself was interrupted, consider restarting it manually.
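Why a cancelled upload repairs itself can be shown with a small sketch of the ''%%ON CONFLICT DO NOTHING%%'' insert from the traceback. It uses an in-memory SQLite database (version 3.24 or newer) as a stand-in for the PostgreSQL target and assumes a unique key on ''%%(picalike_id, report_date)%%''; the real table definition is not shown on this page, and ''%%make_target%%'' and ''%%upload%%'' are hypothetical helpers.

```python
import sqlite3

# Same statement shape as in push_intscores.py, with SQLite placeholders.
INSERT_SQL = (
    "INSERT INTO interest_scores_import (picalike_id, report_date, interest_score) "
    "VALUES (?, ?, ?) ON CONFLICT DO NOTHING"
)


def make_target() -> sqlite3.Connection:
    """Create an in-memory stand-in for the target table (key is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interest_scores_import ("
        "picalike_id TEXT, report_date TEXT, interest_score REAL, "
        "PRIMARY KEY (picalike_id, report_date))"
    )
    return conn


def upload(conn: sqlite3.Connection, rows) -> int:
    """Insert rows, skipping existing ones, and return the number of new rows."""
    count_sql = "SELECT COUNT(*) FROM interest_scores_import"
    before = conn.execute(count_sql).fetchone()[0]
    conn.executemany(INSERT_SQL, rows)
    conn.commit()
    after = conn.execute(count_sql).fetchone()[0]
    return after - before
```

If a first call only gets through part of the rows, calling ''%%upload%%'' again with the full set inserts just the missing rows and leaves the existing ones untouched.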
keywords: v5 osa interest scores psql02