OnSight Analytics Interface
branch01 is the home of this service only during development; there is no guarantee regarding uptime or version.
Host: http://branch01.picalike.corpex-kunden.de:9095/
Host: http://branch01.picalike.corpex-kunden.de:9995/ (no request timeout; please send only one request at a time)
Example: http://branch01.picalike.corpex-kunden.de:9095/hello
git
Available in git as:
$ git clone ssh://picalike@sg01.picalike.corpex-kunden.de/home/picalike/repositories/onsight_analytics.git
General
MongoDB-Collections
attributes (all picalike attributes)
categories (all picalike categories)
genders (all picalike genders)
Shared use
sessions
APIs
http://<host>:<port>/<api>
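A minimal sketch of the URL scheme above, using only the Python standard library. The helper names `api_url` and `json_post` are illustrative and not part of the service, the `Content-Type` header is an assumption based on the "(POST/json)" marker, and the host and port are the development values listed at the top of this page. The request is only prepared, not sent.

```python
import json
import urllib.request

HOST = "branch01.picalike.corpex-kunden.de"  # development host from this page
PORT = 9095


def api_url(api, host=HOST, port=PORT):
    """Build http://<host>:<port>/<api> (hypothetical helper)."""
    return "http://{}:{}/{}".format(host, port, api.lstrip("/"))


def json_post(api, payload):
    """Prepare (but do not send) a POST/json request for the given API."""
    return urllib.request.Request(
        api_url(api),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


# "some_shop" is a placeholder shop_id.
req = json_post("/update_solr_index", {"shop_id": "some_shop", "session_history": 3})
print(req.full_url)
```

To actually send the request, pass `req` to `urllib.request.urlopen` and decode the body with `json.load`.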
Solr Search
/update_solr_index (POST/json)
creates a new index in solr for shop_id or updates an existing one
in:
shop_id: <str>
session_history: <int>
Data used for update must have a session >= (latest_session - session_history)
out:
{}
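The session condition above can be sketched as a simple filter. This is only an illustration of the inequality, assuming sessions are integers; the record layout (`pid`, `session`) is hypothetical and not taken from the service.

```python
def in_session_window(session, latest_session, session_history):
    """True if session >= (latest_session - session_history)."""
    return session >= latest_session - session_history


# Hypothetical records; only the "session" field matters here.
records = [
    {"pid": "a", "session": 7},
    {"pid": "b", "session": 9},
    {"pid": "c", "session": 10},
]
latest = max(r["session"] for r in records)
used = [r["pid"] for r in records if in_session_window(r["session"], latest, 1)]
print(used)  # ['b', 'c']
```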
/request_solr_index (POST/json)
makes a search request to solr
in:
shop_id: <str> or <list of str>
query: <str>
start_date: yyyy-mm-dd <str>
end_date: yyyy-mm-dd <str>
start: <int, default: 0>
rows: <int, default: 10>
sort: <str, optional>
example for start and rows:
- start: 5 means you will skip the first 5 results
- rows: 5 means you will get 5 results
example for query:
gender:"Damen" AND name:"Schal" AND price:[* TO 2000]
example for sort:
price desc
price asc
known field names for now: pid, price, name, text, brand, gender, images, picalike_gender, picalike_cat, shop_cat
more fields may be added in the future
out:
{"message": <str>, "response": {"result_list": [<result_list>], "total_match": <int>, "stats":{"price": {"min": <int>, "max": <int>, "mean": <float>, "median": <int>}}}}
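As a sketch of how `start` and `rows` combine into paging, the payload for a given result page could be built as follows. The helper name `solr_search_payload` and the zero-based `page` argument are assumptions for illustration; the field names come from the parameter list above, and "some_shop" is a placeholder.

```python
def solr_search_payload(shop_id, query, start_date, end_date,
                        page=0, rows=10, sort=None):
    """Build a /request_solr_index payload for a zero-based result page."""
    payload = {
        "shop_id": shop_id,
        "query": query,
        "start_date": start_date,  # yyyy-mm-dd
        "end_date": end_date,      # yyyy-mm-dd
        "start": page * rows,      # number of results to skip
        "rows": rows,              # number of results to return
    }
    if sort is not None:
        payload["sort"] = sort     # e.g. "price desc"
    return payload


payload = solr_search_payload(
    shop_id="some_shop",
    query='gender:"Damen" AND name:"Schal" AND price:[* TO 2000]',
    start_date="2019-01-01",
    end_date="2019-01-31",
    page=2,
    rows=5,
    sort="price desc",
)
print(payload["start"], payload["rows"])  # 10 5
```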
/is_synchronized (POST/json)
checks if session number is the same for import and solr
in:
{"shop_ids": [<str>]}
out:
{<shop_id>: <bool>}
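A small sketch of consuming the response shape above: collecting the shops whose import and solr sessions differ. The helper name and the sample shop_ids are hypothetical.

```python
def unsynchronized_shops(response):
    """Return the shop_ids whose value is False, sorted by name."""
    return sorted(shop_id for shop_id, synced in response.items() if not synced)


# Sample data in the documented {<shop_id>: <bool>} shape.
sample = {"shop_a": True, "shop_b": False, "shop_c": False}
print(unsynchronized_shops(sample))  # ['shop_b', 'shop_c']
```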
Shops
/translate_id (POST)
translate apikey ↔ shop_id for feeds
in:
shop_id: <str> OR apikey: <str>
out:
{"message": <str>, "response": {<str> (apikey or shop_id)}}
/get_shops (GET)
returns all unique shop_ids of shops that have been crawled during the last week (7*24*60*60 sec)
no cache; queries MongoDB directly
in:
{}
in_opt:
start_date=<str> (yyyy-mm-dd)
with_feed=<bool>
out:
{"message": <str>, "response": {"shop_ids": [<str>], "infos":{ <shop_id>: {"segment": <str>, "shop_name": <str>, "type": <str>}}}}
/get_shop_crawler_mapping (POST/json)
returns all crawler_shop_ids which are mapped to the shop_id
in:
shop_id: <str>
out:
{"message": <str>, "response": {"shop_ids": [<str>], "infos":{ <shop_id>: {"segment": <str>, "shop_name": <str>, "type": <str>}}}}
/remove_shop_crawler_mapping (POST/json)
remove all crawler_shop_ids which are mapped to the shop_id
in:
shop_id: <str>
out:
{"message": <str>, "response": {}}
/add_shop_crawler_mapping (POST/json)
update mapping of crawler_shop_ids to shop_id (will overwrite previous entries)
writes to shop_crawler_mapping in mongoDB
in:
shop_id: <str>
crawler_shops: [<str>]
out:
{"message": <str>, "response": {}}
/add_inspiration_shops (POST/json)
update mapping of inspiration_shop_ids to shop_id (will overwrite previous entries)
writes to shop_crawler_mapping in mongoDB
in:
shop_id: <str>
inspiration_shops: [<str>]  # list of shop_ids
out:
{"message": <str>, "response": {}}
/get_inspiration_shops (POST/json)
returns all inspiration_shop_ids which are mapped to the shop_id
in:
shop_id: <str>
out:
{"message": <str>, "response": {"shop_ids": [<str>], "infos":{ <shop_id>: {"segment": <str>, "shop_name": <str>, "type": <str>}}}}
/get_shop_stats (POST/json)
returns stats for a shop
in:
shop_ids: [<str>]
include_unmapped: <bool>
max_age: <int, days, default: 30>
out:
{"message": <str>, "response": {
  "total": <int>  # total number of mapped products
  "by_cat": <dict>  # picalike_cat -> (absolute count, percent)
  # only if include_unmapped == True
  "unmapped_count": <int>
  "unmapped": <dict>  # <str, shop_cat> -> <int, absolute count>
}}
Products
/get_product_history (POST/json)
returns history for given products
in:
picalike_ids: [<str>]
out:
{"message": <str>, "response": { <picalike_id_1>: [{ "timestamp": <float>, "price": <int>, "sort_key": <int>, "position": [{"cat": [<str>], "pos": <float>}, ...]}, ...], <picalike_id_2>: [], ...}}
For each picalike_id, the array is sorted by "timestamp". The price is in cents (int).
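A sketch of post-processing one product's history: converting the timestamp to a datetime and the price from cents to currency units. It assumes that "timestamp" is a Unix epoch value in seconds, which the page does not state explicitly; the sample response and the ids in it are made up.

```python
from datetime import datetime, timezone
from decimal import Decimal


def price_points(history):
    """Convert history entries to (UTC datetime, price in currency units)."""
    return [
        (
            # Assumption: timestamp is Unix epoch seconds.
            datetime.fromtimestamp(entry["timestamp"], tz=timezone.utc),
            Decimal(entry["price"]) / 100,  # price is documented as cents
        )
        for entry in history
    ]


# Made-up response in the documented shape.
sample_response = {
    "message": "OK",
    "response": {
        "id_1": [
            {"timestamp": 1546300800.0, "price": 1999, "sort_key": 1, "position": []},
            {"timestamp": 1546387200.0, "price": 1599, "sort_key": 2, "position": []},
        ],
        "id_2": [],
    },
}
points = price_points(sample_response["response"]["id_1"])
print(points[-1][1])  # 15.99
```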
Miscellaneous
/get_attributes (GET)
returns all picalike attributes
no cache; queries MongoDB directly
in:
{}
out:
{"message": <str>, "response": {"attributes": [<str>]}}
Categories
/get_categories (GET)
returns all picalike categories
no cache; queries MongoDB directly
in:
{}
out:
{"message": <str>, "response": {"categories": [<str>]}}
/get_shop_categories (GET)
returns all category paths of the shop from the last session
the first request queries MongoDB, later requests are served from the cache; pass use_cache=false to refresh the cache
in:
shop_id=<str>
in_opt:
use_cache=false (default is true)
session_history=<int, default=1>
Returned data must have a session >= (latest_session - session_history)
out:
{"message": <str>, "response": {"categories": [ list(<str>) ]}}
list(<str>) gives the tree path of one shop category, e.g. ["Damen", "Hosen", "Jeans"]
/add_category_relation (POST/json)
add a shop_category → picalike_category relation
in:
shop_id: <str>
p_cat: <picalike category str>
shop_cat: <shop category object>
see also /get_categories and /get_shop_categories
out:
/remove_category_relation (POST/json)
remove a shop_category → picalike_category relation
in:
shop_id: <str>
shop_cat: <shop category object>
out:
/get_category_relation (GET)
get all relations for shop_id
in:
shop_id=<str>
out:
{"message": <str>, "response": [ tuple(<picalike category str>, <shop category object>) ]}
The tuple is stored in JSON as a list with two elements.
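Since each relation arrives as a two-element JSON list, a client can unpack it back into tuples. The sketch below assumes that the shop category object is a category path list as returned by /get_shop_categories; the sample values are illustrative only.

```python
import json

# Illustrative response body; the shop category is assumed to be a path list.
raw = (
    '{"message": "OK", "response": '
    '[["fashion_leg_pants", ["Damen", "Hosen", "Jeans"]]]}'
)
relations = [(p_cat, shop_cat) for p_cat, shop_cat in json.loads(raw)["response"]]

# Group the shop categories by picalike category.
by_picalike_cat = {}
for p_cat, shop_cat in relations:
    by_picalike_cat.setdefault(p_cat, []).append(shop_cat)
print(by_picalike_cat)
```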
/get_top_categories (GET)
get the top categories across all shops in shop_id combined
in:
shop_id=[<str>]
in_opt:
top_n=<int>
out:
{"message": <str>, "response": {"n_products": <int>, "top_cats": [ tuple(<picalike category str>, <number of products int>) ]}}
Categories are ordered from most to least frequent.
/get_categorization (POST)
get picalike category suggestions for origin shop category
in:
shop_id:<str>
in_opt:
shop_cat:[<str>, <str>,...] # optional: default looks for all existing shop categories
out:
{"message": <str>, "response": [{"shop_cat": "Damen_Hosen_Lange Hosen", "picalike_cat": "fashion_leg_pants", "rel_freq": 1.0}]}
http://branch01.picalike.corpex-kunden.de:5003/prepare_categorization (GET)
prepares suggestions of picalike categories for origin shop categories based on the relative frequencies of the picalike categories in an origin shop category. Saves results in MongoDB:
Sandy Picalike --> patrick_test --> categorization
Brands
/get_shop_brands (POST)
returns all brands of the shop from the last session
in:
shop_id:<str>
out:
{"message": <str>, "response": {"brands": [ <str> ], "count":{<brand>: <int>}}}
Genders
/get_genders (GET)
returns all picalike genders
no cache; queries MongoDB directly
in:
out:
{"message": <str>, "response": {"genders": [<str>]}}
/get_shop_genders (GET)
returns all genders of the shop from the last session
the first request queries MongoDB, later requests are served from the cache; pass use_cache=false to refresh the cache
in:
shop_id=<str>
in_opt:
use_cache=false (default is true)
out:
{"message": <str>, "response": {"genders": [ <str> ]}}
/add_gender_relation (POST/json)
add a shop_genders → picalike_genders relation
in:
shop_id: <str>
p_gender: <picalike genders str>
shop_gender: <shop genders str>
out:
/remove_gender_relation (POST/json)
remove a shop_genders → picalike_genders relation
in:
shop_id: <str>
shop_gender: <shop genders str>
out:
/get_gender_relation (GET)
get all relations for shop_id
in:
shop_id=<str>
out:
{"message": <str>, "response": [ tuple(<picalike genders str>, <shop genders str>) ]}
The tuple is stored in JSON as a list with two elements.
Reports
/existing_report (GET)
checks if report name exists
in:
report_name=<str>
shop_id=<str>
out:
{"message": <str>, "response": {"exists": <bool>}}
/list_reports (GET)
return all reports for given product and shop
in:
shop_id=<str>
product_id=<str>
out:
{"message": <str>, "response": [{"report_id": <str>, "user_id": <str>, "report_name": <str>}]}
/get_all_reports (POST)
return all reports that match the filters (only manually created reports)
in:
all optional:
shop_id: <str>
user_id: <str>
product_id: <str>
date_from: <str, "%m/%d/%Y">  # e.g. "01/31/2019"
date_till: <str, "%m/%d/%Y">
out:
{"message": <str>, "response": [{"report_id": <str>, "user_id": <str>, "report_name": <str>, ...}]}
/add_report (POST/json)
add a new report or edit if report ID is provided
in:
shop_id: <str>
product_id: <str>
user_id: <str>
report_name: <str>
filter: { see below }
in_opt:
report_id: <str>
out:
{"message": <str>, "response": {"report_id": <str>}}
filter explanation:
cluster_dist: <float> - similarity
cluster_price_from: <str> - e.g. "10,00"
cluster_price_till: <str> - e.g. "29,99"
cluster_trendsetters: <list of base64 encoded shop_ids>
cluster_competitors: <list of base64 encoded shop_ids>
cluster_brands: <list of brand names>
cluster_genders: <list of picalike gender names>
cluster_date_range: <str> - e.g. "01/13/2019 - 01/17/2019"
cluster_categories: <list of picalike category names>
cluster_genders can contain the value "all", which needs special treatment.
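A sketch of assembling such a filter in the formats listed above. The helper names are hypothetical, the base64 step assumes UTF-8 input and the standard alphabet (the page does not say which variant is expected), and all values are placeholders.

```python
import base64
from datetime import date


def encode_shop_id(shop_id):
    """Base64-encode a shop_id (assumes UTF-8 and standard base64)."""
    return base64.b64encode(shop_id.encode("utf-8")).decode("ascii")


def format_price(cents):
    """Format a price given in cents with a decimal comma, e.g. 1000 -> '10,00'."""
    return "{},{:02d}".format(cents // 100, cents % 100)


def format_date_range(start, end):
    """Format two dates as 'mm/dd/YYYY - mm/dd/YYYY'."""
    return "{} - {}".format(start.strftime("%m/%d/%Y"), end.strftime("%m/%d/%Y"))


report_filter = {
    "cluster_dist": 0.5,  # placeholder similarity value
    "cluster_price_from": format_price(1000),
    "cluster_price_till": format_price(2999),
    "cluster_trendsetters": [encode_shop_id("trendsetter_shop")],
    "cluster_competitors": [encode_shop_id("competitor_shop")],
    "cluster_brands": [],
    "cluster_genders": ["all"],
    "cluster_date_range": format_date_range(date(2019, 1, 13), date(2019, 1, 17)),
    "cluster_categories": ["fashion_leg_pants"],
}
print(report_filter["cluster_date_range"])  # 01/13/2019 - 01/17/2019
```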
/remove_report (POST/json)
remove a report
in:
report_id: <str>
out:
{"message": <str>}
/get_report (GET)
get a report
in:
report_id=<str>
out:
{"message": <str>, "response": {"report_id": <str>, "report_name": <str>, "product_id": <str>, "shop_id": <str>, "user_id": <str>, "filter": {}, "date": <date>}}
/exclude_product (POST)
remove product from results
in:
{"report_id": <str>, "picalike_id": <str>, "user_id": <str>}
/unexclude_product (POST)
remove product from exclude list
in:
{"report_id": <str>, "picalike_id": <str>}
/get_excluded (GET)
get list of excluded products
in:
report_id=<str>
out:
{"message": <str>, "response": {"excluded": [(<str: picalike_id>, <str: user_id>)]}}
Cluster
/comp_cluster (POST/json)
creates all the data that is needed by comp_cluster.php. It is also used to create automatic reports
in:
shop_id: <str>
prod_id: <str> or image: <base64 encoded image bytes>
optional:
report_id: <str>  # if given, the report is stored immediately
price: <int, price in cents>  # only needed if image is given
start: <int, default: 0>
rows: <int, default: 10>
limit: <int, default: 500>  # used for nearest neighbor search
filter fields with the same names and in the same format as in /add_report
out:
shop_id: <str>
prod_id: <str>  # None if image was supplied
reco_data: <dict of data that is given to the reco algorithm>, currently: { min_price, max_price, mean_price, ref_price }
reco: { timestamp, error, uncertain, huge_markup, price_range, margin_range }
cluster_trend: <int>
excluded: <list of tuples, (picalike_id, user_id)>
total_match: <int>  # total number of solr matches
cluster: <list of cluster products + distance, prod_trend and prod_history>
if report_id is supplied, the report will be saved on the fly
reco_data is a dictionary with the required data to calculate the recommendation. Currently the fields min_price, max_price, mean_price, ref_price are needed.
reco contains at least timestamp and error. error can be either a boolean or a string. uncertain is a boolean that is set to True if the algorithm does not want to give a recommendation. huge_markup is a boolean that is set to True if the price of the reference article is significantly below the mean price in the cluster. price_range returns a list of prices [low, high]. margin_range is the relative price increase in percent [low, high]
cluster_trend indicates whether the products in the cluster of similar products from trendsetter shops have recently become more popular (close to 1) or less popular (close to -1). A value of 0 indicates no recent change in popularity. The value 1 indicates fast-growing popularity; the value -1 indicates a negative trend.
cluster contains a list of similar products. The documents are the solr responses with the additional fields distance, prod_trend and prod_history. The reference product is always at the beginning of the list. The data is generated from similar products of competitor shops.
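Because `error` can be either a boolean or a string, a client needs to handle both cases before reading the recommendation. The sketch below shows one way to do this; the function name, the order of the checks and the sample values are assumptions, not the service's behaviour.

```python
def reco_status(reco):
    """Classify a 'reco' dict from /comp_cluster into a short status string."""
    error = reco.get("error")
    if error:  # True or a non-empty error string
        return "error" if error is True else "error: {}".format(error)
    if reco.get("uncertain"):
        return "no recommendation (uncertain)"
    low, high = reco["price_range"]  # documented as [low, high]
    note = " (huge markup)" if reco.get("huge_markup") else ""
    return "recommended price {} to {}{}".format(low, high, note)


# Made-up sample in the documented shape.
sample = {
    "timestamp": 1556526244.0,
    "error": False,
    "uncertain": False,
    "huge_markup": True,
    "price_range": [1599, 1999],
    "margin_range": [5.0, 25.0],
}
print(reco_status(sample))  # recommended price 1599 to 1999 (huge markup)
```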
/get_cluster_report (POST/json)
returns the most recent results of the batch processing of the cluster reports
in:
report_id: <str> - can be a report_id or a picalike_id
out:
data: <dict> - a merge of 'reco_data' and 'reco' from /comp_cluster
Trends
http://branch01.picalike.corpex-kunden.de:5002/prepare_trends (GET)
ONLY USE DAILY - PROCESS TAKES MORE THAN 30 MIN
Receives a GET request and downloads the product infos from solr and from the API "http://branch01.picalike.corpex-kunden.de:9095/get_product_history" to find trendy products. It then forecasts the future trend with double exponential smoothing and saves the results in MongoDB.
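For readers unfamiliar with the forecasting step, here is a generic sketch of double exponential smoothing (Holt's linear method). It illustrates the general technique only; the initialisation, the smoothing parameters and the input series used by the service are not documented on this page and are assumptions here.

```python
def double_exponential_smoothing(series, alpha, beta, horizon=1):
    """Holt's linear method: return the forecast `horizon` steps ahead."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Common initialisation: first value as level, first difference as trend.
    level, trend = series[0], series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + horizon * trend


print(double_exponential_smoothing([1.0, 2.0, 3.0, 4.0], alpha=0.5, beta=0.5))  # 5.0
```

On a perfectly linear series the forecast continues the line, which is a quick sanity check for the level and trend updates.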
/sort_trend_products (POST/json)
returns the most trendy products from the most recent trend analysis stored in MongoDB. Only products from shops that are mapped to the given shop are considered.
in:
shop_id: <str>  # only feed shops accepted
optional:
limit: <int>  # default: 10
category: [<str>]  # default: []
gender: [<str>]  # default: ["Men", "Women", "undefined"] --> some products do not have gender info
selected_shops: [<str>]  # default: [] --> looks for all shops
out:
{"message": <str>, "response": [{"product_id": <str>, "category": <str>, "datetime": <datetime>, "gender": <str>, "pos_mov": <float>, "shop_id": <str>}]}
/sort_trend_brands (POST/json)
returns the most trendy brands from the most recent trend analysis stored in MongoDB. Only products from shops that are mapped to the given shop are considered.
in:
shop_id: <str>  # only feed shops accepted
optional:
limit: <int>  # default: 10
category: [<str>]  # default: []
gender: [<str>]  # default: ["Men", "Women", "undefined"] --> some products do not have gender info
selected_shops: [<str>]  # default: [] --> looks for all shops
out:
{"message": <str>, "response": [{"brand": <str>, "category": [<str>], "datetime": <datetime>, "gender": [<str>], "pos_mov": <float>, "shop_id": [<str>]}]}
/sort_trend_category (POST/json)
returns the most trendy categories from the most recent trend analysis stored in MongoDB. Only products from shops that are mapped to the given shop are considered.
in:
shop_id: <str>  # only feed shops accepted
optional:
limit: <int>  # default: 10
gender: [<str>]  # default: ["Men", "Women", "undefined"] --> some products do not have gender info
selected_shops: [<str>]  # default: [] --> looks for all shops
out:
{"message": <str>, "response": [{"category": <str>, "datetime": <datetime>, "gender": [<str>], "pos_mov": <float>, "shop_id": [<str>]}]}
/sort_trend_colors (POST/json)
returns the most trendy colors from the most recent trend analysis stored in MongoDB. Only products from shops that are mapped to the given shop are considered.
in:
shop_id: <str>  # only feed shops accepted
optional:
limit: <int>  # default: 10
category: [<str>]  # default: []
gender: [<str>]  # default: ["Men", "Women", "undefined"] --> some products do not have gender info
selected_shops: [<str>]  # default: [] --> looks for all shops
out:
{"message": <str>, "response": [{"color": <str>, "category": [<str>], "datetime": <datetime>, "gender": [<str>], "pos_mov": <float>, "shop_id": [<str>]}]}
/sort_trend_attributes (POST/json)
returns the most trendy attributes from the most recent trend analysis stored in MongoDB. Only products from shops that are mapped to the given shop are considered.
in:
shop_id: <str>  # only feed shops accepted
optional:
limit: <int>  # default: 10
category: [<str>]  # default: []
gender: [<str>]  # default: ["Men", "Women", "undefined"] --> some products do not have gender info
selected_shops: [<str>]  # default: [] --> looks for all shops
out:
{"message": <str>, "response": [{"attribute": <str>, "category": [<str>], "datetime": <datetime>, "gender": [<str>], "pos_mov": <float>, "shop_id": [<str>]}]}
/get_sim_trend_products (POST/json)
looks for products in the feed shop that are similar to the given trend product. This API is currently used by /sort_sim_trend_products.
in:
{"shop_id_feed": <str>,
 "shop_id_crawler": <str>,
 "product_id": <int>,
 "distance_max": <float>  # optional: limits the similarity distance, default is 0.5
}
out:
{"message": "OK", "response": {"product_ids": ["96607881#bonprix_de_feed", "91524296#bonprix_de_feed", "90838081#bonprix_de_feed", "95056195#bonprix_de_feed", "95020895#bonprix_de_feed"]}}
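The returned ids in the example appear to have the form `<product_id>#<shop_id>`. Assuming that reading of the example is correct (the page does not define the format), they can be split as follows; the helper name is hypothetical.

```python
def split_product_id(value):
    """Split '<product_id>#<shop_id>' at the first '#' (assumed format)."""
    product_id, _, shop_id = value.partition("#")
    return product_id, shop_id


ids = ["96607881#bonprix_de_feed", "91524296#bonprix_de_feed"]
print([split_product_id(v) for v in ids])
```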
http://frontend01-hpc.picalike.corpex-kunden.de:5003/get_cat_trends (GET)
Returns the most trendy products from a category of one shop. Only returns available products. Products sorted in descending order by cluster_trend.
in:
shop_id: <str>
prod_id: <int>  # becomes optional if "i" is given as the product reference
optional:
limit: <int>  # default: 10
brand: <str> or [<str>, <str>, ...]  # default: all
shop_cat: <str> or [<str>, <str>, ...]  # default: all
gender: <str> or [<str>, <str>, ...]  # default: all
size: <int> or [<int>, <int>, ...]  # CURRENTLY NOT POSSIBLE TO FILTER. default: all
max_price: <int>  # default: no max
min_price: <int>  # default: no min
category_limitation: True/False  # default: True
i: <picalike_id>  # if "i" is set, gender and category are the same as for the given picalike_id. Default: None
out:
{"count": 10, "description": "Category Trends", "generator": "http://picalike.com", "modified": "2019-04-29 08:24:04.480205", "title": "picalike Request", "ids": {"0": {"brand": "NO", "cluster_trend": 0.0, "extraimg": "https://www.witt.eu/product/resized/029/029.00K3F.072-127.002.u_5.jpg", "gender": "Damenmode", "shop_cat":"Frauen_Bekleidung_Hosen_Sweatpants", "id": "184097", "img": "https://www.witt.eu/product/resized/027/027.00KAT.022-123.013.i_5.jpg", "location": "https://www.witt-weiden.de/287202?articleNumber=184097", "name": "Hose", "price": 1599}, "1":{...}} }
Krawla
http://branch01.picalike.corpex-kunden.de:9095/get_krawla_sum (GET)
in:
shop_id: <str> # only feed shops accepted
out:
{"message": <str>, "response": {"w1": {"total_all" : <int>, "total_new_prod" : <int>, "total_sale" : <int> }, "w2": {"total_all" : <int>, "total_new_prod" : <int>, "total_sale" : <int> }, "w3": {"total_all" : <int>, "total_new_prod" : <int>, "total_sale" : <int> }, "w4": {"total_all" : <int>, "total_new_prod" : <int>, "total_sale" : <int> }, "w_all" : {"total_all" : <int>, "total_new_prod" : <int>, "total_sale" : <int> } } }