OnSight Analytics Interface

branch01 hosts the service only during development; there is no guarantee for uptime or version.

Host: http://branch01.picalike.corpex-kunden.de:9095/

Host: http://branch01.picalike.corpex-kunden.de:9995/ (no request timeout; please send only one request at a time)

Example: http://branch01.picalike.corpex-kunden.de:9095/hello

git

Available in git as:

$ git clone ssh://picalike@sg01.picalike.corpex-kunden.de/home/picalike/repositories/onsight_analytics.git

General

MongoDB-Collections

attributes (alle picalike-Attribute)
categories (alle picalike-Kategorien)
genders (alle picalike-Genders)

also used

sessions

APIs

http://<host>:<port>/<api>
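The URL pattern above can be sketched as a small helper that prepares a POST/json request without sending it. The function name `build_json_post`, the `Content-Type: application/json` header, and the shop id `example_shop` are assumptions for illustration; the source only states the URL pattern and the "POST/json" label.

```python
import json
import urllib.request


def build_json_post(host, port, api, payload):
    """Prepare (but do not send) a POST request for http://<host>:<port>/<api>."""
    url = f"http://{host}:{port}/{api.lstrip('/')}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: endpoints marked "POST/json" expect a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_json_post(
    "branch01.picalike.corpex-kunden.de",
    9095,
    "/is_synchronized",
    {"shop_ids": ["example_shop"]},  # hypothetical shop id
)
```

Sending the prepared request would then be a call such as `urllib.request.urlopen(req, timeout=30)`.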

Solr Search

/update_solr_index (POST/json)

creates a new index in solr for shop_id or updates an existing one

in:

shop_id: <str>
session_history: <int>

Data used for update must have a session >= (latest_session - session_history)

out:

{}
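The session rule for /update_solr_index can be illustrated with a short filter. This is only a sketch of the stated condition, assuming sessions are plain integers; the function name `sessions_for_update` is hypothetical and not part of the service.

```python
def sessions_for_update(sessions, session_history):
    """Keep the sessions that satisfy session >= (latest_session - session_history)."""
    if not sessions:
        return []
    latest_session = max(sessions)
    return [s for s in sessions if s >= latest_session - session_history]


# With sessions 1..10 and session_history=2, only sessions 8, 9 and 10 qualify.
recent = sessions_for_update(list(range(1, 11)), 2)
```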

/request_solr_index (POST/json)

makes a search request to Solr

in:

shop_id: <str> or <list of str>
query: <str>
start_date: yyyy-mm-dd <str>
end_date: yyyy-mm-dd <str>
start: <int, default:0> 
rows: <int, default:10>
sort: <str, optional>

example for start and rows:

start: 5 means you will skip the first 5 results
rows: 5 means you will get 5 results

example for query:

gender:"Damen" AND name:"Schal" AND price:[* TO 2000]

example for sort:

price desc
price asc

known field names for now: pid, price, name, text, brand, gender, images, picalike_gender, picalike_cat, shop_cat. More fields may be implemented in the future.

out:

{"message": <str>, "response": {"result_list": [<result_list>], "total_match": <int>, "stats":{"price": {"min": <int>, "max": <int>, "mean": <float>, "median": <int>}}}}
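The input fields of /request_solr_index, including the start/rows paging described above, could be assembled as follows. This is a sketch only: the helper name `solr_page_request`, the zero-based `page` parameter, and the shop id `example_shop` are assumptions introduced for illustration.

```python
from datetime import date


def solr_page_request(shop_id, query, start_date, end_date, page=0, rows=10, sort=None):
    """Build the JSON body for one result page of /request_solr_index."""
    payload = {
        "shop_id": shop_id,                    # <str> or <list of str>
        "query": query,
        "start_date": start_date.isoformat(),  # yyyy-mm-dd
        "end_date": end_date.isoformat(),      # yyyy-mm-dd
        "start": page * rows,                  # number of results to skip
        "rows": rows,                          # number of results to return
    }
    if sort is not None:
        payload["sort"] = sort                 # e.g. "price asc"
    return payload


body = solr_page_request(
    "example_shop",  # hypothetical shop id
    'gender:"Damen" AND name:"Schal" AND price:[* TO 2000]',
    date(2019, 1, 13),
    date(2019, 1, 17),
    page=1,
    rows=5,
    sort="price asc",
)
```

With page=1 and rows=5 the body carries start=5, so the first five results are skipped and the next five are returned.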

/is_synchronized (POST/json)

checks if session number is the same for import and solr

in:

{"shop_ids": [<str>]}

out:

{<shop_id>: <bool>}
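A caller will usually want the shops that are not yet synchronized. A minimal sketch, assuming the response has already been decoded from JSON into a dictionary; the function name and the shop ids are made up.

```python
def unsynchronized_shops(response):
    """Return the shop_ids whose import and Solr session numbers differ."""
    return sorted(shop_id for shop_id, in_sync in response.items() if not in_sync)


# Hypothetical decoded response of /is_synchronized.
pending = unsynchronized_shops({"shop_a": True, "shop_b": False, "shop_c": False})
```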

Shops

/translate_id (POST)

translate apikey ↔ shop_id for feeds

in:

shop_id: <str>
OR
apikey: <str>

out:

{"message": <str>, "response": {<str> (apikey or shop_id)}}

/get_shops (GET)

returns all shop_ids (unique) of shops that have been crawled during the last week (7*24*60*60 sec)

no cache; queries MongoDB directly

in:

{}

in_opt:

start_date=<str> (yyyy-mm-dd)
with_feed=<bool>

out:

{"message": <str>, "response": {"shop_ids": [<str>], "infos":{ <shop_id>: {"segment": <str>, "shop_name": <str>, "type": <str>}}}}
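The "infos" part of the /get_shops response can be used to group shops, for example by segment. The sketch below assumes an already decoded response; the function name, the shop ids and the segment values are invented for illustration.

```python
from collections import defaultdict


def shops_by_segment(body):
    """Group the returned shop_ids by the "segment" field of their infos entry."""
    response = body["response"]
    grouped = defaultdict(list)
    for shop_id in response["shop_ids"]:
        # Shops without an infos entry are collected under "unknown".
        segment = response["infos"].get(shop_id, {}).get("segment", "unknown")
        grouped[segment].append(shop_id)
    return dict(grouped)


# Hypothetical response data.
sample = {
    "message": "OK",
    "response": {
        "shop_ids": ["shop_a", "shop_b", "shop_c"],
        "infos": {
            "shop_a": {"segment": "fashion", "shop_name": "A", "type": "feed"},
            "shop_b": {"segment": "fashion", "shop_name": "B", "type": "crawler"},
        },
    },
}
segments = shops_by_segment(sample)
```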

/get_shop_crawler_mapping (POST/json)

returns all crawler_shop_ids which are mapped to the shop_id

in:

shop_id: <str>

out:

{"message": <str>, "response": {"shop_ids": [<str>], "infos":{ <shop_id>: {"segment": <str>, "shop_name": <str>, "type": <str>}}}}

/remove_shop_crawler_mapping (POST/json)

remove all crawler_shop_ids which are mapped to the shop_id

in:

shop_id: <str>

out:

{"message": <str>, "response": {}}

/add_shop_crawler_mapping (POST/json)

update mapping of crawler_shop_ids to shop_id (will overwrite previous entries)

writes to shop_crawler_mapping in mongoDB

in:

shop_id: <str>
crawler_shops: [<str>]

out:

{"message": <str>, "response": {}}

/add_inspiration_shops (POST/json)

update mapping of inspiration_shop_ids to shop_id (will overwrite previous entries)

writes to shop_crawler_mapping in mongoDB

in:

shop_id: <str>
inspiration_shops: [<str>]  # list of shop_ids

out:

{"message": <str>, "response": {}}

/get_inspiration_shops (POST/json)

returns all inspiration_shop_ids which are mapped to the shop_id

in:

shop_id: <str>

out:

{"message": <str>, "response": {"shop_ids": [<str>], "infos":{ <shop_id>: {"segment": <str>, "shop_name": <str>, "type": <str>}}}}

/get_shop_stats (POST/json)

returns stats for a shop

in:

shop_ids: [<str>]
include_unmapped: <bool>
max_age: <int, days, default: 30>

out:

{"message": <str>, "response": {
  "total": <int>  # total number of mapped products
  "by_cat": <dict>  # picalike_cat -> (absolute count, percent)
  # only if include_unmapped == True
  "unmapped_count": <int>
  "unmapped": <dict>  # <str, shop_cat> -> <int, absolute count>
}}
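The stats response can be reduced to a short summary. This sketch assumes that each "by_cat" value arrives in JSON as a two-element list [absolute count, percent] and that "total" excludes unmapped products, as the comment above suggests. Function names, category names and numbers are made up.

```python
def top_categories(stats, n=3):
    """Return the n categories with the highest absolute product count."""
    ranked = sorted(stats["by_cat"].items(), key=lambda item: item[1][0], reverse=True)
    return [(cat, count) for cat, (count, _percent) in ranked[:n]]


def unmapped_share(stats):
    """Fraction of products without a picalike category (needs include_unmapped)."""
    unmapped = stats.get("unmapped_count", 0)
    total = stats["total"] + unmapped
    return unmapped / total if total else 0.0


# Hypothetical "response" object of /get_shop_stats.
stats = {
    "total": 80,
    "by_cat": {"Schals": [30, 37.5], "Hosen": [50, 62.5]},
    "unmapped_count": 20,
    "unmapped": {"Sale": 20},
}
best = top_categories(stats, n=1)
share = unmapped_share(stats)
```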

Products

/get_product_history (POST/json)

returns history for given products

in:

picalike_ids: [<str>]

out:

{"message": <str>, "response": {
  <picalike_id_1>: [{
    "timestamp": <float>, 
    "price": <int>, 
    "sort_key": <int>, 
    "position": [{"cat": [<str>], "pos": <float>}, ...]}, ...], 
  <picalike_id_2>: [], ...}}

For each picalike_id, the array is sorted by “timestamp”. price is in cents (int).
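Because each history array is sorted by timestamp, price changes can be found by comparing neighbouring entries. The sketch below assumes ascending order (oldest entry first), which the source does not state explicitly; the function name and sample values are hypothetical.

```python
def price_changes(history):
    """Return (timestamp, old_price, new_price) for every price change, in cents."""
    changes = []
    for previous, current in zip(history, history[1:]):
        if current["price"] != previous["price"]:
            changes.append((current["timestamp"], previous["price"], current["price"]))
    return changes


# Hypothetical history of one picalike_id, oldest entry first.
history = [
    {"timestamp": 1546300800.0, "price": 2999, "sort_key": 1, "position": []},
    {"timestamp": 1546387200.0, "price": 2999, "sort_key": 2, "position": []},
    {"timestamp": 1546473600.0, "price": 1999, "sort_key": 3, "position": []},
]
changes = price_changes(history)
```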

Miscellaneous

/get_attributes (GET)

returns all picalike attributes

no cache; queries MongoDB directly

in:

{}

out:

{"message": <str>, "response": {"attributes": [<str>]}}

Brands

/get_shop_brands (POST)

returns all brands of a shop from its last session

in:

shop_id:<str>

out:

{"message": <str>, "response": {"brands": [ <str> ], "count":{<brand>: <int>}}}

Genders

/get_genders (GET)

returns all picalike genders

no cache; queries MongoDB directly

in:

out:

{"message": <str>, "response": {"genders": [<str>]}}

/get_shop_genders (GET)

returns all genders of a shop from its last session

The first request queries MongoDB; later requests are served from the cache. Pass use_cache=false to refresh the cache.

in:

shop_id=<str>

in_opt:

use_cache=false (default is true)

out:

{"message": <str>, "response": {"genders": [ <str> ]}}

/add_gender_relation (POST/json)

add a shop_genders → picalike_genders relation

in:

shop_id: <str>
p_gender: <picalike genders str>
shop_gender: <shop genders str>

out:

/remove_gender_relation (POST/json)

remove a shop_genders → picalike_genders relation

in:

shop_id: <str>
shop_gender: <shop genders str>

out:

/get_gender_relation (GET)

get all relations for shop_id

in:

shop_id=<str>

out:

{"message": <str>, "response": [ tuple(<picalike genders str>, <shop genders str>) ]}

The tuple is stored in JSON as a list with two elements.
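Since every relation arrives as a two-element list [picalike gender, shop gender], a lookup from shop gender to picalike gender is easy to build. A minimal sketch on an already decoded response; the function name is hypothetical and the sample values are only illustrative.

```python
def gender_lookup(body):
    """Map each shop gender to its picalike gender."""
    return {shop_gender: p_gender for p_gender, shop_gender in body["response"]}


# Hypothetical decoded response of /get_gender_relation.
lookup = gender_lookup(
    {"message": "OK", "response": [["Women", "Damen"], ["Men", "Herren"]]}
)
```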

Reports

/existing_report (GET)

checks if report name exists

in:

report_name=<str>
shop_id=<str>

out:

{"message": <str>, "response": {"exists": <bool>}}

/list_reports (GET)

return all reports for given product and shop

in:

shop_id=<str>
product_id=<str>

out:

{"message": <str>, "response": [{"report_id": <str>, "user_id": <str>, "report_name": <str>}]}

/get_all_reports (POST)

return all reports that match the filters (only manually created reports)

in:

ALL OPTIONAL
shop_id: <str>
user_id: <str>
product_id: <str>
date_from: <str, "%m/%d/%Y">  # e.g.: "01/31/2019"
date_till: <str, "%m/%d/%Y">

out:

{"message": <str>, "response": [{"report_id": <str>, "user_id": <str>, "report_name": <str>, ...}]}

/add_report (POST/json)

add a new report or edit if report ID is provided

in:

shop_id: <str>
product_id: <str>
user_id: <str>
report_name: <str>
filter: { see below }

in opt:

report_id: <str>

out:

{"message": <str>, "response": {"report_id": <str>}}

filter explanation:

cluster_dist: <float> - similarity
cluster_price_from: <str> - e.g. "10,00"
cluster_price_till: <str> - e.g. "29,99"
cluster_trendsetters: <list of base64 encoded shop_ids>
cluster_competitors: <list of base64 encoded shop_ids>
cluster_brands: <list of brand names>
cluster_genders: <list of picalike gender names>
cluster_date_range: <str> - e.g. "01/13/2019 - 01/17/2019"
cluster_categories: <list of picalike category names>

cluster_genders can contain the value “all” which needs special treatment
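The string formats of the filter fields can be produced with a few helpers. This is a sketch under assumptions: standard (not URL-safe) base64 of the UTF-8 shop_id, prices passed in as integer cents, and all helper names are hypothetical. The special treatment of "all" in cluster_genders is not specified in the source and is therefore not handled here.

```python
import base64
from datetime import date


def encode_shop_id(shop_id):
    """Base64-encode a shop_id (assumption: standard alphabet, UTF-8 input)."""
    return base64.b64encode(shop_id.encode("utf-8")).decode("ascii")


def format_price(cents):
    """Format integer cents as a decimal-comma string, e.g. 1000 -> "10,00"."""
    euros, rest = divmod(cents, 100)
    return f"{euros},{rest:02d}"


def build_filter(dist, price_from, price_till, trendsetters, competitors,
                 brands, genders, date_from, date_till, categories):
    """Assemble the "filter" object for /add_report."""
    return {
        "cluster_dist": dist,
        "cluster_price_from": format_price(price_from),
        "cluster_price_till": format_price(price_till),
        "cluster_trendsetters": [encode_shop_id(s) for s in trendsetters],
        "cluster_competitors": [encode_shop_id(s) for s in competitors],
        "cluster_brands": list(brands),
        "cluster_genders": list(genders),
        "cluster_date_range": f"{date_from:%m/%d/%Y} - {date_till:%m/%d/%Y}",
        "cluster_categories": list(categories),
    }


report_filter = build_filter(
    0.5, 1000, 2999, [], ["bonprix_de_feed"], [], ["Women"],
    date(2019, 1, 13), date(2019, 1, 17), [],
)
```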

/remove_report (POST/json)

remove a report

in:

report_id: <str>

out:

{"message": <str>}

/get_report (GET)

get a report

in:

report_id=<str>

out:

{"message": <str>, "response": {"report_id": <str>, "report_name": <str>, "product_id": <str>, "shop_id": <str>, "user_id": <str>, "filter": {}, "date": <date>}}

/exclude_product (POST)

remove product from results

in:

{"report_id": <str>, "picalike_id": <str>, "user_id": <str>}

/unexclude_product (POST)

remove product from exclude list

in:

{"report_id": <str>, "picalike_id": <str>}

/get_excluded (GET)

get list of excluded products

in:

report_id=<str>

out:

{"message": <str>, "response": {"excluded": [(<str: picalike_id>, <str: user_id>)]}}

Cluster

/comp_cluster (POST/json)

creates all the data that is needed by comp_cluster.php. It is also used to create automatic reports

in:

shop_id: <str>
prod_id: <str> or image: <base64 encoded image bytes>
optional:
report_id: <str>, if given, the report is stored immediately
price: <int, price in cents>  # only needed if image is given
start: <int, default: 0>
rows: <int, default: 10>
limit: <int, default: 500>  -- used for nearest neighbor search
filter fields with the same name and in the same format as in /add_report

out:

shop_id: <str>
prod_id: <str>  # None if image was supplied
reco_data: <dict of data that is given to the reco algorithm> currently: { min_price, max_price, mean_price, ref_price }
reco: { timestamp, error, uncertain, huge_markup, price_range, margin_range }
cluster_trend: <int>
excluded: <list of tuples, (picalike_ids, user_id)>
total_match: <int> - total number of solr matches
cluster: <list of cluster products + distance, prod_trend and prod_history>

if report_id is supplied, the report will be saved on the fly

reco_data is a dictionary with the required data to calculate the recommendation. Currently the fields min_price, max_price, mean_price, ref_price are needed.

reco contains at least timestamp and error. error can be either a boolean or a string. uncertain is a boolean that is set to True if the algorithm does not want to give a recommendation. huge_markup is a boolean that is set to True if the price of the reference article is significantly below the mean price in the cluster. price_range returns a list of prices [low, high]. margin_range is the relative price increase in percent [low, high]

cluster_trend indicates whether the products in the cluster of similar products from trendsetter shops have recently become more popular (close to 1) or less popular (close to -1). A value of 0 indicates no recent change in popularity. The value 1 indicates fast-growing popularity; the value -1 predicts a negative trend.

cluster contains a list of similar products. The documents are the solr responses with the additional fields: distance, prod_trend and prod_history. The reference product is always at the beginning of the list. The data is generated from similar products of the shop's competitors.
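The reco and cluster_trend fields described above could be turned into a short human-readable summary as sketched here. The function names, the wording of the messages and the sample values are assumptions; only the field names and their meaning come from the description.

```python
def summarize_reco(reco):
    """Describe a reco dictionary from /comp_cluster in one line."""
    error = reco.get("error")
    if error:
        # error is either True or a (non-empty) error string.
        return f"error: {error}" if isinstance(error, str) else "error"
    if reco.get("uncertain"):
        return "no recommendation (uncertain)"
    low, high = reco["price_range"]
    text = f"recommended price range {low} to {high}"
    if reco.get("huge_markup"):
        text += " (huge markup)"
    return text


def trend_label(cluster_trend):
    """Classify cluster_trend: positive is rising, negative is falling."""
    if cluster_trend > 0:
        return "rising"
    if cluster_trend < 0:
        return "falling"
    return "no change"


# Hypothetical reco object.
summary = summarize_reco({
    "timestamp": 1556526244.0, "error": False, "uncertain": False,
    "huge_markup": True, "price_range": [1999, 2499], "margin_range": [5.0, 10.0],
})
```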

/get_cluster_report (POST/json)

returns the most recent results of the batch processing of the cluster reports

in:

report_id: <str>  - can be a report_id or a picalike_id

out:

data: <dict> - a merge of 'reco_data' and 'reco' from /comp_cluster

Trends

http://branch01.picalike.corpex-kunden.de:5002/prepare_trends (GET)

ONLY USE DAILY. THE PROCESS TAKES MORE THAN 30 MIN. Receives a GET request and downloads the product infos from Solr and from the API “http://branch01.picalike.corpex-kunden.de:9095/get_product_history” to find trendy products. It then forecasts the future trend with double exponential smoothing (double_exponential_smoothing) and saves the results in MongoDB.
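For readers unfamiliar with double exponential smoothing, the following is a generic sketch of Holt's linear method. It is not the service's implementation: the initialization, the smoothing factors alpha and beta, and the input series are assumptions chosen for illustration.

```python
def double_exponential_smoothing(series, alpha, beta, horizon=1):
    """Forecast the next `horizon` values with Holt's linear (double) smoothing."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialization: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        # Smooth the level towards the new observation.
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        # Smooth the trend towards the latest change in level.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]


# A perfectly linear series is continued linearly.
forecast = double_exponential_smoothing([1.0, 2.0, 3.0, 4.0, 5.0], alpha=0.5, beta=0.5, horizon=2)
```

The level tracks the smoothed value of the series and the trend tracks its smoothed slope, which is what allows the forecast to extrapolate instead of staying flat.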

/sort_trend_products (POST/json)

Returns the trendiest products from the most recent trend analysis stored in MongoDB. Only considers products from shops that are mapped to the given shop.

in:

shop_id: <str> # only feed shops accepted
optional:
limit:<int> # default: 10
category: [<str>] # default: []
gender: [<str>] # default: ["Men", "Women", "undefined"] --> some products do not have gender info
selected_shops:[<str>] # default: [] --> looks for all shops

out:

{"message": <str>, "response": [{"product_id":<str>, "category":<str>,"datetime":<datetime>,"gender":<str>,"pos_mov":<float>,"shop_id":<str>}]}
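A request body for /sort_trend_products with the documented defaults spelled out might be built like this. The helper name and the shop id are hypothetical; whether the service requires the optional fields to be present at all is not stated in the source.

```python
def trend_products_request(shop_id, limit=10, category=None, gender=None, selected_shops=None):
    """Build the JSON body for /sort_trend_products using the documented defaults."""
    return {
        "shop_id": shop_id,  # only feed shops are accepted
        "limit": limit,
        "category": [] if category is None else list(category),
        "gender": ["Men", "Women", "undefined"] if gender is None else list(gender),
        "selected_shops": [] if selected_shops is None else list(selected_shops),
    }


payload = trend_products_request("example_feed_shop", limit=5, gender=["Women"])
```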

/sort_trend_brands (POST/json)

Returns the trendiest brands from the most recent trend analysis stored in MongoDB. Only considers products from shops that are mapped to the given shop.

in:

shop_id: <str> # only feed shops accepted
optional:
limit:<int> # default: 10
category: [<str>] # default: []
gender: [<str>] # default: ["Men", "Women", "undefined"] --> some products do not have gender info
selected_shops:[<str>] # default: [] --> looks for all shops

out:

{"message": <str>, "response": [{"brand":<str>, "category":[<str>],"datetime":<datetime>,"gender":[<str>],"pos_mov":<float>,"shop_id":[<str>]}]}

/sort_trend_category (POST/json)

Returns the trendiest categories from the most recent trend analysis stored in MongoDB. Only considers products from shops that are mapped to the given shop.

in:

shop_id: <str> # only feed shops accepted
optional:
limit:<int> # default: 10
gender: [<str>] # default: ["Men", "Women", "undefined"] --> some products do not have gender info
selected_shops:[<str>] # default: [] --> looks for all shops

out:

{"message": <str>, "response": [{"category":<str>, "datetime":<datetime>,"gender":[<str>],"pos_mov":<float>,"shop_id":[<str>]}]}

/sort_trend_colors (POST/json)

Returns the trendiest colors from the most recent trend analysis stored in MongoDB. Only considers products from shops that are mapped to the given shop.

in:

shop_id: <str> # only feed shops accepted
optional:
limit:<int> # default: 10
category: [<str>] # default: []
gender: [<str>] # default: ["Men", "Women", "undefined"] --> some products do not have gender info
selected_shops:[<str>] # default: [] --> looks for all shops

out:

{"message": <str>, "response": [{"color":<str>, "category":[<str>],"datetime":<datetime>,"gender":[<str>],"pos_mov":<float>,"shop_id":[<str>]}]}

/sort_trend_attributes (POST/json)

Returns the trendiest attributes from the most recent trend analysis stored in MongoDB. Only considers products from shops that are mapped to the given shop.

in:

shop_id: <str> # only feed shops accepted
optional:
limit:<int> # default: 10
category: [<str>] # default: []
gender: [<str>] # default: ["Men", "Women", "undefined"] --> some products do not have gender info
selected_shops:[<str>] # default: [] --> looks for all shops

out:

{"message": <str>, "response": [{"attribute":<str>, "category":[<str>],"datetime":<datetime>,"gender":[<str>],"pos_mov":<float>,"shop_id":[<str>]}]}

/get_sim_trend_products (POST/json)

Looks for products in the feed shop that are similar to the given trend product. This API is currently used by /sort_sim_trend_products.

in:

{"shop_id_feed":<str>,
"shop_id_crawler":<str>,
"product_id":<int>,
"distance_max": <float> # optional: limits similarity distance, default is 0.5
}

out:

{"message":"OK","response":{
            "product_ids": ["96607881#bonprix_de_feed",
                          "91524296#bonprix_de_feed",
                          "90838081#bonprix_de_feed",
                          "95056195#bonprix_de_feed",
                          "95020895#bonprix_de_feed"]
            }
 }
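The returned ids in the example look like a product id and a shop id joined by "#". Assuming that reading is correct (the source does not state the format explicitly), they can be split as sketched below; the function name is hypothetical.

```python
def split_product_id(value):
    """Split "<product_id>#<shop_id>" into its two parts (assumed format)."""
    product_id, _, shop_id = value.partition("#")
    return product_id, shop_id


pairs = [
    split_product_id(item)
    for item in ["96607881#bonprix_de_feed", "91524296#bonprix_de_feed"]
]
```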

http://frontend01-hpc.picalike.corpex-kunden.de:5003/get_cat_trends (GET) Port 5003

Returns the most trendy products from a category of one shop. Only returns available products. Products sorted in descending order by cluster_trend.

in:

shop_id: <str>
prod_id: <int> # optional if "i" is given as the product reference
optional:
limit:<int> # default: 10
brand: <str> or [<str>, <str>,...] # default: all
shop_cat: <str> or [<str>, <str>,...] # default: all
gender: <str> or [<str>, <str>,...] # default: all
size: <int> or [<int>, <int>,...] # CURRENTLY NOT POSSIBLE TO FILTER. default: all
max_price: <int> # default: no max
min_price: <int> # default: no min 
category_limitation: True/False # default: True
i: <picalike_id> # if "i" is set, gender and category are same as given picalike_id. Default: None

out:

{"count": 10,
"description": "Category Trends",
"generator": "http://picalike.com",
"modified": "2019-04-29 08:24:04.480205",
"title": "picalike Request",
"ids": {"0": {"brand": "NO", "cluster_trend": 0.0, "extraimg": "https://www.witt.eu/product/resized/029/029.00K3F.072-127.002.u_5.jpg", "gender": "Damenmode", "shop_cat":"Frauen_Bekleidung_Hosen_Sweatpants", "id": "184097", "img": "https://www.witt.eu/product/resized/027/027.00KAT.022-123.013.i_5.jpg", "location": "https://www.witt-weiden.de/287202?articleNumber=184097", "name": "Hose", "price": 1599},
"1":{...}}
}
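The "ids" object is keyed by position strings ("0", "1", ...), so the keys need a numeric sort to restore the ranking. A minimal sketch on a decoded response; the function name and the shortened sample entries are made up.

```python
def ordered_products(body):
    """Return the products of a /get_cat_trends response in ranking order."""
    ids = body["ids"]
    # Sort numerically: as strings, "10" would sort before "2".
    return [ids[key] for key in sorted(ids, key=int)]


# Hypothetical, shortened response.
sample = {
    "count": 3,
    "ids": {
        "10": {"id": "c", "price": 999},
        "2": {"id": "b", "price": 1299},
        "0": {"id": "a", "price": 1599},
    },
}
ranking = [product["id"] for product in ordered_products(sample)]
```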

Krawla

http://branch01.picalike.corpex-kunden.de:9095/get_krawla_sum (GET)

in:

shop_id: <str> # only feed shops accepted

out:

{"message": <str>, 
"response": {"w1": {"total_all" : <int>, "total_new_prod" : <int>, "total_sale" : <int> },
"w2": {"total_all" : <int>, "total_new_prod" : <int>, "total_sale" : <int> },
"w3": {"total_all" : <int>, "total_new_prod" : <int>, "total_sale" : <int> },
"w4": {"total_all" : <int>, "total_new_prod" : <int>, "total_sale" : <int> }, 
"w_all" : {"total_all" : <int>, "total_new_prod" : <int>, "total_sale" : <int> }
}
}
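One possible use of this response is the share of sale products per period. The sketch below assumes that total_sale is a subset of total_all, which the source does not confirm; the function name and all numbers are invented.

```python
def sale_share(body):
    """Compute total_sale / total_all for every period of /get_krawla_sum."""
    shares = {}
    for period, totals in body["response"].items():
        total = totals["total_all"]
        shares[period] = totals["total_sale"] / total if total else 0.0
    return shares


# Hypothetical, shortened response.
shares = sale_share({
    "message": "OK",
    "response": {
        "w1": {"total_all": 100, "total_new_prod": 10, "total_sale": 25},
        "w2": {"total_all": 0, "total_new_prod": 0, "total_sale": 0},
    },
})
```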
