Trend Analyzer
The purpose of this app is to extract and aggregate trend information for attributes from our many clusters. Attributes can be colors, brands or attributes in general. With the results we can tell, for example, what the trend score of the attribute “red” is.
Maintainer: Patrick
Host
Git
Database
reads from
Host: Live Solr
Zookeeper_Client necessary: solr01.picalike.corpex-kunden.de:2185,solr02.picalike.corpex-kunden.de:2186,sg03.picalike.corpex-kunden.de:2187
writes into
Host: OSA Database
DB: trends
Collection: trends (unique key: (shop_id, picalike_cat, attribute, gender))
Input to Start Calculation
Endpoint: pci01.picalike.corpex-kunden.de:1113/start
The application expects a POST request with the following parameters:
- shop_id: string
- update_id: string
- session: int
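A minimal sketch of triggering a calculation from Python with only the standard library. The page only says that /start expects a POST request with these three parameters, so plain HTTP on port 1113 and a JSON-encoded body are assumptions, as are the helper names `build_start_payload` and `start_calculation`.

```python
import json
import urllib.request

# Assumption: plain HTTP on port 1113 and a JSON body (not confirmed by this page).
START_URL = "http://pci01.picalike.corpex-kunden.de:1113/start"


def build_start_payload(shop_id: str, update_id: str, session: int) -> dict:
    """Collect the three documented parameters with their documented types."""
    if not isinstance(session, int):
        raise TypeError("session must be an int")
    return {"shop_id": str(shop_id), "update_id": str(update_id), "session": session}


def start_calculation(shop_id: str, update_id: str, session: int) -> bytes:
    """Send the POST request to /start and return the raw response body."""
    body = json.dumps(build_start_payload(shop_id, update_id, session)).encode("utf-8")
    request = urllib.request.Request(
        START_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

If the service turns out to expect form data instead, `urllib.parse.urlencode(payload).encode()` with a `application/x-www-form-urlencoded` header would be the drop-in replacement for the JSON body.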
Upload Output to Mongo
{
    "trend_id": "{}#{}#{}#{}".format(shop_id, cat, element, gender),
    "shop_id": shop_id,
    "picalike_cat": cat,
    "attribute": element,
    "position": current_position,
    "type": "{}".format(element_type),
    "position_hist": position_hist,
    "cluster_trend": cluster_trend,
    "timestamp": time.time(),
    "gender": gender
}
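Because `trend_id` is built from exactly the four unique-key fields (shop_id, picalike_cat, attribute, gender), a rerun can overwrite the previous document instead of creating a duplicate. The sketch below assumes the app writes with an upsert keyed on `trend_id`; this page only states the unique key, so the write strategy and the helper names `make_trend_id` and `upsert_trend` are hypothetical. `collection` is anything with a pymongo-style `replace_one(filter, replacement, upsert=True)` method.

```python
def make_trend_id(shop_id, cat, element, gender):
    """Compose the trend_id from the four unique-key fields, '#'-separated."""
    return "{}#{}#{}#{}".format(shop_id, cat, element, gender)


def upsert_trend(collection, doc):
    """Insert the document, or replace the existing one with the same trend_id.

    Assumption: the trends collection is written via upsert on trend_id.
    """
    expected = make_trend_id(doc["shop_id"], doc["picalike_cat"], doc["attribute"], doc["gender"])
    if doc["trend_id"] != expected:
        # Guard against a trend_id that disagrees with the unique-key fields.
        raise ValueError("trend_id does not match the unique-key fields")
    return collection.replace_one({"trend_id": doc["trend_id"]}, doc, upsert=True)
```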
"/get_trends"
Input
Endpoint: pci01.picalike.corpex-kunden.de:1113/get_trends
The application expects a POST request with the following parameters:
- shop_id: string
- type: “brand”, “attribute”, “color”, “pattern”, “no_brand”, “picalike_cat” or “all”
- ignore_my_shop: boolean
- picalike_cat: list of picalike categories or just one category
- gender: string
- sort_by: boolean
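A sketch of assembling and checking a /get_trends request body before sending it. The allowed `type` values come from the list above; the function name, the `type_` argument name (to avoid shadowing Python's `type`), and the `False` defaults for `ignore_my_shop` and `sort_by` are assumptions. Since `picalike_cat` may be a list or a single category, the sketch always sends a list.

```python
ALLOWED_TYPES = {"brand", "attribute", "color", "pattern", "no_brand", "picalike_cat", "all"}


def build_get_trends_payload(shop_id, type_, picalike_cat, gender,
                             ignore_my_shop=False, sort_by=False):
    """Return the POST body for /get_trends (hypothetical helper; defaults assumed)."""
    if type_ not in ALLOWED_TYPES:
        raise ValueError("unsupported type: {}".format(type_))
    # picalike_cat may be one category or a list of them; normalise to a list.
    cats = [picalike_cat] if isinstance(picalike_cat, str) else list(picalike_cat)
    return {
        "shop_id": shop_id,
        "type": type_,
        "ignore_my_shop": bool(ignore_my_shop),
        "picalike_cat": cats,
        "gender": gender,
        "sort_by": bool(sort_by),
    }
```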
Output Example
{'results': [{'attribute': 'Cashmere Victim',
              'position_hist': {'5': 0.8240069150924683, '6': 0.8240069150924683},
              'cluster_trend': 0.8240069,
              'position': 0.25751608239999996},
             {'attribute': 'Clamp',
              'position_hist': {'3': 0.8273629397153854, '5': 0.7457152545452118, '6': 0.7457152545452118, '2': 0.8651039004325867},
              'cluster_trend': 0.745715242,
              'position': -404.0}],
 'status': 200,
 'msg': 'Found 6086 documents for s24_de_feed. This was a trend analysis on attribute'}
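A sketch of consuming such a response on the client side: rank attributes by one of their scores. It assumes the response has already been decoded into a dict shaped like the example, and that a higher score means a stronger trend (this page does not say). Entries whose chosen score is null, such as cluster_trend for a crawler shop, are skipped. The function name `top_attributes` is hypothetical.

```python
def top_attributes(response, key="cluster_trend", n=10):
    """Return up to n (attribute, score) pairs, highest score first.

    Assumption: higher score = stronger trend. Non-numeric scores (e.g. None) are skipped.
    """
    if response.get("status") != 200:
        raise RuntimeError(response.get("msg", "request failed"))
    scored = [
        (entry["attribute"], entry[key])
        for entry in response.get("results", [])
        if isinstance(entry.get(key), (int, float))
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```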
How the calculation works
Using the shop_id as a reference, the app filters documents in Solr by that shop's defined trendsetter shops. From the search results it collects each product's attributes and trend_score and aggregates an overall trend_score for each attribute combination.
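A sketch of how the trendsetter filter and the 30-day window could be expressed as Solr query parameters. The field names `shop_id` and `timestamp` and the function name `build_solr_params` are hypothetical, since the actual Solr schema is not documented here; `NOW-30DAYS` is standard Solr date math.

```python
def build_solr_params(trendsetter_shops, shop_field="shop_id", date_field="timestamp", days=30):
    """Build q/fq parameters restricting hits to trendsetter shops and the last `days` days.

    Field names are assumptions; adjust them to the live Solr schema.
    """
    if not trendsetter_shops:
        raise ValueError("at least one trendsetter shop is required")
    # Quote each shop id (escaping embedded quotes) and OR them together.
    shops = " OR ".join('"{}"'.format(s.replace('"', '\\"')) for s in trendsetter_shops)
    return {
        "q": "*:*",
        "fq": [
            "{}:({})".format(shop_field, shops),
            "{}:[NOW-{}DAYS TO NOW]".format(date_field, days),
        ],
    }
```

Keeping the restrictions in `fq` rather than `q` lets Solr cache the filters separately from scoring, which suits a filter that is reused for every attribute.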
All product metadata are collected in the following data structure:
container = {picalike_cat: {attribute: {gender: {
    position_hist: {calendar_week1: [past_positions], calendar_week2: [past_positions], ...},
    position: [current_positions],
    cluster_trend: [current_cluster_trends]
}}}}
Only data from the last 30 days is considered in the calculation. If the requested shop_id is a feed shop, the position information is not considered. If it is a crawler shop, the cluster_trend information is treated as null.
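A sketch of the per-product rules above: drop products outside the 30-day window, then keep only the score that counts for the shop type. It assumes each product carries an epoch-seconds `timestamp` plus optional `position` and `cluster_trend` values, and that the shop type is given as the string `feed_shop` or `crawler_shop`; these field names and the function name `relevant_scores` are hypothetical.

```python
SECONDS_PER_DAY = 86400


def relevant_scores(product, shop_type, now, days=30):
    """Return the scores of one product that count for the requested shop, or None if too old."""
    # Only the last `days` days are considered (timestamp assumed in epoch seconds).
    if not now - days * SECONDS_PER_DAY <= product["timestamp"] <= now:
        return None
    if shop_type == "feed_shop":
        # Feed shops: position information is not considered.
        return {"position": None, "cluster_trend": product.get("cluster_trend")}
    if shop_type == "crawler_shop":
        # Crawler shops: cluster_trend is treated as null.
        return {"position": product.get("position"), "cluster_trend": None}
    raise ValueError("unknown shop type: {}".format(shop_type))
```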
Attribute combination
This structure was chosen so that the many trend_scores of each attribute combination can be collected in a list and averaged. An attribute combination consists of the picalike_cat, the attribute name and the gender, which together also form the combination's id (picalike_cat#attribute_name#gender).
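A minimal sketch of building and splitting the combination id. The function names are hypothetical; the `#` separator and the field order come from the id format above.

```python
def combination_id(picalike_cat, attribute_name, gender):
    """Build the id picalike_cat#attribute_name#gender of an attribute combination."""
    return "#".join((picalike_cat, attribute_name, gender))


def split_combination_id(combo_id):
    """Inverse of combination_id; assumes none of the parts contains '#'."""
    picalike_cat, attribute_name, gender = combo_id.split("#")
    return picalike_cat, attribute_name, gender
```

The round trip only works if no part contains `#`; an attribute value such as a brand name with `#` in it would make the id ambiguous, so `split_combination_id` raises a `ValueError` in that case rather than returning wrong parts.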
Each product's metadata is turned into a new attribute combination; if the combination already exists, only its position/cluster_trend scores are appended.
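A sketch of this create-or-append step on the nested container. It assumes a product's past positions arrive as a mapping from calendar week to a list of positions; the function name `add_product` and its argument names are hypothetical.

```python
def add_product(container, picalike_cat, attribute, gender,
                position=None, cluster_trend=None, past_positions=None):
    """Add one product's scores to container[picalike_cat][attribute][gender].

    Missing levels are created (a new attribute combination); for an existing
    combination the scores are only appended.
    """
    leaf = (container.setdefault(picalike_cat, {})
                     .setdefault(attribute, {})
                     .setdefault(gender, {"position_hist": {}, "position": [], "cluster_trend": []}))
    if position is not None:
        leaf["position"].append(position)
    if cluster_trend is not None:
        leaf["cluster_trend"].append(cluster_trend)
    # past_positions: {calendar_week: [positions]} (assumed input shape)
    for week, positions in (past_positions or {}).items():
        leaf["position_hist"].setdefault(week, []).extend(positions)
    return leaf
```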
Once all positions or cluster_trends (depending on whether it is a crawler or a feed shop) have been collected for each attribute combination, the average of each list is computed.
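A sketch of the averaging step for one attribute combination. Averaging each calendar week's list in `position_hist` as well, and returning `None` for an empty list (the score type that is not used for this shop), are assumptions; the string week keys match the `'5'`, `'6'` keys in the output example. The function names are hypothetical.

```python
from statistics import mean


def average_or_none(values):
    """Mean of a score list, or None when nothing was collected."""
    return mean(values) if values else None


def summarise(leaf):
    """Collapse one attribute combination's score lists into averages."""
    return {
        "position": average_or_none(leaf["position"]),
        "cluster_trend": average_or_none(leaf["cluster_trend"]),
        # One average per calendar week; keys stored as strings as in the output example.
        "position_hist": {str(week): average_or_none(v) for week, v in leaf["position_hist"].items()},
    }
```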