====== Trend Analyzer ======

The purpose of this app is to extract and aggregate trend information for attributes across our many clusters. An attribute can be a color, a brand, or any other product attribute. The results tell us, for example, what the trend score of the attribute “red” is.

Maintainer: Patrick


===== Host =====

http://pci01.picalike.corpex-kunden.de:1113/


===== Git =====

https://git.picalike.corpex-kunden.de/picalike/trend_analyzer_api


===== Database =====


==== reads from ====

Host: Live Solr\\
Zookeeper_Client necessary: solr01.picalike.corpex-kunden.de:2185,solr02.picalike.corpex-kunden.de:2186,sg03.picalike.corpex-kunden.de:2187


==== writes into ====

Host: OSA Database\\
DB: trends\\
Collection: trends (unique key: (''%%shop_id%%'', ''%%picalike_cat%%'', ''%%attribute%%'', ''%%gender%%''))


===== Input to Start Calculation =====

Endpoint: pci01.picalike.corpex-kunden.de:1113/start\\
The application expects a POST request with the following parameters:


  * shop_id: string
  * update_id: string
  * session: int
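
A minimal sketch of triggering a calculation. It assumes that the parameters are sent as a JSON body (this page does not state the encoding, so form data is equally possible) and that the service is reached via plain HTTP, as the link in the Host section suggests. The function name ''build_start_request'' and the example values are hypothetical.

```python
import json
import urllib.request

# Endpoint as documented above; the http:// scheme is taken from the Host section.
START_URL = "http://pci01.picalike.corpex-kunden.de:1113/start"


def build_start_request(shop_id: str, update_id: str, session: int) -> urllib.request.Request:
    """Build (but do not send) a POST request for /start.

    Assumption: the three parameters are sent as a JSON body.
    """
    payload = {"shop_id": shop_id, "update_id": update_id, "session": session}
    return urllib.request.Request(
        START_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical example values.
request = build_start_request("s24_de_feed", "update_0001", 1)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the payload before anything reaches the service.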


===== Upload Output to Mongo =====

<code>
{
    # ID built from the unique key fields, joined with "#"
    "trend_id": "{}#{}#{}#{}".format(shop_id, cat, element, gender),
    "shop_id": shop_id,
    "picalike_cat": cat,
    "attribute": element,
    "position": current_position,
    "type": "{}".format(element_type),
    "position_hist": position_hist,
    "cluster_trend": cluster_trend,
    "timestamp": time.time(),  # Unix time in seconds
    "gender": gender
}
</code>
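
The fields above can be assembled and keyed for a write roughly as follows. This is only a sketch: the helper names ''build_trend_document'' and ''unique_key_filter'' and the example values are hypothetical, and whether the app upserts or inserts is an assumption. Only the field names and the unique key are taken from this page.

```python
import time


def build_trend_document(shop_id, cat, element, gender, element_type,
                         current_position, position_hist, cluster_trend):
    """Assemble one document for the trends collection (fields as listed above)."""
    return {
        "trend_id": "{}#{}#{}#{}".format(shop_id, cat, element, gender),
        "shop_id": shop_id,
        "picalike_cat": cat,
        "attribute": element,
        "position": current_position,
        "type": str(element_type),
        "position_hist": position_hist,
        "cluster_trend": cluster_trend,
        "timestamp": time.time(),
        "gender": gender,
    }


def unique_key_filter(doc):
    """Filter on the collection's unique key (shop_id, picalike_cat, attribute, gender)."""
    return {key: doc[key] for key in ("shop_id", "picalike_cat", "attribute", "gender")}


# Hypothetical example values.
doc = build_trend_document("s24_de_feed", "dress", "red", "female", "color",
                           0.25, {"5": 0.82}, 0.82)
# With pymongo, an upsert could then look like this (assumption, not taken from the app):
# collection.replace_one(unique_key_filter(doc), doc, upsert=True)
```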

===== "/get_trends" =====


==== Input ====

Endpoint: pci01.picalike.corpex-kunden.de:1113/get_trends\\
The application expects a POST request with the following parameters:


  * shop_id: string
  * type: “brand”, “attribute”, “color”, “pattern”, “no_brand”, “picalike_cat” or “all”
  * ignore_my_shop: boolean
  * picalike_cat: list of picalike categories or just one category
  * gender: string
  * sort_by: boolean
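
A sketch of building a ''/get_trends'' request from the parameters above. As with ''/start'', the JSON body is an assumption, and the function name ''build_get_trends_request'', the default values and the example values are hypothetical. The list of allowed ''type'' values is taken from this page.

```python
import json
import urllib.request

GET_TRENDS_URL = "http://pci01.picalike.corpex-kunden.de:1113/get_trends"

# Values for "type" as documented above.
ALLOWED_TYPES = {"brand", "attribute", "color", "pattern", "no_brand", "picalike_cat", "all"}


def build_get_trends_request(shop_id, trend_type, picalike_cat, gender,
                             ignore_my_shop=False, sort_by=False):
    """Build (but do not send) a POST request for /get_trends.

    picalike_cat may be a single category or a list of categories.
    Assumption: the parameters are sent as a JSON body.
    """
    if trend_type not in ALLOWED_TYPES:
        raise ValueError("unsupported type: {!r}".format(trend_type))
    payload = {
        "shop_id": shop_id,
        "type": trend_type,
        "ignore_my_shop": ignore_my_shop,
        "picalike_cat": picalike_cat,
        "gender": gender,
        "sort_by": sort_by,
    }
    return urllib.request.Request(
        GET_TRENDS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical example values.
request = build_get_trends_request("s24_de_feed", "attribute", ["dress", "skirt"], "female")
```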


==== Output Example ====

<code>
{'results': [{'attribute': 'Cashmere Victim',
              'position_hist': {'5': 0.8240069150924683,
                                '6': 0.8240069150924683},
              'cluster_trend': 0.8240069,
              'position': 0.25751608239999996},
             {'attribute': 'Clamp',
              'position_hist': {'3': 0.8273629397153854,
                                '5': 0.7457152545452118,
                                '6': 0.7457152545452118,
                                '2': 0.8651039004325867},
              'cluster_trend': 0.745715242,
              'position': -404.0}],
 'status': 200,
 'msg': 'Found 6086 documents for s24_de_feed. This was a trend analysis on attribute'}
</code>
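
A response of this shape could be post-processed as sketched below. The helper ''top_attributes'' is hypothetical, and ranking by ''cluster_trend'' assumes that a higher value means a stronger trend, which this page does not state. For crawler shops ''cluster_trend'' is null, so the sketch only fits feed shops.

```python
def top_attributes(response: dict, limit: int = 10) -> list:
    """Return (attribute, cluster_trend) pairs, highest cluster_trend first."""
    if response.get("status") != 200:
        raise RuntimeError(response.get("msg", "request failed"))
    ranked = sorted(response["results"], key=lambda r: r["cluster_trend"], reverse=True)
    return [(r["attribute"], r["cluster_trend"]) for r in ranked[:limit]]


# The two results from the output example above.
response = {
    "results": [
        {"attribute": "Clamp",
         "position_hist": {"3": 0.8273629397153854, "5": 0.7457152545452118,
                           "6": 0.7457152545452118, "2": 0.8651039004325867},
         "cluster_trend": 0.745715242,
         "position": -404.0},
        {"attribute": "Cashmere Victim",
         "position_hist": {"5": 0.8240069150924683, "6": 0.8240069150924683},
         "cluster_trend": 0.8240069,
         "position": 0.25751608239999996},
    ],
    "status": 200,
    "msg": "Found 6086 documents for s24_de_feed. This was a trend analysis on attribute",
}

print(top_attributes(response))
# → [('Cashmere Victim', 0.8240069), ('Clamp', 0.745715242)]
```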

===== How the calculation works =====

Using the shop_id as reference, the app filters documents in Solr by the trendsetter shops defined for that shop. From the search results it collects each product's attributes and its trend_score, then aggregates an overall trend_score for each attribute combination.

All product metadata are collected in the following data structure:\\


<code>
 container = {picalike_cat:
                 {attribute:
                      {gender:
                          {position_hist:
                              {calendar_week1: [past_positions],
                               calendar_week2: [past_positions], ...},
                           position: [current_positions],
                           cluster_trend: [current_cluster_trend]
                           }
                       }
                  }
             }
             
</code>
The calculation considers the last 30 days. If the requested shop_id is a feed_shop, the position information is not considered. If the requested shop_id is a crawler_shop, the cluster_trend information is treated as null.

== Attribute combination ==

This structure was chosen so that the individual trend_score values can be collected in a list per attribute combination and then averaged. An attribute combination consists of the picalike_cat, the attribute_name and the gender, which together also form the combination's ID (picalike_cat#attribute_name#gender).

Each product's metadata is assigned to an attribute combination. If the combination does not exist yet, it is created; if it already exists, only the position/cluster_trend scores are appended.
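
The collection step can be sketched with nested ''defaultdict''s, so that a combination is created on first access and only extended afterwards. The function names ''new_container'' and ''add_product'' and the example values are hypothetical; the real app may build the structure differently.

```python
from collections import defaultdict


def new_container():
    """container[picalike_cat][attribute][gender] -> dict of score lists."""
    def leaf():
        return {"position_hist": defaultdict(list), "position": [], "cluster_trend": []}
    return defaultdict(lambda: defaultdict(lambda: defaultdict(leaf)))


def add_product(container, picalike_cat, attribute, gender,
                position=None, cluster_trend=None):
    """Append one product's scores to its attribute combination.

    A missing combination is created implicitly by the nested defaultdicts.
    """
    entry = container[picalike_cat][attribute][gender]
    if position is not None:
        entry["position"].append(position)
    if cluster_trend is not None:
        entry["cluster_trend"].append(cluster_trend)
    return entry


container = new_container()
add_product(container, "dress", "red", "female", cluster_trend=0.8)
add_product(container, "dress", "red", "female", cluster_trend=0.6)
print(container["dress"]["red"]["female"]["cluster_trend"])  # → [0.8, 0.6]
```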

Once all positions or cluster_trends (depending on whether the shop is a crawler or feed shop) have been collected for each attribute combination, the average of each list is calculated.
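
The averaging step, together with the feed/crawler rule described above, might look like the following sketch. It assumes a plain arithmetic mean and represents an ignored or null value as ''None''; the function name ''average_scores'' and the flag ''is_feed_shop'' are hypothetical.

```python
from statistics import mean


def average_scores(entry, is_feed_shop):
    """Reduce the collected score lists of one attribute combination to averages.

    Feed shops: position is not considered.
    Crawler shops: cluster_trend is treated as null (None).
    """
    if is_feed_shop:
        scores = entry["cluster_trend"]
        return {"position": None, "cluster_trend": mean(scores) if scores else None}
    scores = entry["position"]
    return {"position": mean(scores) if scores else None, "cluster_trend": None}


# Hypothetical collected scores for one combination.
entry = {"position": [0.5, 0.25], "cluster_trend": [1.0, 0.5]}
print(average_scores(entry, is_feed_shop=True))   # → {'position': None, 'cluster_trend': 0.75}
print(average_scores(entry, is_feed_shop=False))  # → {'position': 0.375, 'cluster_trend': None}
```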