The Load Balancer manages tasks, workers and proxies.
When there are more workers than tasks, we perform a worker selection; otherwise, we perform a task selection. For each task that we send out, we also need to perform a proxy selection.
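The rule above can be read as a simple comparison of two counts. The following is a minimal sketch of that reading, not the code in lb_master.py: the names SelectionMode and selection_mode are hypothetical, and treating equal counts as a task selection is an assumption derived from the word "otherwise".

```python
from enum import Enum


class SelectionMode(Enum):
    # Hypothetical names, introduced only for this illustration.
    WORKER = "worker"
    TASK = "task"


def selection_mode(num_workers: int, num_tasks: int) -> SelectionMode:
    """Return which kind of selection the rule calls for."""
    # Strictly more workers than tasks: perform a worker selection.
    if num_workers > num_tasks:
        return SelectionMode.WORKER
    # Otherwise (including equal counts, by assumption): task selection.
    return SelectionMode.TASK
```

For example, selection_mode(3, 2) yields a worker selection, while selection_mode(1, 5) yields a task selection.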
Tasks are organized by tokens. A token identifies a krawla session and contains the shop_id, a hashed config key and a session counter.
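As an illustration of how tasks might be keyed by token, here is a small sketch. The Token class, its field types, and group_by_token are assumptions made for this example; only the three token components come from the description above, and how the config key is hashed is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    # Field names mirror the description; the types are assumptions.
    shop_id: int
    config_key_hash: str
    session_counter: int


def group_by_token(tasks):
    """Group (token, payload) pairs into a dict of token -> payloads.

    The (token, payload) shape is hypothetical; a frozen dataclass is
    hashable, so tokens can be used directly as dictionary keys.
    """
    grouped = defaultdict(list)
    for token, payload in tasks:
        grouped[token].append(payload)
    return dict(grouped)
```

Two tokens that differ only in their session counter are distinct keys, so tasks from separate sessions of the same shop and config stay apart.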
Currently we perform a task selection for each idle worker while there are tasks available.
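The current behaviour (one task selection per idle worker, for as long as tasks are available) could be sketched roughly as follows. This is not the logic of find_task_for_worker: assign_tasks and the priority callable are hypothetical, and the sketch assumes that a higher priority value means the task is preferred.

```python
def assign_tasks(idle_workers, tasks, priority):
    """Pair each idle worker with the best remaining task.

    `priority(task, worker)` is a hypothetical callable standing in for
    the real priority calculation; larger values win (an assumption).
    """
    remaining = list(tasks)  # copy so the caller's list is untouched
    assignments = []
    for worker in idle_workers:
        if not remaining:
            break  # no tasks left; the other workers stay idle
        best = max(remaining, key=lambda task: priority(task, worker))
        remaining.remove(best)
        assignments.append((worker, best))
    return assignments
```

If there are more idle workers than tasks, the loop stops once the tasks run out, so some workers receive nothing in that round.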
The priority is calculated taking the following metrics into account:
For a detailed view of the priority calculation, check out src/krawla/utils/lb_master.py in the git (search for: find_task_for_worker).
The proxy assignment is currently done by a class called ProxyProviderHelper in git:/src/krawla/utils/proxy_provider_helper.py.
We keep track of the following information:
When selecting a proxy we consider the following information:
LB_master → LB_client
LB_master sends the tasks that it receives from the controller to the LB_client, which passes them on to a worker.
LB_client → LB_master
LB_client sends all messages from the worker to the LB_master. If the command 'done' or 'error' is received in the LB_master, then the task is considered finished.
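The finishing rule for worker messages could be tracked on the master side along these lines. This is a sketch under assumptions: TaskTracker, its methods, and the idea that each message carries a task id and a command string are hypothetical; only the 'done' and 'error' commands are taken from the description above.

```python
FINISHING_COMMANDS = {"done", "error"}  # commands that end a task


class TaskTracker:
    """Hypothetical bookkeeping of tasks that are still running."""

    def __init__(self):
        self.active = {}  # task_id -> worker_id

    def task_sent(self, task_id, worker_id):
        # Record a task when it is handed to a worker via the LB_client.
        self.active[task_id] = worker_id

    def message_received(self, task_id, command):
        """Process a worker message; return True if the task finished."""
        if command in FINISHING_COMMANDS:
            # 'done' and 'error' both count as finished.
            self.active.pop(task_id, None)
            return True
        # Any other command leaves the task running.
        return False
```

Any command other than 'done' or 'error' (for instance a made-up 'progress' message) leaves the task in the active set.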