Krawla Load Balancer

The Load Balancer manages tasks, workers and proxies.

When there are more workers than tasks, we perform a worker selection; otherwise, we perform a task selection. For each task that we send out, we also need to perform a proxy selection.

Tasks are organized by tokens. A token identifies a krawla session and contains the shop_id, a hashed config key and a session counter.
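To make the token structure more concrete, here is a minimal sketch of how such a token could be modelled. The class name `Token`, the field names `config_hash` and `session_counter`, the `:` separator and the use of SHA-1 are all assumptions for illustration; the actual layout in the krawla code may differ.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token layout; field names are assumptions."""

    shop_id: str
    config_hash: str
    session_counter: int

    def __str__(self) -> str:
        # The ':' separator is an arbitrary choice for this sketch.
        return f"{self.shop_id}:{self.config_hash}:{self.session_counter}"


def make_token(shop_id: str, config_key: str, session_counter: int) -> Token:
    """Build a token from a shop id, a raw config key and a session counter."""
    # SHA-1 is used here only as an example hash function (assumption).
    digest = hashlib.sha1(config_key.encode("utf-8")).hexdigest()
    return Token(shop_id, digest, session_counter)
```

Because the dataclass is frozen, tokens are hashable and can be used as dictionary keys, which fits the idea of organizing tasks by token.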

Git: https://git.picalike.corpex-kunden.de/krawla/utils

Worker Selection

Currently we perform a task selection for each idle worker while there are tasks available.
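The loop described above can be sketched roughly as follows. This is not the code from lb_master.py: the function name `dispatch` and the callbacks `select_task` and `select_proxy` are hypothetical stand-ins for the real task and proxy selection.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple


def dispatch(
    idle_workers: Deque[str],
    tasks: List[dict],
    select_task: Callable[[str, List[dict]], Optional[dict]],
    select_proxy: Callable[[dict], str],
) -> Dict[str, Tuple[dict, str]]:
    """Give each idle worker one task (plus a proxy) while tasks remain."""
    assignments: Dict[str, Tuple[dict, str]] = {}
    skipped = []  # workers for which no suitable task was found
    while idle_workers and tasks:
        worker = idle_workers.popleft()
        task = select_task(worker, tasks)
        if task is None:
            skipped.append(worker)
            continue
        tasks.remove(task)
        # Every task that is sent out also gets a proxy.
        assignments[worker] = (task, select_proxy(task))
    # Workers without a task stay idle for the next round.
    idle_workers.extend(skipped)
    return assignments
```

For example, with three idle workers and two tasks, the first two workers each receive a task and the third stays in the idle queue:

```python
workers = deque(["w1", "w2", "w3"])
tasks = [{"id": 1}, {"id": 2}]
result = dispatch(workers, tasks, lambda w, ts: ts[0], lambda t: "proxy-a")
```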

Task Selection

The priority of a task is calculated from several metrics. For a detailed view of the priority calculation, check out src/krawla/utils/lb_master.py (search for: find_task_for_worker) in the git.

Proxy Selection

The proxy assignment is currently done by a class called ProxyProviderHelper in git:/src/krawla/utils/proxy_provider_helper.py.

We keep track of the following information:

When selecting a proxy we consider the following information:

LB_master <-> LB_client protocol

LB_master → LB_client

LB_master sends the tasks that it receives from the controller to the LB_client, where they are passed on to a worker.

LB_client → LB_master

LB_client sends all messages from the worker to the LB_master. If the LB_master receives the command 'done' or 'error', then the task is considered finished.
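The handling of worker messages on the master side could look roughly like the sketch below. Only the commands 'done' and 'error' come from the description above; the class `MasterState`, the message keys `worker` and `command`, and the bookkeeping dictionaries are assumptions made for illustration.

```python
from typing import Dict, Set

# The two commands that mark a task as finished, as described above.
FINAL_COMMANDS = {"done", "error"}


class MasterState:
    """Hypothetical bookkeeping on the LB_master side."""

    def __init__(self) -> None:
        self.running: Dict[str, str] = {}  # worker id -> task id
        self.finished: Set[str] = set()    # ids of finished tasks

    def assign(self, worker_id: str, task_id: str) -> None:
        """Remember which task was sent to which worker."""
        self.running[worker_id] = task_id

    def on_client_message(self, message: dict) -> bool:
        """Process a forwarded worker message; return True if it ended a task."""
        worker_id = message["worker"]
        if message.get("command") in FINAL_COMMANDS and worker_id in self.running:
            self.finished.add(self.running.pop(worker_id))
            return True
        # Any other message leaves the task running.
        return False
```

In this sketch both 'done' and 'error' free the worker in the same way; whether a failed task is retried is not covered here.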

Proxy Services