Image Cloud

HOSTS & PORTS

new version Corpex:
ic00: image-cloud.picalike.corpex-kunden.de:5000

connection to the database:

psql "postgresql://image_cloud:Q%25c%21rqHLiszD%232PpS6geZ2Y%5Et5@image-cloud.picalike.corpex-kunden.de:5436/image_cloud"

[The password is percent-escaped with urllib.parse.quote]
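The escaping mentioned above can be reproduced with Python's standard library. This is a minimal sketch that uses a made-up password (not the real credential). It passes safe="" so that "/" is escaped as well. User, host, port, and database name are copied from the connection string above.

```python
from urllib.parse import quote, unquote

# Hypothetical password for illustration only; not the real credential.
raw_password = "p%ss!w#rd^1"

# safe="" also escapes "/", which quote() leaves untouched by default.
escaped = quote(raw_password, safe="")

dsn = (
    f"postgresql://image_cloud:{escaped}"
    "@image-cloud.picalike.corpex-kunden.de:5436/image_cloud"
)
print(escaped)
print(dsn)
```

unquote(escaped) gives back the raw password, which is a quick way to check an existing connection string.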

All image cloud nodes (also deploy to these hosts):
ic00: image-cloud.picalike.corpex-kunden.de
ic01: v220220762212195282.goodsrv.de
ic02: v220220762212195284.supersrv.de
ic03: v220220762212195283.ultrasrv.de

Old Netcup:
37.221.193.171 v22019086221294614.goodsrv.de (backup node data, ic0x)
Unused:
188.68.50.113 v22019086221295244.supersrv.de (given to Marcus and Wellington for experiments)
46.38.233.37 v22019086221295243.yourvserver.n_e_t (DO NOT USE THE DNS NAME! DNS does not work for this host.)

Load Balancer (FIXME: unused):
image-cloud.lb.int.picalike.corpex-kunden.de (port 80 is forwarded to port 5000 here)

TCP Ports: 5000

Authentication / Token

Since the Image Cloud is now also used from outside Corpex, we use a token to provide some level of security. See the project's README for details.

DEPLOYMENT

The build_and_deploy.sh script takes an SSH alias and relies on the SSH config to resolve the actual host name:

build_and_deploy.sh ic00

The aliases ic0{0,1,2,3} must therefore be present in the SSH config.

Furthermore, the deployer needs SSH access to the Netcup machines.
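As a sketch of what the required SSH config entries could look like, the snippet below prints one Host stanza per alias. The alias to host name mapping is taken from the node list on this page. The optional user argument is an assumption and is omitted by default, since this page does not state which login the deploy script expects.

```python
# Alias -> host name mapping, copied from the node list on this page.
NODES = {
    "ic00": "image-cloud.picalike.corpex-kunden.de",
    "ic01": "v220220762212195282.goodsrv.de",
    "ic02": "v220220762212195284.supersrv.de",
    "ic03": "v220220762212195283.ultrasrv.de",
}


def ssh_config_entries(nodes, user=None):
    """Return ~/.ssh/config stanzas for an alias -> host name mapping."""
    stanzas = []
    for alias, hostname in sorted(nodes.items()):
        lines = [f"Host {alias}", f"    HostName {hostname}"]
        if user is not None:
            # Hypothetical: only emitted when a login name is passed in.
            lines.append(f"    User {user}")
        stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas) + "\n"


print(ssh_config_entries(NODES))
```

The printed text can be appended to ~/.ssh/config after review. Afterwards, ssh ic00 should resolve to the ic00 host.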

ENDPOINTS

V4 Cloud?

Migrated to Python 3 in the meantime.

Starting

nohup /home/picalike/venv_image_cloud/bin/uwsgi --ini /mnt/cloud/image_cloud/image_cloud.ini &> /mnt/storage/var/log/imageCloud.log &  # cloud01

git

$ git clone ssh://picalike@sg01.picalike.corpex-kunden.de/home/picalike/repositories/rakete/image_cloud.git

Configuration files

Data storage locations

Cloud01

image_storage: /mnt/cloud/images/ (see image_cloud.ini)

sqlite_db: /mnt/cloud/icdb.sql (see image_cloud.ini)

log_file: /mnt/storage/var/log.imageCloud.log (see image_cloud.ini)

Netcup

image_storage: /home/picalike/pg_ic_data

sqlite_db: /home/picalike/data/icdb.db

log_file: ?

check clone requests backlog

psql -c "select count(*) from clone_requests;" "postgresql://image_cloud:Q%25c%21rqHLiszD%232PpS6geZ2Y%5Et5@image-cloud.picalike.corpex-kunden.de:5436/image_cloud"
for host in ic01 ic02 ic03; do
    ssh $host 'psql -c "select count(*) from clone_requests;" "postgresql://image_cloud:Q%25c%21rqHLiszD%232PpS6geZ2Y%5Et5@localhost:5436/image_cloud"'
done
date

Experiments / Results / Knowledge Base

2020-06-17: Image Cloud Remove Duplicates

Cloud01 is close to its memory limit, so checking for duplicate images (with different URLs) and deleting them might help.

Maybe Hashing?

Hackathon result: deduplication by MD5 hash would not save much space (<1%).
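A measurement of this kind can be sketched as follows. This is not the hackathon code, which is not documented here. It is only an illustration of how the potential saving could be estimated by grouping files by their MD5 digest. The function name duplicate_savings is made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def duplicate_savings(root):
    """Return (total_bytes, reclaimable_bytes) for all files below root."""
    sizes_by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.md5()
            with open(path, "rb") as fh:
                # Read in 1 MiB chunks so large images do not fill memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            sizes_by_hash[digest.hexdigest()].append(os.path.getsize(path))
    total = sum(sum(sizes) for sizes in sizes_by_hash.values())
    # One copy per digest is kept; every further copy counts as reclaimable.
    reclaimable = sum(sum(sizes[1:]) for sizes in sizes_by_hash.values())
    return total, reclaimable
```

Calling duplicate_savings("/mnt/cloud/images/") on Cloud01 would return both byte counts. Their ratio is the share of space that deduplication could free.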

Component description

ic_frontend

Provides get_image and fire_and_forget.

get_image

Accepts a URL via GET and returns the image. The image is served from the cache when possible and downloaded when necessary.

http://cloud01.picalike.corpex-kunden.de:5000/get_image?url=<url>

Example:

http://cloud01.picalike.corpex-kunden.de:5000/get_image?url=https%3A%2F%2Fs-media-cache-ak0.pinimg.com%2Foriginals%2F91%2Fca%2Ffc%2F91cafceef8204c3ce8375f4fde34a0e5.jpg

Further parameters (as used in the examples): resize_width, resize_height, force_to_size, keep_aspect_ratio

Examples:

http://cloud01.picalike.corpex-kunden.de:5000/get_image?url=https%3A%2F%2Fs-media-cache-ak0.pinimg.com%2Foriginals%2F91%2Fca%2Ffc%2F91cafceef8204c3ce8375f4fde34a0e5.jpg&resize_width=100&resize_height=100&force_to_size=true
http://cloud01.picalike.corpex-kunden.de:5000/get_image?url=https%3A%2F%2Fs-media-cache-ak0.pinimg.com%2Foriginals%2F91%2Fca%2Ffc%2F91cafceef8204c3ce8375f4fde34a0e5.jpg&resize_width=100&resize_height=100&keep_aspect_ratio=false
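The request URLs above can be built programmatically instead of escaping the image URL by hand. The following sketch assumes nothing beyond the endpoint and the parameter names shown in the examples. The helper name get_image_url is made up for illustration.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://cloud01.picalike.corpex-kunden.de:5000/get_image"


def get_image_url(image_url, **params):
    """Build a get_image URL; keyword arguments become query parameters."""
    query = {"url": image_url, **params}
    # quote_via=quote with safe="" percent-encodes ":" and "/" in the image URL.
    return BASE_URL + "?" + urlencode(query, quote_via=quote, safe="")


print(
    get_image_url(
        "https://s-media-cache-ak0.pinimg.com/originals/91/ca/fc/"
        "91cafceef8204c3ce8375f4fde34a0e5.jpg",
        resize_width=100,
        resize_height=100,
        force_to_size="true",
    )
)
```

The printed URL matches the first resize example above.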

fire_and_forget

Send a JSON file of the following form via POST:

{"urls": [<url>, ...]}

The URLs in the list are loaded into the cache.

Example:

curl --data-binary @<url_file>.json -H "Content-type: application/json" "http://cloud01.picalike.corpex-kunden.de:5500/fire_and_forget"
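The same request can be prepared from Python. This is a sketch that only builds the request object and does not send it. The endpoint and the payload shape are taken from the curl example above, and the example.com URLs are placeholders. Token handling is not shown because its format is described in the project's README, not here.

```python
import json
import urllib.request

FAF_URL = "http://cloud01.picalike.corpex-kunden.de:5500/fire_and_forget"


def build_faf_request(urls):
    """Build (but do not send) the POST request that prefetches the given URLs."""
    body = json.dumps({"urls": list(urls)}).encode("utf-8")
    return urllib.request.Request(
        FAF_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URLs for illustration.
request = build_faf_request(
    ["https://example.com/a.jpg", "https://example.com/b.jpg"]
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it. That call is left out so the sketch has no side effects.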

ic_backend

Writes image metadata to an SQLite database. Connected to ic_frontend:get_image via ZeroMQ.

faf_fan

Manages the asynchronous prefetch download for ic_frontend:fire_and_forget. In max_downloads.json, a maximum number of parallel downloads can be set per domain.
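The idea of a per-domain limit can be illustrated with one semaphore per domain. This is not the faf_fan implementation. The sketch assumes that max_downloads.json maps a domain name to an integer, which this page does not confirm. The names prefetch_all, fetch, and default_limit are hypothetical.

```python
import asyncio
from urllib.parse import urlsplit


async def prefetch_all(urls, max_downloads, fetch, default_limit=1):
    """Run fetch(url) for every URL with a per-domain concurrency limit.

    max_downloads is assumed to look like {"example.com": 2}.
    """
    semaphores = {}

    async def limited(url):
        domain = urlsplit(url).netloc
        if domain not in semaphores:
            limit = max_downloads.get(domain, default_limit)
            semaphores[domain] = asyncio.Semaphore(limit)
        # At most `limit` fetches per domain are in flight at any time.
        async with semaphores[domain]:
            return await fetch(url)

    return await asyncio.gather(*(limited(u) for u in urls))
```

A caller would load the limits with json.load and pass a coroutine function that performs the actual download as fetch.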

faf_worker

Calls ic_frontend:get_image and discards the image so that it is already in the cache for a future request.

fw_worker

Saves the image to disk. A multipart message is sent via ZeroMQ: [path, fname, file content]

iproxy.php

Visualytics/OSA uses iproxy.php to access the image cloud. It used to have its own caching layer.

The Visualytics iproxy.php can be found on web02. Images are cached under /mnt/storage/var/www/thumbnails … the image file can be found with echo -n "$size$url" | md5sum. For size, we found that the values 200 and 250 were used.
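The md5sum lookup above can also be done in Python, for example to check both known sizes at once. This sketch only reproduces the hash. How the digest maps to a file name below the thumbnails directory is not stated here and is left open. The function name thumbnail_hash and the example URL are made up.

```python
import hashlib


def thumbnail_hash(size, url):
    """Equivalent of: echo -n "$size$url" | md5sum"""
    return hashlib.md5(f"{size}{url}".encode("utf-8")).hexdigest()


# 200 and 250 are the size values mentioned above; the URL is a placeholder.
for size in (200, 250):
    print(size, thumbnail_hash(size, "https://example.com/a.jpg"))
```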

The OSA iproxy.php can be found on osa.picalike.corpex-kunden.de.

TAGS

iproxy iproxy.php image cloud auth token backup balancer netcup