!!!!!!!!!!! PARTLY OBSOLETE

Overview

The services are accessible via ssh. To log in:

ssh -p 9999 picalike@grapex.duckdns.org

If you are asked for the password, it is the super secret AI password only the cool kids know.

Known Issues

When we found a bug in the feature extraction, we also realized that the numerical results differ considerably between gpu03 and grapex. This is why we now only use dockerized TF 2.2 [tensorflow/tensorflow:2.2.0], with or without GPU support.
On top of the out-of-the-box docker image, you just need to install

pip3 install --user Pillow

inside the docker. The container is started with:

docker run -d --shm-size 6G -v /home/picalike/v5:/v5 -it tensorflow/tensorflow:2.2.0 bash

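The shared memory option is needed to access all GPU memory, and the v5 folder with all the code is mounted into the container. Afterwards, a bash is started in the running container (the name condescending_wright was auto-generated by docker; use 'docker ps' to look up the actual one):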

docker exec -ti condescending_wright /bin/bash

Docker: Shared Memory

If you use nvidia + docker, make sure that the shared memory is at least as big as the GPU memory. For gpu03 the appropriate setting would thus be:

docker run --shm-size 6G

Without this trick, PyTorch is not able to utilize the full memory.
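
The rule above (shared memory at least as big as the GPU memory) can be checked from inside a running container. The following is only a minimal sketch: it assumes the container exposes its shared memory as /dev/shm and that the GPU has 6 GiB, as the 6G setting for gpu03 suggests. The function names are made up for illustration.

```python
import os
import shutil

GIB = 1024 ** 3
SHM_PATH = "/dev/shm"  # assumption: shared memory is mounted here inside the container


def shm_is_sufficient(shm_bytes, gpu_bytes):
    """Return True if shared memory is at least as big as the GPU memory."""
    return shm_bytes >= gpu_bytes


def current_shm_bytes(path=SHM_PATH):
    """Total size of the shared memory filesystem in bytes."""
    return shutil.disk_usage(path).total


if __name__ == "__main__":
    gpu_bytes = 6 * GIB  # assumption: 6 GiB GPU memory, matching the 6G setting for gpu03
    if os.path.isdir(SHM_PATH):
        shm_bytes = current_shm_bytes()
        verdict = "ok" if shm_is_sufficient(shm_bytes, gpu_bytes) else "too small"
        print(f"{SHM_PATH}: {shm_bytes / GIB:.1f} GiB ({verdict})")
    else:
        print(f"{SHM_PATH} not found")
```

A container started without --shm-size gets docker's default of 64 MB, which this check would report as too small.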

Docker: Images

To avoid breaking the system, only the video driver from nvidia is installed and the actual CUDA setup is done in a docker container. Thanks to nvidia docker, GPU support works out of the box:
docker run --shm-size 6G --gpus all [..]

PyTorch

There is no image for our environment yet, but the installation is simple:

pip3 install --user torch torchvision

Auto-Update Strikes Again

The holy GPU grail, grapex, was also de-gpued because of automatic updates. In the end, cuda-9 was removed for no good reason. Unattended upgrades are now disabled, which should prevent further damage. We manually installed the required packages as root again.

The quick test is 'import tensorflow' with ipython. If a library is missing, the error message names it, and the .so name usually matches the package name, so it is easy to find the corresponding package.
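
The quick test can also be scripted. This sketch assumes that a missing shared library surfaces as an ImportError whose message contains the .so name (for example "libcublas.so.9.0: cannot open shared object file"). Some TensorFlow versions only log a warning instead, so treat this as an illustration. The helper names are hypothetical.

```python
import importlib
import re


def missing_library(error_message):
    """Extract a shared library name like 'libcublas.so.9.0' from an error message."""
    match = re.search(r"(lib[\w.+-]+\.so[\w.]*)", error_message)
    return match.group(1) if match else None


def check_import(module_name):
    """Try to import a module; return (ok, error message)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    ok, message = check_import("tensorflow")
    if ok:
        print("tensorflow imported fine")
    else:
        # The .so name usually hints at the package that has to be reinstalled.
        print("import failed:", message)
        print("missing library:", missing_library(message))
```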

GPU03

This machine is not used in production. It was used by Julius (2019) for his machine learning work. It runs the most recent Nvidia driver (440), CUDA (10) and Tensorflow [tensorflow/tensorflow:2.2.0-gpu-jupyter].

Hidden Knowledge

Some machines use unattended upgrades, which is a huge problem for CUDA since it might happen that the kernel and library versions no longer match. Then a 'reboot' as root is required.

To disable this behavior, check the following file:

picalike@gpu03:~$ cat /etc/apt/apt.conf.d/20auto-upgrades
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "0";

If the last line contains a "1", it needs to be adjusted to "0".
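
To verify the setting on several machines without reading the file by eye, the check can be sketched as below. The function name is hypothetical. The sketch only skips lines starting with "//" and treats a missing setting as disabled, both of which are simplifying assumptions.

```python
import re

CONFIG_PATH = "/etc/apt/apt.conf.d/20auto-upgrades"

# Matches e.g.: APT::Periodic::Unattended-Upgrade "0";
SETTING = re.compile(r'APT::Periodic::Unattended-Upgrade\s+"(\d+)"\s*;')


def unattended_upgrades_enabled(config_text):
    """Return True if the config text switches unattended upgrades on."""
    for line in config_text.splitlines():
        if line.strip().startswith("//"):
            continue  # skip commented-out lines
        match = SETTING.search(line)
        if match:
            return match.group(1) != "0"
    return False  # assumption: a missing setting means disabled


if __name__ == "__main__":
    try:
        with open(CONFIG_PATH) as handle:
            enabled = unattended_upgrades_enabled(handle.read())
        print("unattended upgrades enabled:", enabled)
    except FileNotFoundError:
        print(CONFIG_PATH, "not found")
```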

Keywords: grapu grapex grapy gpu duckdns