!!!!!!!!!!! PARTLY OBSOLETE
The services are accessible via ssh. To actually log in:
ssh -p 9999 picalike@grapex.duckdns.org
If you are asked for the password, it is the super secret AI password only the cool kids know.
When we found a bug in the feature extraction, we also realized that the numerical results differ considerably between gpu03 and grapex. This is why we now only use TF 2.2 [tensorflow/tensorflow:2.2.0] in Docker, regardless of whether GPU support is enabled.
With the out-of-the-box Docker image, you only need to install Pillow inside the container:
pip3 install --user Pillow
The container is started with:
docker run -d --shm-size 6G -v /home/picalike/v5:/v5 -it tensorflow/tensorflow:2.2.0 bash
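The shared memory option is needed to access all GPU memory. We also mount the v5 folder, which contains all the code, into the container. Afterwards, a bash shell is started in the running container (the container name below was assigned by Docker; docker ps shows the current one):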
docker exec -ti condescending_wright /bin/bash
If you use nvidia + docker, make sure that the shared memory is at least as large as the GPU memory. For gpu03, the appropriate setting is thus:
docker run --shm-size 6G
Without this setting, PyTorch cannot use the full memory.
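The sizing rule above can be sketched in Python. This is only an illustration, not part of the existing setup: the helper names query_gpu_memory_mib and shm_size_flag are hypothetical, and the sketch assumes that covering the largest single GPU, rounded up to whole GiB, is sufficient.

```python
import math
import subprocess


def query_gpu_memory_mib():
    """Ask nvidia-smi for the total memory of each GPU, in MiB (one value per GPU)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [int(line) for line in out.splitlines() if line.strip()]


def shm_size_flag(gpu_memory_mib):
    """Build a --shm-size flag that covers the largest GPU.

    Assumption: the largest single GPU decides the size; the value is
    rounded up to whole GiB so the shared memory is never smaller.
    """
    gib = math.ceil(max(gpu_memory_mib) / 1024)
    return f"--shm-size {gib}G"


# On a GPU host one could run:
#   print(shm_size_flag(query_gpu_memory_mib()))
```

For a card that reports 6144 MiB or slightly less, this yields the same --shm-size 6G setting that is used for gpu03.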
To avoid breaking the system, only the NVIDIA video driver is installed on the host, and the actual CUDA setup is done in a Docker container. Thanks to nvidia docker, GPU support works out of the box:
docker run --shm-size 6G --gpus all [..]
There is no image for our environment yet, but the installation is simple:
pip3 install --user torch torchvision
The holy GPU grail, grapex, was also de-GPUed by automatic updates. In the end, cuda-9 was removed for no good reason. Unattended upgrades are now disabled, which should prevent further damage. We manually reinstalled the required packages as root.
A quick test is to run 'import tensorflow' in ipython. If a shared library is missing, its name is printed in the error message, and the .so name usually matches the package name, so the corresponding package is easy to find.
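The same quick test can be scripted. The following is a minimal sketch, assuming the missing library surfaces as an ImportError whose message contains the .so name; check_import and missing_library are hypothetical helper names, and the regular expression is only a heuristic.

```python
import importlib
import re

# Heuristic: a shared-object name such as "libcudart.so.10.1".
_SO_NAME = re.compile(r"\blib[^\s:/]*\.so(?:\.\d+)*")


def check_import(module_name):
    """Try to import a module; return (ok, error_message)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return False, str(exc)
    return True, ""


def missing_library(error_message):
    """Extract the first .so name from an import error message, or None."""
    match = _SO_NAME.search(error_message)
    return match.group(0) if match else None


# Example usage inside the container:
#   ok, message = check_import("tensorflow")
#   if not ok:
#       print("missing library:", missing_library(message))
```

The extracted name (without the "lib" prefix and version suffix) is then a reasonable starting point for finding the package to install.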
This machine is not used in production. Julius used it in 2019 for his machine learning work. It uses the most recent NVIDIA driver (440), CUDA (10) and TensorFlow [tensorflow/tensorflow:2.2.0-gpu-jupyter].
Some machines use unattended upgrades, which is a huge problem for CUDA because the kernel and library versions may no longer match afterwards. In that case, a 'reboot' as root is required.
To disable unattended upgrades, check the following file:
picalike@gpu03:~$ cat /etc/apt/apt.conf.d/20auto-upgrades
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "0";
If the last line contains a "1", it needs to be changed to "0".
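The check can also be automated, for example to verify several machines. The sketch below is an assumption-laden illustration: unattended_upgrades_enabled is a hypothetical helper, it does not handle commented-out lines, and it treats a missing setting as disabled.

```python
import re

# Matches e.g.: APT::Periodic::Unattended-Upgrade "0";
_SETTING = re.compile(r'APT::Periodic::Unattended-Upgrade\s+"(\d+)"\s*;')


def unattended_upgrades_enabled(config_text):
    """Return True if the config text enables unattended upgrades.

    Assumption: if the setting is absent, upgrades are treated as disabled.
    """
    match = _SETTING.search(config_text)
    if match is None:
        return False
    return match.group(1) != "0"


# Example usage on a machine:
#   with open("/etc/apt/apt.conf.d/20auto-upgrades") as handle:
#       print(unattended_upgrades_enabled(handle.read()))
```

A result of True means the file still needs to be edited as described above.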
Keywords: grapu grapex grapy gpu duckdns