AI Laboratory

This is the home of our new v6/v5+ lab, where we plan, design, and implement cool new stuff in a disciplined way with respect to reproducibility, maintenance, quality, and operations.

The framework of choice for experiments is PyTorch, but as long as TensorFlow is used in production, both frameworks will co-exist.

For now, all the AI stuff is located in a playground git repo which is subject to change: Git Repo.

Right now, there are two major projects:


  • BigNet Extractor: A set of tools to extract bits / clusters from a pre-trained network
  • UniNet: A PyTorch implementation of the Alignment + Uniform loss to learn features in an unsupervised manner (see the sketch below).
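
For background, the alignment and uniformity objectives (Wang & Isola, 2020) are compact enough to sketch in PyTorch. This is the generic formulation from the paper, not necessarily the UniNet code:

import torch

# x, y are L2-normalized embeddings of two views of the same images, (batch, dim)
def align_loss(x, y, alpha=2):
    # matched pairs should be close on the unit hypersphere
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(x, t=2):
    # embeddings should spread out uniformly on the hypersphere
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()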

Building Blocks

ConvNet

Since the literature has widely adopted ResNets for various tasks with good results, we stick with them as well. This also helps us gain experience and derive recipes and/or cookbooks for how to train a network for different tasks. We usually start with a pre-trained ResNet and use the 34-layer version, since it often suffices for the problem at hand.

To use such a network in PyTorch, all you have to do is:

import torch
# load a pre-trained ResNet34 from torchvision and switch it to eval mode
model = torch.hub.load('pytorch/vision:v0.8.2', 'resnet34', pretrained=True).eval()

Classifier

Here we adopted the idea of self-attention in various forms. Examples can be found here:

https://git.picalike.corpex-kunden.de/hackathon/awesome_ai/-/blob/master/src/model_transformer.py
https://git.picalike.corpex-kunden.de/hackathon/awesome_ai/-/blob/master/src/model.py

The idea is to use the outputs of the individual convolution filters and let a classifier decide which filters are useful for the final prediction, giving it the choice to (completely) ignore the others.

Example: the output of the last conv layer of a ResNet34 has the shape (1, 512, 7, 7) for a batch of size one. In other words, we have 512 filters, each with a 7×7 output. Since self-attention operates on 1D sequences, the tensor is reshaped to (1, 512, 49), which means we can attend individually over each of the 512 filters.
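
To make the shapes concrete, here is a minimal sketch of the reshape and attention step. It is not the repo model linked above, and the number of heads is an arbitrary choice (49 = 7 × 7 allows 7 heads):

import torch
import torch.nn as nn

# pre-trained ResNet34 without its avgpool and fc layers
backbone = torch.hub.load('pytorch/vision:v0.8.2', 'resnet34', pretrained=True).eval()
features = nn.Sequential(*list(backbone.children())[:-2])

x = torch.randn(1, 3, 224, 224)    # dummy batch of size one
fmap = features(x)                 # (1, 512, 7, 7)
tokens = fmap.flatten(2)           # (1, 512, 49): 512 filters, 49 values each
tokens = tokens.permute(1, 0, 2)   # (512, 1, 49): (seq, batch, embed) layout

attn = nn.MultiheadAttention(embed_dim=49, num_heads=7)
out, weights = attn(tokens, tokens, tokens)  # out: (512, 1, 49)

The attention weights then express how much each filter contributes, which is exactly the "ignore the others" mechanism described above.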

A further idea is to use hierarchical classifiers to model disjoint label levels. For example, fashion#shoes#sneakers has three levels:


  • fashion
  • shoes
  • sneakers

We decided to progressively predict depth levels, like this:


  • fashion
  • fashion#shoes
  • fashion#shoes#sneakers

The idea is that each level is more fine-grained than the previous one, with the benefit that if only a coarse classifier is required, we can just use the first level.
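
A minimal sketch of such a progressive head follows, with made-up class counts and the (assumed) design choice that each level is conditioned on the logits of the previous one:

import torch
import torch.nn as nn

class HierarchicalHead(nn.Module):
    # one linear classifier per depth level; each level also sees the logits
    # of the previous level, so fashion#shoes can condition on fashion
    def __init__(self, in_dim, num_classes_per_level):
        super().__init__()
        self.heads = nn.ModuleList()
        prev = 0
        for n in num_classes_per_level:
            self.heads.append(nn.Linear(in_dim + prev, n))
            prev = n

    def forward(self, feats):
        logits, prev = [], None
        for head in self.heads:
            inp = feats if prev is None else torch.cat([feats, prev], dim=1)
            prev = head(inp)
            logits.append(prev)
        return logits  # one logit tensor per depth level

# e.g. 10 top-level, 50 second-level, 200 leaf categories (made-up numbers)
head = HierarchicalHead(512, [10, 50, 200])
levels = head(torch.randn(4, 512))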

An example can be found here:

https://git.picalike.corpex-kunden.de/hackathon/awesome_ai/-/blob/master/src/dataset/sampler.py (DatasetHierachy)

The link below illustrates the concept. Instead of using patches taken directly from the image, we feed the output of the ConvNet into the module ("Hybrid" in the paper):

Links: https://jacobgil.github.io/deeplearning/vision-transformer-explainability

Docker

Since Docker has had CUDA/GPU support for quite some time, the ideal solution is to use Docker images instead of system-wide modifications. The idea is to provide just the NVIDIA driver on the host and to do the rest in the Docker container. There are no PyTorch images yet, but the dependencies are minimal.
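
A quick way to verify such a setup from inside a container is a Python sanity check (assuming a CUDA-enabled PyTorch build in the image):

import torch

# with the NVIDIA driver on the host and a CUDA-enabled PyTorch build in
# the image, this should report True and the GPU name
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))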

PyTorch

There is a ready-to-use pip package which can be easily integrated:

pip3 install --user torch==1.7.1+cpu torchvision==0.8.2+cpu -f https://download.pytorch.org/whl/torch_stable.html

This version is CPU-only and meant for local use; for GPU support, just use:

pip3 install --user torch

