Image Segmentation / Color Extraction

Link to the summary document: https://docs.google.com/document/d/1gZrcfbXSyyOHOe5mucHHs5gbO8of9yRL9JLc7OLHyzI/edit

git: https://git.picalike.corpex-kunden.de/incubator/image-segmentation

Possible test setup:
https://github.com/bearpaw/clothing-co-parsing in combination with a Vision Transformer (a loading sketch follows below)
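A hypothetical loader for that data set; it assumes the repo layout of photos/*.jpg plus annotations/pixel-level/*.mat with the label map stored under the key "groundtruth", which should be verified against an actual checkout:

```python
# Hypothetical loader for clothing-co-parsing pixel-level annotations.
# Assumed layout: photos/*.jpg and annotations/pixel-level/*.mat,
# with the label map stored under the key "groundtruth".
from pathlib import Path

import numpy as np
from PIL import Image
from scipy.io import loadmat

def load_sample(root: Path, stem: str) -> tuple[np.ndarray, np.ndarray]:
    """Return (image, mask) for one sample as numpy arrays."""
    image = np.asarray(Image.open(root / "photos" / f"{stem}.jpg"))
    mask = loadmat(str(root / "annotations" / "pixel-level" / f"{stem}.mat"))["groundtruth"]
    return image, mask

image, mask = load_sample(Path("clothing-co-parsing"), "0001")
print(image.shape, mask.shape, np.unique(mask)[:10])  # per-pixel label ids
```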

References:

Survey: https://medium.com/swlh/image-segmentation-using-deep-learning-a-survey-e37e0f0a1489
Vision Transformer (ViT): https://arxiv.org/abs/2010.11929

Conclusion

The idea of using the attention masks (rollout.py) of a pre-trained Vision Transformer to obtain regions of interest does not work for fashion images.
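For reference, a minimal sketch of attention rollout, the technique behind rollout.py: head-averaged attention plus the residual identity, renormalized and multiplied across layers. It assumes the per-layer attention matrices have already been extracted from a PyTorch ViT; the function name and shapes are our own:

```python
import torch

def attention_rollout(attentions: list[torch.Tensor]) -> torch.Tensor:
    """attentions: one (heads, tokens, tokens) tensor per layer.
    Returns a (tokens - 1,) relevance score per patch for the [CLS] token."""
    tokens = attentions[0].size(-1)
    result = torch.eye(tokens)
    for attn in attentions:
        fused = attn.mean(dim=0)                    # average over heads
        fused = fused + torch.eye(tokens)           # add the residual connection
        fused = fused / fused.sum(dim=-1, keepdim=True)
        result = fused @ result                     # accumulate layer by layer
    return result[0, 1:]  # [CLS] row, without the [CLS] column itself
```

For a 224×224 input with 16×16 patches, the returned vector reshapes to a 14×14 grid that is upsampled to image size to form the mask.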

Furthermore, aspect-ratio-preserving resizing does not work either: the transformers are trained on square 224×224 images, so the attention masks cannot cope with the white padding boxes that the resize introduces.
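For illustration, a sketch of the kind of aspect-preserving resize we mean; the white fill is exactly what the attention masks latch onto:

```python
from PIL import Image

def resize_with_padding(img: Image.Image, size: int = 224) -> Image.Image:
    """Scale the longer side to `size` and pad the remainder with white."""
    scale = size / max(img.size)
    resized = img.resize((round(img.width * scale), round(img.height * scale)))
    canvas = Image.new("RGB", (size, size), (255, 255, 255))  # the "white boxes"
    canvas.paste(resized, ((size - resized.width) // 2, (size - resized.height) // 2))
    return canvas
```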

Fine-tuning the network on 8 fashion categories with a few thousand images also did not help, but the evaluation time was limited and we likely need to repeat the experiment with a larger setup.
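Should we repeat the experiment, a hypothetical fine-tuning setup could look like the following; the timm model name, learning rate, and training step are placeholder assumptions, not the configuration we actually ran:

```python
import timm
import torch
from torch import nn

# Pre-trained ViT with a fresh 8-way head for the fashion categories.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimization step on a batch of (images, labels)."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```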

The plan is to use the segmentation data set, even though it is rather small, to decide whether we can come up with better masks.

In the end, the segmentation mask is used to extract the colors of the region of interest, which very likely corresponds to the product.
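A minimal sketch of that extraction step, assuming a boolean mask and using k-means for the palette (k = 5 is an arbitrary choice, not a validated setting):

```python
import numpy as np
from sklearn.cluster import KMeans

def dominant_colors(image: np.ndarray, mask: np.ndarray, k: int = 5):
    """image: (H, W, 3) uint8, mask: (H, W) bool.
    Returns the k dominant RGB colors and their pixel shares."""
    pixels = image[mask].astype(np.float32)        # region-of-interest pixels only
    km = KMeans(n_clusters=k, n_init=10).fit(pixels)
    counts = np.bincount(km.labels_, minlength=k)
    order = np.argsort(counts)[::-1]               # most frequent cluster first
    return km.cluster_centers_[order].astype(np.uint8), counts[order] / counts.sum()
```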