Image Segmentation / Color Extraction
Link to the summary document: https://docs.google.com/document/d/1gZrcfbXSyyOHOe5mucHHs5gbO8of9yRL9JLc7OLHyzI/edit
git: https://git.picalike.corpex-kunden.de/incubator/image-segmentation
Possible test setup:
https://github.com/bearpaw/clothing-co-parsing in combination with a Vision Transformer
References:
Survey: https://medium.com/swlh/image-segmentation-using-deep-learning-a-survey-e37e0f0a1489
Vision Transformer (ViT): https://arxiv.org/abs/2010.11929
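Attention rollout: https://arxiv.org/abs/2005.00928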
Conclusion
The idea of using the attention masks (rollout.py) of a pre-trained Vision Transformer to derive a region of interest does not work for fashion images.
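For reference, the computation in rollout.py roughly follows the attention-rollout scheme sketched below; the per-layer attention matrices are assumed to be already extracted from the model (names and shapes are illustrative, not the exact rollout.py code):

    import numpy as np

    def attention_rollout(attentions):
        # attentions: one array of shape (num_heads, tokens, tokens) per
        # transformer layer, ordered first to last; CLS token at index 0.
        result = np.eye(attentions[0].shape[-1])
        for attn in attentions:
            attn = attn.mean(axis=0)                        # average over heads
            attn = attn + np.eye(attn.shape[-1])            # add the residual connection
            attn = attn / attn.sum(axis=-1, keepdims=True)  # re-normalize rows
            result = attn @ result                          # propagate through the layers
        # Attention of the CLS token to all image patches = the "mask".
        return result[0, 1:]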
Furthermore, aspect-preserving resizing does not work either: the transformer is trained on square 224×224 images, so non-square inputs have to be padded, and the attention masks cannot cope with the introduced "white boxes".
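The white boxes come from the standard preprocessing for non-square images: resize the longer side to 224 and pad the remainder. A minimal sketch of that step (PIL; white fill and centering are assumptions):

    from PIL import Image

    def resize_keep_aspect(img, size=224, fill=(255, 255, 255)):
        # Scale so the longer side matches the target size.
        ratio = size / max(img.size)
        new_w, new_h = round(img.width * ratio), round(img.height * ratio)
        img = img.resize((new_w, new_h), Image.BILINEAR)
        # Pad to a square canvas; the padded area is the "white box".
        canvas = Image.new("RGB", (size, size), fill)
        canvas.paste(img, ((size - new_w) // 2, (size - new_h) // 2))
        return canvas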
Fine-tuning the network with 8 fashion categories and a few thousand images also did not help, but the evaluation time was limited and we likely need to repeat the experiment with a bigger setup.
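For the bigger repeat experiment, the fine-tuning boils down to replacing the classifier head; a minimal sketch via timm (model name, optimizer, and learning rate are assumptions, not the exact original setup):

    import timm
    import torch

    # Pre-trained ViT with a fresh 8-way head for the fashion categories.
    model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=8)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    criterion = torch.nn.CrossEntropyLoss()

    def train_step(images, labels):
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()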
The plan is to use the segmentation data set (clothing-co-parsing, linked above) to decide whether we can come up with better masks, although the data set is rather small.
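To compare masks against the data set, the pixel-level annotations have to be loaded from the .mat files; a sketch assuming the layout of the clothing-co-parsing repository (the key 'groundtruth' is taken from its annotation format and worth double-checking):

    import numpy as np
    from scipy.io import loadmat

    def load_ccp_mask(path):
        # HxW label matrix: 0 = background, >0 = clothing category.
        return loadmat(path)["groundtruth"].astype(np.int32)

    # Binary foreground mask, e.g.:
    # mask = load_ccp_mask("annotations/pixel-level/0001.mat") > 0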
In the end, the segmentation mask is used to extract the colors of the region of interest, which very likely corresponds to the product.
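The extraction itself could be a simple k-means over the masked pixels; a sketch assuming an RGB image and a binary mask (the cluster count is a free parameter):

    import numpy as np
    from sklearn.cluster import KMeans

    def dominant_colors(image, mask, n_colors=5):
        # image: HxWx3 uint8, mask: HxW bool (True = product region).
        pixels = image[mask].astype(np.float64)
        km = KMeans(n_clusters=n_colors, n_init=10, random_state=0).fit(pixels)
        counts = np.bincount(km.labels_, minlength=n_colors)
        order = np.argsort(counts)[::-1]
        # Dominant colors plus their share of the masked area.
        return km.cluster_centers_[order].astype(np.uint8), counts[order] / counts.sum()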