Image Segmentation / Color Extraction

Conclusion

The idea of using the attention masks (rollout.py) of a pre-trained vision transformer to obtain regions of interest does not work for fashion images.
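For reference, attention rollout can be sketched as follows. This is a minimal NumPy version of the general technique, not the contents of rollout.py: the function name `attention_rollout`, the head averaging, and the input layout (one array of shape heads × tokens × tokens per layer, with the CLS token first) are assumptions made for illustration.

```python
import numpy as np

def attention_rollout(attentions):
    """Combine per-layer attention maps into one patch-level mask.

    attentions: list of arrays, one per layer, each of shape
    (heads, tokens, tokens); token 0 is assumed to be the CLS token.
    """
    tokens = attentions[0].shape[-1]
    side = int(round(np.sqrt(tokens - 1)))
    if side * side != tokens - 1:
        raise ValueError("patch tokens do not form a square grid")

    rollout = np.eye(tokens)
    for layer in attentions:
        a = layer.mean(axis=0)                  # average over heads (assumed fusion rule)
        a = a + np.eye(tokens)                  # account for the residual connection
        a = a / a.sum(axis=-1, keepdims=True)   # re-normalise each row
        rollout = a @ rollout                   # propagate attention through the layers

    # Attention flowing from the CLS token to every image patch.
    mask = rollout[0, 1:].reshape(side, side)
    return mask / mask.max()
```

For a 224×224 input with 16×16 patches this yields a 14×14 mask, which would still have to be upsampled to the image size before it can serve as a region of interest.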

Furthermore, the resize/aspect approach (resizing while keeping the aspect ratio) does not work: the transformers are trained on square 224×224 images, so the attention masks cannot cope with the “white boxes” that this introduces.
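To make the problem concrete, the sketch below shows one way such white boxes can arise. It assumes the image is scaled to fit into 224×224 and the remainder is filled with white; the helper `letterbox`, the nearest-neighbour scaling, and the centred placement are illustrative choices, not the code used in the experiment.

```python
import numpy as np

def letterbox(image, size=224, fill=255):
    """Scale an H x W x C uint8 image to fit size x size, padding with `fill`.

    Returns the padded image and a boolean map that is True where
    real image content (not padding) ended up.
    """
    h, w = image.shape[:2]
    scale = size / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))

    # Nearest-neighbour resize via index lookup (keeps the example NumPy-only).
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows][:, cols]

    # Paste the resized image into the centre of a white square canvas.
    canvas = np.full((size, size) + image.shape[2:], fill, dtype=image.dtype)
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized

    content = np.zeros((size, size), dtype=bool)
    content[top:top + new_h, left:left + new_w] = True
    return canvas, content
```

For a portrait image with a 2:1 aspect ratio, half of the 224×224 input consists of padding, so a large share of the patch tokens carries no image information at all.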

Fine-tuning the network on 8 fashion categories with a few thousand images did not help either. However, the evaluation time was limited, and we likely need to repeat the experiment with a larger setup.

The plan is to use the segmentation data set, although it is rather small, to decide whether we can come up with better masks.

In the end, the segmentation mask is used to extract the colors of the region of interest, which is very likely the product.
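The color extraction step could look roughly like the sketch below. The page does not say how colors are computed, so the approach here (coarse RGB quantisation and counting the most frequent bins) is an assumption, as are the function name `dominant_colors` and its parameters `k` and `bits`.

```python
import numpy as np

def dominant_colors(image, mask, k=3, bits=3):
    """Return up to k (rgb, share) pairs for the pixels selected by mask.

    image: H x W x 3 uint8 array; mask: H x W array, non-zero inside the
    region of interest. Colors are grouped into 2**bits levels per channel.
    """
    pixels = image[mask.astype(bool)]           # (n, 3) pixels inside the mask
    if len(pixels) == 0:
        return []

    # Quantise each channel and pack the three levels into one integer key.
    q = (pixels >> (8 - bits)).astype(np.int64)
    keys = (q[:, 0] << (2 * bits)) | (q[:, 1] << bits) | q[:, 2]

    _, inverse, counts = np.unique(keys, return_inverse=True, return_counts=True)
    order = np.argsort(-counts, kind="stable")[:k]

    result = []
    for idx in order:
        members = pixels[inverse == idx]
        mean = members.mean(axis=0).round().astype(int)   # representative color of the bin
        share = float(counts[idx]) / len(pixels)           # fraction of masked pixels
        result.append((tuple(int(c) for c in mean), share))
    return result
```

Clustering the masked pixels (for example with k-means) would be an alternative if the fixed quantisation grid turns out to be too coarse.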

image_segmentation.txt · Last modified: 2024/04/11 14:23 by 127.0.0.1