Ideas:
Take inspiration from NLP and pre-trained models, e.g. Top2Vec: https://github.com/ddangelov/Top2Vec
Use a set of features to represent different aspects, instead of crowding them into a single vector:
https://arxiv.org/abs/2204.00949 (feature sets)
General: https://distill.pub/ (articles about analyzing neural network internals)
Refs:
Group Masked Autoencoder: https://arxiv.org/abs/2205.14986 (helps the model learn concepts that are larger than a single patch)
Perceptual Loss:
https://sanjivgautamofficial.medium.com/perceptual-loss-well-it-sounds-interesting-after-neural-style-transfer-d09a48b6fb7d
https://arxiv.org/pdf/1603.08155.pdf (Perceptual Losses for Real-Time Style Transfer and Super-Resolution)
https://arxiv.org/pdf/1801.03924.pdf (The Unreasonable Effectiveness of Deep Features as a Perceptual Metric)
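Sketch of the perceptual loss idea from the references above: compare two images in the feature space of a fixed network instead of pixel by pixel. This is a minimal, dependency-free illustration, not the method of either paper. The names toy_features, mse, perceptual_loss and layer_weights are made up for this note, and toy_features is only a stand-in for a frozen pre-trained network (the papers use activations from networks such as VGG). Images are flat lists of floats to keep the example short.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
FeatureFn = Callable[[Sequence[float]], List[Vector]]


def toy_features(image: Sequence[float]) -> List[Vector]:
    """Hypothetical stand-in for a frozen pre-trained feature extractor.

    Returns two "layers": neighbour differences (edge-like) and pairwise
    averages (coarser structure). Expects at least two values.
    """
    edges = [b - a for a, b in zip(image, image[1:])]
    coarse = [(a + b) / 2.0 for a, b in zip(image[::2], image[1::2])]
    return [edges, coarse]


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between two equally long vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def perceptual_loss(
    pred: Sequence[float],
    target: Sequence[float],
    feature_fn: FeatureFn = toy_features,
    layer_weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of per-layer feature distances between pred and target."""
    pred_feats = feature_fn(pred)
    target_feats = feature_fn(target)
    if layer_weights is None:
        layer_weights = [1.0] * len(pred_feats)
    return sum(
        w * mse(p, t)
        for w, p, t in zip(layer_weights, pred_feats, target_feats)
    )


# Usage: a uniform brightness shift changes every pixel, but the
# edge-like layer is unaffected by it.
target = [0.0, 1.0, 0.0, 1.0]
shifted = [0.5, 1.5, 0.5, 1.5]
loss_all_layers = perceptual_loss(shifted, target)
loss_edges_only = perceptual_loss(shifted, target, layer_weights=[1.0, 0.0])
```

The choice of layers and their weights decides which differences the loss penalizes. In this toy setup the edge-only variant ignores the brightness shift entirely, while a plain pixel-wise MSE would not.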