osc.loss_objects.matching_similarity_loss_per_img

matching_similarity_loss_per_img(f, p, reduction='mean')

Cosine similarity per object, only between corresponding images.

The S slots of the i-th image are matched with the slots of the (i+B)-th image and vice versa. The loss encourages high cosine similarity between matched pairs. The per-pair loss is defined as (1-cos)/2, which lies in [0, 1] and is 0 when the two vectors point in the same direction.
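The quantity (1-cos)/2 can be illustrated for a single pair of vectors. The following is a minimal pure-Python sketch, not the library's implementation; `pair_loss` is a hypothetical helper name, and the vectors are assumed to be non-zero.

```python
import math

def pair_loss(u, v):
    """Hypothetical helper: (1 - cos) / 2 for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)  # assumption: neither norm is zero
    return (1.0 - cos) / 2.0

same = pair_loss([3.0, 4.0], [3.0, 4.0])        # same direction
orthogonal = pair_loss([1.0, 0.0], [0.0, 1.0])  # cos = 0
opposite = pair_loss([1.0, 0.0], [-1.0, 0.0])   # cos = -1
```

Under this definition the value shrinks toward 0 as the cosine similarity of a pair grows, so minimizing it pulls matched slots toward the same direction.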

Parameters
  • f (Tensor) – [2B, S, C] tensor of pre-projection image features

  • p (Tensor) – [2B, S, C] tensor of post-projection image features

  • reduction (str) – ‘mean’, ‘sum’, or ‘none’

Return type

Tensor

Returns

Scalar loss over all samples and slots if reduction is ‘mean’ or ‘sum’. A vector of 2BS losses (one per sample and slot) if reduction is ‘none’.
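The three reduction modes can be sketched on a plain list of per-slot losses. This is an illustrative sketch only: `reduce_losses` is a hypothetical helper, and raising `ValueError` for an unknown mode is an assumption rather than documented behaviour.

```python
def reduce_losses(losses, reduction="mean"):
    """Hypothetical helper mirroring the 'mean' / 'sum' / 'none' modes."""
    if reduction == "none":
        return list(losses)          # one value per sample and slot
    total = sum(losses)
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / len(losses)
    # Assumption: unknown modes are rejected.
    raise ValueError(f"unknown reduction: {reduction!r}")

B, S = 2, 3
per_slot = [0.25] * (2 * B * S)      # 2BS = 12 per-slot losses

unreduced = reduce_losses(per_slot, "none")
summed = reduce_losses(per_slot, "sum")
averaged = reduce_losses(per_slot, "mean")
```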

Example

A batch of B=4 images, augmented twice, each with S=3 slots. Each X marks a matching pair whose cosine similarity will be increased. When computing the cosine, the vectors along the column axis are detached to prevent gradient propagation:

                        aug_0                    aug_1
               -----------------------  -----------------------
                 0     1     2     3      0     1     2     3
      |       [     |     |     |     ||    X|     |     |     ]
      | img_0 [     |     |     |     ||  X  |     |     |     ]
      |       [     |     |     |     ||X    |     |     |     ]
      |       [-----+-----+-----+------------+-----+-----+-----]
      |       [     |     |     |     ||     |X    |     |     ]
      | img_1 [     |     |     |     ||     |  X  |     |     ]
      |       [     |     |     |     ||     |    X|     |     ]
aug_0 |       [-----+-----+-----+------------+-----+-----+-----]
      |       [     |     |     |     ||     |     |    X|     ]
      | img_2 [     |     |     |     ||     |     |X    |     ]
      |       [     |     |     |     ||     |     |  X  |     ]
      |       [-----+-----+-----+------------+-----+-----+-----]
      |       [     |     |     |     ||     |     |     |    X]
      | img_3 [     |     |     |     ||     |     |     |  X  ]
      |       [     |     |     |     ||     |     |     |X    ]
              [=====|=====|=====|============|=====|=====|=====]
      |       [    X|     |     |     ||     |     |     |     ]
      | img_0 [  X  |     |     |     ||     |     |     |     ]
      |       [X    |     |     |     ||     |     |     |     ]
      |       [-----+-----+-----+------------+-----+-----+-----]
      |       [     |X    |     |     ||     |     |     |     ]
      | img_1 [     |  X  |     |     ||     |     |     |     ]
      |       [     |    X|     |     ||     |     |     |     ]
aug_1 |       [-----+-----+-----+------------+-----+-----+-----]
      |       [     |     |  X  |     ||     |     |     |     ]
      | img_2 [     |     |    X|     ||     |     |     |     ]
      |       [     |     |X    |     ||     |     |     |     ]
      |       [-----+-----+-----+------------+-----+-----+-----]
      |       [     |     |     |    X||     |     |     |     ]
      | img_3 [     |     |     |  X  ||     |     |     |     ]
      |       [     |     |     |X    ||     |     |     |     ]
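The X positions in the diagram can be reproduced from flat row and column indices. The sketch below is illustrative only: the slot permutations in `perms` are read off the diagram by hand, `matched_cells` is a hypothetical helper, and how the library actually computes the slot matching is not stated here.

```python
B, S = 4, 3

# perms[i][s] is the aug_1 slot matched with slot s of img_i in aug_0,
# as read off the upper-right quadrant of the diagram.
perms = [
    [2, 1, 0],  # img_0
    [0, 1, 2],  # img_1
    [2, 0, 1],  # img_2
    [2, 1, 0],  # img_3
]

def matched_cells(perms, B, S):
    """Hypothetical helper: set of (row, col) cells marked with an X."""
    cells = set()
    for i, perm in enumerate(perms):
        for s, t in enumerate(perm):
            row = i * S + s        # slot s of img_i in aug_0
            col = (i + B) * S + t  # slot t of img_i in aug_1
            cells.add((row, col))  # upper-right quadrant
            cells.add((col, row))  # mirrored cell in the lower-left quadrant
    return cells

cells = matched_cells(perms, B, S)
```

Each of the 2BS = 24 rows contains exactly one X, which is consistent with the 2BS per-slot losses returned when reduction is ‘none’.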