Variable Rate Allocation for Vector-Quantized Autoencoders

An image from the Kodak dataset compressed at 0.16 bits per pixel. Top: the rate budget is allocated uniformly to all patches. Bottom: semantic information from DINO controls the number of product quantization indices retained for each patch, resulting in a more detailed subject. Total bits are depicted in blue (low) and red (high).


Vector-quantized autoencoders have recently gained interest in image compression, generation, and self-supervised learning. However, as a neural compression method, they cannot allocate a variable number of bits to each image location, e.g., according to semantic content or local saliency. In this paper, we address this limitation in a simple yet effective way. We adopt a product quantizer (PQ) that produces a set of discrete codes for each image patch rather than a single index. This PQ-autoencoder is trained end-to-end with a structured dropout that selectively masks a variable number of codes at each location. This mechanism forces the decoder to reconstruct the original image from partial information and allows us to control the local rate. The resulting model can compress images over a wide range of operating points on the rate-distortion curve and can be paired with any external method for saliency estimation to control the compression rate at a local level. We demonstrate the effectiveness of our approach on the popular Kodak and ImageNet datasets by measuring both distortion and perceptual quality metrics.
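The structured dropout described above can be illustrated with a small sketch: at each spatial location, only the first k of the M product-quantization codes are kept, with k drawn at random during training. This is a hypothetical NumPy illustration, not the paper's actual implementation; the function name, masking with zeros, and uniform sampling of k are all assumptions.

```python
import numpy as np

def structured_code_dropout(codes, rng):
    """Randomly truncate the number of PQ codes kept at each location.

    Hypothetical sketch of the structured dropout described in the
    abstract: the M sub-codes per patch are ordered, and only the first
    k are kept, forcing the decoder to work from partial information.

    codes: array of shape (H, W, M) with M quantized sub-codes per patch.
    Returns the masked codes and the per-patch keep counts.
    """
    H, W, M = codes.shape
    # Sample, for every patch, how many of the M ordered codes to keep.
    keep = rng.integers(1, M + 1, size=(H, W))
    # Boolean mask keeping the first keep[h, w] codes at each location.
    idx = np.arange(M)[None, None, :]      # shape (1, 1, M)
    mask = idx < keep[..., None]           # shape (H, W, M)
    # Dropped codes are zeroed here; a learned mask token is another option.
    return np.where(mask, codes, 0.0), keep
```

At inference, the per-patch keep counts would instead come from an external saliency or semantic map (e.g., DINO features), keeping more codes where detail matters and fewer elsewhere, which is how the local rate is controlled.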

International Conference on Acoustics, Speech, and Signal Processing
Federico Baldassarre
PhD Student in Deep Learning

My research focuses on explainability and reasoning in Deep Learning.
