Lossy Data Compression at Finite Blocklength: State of the Art, Challenges, Techniques

Victoria Kostina, California Institute of Technology

The quality of data compression systems is gauged by an appropriately defined distortion measure between the input and the output, by the amount (rate) of communication expended by the system, for a given length of the data block. Quantifying a fundamental rate-distortion-blocklength tradeoff, unsurpassable by any technology, is the holy grail of the non-asymptotic rate-distortion theory.

Naturally, the roots of the non-asymptotic rate-distortion theory lie in the classical asymptotic rate-distortion theory, which studies the minimum information rate required to sustain a given distortion in the limit of infinite blocklength. That fundamental limit is called the rate-distortion function. Shannon’s asymptotic coding theorems establish that under certain conditions, the rate-distortion given by the solution to a mutual information minimization problem. Although that convex optimization problem can often be solved numerically, in general it does not have an explicit solution. Consequently, the non-asymptotic bounds and approximations, which extend the asymptotic Shannon theory, are also in general expressed parametrically, rather than explicitly. Fortunately, for a class of distortion measures, the rate-distortion function is bound ed from below by a simple, explicit formula known as Shannon’s lower bound. Remarkably, that formula is known to be tight in the limit of low distortion.

In this tutorial, we present a nonasymptotic version of Shannon’s lower bound, and demonstrate how it leads to a vast simplification of the analysis and offers new intuitions for the fundamental tradeoffs in data compression. We show that lattice quantizers approach that lower bound, provided that the source density is smooth enough and the distortion is low. This implies that fine multidimensional lattice coverings are nearly optimal in rate-distortion sense even at finite blocklength. Since these new techniques are simple and intuitive, we hope that they will pave the way for a nonasymptotic analysis of some notoriously difficult rate-distortion problems, such as lossy compression of sources with memory and multiterminal lossy compression.

Victoria Kostina joined Caltech as an Assistant Professor of Electrical Engineering in the fall of 2014. She holds a Bachelor's degree from Moscow institute of Physics and Technology (2004), where she was affiliated with the Institute for Information Transmission Problems of the Russian Academy of Sciences, a Master's degree from University of Ottawa (2006), and a PhD from Princeton University (2013). Her PhD dissertation on information-theoretic limits of lossy data compression received Princeton Electrical Engineering Best Dissertation award. She is also a recipient of Simons-Berkeley research fellowship (2015).

Victoria Kostina's research spans information theory, coding, and wireless communications. Her current efforts explore the nonasymptotic regime in information theory. Leveraging tools from the theory of random processes and concentration of measure, she pursues fundamental insight into modern delay-constrained communication systems, including lossy and lossless data compressors, joint source-channel codes, communication with feedback.