Lossless Data Compression (LLDC)

Description

Lossless data compression allows exact recovery of the original data from a compressed file [CT06, Chp. 5]. Its most important uses are saving storage space and channel bandwidth, and serving as a component in lossy compressors. Continuous theoretical advances and increases in computational power result in a steady improvement of compression algorithms [Mah12, Hut06]. Improvements of a few percent translate into million-dollar savings in mobile and broadband networks. Many compressors use arithmetic coding based on a statistical model of the data, which is itself learnt from the data. One popular such method is Context Tree Weighting (CTW) [WST95, WST97]. Extensions thereof are among the best existing compressors and come with strong theoretical guarantees [OHSS12, VSH12, VNHB12, VWBG13]. The learnt statistical models are in fact ideal sequential predictors and can also be used in intelligent agents [VNH+11].
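
To make the statistical-modelling step concrete, the sketch below computes the CTW mixture probability of a binary string over a context tree of fixed depth, using the Krichevsky-Trofimov (KT) estimator at every node; -log2 of this probability is essentially the code length an arithmetic coder driven by the same model would produce. It is a minimal illustration only, assuming a toy sequence, depth D = 2, and a batch (non-sequential) computation chosen here for exposition; it is not the project's reference implementation.

    # Illustrative sketch only: CTW mixture probability of a binary string.
    from math import log2

    def kt_block_prob(a, b):
        """KT probability of observing a zeros and b ones (order-independent)."""
        p = 1.0
        for i in range(a):                  # P(0 | i zeros, 0 ones so far)
            p *= (i + 0.5) / (i + 1.0)
        for j in range(b):                  # P(1 | a zeros, j ones so far)
            p *= (j + 0.5) / (a + j + 1.0)
        return p

    def context_counts(bits, depth):
        """Zero/one counts of the symbol following each length-`depth` context."""
        counts = {}
        for t in range(depth, len(bits)):
            ctx = tuple(bits[t - depth:t])  # the `depth` most recent symbols
            a, b = counts.get(ctx, (0, 0))
            counts[ctx] = (a + 1, b) if bits[t] == 0 else (a, b + 1)
        return counts

    def ctw_prob(counts, depth, suffix=()):
        """Weighted probability P_w of the node reached by context `suffix`.

        Leaf (|suffix| = depth):  P_w = P_e                  (plain KT estimate)
        Inner node:               P_w = 1/2 P_e + 1/2 P_w(0s) P_w(1s)
        """
        d = len(suffix)
        a = sum(v[0] for k, v in counts.items() if k[len(k) - d:] == suffix)
        b = sum(v[1] for k, v in counts.items() if k[len(k) - d:] == suffix)
        pe = kt_block_prob(a, b)
        if d == depth:
            return pe
        return 0.5 * pe + 0.5 * ctw_prob(counts, depth, (0,) + suffix) \
                              * ctw_prob(counts, depth, (1,) + suffix)

    bits = [0, 1, 1] * 8                    # toy sequence with simple structure
    D = 2                                   # maximal context depth
    pw = ctw_prob(context_counts(bits, D), D)
    print(f"ideal CTW code length: {-log2(pw):.2f} bits "
          f"for {len(bits) - D} modelled symbols")

Because the deeper contexts of this toy sequence are almost deterministic, the resulting code length is far below the 22 modelled bits, which is exactly the behaviour the mixture over context-tree depths is designed to exploit.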

Goals

  • improve or combine existing, or develop new, Extensions of CTW (ECTW)
  • prove performance bounds for ECTW compression
  • experimentally test ECTW and verify the bounds
  • beat state-of-the-art compressors

Requirements

  • excellent general math skills
  • mastery of elementary probability and statistics
  • ideally a background in information theory
  • reliable handling of long formulas
  • creativity in finding and constructing proofs
  • basic programming skills
  • ideally experience with implementing numerical algorithms

Background Literature

  • [CT06] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006.
  • [Hut06] M. Hutter. Human knowledge compression prize, 2006. Open ended, http://prize.hutter1.net/.
  • [Hut13] M. Hutter. Sparse adaptive Dirichlet-multinomial-like processes. Journal of Machine Learning Research, W&CP: COLT, 30:432–459, 2013.
  • [Kul11] M. O. Kulekci. Compressed context modeling for text compression. In Proc. Data Compression Conference (DCC'11), pages 373–382, Snowbird, UT, USA, March 2011. IEEE Computer Society.
  • [Mah12] M. Mahoney. Data Compression Explained. Dell, Inc., http://mattmahoney.net/dc/dce.html, 2012.
  • [OHSS12] A. O'Neill, M. Hutter, W. Shao, and P. Sunehag. Adaptive context tree weighting. In Proc. Data Compression Conference (DCC'12), pages 317–326, Snowbird, UT, USA, 2012. IEEE Computer Society.
  • [VNH+11] J. Veness, K. S. Ng, M. Hutter, W. Uther, and D. Silver. A Monte Carlo AIXI approximation. Journal of Artificial Intelligence Research, 40:95–142, 2011.
  • [VNHB12] J. Veness, K. S. Ng, M. Hutter, and M. Bowling. Context tree switching. In Proc. Data Compression Conference (DCC'12), pages 327–336, Snowbird, UT, USA, 2012. IEEE Computer Society.
  • [VSH12] J. Veness, P. Sunehag, and M. Hutter. On ensemble techniques for AIXI approximation. In Proc. 5th Conf. on Artificial General Intelligence (AGI'12), volume 7716, pages 341–351, Springer, Berlin, 2012.
  • [VWBG13] J. Veness et al. Partition tree weighting. In Proc. Data Compression Conference (DCC'13), pages 321–330, Snowbird, UT, USA, 2013. IEEE Computer Society.
  • [WST95] F. M. J. Willems, Y. M. Shtarkov, and T. J. Tjalkens. The context tree weighting method: Basic properties. IEEE Transactions on Information Theory, 41:653–664, 1995.
  • [WST97] F. M. J. Willems, Y. M. Shtarkov, and T. J. Tjalkens. Reflections on the prize paper: The context-tree weighting method: Basic properties. IEEE Information Theory Society Newsletter, pages 20–27, 1997.

Gain

  • getting a deep understanding of state-of-the-art compression algorithms
  • acquiring experience in proving theorems
  • writing a small workshop or conference paper
  • improving your math skills
  • ideal preparation for the Human Knowledge Compression Contest [Hut06]

Keywords

compression; information; arithmetic coding; context tree; probability; statistics; theory; bounds; prediction; model.
