Skip to the content.

Data compression and compressed data structures

Lecturers: Giovanni Manzini, Paolo Ferragina. Computer Science Dept. University of Pisa.

Tentative lesson plan:

  1. Statement of the Data compression problem. Main ideas beyond compression techniques: symbol substitution, phrase encoding, data transformations. Kraft inequality (16/5)
  2. Consequences of Kraft Inequality. Order-0 Entropy. Optimal codes, Huffman codes. Oder-k entropy PPM algorithm. LZ77 parsing, definition, bounds in terms of the order-k entropy (17/5)
  3. Burrows-Wheeler transform, relation with k-order entropy. Compression boosting (18/5)
  4. Beyond entropy: others measures of compressibility (25/5)
  5. Introduction to compressed indices and Wheeler Graphs (26/5)
  6. Compressed indices for highly repetitive collections (r-index) (27/5)
  7. Predecessor search: models of computation and various contexts of application. Binary Search. The case of bounded universe: x-fast trie and y-fast trie. The case of uniform key distributions: interpolation search with complexity evaluation. (30/5)
  8. Elias-fano coding with constant time access. Learned data structures (31/5)

All lectures are 9am-11am in Sala Seminari EST (Room 351). Streaming will be available on Google Meet: email me for the link. Lectures will not be recorded.

There are two possible exam formats:

  1. students can present a problem/idea related to their research and show how it can take advantage of the techniques described in the course, possibly including a prototype of the suggested solution

  2. students can present a related topic not covered in the course using materials provided by the instructors

Study material

Students’ presentations

Name Topic Date
Antonio Boffa Compressing ultra-large source code datasets using locality-sensitive hashing 29-07
Cosimo Rulli Neural networks compression 14-07
Daniele Atzeni, Francesco Tosoni Neural networks compression for edge devices 29-07
Federica Baccini Graph Compression 27-07
Lorenzo Catania Compression of genomic datasets via INR 26-07
Domenico Tortola On the compression of blockchain 27-07
Luca Corbucci, Rudy Semola Neural network-based compression 28-07
Alessandro Bocci EEG compression 28-07
Andrea Guerra A heuristic for the packaging of source code in Software Heritage 29-07
Domenico Tortorella