Data compression and compressed data structures
Lecturer: Giovanni Manzini, Computer Science Dept. University of Pisa.
Tentative lesson plan:
 Statement of the lossless data compression problem. Kraft inequality. Order0 Entropy. Optimal codes, Huffman codes and arithmetic coding (22/11).
 Beyond order0 Entropy: MTF + gamma codes. Orderk entropy. PPM algorithms (24/11).
 LZ77 parsing: definition, bounds in terms of the orderk entropy. Introduction to the BurrowsWheeler transform (27/11).
 Compression bounds for the BWT. Introduction to compressed indices (29/11).
 Wavelet Trees for efficient rank operations. Bitvectors supporting Rank/Select operations (4/12).
 From compressed bitvectors to compressed indices. Finding the position of the occurrences. Introduction to Wheeler graphs (6/12)
 (13/12)
 (15/12)
All lectures are 9am11am in Sala Seminari EST (Room 351). There will be no streaming or recording of the lectures.
There are two possible exam formats:

Students can present a problem/idea related to their research and show how it can take advantage of the techniques described in the course; this must include a working prototype of the suggested solution.

Students can present a related to the course using materials provided by the instructor.

All the exams should be given no later than 31/5/2024.
Study material

Book: Cover, Thomas. Elements of Information Theory, Chapter 5 covers Kraft inequality and Huffman Coding.

Paper: LZ parsing and entropy. Indepth analysis of LZ77 and LZ78 algorithms, a little bit technical.
Possible exam material
Some topics can be studied by two people that will later do a joint presentation. Regardless of the topic, each student should do a 30 minute presentation describing: 1) the problem, 2) the previous state of the art, 3) the content of the new contribution. Tentative list of topics (paper links will be added soon):
 Asymmetric Numeral Systems
 Suffix Array Construction
 Parametrized pattern matching
 Wavelet matrix
 Tunneling on BWTs and Wheeler Graphs
 Range Minumum and related Queries (2 people)
 LZindex (2 people)
 Grammar compression and indexing (2 people)