CSC 499 S09: Compression Techniques & Rates
Compression Techniques & Rates
The compression techniques we've examined, with their expected or measured compression rates on enwik8 (the first 10^8 bytes of an English Wikipedia dump), given in bits per character (BPC).
paq8hp12
The current Hutter Prize winner. 1.319 BPC (16,481,655 bytes)
bzip2
An implementation of the Burrows-Wheeler transform followed by Huffman coding. 2.321 BPC (29,008,758 bytes)
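The BPC figures above follow directly from the compressed sizes: eight bits per compressed byte, divided by the number of input bytes. A minimal sketch of that arithmetic in Python, assuming each input byte counts as one character; the function name bits_per_character is our own:

```python
ENWIK8_BYTES = 100_000_000  # size of enwik8 in bytes


def bits_per_character(compressed_bytes, original_bytes=ENWIK8_BYTES):
    """Return compressed bits per input byte (one byte = one character)."""
    return 8 * compressed_bytes / original_bytes


# Compressed sizes quoted above.
sizes = {"paq8hp12": 16_481_655, "bzip2": 29_008_758}

for name, size in sizes.items():
    print(f"{name}: {bits_per_character(size):.3f} BPC")
```

Rounded to three places, this reproduces the 1.319 and 2.321 BPC figures listed for paq8hp12 and bzip2.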
paq1
Compiled and run on the Noyce workstation:
enwik8: 22156953/100000000 = 1.7726 bpc (22.16%) in 1172.98 sec
22156982/100000000 = 1.7726 bpc (22.16%) in 1172.98 sec
real 19m33.507s
user 19m32.541s
sys 0m0.736s
paq8l
The latest paq written by Matt Mahoney. Compiled on noyce with:
g++ paq8l.cpp -DNOASM -DUNIX -O2 -Os -s -march=pentiumpro -fomit-frame-pointer -o paq8l
Runtime should improve (if I'm reading the source correctly) when -DNOASM is dropped, but that build requires the nasm assembler.
Memory levels (straight from the code):
"level: -0 = store, -1 -2 -3 = faster (uses 35, 48, 59 MB)\n"
"-4 -5 -6 -7 -8 = smaller (uses 133, 233, 435, 837, 1643 MB)\n"
Results
Run with the default memory level (5). Level 9 crashed the system.
noyce$ time ./paq8l enwik8
Creating archive enwik8.paq8l with 1 file(s)...
enwik8 100000000 -> 18961042
100000000 -> 18961072
Time -473.39 sec, used 233735179 bytes of memory
Close this window or press ENTER to continue...
real 872m50.174s
user 564m42.234s
sys 0m4.140s
1.517 BPC
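The negative "Time" value in this run is probably not a typo. One plausible explanation is that the counter wrapped around during the run. This assumes that paq8l derives the figure from the C clock() function and that clock_t on this workstation is a signed 32-bit count of microseconds; neither assumption is confirmed here. A minimal Python sketch of that arithmetic, with a helper name of our own choosing:

```python
def wrapped_clock_seconds(cpu_seconds, clocks_per_sec=1_000_000, bits=32):
    """Simulate a signed `bits`-wide tick counter that wraps around.

    Assumption: ticks are microseconds of CPU time (user + sys) and the
    counter behaves like a two's-complement integer on overflow.
    """
    ticks = int(round(cpu_seconds * clocks_per_sec))
    modulus = 1 << bits
    wrapped = ticks % modulus
    if wrapped >= modulus // 2:  # reinterpret the top half as negative
        wrapped -= modulus
    return wrapped / clocks_per_sec


# CPU time reported by `time` for the run above.
user_seconds = 564 * 60 + 42.234
sys_seconds = 0 * 60 + 4.140

simulated = wrapped_clock_seconds(user_seconds + sys_seconds)
print(f"simulated clock() reading: {simulated:.2f} sec")
```

Under those assumptions the simulated reading comes out at roughly -473.36 sec, within a few hundredths of a second of the -473.39 sec that paq8l printed. If this is what happened, the figures from time are the ones to rely on for this run.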
XML Model only
friedman$ time ./paq8l_xml enwik8
Creating archive enwik8.paq8lg0_xml with 1 file(s)...
enwik8 100000000 -> 20556211
100000000 -> 20556247
Time 1970.73 sec, used 121168227 bytes of memory
real 32m51.774s
user 32m49.075s
sys 0m1.676s
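No BPC figure is listed for the XML-only run, but it can be read off the transcript. The sketch below pulls the sizes and wall-clock time out of the output with regular expressions. The function name parse_run and the returned field names are our own, and we assume the first "->" line refers to the file itself and the last one to the archive total.

```python
import re

# Transcript lines copied from the XML-model run above.
TRANSCRIPT = """\
enwik8 100000000 -> 20556211
100000000 -> 20556247
Time 1970.73 sec, used 121168227 bytes of memory
real 32m51.774s
user 32m49.075s
sys 0m1.676s
"""


def parse_run(text):
    """Extract sizes and wall-clock time from a paq8l + `time` transcript."""
    sizes = re.findall(r"(\d+) -> (\d+)", text)
    original, compressed = (int(n) for n in sizes[0])  # per-file line
    archive = int(sizes[-1][1])                        # total line (assumed)
    real = re.search(r"^real\s+(\d+)m([\d.]+)s", text, re.MULTILINE)
    wall_seconds = int(real.group(1)) * 60 + float(real.group(2))
    return {
        "bpc": 8 * compressed / original,
        "archive_bytes": archive,
        "wall_seconds": wall_seconds,
        "bytes_per_second": original / wall_seconds,
    }


run = parse_run(TRANSCRIPT)
print(f"{run['bpc']:.3f} BPC in {run['wall_seconds']:.1f} s "
      f"({run['bytes_per_second']:.0f} bytes/s)")
```

For this transcript the sketch gives about 1.644 BPC, which puts the XML-only model between the full paq8l run (1.517 BPC) and paq1 (1.7726 bpc).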
The tables linked below describe runtimes and compression rates for paq8 on enwik8.
paq8 was compiled using the following commands:
nasm -f elf paq7asm.asm
g++ paq8l.cpp -DUNIX -O2 -Os -s -march=pentiumpro -fomit-frame-pointer -o paq8l paq7asm.o
All benchmarks were taken on a current (as of May 2009) MathLAN workstation:
Linux 2.6.18-6-686
Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz
2 GB RAM
Results are summarized at:
http://www.cs.grinnell.edu/~leguiato/MAP499/paq8_runtimes_table.pdf
http://spreadsheets.google.com/pub?key=rJ3ZcLK14pyo20ctweURJyQ

