CSC 499 S09: Compression Techniques & Rates

Compression Techniques & Rates

This page lists the compression techniques we've examined, along with their expected or measured compression rates on enwik8.

paq8hp12

The current Hutter Prize winner: 1.319 BPC (16,481,655 bytes).

bzip2

An implementation of the Burrows-Wheeler transform followed by Huffman coding: 2.321 BPC (29,008,758 bytes).
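As a rough way to reproduce a bzip2 measurement like the one above, the sketch below uses Python's standard `bz2` module. The page does not say which bzip2 options produced the 29,008,758-byte figure, so `compresslevel=9` is an assumption, and the helper names `bzip2_bpc` and `file_bpc` are made up for this illustration.

```python
import bz2


def bzip2_bpc(data: bytes, level: int = 9) -> float:
    """Return bits per character after bzip2-compressing `data`.

    `level` is an assumed setting; the page does not record the one used.
    """
    if not data:
        raise ValueError("cannot compute BPC of empty input")
    compressed = bz2.compress(data, compresslevel=level)
    return 8 * len(compressed) / len(data)


def file_bpc(path: str, level: int = 9) -> float:
    """Read a whole file (e.g. a local copy of enwik8) and return its BPC."""
    with open(path, "rb") as f:
        return bzip2_bpc(f.read(), level)


if __name__ == "__main__":
    # Small synthetic sample so the sketch runs without enwik8 present.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 2000
    print(f"sample: {bzip2_bpc(sample):.3f} bpc")
```

Calling `file_bpc("enwik8")` on a local copy of the file should give a figure comparable to the one listed above, though that has not been checked here.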

paq1

Compiled and run on the Noyce workstation:
enwik8: 22156953/100000000 = 1.7726 bpc (22.16%) in 1172.98 sec
22156982/100000000 = 1.7726 bpc (22.16%) in 1172.98 sec

real 19m33.507s
user 19m32.541s
sys 0m0.736s

paq8l

The latest paq written by Matt Mahoney. Compiled on noyce with:

g++ paq8l.cpp -DNOASM -DUNIX -O2 -Os -s -march=pentiumpro -fomit-frame-pointer -o paq8l

Run time should improve (if I'm reading the source correctly) when building without -DNOASM, but that build requires the nasm assembler.

Memory levels (straight from code):

"level: -0 = store, -1 -2 -3 = faster (uses 35, 48, 59 MB)\n"
 "-4 -5 -6 -7 -8 = smaller (uses 133, 233, 435, 837, 1643 MB)\n"
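The memory figures in the help string above can be put into a small lookup table for choosing a level on a given machine. This is only a sketch: the numbers are copied from the quoted string, level 0 is left out because no figure is listed for it, and the helper name `highest_level_within` is hypothetical.

```python
# Memory use per level in MB, copied from the paq8l help string quoted above.
# Level 0 (store) has no figure listed there, so it is omitted.
LEVEL_MEMORY_MB = {1: 35, 2: 48, 3: 59, 4: 133, 5: 233, 6: 435, 7: 837, 8: 1643}


def highest_level_within(budget_mb):
    """Hypothetical helper: highest listed level whose stated memory
    use fits in `budget_mb`, or None if no listed level fits."""
    fitting = [lvl for lvl, mb in LEVEL_MEMORY_MB.items() if mb <= budget_mb]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    for budget in (64, 512, 2048):
        print(budget, "MB ->", highest_level_within(budget))
```

The budget passed in should be the memory actually free for the process, not the installed RAM, since the operating system and other programs also need room.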

Results
Run with the default memory level (5). Level 9 crashed the system.

noyce$ time ./paq8l enwik8
Creating archive enwik8.paq8l with 1 file(s)...
enwik8 100000000 -> 18961042
100000000 -> 18961072
Time -473.39 sec, used 233735179 bytes of memory

Close this window or press ENTER to continue...

real    872m50.174s
user    564m42.234s
sys     0m4.140s

1.517 BPC
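The negative "Time -473.39 sec" in the output above looks like a timer wraparound. One possible explanation, stated here as an assumption and not checked against the paq8l source: the program reports CPU time from a 32-bit signed tick counter running at 1,000,000 ticks per second, which wraps roughly every 4295 seconds. The sketch below applies that wraparound to the user + sys time reported by `time`; the function name is hypothetical.

```python
def wrapped_clock_seconds(cpu_seconds, ticks_per_sec=1_000_000, bits=32):
    """Simulate a signed `bits`-wide tick counter that has overflowed.

    Assumed model only: 32-bit counter, 1,000,000 ticks per second.
    """
    ticks = int(round(cpu_seconds * ticks_per_sec))
    modulus = 1 << bits
    wrapped = ticks % modulus
    if wrapped >= modulus // 2:  # reinterpret the top half as negative
        wrapped -= modulus
    return wrapped / ticks_per_sec


if __name__ == "__main__":
    user = 564 * 60 + 42.234  # "user 564m42.234s"
    sys_time = 4.140          # "sys 0m4.140s"
    print(f"{wrapped_clock_seconds(user + sys_time):.2f} sec")
```

Under these assumptions the result lands within a few hundredths of a second of the reported -473.39, which suggests the true CPU time was about 33,886 seconds (user + sys) and that the printed figure should not be used for comparisons.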


XML Model only

friedman$ time ./paq8l_xml enwik8
Creating archive enwik8.paq8lg0_xml with 1 file(s)...
enwik8 100000000 -> 20556211
100000000 -> 20556247
Time 1970.73 sec, used 121168227 bytes of memory

real    32m51.774s
user    32m49.075s
sys     0m1.676s
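The XML-model run above does not list a BPC figure. It can be derived the same way as the other figures on this page: bits per character is 8 times the compressed size divided by the original size (100,000,000 bytes for enwik8). The sketch below recomputes the listed figures from the byte counts on this page and adds the missing one; the function and dictionary names are made up for the example.

```python
ORIGINAL_SIZE = 100_000_000  # enwik8 size in bytes, as shown in the logs


def bpc(compressed_bytes, original_bytes=ORIGINAL_SIZE):
    """Bits per character: 8 * compressed size / original size."""
    return 8 * compressed_bytes / original_bytes


# Compressed sizes in bytes, copied from the results on this page.
SIZES = {
    "paq8hp12": 16_481_655,
    "bzip2": 29_008_758,
    "paq1": 22_156_953,
    "paq8l (level 5)": 18_961_042,
    "paq8l XML model only": 20_556_211,
}

if __name__ == "__main__":
    for name, size in SIZES.items():
        print(f"{name:22s} {bpc(size):.3f} BPC")
```

By this calculation the XML-model-only run comes to about 1.644 BPC, between the full paq8l run (1.517) and paq1 (1.773).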


The tables at the following links describe run times and compression rates for paq8 on enwik8. paq8 was compiled using the following commands:

  nasm -f elf paq7asm.asm
  g++ paq8l.cpp -DUNIX -O2 -Os -s -march=pentiumpro -fomit-frame-pointer -o paq8l paq7asm.o

All benchmarks were taken on a current (as of May 2009) MathLAN workstation: Linux 2.6.18-6-686, Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz, 2 GB RAM.

Results are summarized at:

http://www.cs.grinnell.edu/~leguiato/MAP499/paq8_runtimes_table.pdf

http://spreadsheets.google.com/pub?key=rJ3ZcLK14pyo20ctweURJyQ
