CSC 499 S09: Code

From CSWiki

Jump to: navigation, search

Front Door

Contents

Coding Guidelines

Repository

Our definitive repository is located at http://www.cs.grinnell.edu/~leguiato/MAP499/ . Each piece of code should be accompanied by a wiki page below that provides information on it and links directly to it. The above directory should be editable by the research group.

When adding code (or any other file) to the repository, please remember to change the group to "compressors" and modify permissions to allow the group to make changes. The following appears to accomplish this:

chown [user]:compressors [filename]
chmod 775 [filename]

If adding an entire directory, remember the -R (recursive) flag.

It might be worth considering using an actual repository given the man-hours we're putting in to this.

Errors

In the vein of reusing code, it would be good if we created a master errors file that all of our programs that had to interact with one another imported.

Header Files

Since we like code reuse, it is a good idea to give everything an interface to be used by other programs we write. Unless there is no chance of reusing your code, please create a header file with all publicly accessible functions. In that case, put the main description of the program in the header file, but put the author, copyright and date info in all files. Functions with prototypes in the header file should have descriptions in the header only, functions only referenced in the .c file should be described there.

Comments

Each code file should start with comments that look like this:

/* Tony Leguia, Ian Bone
 * CSC499 (Mentored Advanced Project): Data Compression
 * Advisor: Dr. John Stone
 * Grinnell College, DD Month 2009
 * Copyright 2009.
 */

/* This is a description of what the code contained in
 * this file does
 */

Ideally, each function will also feature comments explaining its intended purpose

Licensing

It was Ian's intention that code would be open sourced. Given the ever-changing nature of open source licenses, he currently suggests choosing a satisfactory license when preparing the code for release, and at that point adding a license comment to the beginning of every file.

The Code

Here should be a link to every code file we've done, to keep them all accessible-like. If you move something in the repository, please update its link here.

Bitwriter

Utility for packing encoded bits into chars and outputting them. Has been tested with test harness test_bitwriter.c. http://www.cs.grinnell.edu/~leguito/MAP499/bitwriter.c http://www.cs.grinnell.edu/~leguito/MAP499/bitwriter.h

Linked List

A fully tested implementation of linked list and stack functions for integers, tests are test_linked_list.c. Could be refactored to work with void* with minimal effort, but integers make more sense to be used in queues to be used in BFS to be used in tries with potentially millions of values. http://www.cs.grinnell.edu/~leguiato/MAP499/linked_list.h http://www.cs.grinnell.edu/~leguiato/MAP499/linked_list.c

Priority Queue

A tested implementation of priority queues. http://www.cs.grinnell.edu/~leguiato/MAP499/pqueue.h http://www.cs.grinnell.edu/~leguiato/MAP499/pqueue.c

Tries

An attempt to implement trie data structure. http://www.cs.grinnell.edu/~leguiato/MAP499/tries.h http://www.cs.grinnell.edu/~leguiato/MAP499/tries.c

Hastable

hashtable implementation, working but lacks some desired features. Size of hastable is currently static. http://www.cs.grinnell.edu/~leguiato/MAP499/hashtable.c http://www.cs.grinnell.edu/~leguiato/MAP499/hashtable.h

sshzlib

Putty's implementiation of zlib, which uses a hybrid LZ77/Huffman approach to compress data. Commonly known as DEFLATE. Very readable, freely redistributable and hackable code. http://www.cs.grinnell.edu/~leguiato/MAP499/sshzlib.c

Huffman compressor

A Huffman compression algorithm implemented in C. http://www.cs.grinnell.edu/~leguiato/MAP499/huffman.c

char count

Reads a file and provides frequencies of each byte found in the file. http://www.cs.grinnell.edu/~leguiato/MAP499/char_count.c

N gram counter

While exploring dictionary compression methods we thought it would be good to do some analyses on the n grams of enwik8, to see if we could tailor make a dictionary to it. This culminated with the creation of the following python program along with the data and run times. counter.py goes through a file and finds all n grams up to a user given input. http://www.cs.grinnell.edu/~leguiato/MAP499/counter.py

Times for finding ngrams of various sizes for enwik8 at: http://www.cs.grinnell.edu/~leguiato/MAP499/ngram_times.txt

Files containing ngrams at: http://www.cs.grinnell.edu/~leguiato/MAP499/dir_ngram_files

utility

utility.c contains useful functions that have a broad range of applications and uses. Examples include printing all the bits in a char and a string generation function. http://www.cs.grinnell.edu/~leguiato/MAP499/utility.h http://www.cs.grinnell.edu/~leguiato/MAP499/utility.c

Personal tools