For a school assignment we’re supposed to make a Java implementation of a compressor/decompresser using Huffman’s algorithm.
I’ve been reading a bit about it, specially this C++ tutorial: http://www.cprogramming.com/tutorial/computersciencetheory/huffman.html
In my program, I’ve been thinking about having Nodes that have the following properties:
- Total Frequency
- Character (if a leaf)
- Right child (if any)
- Left child (if any)
- Parent (if any)
So when building the Huffman tree, it is just a matter of linking a node to others, etc.
However, I’m a bit confused with the following quote (emphasis mine):
First, every letter starts off as part of its own tree and the trees
are ordered by the frequency of the letters in the original string.
Then the two least-frequently used letters are combined into a single
tree, and the frequency of that tree is set to be the combined
frequency of the two trees that it links together.
My question: why should I create a tree per letter, instead of just a node per letter and then do the linking later?
I have not begun coding, I’m just studying the algorithm first, so I guess I’m missing an important detail. What is it?
The preceding line says: “The basic idea behind the algorithm is to build the tree bottom-up.”
What the author is saying is that you start off by creating a TreeNode
(or whatever you choose to call your nodes) object for each letter and then combine them in order of frequency (lowest first) to build up the final tree.
The detail you’re missing is that you can think of a TreeNode
as itself being a tree.
5