Notice that the frequencies of the characters a, b, c, and d are 4, 2, 1, and 1, respectively. We start by randomly assigning the single-bit code 0 to a, the 2-bit code 11 to b, and the 3-bit codes 100 and 011 to the characters c and d, respectively. So, the string aabacdab will be encoded to 00110100011011 (0|0|11|0|100|011|0|11) using the above codes. But the real problem lies in decoding. If we try to decode the string 00110100011011, it leads to ambiguity, since it can be decoded into more than one string.
The prefix rule states that no code is a prefix of another code. By code, we mean the bits used for a particular character. In the above example, 0 (the code for 'a') is a prefix of 011 (the code for 'd'), which violates the prefix rule. If our codes satisfy the prefix rule, the decoding will be unambiguous, and vice versa. This time, we assign codes that satisfy the prefix rule to the characters 'a', 'b', 'c', and 'd': 'a' gets 0, 'b' gets 10, 'c' gets 110, and 'd' gets 111. Using these codes, the string aabacdab will be encoded to 00100110111010 (0|0|10|0|110|111|0|10), and we can uniquely decode it back to our original string aabacdab.
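Below is a minimal Java sketch (the class and variable names are mine, not from the original article) that encodes and decodes aabacdab with the prefix-free code table above, illustrating why the decoding is unambiguous:

```java
import java.util.Map;

public class PrefixCodeDemo {
    public static void main(String[] args) {
        // Prefix-free code table from the example above
        Map<Character, String> codes = Map.of('a', "0", 'b', "10", 'c', "110", 'd', "111");

        // Encode: concatenate the code of each character
        String text = "aabacdab";
        StringBuilder encoded = new StringBuilder();
        for (char ch : text.toCharArray()) {
            encoded.append(codes.get(ch));
        }
        System.out.println(encoded);  // 00100110111010

        // Decode: scan the bits, emitting a character whenever the buffered
        // bits match a full code. This is unambiguous precisely because no
        // code is a prefix of another code.
        StringBuilder decoded = new StringBuilder();
        String buffer = "";
        for (char bit : encoded.toString().toCharArray()) {
            buffer += bit;
            for (Map.Entry<Character, String> e : codes.entrySet()) {
                if (e.getValue().equals(buffer)) {
                    decoded.append(e.getKey());
                    buffer = "";
                    break;
                }
            }
        }
        System.out.println(decoded);  // aabacdab
    }
}
```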
The technique works by creating a binary tree of nodes. A node can be either a leaf node or an internal node. Initially, all nodes are leaf nodes, each containing a character and its weight (the frequency of appearance of that character). Internal nodes contain a weight and links to two child nodes. As a common convention, bit 0 represents following the left child and bit 1 represents following the right child.
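A node structure along these lines might look as follows; this is a sketch with names of my own choosing (HuffmanNode, isLeaf), not anything prescribed by the article:

```java
// Sketch of a Huffman tree node: a leaf stores a character, while an
// internal node stores only a weight and links to two children.
class HuffmanNode {
    char character;      // meaningful only for leaf nodes
    int weight;          // frequency of the character, or sum of children's weights
    HuffmanNode left;    // following this edge corresponds to bit 0
    HuffmanNode right;   // following this edge corresponds to bit 1

    // Leaf node constructor
    HuffmanNode(char character, int weight) {
        this.character = character;
        this.weight = weight;
    }

    // Internal node constructor: weight is the sum of the children's weights
    HuffmanNode(HuffmanNode left, HuffmanNode right) {
        this.weight = left.weight + right.weight;
        this.left = left;
        this.right = right;
    }

    boolean isLeaf() {
        return left == null && right == null;
    }
}
```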
A finished tree has n leaf nodes and n-1 internal nodes. It is recommended that the Huffman tree discard characters unused in the text, so that it produces the optimal code lengths.
We will use a priority queue to build the Huffman tree, where the node with the lowest frequency has the highest priority.
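For instance, Java's PriorityQueue is a min-heap, so ordering nodes by weight gives the lowest-frequency node the highest priority. A small demo, reusing the hypothetical HuffmanNode class sketched above:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class MinHeapDemo {
    public static void main(String[] args) {
        // With a comparator on weight, poll() always returns the
        // lowest-frequency node first.
        PriorityQueue<HuffmanNode> pq =
                new PriorityQueue<>(Comparator.comparingInt((HuffmanNode n) -> n.weight));
        pq.add(new HuffmanNode('A', 15));
        pq.add(new HuffmanNode('E', 5));
        pq.add(new HuffmanNode('B', 7));
        System.out.println(pq.poll().character);  // E (frequency 5)
        System.out.println(pq.poll().character);  // B (frequency 7)
    }
}
```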
Following are the complete steps:

1. Create a leaf node for each character and add it to the priority queue.
2. While there is more than one node in the queue, remove the two nodes of highest priority (i.e., lowest frequency), create a new internal node with these two nodes as children and with a frequency equal to the sum of their frequencies, and add the new node to the queue.
3. The remaining node is the root, and the tree is complete.

Consider some text consisting of only the characters 'A', 'B', 'C', 'D', and 'E', with frequencies 15, 7, 6, 6, and 5, respectively. The following figures illustrate the steps followed by the algorithm. The path from the root to any leaf node stores the optimal prefix code (also called the Huffman code) corresponding to the character associated with that leaf node.
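A sketch of this build loop, again reusing the hypothetical HuffmanNode class and the frequencies from the example above (the helper name buildTree is my own):

```java
import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanBuildDemo {
    // Builds the Huffman tree by repeatedly merging the two lowest-frequency nodes.
    static HuffmanNode buildTree(Map<Character, Integer> freq) {
        PriorityQueue<HuffmanNode> pq =
                new PriorityQueue<>(Comparator.comparingInt((HuffmanNode n) -> n.weight));

        // Step 1: one leaf node per character
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            pq.add(new HuffmanNode(e.getKey(), e.getValue()));
        }

        // Step 2: merge until a single tree remains
        while (pq.size() > 1) {
            HuffmanNode left = pq.poll();   // lowest frequency
            HuffmanNode right = pq.poll();  // second-lowest frequency
            pq.add(new HuffmanNode(left, right));
        }
        return pq.poll();  // Step 3: the remaining node is the root
    }

    public static void main(String[] args) {
        // Frequencies from the example above
        HuffmanNode root = buildTree(Map.of('A', 15, 'B', 7, 'C', 6, 'D', 6, 'E', 5));
        System.out.println(root.weight);  // 39, the total of all frequencies
    }
}
```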
For the input "Huffman coding is a data compression algorithm", the program prints the Huffman code for each character (c, h, f, r, t, p, i, g, l, a, o, d, H, and so on), the encoded bitstream, and the decoded string, which matches the original input. The character encoding induced by the tree can be used to decode a stream of bits as well as to encode a string into a stream of bits.
You can try decoding a bitstream yourself; the explanation follows. To decode the stream, start at the root of the encoding tree and follow a left branch for a 0 and a right branch for a 1. When you reach a leaf, write down the character stored at the leaf and start again at the top of the tree. In this example, the first bits decode as left-right-left-right to the letter 'h', followed (starting again at the root) by left-right-right-left to the letter 'e', followed by left-right-right-right to the letter 'r'.
Continuing until all the bits are processed yields the rest of the message. This makes it possible to decode a bitstream using the coding tree by following root-to-leaf paths. The tree shown above for "go go gophers" is an optimal tree: there are no other trees with the same characters that use fewer bits to encode the string "go go gophers". There are other trees that use 37 bits; for example, you can simply swap any sibling nodes and get a different encoding that uses the same number of bits.
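A decoding routine based on this idea might look as follows; it is a sketch reusing the HuffmanNode class from earlier, not the article's own code:

```java
public class HuffmanDecodeDemo {
    // Decodes a bitstream by walking root-to-leaf paths:
    // 0 follows the left branch, 1 follows the right branch.
    // Usage: String text = decode(root, "0101...");
    static String decode(HuffmanNode root, String bits) {
        StringBuilder out = new StringBuilder();
        HuffmanNode node = root;
        for (char bit : bits.toCharArray()) {
            node = (bit == '0') ? node.left : node.right;
            if (node.isLeaf()) {
                out.append(node.character);  // emit the character at the leaf
                node = root;                 // start again at the top of the tree
            }
        }
        return out.toString();
    }
}
```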
This algorithm is called Huffman coding, and it was invented by David A. Huffman in 1952. It is an example of a greedy algorithm.
We'll use Huffman's algorithm to construct a tree that is used for data compression. In the previous section, we saw examples of how a stream of bits can be generated from an encoding. We also saw how the tree can be used to decode a stream of bits. Here, we'll discuss how to construct the tree. We'll assume that each character has an associated weight equal to, for example, the number of times the character occurs in a file. In the "go go gophers" example, the characters 'g' and 'o' have weight 3, the space has weight 2, and the other characters have weight 1.
When compressing a file, we'll need to calculate these weights; we'll ignore this step for now and assume that all character weights have been calculated. Huffman's algorithm assumes that we're building a single tree from a group (or forest) of trees.
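For completeness, here is a minimal sketch of the weight-calculation step we are skipping over, computing the weights for "go go gophers":

```java
import java.util.HashMap;
import java.util.Map;

public class FrequencyCount {
    public static void main(String[] args) {
        // Count how many times each character occurs in the text
        Map<Character, Integer> freq = new HashMap<>();
        for (char ch : "go go gophers".toCharArray()) {
            freq.merge(ch, 1, Integer::sum);
        }
        // g=3, o=3, space=2, and 1 each for p, h, e, r, s (print order may vary)
        System.out.println(freq);
    }
}
```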
Initially, all the trees have a single node containing a character and the character's weight. Trees are combined by picking two trees and making a new tree from them.
This decreases the number of trees by one at each step, since two trees are combined into one. The algorithm is as follows. Begin with a forest of trees, where every tree is a single node whose weight equals the weight of the character in the node. Characters that occur most frequently have the highest weights; characters that occur least frequently have the smallest weights.
Repeat the following step until there is only one tree: choose the two trees with the smallest weights and combine them into a new tree. We'll use the string "go go gophers" as an example. Initially, we have the forest shown below. We pick two minimal nodes. There are five nodes with the minimal weight of one, and it doesn't matter which two we pick. In a program, the deterministic aspects of the implementation will dictate which two are chosen (e.g., the order in which nodes are removed from the priority queue).
We create a new tree whose root is weighted by the sum of the two weights chosen. We now have a forest of seven trees, as shown here. Choosing two minimal trees yields another tree with weight two, as shown below. There are now six trees in the forest that will eventually build an encoding tree. Again, we must choose the two trees of minimal weight. There are three trees with weight two, and we can choose any of these to create a new tree whose weight will be three.
Now there are two trees with weight equal to two. These are joined into a new tree whose weight is four.

To see the same process in terms of the heap, consider six characters a, b, c, d, e, and f with frequencies 5, 9, 12, 13, 16, and 45, respectively.

Step 1: Build a min heap that contains 6 nodes, where each node represents the root of a tree with a single node.

Step 2: Extract the two minimum-frequency nodes from the min heap and add a new internal node with frequency 5 + 9 = 14. Now the min heap contains 5 nodes, where 4 nodes are roots of trees with a single element each, and one heap node is the root of a tree with 3 elements:

Character: Frequency
c: 12
d: 13
Internal Node: 14
e: 16
f: 45

Step 3: Extract the two minimum-frequency nodes from the heap (c and d) and add a new internal node with frequency 12 + 13 = 25.
Step 4: Extract the two minimum-frequency nodes (14 and 16) and add a new internal node with frequency 14 + 16 = 30.

Step 5: Extract the two minimum-frequency nodes (25 and 30) and add a new internal node with frequency 25 + 30 = 55.

Step 6: Extract the two remaining nodes (45 and 55) and add a new internal node with frequency 45 + 55 = 100. Since the heap now contains only one node, that node is the root and the tree is complete.

Steps to print the codes from the Huffman tree (see the sketch below):

1. Traverse the tree, starting from the root, while maintaining an auxiliary array.
2. While moving to the left child, write 0 to the array.
3. While moving to the right child, write 1 to the array.
4. Print the array when a leaf node is encountered.
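A sketch of these code-printing steps, using a StringBuilder in place of the auxiliary array and reusing the hypothetical HuffmanNode class from earlier:

```java
public class HuffmanPrintCodes {
    // Traverses the tree from the root, writing 0 when moving to the left
    // child and 1 when moving to the right child, and prints the accumulated
    // bits whenever a leaf node is encountered.
    // Usage: printCodes(root, new StringBuilder());
    static void printCodes(HuffmanNode node, StringBuilder prefix) {
        if (node == null) return;
        if (node.isLeaf()) {
            System.out.println(node.character + ": " + prefix);
            return;
        }
        prefix.append('0');                     // moving left: write 0
        printCodes(node.left, prefix);
        prefix.setLength(prefix.length() - 1);  // backtrack

        prefix.append('1');                     // moving right: write 1
        printCodes(node.right, prefix);
        prefix.setLength(prefix.length() - 1);  // backtrack
    }
}
```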
The overall complexity is O(n log n), since the heap holds at most n nodes and each extract or insert operation takes O(log n) time. If the input array is sorted, there exists a linear-time algorithm; we will soon be discussing it in our next post. Applications of Huffman coding: it is used for transmitting fax and text, and it is useful in cases where there is a series of frequently occurring characters.
Please write comments if you find anything incorrect, or if you want to share more information about the topic discussed above.