我们为什么需要Huffman树
目前,我们常用的图像、音频、视频等多媒体信息,信息量大,以传统方式存储无疑会占用大量存储空间,因此,我们引入了Huffman编码来解决这个问题。
什么是Huffman编码
Huffman编码是一种变长的编码方案,数据的编码因其使用频率不同而长短不一,使用频率高的数据其编码较短,使用频率低的数据其编码较长,从而使得所有数据的编码总长度最短。
如何实现Huffman编码
考虑实现Huffman编码的结构,我们决定使用二叉树来实现。
这里涉及到一个概念,二叉树的带权外路径长度:树中所有叶子的带权路径长度之和。而Huffman树则是二叉树的带权外路径长度最小的二叉树,也称最优二叉树。
构造Huffman树并得到Huffman编码
这里,我们使用静态三叉链表存储Huffman树。
这是静态三叉链表结点类TriElement
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
|
public class TriElement{ int data; int parent,left,right;
public TriElement(int data, int parent, int left, int right) { this.data = data; this.parent = parent; this.left = left; this.right = right; } public TriElement(int data) { this(data, -1, -1, -1); } public String toString() { return "("+this.data+","+this.parent+","+this.left+","+this.right+")"; } public boolean isLeaf() { return this.left==-1 && this.right==-1; } public boolean equals(Object obj) { return obj==this || obj instanceof TriElement && this.data==((TriElement)obj).data; } }
|
这是Huffman树类HuffmanTree
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152
| public class HuffmanTree{
private String charset; private TriElement[] huftree;
public HuffmanTree(int[] weights) { this.charset = ""; for (int i = 0; i < weights.length; i++) {
if(i <= 25){ this.charset += (char)('A'+i); }else if(i <= 51){ this.charset += (char)('a'+(i - 26)); }else if(i == 52){ this.charset += " "; }
}
int n = weights.length; this.huftree = new TriElement[2*n-1]; for(int i=0; i<n; i++) this.huftree[i] = new TriElement(weights[i]);
for(int i = n; i < 2*n-1; i++) { int min1 = Integer.MAX_VALUE, min2 = min1; int x1=-1, x2=-1; for (int j = 0; j < i; j++) { if (this.huftree[j].parent == -1) if (this.huftree[j].data < min1) { min2 = min1; x2 = x1; min1 = this.huftree[j].data; x1 = j; }else { if (this.huftree[j].data < min2) { min2 = huftree[j].data; x2 = j; } } } this.huftree[x1].parent = i; this.huftree[x2].parent = i; this.huftree[i] = new TriElement(min1+min2, -1, x1, x2); } }
private String getCode(int i) { int n = 59;
char hufcode[] = new char[n]; int child = i, parent = this.huftree[child].parent; for (i = n-1; parent != -1; i--) { hufcode[i] = (huftree[parent].left==child) ? '0' : '1'; child = parent; parent = huftree[child].parent; } return new String(hufcode,i+1,n-1-i); }
public String toString() { String str="Huffman树的结点数组:"; for (int i=0; i < this.huftree.length; i++) str += this.huftree[i].toString()+","; str = str.substring(0, str.length()-1); str += "\nHuffman编码: "; for (int i=0; i<this.charset.length(); i++) str+=this.charset.charAt(i)+":"+getCode(i)+","; return str.substring(0, str.length()-1); }
public String encode(String text) { String compressed = ""; for (int i = 0; i < text.length(); i++) {
if(text.charAt(i)-'A' >= 0 && text.charAt(i)-'A' <= 25){ compressed += getCode(text.charAt(i)-'A'); }else if(text.charAt(i)-'a' >= 0 && text.charAt(i)-'a' <= 25){ compressed += getCode(text.charAt(i)-'a'+26); }else{ compressed += getCode(52); } } return compressed; }
public String decode(String compressed) { String text = ""; int node = this.huftree.length-1; for (int i = 0; i < compressed.length(); i++) { if (compressed.charAt(i) == '0') node = huftree[node].left; else node = huftree[node].right; if (huftree[node].isLeaf()) { text += this.charset.charAt(node); node = this.huftree.length-1; } } return text; } }
|
这是与Huffman树对应的字符映射类AsciiCodeMap,在Huffman树中指定的默认字符集则是来自于这里,此处我只设置了大小写英文字母和空格,实际上也可以继续添加,最好设置为ASCII的所有字符。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
| import java.util.HashMap; import java.util.Map;
public class AsciiCodeMap {
public Map<Character, Integer> asciiCodeMap = new HashMap<Character, Integer>(); public AsciiCodeMap(){ for(int i = 0; i <= 25; i++){ asciiCodeMap.put((char)('A'+(i-26)), i); } for(int i = 26; i <= 51; i++){ asciiCodeMap.put((char)('a'+(i-26)), i); } asciiCodeMap.put(' ', 52); } }
|
这里是测试类
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
| import java.util.Scanner;
public class HuffmanTree_ex { public static void main(String[] args) {
Scanner input = new Scanner(System.in); String text2 = input.nextLine(); AsciiCodeMap a = new AsciiCodeMap();
int[] weight = new int[59]; for(int i = 0; i<weight.length; i++){ weight[i] = 0; } for(int i = 0; i<text2.length(); i++){ int m = a.asciiCodeMap.get(text2.charAt(i)); weight[m]++;
} HuffmanTree huffman2 = new HuffmanTree(weight); System.out.println(huffman2.toString()); String compressed2 = huffman2.encode(text2); System.out.println("将"+text2+"压缩为"+compressed2+","+compressed2.length()+"位"); System.out.println("将"+compressed2+"解码为"+huffman2.decode(compressed2)); input.close(); } }
|
总结
构建Huffman树,需要了解Huffman编码的基本原理:也就是出现次数多的字符的编码尽可能地短,这样就可以实现了我们所说的数据压缩。