DEFLATE Compression Algorithm
-
- Posts: 399
- Joined: Thu Sep 14, 2017 12:55 pm
Re: DEFLATE Compression Algorithm
Adjust the Huffman Tree Accordingly
We have created the optimum Huffman table for coding our data. Unfortunately we find that it is not compatible for use with DEFLATE. The DEFLATE specification dictates that this particular table have a code length maximum of 7 bits. That is forced by the fact that these code lengths are themselves stored in the DEFLATE stream using 3 bits each, which limits code lengths for this table to the set containing 0 through 7.
Our table is too deep. It requires 8 bits to encode our data. What do we do about that? It seems that anything we do will reduce the efficiency of the compression. We need to create a less-than-optimum Huffman coding. How do we do that and keep the impact at a minimum? How do we avoid seriously damaging the compression ratio?
We are going to adjust our Huffman table so that it does not exceed the maximum bit depth. Ideally we want to stop incrementing the code length of any symbols that reach the maximum. How can we properly do that? Well, we have to go back to the math. Let's understand what makes one set of code lengths valid while another is not.
In the prior post we assigned code lengths to symbols in our alphabet based upon a tree construction algorithm. The steps in this algorithm are those that create a valid tree. It is not surprising then that our set of code lengths represents a real tree. As a result, when we calculate the starting codes for each bit length the process ends up with usable codes. And as we noticed, the last code in the tree is in fact 2**N - 1. In our case this is 2**8 - 1, or 255, which in binary is 11111111b.
Let me expand the loop in the starting code generation so we can see what happens. Here we will generate starting codes (S0..S8) for our code lengths. We will use a shorthand for the code length counts (N0..N8). In our example those are N = {0 0 3 0 1 3 5 1 2}. Note too that a left shift of 1 is equivalent to multiplication by 2. The starting codes (S) are calculated as follows:
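Code:
S0 = 0
S1 = (S0 + N0) << 1 = (0 + 0) << 1 = 0
S2 = (S1 + N1) << 1 = (0 + 0) << 1 = 0
S3 = (S2 + N2) << 1 = (0 + 3) << 1 = 6
S4 = (S3 + N3) << 1 = (6 + 0) << 1 = 12
S5 = (S4 + N4) << 1 = (12 + 1) << 1 = 26
S6 = (S5 + N5) << 1 = (26 + 3) << 1 = 58
S7 = (S6 + N6) << 1 = (58 + 5) << 1 = 126
S8 = (S7 + N7) << 1 = (126 + 1) << 1 = 254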
There are two things to notice here, other than the fact that this matches the table generated earlier. First, the count of 8 bit code lengths (N8) doesn't come into play. Yet we know that there are two, and the first will be assigned 11111110b and the second 11111111b. That second code is the 2**8 - 1 that we now expect. The second thing is that all of the starting codes are even numbers, driven by the fact that each is the product of a multiplication by 2. We will use this fact later.
Now I can reverse this procedure to generate starting codes back from the maximum bit length knowing that the last code must be 2**N - 1. So for this table we get the following:
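Code:
S8 = 2**8 - N8 = 256 - 2 = 254
S7 = S8/2 - N7 = 127 - 1 = 126
S6 = S7/2 - N6 = 63 - 5 = 58
S5 = S6/2 - N5 = 29 - 3 = 26
S4 = S5/2 - N4 = 13 - 1 = 12
S3 = S4/2 - N3 = 6 - 0 = 6
S2 = S3/2 - N2 = 3 - 3 = 0
S1 = S2/2 - N1 = 0 - 0 = 0
S0 = S1/2 - N0 = 0 - 0 = 0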
This is just a matter of running the calculations backwards and knowing (or realizing) that the last code has to be 2**N - 1. You can see it generates the same results.
Now what if we decide not to accept a bit depth exceeding the maximum? So we are going to force those two symbols wanting to be 8 bit codes to be two additional 7 bit codes. Our code length array will then look like this: N = {0 0 3 0 1 3 5 3}. Here we combined S7 and S8 and eliminated 8 bit codes altogether. Legal? Of course not. You can't even visualize what that does to a tree. Let's try the calculations back from the new maximum of 2**7 - 1.
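Code:
S7 = 2**7 - N7 = 128 - 3 = 125   <- odd!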
Here this fails immediately. We know that starting codes (S) must be even numbers and 125 is odd! Not surprising as we are kind of floating a pair of leaves up in the air somehow. Can we make a home for them?
Clearly if we were to reattach those leaves somewhere else in the tree structure, other nodes must be increased in bit depth. We need another 7 bit symbol to get S7 to be an even 124. To do that with minimum impact on compression ratio we increase the 6 bit coded symbol with the lowest frequency to 7 bits. Our array is now N = {0 0 3 0 1 3 4 4}. Try again:
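Code:
S7 = 2**7 - N7 = 128 - 4 = 124
S6 = S7/2 - N6 = 62 - 4 = 58
S5 = S6/2 - N5 = 29 - 3 = 26
S4 = S5/2 - N4 = 13 - 1 = 12
S3 = S4/2 - N3 = 6 - 0 = 6
S2 = S3/2 - N2 = 3 - 3 = 0
S1 = S2/2 - N1 = 0 - 0 = 0
S0 = S1/2 - N0 = 0 - 0 = 0
All of the starting codes are even this time.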
Um. Everything seemed to fit right in. Is this tree valid? Let's see.
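Running the forward calculation on N = {0 0 3 0 1 3 4 4}:
Code:
S1 = (S0 + N0) << 1 = 0
S2 = (S1 + N1) << 1 = 0
S3 = (S2 + N2) << 1 = (0 + 3) << 1 = 6
S4 = (S3 + N3) << 1 = (6 + 0) << 1 = 12
S5 = (S4 + N4) << 1 = (12 + 1) << 1 = 26
S6 = (S5 + N5) << 1 = (26 + 3) << 1 = 58
S7 = (S6 + N6) << 1 = (58 + 4) << 1 = 124
last code = S7 + N7 - 1 = 124 + 4 - 1 = 127 = 2**7 - 1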
This appears to have successfully generated a Huffman tree that would work with DEFLATE format! Let's look at the compression ratio for this.
Wait! This only cost us 1 bit? Yes it did, but to store that bit we would need another whole byte. So this procedure is likely (though not proven) to have a minimal impact on the compression ratio. Yet it corrects the table to ensure that it is compatible with DEFLATE.
To generalize the process, the reversed starting code calculation is repeated from 2**N - 1 when N is the maximum bit depth back to 0 for bit length 0. If at any point the calculated starting code is not even, you must set the bit depth for the next least frequent symbol to include it at this code length and make the starting code even.
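In code form the check can run directly against the counts. Here is a sketch only (not the routine from my next post, which works against the actual member list); it assumes the promoted symbol always comes from the next shorter length group:
Code:
// Sketch of the generalized check on the code length counts N[0..maxb].
// Valid starting codes are even; an odd one means we must promote the
// least frequent shorter-coded symbol into the current length group.
static int fix_lengths(int *N, int maxb)
{
    int fixes = 0;
    int len;
    int s = (1 << maxb) - N[maxb];      // last code must be 2**maxb - 1

    for (len = maxb; len > 1; len--)
    {
        if (s & 1)                      // odd: cannot be a product of (x << 1)
        {
            N[len]++;                   // one more code at this length...
            N[len - 1]--;               // ...promoted from the next shorter group
            s--;
            fixes++;
        }
        s = s / 2 - N[len - 1];         // reverse of S[len] = (S[len-1] + N[len-1]) << 1
    }
    return (fixes);
}
Running this on N = {0 0 3 0 1 3 5 3} with maxb = 7 performs exactly the one promotion worked through above, leaving N = {0 0 3 0 1 3 4 4}.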
In my next post I will show code for this.
-
- Posts: 399
- Joined: Thu Sep 14, 2017 12:55 pm
Re: DEFLATE Compression Algorithm
The complete procedure for generating a DEFLATE format compatible Huffman table limited to a maximum bit depth is shown here. I know that this is not optimized code. There is some unnecessary execution, but I wanted to keep the steps separate and clear. You can be sure that over time I will optimize the coding and obfuscate it suitably for all future generations.
This routine has the capacity to return FALSE if a table cannot be created. It was doing just that when the bit depth (code length) exceeded the maximum. That has since been corrected; it now always returns TRUE.
// Establish bit length for the Huffman table based upon frequencies
static int _bitlength(struct huff_t *tbl, int ncodes, int maxb)
{
    uint16_t *list = MM_alloc(ncodes * sizeof(uint16_t), &_bitlength);
    int *freq = MM_alloc(ncodes * sizeof(int), &_bitlength);
    int nlist = 0;
    int n, c, p;
    int ret = TRUE;
    uint16_t *ptr;

    // List all of the symbols used in the data along with their frequencies. Note that
    // we store pointers +1 so as to keep 0 as a linked list terminator.
    for (n = 0; n < ncodes; n++)
        if (tbl[n].cnt > 0)
        {
            list[nlist] = n + 1;
            freq[nlist] = tbl[n].cnt;
            nlist++;
        }

    // Note that there is a special boundary case when only 1 code is used. In this case
    // the single code is encoded using 1 bit and not 0.
    if (nlist == 1)
        tbl[list[0] - 1].len = 1;

    // process this list down to a single node
    while (nlist > 1)
    {
        // sort the list by decreasing frequency
        for (n = 0; n < nlist - 1; n++)
            if (freq[n] < freq[n + 1])
            {
                // swap order
                c = list[n];
                list[n] = list[n + 1];
                list[n + 1] = c;
                c = freq[n];
                freq[n] = freq[n + 1];
                freq[n + 1] = c;

                // need to sort back
                if (n > 0)
                    n -= 2;
            }

        // Combine the member lists associated with the last two entries. We combine the
        // linked lists for the two low frequency nodes.
        p = list[nlist - 2];
        while (tbl[p - 1].link)
            p = tbl[p - 1].link;
        tbl[p - 1].link = list[nlist - 1];

        // The new node has the combined frequency.
        freq[nlist - 2] += freq[nlist - 1];
        nlist--;

        // Sort the members of this node by decreasing code length. Longer codes to the
        // left. This will also sort the frequency of the symbols in increasing order
        // when code lengths are equal. We need this arrangement for the next step should
        // we be required to balance the tree and avoid exceeding the maximum code
        // length (maxb).
        p = TRUE;
        while (p)
        {
            p = FALSE;
            ptr = &list[nlist - 1];
            while (*ptr && tbl[*ptr - 1].link)
            {
                c = tbl[*ptr - 1].link;
                if ((tbl[*ptr - 1].len < tbl[c - 1].len) ||
                    (tbl[*ptr - 1].len == tbl[c - 1].len && tbl[*ptr - 1].cnt > tbl[c - 1].cnt))
                {
                    n = tbl[c - 1].link;
                    tbl[*ptr - 1].link = n;
                    tbl[c - 1].link = *ptr;
                    *ptr = c;
                    p = TRUE;
                }
                ptr = &tbl[*ptr - 1].link;
            }
        }

        // Increase the code length for members of this node. We cannot exceed the maximum
        // code length (maxb).
        p = list[nlist - 1];
        while (p)
        {
            if (tbl[p - 1].len < maxb)
                tbl[p - 1].len++;
            p = tbl[p - 1].link;
        }

        // Now verify the structure. This should be absolutely proper up until the point when
        // we prevent code lengths from exceeding the maximum. Once we do that we are likely
        // creating an impossible tree. We will need to correct that.
        p = list[nlist - 1];
        c = tbl[p - 1].len;
        if (c == maxb)
        {
            n = 1 << c;
            while (p)
            {
                if (tbl[p - 1].len == c)
                    n--;
                else
                {
                    // n must be even at this point or we extend the length group
                    if (n & 1)
                    {
                        tbl[p - 1].len = c;
                        n--;
                    }
                    else
                    {
                        c--;
                        n /= 2;
                    }
                }
                p = tbl[p - 1].link;
            }
        }
    }

    MM_free(freq);
    MM_free(list);
    return (ret);
}
Here are the steps that it performs:
1. Create an array of all used alphabet symbols (those with non-zero frequency).
2. Check for the special case where there is only one symbol. In that case we use a 1 bit code, with one of the two possible codes left unused.
3. Sort the list array by decreasing frequency. The least frequent symbols are then at the end of the list.
4. Combine the rightmost two least frequent symbols or nodes into a single new node having the combined frequency. All members of the two combined nodes become members of the new node.
5. Sort the member list for the new node by decreasing bit depth (current code length) and secondly by increasing frequency.
6. Increase the bit depth for all members of the new node by 1. If a symbol would exceed the maximum bit depth, do not increment it.
7. If we have reached the maximum bit depth then confirm the tree structure using the reverse starting code calculations. Elevate the next least frequent symbol to the current bit depth if a calculated starting code is not an even number. Check all code lengths.
8. If there is more than one entry in the list array (not down to one node yet) then repeat at Step #3.
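For reference, _bitlength() assumes table entries along these lines (field names and types inferred from the code above; the actual JANOS declaration may differ):
Code:
struct huff_t {
    int cnt;        // frequency count for this symbol
    int len;        // assigned code length in bits
    uint16_t link;  // next member in a node's list (+1 indexed, 0 terminates)
    int code;       // Huffman code, stored in reverse bit order
};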
-
- Posts: 399
- Joined: Thu Sep 14, 2017 12:55 pm
Re: DEFLATE Compression Algorithm
So to review....
First, we covered how to perform a reasonably fast version of LZ77. This fills a 64 KB buffer with compressed data containing literals and length-distance codes.
When we need to, we flush the 64 KB buffer and generate a block of DEFLATE format. At that point we call a routine that first analyzes the data for code frequencies. There are two alphabets: one for literals and length codes, the other for distance codes.
Now we've developed a method for generating DEFLATE compatible Huffman tables for our two alphabets. The bulk of the complexity is now behind us, but there is still some work to do before we can start generating our compressed bit stream.
Moving on...
Now we need the size of our literal alphabet (HLIT) and our distance alphabet (HDIST) since it is unlikely that we have utilized all literal symbols (0..285) or distance codes (0..29). So here we scan each to trim unused symbols.
Code:
// Now we combine the bit length arrays into a single array for run-length-like repeat encoding.
// In DEFLATE this encoding can overlap for the literal table to the distance code table as
// if a single array. First determine the alphabet size for each table.
for (hlit = 286; 0 < hlit; hlit--)
    if (littbl[hlit - 1].len > 0)
        break;
for (hdist = 30; 0 < hdist; hdist--)
    if (dsttbl[hdist - 1].len > 0)
        break;
So we have the two alphabets. One for literals and length codes (0..HLIT) and one for distance codes (0..HDIST). We've seen that all we need are the code lengths for each symbol set to define the associated Huffman tree. We need to convey this information to the decompressor. The DEFLATE format combines the code length arrays for the two alphabets into one long array of HLIT + HDIST code lengths.
Now we know that the end-of-block code (256) is used as well as at least a handful of length codes and distance codes. So this array of code lengths is itself fairly long. It will be somewhere between 260 and 315 entries. Each code length is in the set 0..15. So typically there is a lot of repetition. There can be a large number of 0's in a row. Consider data that is constrained to 7-bit ASCII. In that case there are 128 literal codes that never occur and would have a code length of 0.
Code:
// Now create the bit length array
cntlist = MM_alloc((hlit + hdist) * sizeof(int), &_bufflush);
for (n = 0; n < hlit; n++)
    cntlist[n] = littbl[n].len;
for (n = 0; n < hdist; n++)
    cntlist[hlit + n] = dsttbl[n].len;
The DEFLATE specification defines a kind of run-length encoding for this array. This encodes sequences of code lengths using three additional codes.
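From the specification, the three codes are: 16 copies the previous code length 3 to 6 times (2 extra bits), 17 repeats a zero code length 3 to 10 times (3 extra bits), and 18 repeats a zero code length 11 to 138 times (7 extra bits). A minimal sketch of applying them follows. This is not the JANOS routine; it only produces the layout the later streaming code expects, each repeat symbol followed by its extra-bits value.
Code:
// Sketch: run-length encode an array of code lengths (0..15) using the
// DEFLATE repeat codes 16, 17 and 18. Returns the number of entries
// placed in out[].
static int rle_lengths(const int *lens, int n, int *out)
{
    int i = 0, k = 0;
    while (i < n)
    {
        int v = lens[i];
        int run = 1;
        while (i + run < n && lens[i + run] == v)
            run++;

        if (v == 0 && run >= 3)
        {
            // code 18: repeat zero 11..138 times (7 extra bits)
            while (run >= 11)
            {
                int r = run > 138 ? 138 : run;
                out[k++] = 18;
                out[k++] = r - 11;
                run -= r;
                i += r;
            }
            // code 17: repeat zero 3..10 times (3 extra bits)
            if (run >= 3)
            {
                out[k++] = 17;
                out[k++] = run - 3;
                i += run;
                run = 0;
            }
            // any leftover zeros (1 or 2) go out as literal lengths
            while (run-- > 0)
                out[k++] = lens[i++];
        }
        else
        {
            // emit the length itself, then code 16 copies it 3..6
            // more times (2 extra bits)
            out[k++] = v;
            i++;
            run--;
            while (run >= 3)
            {
                int r = run > 6 ? 6 : run;
                out[k++] = 16;
                out[k++] = r - 3;
                run -= r;
                i += r;
            }
            while (run-- > 0)
            {
                out[k++] = v;
                i++;
            }
        }
    }
    return (k);
}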
So as you can see we are heading towards our third Huffman table, this one with an alphabet of 19 codes (0..18). That's why the symbol set I used for the example in handling maximum code length has 19 members; I used one of these tables as the example.
In my next post I will show the procedure I use for applying the repeat codes in this alphabet.
-
- Posts: 399
- Joined: Thu Sep 14, 2017 12:55 pm
Re: DEFLATE Compression Algorithm
The current version of jniorsys.log on my development JNIOR compresses into a single block using the following literal and distance tables.
Code:
littbl (symbol, count, length, code)
0x00a 2 11 11111101110
0x00d 2 12 111111111000
0x020 93 6 101000
0x027 4 11 11111101111
0x028 14 9 111101010
0x029 3 11 11111110000
0x02a 2 12 111111111001
0x02b 9 10 1111101000
0x02c 9 10 1111101001
0x02d 23 8 11101100
0x02e 158 5 01100
0x02f 18 9 111101011
0x030 120 6 101001
0x031 182 5 01101
0x032 147 5 01110
0x033 150 5 01111
0x034 127 6 101010
0x035 132 6 101011
0x036 113 6 101100
0x037 110 6 101101
0x038 113 6 101110
0x039 106 6 101111
0x03a 91 6 110000
0x03c 2 12 111111111010
0x03d 3 11 11111110001
0x03e 3 11 11111110010
0x041 6 10 1111101010
0x043 5 11 11111110011
0x044 1 13 1111111111100
0x046 2 12 111111111011
0x048 1 13 1111111111101
0x04a 3 11 11111110100
0x04c 3 11 11111110101
0x04d 5 11 11111110110
0x04e 4 11 11111110111
0x04f 5 10 1111101011
0x050 11 9 111101100
0x052 10 9 111101101
0x053 6 10 1111101100
0x054 5 10 1111101101
0x055 2 12 111111111100
0x057 3 11 11111111000
0x05a 1 13 1111111111110
0x05f 5 10 1111101110
0x061 44 7 1101100
0x062 7 10 1111101111
0x063 19 9 111101110
0x064 30 8 11101101
0x065 52 7 1101101
0x066 14 9 111101111
0x067 13 9 111110000
0x068 9 10 1111110000
0x069 45 7 1101110
0x06a 6 10 1111110001
0x06b 3 11 11111111001
0x06c 29 8 11101110
0x06d 21 8 11101111
0x06e 46 7 1101111
0x06f 51 7 1110000
0x070 31 8 11110000
0x071 2 12 111111111101
0x072 37 8 11110001
0x073 27 8 11110010
0x074 40 7 1110001
0x075 16 9 111110001
0x076 6 10 1111110010
0x077 8 10 1111110011
0x078 4 11 11111111010
0x079 9 10 1111110100
0x07a 4 11 11111111011
0x100 1 13 1111111111111
0x101 1248 2 00
0x102 580 4 0100
0x103 168 5 10000
0x104 160 5 10001
0x105 72 7 1110010
0x106 119 6 110001
0x107 82 6 110010
0x108 17 9 111110010
0x109 116 6 110011
0x10a 253 5 10010
0x10b 120 6 110100
0x10c 52 7 1110011
0x10d 69 7 1110100
0x10e 79 6 110101
0x10f 20 8 11110011
0x110 54 7 1110101
0x111 236 5 10011
0x112 371 4 0101
0x113 34 8 11110100
0x114 6 10 1111110101
0x115 17 9 111110011
0x116 6 10 1111110110
dsttbl (symbol, count, length, code)
0x002 1 10 1111111100
0x003 2 10 1111111101
0x008 2 10 1111111110
0x009 2 10 1111111111
0x00a 14 8 11111110
0x00b 204 4 0100
0x00c 225 4 0101
0x00d 46 7 1111110
0x00e 168 5 11100
0x00f 85 6 111110
0x010 202 4 0110
0x011 130 5 11101
0x012 193 4 0111
0x013 146 5 11110
0x014 259 4 1000
0x015 209 4 1001
0x016 301 4 1010
0x017 255 4 1011
0x018 377 3 000
0x019 293 4 1100
0x01a 433 3 001
0x01b 332 4 1101
We determine HLIT and HDIST for this and combine the code lengths into one HLIT + HDIST length array.
So this array has 307 entries and you can see that there is a lot of repetition. We next apply the repeat codes as appropriate to shorten this array to 139 entries.
Here, when one of the repeat codes is used, I show the value of the extra bits we will insert following the Huffman code for the symbol.
Now we create the Huffman table for this array and determine frequencies. Note that extra bits are inserted later and are not part of the Huffman encoding. Here the maximum code length is 7 bits.
Code:
// Ugh. Now we need yet another Huffman table for this run-length alphabet. First we establish
// the frequencies. Note that we skip the byte defining the extra bits.
cnttbl = MM_alloc(HUFF_HLEN * sizeof(struct huff_t), &_bufflush);
for (n = 0; n < ncnt; n++)
{
    cnttbl[cntlist[n]].cnt++;
    if (cntlist[n] >= 16)
        n++;
}

// We need to determine the bit lengths.
if (!_bitlength(cnttbl, HUFF_HLEN, 7))
{
    err = TRUE;
    break;
}
This results in the following third Huffman table that we will use to code the code lengths, which in turn are used to generate the literal and distance tables for the data compression.
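The code lengths come out as N = {3 0 6 6 4 3 4 4 4 4 3 3 4 5 0 0 6 5 6} (per symbol, as listed in the next post). Assigning canonical codes to the used symbols from those lengths gives:
Code:
sym len code
0   3   000
2   6   111100
3   6   111101
4   4   1000
5   3   001
6   4   1001
7   4   1010
8   4   1011
9   4   1100
10  3   010
11  3   011
12  4   1101
13  5   11100
16  6   111110
17  5   11101
18  6   111111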
Are we there yet?
Well, we are getting close.
-
- Posts: 399
- Joined: Thu Sep 14, 2017 12:55 pm
Re: DEFLATE Compression Algorithm
As before we will need to convey the code lengths in this 'cnttbl' to the decompressor. We have N = {3 0 6 6 4 3 4 4 4 4 3 3 4 5 0 0 6 5 6}. I mentioned earlier that these are stored each with 3 bits in the bit stream limiting the code length to 7 bits.
But DEFLATE wants to save every possible byte. Each of these code lengths has a probability of being used over some general set of data types. They decided to sequence these into an array in a custom order such that the least likely to be used code lengths fall at the end and can be trimmed. So we get the order from the specification and sequence these. Notice that our three 0 bit code lengths do in fact get trimmed.
Code:
// Finally (haha) we establish a custom order for these bit lengths
// array defined above
//static const char hclen_order[HUFF_HLEN] = {
//    16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2,
//    14, 1, 15
//};
bitlens = MM_alloc(HUFF_HLEN * sizeof(int), &_bufflush);
for (n = 0; n < HUFF_HLEN; n++)
    bitlens[n] = cnttbl[hclen_order[n]].len;

// Now the end of this array should be 0's so we find a length for the array
for (hclen = HUFF_HLEN; 0 < hclen; hclen--)
    if (bitlens[hclen - 1] > 0)
        break;
It is hard to believe but we now have everything that we need to actually generate the compressed bit stream. Well, all except a couple of routines to actually do the serialization. So that will be the next step.
-
- Posts: 399
- Joined: Thu Sep 14, 2017 12:55 pm
Re: DEFLATE Compression Algorithm
We're going to output to a bit stream. Each write will likely involve a different number of bits. Someplace we have to properly sequence these into bytes to append to the output buffer.
To do this we need one routine to write bits, called _writeb(), and one to use at the end to flush any remaining unwritten bits, called _flushb(). To tie this together we use a structure.
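Something along these lines works for the structure (fields and types inferred from the routines that follow; the actual JANOS declaration may differ):
Code:
struct bitstream_t {
    uint32_t reg;      // bit accumulation register; bits stream LSB first
    int nbits;         // number of bits currently held in reg
    uint8_t *buffer;   // output buffer receiving completed bytes
    int length;        // count of bytes written to buffer so far
};
Code: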
// Routine to stream bits
static void _writeb(int num, int code, struct bitstream_t *stream)
{
    // if no bits then nothing to do
    if (num == 0)
        return;

    // insert bits into the stream
    code &= ((1 << num) - 1);
    code <<= stream->nbits;
    stream->reg |= code;
    stream->nbits += num;

    // stream completed bytes
    while (stream->nbits >= 8)
    {
        stream->buffer[stream->length++] = (stream->reg & 0xff);
        stream->reg >>= 8;
        stream->nbits -= 8;
    }
}

// Routine flushes the bitstream
static void _flushb(struct bitstream_t *stream)
{
    if (stream->nbits)
    {
        stream->buffer[stream->length++] = (stream->reg & 0xff);
        stream->reg = 0;
        stream->nbits = 0;
    }
}
This seems simple enough but there are a couple things to understand. The DEFLATE specification gets into it right off the bat.
The first bit of the bit stream is the least significant bit of the first byte in the buffer. Once 8 bits are retrieved the 9th is the least significant bit of the second byte and so on.
A value appears in the stream starting with its least significant bit. That means that the bit order does not need to be reversed to pack the value at the tail of the current bit stream.
Huffman codes are processed a bit at a time. When you are reading a Huffman code you do not know how many bits you will need to retrieve the code for a valid symbol in the alphabet. So in this case you must insert the Huffman code so the most-significant bit is read first. The order of Huffman bits needs to be reversed. Armed with that fact, I have stored these codes in the tables in reverse order. That will be apparent in the following code to generate the tables for coding.
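A quick illustration, assuming an initialized struct bitstream_t *stream (hypothetical calls, just to show the bit ordering):
Code:
// stream a 3-bit value of 5 (101b); values pack LSB first, no reversal needed
_writeb(3, 0x05, stream);    // read back LSB first: 1, 0, 1 -> the value 5
// stream the 3-bit Huffman code 110b; codes are read MSB first, so we
// pass the pre-reversed pattern 011b
_writeb(3, 0x03, stream);    // read back: 1, 1, 0 -> the code 110b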
So the last thing we need to do is generate the actual Huffman codes for the three tables.
Code:
// Now we need Huffman codes for these tables because eventually someday we will actually be
// generating a compressed bit stream.
// Next we total the number of symbols using each bit length. These will be used to assign
// bit codes for each alphabet symbol.
litcnts = MM_alloc(16 * sizeof(int), &_bufflush);
for (n = 0; n < 286; n++)
    if (littbl[n].len > 0)
        litcnts[littbl[n].len]++;
dstcnts = MM_alloc(16 * sizeof(int), &_bufflush);
for (n = 0; n < 30; n++)
    if (dsttbl[n].len > 0)
        dstcnts[dsttbl[n].len]++;
bitcnts = MM_alloc(16 * sizeof(int), &_bufflush);
for (n = 0; n < HUFF_HLEN; n++)
    if (cnttbl[n].len)
        bitcnts[cnttbl[n].len]++;

// Now we calculate starting codes for each bit length group. This procedure is defined in the
// DEFLATE specification. We can define the Huffman tables in a compressed format provided that
// the Huffman tables follow a couple of additional rules. Using these starting codes we
// can assign codes for each alphabet symbol. Note that Huffman codes are processed bit-by-bit
// and therefore must be generated here in reverse bit order.

// Define codes for the literal Huffman table
startcd = MM_alloc(16 * sizeof(int), &_bufflush);
for (n = 0; n < 15; n++)
    startcd[n + 1] = (startcd[n] + litcnts[n]) << 1;
for (n = 0; n < 286; n++)
{
    len = littbl[n].len;
    if (len)
    {
        c = startcd[len]++;
        while (len--)
        {
            littbl[n].code <<= 1;
            littbl[n].code |= (c & 1);
            c >>= 1;
        }
    }
}

// Define codes for the distance Huffman table
for (n = 0; n < 15; n++)
    startcd[n + 1] = (startcd[n] + dstcnts[n]) << 1;
for (n = 0; n < 30; n++)
{
    len = dsttbl[n].len;
    if (len)
    {
        c = startcd[len]++;
        while (len--)
        {
            dsttbl[n].code <<= 1;
            dsttbl[n].code |= (c & 1);
            c >>= 1;
        }
    }
}

// Define codes for the bit length Huffman table
for (n = 0; n < 15; n++)
    startcd[n + 1] = (startcd[n] + bitcnts[n]) << 1;
for (n = 0; n < HUFF_HLEN; n++)
{
    len = cnttbl[n].len;
    if (len)
    {
        c = startcd[len]++;
        while (len--)
        {
            cnttbl[n].code <<= 1;
            cnttbl[n].code |= (c & 1);
            c >>= 1;
        }
    }
}
Coming up next: Actually Generating the Bit Stream
-
- Posts: 399
- Joined: Thu Sep 14, 2017 12:55 pm
Re: DEFLATE Compression Algorithm
The DEFLATE specification greatly oversimplifies the whole process by defining the block format in about a single page.
There are a couple of reasons why the suggestion to teach JANOS how to compress files has been in Redmine for 5 years. That's about how long ago JANOS, then in development, began to directly use JAR files for application programming. The obvious reason that it took 5 years to implement is that there really isn't a huge need for compression in JNIOR. Storage was limited in the previous series, and if you had wanted to keep log files around for any serious length of time it would have helped if we could compress them. The Series 4 has much more storage, but still not a lot by today's standards.
The real reason you may now realize. It is a bit involved. Given that JANOS uses no third-party developed code, and that I had not been able to find usable reference materials (search engines having been damaged by marketing greed), there were some hurdles that left this suggestion sitting on the list at a low priority for practically ever. Well, we've got it done now.
Let's generate the bit stream and be done with this tome.
-
- Posts: 399
- Joined: Thu Sep 14, 2017 12:55 pm
Re: DEFLATE Compression Algorithm
First we stream the Block Header. The very first bit indicates whether or not the block is the last for the compression. For the last block this BFINAL bit will be a 1; it is 0 otherwise. The next two bits indicate the block type BTYPE. Here we use 10b, the dynamic Huffman table type, as we will be providing our own Huffman tables. Yeah, we could have taken the easy path and used a fixed set of tables (BTYPE 01b) but that is no fun.
Code:
// Now we have codes and everything that we need (Finally!) to generate the compressed bit
// stream.
// set BFINAL and type BTYPE of 10
_writeb(1, final ? 1 : 0, stream);
_writeb(2, 0x02, stream);
We have already determined the sizes of our alphabets. We have HLIT defining the size of the literal alphabet that includes codes for sequence lengths. The complete alphabet space covers the range 0..285 but since we are not likely to use them all HLIT defines how many actually are used (0..HLIT). We also have HDIST playing a similar role for the distance alphabet which could range 0..29.
Those two parameters are supplied next. Here each value is conveyed in 5 bits. We can do that since we know that HLIT has to be at least 257, since we do have to use the end-of-block code of 256 in our alphabet. So really we supply the count of length codes used, which is similar in magnitude to the count of distance codes. So HLIT is supplied as HLIT - 257 and HDIST as HDIST - 1.
Following the delivery of HLIT and HDIST we supply HCLEN as HCLEN - 4. Recall that HCLEN defines the size of the alphabet we are going to use to encode the array of code lengths for the main two Huffman tables. HCLEN defines the number of elements we are going to supply for that 19 element array that has the custom order. The balance beyond HCLEN elements is assumed to be 0. This is the array whose elements are supplied with 3 bits and thus limited to the range 0..7. We include HCLEN elements as shown below.
Code:
// Now we output HLIT, HDIST, and HCLEN
_writeb(5, hlit - 257, stream);
_writeb(5, hdist - 1, stream);
_writeb(4, hclen - 4, stream);

// Output the HCLEN counts from the bit length array. Note that we have already ordered it
// as required.
for (n = 0; n < hclen; n++)
    _writeb(3, bitlens[n], stream);
Next we stream the concatenated array of code lengths. This array has HLIT + HDIST elements that have been compressed using repeat codes. Here we output the compressed array using the 19 symbol alphabet that we just defined for the decompressor by sending the array above. We encounter the extra bits usage for the first time. You can see in the following that after streaming the repeat codes 16, 17 or 18 we insert the field specifying the repeat count using the required number of extra bits.
Code:
// Output the run-length compressed HLIT + HDIST code lengths using the code length
// Huffman codes. The two tables are blended together in the run-length encoding.
// Note that we need to insert extra bits where appropriate.
for (n = 0; n < ncnt; n++)
{
    c = cntlist[n];
    _writeb(cnttbl[c].len, cnttbl[c].code, stream);
    if (c >= 16)
    {
        switch (c)
        {
        case 16:
            _writeb(2, cntlist[++n], stream);
            break;
        case 17:
            _writeb(3, cntlist[++n], stream);
            break;
        case 18:
            _writeb(7, cntlist[++n], stream);
            break;
        }
    }
}
If you are the decompressor, at this point you have everything you need to develop the Huffman tables for literals and distance codes. You had to expand the repeat compression and separate the two code length sets. You then tallied code length counts and calculated starting codes for each bit length. Finally you assigned codes to each used alphabet symbol. So we are ready to deal with the actual compressed data.
When we initially scanned the LZ77 compressed data to determine frequencies we had to temporarily expand the length-distance references to determine which codes from the length and distance alphabets we would be using. Well the following loop is practically identical because we again process the LZ77 compressed data expanding the length-distance references. This time we will output the codes to the bit stream. Here again we have the extra bits to insert. Where we ignored them before, now we insert them into the stream when necessary.
Code:
// Unbelievable! We can actually now output the compressed data for the block! This is
// encoded using the literal and distance Huffman tables as required. Note we need to
// again process the escaping and length-distance codes.
for (n = 0; n < osize; n++)
{
    c = obuf[n];
    if (c != 0xff)
        _writeb(littbl[c].len, littbl[c].code, stream);
    else
    {
        // check and tally escaped 0xff
        if (obuf[++n] == 0xff)
            _writeb(littbl[0xff].len, littbl[0xff].code, stream);
        else
        {
            // table defined above
            //static const int lcode_ofs[29] = {
            //    3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
            //    35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258
            //};

            // determine required length code for lengths (3..258). This code is
            // coded in the literal table.
            len = (obuf[n++] & 0xff) + 3;
            for (c = 0; c < 29; c++)
                if (lcode_ofs[c] > len)
                    break;
            code = 256 + c;
            _writeb(littbl[code].len, littbl[code].code, stream);

            // insert extra bits as required by the code
            // table defined above
            //static const int lcode_bits[29] = {
            //    0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
            //    3, 4, 4, 4, 4, 5, 5, 5, 5, 0
            //};
            c = lcode_bits[code - 257];
            if (c)
                _writeb(c, len - lcode_ofs[code - 257], stream);

            // table defined above
            //static const int dcode_ofs[30] = {
            //    1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
            //    257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
            //    8193, 12289, 16385, 24577
            //};

            // determine required distance code for distances (1..32768). This code is
            // coded in the distance table.
            dst = (obuf[n++] & 0xff) << 8;
            dst |= (obuf[n] & 0xff);
            for (c = 0; c < 30; c++)
                if (dcode_ofs[c] > dst)
                    break;
            code = c - 1;
            _writeb(dsttbl[code].len, dsttbl[code].code, stream);

            // insert extra bits as required by the code
            // table defined above
            //static const int dcode_bits[30] = {
            //    0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8,
            //    9, 9, 10, 10, 11, 11, 12, 12, 13, 13
            //};
            c = dcode_bits[code];
            if (c)
                _writeb(c, dst - dcode_ofs[code], stream);
        }
    }
}
And finally we output the end-of-block code (256). If BFINAL were set and this were the last block in the compression (LZ77 flushing the last partial buffer of data) we would flush the bit stream. This provides a final byte with the remaining bits for the stream.
Code:
// And finally the end-of-block code and flush
_writeb(littbl[256].len, littbl[256].code, stream);
if (final)
    _flushb(stream);
At this point the buffer supplied by the LZ77 compression has been flushed to the bit stream. We would reset that buffer pointer and return. If there is more data the LZ77 compressor will then proceed with it. Remember that I had defined this buffer to be 64KB so there will likely be more in the case of larger files.
-
- Posts: 399
- Joined: Thu Sep 14, 2017 12:55 pm
Re: DEFLATE Compression Algorithm
This DEFLATE compression capability will be part of JANOS v1.6.4 and later OS. I am likely to do some optimization before release. Here we see that it works. I'll create an archive containing the files in my JNIOR's root folder.
This archive verifies. This verification does decompress each file and confirm the CRC32.
We can repeat the construction in verbose output mode. Here we see timing. Again, keep in mind that JANOS is running on a 100 MHz Renesas RX63N micro-controller.
The /etc/JanosClasses.jar file compresses and this file is where I first encountered Huffman tables whose bit depth (code length) exceeded the DEFLATE maximums (15 bit for literal and distance tables, 7 bit for code length encoding).
I know that 24 seconds for a 1/4 megabyte file is nothing to write home about. Now that things are fully functional I can go back and work on the LZ77, where basically all of the time is consumed. I can certainly improve on this performance, but as it is, it isn't that bad. The JNIOR is a controller after all, and you likely wouldn't often need to compress archives.