
DEFLATE Compression Algorithm

INTEG's own Real-time Operating System. No 3rd party code here. 100% Supportable.
bscloutier
Posts: 399
Joined: Thu Sep 14, 2017 12:55 pm

Re: DEFLATE Compression Algorithm

Post by bscloutier » Tue Jan 30, 2018 1:03 pm

Adjust the Huffman Tree Accordingly

We have created the optimum Huffman table for coding our data. Unfortunately we find out that it is not compatible with DEFLATE. The DEFLATE specification dictates that this particular table have a maximum code length of 7 bits, forced by the fact that these code lengths are themselves stored in the DEFLATE stream using 3 bits each. That limits code lengths for this table to the set containing 0 through 7.

Our table is too deep. It requires 8 bits to encode our data. What do we do about that? It seems that anything we do will reduce the efficiency of the compression. We need to create a less than optimum Huffman coding. How do we do that and keep the impact at a minimum? How do we not seriously damage the compression ratio?

We are going to adjust our Huffman table so that it does not exceed the maximum bit depth. Ideally we want to stop incrementing the code length of any symbols that reach the maximum. How can we properly do that? Well, we have to go back to the math. Let's understand what makes a certain set of code lengths valid while others are not.

In the prior post we assigned code lengths to symbols in our alphabet based upon a tree construction algorithm. The steps in this algorithm are those that create a valid tree, so it is not surprising that our set of code lengths represents a real tree. As a result, when we calculate the starting codes for each bit length the process ends up with usable codes. And as we noticed, the last code in the tree is in fact 2**N - 1. In our case this is 2**8 - 1, or 255, which in binary is 11111111b.

Let me expand the loop in the starting code generation so we can see what happens. Here we will generate starting codes (S0..S8) for our code lengths. We will use a shorthand for the code length counts (N0..N8). In our example those are N = {0 0 3 0 1 3 5 1 2}. Note too that a left shift of 1 is equivalent to multiplication by 2. The starting codes (S) are calculated as follows:
    S1 = (S0 + N0) << 1 = (0 + 0) << 1 = 2 * 0 = 0
    S2 = (S1 + N1) << 1 = (0 + 0) << 1 = 2 * 0 = 0         00
    S3 = (S2 + N2) << 1 = (0 + 3) << 1 = 2 * 3 = 6         110
    S4 = (S3 + N3) << 1 = (6 + 0) << 1 = 2 * 6 = 12        1100
    S5 = (S4 + N4) << 1 = (12 + 1) << 1 = 2 * 13 = 26      11010
    S6 = (S5 + N5) << 1 = (26 + 3) << 1 = 2 * 29 = 58      111010
    S7 = (S6 + N6) << 1 = (58 + 5) << 1 = 2 * 63 = 126     1111110
    S8 = (S7 + N7) << 1 = (126 + 1) << 1 = 2 * 127 = 254   11111110
There are two things to notice here, other than the fact that this matches the table generated earlier. First, the count of 8 bit code lengths (N8) doesn't come into play. Yet we know that there are 2, and the first will be assigned 11111110b and the second 11111111b. This is the 2**8 - 1 that we now expect. The second thing is that all of the starting codes are even numbers, each being the product of a multiplication by 2. We will use this fact later.

Now I can reverse this procedure to generate starting codes back from the maximum bit length knowing that the last code must be 2**N - 1. So for this table we get the following:
    S8 = 2**8 - N8 = 2**8 - 2 = 254
    S7 = S8/2 - N7 = 254/2 - 1 = 127 - 1 = 126
    S6 = S7/2 - N6 = 126/2 - 5 = 63 - 5 = 58
    S5 = S6/2 - N5 = 58/2 - 3 = 29 - 3 = 26
    S4 = S5/2 - N4 = 26/2 - 1 = 13 - 1 = 12
    S3 = S4/2 - N3 = 12/2 - 0 = 6
    S2 = S3/2 - N2 = 6/2 - 3 = 3 - 3 = 0
    S1 = 0
    S0 = 0
This is just a matter of running the calculations backwards and knowing (or realizing) that the last code has to be 2**N - 1. You can see it generates the same results.

Now what if we decide to not accept a bit depth exceeding the maximum? So we are going to force those two symbols wanting to be 8 bit codes to be two additional 7 bit codes. So our code length array will look like this: N = {0 0 3 0 1 3 5 3}. Here we combined the 7 and 8 bit counts and eliminated 8 bit codes altogether. Legal? Of course not, as you will see if you try to visualize what that does to the tree. Let's try the calculations back from the new maximum of 2**7 - 1.
    S7 = 2**7 - N7 = 128 - 3 = 125
Here this fails immediately. We know that starting codes (S) must be even numbers and 125 is odd! Not surprising as we are kind of floating a pair of leaves up in the air somehow. Can we make a home for them?

Clearly if we were to reattach those leaves somewhere else in the tree structure other nodes must be increased in bit depth. We need another 7 bit symbol to get S7 to be an even 124. To do that with minimum impact on compression ratio we increase the 6 bit coded symbol with the lowest frequency to 7 bits. Our array now being N = {0 0 3 0 1 3 4 4}. Try again:
    S7 = 2**7 - N7 = 128 - 4 = 124
    S6 = S7/2 - N6 = 124/2 - 4 = 62 - 4 = 58
    S5 = S6/2 - N5 = 58/2 - 3 = 29 - 3 = 26
    S4 = S5/2 - N4 = 26/2 - 1 = 13 - 1 = 12
    S3 = S4/2 - N3 = 12/2 - 0 = 6
    S2 = S3/2 - N2 = 6/2 - 3 = 3 - 3 = 0
    S1 = 0
    S0 = 0
Um. Everything seemed to fit right in. Is this tree valid? Let's see.
N  bl_count[N]  next_code[N]
0      0
1      0            0
2      3            0     00
3      0            6     110
4      1           12     1100
5      3           26     11010
6      4           58     111010
7      4          124     1111100
        freq  clen     code
A  0x000 4     5      11010
D  0x003 2     6      111010
E  0x004 6     4      1100
F  0x005 3     6      111011
G  0x006 2     6      111100
H  0x007 2     6      111101
J  0x008 53    2      00
K  0x009 26    2      01
L  0x00a 5     5      11011
M  0x00b 4     5      11100
N  0x00c 3     7      1111100
P  0x00d 1     7      1111101
Q  0x00e 1     7      1111110
R  0x00f 1     7      1111111
S  0x010 37    2      10
This appears to have successfully generated a Huffman tree that would work with DEFLATE format! Let's look at the compression ratio for this.
        freq  clen     freq*clen
A  0x000 4     5           20
D  0x003 2     6           12
E  0x004 6     4           24
F  0x005 3     6           18
G  0x006 2     6           12
H  0x007 2     6           12
J  0x008 53    2          106
K  0x009 26    2           52
L  0x00a 5     5           25
M  0x00b 4     5           20
N  0x00c 3     7           21
P  0x00d 1     7            7
Q  0x00e 1     7            7
R  0x00f 1     7            7
S  0x010 37    2           74
                      -----------
                          417 bits (53 bytes)
Wait! This only cost us 1 bit? Yes it did, although to store that extra bit we would need another whole byte. So the impact of this procedure is likely (though not proven) to be minimal on the compression ratio. Yet it corrects the table to ensure that it is compatible with DEFLATE.

To generalize the process, the reversed starting code calculation is repeated from 2**N - 1 (where N is the maximum bit depth) back down to bit length 0. If at any point the calculated starting code is odd, you must promote the next least frequent symbol to the current code length, including it in that group to make the starting code even.

In my next post I will show code for this.

bscloutier
Posts: 399
Joined: Thu Sep 14, 2017 12:55 pm

Re: DEFLATE Compression Algorithm

Post by bscloutier » Tue Jan 30, 2018 1:56 pm

The complete procedure for generating a DEFLATE-compatible Huffman table limited to a maximum bit depth is shown here. I know that this is not optimized code. There is some unnecessary execution, but I wanted to keep the steps separate and clear. You can be sure that over time I will optimize the coding and obfuscate it suitably for all future generations. ;)

This routine has the capacity to return FALSE if a table cannot be created. It was doing just that when the bit depth (code length) exceeded the maximum. That has since been corrected, and it will always return TRUE now.

// Establish bit length for the Huffman table based upon frequencies
static int _bitlength(struct huff_t *tbl, int ncodes, int maxb)
{
	uint16_t *list = MM_alloc(ncodes * sizeof(uint16_t), &_bitlength);
	int *freq = MM_alloc(ncodes * sizeof(int), &_bitlength);
	int nlist = 0;
	int n, c, p;
	int ret = TRUE;
	uint16_t *ptr;

	// List all of the symbols used in the data along with their frequencies. Note that
	//  we store pointers +1 so as to keep 0 as a linked list terminator.
	for (n = 0; n < ncodes; n++)
		if (tbl[n].cnt > 0)
		{
			list[nlist] = n + 1;
			freq[nlist] = tbl[n].cnt;
			nlist++;
		}

	// Note that there is a special boundary case when only 1 code is used. In this case
	//  the single code is encoded using 1 bit and not 0.
	if (nlist == 1)
		tbl[list[0] - 1].len = 1;

	// process this list down to a single node
	while (nlist > 1)
	{
		// sort the list by decreasing frequency
		for (n = 0; n < nlist - 1; n++)
			if (freq[n] < freq[n + 1])
			{
				// swap order
				c = list[n];
				list[n] = list[n + 1];
				list[n + 1] = c;
				c = freq[n];
				freq[n] = freq[n + 1];
				freq[n + 1] = c;

				// need to sort back
				if (n > 0)
					n -= 2;
			}

		// Combine the member lists associated with the last two entries. We combine the
		//  linked lists for the two low frequency nodes.
		p = list[nlist - 2];
		while (tbl[p - 1].link)
			p = tbl[p - 1].link;
		tbl[p - 1].link = list[nlist - 1];

		// The new node has the combined frequency.
		freq[nlist - 2] += freq[nlist - 1];
		nlist--;

		// Sort the members of this node by decreasing code length. Longer codes to the
		//  left. This will also sort the frequency of the symbols in increasing order
		//  when code lengths are equal. We need this arrangement for the next step should
		//  we be required to balance the tree and avoid exceeding the maximum code
		//  length (maxb).
		p = TRUE;
		while (p)
		{
			p = FALSE;

			ptr = &list[nlist - 1];
			while (*ptr && tbl[*ptr - 1].link)
			{
				c = tbl[*ptr - 1].link;
				if ((tbl[*ptr - 1].len < tbl[c - 1].len) ||
						(tbl[*ptr - 1].len == tbl[c - 1].len && tbl[*ptr - 1].cnt > tbl[c - 1].cnt))
				{
					n = tbl[c - 1].link;
					tbl[*ptr - 1].link = n;
					tbl[c - 1].link = *ptr;
					*ptr = c;
					p = TRUE;
				}
				ptr = &tbl[*ptr - 1].link;
			}
		}

		// Increase the code length for members of this node. We cannot exceed the maximum
		//  code length (maxb).
		p = list[nlist - 1];
		while (p)
		{
			if (tbl[p - 1].len < maxb)
				tbl[p - 1].len++;

			p = tbl[p - 1].link;
		}

		// Now verify the structure. This should be absolutely proper up until the point when
		//  we prevent code lengths from exceeding the maximum. Once we do that we are likely
		//  creating an impossible tree. We will need to correct that.
		p = list[nlist - 1];
		c = tbl[p - 1].len;
		if (c == maxb)
		{

			n = 1 << c;
			while (p)
			{
				if (tbl[p - 1].len == c)
					n--;
				else
				{
					// n must be even at this point or we extend the length group
					if (n & 1)
					{
						tbl[p - 1].len = c;
						n--;
					}
					else
					{
						c--;
						n /= 2;
					}
				}

				p = tbl[p - 1].link;
			}
		}
	}

	MM_free(freq);
	MM_free(list);
	return (ret);
}

Here are the steps that it performs:
  1. Create an array for all used alphabet symbols (those with non-zero frequency).
  2. Check for the special case where there is only one symbol. In that case we use a 1 bit code where one is unused.
  3. Sort the list array by decreasing frequency. The least frequent symbols are then at the end of the list.
  4. Combine the rightmost two least frequent symbols or nodes into a single new node having the combined frequency. All members of the two combined nodes become members of the new node.
  5. Sort the member list for the new node by decreasing bit depth (current code length) and secondly by increasing frequency.
  6. Increase the bit depth for all members of the new node by 1. If a symbol will exceed the maximum bit depth do not increment it.
  7. If we have reached the maximum bit depth then confirm the tree structure using the reverse starting code length calculations. Elevate the next least frequent symbol to the current bit depth if the calculated starting code is not an even number. Check all code lengths.
  8. If there is more than one entry in the list array (not down to one node yet) then repeat at Step #3.

bscloutier
Posts: 399
Joined: Thu Sep 14, 2017 12:55 pm

Re: DEFLATE Compression Algorithm

Post by bscloutier » Tue Jan 30, 2018 2:47 pm

So to review....

First, we covered how to perform a reasonably fast version of LZ77, filling a 64 KB buffer with compressed data which contains literals and length-distance codes.

When we need to, we will flush the 64 KB buffer and generate a block in DEFLATE format. At that point we call a routine that first analyzes the data for code frequencies. There are to be two alphabets: one for literals and length codes, the other for distance codes.

Now we've developed a method for generating DEFLATE compatible Huffman tables for our two alphabets. The bulk of the complexity is now behind us, but there is still some work to do before we can start generating our compressed bit stream.

Moving on...

Now we need the size of our literal alphabet (HLIT) and our distance alphabet (HDIST) since it is unlikely that we have utilized all literal symbols (0..285) or distance codes (0..29). So here we scan each to trim unused symbols.


		// Now we combine the bit length arrays into a single array for run-length-like repeat encoding.
		//  In DEFLATE this encoding can overlap for the literal table to the distance code table as
		//  if a single array. First determine the alphabet size for each table.
		for (hlit = 286; 0 < hlit; hlit--)
			if (littbl[hlit - 1].len > 0)
				break;

		for (hdist = 30; 0 < hdist; hdist--)
			if (dsttbl[hdist - 1].len > 0)
				break;
So we have the two alphabets. One for literals and length codes (0..HLIT) and one for distance codes (0..HDIST). We've seen that all we need are the code lengths for each symbol set to define the associated Huffman tree. We need to convey this information to the decompressor. The DEFLATE format combines the code length arrays for the two alphabets now into one long array of HLIT + HDIST code lengths.


		// Now create the bit length array
		cntlist = MM_alloc((hlit + hdist) * sizeof(int), &_bufflush);
		for (n = 0; n < hlit; n++)
			cntlist[n] = littbl[n].len;
		for (n = 0; n < hdist; n++)
			cntlist[hlit + n] = dsttbl[n].len;
Now we know that the end-of-block code (256) is used, as well as at least a handful of length codes and distance codes. So this array of code lengths is itself fairly long. It will be somewhere between 260 and 315 entries. Each code length is in the set 0..15, so typically there is a lot of repetition. There can be a large number of 0's in a row. Consider data that is constrained to 7-bit ASCII: in that case there are 128 literal codes that never occur and would have a code length of 0.

The DEFLATE specification defines a kind of run-length encoding for this array. This encodes sequences of code lengths using three additional codes.
         The Huffman codes for the two alphabets appear in the block
         immediately after the header bits and before the actual
         compressed data, first the literal/length code and then the
         distance code.  Each code is defined by a sequence of code
         lengths, as discussed in Paragraph 3.2.2, above.  For even
         greater compactness, the code length sequences themselves are
         compressed using a Huffman code.  The alphabet for code lengths
         is as follows:

               0 - 15: Represent code lengths of 0 - 15
                   16: Copy the previous code length 3 - 6 times.
                       The next 2 bits indicate repeat length
                             (0 = 3, ... , 3 = 6)
                          Example:  Codes 8, 16 (+2 bits 11),
                                    16 (+2 bits 10) will expand to
                                    12 code lengths of 8 (1 + 6 + 5)
                   17: Repeat a code length of 0 for 3 - 10 times.
                       (3 bits of length)
                   18: Repeat a code length of 0 for 11 - 138 times
                       (7 bits of length)
So as you can see we are heading towards our third Huffman table. This one with an alphabet of 19 codes (0..18). That's why the symbol set I used for the example in handling maximum code length has 19 members. I used one of these tables as an example.

In my next post I will show the procedure I use for applying the repeat codes in this alphabet.

bscloutier
Posts: 399
Joined: Thu Sep 14, 2017 12:55 pm

Re: DEFLATE Compression Algorithm

Post by bscloutier » Tue Jan 30, 2018 3:06 pm

The current version of jniorsys.log on my development JNIOR compresses into a single block using the following literal and distance tables.


littbl
 0x00a 2 11 11111101110
 0x00d 2 12 111111111000
 0x020 93 6 101000
 0x027 4 11 11111101111
 0x028 14 9 111101010
 0x029 3 11 11111110000
 0x02a 2 12 111111111001
 0x02b 9 10 1111101000
 0x02c 9 10 1111101001
 0x02d 23 8 11101100
 0x02e 158 5 01100
 0x02f 18 9 111101011
 0x030 120 6 101001
 0x031 182 5 01101
 0x032 147 5 01110
 0x033 150 5 01111
 0x034 127 6 101010
 0x035 132 6 101011
 0x036 113 6 101100
 0x037 110 6 101101
 0x038 113 6 101110
 0x039 106 6 101111
 0x03a 91 6 110000
 0x03c 2 12 111111111010
 0x03d 3 11 11111110001
 0x03e 3 11 11111110010
 0x041 6 10 1111101010
 0x043 5 11 11111110011
 0x044 1 13 1111111111100
 0x046 2 12 111111111011
 0x048 1 13 1111111111101
 0x04a 3 11 11111110100
 0x04c 3 11 11111110101
 0x04d 5 11 11111110110
 0x04e 4 11 11111110111
 0x04f 5 10 1111101011
 0x050 11 9 111101100
 0x052 10 9 111101101
 0x053 6 10 1111101100
 0x054 5 10 1111101101
 0x055 2 12 111111111100
 0x057 3 11 11111111000
 0x05a 1 13 1111111111110
 0x05f 5 10 1111101110
 0x061 44 7 1101100
 0x062 7 10 1111101111
 0x063 19 9 111101110
 0x064 30 8 11101101
 0x065 52 7 1101101
 0x066 14 9 111101111
 0x067 13 9 111110000
 0x068 9 10 1111110000
 0x069 45 7 1101110
 0x06a 6 10 1111110001
 0x06b 3 11 11111111001
 0x06c 29 8 11101110
 0x06d 21 8 11101111
 0x06e 46 7 1101111
 0x06f 51 7 1110000
 0x070 31 8 11110000
 0x071 2 12 111111111101
 0x072 37 8 11110001
 0x073 27 8 11110010
 0x074 40 7 1110001
 0x075 16 9 111110001
 0x076 6 10 1111110010
 0x077 8 10 1111110011
 0x078 4 11 11111111010
 0x079 9 10 1111110100
 0x07a 4 11 11111111011
 0x100 1 13 1111111111111
 0x101 1248 2 00
 0x102 580 4 0100
 0x103 168 5 10000
 0x104 160 5 10001
 0x105 72 7 1110010
 0x106 119 6 110001
 0x107 82 6 110010
 0x108 17 9 111110010
 0x109 116 6 110011
 0x10a 253 5 10010
 0x10b 120 6 110100
 0x10c 52 7 1110011
 0x10d 69 7 1110100
 0x10e 79 6 110101
 0x10f 20 8 11110011
 0x110 54 7 1110101
 0x111 236 5 10011
 0x112 371 4 0101
 0x113 34 8 11110100
 0x114 6 10 1111110101
 0x115 17 9 111110011
 0x116 6 10 1111110110

dsttbl
 0x002 1 10 1111111100
 0x003 2 10 1111111101
 0x008 2 10 1111111110
 0x009 2 10 1111111111
 0x00a 14 8 11111110
 0x00b 204 4 0100
 0x00c 225 4 0101
 0x00d 46 7 1111110
 0x00e 168 5 11100
 0x00f 85 6 111110
 0x010 202 4 0110
 0x011 130 5 11101
 0x012 193 4 0111
 0x013 146 5 11110
 0x014 259 4 1000
 0x015 209 4 1001
 0x016 301 4 1010
 0x017 255 4 1011
 0x018 377 3 000
 0x019 293 4 1100
 0x01a 433 3 001
 0x01b 332 4 1101
We determine HLIT and HDIST for this and combine the code lengths into one HLIT + HDIST length array.
HLIT: 279
HDIST: 28
 0 0 0 0 0 0 0 0 0 0 11 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 11 9 11 12 10 10 8 5 9 6 5 5 5 6 6 6 6 6 6 6 0 12 11 11 0 0 10 0 11 13 0 12 0 13 0 11 0 11 11 11 10 9 0 9 10 10 12 0 11 0 0 13 0 0 0 0 10 0 7 10 9 8 7 9 9 10 7 10 11 8 8 7 7 8 12 8 8 7 9 10 10 11 10 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 2 4 5 5 7 6 6 9 6 5 6 7 7 6 8 7 5 4 8 10 9 10 0 0 10 10 0 0 0 0 10 10 8 4 4 7 5 6 4 5 4 5 4 4 4 4 3 4 3 4
So this array has 307 entries and you can see that there is a lot of repetition. We next apply the repeat codes as appropriate to shorten this array to 139 entries.
NCNT: 139
 17/7 11 0 0 12 18/7 6 17/3 11 9 11 12 10 10 8 5 9 6 5 5 5 6 16/3 0 12 11 11 0 0 10 0 11 13 0 12 0 13 0 11 0 11 11 11 10 9 0 9 10 10 12 0 11 0 0 13 17/1 10 0 7 10 9 8 7 9 9 10 7 10 11 8 8 7 7 8 12 8 8 7 9 10 10 11 10 11 18/122 13 2 4 5 5 7 6 6 9 6 5 6 7 7 6 8 7 5 4 8 10 9 10 0 0 10 10 17/1 10 10 8 4 4 7 5 6 4 5 4 5 4 16/0 3 4 3 4
Here, when one of the repeat codes is used, I show the value of the extra bits we will insert following the Huffman code for the symbol.

Now we create the Huffman table for this array and determine frequencies. Note that extra bits are inserted later and are not part of the Huffman encoding. Here the maximum code length is 7 bits.


		// Ugh. Now we need yet another Huffman table for this run-length alphabet. First we establish
		//  the frequencies.  Note that we skip the byte defining the extra bits.
		cnttbl = MM_alloc(HUFF_HLEN * sizeof(struct huff_t), &_bufflush);
		for (n = 0; n < ncnt; n++)
		{
			cnttbl[cntlist[n]].cnt++;
			if (cntlist[n] >= 16)
				n++;
		}

		// We need to determine the bit lengths. 
		if (!_bitlength(cnttbl, HUFF_HLEN, 7))
		{
			err = TRUE;
			break;
		}
This results in the following third Huffman table that we will use to code the code lengths which in turn are used to generate the literal and distance tables for the data compression.
cnttbl
 0x000 17 3 000
 0x002 1 6 111100
 0x003 2 6 111101
 0x004 9 4 1000
 0x005 11 3 001
 0x006 9 4 1001
 0x007 11 4 1010
 0x008 10 4 1011
 0x009 10 4 1100
 0x00a 19 3 010
 0x00b 14 3 011
 0x00c 6 4 1101
 0x00d 4 5 11100
 0x010 2 6 111110
 0x011 4 5 11101
 0x012 2 6 111111

Are we there yet? :shock:

Well, we are getting close.

bscloutier
Posts: 399
Joined: Thu Sep 14, 2017 12:55 pm

Re: DEFLATE Compression Algorithm

Post by bscloutier » Tue Jan 30, 2018 3:29 pm

As before we will need to convey the code lengths in this 'cnttbl' to the decompressor. We have N = {3 0 6 6 4 3 4 4 4 4 3 3 4 5 0 0 6 5 6}. I mentioned earlier that these are stored each with 3 bits in the bit stream limiting the code length to 7 bits.

But DEFLATE wants to save every possible byte. Each of these code lengths has a probability of being used over some general set of data types. The designers decided to sequence these into an array in a custom order such that the least likely code lengths fall at the end and can be trimmed. So we get the order from the specification and sequence them. Notice that our three 0 bit code lengths do in fact get trimmed.


		// Finally (haha) we establish a custom order for these bit lengths
		// array defined above
		//static const char hclen_order[HUFF_HLEN] = {
		//	16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2,
		//	14, 1, 15
		//};
		bitlens = MM_alloc(HUFF_HLEN * sizeof(int), &_bufflush);
		for (n = 0; n < HUFF_HLEN; n++)
			bitlens[n] = cnttbl[hclen_order[n]].len;

		// Now the end of this array should be 0's so we find a length for the array
		for (hclen = HUFF_HLEN; 0 < hclen; hclen--)
			if (bitlens[hclen - 1] > 0)
				break;
 6 5 6 3 4 4 4 4 3 3 3 4 4 6 5 6 0 0 0
HCLEN: 16
It is hard to believe but we now have everything that we need to actually generate the compressed bit stream. Well, all except a couple of routines to actually do the serialization. So that will be the next step.

bscloutier
Posts: 399
Joined: Thu Sep 14, 2017 12:55 pm

Re: DEFLATE Compression Algorithm

Post by bscloutier » Tue Jan 30, 2018 4:05 pm

We're going to output to a bit stream. Each write will likely involve a different number of bits. Someplace we have to properly sequence these into bytes to append to the output buffer.

To do this we need one routine to write called _writeb() and one to use at the end to flush any remaining unwritten bits called _flushb().


// Routine to stream bits
static void _writeb(int num, int code, struct bitstream_t *stream)
{
	// if no bits then nothing to do
	if (num == 0)
		return;

	// insert bits into the stream
	code &= ((1 << num) - 1);
	code <<= stream->nbits;
	stream->reg |= code;
	stream->nbits += num;

	// stream completed bytes
	while (stream->nbits >= 8)
	{
		stream->buffer[stream->length++] = (stream->reg & 0xff);
		stream->reg >>= 8;
		stream->nbits -= 8;
	}
}


// Routine flushes the bitstream
static void _flushb(struct bitstream_t *stream)
{
	if (stream->nbits)
	{
		stream->buffer[stream->length++] = (stream->reg & 0xff);
		stream->reg = 0;
		stream->nbits = 0;
	}
}
So to tie this together we use a structure.
// structure to assist with bit streaming
struct bitstream_t {
	char *buffer;
	int length;
	uint32_t reg;
	int nbits;
};
This seems simple enough, but there are a couple of things to understand. The DEFLATE specification gets into it right off the bat.

The first bit of the bit stream is the least significant bit of the first byte in the buffer. Once 8 bits are retrieved the 9th is the least significant bit of the second byte and so on.

A value appears in the stream starting with its least significant bit. That means that the bit order does not need to be reversed to pack the value at the tail of the current bit stream.

Huffman codes are processed a bit at a time. When you are reading a Huffman code you do not know in advance how many bits you will need to retrieve the code for a valid symbol in the alphabet. So in this case you must insert the Huffman code so that the most significant bit is read first. The bit order of Huffman codes needs to be reversed. Armed with that fact, I have stored these codes in the tables in reverse order. That will be apparent in the following code to generate the tables for coding.

So the last thing we need to do is generate the actual Huffman codes for the three tables.


		// Now we need Huffman codes for these tables because eventually someday we will actually be
		//  generating a compressed bit stream.

		// Next we total the number of symbols using each bit length. These will be used to assign
		//  bit codes for each alphabet symbol.
		litcnts = MM_alloc(16 * sizeof(int), &_bufflush);
		for (n = 0; n < 286; n++)
			if (littbl[n].len > 0)
				litcnts[littbl[n].len]++;

		dstcnts = MM_alloc(16 * sizeof(int), &_bufflush);
		for (n = 0; n < 30; n++)
			if (dsttbl[n].len > 0)
				dstcnts[dsttbl[n].len]++;

		bitcnts = MM_alloc(16 * sizeof(int), &_bufflush);
		for (n = 0; n < HUFF_HLEN; n++)
			if (cnttbl[n].len)
				bitcnts[cnttbl[n].len]++;

		// Now we calculate starting codes for each bit length group. This procedure is defined in the
		//  DEFLATE specification. We can define the Huffman tables in a compressed format provided that
		//  the Huffman tables follow a couple of additional rules. Using these starting codes we
		//  can assign codes for each alphabet symbol. Note that Huffman codes are processed bit-by-bit
		//  and therefore must be generated here in reverse bit order.

		// Define codes for the literal Huffman table
		startcd = MM_alloc(16 * sizeof(int), &_bufflush);
		for (n = 0; n < 15; n++)
			startcd[n + 1] = (startcd[n] + litcnts[n]) << 1;
		for (n = 0; n < 286; n++)
		{
			len = littbl[n].len;
			if (len)
			{
				c = startcd[len]++;
				while (len--)
				{
					littbl[n].code <<= 1;
					littbl[n].code |= (c & 1);
					c >>= 1;
				}
			}
		}

		// Define codes for the distance Huffman table
		for (n = 0; n < 15; n++)
			startcd[n + 1] = (startcd[n] + dstcnts[n]) << 1;
		for (n = 0; n < 30; n++)
		{
			len = dsttbl[n].len;
			if (len)
			{
				c = startcd[len]++;
				while (len--)
				{
					dsttbl[n].code <<= 1;
					dsttbl[n].code |= (c & 1);
					c >>= 1;
				}
			}
		}

		// Define codes for the bit length Huffman table
		for (n = 0; n < 15; n++)
			startcd[n + 1] = (startcd[n] + bitcnts[n]) << 1;
		for (n = 0; n < HUFF_HLEN; n++)
		{
			len = cnttbl[n].len;
			if (len)
			{
				c = startcd[len]++;
				while (len--)
				{
					cnttbl[n].code <<= 1;
					cnttbl[n].code |= (c & 1);
					c >>= 1;
				}
			}
		}
Coming up next: Actually Generating the Bit Stream

bscloutier
Posts: 399
Joined: Thu Sep 14, 2017 12:55 pm

Re: DEFLATE Compression Algorithm

Post by bscloutier » Wed Jan 31, 2018 7:55 am

The DEFLATE specification greatly oversimplifies the whole process by defining the block format in about a single page.
         We can now define the format of the block:

               5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
               5 Bits: HDIST, # of Distance codes - 1        (1 - 32)
               4 Bits: HCLEN, # of Code Length codes - 4     (4 - 19)

               (HCLEN + 4) x 3 bits: code lengths for the code length
                  alphabet given just above, in the order: 16, 17, 18,
                  0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15

                  These code lengths are interpreted as 3-bit integers
                  (0-7); as above, a code length of 0 means the
                  corresponding symbol (literal/length or distance code
                  length) is not used.

               HLIT + 257 code lengths for the literal/length alphabet,
                  encoded using the code length Huffman code

               HDIST + 1 code lengths for the distance alphabet,
                  encoded using the code length Huffman code

               The actual compressed data of the block,
                  encoded using the literal/length and distance Huffman
                  codes

               The literal/length symbol 256 (end of data),
                  encoded using the literal/length Huffman code

         The code length repeat codes can cross from HLIT + 257 to the
         HDIST + 1 code lengths.  In other words, all code lengths form
         a single sequence of HLIT + HDIST + 258 values.
There are a couple of reasons why the suggestion to teach JANOS how to compress files has been in Redmine for 5 years. That's about how long ago JANOS, then in development, began to directly use JAR files for application programming. The obvious reason it took 5 years to implement is that there really isn't a huge need for compression in JNIOR. Storage was limited in the previous series, and if you wanted to keep log files around for any serious length of time it would have helped to compress them. The Series 4 has much more storage, but still not a lot by today's standards.

The real reason you may now realize: it is a bit involved. Given that JANOS uses no third party developed code, and that I had not been able to find usable reference materials (search engines having been damaged by marketing greed), there were some hurdles that left this suggestion sitting on the list at a low priority for practically ever. Well, we've got it done now.

Let's generate the bit stream and be done with this tome.

bscloutier
Posts: 399
Joined: Thu Sep 14, 2017 12:55 pm

Re: DEFLATE Compression Algorithm

Post by bscloutier » Wed Jan 31, 2018 8:39 am

First we stream the block header. The very first bit, BFINAL, indicates whether or not the block is the last in the compression. For the last block this bit will be a 1; it is 0 otherwise. The next two bits indicate the block type BTYPE. Here we use 10b, the dynamic Huffman table type, as we will be providing our own Huffman tables. Yeah, we could have taken the easy path and used a fixed set of tables (BTYPE 01b), but that is no fun.

Code: Select all

		// Now we have codes and everything that we need (Finally!) to generate the compressed bit
		//  stream.

		// set BFINAL and type BTYPE of 10
		_writeb(1, final ? 1 : 0, stream);
		_writeb(2, 0x02, stream);
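The `_writeb(count, value, stream)` helper itself never appears in these posts, so the following is only my assumption of how such a routine might work, not JANOS's actual code. RFC 1951 packs header fields and extra bits into each byte starting from the least significant bit; Huffman codes, by contrast, are packed starting from their most significant bit, so the codes stored in the tables would need to be bit-reversed before being handed to a writer like this.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical bit-stream state for a _writeb-style writer. DEFLATE
   fills each output byte starting from its least significant bit. */
typedef struct {
    uint8_t byte;   /* partially filled output byte */
    int     nbits;  /* bits currently held in 'byte' */
    FILE   *out;    /* destination stream */
} bitstream_t;

static void _writeb(int count, unsigned value, bitstream_t *bs)
{
    for (int i = 0; i < count; i++) {
        /* take the next bit of 'value', LSB first */
        bs->byte |= (uint8_t)(((value >> i) & 1u) << bs->nbits);
        if (++bs->nbits == 8) {
            fputc(bs->byte, bs->out);
            bs->byte = 0;
            bs->nbits = 0;
        }
    }
}

static void _flushb(bitstream_t *bs)
{
    if (bs->nbits > 0) {        /* pad the final byte with zero bits */
        fputc(bs->byte, bs->out);
        bs->byte = 0;
        bs->nbits = 0;
    }
}
```

With this packing, the BFINAL and BTYPE writes above produce a first byte of 101b (0x05) when flushed on an otherwise empty stream.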
We have already determined the sizes of our alphabets. HLIT defines the size of the literal alphabet, which includes codes for sequence lengths. The complete alphabet space covers the range 0..285, but since we are not likely to use them all, HLIT defines how many are actually used (0..HLIT-1). We also have HDIST playing a similar role for the distance alphabet, which could range 0..29.

Those two parameters are supplied next. Each value is conveyed in 5 bits. We can do that since we know that HLIT has to be at least 257, because we always use the end-of-block code of 256 in our alphabet. So really we supply the count of additional length codes used, which is similar in magnitude to the count of distance codes. HLIT is therefore supplied as HLIT - 257 and HDIST as HDIST - 1.
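Neither count's computation is shown in the posts, but deriving them is simple: scan down from the top of each alphabet for the highest symbol with a nonzero code length. The names `compute_hlit`, `litlens`, and `dstlens` here are illustrative, not JANOS's.

```c
/* Sketch (assumed names): HLIT is one past the highest literal/length
   symbol with a nonzero code length, but never less than 257 so the
   end-of-block code 256 is always covered. */
static int compute_hlit(const int litlens[286])
{
    int hlit = 286;
    while (hlit > 257 && litlens[hlit - 1] == 0)
        hlit--;
    return hlit;
}

/* HDIST is the analogous count for the 30-symbol distance alphabet,
   with a minimum of 1. */
static int compute_hdist(const int dstlens[30])
{
    int hdist = 30;
    while (hdist > 1 && dstlens[hdist - 1] == 0)
        hdist--;
    return hdist;
}
```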

Code: Select all

		// Now we output HLIT, HDIST, and HCLEN
		_writeb(5, hlit - 257, stream);
		_writeb(5, hdist - 1, stream);
		_writeb(4, hclen - 4, stream);

		// Output the HCLEN counts from the bit length array. Note that we have already ordered it
		//  as required.
		for (n = 0; n < hclen; n++)
			_writeb(3, bitlens[n], stream);
Following the delivery of HLIT and HDIST we supply HCLEN as HCLEN - 4. Recall that HCLEN defines the size of the alphabet we are going to use to encode the array of code lengths for the two main Huffman tables. HCLEN defines the number of elements we are going to supply from that 19-element array with the custom order. The balance beyond HCLEN elements is assumed to be 0. This is the array whose elements are supplied with 3 bits each and thus limited to the range 0..7. We include HCLEN elements simply as shown above.
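The reordering into that custom 19-element sequence might look like the following sketch (the names `cl_order`, `cl_lens`, and `order_cl_lengths` are mine, not JANOS's). The fixed order from the specification places the rarely used symbols at the end so that trailing zero entries can be trimmed, though at least 4 entries are always sent.

```c
/* Transmission order for the 19 code-length-alphabet lengths,
   per RFC 1951. */
static const int cl_order[19] = {
    16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
};

/* Reorder the lengths (cl_lens indexed by symbol 0..18) into bitlens
   and return HCLEN: the count of leading entries to transmit after
   trailing zeros are trimmed, minimum 4. */
static int order_cl_lengths(const int cl_lens[19], int bitlens[19])
{
    int hclen;
    for (int n = 0; n < 19; n++)
        bitlens[n] = cl_lens[cl_order[n]];
    for (hclen = 19; hclen > 4; hclen--)
        if (bitlens[hclen - 1] != 0)
            break;
    return hclen;
}
```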

Next we stream the concatenated array of code lengths. This array has HLIT + HDIST elements that have been compressed using repeat codes. Here we output the compressed array using the 19 symbol alphabet that we just defined for the decompressor by sending the array above. We encounter the extra bits usage for the first time. You can see in the following that after streaming the repeat codes 16, 17 or 18 we insert the field specifying the repeat count using the required number of extra bits.

Code: Select all

		// Output the run-length compressed HLIT + HDIST code lengths using the code length
		//  Huffman codes. The two tables are blended together in the run-length encoding.
		//  Note that we need to insert extra bits where appropriate.
		for (n = 0; n < ncnt; n++)
		{
			c = cntlist[n];
			_writeb(cnttbl[c].len, cnttbl[c].code, stream);

			if (c >= 16)
			{
				switch (c)
				{
				case 16:
					_writeb(2, cntlist[++n], stream);
					break;
				case 17:
					_writeb(3, cntlist[++n], stream);
					break;
				case 18:
					_writeb(7, cntlist[++n], stream);
					break;
				}
			}
		}
If you are the decompressor, at this point you have everything you need to build the Huffman tables for literals and distance codes. You had to expand the repeat compression and separate the two code length sets. You then tallied code length counts and calculated starting codes for each bit length. Finally, you assigned codes to each used alphabet symbol. So we are ready to deal with the actual compressed data.
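Those three steps are exactly the canonical-code algorithm from RFC 1951 section 3.2.2, the same one used earlier to assign codes on the compression side. JANOS's routine isn't shown in these posts, so this is a generic sketch with illustrative names.

```c
#define MAX_BITS 15  /* DEFLATE limit for literal and distance codes */

/* Assign canonical Huffman codes per RFC 1951 section 3.2.2.
   lens[n] is the code length of symbol n (0 = unused);
   codes[n] receives its code, read MSB first when emitted. */
static void assign_codes(const int *lens, int *codes, int nsyms)
{
    int bl_count[MAX_BITS + 1] = {0};
    int next_code[MAX_BITS + 1] = {0};

    /* 1) Tally how many codes exist at each bit length. */
    for (int n = 0; n < nsyms; n++)
        bl_count[lens[n]]++;
    bl_count[0] = 0;

    /* 2) Compute the starting code for each bit length. */
    int code = 0;
    for (int bits = 1; bits <= MAX_BITS; bits++) {
        code = (code + bl_count[bits - 1]) << 1;
        next_code[bits] = code;
    }

    /* 3) Hand out consecutive codes within each length. */
    for (int n = 0; n < nsyms; n++)
        if (lens[n] != 0)
            codes[n] = next_code[lens[n]]++;
}
```

Running it on the RFC's own example, lengths (3, 3, 3, 3, 3, 2, 4, 4) for symbols A..H, yields the codes 010, 011, 100, 101, 110, 00, 1110, 1111.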

When we initially scanned the LZ77 compressed data to determine frequencies we had to temporarily expand the length-distance references to determine which codes from the length and distance alphabets we would be using. Well the following loop is practically identical because we again process the LZ77 compressed data expanding the length-distance references. This time we will output the codes to the bit stream. Here again we have the extra bits to insert. Where we ignored them before, now we insert them into the stream when necessary.

Code: Select all

		// Unbelievable! We can actually now output the compressed data for the block! This is
		//  encoded using the literal and distance Huffman tables as required. Note we need to
		//  again process the escaping and length-distance codes.
		for (n = 0; n < osize; n++)
		{
			c = obuf[n];
			if (c != 0xff)
				_writeb(littbl[c].len, littbl[c].code, stream);
			else
			{
				// escaped 0xff literal; output its code
				if (obuf[++n] == 0xff)
					_writeb(littbl[0xff].len, littbl[0xff].code, stream);
				else
				{
					// table defined above
					//static const int lcode_ofs[29] = {
					//	3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
					//	35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258
					//};

					// determine required length code for lengths (3..258). This code is
					//  coded in the literal table.
					len = (obuf[n++] & 0xff) + 3;
					for (c = 0; c < 29; c++)
						if (lcode_ofs[c] > len)
							break;
					code = 256 + c;
					_writeb(littbl[code].len, littbl[code].code, stream);

					// insert extra bits as required by the code
					// table defined above
					//static const int lcode_bits[29] = {
					//	0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
					//	3, 4, 4, 4, 4, 5, 5, 5, 5, 0
					//};
					c = lcode_bits[code - 257];
					if (c)
						_writeb(c, len - lcode_ofs[code - 257], stream);

					// table defined above
					//static const int dcode_ofs[30] = {
					//	1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
					//	257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
					//	8193, 12289, 16385, 24577
					//};

					// determine required distance code for distances (1..32768). This code is
					//  coded in the distance table.
					dst = (obuf[n++] & 0xff) << 8;
					dst |= (obuf[n] & 0xff);
					for (c = 0; c < 30; c++)
						if (dcode_ofs[c] > dst)
							break;
					code = c - 1;
					_writeb(dsttbl[code].len, dsttbl[code].code, stream);

					// insert extra bits as required by the code
					// table defined above
					//static const int dcode_bits[30] = {
					//	0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8,
					//	9, 9, 10, 10, 11, 11, 12, 12, 13, 13
					//};
					c = dcode_bits[code];
					if (c)
						_writeb(c, dst - dcode_ofs[code], stream);
				}
			}
		}
And finally we output the end-of-block code (256). If BFINAL is set and this is the last block in the compression (the LZ77 stage flushing its last partial buffer of data), we also flush the bit stream. That emits a final byte carrying whatever bits remain.

Code: Select all

			// And finally the end-of-block code and flush
			_writeb(littbl[256].len, littbl[256].code, stream);
			if (final)
				_flushb(stream);
At this point the buffer supplied by the LZ77 compression has been flushed to the bit stream. We would reset that buffer pointer and return. If there is more data the LZ77 compressor will then proceed with it. Remember that I had defined this buffer to be 64KB so there will likely be more in the case of larger files.

bscloutier
Posts: 399
Joined: Thu Sep 14, 2017 12:55 pm

Re: DEFLATE Compression Algorithm

Post by bscloutier » Wed Jan 31, 2018 9:00 am

This DEFLATE compression capability will be part of JANOS v1.6.4 and later OS. I am likely to do some optimization before release. Here we see that it works. I'll create an archive containing the files in my JNIOR's root folder.
bruce_dev /> zip -c test.zip /
 6 files saved
bruce_dev /> 

bruce_dev /> zip test.zip
     Size   Packed          CRC32        Modified
    56096    10764   81%  0b779605  Jan 31 2018 08:42  jniorsys.log
    40302     6249   84%  b1dffc05  Jan 26 2018 07:38  web.log
    22434     9499   58%  059a09d9  Jan 25 2018 14:53  manifest.json
       89       89    0%  f97bbba2  Jan 28 2018 09:11  access.log
      990      460   54%  9699bbfe  Jan 30 2018 14:48  jniorboot.log.bak
      990      461   53%  8f3c0390  Jan 31 2018 08:42  jniorboot.log
 6 files listed
bruce_dev /> 
This archive verifies. This verification does decompress each file and confirm the CRC32.
bruce_dev /> zip -vl test.zip
  verifying: jniorsys.log (compressed)
  verifying: web.log (compressed)
  verifying: manifest.json (compressed)
  verifying: access.log
  verifying: jniorboot.log.bak (compressed)
  verifying: jniorboot.log (compressed)
 6 entries found - content verifies!
bruce_dev /> 
We can repeat the construction in verbose output mode. Here we see timing. Again, keep in mind that JANOS is running on a 100 MHz Renesas RX63N micro-controller.
bruce_dev /> zip -cl test.zip /
  deflate: /jniorsys.log (56096 bytes)
   saving: jniorsys.log (compressed 80.8%) 1.176 secs
  deflate: /web.log (40302 bytes)
   saving: web.log (compressed 84.5%) 0.671 secs
  deflate: /manifest.json (22434 bytes)
   saving: manifest.json (compressed 57.7%) 1.863 secs
   saving: access.log (stored) 0.011 secs
  deflate: /jniorboot.log.bak (990 bytes)
   saving: jniorboot.log.bak (compressed 53.5%) 0.054 secs
  deflate: /jniorboot.log (990 bytes)
   saving: jniorboot.log (compressed 53.4%) 0.054 secs
 6 files saved
bruce_dev />
The /etc/JanosClasses.jar file also compresses. This file is where I first encountered Huffman tables whose bit depth (code length) exceeded the DEFLATE maximums (15 bits for the literal and distance tables, 7 bits for the code length encoding).
bruce_dev /> zip -cl test.zip /etc
  deflate: /etc/JanosClasses.jar (266601 bytes)
   saving: etc/JanosClasses.jar (compressed 11.2%) 24.215 secs
 1 files saved
bruce_dev /> 

bruce_dev /> zip test.zip
     Size   Packed          CRC32        Modified
   266601   236758   11%  20916587  Jan 11 2018 09:58  etc/JanosClasses.jar
 1 files listed
bruce_dev /> 

bruce_dev /> zip -vl test.zip
  verifying: etc/JanosClasses.jar (compressed)
 1 entries found - content verifies!
bruce_dev /> 
I know that 24 seconds for a 1/4 megabyte file is nothing to write home about. Now that things are fully functional I can go back and work on the LZ77 stage, where basically all of the time is consumed. I can certainly improve on this performance, but as it is, it isn't that bad. The JNIOR is a controller after all, and you likely wouldn't need to compress large archives often.
