
DEFLATE Compression Algorithm

INTEG's own Real-time Operating System. No 3rd party code here. 100% Supportable.

DEFLATE Compression Algorithm

Post by bscloutier » Fri Dec 29, 2017 1:33 pm

So in the past I have designed products that were capable of plotting collected data as a graphics file for display in the browser. This isn't currently possible in the JNIOR and it is a feature that could be added to JANOS. It can be useful. I should also mention that JANOS executes Java and serves files out of JAR and ZIP files which generally utilize DEFLATE compression. So we already perform DEFLATE decompression. With a compressor we can add the ability to create/modify JAR and ZIP archives as well as to generate PNG graphics for plots.

The issue with the specifications is that they describe the compression and file formats but do not give you the algorithms. They don't tell you how to do it. Just what you must do.

Sure, often there is reference code or an open source project that you can find. But those tend to be complete projects, and it is difficult to pick out the precise code you need to understand in order to implement the core algorithms. Then that code has been optimized and complicated with options over the years, which ends up obfuscating the structure. And, no, we don't just absorb 3rd party code into JANOS. That is not the way to maintain a stable embedded environment. Not in my book.

So far this is the case for DEFLATE. I am going to develop the algorithm from scratch so that I fully understand it, know precisely what every statement does, and know how the code will perform on the JANOS platform. Maybe more importantly, I need to know that it will not misbehave and drag JANOS down.

Well I was thinking that I would openly do that here so you can play along. Don't hesitate to create a legitimate account here and comment or even help. Some approaches to LZ77 have been patented. I have no clue and am not going to spend weeks trying to understand the patent prose. Supposedly the LZ77 implementation associated with DEFLATE is unencumbered. Maybe so. Still I might get creative and cross some line. I am not overly concerned about it but would like to know, at least before getting some sort of legal complaint.

Yeah so... let's reinvent the wheel. That's usually how I roll...


Re: DEFLATE Compression Algorithm

Post by bscloutier » Fri Dec 29, 2017 2:11 pm

If you look into DEFLATE you will quickly be distracted by data structures and Huffman coding. There are some confusing but genius ways of efficiently conveying the Huffman tables, and the Huffman coding even seems to be applied recursively. All of that will boggle your mind even before you get to the LZ77 compression at the heart of it all. So, ignore all of that. We will get to it. I am going to start at the heart and build outwards.

LZ77 compression

The compression works by identifying sequences of bytes in the data stream that are repeated later. The repeated sequence is replaced by a short and efficient reference back to the earlier data, thereby reducing the size of the overall stream. A 20-year-old document by Antaeus Feldspar describes it well by example.

As everything has a limitation, DEFLATE defines a 32KB sliding window which may contain any referenced prior sequence. It just is not feasible to randomly access the entire data set and allow you to reference sequences all the way back to the beginning. This also keeps distances under control allowing only smaller integers to appear in the stream.

It all sounds great, but then you realize that in compressing you have to search that entire 32KB window for matches to the current sequence of bytes each time a new byte is added. Lots of processor cycles are involved and the whole process could take forever. Of course, while the decompressor needs to be ready to reference the whole 32KB window, a compressor might use a smaller window. That would reduce the effort involved at the cost of compression efficiency. The specifications suggest window and sequence length factors that might be adjusted to balance speed against space efficiency.

It can all get more complex in that the prior sequence can actually overlap the current sequence (as in the example in the document). A further complication comes in if you consider that lazy matching might lead to better compression: a short sequence match might mask a potentially longer match which would have been more beneficial.
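The overlap case is easier to see from the decompressor's side. Here is a minimal sketch (not JANOS code; the class and helper names are just for illustration) of how a [length, distance] reference gets expanded, using the "Blah blah..." example from the Feldspar document. Copying a single byte at a time is what makes the overlapping reference work.

import java.io.ByteArrayOutputStream;

public class OverlapCopyDemo {

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Literals already decoded: "Blah b"
        for (char c : "Blah b".toCharArray())
            out.write(c);

        // Expand [D=5, L=18]. The length exceeds the distance, so the copy
        //  overlaps data it is still producing. Copying one byte at a time
        //  makes each newly written byte available to later iterations.
        copyReference(out, 5, 18);
        out.write('!');

        System.out.println(out.toString());     // Blah blah blah blah blah!
    }

    // Copy 'length' bytes starting 'distance' bytes back from the end of
    //  the output produced so far. (DEFLATE itself limits lengths to 3..258
    //  and distances to 1..32768.)
    static void copyReference(ByteArrayOutputStream out, int distance, int length) {
        for (int i = 0; i < length; i++) {
            byte[] sofar = out.toByteArray();            // inefficient, but clear
            out.write(sofar[sofar.length - distance]);
        }
    }
}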

So how do I want to proceed here? Some of this reminds me of the fun in creating JANOS' Regex engine. Hmm...


Re: DEFLATE Compression Algorithm

Post by bscloutier » Fri Dec 29, 2017 2:59 pm

The Regex engine ends up compiling the expression into one large state machine through which multiple pointers can progress simultaneously. Pointers are advanced as each character is obtained from the data stream. The first pointer (or last, depending on the mode) to make it through the whole expression signals a match. If it sounds complicated, it sort of is, but at the same time it's pretty cool. As far as Regex goes I've been able to apply this to almost all of the standard functionality, although JANOS hasn't implemented every Regex syntax.
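Just to illustrate that multiple-pointer idea, here is a toy (definitely not the JANOS engine): a hard-coded state machine for the pattern "ab*c" where a set of live positions is advanced for each input character and any pointer reaching the final state signals a match.

import java.util.HashSet;
import java.util.Set;

public class MultiPointerDemo {

    public static void main(String[] args) {
        System.out.println(matches("abbbc"));   // true
        System.out.println(matches("ac"));      // true
        System.out.println(matches("abx"));     // false
    }

    // States for "ab*c": 0 expects 'a', 1 expects 'b' (loop) or 'c', 2 = matched
    static boolean matches(String text) {
        Set<Integer> live = new HashSet<>();
        live.add(0);
        for (char ch : text.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int state : live) {
                if (state == 0 && ch == 'a') next.add(1);
                if (state == 1 && ch == 'b') next.add(1);
                if (state == 1 && ch == 'c') next.add(2);
            }
            live = next;
            if (live.contains(2))
                return true;
        }
        return false;
    }
}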

For DEFLATE there is somewhat of a similar situation where we want to examine bytes from the data stream one at a time and have the compression algorithm raise its hand when a sequence can be optimally encoded by a [length, distance] pointer. But we want to consider all possibilities and to try to do what leads to the most efficient compression.

I will start by implementing the sliding window as a queue using my standard approach to such things. The size of the queue will be 32K entries or less. In fact, to start out I'll probably keep it very small. We can enlarge it later when we benchmark the compression algorithm.

Two index pointers will bound the data in the queue. Each new byte will be inserted at the INPTR and that pointer will be incremented and wrapped as required. The oldest byte will be located at the OUTPTR. Once the queue fills we will have to advance the OUTPTR to make room for the next entry. Older data will be dropped and the queue will run at maximum capacity. This is the sliding window.

The sliding window caches the uncompressed data stream. The compressed data stream will be generated by the algorithm separately. We need one more pointer in the sliding widow indicating the start of the current sequence being evaluated. Call that CURPTR. If the current sequence cannot be matched we would output the byte from CURPTR and advance and wrap the pointer as necessary. If the sequence is matched we output the [length, distance] code and advance CURPTR to skip the entire matched sequence.

CURPTR then will lag behind INPTR. It will not get in the way of OUTPTR as DEFLATE specifies a maximum length for a sequence match of 258 bytes and our sliding window will be much larger.

Now let's think about the sequence matching procedure...


Re: DEFLATE Compression Algorithm

Post by bscloutier » Tue Jan 02, 2018 7:34 am

I am going to prototype my algorithm in Java on the JNIOR just to make it easier to test and debug. Later, once I have the structure, I can recast it in C and embed it into JANOS.

We're going to focus on the LZ77 part of the compression first. Our main goal is simply to be compatible with decompression. Our LZ77 algorithm then basically doesn't have to do anything: we wouldn't need to find a single repeated sequence nor replace anything with pointers back into the sliding window. Of course, our compression ratio wouldn't be all that impressive. We would still gain through the subsequent Huffman coding stages, which I am leaving for later. But in the end we would still be able to create file collections and PNG graphics files that are universally usable.

But really, there is no fun in a kludge. Let's see if we can achieve an LZ77 implementation that we can be proud of. Well, at least one that works.

So for development I am going to create a program that will read a selected file and compress it using LZ77 into another. I'll isolate all of the compression effort into one routine and have the outer program report some statistics upon completion.


Re: DEFLATE Compression Algorithm

Post by bscloutier » Tue Jan 02, 2018 8:32 am

Here's our program for testing compression. All of the compression work will be done in do_compress() and the program will report the results. At this point we just copy the source file. Yeah, this will be slow. But it will let us examine what we are trying to do more closely than if I went straight to C and used the debugger. In that case I couldn't really share it with you.

package jtest;

import com.integpg.system.JANOS;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;

public class Main {
    
    public static void main(String[] args) throws Throwable {
        
        // Requires a test file (use log as default)
        String filename = "/jniorsys.log";
        if (args.length > 0)
            filename = args[0];
        
        // Open the selected file for reading
        File src = new File(filename);
        long srclen = src.length();
        BufferedReader infile = new BufferedReader(new FileReader(src));
        if (!infile.ready())
            System.exit(1);
        
        // Create an output file
        BufferedWriter outfile = new BufferedWriter(new FileWriter("/outfile.dat"));
        
        // perform compression
        long timer = JANOS.uptimeMillis();
        do_compress(outfile, infile);
        timer = JANOS.uptimeMillis() - timer;
        
        // Close files
        outfile.close();
        infile.close();
        
        // Output statistics
        File dest = new File("/outfile.dat");
        long destlen = dest.length();
        
        System.out.printf("Processing %.3f seconds.\n", timer/1000.);
        System.out.printf("Source %lld bytes.\n", srclen);
        System.out.printf("Result %lld bytes.\n", destlen);
        System.out.printf("Ratio %.2f%%\n", 100. - (100. * destlen)/srclen);
    }
        
}
    

This uses my throws Throwable trick to avoid having to worry about try-catch for the time being.

    
    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        // simply copy at first
        while (infile.ready()) {
            int ch = infile.read();
            outfile.write(ch);
        }
        
    }
bruce_dev /> jtest
Processing 32.360 seconds.
Source 36737 bytes.
Result 36737 bytes.
Ratio 0.00%

bruce_dev /> echo Blah blah blah blah > blah.dat

bruce_dev /> cat blah.dat
Blah blah blah blah

bruce_dev /> jtest blah.dat
Processing 0.023 seconds.
Source 21 bytes.
Result 21 bytes.
Ratio 0.00%

bruce_dev /> 

Now back to thinking about the actual compression and matching sequences from a sliding window.


Re: DEFLATE Compression Algorithm

Post by bscloutier » Tue Jan 02, 2018 8:59 am

To start let's implement our sliding window. Recall that I would use a queue for that. You can see here that depending on the window size we will retain the previous so many bytes. New bytes are queued at the INPTR position and when the queue fills we will push the OUTPTR discarding older data.

    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        // create queue (sliding window)
        int window = 1024;
        byte[] data = new byte[window];
        int inptr = 0;
        int outptr = 0;
        
        // simply copy at first
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            
            // matching (cannot yet so just output byte)
            outfile.write(ch);
            
            // queue uncompressed data
            data[inptr++] = (byte)ch;
            if (inptr == window)
                inptr = 0;
            if (inptr == outptr) {
                outptr++;
                if (outptr == window)
                    outptr = 0;
            }       
            
        }
        
    }

Now for a particular position in the input stream (CURPTR) we want to scan the queue for sequence matches. We could do that by brute force but for a large sliding window that would be very slow. Also there is the concept of a lazy match which if implemented might lead to better compression ratios. So how to approach the matching process?
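Just to make that cost concrete, a brute-force scan would look roughly like the sketch below: for each prior position in the window, compare forward byte-by-byte against the bytes starting at the current position. The 'curptr' and 'avail' parameters are assumptions here (avail being the count of not-yet-processed bytes buffered from curptr up to inptr); with a 32K window this inner loop runs for every input byte, which is exactly the effort we want to avoid.

    // Brute-force matching sketch (not what we will use): returns the longest
    //  match length found anywhere in the window for the sequence at curptr.
    static int bruteForceBestLength(byte[] data, int window, int outptr, int curptr, int avail) {
        int best = 0;
        for (int start = outptr; start != curptr; start = (start + 1) % window) {
            int len = 0;
            while (len < avail
                    && data[(start + len) % window] == data[(curptr + len) % window])
                len++;
            if (len > best)
                best = len;
        }
        return best;
    }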


Re: DEFLATE Compression Algorithm

Post by bscloutier » Tue Jan 02, 2018 9:16 am

So for some position in the input stream we will be searching prior data for a sequence match (3 or more bytes). So we create the CURPTR. If we cannot find a match (and right now we cannot, because we haven't implemented matching) we will just output the data byte and bump the current position. Searching will continue for a match starting at the new position.

Right now CURPTR will track INPTR. Later, while we are watching for matches, it will lag behind. Here we have created CURPTR. There is otherwise no major functional change.

    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        // create queue (sliding window)
        int window = 1024;
        byte[] data = new byte[window];
        int inptr = 0;
        int outptr = 0;
        int curptr = 0;
        
        // simply copy at first
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            
            // matching (cannot yet so just output byte)
            outfile.write(ch);
            curptr++;
            if (curptr == window)
                curptr = 0;
            
            // queue uncompressed data
            data[inptr++] = (byte)ch;
            if (inptr == window)
                inptr = 0;
            if (inptr == outptr) {
                outptr++;
                if (outptr == window)
                    outptr = 0;
            }       
            
        }
        
    }

Now how are we going to do this "watching for matches" thing?


Re: DEFLATE Compression Algorithm

Post by bscloutier » Tue Jan 02, 2018 9:49 am

Let's create the concept of an active match. At any given time there will be from 0 to some number of active matches. Each will represent a match to the sequence of bytes appearing at an input stream position. As new bytes are retrieved from the uncompressed input stream we will check any active matches and advance them. If a match is no longer valid it will be removed and forgotten. At that point, though, maybe the match warrants replacing part of the stream with a pointer. We will see.

I made the data queue and related pointers static members of the program and created the following class representing an active match.

    // An active match. At any given position in the sliding WINDOW we compare and track
    //  matches to the incoming DATA stream.
    class match {
        public int start;
        public int ptr;
        public int len;
        
        match(int pos) {
            start = pos;
            ptr = pos + 1;
            if (ptr == WINDOW)
                ptr = 0;
            len = 1;
        }
        
        public boolean check(int ch) {
            if (DATA[ptr] != ch)
                return (false);
            
            ptr++;
            if (ptr == WINDOW)
                ptr = 0;
            len++;
            
            return (true);
        }
    }

When a new data byte is entered into the queue we will want to create these active match objects for every matching byte that previously exists. Every one of those represents a potential sequence.


Re: DEFLATE Compression Algorithm

Post by bscloutier » Tue Jan 02, 2018 12:14 pm

Okay, when we enter a new byte into the queue it becomes a candidate for replacement by a reference to a sequence starting with the same byte somewhere earlier. So we want to start a new active match for each of those earlier bytes. We will process these matches as additional bytes are received from the input stream. To do this, though, we don't want to search the entire window for prior occurrences of each character. So to make things efficient I am going to maintain linked lists through the queue for each character.

Since a byte can have 256 values we create a HEAD pointer array with 256 entries, referenced using the byte value. Each queue position then will have both a forward FWD and a backward BACK pointer, forming a bi-directional linked list. Yeah, this quintuples our memory requirement, but with the benefit of processing speed.

The list has to be bi-directional because once the queue fills we are going to drop bytes. It is then necessary to trim the linked lists to remove pointers for data no longer in the queue. That can only be done efficiently if we can reference a previous entry in the linked list. So we need both directions.

Here are our static members so far. This is the memory usage.

    // create queue (sliding window)
    static final int WINDOW = 1024;
    static final byte[] DATA = new byte[WINDOW];
    static int INPTR = 0;
    static int OUTPTR = 0;
    static int CURPTR = 0;
    
    // data linked list arrays
    static final short[] HEAD = new short[256];
    static final short[] FWD = new short[WINDOW];
    static final short[] BACK = new short[WINDOW];

Now we maintain the linked lists as we add and remove data from the queue. We also can efficiently create new active match objects. Note that we store pointers in the links as +1 so as to keep 0 as a terminator.

            // queue uncompressed DATA
            DATA[INPTR] = (byte)ch;
            
            // Add byte to the head of the appropriate linked list. Note pointers are stored +1 so
            //  as to use 0 as an end of list marker. Lists are bi-directional so we can trim the 
            //  tail when data is dropped from the queue.
            short ptr = HEAD[ch];
            HEAD[ch] = (short)(INPTR + 1);
            FWD[INPTR] = ptr;
            BACK[INPTR] = 0;
            if (ptr != 0)
                BACK[ptr - 1] = (short)(INPTR + 1);
            
            // advance entry pointer
            INPTR++;
            if (INPTR == WINDOW)
                INPTR = 0;
            
            // drop data from queue when full
            if (INPTR == OUTPTR) {
                
                // trim linked list as byte is being dropped
                if (BACK[OUTPTR] == 0)
                    HEAD[DATA[OUTPTR]] = 0;
                else
                    FWD[BACK[OUTPTR] - 1] = 0;

                // push end of queue
                OUTPTR++;
                if (OUTPTR == WINDOW)
                    OUTPTR = 0;
            }

            // create new active match for all CH in the queue (except last)
            while (ptr != 0) {
                
                // new match started (not doing anything with it yet)
                match m = new match(ptr - 1);
                
                ptr = FWD[ptr - 1];
            }

I adjusted the program to dump the non-zero HEAD entries and each occupied queue position, including the links, as a check. Remember that links are stored here +1.
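The dump routine is nothing special (the exact version is in the attached Main.java), but it would look roughly like this. Note it prints the links as stored, i.e. +1 with 0 marking end-of-list.

    // Diagnostic dump of the HEAD entries and the occupied queue positions.
    //  (Assumes the queue has not filled, which is fine for this small test.)
    static void dump_state() {
        System.out.println("\nHEAD");
        for (int ch = 0; ch < 256; ch++)
            if (HEAD[ch] != 0)
                System.out.printf(" 0x%02x %d\n", ch, HEAD[ch]);

        System.out.println("\nQUEUE");
        for (int n = OUTPTR; n != INPTR; n = (n + 1) % WINDOW)
            System.out.printf(" %d 0x%02x %d %d\n", n, DATA[n] & 0xff, FWD[n], BACK[n]);
    }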

bruce_dev /> jtest blah.dat

HEAD
 0x0a 21
 0x0d 20
 0x20 15
 0x42 1
 0x61 18
 0x62 16
 0x68 19
 0x6c 17

QUEUE
 0 0x42 0 0
 1 0x6c 0 7
 2 0x61 0 8
 3 0x68 0 9
 4 0x20 0 10
 5 0x62 0 11
 6 0x6c 2 12
 7 0x61 3 13
 8 0x68 4 14
 9 0x20 5 15
 10 0x62 6 16
 11 0x6c 7 17
 12 0x61 8 18
 13 0x68 9 19
 14 0x20 10 0
 15 0x62 11 0
 16 0x6c 12 0
 17 0x61 13 0
 18 0x68 14 0
 19 0x0d 0 0
 20 0x0a 0 0

STATS
Processing 0.078 seconds.
Source 21 bytes.
Result 21 bytes.
Ratio 0.00%

bruce_dev /> 
Attachments
Main.java (5.02 KiB)


Re: DEFLATE Compression Algorithm

Post by bscloutier » Tue Jan 02, 2018 1:22 pm

Now as we create new active matches we are going to collect them in an ArrayList object.

    // active matching
    static ArrayList<match> SEQ = new ArrayList();

            // create new active matches for all CH in the queue (except last)
            while (ptr != 0) {
                SEQ.add(new match(ptr - 1));
                ptr = FWD[ptr - 1];
            }

So as each new data byte is retrieved from the uncompressed input stream we will process all active matches. Those that continue to match will be retained and others dropped. That code looks like this:

        // process uncompressed stream
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            
            // process active match objects
            System.out.printf("New byte[%d]: 0x%02x\n", INPTR, ch & 0xff);
            for (int n = SEQ.size() - 1; 0 <= n; n--) {
                match m = SEQ.get(n);
                if (!m.check(ch))
                    SEQ.remove(n);
            }

If, following this, I dump the remaining active matches we can watch them proceed. At this point though we have not interpreted the matching status so as to decide whether or not the stream can be altered.

            // dump remaining active matches
            Iterator i = SEQ.iterator();
            while (i.hasNext()) {
                match m = (match) i.next();
                System.out.printf(" Start: %d Ptr: %d Len: %d\n", m.start, m.ptr, m.len);
            }

So, reviewing the An Explanation of the DEFLATE Algorithm paper from 1997:
Antaeus Feldspar wrote:
LZ77 compression

LZ77 compression works by finding sequences of data that are repeated. The term "sliding window" is used; all it really means is that at any given point in the data, there is a record of what characters went before. A 32K sliding window means that the compressor (and decompressor) have a record of what the last 32768 (32 * 1024) characters were. When the next sequence of characters to be compressed is identical to one that can be found within the sliding window, the sequence of characters is replaced by two numbers: a distance, representing how far back into the window the sequence starts, and a length, representing the number of characters for which the sequence is identical.

I realize this is a lot easier to see than to just be told. Let's look at some highly compressible data:
        Blah blah blah blah blah!
Our datastream starts by receiving the following characters: "B," "l," "a," "h," " ," and "b." However, look at the next five characters:
         vvvvv
        Blah blah blah blah blah!
              ^^^^^
There is an exact match for those five characters in the characters that have already gone into the datastream, and it starts exactly five characters behind the point where we are now. This being the case, we can output special characters to the stream that represent a number for length, and a number for distance.

The data so far:
	Blah blah b
The compressed form of the data so far:
	Blah b[D=5,L=5]
The compression can still be increased, though to take full advantage of it requires a bit of cleverness on the part of the compressor. Look at the two strings that we decided were identical. Compare the character that follows each of them. In both cases, it's "l" -- so we can make the length 6, and not just five. But if we continue checking, we find the next characters, and the next characters, and the next characters, are still identical -- even if the so-called 'previous' string is overlapping the string we're trying to represent in the compressed data!

It turns out that the 18 characters that start at the second character are identical to the 18 characters that start at the seventh character. It's true that when we're decompressing, and read the length, distance pair that describes this relationship, we don't know what all those 18 characters will be yet -- but if we put in place the ones that we know, we will know more, which will allow us to put down more... or, knowing that any length-and-distance pair where length > distance is going to be repeating (distance) characters again and again, we can set up the decompressor to do just that.

It turns out our highly compressible data can be compressed down to just this:
	Blah b[D=5, L=18]!

So if I feed this exact stream to what we have so far we can observe the sequencing:


bruce_dev /> echo Blah blah blah blah blah! > blah.dat

bruce_dev /> cat blah.dat
Blah blah blah blah blah!

bruce_dev /> jtest blah.dat                           
New byte[0]: 0x42
New byte[1]: 0x6c
New byte[2]: 0x61
New byte[3]: 0x68
New byte[4]: 0x20
New byte[5]: 0x62
New byte[6]: 0x6c
New byte[7]: 0x61
 Start: 1 Ptr: 3 Len: 2
New byte[8]: 0x68
 Start: 1 Ptr: 4 Len: 3
 Start: 2 Ptr: 4 Len: 2
New byte[9]: 0x20
 Start: 1 Ptr: 5 Len: 4
 Start: 2 Ptr: 5 Len: 3
 Start: 3 Ptr: 5 Len: 2
New byte[10]: 0x62
 Start: 1 Ptr: 6 Len: 5
 Start: 2 Ptr: 6 Len: 4
 Start: 3 Ptr: 6 Len: 3
 Start: 4 Ptr: 6 Len: 2
New byte[11]: 0x6c
 Start: 1 Ptr: 7 Len: 6
 Start: 2 Ptr: 7 Len: 5
 Start: 3 Ptr: 7 Len: 4
 Start: 4 Ptr: 7 Len: 3
 Start: 5 Ptr: 7 Len: 2
New byte[12]: 0x61
 Start: 1 Ptr: 8 Len: 7
 Start: 2 Ptr: 8 Len: 6
 Start: 3 Ptr: 8 Len: 5
 Start: 4 Ptr: 8 Len: 4
 Start: 5 Ptr: 8 Len: 3
 Start: 6 Ptr: 8 Len: 2
 Start: 1 Ptr: 3 Len: 2
New byte[13]: 0x68
 Start: 1 Ptr: 9 Len: 8
 Start: 2 Ptr: 9 Len: 7
 Start: 3 Ptr: 9 Len: 6
 Start: 4 Ptr: 9 Len: 5
 Start: 5 Ptr: 9 Len: 4
 Start: 6 Ptr: 9 Len: 3
 Start: 1 Ptr: 4 Len: 3
 Start: 7 Ptr: 9 Len: 2
 Start: 2 Ptr: 4 Len: 2
New byte[14]: 0x20
 Start: 1 Ptr: 10 Len: 9
 Start: 2 Ptr: 10 Len: 8
 Start: 3 Ptr: 10 Len: 7
 Start: 4 Ptr: 10 Len: 6
 Start: 5 Ptr: 10 Len: 5
 Start: 6 Ptr: 10 Len: 4
 Start: 1 Ptr: 5 Len: 4
 Start: 7 Ptr: 10 Len: 3
 Start: 2 Ptr: 5 Len: 3
 Start: 8 Ptr: 10 Len: 2
 Start: 3 Ptr: 5 Len: 2
New byte[15]: 0x62
 Start: 1 Ptr: 11 Len: 10
 Start: 2 Ptr: 11 Len: 9
 Start: 3 Ptr: 11 Len: 8
 Start: 4 Ptr: 11 Len: 7
 Start: 5 Ptr: 11 Len: 6
 Start: 6 Ptr: 11 Len: 5
 Start: 1 Ptr: 6 Len: 5
 Start: 7 Ptr: 11 Len: 4
 Start: 2 Ptr: 6 Len: 4
 Start: 8 Ptr: 11 Len: 3
 Start: 3 Ptr: 6 Len: 3
 Start: 9 Ptr: 11 Len: 2
 Start: 4 Ptr: 6 Len: 2
New byte[16]: 0x6c
 Start: 1 Ptr: 12 Len: 11
 Start: 2 Ptr: 12 Len: 10
 Start: 3 Ptr: 12 Len: 9
 Start: 4 Ptr: 12 Len: 8
 Start: 5 Ptr: 12 Len: 7
 Start: 6 Ptr: 12 Len: 6
 Start: 1 Ptr: 7 Len: 6
 Start: 7 Ptr: 12 Len: 5
 Start: 2 Ptr: 7 Len: 5
 Start: 8 Ptr: 12 Len: 4
 Start: 3 Ptr: 7 Len: 4
 Start: 9 Ptr: 12 Len: 3
 Start: 4 Ptr: 7 Len: 3
 Start: 10 Ptr: 12 Len: 2
 Start: 5 Ptr: 7 Len: 2
New byte[17]: 0x61
 Start: 1 Ptr: 13 Len: 12
 Start: 2 Ptr: 13 Len: 11
 Start: 3 Ptr: 13 Len: 10
 Start: 4 Ptr: 13 Len: 9
 Start: 5 Ptr: 13 Len: 8
 Start: 6 Ptr: 13 Len: 7
 Start: 1 Ptr: 8 Len: 7
 Start: 7 Ptr: 13 Len: 6
 Start: 2 Ptr: 8 Len: 6
 Start: 8 Ptr: 13 Len: 5
 Start: 3 Ptr: 8 Len: 5
 Start: 9 Ptr: 13 Len: 4
 Start: 4 Ptr: 8 Len: 4
 Start: 10 Ptr: 13 Len: 3
 Start: 5 Ptr: 8 Len: 3
 Start: 11 Ptr: 13 Len: 2
 Start: 6 Ptr: 8 Len: 2
 Start: 1 Ptr: 3 Len: 2
New byte[18]: 0x68
 Start: 1 Ptr: 14 Len: 13
 Start: 2 Ptr: 14 Len: 12
 Start: 3 Ptr: 14 Len: 11
 Start: 4 Ptr: 14 Len: 10
 Start: 5 Ptr: 14 Len: 9
 Start: 6 Ptr: 14 Len: 8
 Start: 1 Ptr: 9 Len: 8
 Start: 7 Ptr: 14 Len: 7
 Start: 2 Ptr: 9 Len: 7
 Start: 8 Ptr: 14 Len: 6
 Start: 3 Ptr: 9 Len: 6
 Start: 9 Ptr: 14 Len: 5
 Start: 4 Ptr: 9 Len: 5
 Start: 10 Ptr: 14 Len: 4
 Start: 5 Ptr: 9 Len: 4
 Start: 11 Ptr: 14 Len: 3
 Start: 6 Ptr: 9 Len: 3
 Start: 1 Ptr: 4 Len: 3
 Start: 12 Ptr: 14 Len: 2
 Start: 7 Ptr: 9 Len: 2
 Start: 2 Ptr: 4 Len: 2
New byte[19]: 0x20
 Start: 1 Ptr: 15 Len: 14
 Start: 2 Ptr: 15 Len: 13
 Start: 3 Ptr: 15 Len: 12
 Start: 4 Ptr: 15 Len: 11
 Start: 5 Ptr: 15 Len: 10
 Start: 6 Ptr: 15 Len: 9
 Start: 1 Ptr: 10 Len: 9
 Start: 7 Ptr: 15 Len: 8
 Start: 2 Ptr: 10 Len: 8
 Start: 8 Ptr: 15 Len: 7
 Start: 3 Ptr: 10 Len: 7
 Start: 9 Ptr: 15 Len: 6
 Start: 4 Ptr: 10 Len: 6
 Start: 10 Ptr: 15 Len: 5
 Start: 5 Ptr: 10 Len: 5
 Start: 11 Ptr: 15 Len: 4
 Start: 6 Ptr: 10 Len: 4
 Start: 1 Ptr: 5 Len: 4
 Start: 12 Ptr: 15 Len: 3
 Start: 7 Ptr: 10 Len: 3
 Start: 2 Ptr: 5 Len: 3
 Start: 13 Ptr: 15 Len: 2
 Start: 8 Ptr: 10 Len: 2
 Start: 3 Ptr: 5 Len: 2
New byte[20]: 0x62
 Start: 1 Ptr: 16 Len: 15
 Start: 2 Ptr: 16 Len: 14
 Start: 3 Ptr: 16 Len: 13
 Start: 4 Ptr: 16 Len: 12
 Start: 5 Ptr: 16 Len: 11
 Start: 6 Ptr: 16 Len: 10
 Start: 1 Ptr: 11 Len: 10
 Start: 7 Ptr: 16 Len: 9
 Start: 2 Ptr: 11 Len: 9
 Start: 8 Ptr: 16 Len: 8
 Start: 3 Ptr: 11 Len: 8
 Start: 9 Ptr: 16 Len: 7
 Start: 4 Ptr: 11 Len: 7
 Start: 10 Ptr: 16 Len: 6
 Start: 5 Ptr: 11 Len: 6
 Start: 11 Ptr: 16 Len: 5
 Start: 6 Ptr: 11 Len: 5
 Start: 1 Ptr: 6 Len: 5
 Start: 12 Ptr: 16 Len: 4
 Start: 7 Ptr: 11 Len: 4
 Start: 2 Ptr: 6 Len: 4
 Start: 13 Ptr: 16 Len: 3
 Start: 8 Ptr: 11 Len: 3
 Start: 3 Ptr: 6 Len: 3
 Start: 14 Ptr: 16 Len: 2
 Start: 9 Ptr: 11 Len: 2
 Start: 4 Ptr: 6 Len: 2
New byte[21]: 0x6c
 Start: 1 Ptr: 17 Len: 16
 Start: 2 Ptr: 17 Len: 15
 Start: 3 Ptr: 17 Len: 14
 Start: 4 Ptr: 17 Len: 13
 Start: 5 Ptr: 17 Len: 12
 Start: 6 Ptr: 17 Len: 11
 Start: 1 Ptr: 12 Len: 11
 Start: 7 Ptr: 17 Len: 10
 Start: 2 Ptr: 12 Len: 10
 Start: 8 Ptr: 17 Len: 9
 Start: 3 Ptr: 12 Len: 9
 Start: 9 Ptr: 17 Len: 8
 Start: 4 Ptr: 12 Len: 8
 Start: 10 Ptr: 17 Len: 7
 Start: 5 Ptr: 12 Len: 7
 Start: 11 Ptr: 17 Len: 6
 Start: 6 Ptr: 12 Len: 6
 Start: 1 Ptr: 7 Len: 6
 Start: 12 Ptr: 17 Len: 5
 Start: 7 Ptr: 12 Len: 5
 Start: 2 Ptr: 7 Len: 5
 Start: 13 Ptr: 17 Len: 4
 Start: 8 Ptr: 12 Len: 4
 Start: 3 Ptr: 7 Len: 4
 Start: 14 Ptr: 17 Len: 3
 Start: 9 Ptr: 12 Len: 3
 Start: 4 Ptr: 7 Len: 3
 Start: 15 Ptr: 17 Len: 2
 Start: 10 Ptr: 12 Len: 2
 Start: 5 Ptr: 7 Len: 2
New byte[22]: 0x61
 Start: 1 Ptr: 18 Len: 17
 Start: 2 Ptr: 18 Len: 16
 Start: 3 Ptr: 18 Len: 15
 Start: 4 Ptr: 18 Len: 14
 Start: 5 Ptr: 18 Len: 13
 Start: 6 Ptr: 18 Len: 12
 Start: 1 Ptr: 13 Len: 12
 Start: 7 Ptr: 18 Len: 11
 Start: 2 Ptr: 13 Len: 11
 Start: 8 Ptr: 18 Len: 10
 Start: 3 Ptr: 13 Len: 10
 Start: 9 Ptr: 18 Len: 9
 Start: 4 Ptr: 13 Len: 9
 Start: 10 Ptr: 18 Len: 8
 Start: 5 Ptr: 13 Len: 8
 Start: 11 Ptr: 18 Len: 7
 Start: 6 Ptr: 13 Len: 7
 Start: 1 Ptr: 8 Len: 7
 Start: 12 Ptr: 18 Len: 6
 Start: 7 Ptr: 13 Len: 6
 Start: 2 Ptr: 8 Len: 6
 Start: 13 Ptr: 18 Len: 5
 Start: 8 Ptr: 13 Len: 5
 Start: 3 Ptr: 8 Len: 5
 Start: 14 Ptr: 18 Len: 4
 Start: 9 Ptr: 13 Len: 4
 Start: 4 Ptr: 8 Len: 4
 Start: 15 Ptr: 18 Len: 3
 Start: 10 Ptr: 13 Len: 3
 Start: 5 Ptr: 8 Len: 3
 Start: 16 Ptr: 18 Len: 2
 Start: 11 Ptr: 13 Len: 2
 Start: 6 Ptr: 8 Len: 2
 Start: 1 Ptr: 3 Len: 2
New byte[23]: 0x68
 Start: 1 Ptr: 19 Len: 18
 Start: 2 Ptr: 19 Len: 17
 Start: 3 Ptr: 19 Len: 16
 Start: 4 Ptr: 19 Len: 15
 Start: 5 Ptr: 19 Len: 14
 Start: 6 Ptr: 19 Len: 13
 Start: 1 Ptr: 14 Len: 13
 Start: 7 Ptr: 19 Len: 12
 Start: 2 Ptr: 14 Len: 12
 Start: 8 Ptr: 19 Len: 11
 Start: 3 Ptr: 14 Len: 11
 Start: 9 Ptr: 19 Len: 10
 Start: 4 Ptr: 14 Len: 10
 Start: 10 Ptr: 19 Len: 9
 Start: 5 Ptr: 14 Len: 9
 Start: 11 Ptr: 19 Len: 8
 Start: 6 Ptr: 14 Len: 8
 Start: 1 Ptr: 9 Len: 8
 Start: 12 Ptr: 19 Len: 7
 Start: 7 Ptr: 14 Len: 7
 Start: 2 Ptr: 9 Len: 7
 Start: 13 Ptr: 19 Len: 6
 Start: 8 Ptr: 14 Len: 6
 Start: 3 Ptr: 9 Len: 6
 Start: 14 Ptr: 19 Len: 5
 Start: 9 Ptr: 14 Len: 5
 Start: 4 Ptr: 9 Len: 5
 Start: 15 Ptr: 19 Len: 4
 Start: 10 Ptr: 14 Len: 4
 Start: 5 Ptr: 9 Len: 4
 Start: 16 Ptr: 19 Len: 3
 Start: 11 Ptr: 14 Len: 3
 Start: 6 Ptr: 9 Len: 3
 Start: 1 Ptr: 4 Len: 3
 Start: 17 Ptr: 19 Len: 2
 Start: 12 Ptr: 14 Len: 2
 Start: 7 Ptr: 9 Len: 2
 Start: 2 Ptr: 4 Len: 2
New byte[24]: 0x21
New byte[25]: 0x0d
New byte[26]: 0x0a
Processing 2.494 seconds.
Source 27 bytes.
Result 27 bytes.
Ratio 0.00%

bruce_dev /> 

Now the 'D' in the paper refers to the distance between our CURPTR, at the point when a match is started, and the matched sequence position. If you persevere through the above you can verify that D=5 would be correct and that our best match ran to a length of 18.

So we now need the logic controlling the advancement of CURPTR and what then to output to the compressed stream.
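As a preview of that step, here is a rough sketch of how the [length, distance] values might fall out of a winning match. Here 'seqstart' is assumed to be the value CURPTR held when the match object was created, which the match class does not capture yet.

            // Sketch only: derive the output token from a completed match 'm'.
            //  'seqstart' would be CURPTR as of when the match was started.
            int distance = (seqstart - m.start + WINDOW) % WINDOW;
            int length = m.len;                  // DEFLATE caps lengths at 258
            // For the example above: seqstart = 6, m.start = 1  ->  [D=5, L=18]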
