JANOS

The key to debugging these days lies in the ability to recreate the failure. Some of the more elusive problems start out as rumors: a user swears that something failed to do what was expected but cannot prove it. After a while the issue resurfaces and this time someone collects just enough evidence to show that the problem is a real concern. What exactly occurs remains a complete mystery. Eventually someone files a bug report, but the developers are clueless and cannot effectively address the cause.

Sometimes there are suggestions as to what actions may have triggered the problem. The Support Team may take the time to try to recreate the problem on a local system, but to no avail. Other issues which stand a better chance of being resolved take precedence. Meanwhile news comes in that the original issue has occurred again, perhaps now in completely different locations and environments. What to do next?

Perhaps it is appropriate to regenerate the application for the user’s site, adding logging and other debugging statements in hopes of creating a better record of the events leading up to a future failure. The difficulty is in deciding exactly what to log, and from which parts of the system, given no real knowledge of the source of the issue. Some guesses are made and the logging is enabled. Some time later, perhaps weeks, the issue recurs and the logging proves not to have been useful. It may even be that the content of the logs is now meaningless because the logic behind them has been forgotten. It is difficult to make progress.

It would be nice to know when the issue is going to occur so you know exactly when to start watching for it. Better yet, if the problem occurs you could skip back in time to a few seconds before it and catch it in the act. If the issue involves network interaction you could set up a sniffer like Wireshark and have it collect detailed network traffic. With an external device this may require a Network Hub as opposed to a Network Switch so that the machine running Wireshark sees all of the traffic. But that is a lot of hardware and it might simply not be feasible.

Here’s where JANOS and the JNIOR might surprise you. If the issue is with the JNIOR application you can use the netstat -c command from the Command Line interface to generate a PCAP file of captured network traffic. That PCAP file can be downloaded to a PC with Wireshark installed and it will open right up for analysis. And here’s where the magic occurs: it is almost as if JANOS knows ahead of time when the issue will occur, because that PCAP file can contain traffic from BEFORE the event even though it is generated after the fact.

Okay, so it is not magic but the foresight of an operating system that is always capturing network traffic. There is a capture buffer that by default is 512KB and that can be enlarged through the IpConfig/CaptureBuffer Registry key to as much as 8MB. It is not so much a ‘buffer’ as it is a queue; at any one point in time it contains a significant amount of recent network traffic. You can also set a filter to be employed during capture, limiting the collected traffic to that of concern and thereby greatly extending the time frame covered. You can use a filter again when generating the PCAP file to limit what you transfer for analysis.
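In practice the workflow is short: enlarge IpConfig/CaptureBuffer through the Registry if you need a deeper history, let the unit run, and generate the capture once something of interest has happened. A hypothetical session might be no more than this (the prompt is just a hostname and the command's output is not shown here):

bruce_dev /> netstat -c

The resulting PCAP file can then be downloaded from the unit (FTP works fine) and opened in Wireshark on the PC.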

Now what if the event occurs overnight and by the time you realize it the data that you need has been overwritten? A simple utility application can be added to the unit to automatically generate the network PCAP capture file when the event is detected. You can even then have that file emailed to you!

Those of you who have worked with this aspect of the JNIOR will note that the network capture only includes data from the previous boot if the buffer has not yet begun to wrap around. If the failure involves an assertion that causes the JNIOR to reboot, the information is lost. Well, that has been true; however, as of JANOS v1.7.1, which as of this writing is a Release Candidate, the capture buffer is non-volatile. It will survive a reboot and contain a record of the prior network traffic. The buffer is only reinitialized when power is applied.

It is also possible to configure the JNIOR in Promiscuous Mode wherein it captures all external network traffic and not only that targeted to the JNIOR. If you couple that with a Network Hub you can utilize the JNIOR to monitor the interaction of other devices on a local network segment. I bet you never thought of the JNIOR as an IT debugging tool?

If this aspect of the JNIOR interests you or if you need to utilize it in debugging, please feel free to contact INTEG for support. You may also consult the Registry Key Assignments documentation for details on network capture. This information is supplied as part of the Dynamic Configuration Pages (DCP) served by the JNIOR WebServer accessible by your favorite browser.


Once a JNIOR is up and running in an application it rarely needs attention. In fact many users forget that it is even there. It is however good practice to check the system logs once in a while. Fortunately there are a number of ways that you can do that. If you open the DCP (Dynamic Configuration Pages) by accessing the unit with a browser there is a ‘Syslog’ tab. This displays the log with the most recent events right at the top. You can also go to the ‘Console’ tab, or otherwise open the Command Line through a Telnet connection or the serial port, and list the log content directly to the screen or window. In this case the latest entries are the last listed.

If you are developing a new application and testing it on the JNIOR you may wish to monitor the log more closely. Naturally you can leave the DCP open on the ‘Syslog’ tab and new events would just pop up when they are logged. That might not be as convenient as it sounds especially if you are building a network-facing application and are constantly testing and working with the JNIOR using various browsers. Perhaps you are rebooting the unit causing the DCP to reconnect far too often. More typically we periodically make a Command Line connection through Telnet and manually list the log. Here we make use of the CAT command.

mqtt-test /> help cat
CAT filespec

Options:
 -H             Dump file in hex
 -J             Formats JSON
 -P             Displays last page

Displays the contents of a file.
Aliases: CAT, TYPE

mqtt-test /> cat -p jniorsys.log

Now the jniorsys.log file is limited to about 64KB. That can represent a lot of lines. Do you really want to list all of those every time? Here we see that the CAT command also provides the -P option, which shows only the last page (25 lines) of the file. Of course you do have to remember to include the option. One side note: with JANOS you can supply options anywhere on the line, so if you at first forget the -P you can place it after the filename. No problem.
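For example, these two commands are equivalent:

mqtt-test /> cat -p jniorsys.log

mqtt-test /> cat jniorsys.log -p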

We can improve on this by making use of JANOS batch file capabilities. A batch file is a kind of script file that can contain commands to be executed one after another. Most operating systems, JANOS included, provide some facility for batch execution. But you might be wondering how you create batch files, or, if you create them externally, how you get them onto the JNIOR. Well here’s a tip: you can create them on the fly. Of course, the simpler the file the easier that can be accomplished.

For example we will create a simple batch file called log.bat as a kind of shorthand that we can use when we want to display the System Log. Here we simply ECHO the command, routing the text to the file. Immediately we can use the short batch file name to execute the command.

mqtt-test /> echo cat -p jniorsys.log > log.bat

mqtt-test /> log
cat -p jniorsys.log
10/03/18 18:54:15.850, FTP/10.0.0.27:58531 uploaded /flash/MQTT.jar [325.8 kbps]
10/03/18 18:54:24.571, ... (etc.)

Here we see that we used the ‘>’ redirection character to route the output to the specified file. If you need to build a batch file with more than one command you can append to it using ‘>>’ as well. The power here comes in combining commands to achieve some goal, the advantage being that you can repeat the procedure easily using the batch file name as shorthand. While batch files in JANOS are not as fully featured as those you will find in Linux or MS-DOS systems, most of the basics are there.
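For instance, a hypothetical logs.bat could list the tail of both the boot log and the System Log, with the second command appended using ‘>>’:

mqtt-test /> echo cat -p jniorboot.log > logs.bat

mqtt-test /> echo cat -p jniorsys.log >> logs.bat

Typing logs then runs the two commands in order, giving you the last page of the boot log followed by the last page of the System Log.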

Suppose instead we want to list only today’s events. Here we create our batch file in a way that lists only those entries with today’s date. Check it out.

mqtt-test /> echo grep %DATE% jniorsys.log > log.bat

mqtt-test /> log
grep 10/04/18 jniorsys.log
10/04/18 12:10:48.657, Ending session Command/10.0.0.27:64514 (pid 132)
10/04/18 12:37:53.642, Command/10.0.0.27:64075 login 'jnior' (pid 405)
10/04/18 12:38:05.179, FTP/10.0.0.27:64118 login 'jnior'
10/04/18 12:38:05.711 ... (etc.)

And there you go. You now have a simple way to check recent System Log activity. Now I bet that you are thinking how you might alter this to be even more helpful. Well, if you find that you cannot achieve what you have in mind just let us know. An advantage that you have with INTEG is a direct link to the technical team and the advantage we have is the power to implement what you need.

In JANOS 1.6.3 there are new security measures to harden the Beacon protocol. Security has been a concern there since the protocol’s inception. Any action that commands or configures the JNIOR will require credentials to be supplied. Those credentials, along with a valid NONCE, will be evaluated by JANOS to determine if the action or configuration attempt will be allowed.

The NONCE will be supplied at the end of the ALL_INFO packet. To get a valid NONCE an ALL_INFO packet will need to be requested shortly before the NONCE will be used. In the Support Tool, for example, when someone wants to save the configuration we will request the ALL_INFO packet when displaying the new login dialog. Then when the user clicks OK with the new credentials we will have received the NONCE. The nonce is then used along with the credentials to provide the new authentication.

Here is the code from the Support Tool that extracts the NONCE from the ALL_INFO packet if it is present.

        private void ParseAllInfo(ref JniorInfo jnrInfo, BinaryReader br)
        {
            jnrInfo.Gateway = ReadString(br);
            jnrInfo.PrimaryDns = ReadString(br);
            jnrInfo.SecondaryDns = ReadString(br);
            jnrInfo.DNSTimeout = IPAddress.NetworkToHostOrder(br.ReadInt32());
            jnrInfo.DHCPServer = ReadString(br);
            jnrInfo.DomainName = ReadString(br);
            if ("n/a".Equals(jnrInfo.DomainName))
                jnrInfo.DomainName = "";
 
            //stringLength = IPAddress.NetworkToHostOrder(br.ReadInt16());
            jnrInfo.Timezone = ReadString(br); // ASCIIEncoding.ASCII.GetString(br.ReadBytes(stringLength));
 
            jnrInfo.DHCPEnabled = (br.ReadByte() == 1);
 
            // if there is more information then the nonce is provided
            if (br.BaseStream.Position < br.BaseStream.Length)
            {
                jnrInfo.Nonce = ReadString(br);
            }
        }

Once the NONCE is known, the stored credentials can be used to send the security string with the SET_INFO command.

        public static byte[] SetInfo(JniorInfo jnrInfo)
        {
            using (MemoryStream ms = new MemoryStream())
            using (BinaryWriter bw = new BinaryWriter(ms))
            {
                WriteString(bw, "SET_INFO");
                WriteString(bw, jnrInfo.IPAddress);
                WriteString(bw, jnrInfo.SubnetMask);
                WriteString(bw, jnrInfo.Gateway);
                WriteString(bw, jnrInfo.PrimaryDns);
                WriteString(bw, jnrInfo.SecondaryDns);
                bw.Write(BitConverter.GetBytes(IPAddress.HostToNetworkOrder((Int32)jnrInfo.DNSTimeout)));
                WriteString(bw, jnrInfo.DomainName);
                bw.Write((byte)(jnrInfo.AutoAnnounce ? 1 : 0));
                bw.Write((byte)(jnrInfo.IsNew ? 0 : 1));
                WriteString(bw, jnrInfo.Timezone);
 
                // use the NONCE and the stored jniorinfo.credentials to send the security string.
                SendSecurity(bw, jnrInfo);
 
                return ms.ToArray();
            }
        }

The Support Tool will prompt the user for the credentials every time they are needed.

                        /**
                         *  check to see if the NONCE was filled in via the ALL_INFO packet.  
                         *  This is new in 1.6.3.  if the NONCE was provided we will prompt 
                         *  for credentials.
                         */
                        var nonceAvailable = null != configureJnrInfo.Nonce;
                        if (nonceAvailable)
                        {
                            var loginDlg = new Common.LoginDialog(configureJnrInfo.IPAddress);
                            /**
                             * if the user cancelled providing the credentials then we cancel 
                             * the configuration update
                             */
                            if (DialogResult.Cancel == loginDlg.ShowDialog(this))
                                return;
 
                            /**
                             * update the saved credentials so they can be used when sending the beacon commands
                             */
                            configureJnrInfo.UserName = loginDlg.UserName;
                            configureJnrInfo.Password = loginDlg.Password;
                        }
 
                        BeaconService.Beacon.Broadcast(BeaconService.Beacon.SetInfo(configureJnrInfo), m_configSerial);

The credentials are needed when issuing a Reboot. In the support tool we ask for an updated ALL_INFO packet before displaying a reboot confirmation.

                /**
                 * request that a new ALL_INFO packet be sent with a new NONCE
                 */
                BeaconService.Beacon.Broadcast(BeaconService.Beacon.RequestInfo(), jnrInfo.SerialNumber);
 
                /**
                 * confirm with the user the desire to reboot the selected JNIOR
                 */
                if (Interaction.MsgBox("Are you sure you want to REBOOT the selected JNIOR?", MsgBoxStyle.YesNo, "Reboot?") == MsgBoxResult.No)
                    return;
 
                /**
                 * check to see if the NONCE was filled in via the ALL_INFO packet.  
                 * This is new in 1.6.3.  if the NONCE was provided we will prompt 
                 * for credentials.
                 */
                var nonceAvailable = null != configureJnrInfo.Nonce;
                if (nonceAvailable)
                {
                    var loginDlg = new Common.LoginDialog(configureJnrInfo.IPAddress);
                    /**
                     * if the user cancelled providing the credentials then we cancel 
                     * the configuration update
                     */
                    if (DialogResult.Cancel == loginDlg.ShowDialog(this))
                        return;
 
                    /**
                     * update the saved credentials so they can be used when sending the beacon commands
                     */
                    configureJnrInfo.UserName = loginDlg.UserName;
                    configureJnrInfo.Password = loginDlg.Password;
                }
 
                /**
                 * send the reboot command
                 */
                BeaconService.Beacon.Broadcast(BeaconService.Beacon.Reboot(jnrInfo), jnrInfo.SerialNumber);

And here is the Beacon reboot code:

        public static byte[] Reboot(JniorInfo jnrInfo)
        {
            using (MemoryStream ms = new MemoryStream())
            using (BinaryWriter bw = new BinaryWriter(ms))
            {
                WriteString(bw, "REBOOT");
 
                // use the NONCE and the stored jniorinfo.credentials to send the security string.
                SendSecurity(bw, jnrInfo);
 
                return ms.ToArray();
            }
        }

So in the past I have designed products that were capable of plotting collected data as a graphics file for display in the browser. This isn’t currently possible on the JNIOR and it is a feature that could be added to JANOS. It would be useful. I should also mention that JANOS executes Java and serves files out of JAR and ZIP files, which generally utilize DEFLATE compression. So we already perform DEFLATE decompression. With a compressor we could add the ability to create and modify JAR and ZIP archives as well as to generate PNG graphics for plots.

The issue with the specifications is that they describe the compression and file formats but do not give you the algorithms. They don’t tell you how to do it. Just what you must do.

Sure, there is often reference code or an open source project that you can find. Those are usually complete projects and it is difficult to pick out the precise code you need to understand in order to implement the core algorithms. Then that code has been optimized and complicated with options over the years, which ends up obfuscating the structure. And, no, we don’t just absorb third-party code into JANOS. That is not the way to maintain a stable embedded environment. Not in my book.

So far this is the case for DEFLATE. I am going to develop the algorithm from scratch so that I fully understand it, know precisely what every statement does, and know how the code will perform on the JANOS platform. Maybe more importantly, I will know that it will not misbehave and drag JANOS down.

Well I was thinking that I would openly do that here so you can play along. Don’t hesitate to create a legitimate account here and comment or even help. Some approaches to LZ77 have been patented. I have no clue and am not going to spend weeks trying to understand the patent prose. Supposedly the LZ77 implementation associated with DEFLATE is unencumbered. Maybe so. Still I might get creative and cross some line. I am not overly concerned about it but would like to know, at least before getting some sort of legal complaint.

Yeah so… let’s reinvent the wheel. That’s usually how I roll…

If you look into DEFLATE you will quickly be distracted by data structures and Huffman coding. There are some confusing but ingenious ways of efficiently conveying the Huffman tables, and the Huffman coding even seems to be applied recursively. All of that will boggle your mind even before you get to the LZ77 compression at the heart of it all. So, ignore all of that. We will get to it. I am going to start at the heart and build outwards.

LZ77 compression

The compression works by identifying sequences of bytes in the data stream that are repeated later. The repeated sequence is replaced by a short and efficient reference back to the earlier data, thereby reducing the size of the overall stream. A 20-year-old document by Antaeus Feldspar describes it well by example.

As everything has a limitation, DEFLATE defines a 32KB sliding window which may contain any referenced prior sequence. It just is not feasible to randomly access the entire data set and allow you to reference sequences all the way back to the beginning. This also keeps distances under control allowing only smaller integers to appear in the stream.

It all sounds great but then you realize that in compression you have to search that entire 32KB window for matches to the current sequence of bytes each time a new byte is added. Lots of processor cycles are involved and the whole process could take forever. Of course, while the decompressor needs to be ready to reference the whole 32KB window, a compressor might use a smaller window. That would reduce the effort involved at the cost of compression efficiency. The specifications suggest window and sequence length factors that might be controlled in balancing speed and space efficiency.

It can all get more complex in that the prior sequence can actually overlap the current sequence (as in the example in the document). A further complication comes in if you consider that lazy matching might lead to better compression. A short sequence match might mask a potentially longer match which could have been more beneficial.
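To make the overlap case concrete, here is how a decompressor typically expands a [length, distance] reference: it copies one byte at a time from the output it has already produced. Because each copied byte immediately becomes available as a source, a reference whose length exceeds its distance simply repeats the last 'distance' bytes over and over. This is just an illustrative sketch, not JANOS code:

    // Expand one [length, distance] back-reference into out[], which already
    //  holds 'pos' decoded bytes. Copying byte-by-byte makes overlapping
    //  references (length > distance) work naturally. Returns the new output length.
    static int expand(byte[] out, int pos, int distance, int length) {
        int src = pos - distance;
        for (int i = 0; i < length; i++)
            out[pos + i] = out[src + i];
        return pos + length;
    }

For the example in the document, expanding [D=5, L=18] right after "Blah b" reproduces the remaining 18 characters of the blah blah string.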

So how do I want to proceed here? Some of this reminds me of the fun in creating JANOS’ Regex engine. Hmm…

The Regex engine ends up compiling the expression into one large state machine through which multiple pointers can progress simultaneously. Pointers are advanced as each character is obtained from the data stream. The first pointer (or last, depending on the mode) to make it through the whole expression signals a match. If it sounds complicated, it sort of is, but at the same time it’s pretty cool. As far as Regex goes I’ve been able to apply this to almost all of the standard Regex functionality, although JANOS hasn’t implemented every Regex syntax.

For DEFLATE there is somewhat of a similar situation where we want to examine bytes from the data stream one at a time and have the compression algorithm raise its hand when a sequence can be optimally encoded by a [length, distance] pointer. But we want to consider all possibilities and to try to do what leads to the most efficient compression.

I will start by implementing the sliding window as a queue using my standard approach to such things. The size of the queue will be 32K entries or less. In fact, to start out I’ll probably keep it very small. We can enlarge it later when we benchmark the compression algorithm.

Two index pointers will bound the data in the queue. Each new byte will be inserted at the INPTR and that pointer will be incremented and wrapped as required. The oldest byte will be located at the OUTPTR. Once the queue fills we will have to advance the OUTPTR to make room for the next entry. Older data will be dropped and the queue will run at maximum capacity. This is the sliding window.

The sliding window caches the uncompressed data stream. The compressed data stream will be generated by the algorithm separately. We need one more pointer in the sliding widow indicating the start of the current sequence being evaluated. Call that CURPTR. If the current sequence cannot be matched we would output the byte from CURPTR and advance and wrap the pointer as necessary. If the sequence is matched we output the [length, distance] code and advance CURPTR to skip the entire matched sequence.

CURPTR then will lag behind INPTR. It will not get in the way of OUTPTR, as DEFLATE specifies a maximum sequence match length of 258 bytes and our sliding window will be much larger.

Now let’s think about the sequence matching procedure…

I am going to prototype my algorithm in Java on the JNIOR just to make it easier to test and debug. Later, once I have the structure, I can recast it in C and embed it into JANOS.

We’re going to focus on the LZ77 part of the compression first. Our main goal is simply to be compatible with decompression. Our LZ77 algorithm then basically doesn’t have to do anything. We wouldn’t need to find a single repeated sequence nor replace anything with pointers back to any sliding window. Of course our compression ratio wouldn’t be all that impressive. We would still gain through the Huffman coding stages, which I am leaving for later. But in the end we would still be able to create file collections and PNG graphics files that are universally usable.

But really, there’s no fun in a kludge. Let’s see if we can achieve an LZ77 implementation that we can be proud of. Well, at least one that works.

So for development I am going to create a program that will read a selected file and compress it using LZ77 into another. I’ll isolate all of the compression effort into one routine and have the outer program report some statistics upon completion.

Here’s our program for testing compression. All of the compression work will be done in do_compress() and this will report the results. At this point we just copy the source file. Yeah, this will be slow. But it will let us examine what we are trying to do more closely than if I went straight to C and used the debugger. In that case I couldn’t really share it with you.

package jtest;
 
import com.integpg.system.JANOS;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
 
public class Main {
    
    public static void main(String[] args) throws Throwable {
        
        // Requires a test file (use log as default)
        String filename = "/jniorsys.log";
        if (args.length > 0)
            filename = args[0];
        
        // Open the selected file for reading
        File src = new File(filename);
        long srclen = src.length();
        BufferedReader infile = new BufferedReader(new FileReader(src));
        if (!infile.ready())
            System.exit(1);
        
        // Create an output file
        BufferedWriter outfile = new BufferedWriter(new FileWriter("/outfile.dat"));
        
        // perform compression
        long timer = JANOS.uptimeMillis();
        do_compress(outfile, infile);
        timer = JANOS.uptimeMillis() - timer;
        
        // Close files
        outfile.close();
        infile.close();
        
        // Output statistics
        File dest = new File("/outfile.dat");
        long destlen = dest.length();
        
        System.out.printf("Processing %.3f seconds.\n", timer/1000.);
        System.out.printf("Source %lld bytes.\n", srclen);
        System.out.printf("Result %lld bytes.\n", destlen);
        System.out.printf("Ratio %.2f%%\n", 100. - (100. * destlen)/srclen);
    }
        
}

This uses my throws Throwable trick to avoid having to worry about try-catch for the time being.

    
    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        // simply copy at first
        while (infile.ready()) {
            int ch = infile.read();
            outfile.write(ch);
        }
        
    }
bruce_dev /> jtest
Processing 32.360 seconds.
Source 36737 bytes.
Result 36737 bytes.
Ratio 0.00%

bruce_dev /> echo Blah blah blah blah > blah.dat

bruce_dev /> cat blah.dat
Blah blah blah blah

bruce_dev /> jtest blah.dat
Processing 0.023 seconds.
Source 21 bytes.
Result 21 bytes.
Ratio 0.00%

bruce_dev />

Now back to thinking about the actual compression and matching sequences from a sliding window.

To start let’s implement our sliding window. Recall that I would use a queue for that. You can see here that depending on the window size we will retain the previous so many bytes. New bytes are queued at the INPTR position and when the queue fills we will push the OUTPTR discarding older data.

    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        // create queue (sliding window)
        int window = 1024;
        byte[] data = new byte[window];
        int inptr = 0;
        int outptr = 0;
        
        // simply copy at first
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            
            // matching (cannot yet so just output byte)
            outfile.write(ch);
            
            // queue uncompressed data
            data[inptr++] = (byte)ch;
            if (inptr == window)
                inptr = 0;
            if (inptr == outptr) {
                outptr++;
                if (outptr == window)
                    outptr = 0;
            }       
            
        }
        
    }

Now for a particular position in the input stream (CURPTR) we want to scan the queue for sequence matches. We could do that by brute force but for a large sliding window that would be very slow. Also there is the concept of a lazy match which if implemented might lead to better compression ratios. So how to approach the matching process?

So for some position in the input stream we will be searching prior data for a sequence match (3 or more bytes). So we create the CURPTR. If we cannot find a match (and right now we cannot because we haven’t implemented matching) we will just output the data byte and bump the current position. Searching will continue for a match starting at the new position.

Right now CURPTR will track INPTR. Later, while we are watching for matches, it will lag behind. Here we have created CURPTR. There is otherwise no major functional change.

    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        // create queue (sliding window)
        int window = 1024;
        byte[] data = new byte[window];
        int inptr = 0;
        int outptr = 0;
        int curptr = 0;
        
        // simply copy at first
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            
            // matching (cannot yet so just output byte)
            outfile.write(ch);
            curptr++;
            if (curptr == window)
                curptr = 0;
            
            // queue uncompressed data
            data[inptr++] = (byte)ch;
            if (inptr == window)
                inptr = 0;
            if (inptr == outptr) {
                outptr++;
                if (outptr == window)
                    outptr = 0;
            }       
            
        }
        
    }

Now how are we going to do this “watching for matches” thing?

Let’s create the concept of an active match. At any given time there will be from 0 to some number of active matches. Each will represent a match to the sequence of bytes appearing at an input stream position. As new bytes are retrieved from the uncompressed input stream we will check any active matches and advance them. If a match is no longer valid it will be removed and forgotten. At that point, though, maybe the match warrants replacing part of the stream with a pointer. We will see.

I made the data queue and related pointers static members of the program and created the following class representing an active match.

    // An active match. At any given position in the sliding WINDOW we compare and track
    //  matches to the incoming DATA stream.
    class match {
        public int start;
        public int ptr;
        public int len;
        
        match(int pos) {
            start = pos;
            ptr = pos + 1;
            if (ptr == WINDOW)
                ptr = 0;
            len = 1;
        }
        
        public boolean check(int ch) {
            if (DATA[ptr] != ch)
                return (false);
            
            ptr++;
            if (ptr == WINDOW)
                ptr = 0;
            len++;
            
            return (true);
        }
    }

When a new data byte is entered into the queue we will want to create these active match objects for every matching byte that previously exists. Every one of those represents a potential sequence.

Okay, when we enter a new byte in the queue it becomes a candidate for replacement by a reference to a sequence starting with the same byte somewhere earlier. So we want to start a new active match for each of those earlier bytes. We will process these matches as additional bytes are received from the input stream. To do this, though, we don’t want to search the entire window for prior occurrences of each character. So to make things efficient I am going to maintain linked lists through the queue for each character.

Since a byte can have 256 values we create a HEAD pointer array with 256 entries, referenced using the byte value. Each queue position then has both a forward FWD and a backward BACK pointer, forming a bi-directional linked list. Yeah, this quintuples our memory requirement (each window byte now carries 4 bytes of links in addition to the data itself) but with the benefit of processing speed.

The list has to be bi-directional because once the queue fills we are going to drop bytes. It is then necessary to trim the linked lists to remove pointers for data no longer in the queue. That can only be done efficiently if we can reference a previous entry in the linked list. So we need both directions.

Here are our static members so far. This is the memory usage.

    // create queue (sliding window)
    static final int WINDOW = 1024;
    static final byte[] DATA = new byte[WINDOW];
    static int INPTR = 0;
    static int OUTPTR = 0;
    static int CURPTR = 0;
    
    // data linked list arrays
    static final short[] HEAD = new short[256];
    static final short[] FWD = new short[WINDOW];
    static final short[] BACK = new short[WINDOW];

Now we maintain the linked lists as we add and remove data from the queue. We also can efficiently create new active match objects. Note that we store pointers in the links as +1 so as to keep 0 as a terminator.

            // queue uncompressed DATA
            DATA[INPTR] = (byte)ch;
            
            // Add byte to the head of the appropriate linked list. Note pointers are stored +1 so
            //  as to use 0 as an end of list marker. Lists are bi-directional so we can trim the 
            //  tail when data is dropped from the queue.
            short ptr = HEAD[ch];
            HEAD[ch] = (short)(INPTR + 1);
            FWD[INPTR] = ptr;
            BACK[INPTR] = 0;
            if (ptr != 0)
                BACK[ptr - 1] = (short)(INPTR + 1);
            
            // advance entry pointer
            INPTR++;
            if (INPTR == WINDOW)
                INPTR = 0;
            
            // drop data from queue when full
            if (INPTR == OUTPTR) {
                
                // trim linked list as byte is being dropped
                if (BACK[OUTPTR] == 0)
                    HEAD[DATA[OUTPTR]] = 0;
                else
                    FWD[BACK[OUTPTR] - 1] = 0;
 
                // push end of queue
                OUTPTR++;
                if (OUTPTR == WINDOW)
                    OUTPTR = 0;
            }
 
            // create new active match for all CH in the queue (except last)
            while (ptr != 0) {
                
                // new match started (not doing anything with it yet)
                match m = new match(ptr - 1);
                
                ptr = FWD[ptr - 1];
            }

I adjusted the program to dump non-zero HEAD entries and each occupied queue position including the links as a check. Remember that links are stored in here +1.

bruce_dev /> jtest blah.dat

HEAD
 0x0a 21
 0x0d 20
 0x20 15
 0x42 1
 0x61 18
 0x62 16
 0x68 19
 0x6c 17

QUEUE
 0 0x42 0 0
 1 0x6c 0 7
 2 0x61 0 8
 3 0x68 0 9
 4 0x20 0 10
 5 0x62 0 11
 6 0x6c 2 12
 7 0x61 3 13
 8 0x68 4 14
 9 0x20 5 15
 10 0x62 6 16
 11 0x6c 7 17
 12 0x61 8 18
 13 0x68 9 19
 14 0x20 10 0
 15 0x62 11 0
 16 0x6c 12 0
 17 0x61 13 0
 18 0x68 14 0
 19 0x0d 0 0
 20 0x0a 0 0

STATS
Processing 0.078 seconds.
Source 21 bytes.
Result 21 bytes.
Ratio 0.00%

bruce_dev />

Now as we create new active matches we are going to collect them in an ArrayList object.

    // active matching
    static ArrayList SEQ = new ArrayList();
            // create new active matches for all CH in the queue (except last)
            while (ptr != 0) {
                SEQ.add(new match(ptr - 1));
                ptr = FWD[ptr - 1];
            }

So as each new data byte is retrieved from the uncompressed input stream we will process all active matches. Those that continue to match will be retained and others dropped. That code looks like this:

        // process uncompressed stream
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            
            // process active match objects
            System.out.printf("New byte[%d]: 0x%02x\n", INPTR, ch & 0xff);
            for (int n = SEQ.size() - 1; 0 <= n; n--) {
                match m = (match) SEQ.get(n);
                if (!m.check(ch))
                    SEQ.remove(n);
            }

If, following this, I dump the remaining active matches we can watch them proceed. At this point though we have not interpreted the matching status so as to decide whether or not the stream can be altered.

            // dump remaining active matches
            Iterator i = SEQ.iterator();
            while (i.hasNext()) {
                match m = (match) i.next();
                System.out.printf(" Start: %d Ptr: %d Len: %d\n", m.start, m.ptr, m.len);
            }

So in reviewing the An Explanation of the DEFLATE Algorithm paper from 1997:

Antaeus Feldspar wrote:

LZ77 compression

LZ77 compression works by finding sequences of data that are repeated. The term “sliding window” is used; all it really means is that at any given point in the data, there is a record of what characters went before. A 32K sliding window means that the compressor (and decompressor) have a record of what the last 32768 (32 * 1024) characters were. When the next sequence of characters to be compressed is identical to one that can be found within the sliding window, the sequence of characters is replaced by two numbers: a distance, representing how far back into the window the sequence starts, and a length, representing the number of characters for which the sequence is identical.

I realize this is a lot easier to see than to just be told. Let’s look at some highly compressible data:

        Blah blah blah blah blah!

Our datastream starts by receiving the following characters: “B,” “l,” “a,” “h,” ” ,” and “b.” However, look at the next five characters:

         vvvvv
        Blah blah blah blah blah!
              ^^^^^

There is an exact match for those five characters in the characters that have already gone into the datastream, and it starts exactly five characters behind the point where we are now. This being the case, we can output special characters to the stream that represent a number for length, and a number for distance.

The data so far:

	Blah blah b

The compressed form of the data so far:

	Blah b[D=5,L=5]

The compression can still be increased, though to take full advantage of it requires a bit of cleverness on the part of the compressor. Look at the two strings that we decided were identical. Compare the character that follows each of them. In both cases, it’s “l” — so we can make the length 6, and not just five. But if we continue checking, we find the next characters, and the next characters, and the next characters, are still identical — even if the so-called ‘previous’ string is overlapping the string we’re trying to represent in the compressed data!

It turns out that the 18 characters that start at the second character are identical to the 18 characters that start at the seventh character. It’s true that when we’re decompressing, and read the length, distance pair that describes this relationship, we don’t know what all those 18 characters will be yet — but if we put in place the ones that we know, we will know more, which will allow us to put down more… or, knowing that any length-and-distance pair where length > distance is going to be repeating (distance) characters again and again, we can set up the decompressor to do just that.

It turns out our highly compressible data can be compressed down to just this:

	Blah b[D=5, L=18]!

So if I feed this exact stream to what we have so far we can observe the sequencing:


bruce_dev /> echo Blah blah blah blah blah! > blah.dat

bruce_dev /> cat blah.dat
Blah blah blah blah blah!

bruce_dev /> jtest blah.dat                           
New byte[0]: 0x42
New byte[1]: 0x6c
New byte[2]: 0x61
New byte[3]: 0x68
New byte[4]: 0x20
New byte[5]: 0x62
New byte[6]: 0x6c
New byte[7]: 0x61
 Start: 1 Ptr: 3 Len: 2
New byte[8]: 0x68
 Start: 1 Ptr: 4 Len: 3
 Start: 2 Ptr: 4 Len: 2
New byte[9]: 0x20
 Start: 1 Ptr: 5 Len: 4
 Start: 2 Ptr: 5 Len: 3
 Start: 3 Ptr: 5 Len: 2
New byte[10]: 0x62
 Start: 1 Ptr: 6 Len: 5
 Start: 2 Ptr: 6 Len: 4
 Start: 3 Ptr: 6 Len: 3
 Start: 4 Ptr: 6 Len: 2
New byte[11]: 0x6c
 Start: 1 Ptr: 7 Len: 6
 Start: 2 Ptr: 7 Len: 5
 Start: 3 Ptr: 7 Len: 4
 Start: 4 Ptr: 7 Len: 3
 Start: 5 Ptr: 7 Len: 2
New byte[12]: 0x61
 Start: 1 Ptr: 8 Len: 7
 Start: 2 Ptr: 8 Len: 6
 Start: 3 Ptr: 8 Len: 5
 Start: 4 Ptr: 8 Len: 4
 Start: 5 Ptr: 8 Len: 3
 Start: 6 Ptr: 8 Len: 2
 Start: 1 Ptr: 3 Len: 2
New byte[13]: 0x68
 Start: 1 Ptr: 9 Len: 8
 Start: 2 Ptr: 9 Len: 7
 Start: 3 Ptr: 9 Len: 6
 Start: 4 Ptr: 9 Len: 5
 Start: 5 Ptr: 9 Len: 4
 Start: 6 Ptr: 9 Len: 3
 Start: 1 Ptr: 4 Len: 3
 Start: 7 Ptr: 9 Len: 2
 Start: 2 Ptr: 4 Len: 2
New byte[14]: 0x20
 Start: 1 Ptr: 10 Len: 9
 Start: 2 Ptr: 10 Len: 8
 Start: 3 Ptr: 10 Len: 7
 Start: 4 Ptr: 10 Len: 6
 Start: 5 Ptr: 10 Len: 5
 Start: 6 Ptr: 10 Len: 4
 Start: 1 Ptr: 5 Len: 4
 Start: 7 Ptr: 10 Len: 3
 Start: 2 Ptr: 5 Len: 3
 Start: 8 Ptr: 10 Len: 2
 Start: 3 Ptr: 5 Len: 2
New byte[15]: 0x62
 Start: 1 Ptr: 11 Len: 10
 Start: 2 Ptr: 11 Len: 9
 Start: 3 Ptr: 11 Len: 8
 Start: 4 Ptr: 11 Len: 7
 Start: 5 Ptr: 11 Len: 6
 Start: 6 Ptr: 11 Len: 5
 Start: 1 Ptr: 6 Len: 5
 Start: 7 Ptr: 11 Len: 4
 Start: 2 Ptr: 6 Len: 4
 Start: 8 Ptr: 11 Len: 3
 Start: 3 Ptr: 6 Len: 3
 Start: 9 Ptr: 11 Len: 2
 Start: 4 Ptr: 6 Len: 2
New byte[16]: 0x6c
 Start: 1 Ptr: 12 Len: 11
 Start: 2 Ptr: 12 Len: 10
 Start: 3 Ptr: 12 Len: 9
 Start: 4 Ptr: 12 Len: 8
 Start: 5 Ptr: 12 Len: 7
 Start: 6 Ptr: 12 Len: 6
 Start: 1 Ptr: 7 Len: 6
 Start: 7 Ptr: 12 Len: 5
 Start: 2 Ptr: 7 Len: 5
 Start: 8 Ptr: 12 Len: 4
 Start: 3 Ptr: 7 Len: 4
 Start: 9 Ptr: 12 Len: 3
 Start: 4 Ptr: 7 Len: 3
 Start: 10 Ptr: 12 Len: 2
 Start: 5 Ptr: 7 Len: 2
New byte[17]: 0x61
 Start: 1 Ptr: 13 Len: 12
 Start: 2 Ptr: 13 Len: 11
 Start: 3 Ptr: 13 Len: 10
 Start: 4 Ptr: 13 Len: 9
 Start: 5 Ptr: 13 Len: 8
 Start: 6 Ptr: 13 Len: 7
 Start: 1 Ptr: 8 Len: 7
 Start: 7 Ptr: 13 Len: 6
 Start: 2 Ptr: 8 Len: 6
 Start: 8 Ptr: 13 Len: 5
 Start: 3 Ptr: 8 Len: 5
 Start: 9 Ptr: 13 Len: 4
 Start: 4 Ptr: 8 Len: 4
 Start: 10 Ptr: 13 Len: 3
 Start: 5 Ptr: 8 Len: 3
 Start: 11 Ptr: 13 Len: 2
 Start: 6 Ptr: 8 Len: 2
 Start: 1 Ptr: 3 Len: 2
New byte[18]: 0x68
 Start: 1 Ptr: 14 Len: 13
 Start: 2 Ptr: 14 Len: 12
 Start: 3 Ptr: 14 Len: 11
 Start: 4 Ptr: 14 Len: 10
 Start: 5 Ptr: 14 Len: 9
 Start: 6 Ptr: 14 Len: 8
 Start: 1 Ptr: 9 Len: 8
 Start: 7 Ptr: 14 Len: 7
 Start: 2 Ptr: 9 Len: 7
 Start: 8 Ptr: 14 Len: 6
 Start: 3 Ptr: 9 Len: 6
 Start: 9 Ptr: 14 Len: 5
 Start: 4 Ptr: 9 Len: 5
 Start: 10 Ptr: 14 Len: 4
 Start: 5 Ptr: 9 Len: 4
 Start: 11 Ptr: 14 Len: 3
 Start: 6 Ptr: 9 Len: 3
 Start: 1 Ptr: 4 Len: 3
 Start: 12 Ptr: 14 Len: 2
 Start: 7 Ptr: 9 Len: 2
 Start: 2 Ptr: 4 Len: 2
New byte[19]: 0x20
 Start: 1 Ptr: 15 Len: 14
 Start: 2 Ptr: 15 Len: 13
 Start: 3 Ptr: 15 Len: 12
 Start: 4 Ptr: 15 Len: 11
 Start: 5 Ptr: 15 Len: 10
 Start: 6 Ptr: 15 Len: 9
 Start: 1 Ptr: 10 Len: 9
 Start: 7 Ptr: 15 Len: 8
 Start: 2 Ptr: 10 Len: 8
 Start: 8 Ptr: 15 Len: 7
 Start: 3 Ptr: 10 Len: 7
 Start: 9 Ptr: 15 Len: 6
 Start: 4 Ptr: 10 Len: 6
 Start: 10 Ptr: 15 Len: 5
 Start: 5 Ptr: 10 Len: 5
 Start: 11 Ptr: 15 Len: 4
 Start: 6 Ptr: 10 Len: 4
 Start: 1 Ptr: 5 Len: 4
 Start: 12 Ptr: 15 Len: 3
 Start: 7 Ptr: 10 Len: 3
 Start: 2 Ptr: 5 Len: 3
 Start: 13 Ptr: 15 Len: 2
 Start: 8 Ptr: 10 Len: 2
 Start: 3 Ptr: 5 Len: 2
New byte[20]: 0x62
 Start: 1 Ptr: 16 Len: 15
 Start: 2 Ptr: 16 Len: 14
 Start: 3 Ptr: 16 Len: 13
 Start: 4 Ptr: 16 Len: 12
 Start: 5 Ptr: 16 Len: 11
 Start: 6 Ptr: 16 Len: 10
 Start: 1 Ptr: 11 Len: 10
 Start: 7 Ptr: 16 Len: 9
 Start: 2 Ptr: 11 Len: 9
 Start: 8 Ptr: 16 Len: 8
 Start: 3 Ptr: 11 Len: 8
 Start: 9 Ptr: 16 Len: 7
 Start: 4 Ptr: 11 Len: 7
 Start: 10 Ptr: 16 Len: 6
 Start: 5 Ptr: 11 Len: 6
 Start: 11 Ptr: 16 Len: 5
 Start: 6 Ptr: 11 Len: 5
 Start: 1 Ptr: 6 Len: 5
 Start: 12 Ptr: 16 Len: 4
 Start: 7 Ptr: 11 Len: 4
 Start: 2 Ptr: 6 Len: 4
 Start: 13 Ptr: 16 Len: 3
 Start: 8 Ptr: 11 Len: 3
 Start: 3 Ptr: 6 Len: 3
 Start: 14 Ptr: 16 Len: 2
 Start: 9 Ptr: 11 Len: 2
 Start: 4 Ptr: 6 Len: 2
New byte[21]: 0x6c
 Start: 1 Ptr: 17 Len: 16
 Start: 2 Ptr: 17 Len: 15
 Start: 3 Ptr: 17 Len: 14
 Start: 4 Ptr: 17 Len: 13
 Start: 5 Ptr: 17 Len: 12
 Start: 6 Ptr: 17 Len: 11
 Start: 1 Ptr: 12 Len: 11
 Start: 7 Ptr: 17 Len: 10
 Start: 2 Ptr: 12 Len: 10
 Start: 8 Ptr: 17 Len: 9
 Start: 3 Ptr: 12 Len: 9
 Start: 9 Ptr: 17 Len: 8
 Start: 4 Ptr: 12 Len: 8
 Start: 10 Ptr: 17 Len: 7
 Start: 5 Ptr: 12 Len: 7
 Start: 11 Ptr: 17 Len: 6
 Start: 6 Ptr: 12 Len: 6
 Start: 1 Ptr: 7 Len: 6
 Start: 12 Ptr: 17 Len: 5
 Start: 7 Ptr: 12 Len: 5
 Start: 2 Ptr: 7 Len: 5
 Start: 13 Ptr: 17 Len: 4
 Start: 8 Ptr: 12 Len: 4
 Start: 3 Ptr: 7 Len: 4
 Start: 14 Ptr: 17 Len: 3
 Start: 9 Ptr: 12 Len: 3
 Start: 4 Ptr: 7 Len: 3
 Start: 15 Ptr: 17 Len: 2
 Start: 10 Ptr: 12 Len: 2
 Start: 5 Ptr: 7 Len: 2
New byte[22]: 0x61
 Start: 1 Ptr: 18 Len: 17
 Start: 2 Ptr: 18 Len: 16
 Start: 3 Ptr: 18 Len: 15
 Start: 4 Ptr: 18 Len: 14
 Start: 5 Ptr: 18 Len: 13
 Start: 6 Ptr: 18 Len: 12
 Start: 1 Ptr: 13 Len: 12
 Start: 7 Ptr: 18 Len: 11
 Start: 2 Ptr: 13 Len: 11
 Start: 8 Ptr: 18 Len: 10
 Start: 3 Ptr: 13 Len: 10
 Start: 9 Ptr: 18 Len: 9
 Start: 4 Ptr: 13 Len: 9
 Start: 10 Ptr: 18 Len: 8
 Start: 5 Ptr: 13 Len: 8
 Start: 11 Ptr: 18 Len: 7
 Start: 6 Ptr: 13 Len: 7
 Start: 1 Ptr: 8 Len: 7
 Start: 12 Ptr: 18 Len: 6
 Start: 7 Ptr: 13 Len: 6
 Start: 2 Ptr: 8 Len: 6
 Start: 13 Ptr: 18 Len: 5
 Start: 8 Ptr: 13 Len: 5
 Start: 3 Ptr: 8 Len: 5
 Start: 14 Ptr: 18 Len: 4
 Start: 9 Ptr: 13 Len: 4
 Start: 4 Ptr: 8 Len: 4
 Start: 15 Ptr: 18 Len: 3
 Start: 10 Ptr: 13 Len: 3
 Start: 5 Ptr: 8 Len: 3
 Start: 16 Ptr: 18 Len: 2
 Start: 11 Ptr: 13 Len: 2
 Start: 6 Ptr: 8 Len: 2
 Start: 1 Ptr: 3 Len: 2
New byte[23]: 0x68
 Start: 1 Ptr: 19 Len: 18
 Start: 2 Ptr: 19 Len: 17
 Start: 3 Ptr: 19 Len: 16
 Start: 4 Ptr: 19 Len: 15
 Start: 5 Ptr: 19 Len: 14
 Start: 6 Ptr: 19 Len: 13
 Start: 1 Ptr: 14 Len: 13
 Start: 7 Ptr: 19 Len: 12
 Start: 2 Ptr: 14 Len: 12
 Start: 8 Ptr: 19 Len: 11
 Start: 3 Ptr: 14 Len: 11
 Start: 9 Ptr: 19 Len: 10
 Start: 4 Ptr: 14 Len: 10
 Start: 10 Ptr: 19 Len: 9
 Start: 5 Ptr: 14 Len: 9
 Start: 11 Ptr: 19 Len: 8
 Start: 6 Ptr: 14 Len: 8
 Start: 1 Ptr: 9 Len: 8
 Start: 12 Ptr: 19 Len: 7
 Start: 7 Ptr: 14 Len: 7
 Start: 2 Ptr: 9 Len: 7
 Start: 13 Ptr: 19 Len: 6
 Start: 8 Ptr: 14 Len: 6
 Start: 3 Ptr: 9 Len: 6
 Start: 14 Ptr: 19 Len: 5
 Start: 9 Ptr: 14 Len: 5
 Start: 4 Ptr: 9 Len: 5
 Start: 15 Ptr: 19 Len: 4
 Start: 10 Ptr: 14 Len: 4
 Start: 5 Ptr: 9 Len: 4
 Start: 16 Ptr: 19 Len: 3
 Start: 11 Ptr: 14 Len: 3
 Start: 6 Ptr: 9 Len: 3
 Start: 1 Ptr: 4 Len: 3
 Start: 17 Ptr: 19 Len: 2
 Start: 12 Ptr: 14 Len: 2
 Start: 7 Ptr: 9 Len: 2
 Start: 2 Ptr: 4 Len: 2
New byte[24]: 0x21
New byte[25]: 0x0d
New byte[26]: 0x0a
Processing 2.494 seconds.
Source 27 bytes.
Result 27 bytes.
Ratio 0.00%

bruce_dev /> 

Now the ‘D’ in the paper refers to the distance between our CURPTR at the point when a match is started and the matched sequence position. If you persevere through the above you can verify that D=5 would be correct and that our best match ran through the length of 18.

So we now need the logic controlling the advancement of CURPTR and what then to output to the compressed stream.

So the strategy at this point is to process all active matches. We don’t move the CURPTR when there is at least one potential match still in the works. When a data byte is received that does not extend a match we remove that match from the array. We keep track of the best of the matches that terminate as that would be a candidate for using a reference pointer if none remain active.
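A quick note before the code: the match class has evolved since the version shown earlier. The listings from here on reference a curptr field recording the input position at which the match was started, and the later ones also use a distance field and a two-argument constructor (and capitalize the name to Match). I haven't reproduced every intermediate edit, but the evolved class looks roughly like this, with the field meanings inferred from how they are used below, so treat it as a sketch rather than the exact code:

    // An active match, extended from the earlier version. The curptr and distance
    //  fields are inferred from the usage in the listings that follow.
    static class Match {
        public int curptr;      // input (CURPTR) position where this match began
        public int start;       // window position where the matched sequence starts
        public int ptr;         // next window position to compare against
        public int len;         // current length of the match
        public int distance;    // back-distance from curptr to start (wrapped)
        
        Match(int inp, int pos) {
            curptr = inp;
            start = pos;
            ptr = pos + 1;
            if (ptr == WINDOW)
                ptr = 0;
            len = 1;
            distance = inp - pos;
            if (distance < 0)
                distance += WINDOW;
        }
        
        public boolean check(int ch) {
            // compare as unsigned so byte values 0x80 and above match correctly
            if ((DATA[ptr] & 0xff) != ch)
                return (false);
            
            ptr++;
            if (ptr == WINDOW)
                ptr = 0;
            len++;
            
            return (true);
        }
    }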

    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        // process uncompressed stream
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            System.out.print((char)ch);
            
            // process active match objects
            boolean bActive = false;
            match best = null;
            for (int n = SEQ.size() - 1; 0 <= n; n--) {
                match m = (match) SEQ.get(n);
                if (!m.check(ch)) {
                    if (m.curptr == CURPTR && best == null && m.len >= 3)
                        best = m;
                    SEQ.remove(n);
                }
                else if (m.curptr == CURPTR)
                    bActive = true;
            }
            

From the above bActive will be true if there remain potentially longer sequences. In that case we will not do anything with CURPTR or output anything. We just eventually move on to process the next byte from the uncompressed input stream.

If bActive is false and best remains null then the byte at CURPTR will be output and CURPTR advanced.

Otherwise we have a match to something in the sliding window and that can replace the uncompressed sequence.

            // If there is no active sequence then we need to generate some output
            if (!bActive) {
                
                // If there's been no match then we output data as is
                if (best == null) {
                    while (CURPTR != INPTR) {
                        outfile.write(DATA[CURPTR]);
                        CURPTR++;
                        if (CURPTR == WINDOW)
                            CURPTR = 0;
 
                        int n;
                        for (n = SEQ.size() - 1; 0 <= n; n--) {
                            match m = (match) SEQ.get(n);
                            if (m.curptr == CURPTR)
                                break;
                        }
                        if (0 <= n)
                            break;
                    }
                }
                
                // otherwise we can substitute
                else {
                    int distance = best.curptr - best.start;
                    if (distance < 0)
                        distance += WINDOW;
                    String msg = String.format("[D=%d, L=%d]", distance, best.len);
                    outfile.write(msg);

                    // flush active matches
                    int len = best.len;
                    while (len-- > 0) {
                        CURPTR++;
                        if (CURPTR == WINDOW)
                            CURPTR = 0;
 
                        // remove overlapped active sequences
                        for (int n = SEQ.size() - 1; 0 <= n; n--) {
                            match m = (match) SEQ.get(n);
                            if (m.curptr == CURPTR)
                                SEQ.remove(n);
                        }
                    }
                    
                }
            }  

In the above, when we are outputting data uncompressed from CURPTR we continue until we encounter an active match. When we replace a sequence we flush any active matches involving the data replaced. Those matches (which potentially could have been more beneficial) are no longer valid. Note that for debugging I am outputting the [D= L=] format so I can see more clearly what is being replaced and how.

Later, among matches of equal length, the best match would also be the one with the lowest distance. That saves bits and takes advantage of the Huffman coding which we have yet to implement.

The output using the above logic and debugging looks like this for the blah blah test data.

bruce_dev /> jtest blah.dat
Blah blah blah blah blah!
Processing 0.856 seconds.
Source 27 bytes.
Result 19 bytes.
Ratio 29.63%

bruce_dev /> cat outfile.dat
Blah b[D=5, L=18]!
bruce_dev />

This agrees nicely with the example from the paper. This works with more elaborate data as well. I know that this is a Java prototype but it seems even a bit too slow for larger files even considering the platform. I get the feeling that I am creating far too many new active match objects. Perhaps there is some logic as to what is worth creating and what isn’t.

It turns out that I am likely creating 3X the number of active match objects that are necessary; however, the overhead of detecting the redundant ones slows the process way too much. It seems better to create the unnecessary match objects and optimize them away later.

I decided to take a step back and not try to generate the compressed output stream just yet. Instead I want to look at the sequence matches that I detect to see what logic I would need to achieve an optimal compression. There is this lazy match optimization to consider.

You might recall that the approach is that when I retrieve a byte from the uncompressed input stream I process all of the active sequence match objects. If the new character extends a match then it remains active. Otherwise the sequence is complete and we decide whether it is useful or not before removing it from the active match list. A useful match is simply one that is 3 or more bytes in length, 3 being the shortest sequence that DEFLATE can encode as a pointer.

The uncompressed data byte is queued (enters the sliding window) and we create new active sequence matches for every matching byte still in the window. We use a linked list to efficiently locate those characters.

If we simply list those potentially useful matches the code looks like what follows.

    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        // process uncompressed stream
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            //System.out.print((char)ch);
            
            // process active Match objects
            for (int n = SEQ.size() - 1; 0 <= n; n--) {
                Match m = (Match) SEQ.get(n);
                if (!m.check(ch)) {
                    if (m.len >= 3) {
                        System.out.printf("I=%04x C=%04x P=%04x, D=%d, L=%d\n", 
                                INPTR, m.curptr, m.start, m.distance, m.len);
                    }
                            
                    SEQ.remove(n);
                }
            }
            
            // queue uncompressed DATA
            int inp = INPTR;
            DATA[INPTR] = (byte)ch;
            
            // Add byte to the head of the appropriate linked list. Note pointers are stored +1 so
            //  as to use 0 as an end of list marker. Lists are bi-directional so we can trim the 
            //  tail when data is dropped from the queue.
            short ptr = HEAD[ch];
            HEAD[ch] = (short)(INPTR + 1);
            FWD[INPTR] = ptr;
            BACK[INPTR] = 0;
            if (ptr != 0)
                BACK[ptr - 1] = (short)(INPTR + 1);
            
            // advance entry pointer
            INPTR++;
            if (INPTR == WINDOW)
                INPTR = 0;
            
            // drop data from queue when full
            if (INPTR == OUTPTR) {
                
                // trim linked list as byte is being dropped
                if (BACK[OUTPTR] == 0)
                    HEAD[DATA[OUTPTR]] = 0;
                else
                    FWD[BACK[OUTPTR] - 1] = 0;
 
                // push end of queue
                OUTPTR++;
                if (OUTPTR == WINDOW)
                    OUTPTR = 0;
            }
 
            // create new active matches for all CH in the queue (except last)
            while (ptr != 0) {
                SEQ.add(new Match(inp, ptr - 1));
                ptr = FWD[ptr - 1];
            }
        }
    }

Now we display the match results with pointers in hex so we can locate them in a hex dump. These are: ‘I’ giving the uncompressed input position; ‘C’ showing the input position when the match was created; ‘P’ the position in the sliding window or queue; ‘D’ the distance (basically C minus P); and ‘L’ the length of the match. We get the following for the blah blah data.


bruce_dev /> cat blah.dat -h
00000000  42 6c 61 68 20 62 6c 61  68 20 62 6c 61 68 20 62  Blah.bla h.blah.b
00000010  6c 61 68 20 62 6c 61 68  21 0d 0a                 lah.blah !..

bruce_dev /> jtest blah.dat
I=0018 C=0015 P=0001, D=20, L=3
I=0018 C=0015 P=0006, D=15, L=3
I=0018 C=0015 P=000b, D=10, L=3
I=0018 C=0015 P=0010, D=5, L=3
I=0018 C=0014 P=0005, D=15, L=4
I=0018 C=0014 P=000a, D=10, L=4
I=0018 C=0014 P=000f, D=5, L=4
I=0018 C=0013 P=0004, D=15, L=5
I=0018 C=0013 P=0009, D=10, L=5
I=0018 C=0013 P=000e, D=5, L=5
I=0018 C=0012 P=0003, D=15, L=6
I=0018 C=0012 P=0008, D=10, L=6
I=0018 C=0012 P=000d, D=5, L=6
I=0018 C=0011 P=0002, D=15, L=7
I=0018 C=0011 P=0007, D=10, L=7
I=0018 C=0011 P=000c, D=5, L=7
I=0018 C=0010 P=0001, D=15, L=8
I=0018 C=0010 P=0006, D=10, L=8
I=0018 C=0010 P=000b, D=5, L=8
I=0018 C=000f P=0005, D=10, L=9
I=0018 C=000f P=000a, D=5, L=9
I=0018 C=000e P=0004, D=10, L=10
I=0018 C=000e P=0009, D=5, L=10
I=0018 C=000d P=0003, D=10, L=11
I=0018 C=000d P=0008, D=5, L=11
I=0018 C=000c P=0002, D=10, L=12
I=0018 C=000c P=0007, D=5, L=12
I=0018 C=000b P=0001, D=10, L=13
I=0018 C=000b P=0006, D=5, L=13
I=0018 C=000a P=0005, D=5, L=14
I=0018 C=0009 P=0004, D=5, L=15
I=0018 C=0008 P=0003, D=5, L=16
I=0018 C=0007 P=0002, D=5, L=17
I=0018 C=0006 P=0001, D=5, L=18
Processing 1.063 seconds.
Source 27 bytes.
Result 0 bytes.
Ratio 100.00%

bruce_dev />

You can see from this that we are processing a lot of matches for what we know will be just one replacement. For each new byte received (at C) we create potential matches for every occurrence of that byte in the window (one or more P), each with a fixed D. The length L then advances as additional characters extend the match.

Of all these matches completed there will be one best match. That would be the longest match (largest L) and if there is a choice between multiple matches of the same length (L) we would take the one with lowest distance (D).

Now let me add logic to select the best match and only display that one. I am also going to display a marker (“----”) when we reach a point in the input stream where, after having found at least one match, there are no active matches in process. That point would be a good time to generate the compressed data based on the matching sequences we’ve found.

So that section of the code now looks like this.

    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        boolean bFound = false;
 
        // process uncompressed stream
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            //System.out.print((char)ch);
            
            // process active Match objects
            Match best = null;
            for (int n = SEQ.size() - 1; 0 <= n; n--) {
                Match m = SEQ.get(n);
                if (!m.check(ch)) {
                    if (m.len >= 3) {
                        if (best == null)
                            best = m;
                        else if (m.len > best.len)
                            best = m;
                        else if (m.len == best.len && m.distance < best.distance)
                            best = m;
                    }
                            
                    SEQ.remove(n);
                }
            }
            if (best != null) {
                System.out.printf("I=%04x C=%04x P=%04x, D=%d, L=%d\n", 
                        INPTR, best.curptr, best.start, best.distance, best.len);
                bFound = true;
            }
            
            if (bFound && SEQ.size() == 0) {
                System.out.println("----");
                bFound = false;
            }
            
            // queue uncompressed DATA

Processing the blah blah data yields the following, which we have seen is the replacement that we are hoping for.

CODE: SELECT ALL

bruce_dev /> cat blah.dat -h
00000000  42 6c 61 68 20 62 6c 61  68 20 62 6c 61 68 20 62  Blah.bla h.blah.b
00000010  6c 61 68 20 62 6c 61 68  21 0d 0a                 lah.blah !..

bruce_dev /> jtest blah.dat 
I=0018 C=0006 P=0001, D=5, L=18
----
Processing 0.819 seconds.
Source 27 bytes.
Result 0 bytes.
Ratio 100.00%

bruce_dev /> 

What about some more involved input data?

CODE: SELECT ALL

bruce_dev /> cat jniorboot.log -h
00000000  30 31 2f 30 34 2f 31 38  20 30 38 3a 31 39 3a 31  01/04/18 .08:19:1
00000010  38 2e 31 31 31 2c 20 2a  2a 20 4f 53 20 43 52 43  8.111,.* *.OS.CRC
00000020  20 64 65 74 61 69 6c 20  75 70 64 61 74 65 64 0d  .detail. updated.
00000030  0a 30 31 2f 30 34 2f 31  38 20 30 38 3a 31 39 3a  .01/04/1 8.08:19:
00000040  31 38 2e 31 35 38 2c 20  2d 2d 20 4d 6f 64 65 6c  18.158,. --.Model
00000050  20 34 31 30 20 76 31 2e  36 2e 33 20 2d 20 4a 41  .410.v1. 6.3.-.JA
00000060  4e 4f 53 20 53 65 72 69  65 73 20 34 0d 0a 30 31  NOS.Seri es.4..01
00000070  2f 30 34 2f 31 38 20 30  38 3a 31 39 3a 31 38 2e  /04/18.0 8:19:18.
00000080  31 37 38 2c 20 43 6f 70  79 72 69 67 68 74 20 28  178,.Cop yright.(
00000090  63 29 20 32 30 31 32 2d  32 30 31 38 20 49 4e 54  c).2012- 2018.INT
000000A0  45 47 20 50 72 6f 63 65  73 73 20 47 72 6f 75 70  EG.Proce ss.Group
000000B0  2c 20 49 6e 63 2e 2c 20  47 69 62 73 6f 6e 69 61  ,.Inc.,. Gibsonia
000000C0  20 50 41 20 55 53 41 0d  0a 30 31 2f 30 34 2f 31  .PA.USA. .01/04/1
000000D0  38 20 30 38 3a 31 39 3a  31 38 2e 31 39 37 2c 20  8.08:19: 18.197,.
000000E0  4a 41 4e 4f 53 20 77 72  69 74 74 65 6e 20 61 6e  JANOS.wr itten.an
000000F0  64 20 64 65 76 65 6c 6f  70 65 64 20 62 79 20 42  d.develo ped.by.B
00000100  72 75 63 65 20 43 6c 6f  75 74 69 65 72 0d 0a 30  ruce.Clo utier..0
00000110  31 2f 30 34 2f 31 38 20  30 38 3a 31 39 3a 31 38  1/04/18. 08:19:18
00000120  2e 32 31 36 2c 20 53 65  72 69 61 6c 20 4e 75 6d  .216,.Se rial.Num
00000130  62 65 72 3a 20 36 31 34  30 37 30 35 30 30 0d 0a  ber:.614 070500..
00000140  30 31 2f 30 34 2f 31 38  20 30 38 3a 31 39 3a 31  01/04/18 .08:19:1
00000150  38 2e 32 33 36 2c 20 46  69 6c 65 20 53 79 73 74  8.236,.F ile.Syst
00000160  65 6d 20 6d 6f 75 6e 74  65 64 0d 0a 30 31 2f 30  em.mount ed..01/0
00000170  34 2f 31 38 20 30 38 3a  31 39 3a 31 38 2e 32 35  4/18.08: 19:18.25
00000180  37 2c 20 52 65 67 69 73  74 72 79 20 6d 6f 75 6e  7,.Regis try.moun
00000190  74 65 64 0d 0a 30 31 2f  30 34 2f 31 38 20 30 38  ted..01/ 04/18.08
000001A0  3a 31 39 3a 31 38 2e 33  30 36 2c 20 4e 65 74 77  :19:18.3 06,.Netw
000001B0  6f 72 6b 20 49 6e 69 74  69 61 6c 69 7a 65 64 0d  ork.Init ialized.
000001C0  0a 30 31 2f 30 34 2f 31  38 20 30 38 3a 31 39 3a  .01/04/1 8.08:19:
000001D0  31 38 2e 33 32 36 2c 20  45 74 68 65 72 6e 65 74  18.326,. Ethernet
000001E0  20 41 64 64 72 65 73 73  3a 20 39 63 3a 38 64 3a  .Address :.9c:8d:
000001F0  31 61 3a 30 30 3a 30 37  3a 65 65 0d 0a 30 31 2f  1a:00:07 :ee..01/
00000200  30 34 2f 31 38 20 30 38  3a 31 39 3a 31 38 2e 34  04/18.08 :19:18.4
00000210  34 37 2c 20 53 65 6e 73  6f 72 20 50 6f 72 74 20  47,.Sens or.Port.
00000220  69 6e 69 74 69 61 6c 69  7a 65 64 0d 0a 30 31 2f  initiali zed..01/
00000230  30 34 2f 31 38 20 30 38  3a 31 39 3a 31 38 2e 35  04/18.08 :19:18.5
00000240  30 32 2c 20 49 2f 4f 20  73 65 72 76 69 63 65 73  02,.I/O. services
00000250  20 69 6e 69 74 69 61 6c  69 7a 65 64 0d 0a 30 31  .initial ized..01
00000260  2f 30 34 2f 31 38 20 30  38 3a 31 39 3a 31 38 2e  /04/18.0 8:19:18.
00000270  35 33 35 2c 20 46 54 50  20 73 65 72 76 65 72 20  535,.FTP .server.
00000280  65 6e 61 62 6c 65 64 20  66 6f 72 20 70 6f 72 74  enabled. for.port
00000290  20 32 31 0d 0a 30 31 2f  30 34 2f 31 38 20 30 38  .21..01/ 04/18.08
000002A0  3a 31 39 3a 31 38 2e 35  35 36 2c 20 50 72 6f 74  :19:18.5 56,.Prot
000002B0  6f 63 6f 6c 20 73 65 72  76 65 72 20 65 6e 61 62  ocol.ser ver.enab
000002C0  6c 65 64 20 66 6f 72 20  70 6f 72 74 20 39 32 30  led.for. port.920
000002D0  30 0d 0a 30 31 2f 30 34  2f 31 38 20 30 38 3a 31  0..01/04 /18.08:1
000002E0  39 3a 31 38 2e 35 38 36  2c 20 57 65 62 53 65 72  9:18.586 ,.WebSer
000002F0  76 65 72 20 65 6e 61 62  6c 65 64 20 66 6f 72 20  ver.enab led.for.
00000300  70 6f 72 74 20 38 30 0d  0a 30 31 2f 30 34 2f 31  port.80. .01/04/1
00000310  38 20 30 38 3a 31 39 3a  31 38 2e 36 30 38 2c 20  8.08:19: 18.608,.
00000320  54 65 6c 6e 65 74 20 73  65 72 76 65 72 20 65 6e  Telnet.s erver.en
00000330  61 62 6c 65 64 20 66 6f  72 20 70 6f 72 74 20 32  abled.fo r.port.2
00000340  33 0d 0a 30 31 2f 30 34  2f 31 38 20 30 38 3a 31  3..01/04 /18.08:1
00000350  39 3a 31 38 2e 36 33 32  2c 20 50 4f 52 3a 20 35  9:18.632 ,.POR:.5
00000360  39 32 36 0d 0a 30 31 2f  30 34 2f 31 38 20 30 38  926..01/ 04/18.08
00000370  3a 31 39 3a 31 38 2e 36  35 33 2c 20 43 75 6d 75  :19:18.6 53,.Cumu
00000380  6c 61 74 69 76 65 20 52  75 6e 74 69 6d 65 3a 20  lative.R untime:.
00000390  38 20 57 65 65 6b 73 20  35 20 44 61 79 73 20 31  8.Weeks. 5.Days.1
000003A0  20 48 6f 75 72 20 32 34  3a 33 32 2e 32 38 31 0d  .Hour.24 :32.281.
000003B0  0a 30 31 2f 30 34 2f 31  38 20 30 38 3a 31 39 3a  .01/04/1 8.08:19:
000003C0  31 38 2e 36 37 38 2c 20  42 6f 6f 74 20 43 6f 6d  18.678,. Boot.Com
000003D0  70 6c 65 74 65 64 20 5b  32 2e 33 20 73 65 63 6f  pleted.[ 2.3.seco
000003E0  6e 64 73 5d 0d 0a                                 nds]..

CODE: SELECT ALL

bruce_dev /> jtest jniorboot.log
I=0044 C=0031 P=0000, D=49, L=19
----
I=0064 C=0061 P=001a, D=71, L=3
----
I=0081 C=006c P=002f, D=61, L=21
----
I=0085 C=0082 P=0045, D=61, L=3
----
I=009b C=0098 P=0093, D=5, L=3
I=009d C=009a P=0074, D=38, L=3
----
I=00d2 C=00cf P=009a, D=53, L=3
I=00dc C=00c7 P=006c, D=91, L=21
----
I=00e6 C=00df P=005d, D=130, L=7
----
I=00f4 C=00f1 P=0020, D=209, L=3
----
I=0118 C=0115 P=009a, D=123, L=3
I=0121 C=010d P=00c7, D=70, L=20
----
I=012a C=0125 P=0063, D=194, L=5
----
I=0149 C=0146 P=009a, D=172, L=3
I=0152 C=013e P=00c7, D=119, L=20
I=0153 C=013e P=010d, D=49, L=21
----
I=0157 C=0154 P=0123, D=49, L=3
----
I=0175 C=0172 P=009a, D=216, L=3
I=017e C=0167 P=002c, D=315, L=23
I=017f C=016a P=013e, D=44, L=21
----
I=0183 C=0180 P=00dd, D=163, L=3
----
I=019e C=019b P=009a, D=257, L=3
I=01a7 C=018b P=0162, D=41, L=28
----
I=01ac C=01a9 P=0154, D=85, L=3
----
I=01b6 C=01b3 P=00b1, D=258, L=3
I=01bb C=01b8 P=0129, D=143, L=3
----
I=01ca C=01c7 P=009a, D=301, L=3
I=01d3 C=01bd P=0168, D=85, L=22
I=01d4 C=01bd P=0191, D=44, L=23
----
I=01d8 C=01d5 P=01a9, D=44, L=3
----
I=01e8 C=01e5 P=00a7, D=318, L=3
----
I=0206 C=0203 P=009a, D=361, L=3
I=020f C=01fb P=01bf, D=60, L=20
----
I=0214 C=0211 P=0180, D=145, L=3
I=0216 C=0212 P=0124, D=238, L=4
----
I=0227 C=0224 P=0129, D=251, L=3
I=0236 C=0233 P=009a, D=409, L=3
I=023f C=0221 P=01b5, D=108, L=30
----
I=0245 C=0242 P=00b0, D=402, L=3
----
I=0250 C=024d P=00a6, D=423, L=3
I=0251 C=024e P=0068, D=486, L=3
I=0258 C=0255 P=0129, D=300, L=3
I=0267 C=0264 P=009a, D=458, L=3
I=0270 C=0252 P=01b5, D=157, L=30
I=0271 C=0250 P=021f, D=49, L=33
----
I=0276 C=0273 P=0155, D=286, L=3
----
I=027d C=0278 P=0247, D=49, L=5
----
I=0288 C=0285 P=00f9, D=396, L=3
----
I=028c C=0289 P=0218, D=113, L=3
----
I=0291 C=028d P=021c, D=113, L=4
----
I=029e C=029b P=009a, D=513, L=3
I=02a7 C=0293 P=01fb, D=152, L=20
I=02a8 C=0293 P=025c, D=55, L=21
----
I=02ac C=02a9 P=01d5, D=212, L=3
I=02af C=02ab P=00a2, D=521, L=4
----
I=02b9 C=02b4 P=0247, D=109, L=5
I=02c4 C=02c1 P=00f9, D=456, L=3
I=02c8 C=02c5 P=0218, D=173, L=3
I=02cd C=02b4 P=0278, D=60, L=25
----
I=02dc C=02d9 P=009a, D=575, L=3
I=02e5 C=02cf P=013c, D=403, L=22
I=02e6 C=02d1 P=0293, D=62, L=21
----
I=02ea C=02e7 P=02a9, D=62, L=3
----
I=02f0 C=02ed P=0126, D=455, L=3
I=02f1 C=02ee P=0249, D=165, L=3
I=02fc C=02f9 P=00f9, D=512, L=3
I=0300 C=02fd P=0218, D=229, L=3
I=0305 C=02ee P=02b6, D=56, L=23
----
I=0312 C=030f P=009a, D=629, L=3
I=031b C=0306 P=02d0, D=54, L=21
----
I=0320 C=031d P=0082, D=667, L=3
----
I=0327 C=0323 P=01dd, D=326, L=4
I=032b C=0326 P=0247, D=223, L=5
I=0336 C=0333 P=00f9, D=570, L=3
I=033a C=0337 P=0218, D=287, L=3
I=033f C=0326 P=02b4, D=114, L=25
I=0340 C=0326 P=0278, D=174, L=26
----
I=034c C=0349 P=009a, D=687, L=3
I=0355 C=0341 P=02d1, D=112, L=20
I=0356 C=0341 P=0307, D=58, L=21
----
I=035a C=0357 P=0241, D=278, L=3
I=035b C=0358 P=02aa, D=174, L=3
----
I=036e C=036b P=009a, D=721, L=3
I=0377 C=0363 P=02d1, D=146, L=20
I=0378 C=0363 P=0341, D=34, L=21
----
I=037d C=037a P=0083, D=759, L=3
----
I=038b C=0388 P=018e, D=506, L=3
----
I=0394 C=0391 P=02e9, D=168, L=3
----
I=03ba C=03b7 P=009a, D=797, L=3
I=03c3 C=03ae P=0292, D=284, L=21
I=03c4 C=03af P=0363, D=76, L=21
----
I=03c8 C=03c4 P=0081, D=835, L=4
----
I=03cf C=03cc P=0084, D=840, L=3
----
I=03d6 C=03d3 P=0190, D=579, L=3
I=03d7 C=03d4 P=0333, D=161, L=3
----
I=03dc C=03d9 P=0059, D=896, L=3
I=03de C=03db P=0326, D=181, L=3
----
Processing 77.371 seconds.
Source 998 bytes.
Result 0 bytes.
Ratio 100.00%

bruce_dev /> 

Don’t let the execution times scare you. Remember we are just bread-boarding this in Java and this is running on the JNIOR after all.

So at those points where we would generate compressed output when there is only one match our task is obvious. But what about when there is more than one?

Look closely: these matches overlap! That means that if we had acted on the first one we may have missed the opportunity to employ one that follows, which at times would be a serious improvement. This is the benefit of performing the lazy matches.

There are some cases where there are several usable sequences for a block. We need logic now to select one or more of those sequences so as to end up with the absolute minimum length of generated compressed data. It is not simply a matter of taking the single longest, as there could be two or more that do not overlap and together produce the optimum outcome. This requires a little examination…

For example in this case:

I=009b C=0098 P=0093, D=5, L=3
I=009d C=009a P=0074, D=38, L=3
----

If we replace 3 bytes at position 0x98 with [D=5, L=3] that will conflict with the data for 0x9a and that second match would be unusable. But in this case, since both matches are L=3, we really have no choice but to use only one. Here we need to select the one with the shorter distance (D=5) as that will likely benefit the most from the Huffman coding yet to come.

The following case is a little more interesting:

I=00d2 C=00cf P=009a, D=53, L=3
I=00dc C=00c7 P=006c, D=91, L=21
----

Here we see that we can replace 21 bytes at 0xc7 with [D=91, L=21] and, since this sequence completely contains the other, it isn’t needed at all. In this case going with the longest match happens to provide the best compression. But that cannot always be the rule. Here we need to be careful that our algorithm doesn’t just blindly go for the first replacement.

A little further into the file we have this case:

I=0149 C=0146 P=009a, D=172, L=3
I=0152 C=013e P=00c7, D=119, L=20
I=0153 C=013e P=010d, D=49, L=21
----

Here the longest is again the most beneficial as it completely overlaps the other two.

How about this one?

I=0175 C=0172 P=009a, D=216, L=3
I=017e C=0167 P=002c, D=315, L=23
I=017f C=016a P=013e, D=44, L=21
----

The 2nd and 3rd matches both eliminate the usefulness of the 1st. The 2nd replaces 23 bytes from addresses 0x167 thru 0x17d inclusive. The 3rd replaces 21 bytes from 0x16a thru 0x17e inclusive. There is not a complete overlap but since we still have to choose one over the other the longer one is the most beneficial.

You can see how we might benefit from some careful implementation here. We do have some flexibility to partially implement one or both of the sequences.

Look at this case that occurs further into the file:

I=0250 C=024d P=00a6, D=423, L=3
I=0251 C=024e P=0068, D=486, L=3
I=0258 C=0255 P=0129, D=300, L=3
I=0267 C=0264 P=009a, D=458, L=3
I=0270 C=0252 P=01b5, D=157, L=30
I=0271 C=0250 P=021f, D=49, L=33
----

We should elect to use the first to replace bytes from 0x24d thru 0x24f inclusive with [D=423, L=3]. We then should use the 6th to replace bytes from 0x250 thru 0x270 inclusive with [D=49, L=33]. Thus we replace a total of 36 bytes with two references.

Okay, time to create some logic.

To help visualize I added a little plotting. I found this case as an example of partially applying a match.

I=0327 C=0323 P=01dd, D=326, L=4
I=032b C=0326 P=0247, D=223, L=5
I=0336 C=0333 P=00f9, D=570, L=3
I=033a C=0337 P=0218, D=287, L=3
I=033f C=0326 P=02b4, D=114, L=25
I=0340 C=0326 P=0278, D=174, L=26
0323 - 0340
|--|                          
   |---|                      
                |-|           
                    |-|       
   |-----------------------|  
   |------------------------| 

We can see from this that we can replace the entire range from 0x323 to 0x340 using two sequences.

One option is to truncate the 1st using only 3 of its 4 bytes along with the 6th sequence in its entirety. Remember that the minimum sequence is 3 bytes so we can do this. The replacement being [D=326, L=3][D=174, L=26].

The other option, which we would have to use if the 1st were only 3 bytes, is to use the 1st match and then skip the first byte of the 6th. The replacement then being [D=326, L=4][D=173, L=25].

Is one preferential over the other? I am not sure. This might come down to how the logic is implemented. This is fun though.

Here’s the jniorboot.log run with the sequences plotted if you are interested.

CODE: SELECT ALL

bruce_dev /> cat jniorboot.log -h 
00000000  30 31 2f 30 34 2f 31 38  20 30 38 3a 31 39 3a 31  01/04/18 .08:19:1
00000010  38 2e 31 31 31 2c 20 2a  2a 20 4f 53 20 43 52 43  8.111,.* *.OS.CRC
00000020  20 64 65 74 61 69 6c 20  75 70 64 61 74 65 64 0d  .detail. updated.
00000030  0a 30 31 2f 30 34 2f 31  38 20 30 38 3a 31 39 3a  .01/04/1 8.08:19:
00000040  31 38 2e 31 35 38 2c 20  2d 2d 20 4d 6f 64 65 6c  18.158,. --.Model
00000050  20 34 31 30 20 76 31 2e  36 2e 33 20 2d 20 4a 41  .410.v1. 6.3.-.JA
00000060  4e 4f 53 20 53 65 72 69  65 73 20 34 0d 0a 30 31  NOS.Seri es.4..01
00000070  2f 30 34 2f 31 38 20 30  38 3a 31 39 3a 31 38 2e  /04/18.0 8:19:18.
00000080  31 37 38 2c 20 43 6f 70  79 72 69 67 68 74 20 28  178,.Cop yright.(
00000090  63 29 20 32 30 31 32 2d  32 30 31 38 20 49 4e 54  c).2012- 2018.INT
000000A0  45 47 20 50 72 6f 63 65  73 73 20 47 72 6f 75 70  EG.Proce ss.Group
000000B0  2c 20 49 6e 63 2e 2c 20  47 69 62 73 6f 6e 69 61  ,.Inc.,. Gibsonia
000000C0  20 50 41 20 55 53 41 0d  0a 30 31 2f 30 34 2f 31  .PA.USA. .01/04/1
000000D0  38 20 30 38 3a 31 39 3a  31 38 2e 31 39 37 2c 20  8.08:19: 18.197,.
000000E0  4a 41 4e 4f 53 20 77 72  69 74 74 65 6e 20 61 6e  JANOS.wr itten.an
000000F0  64 20 64 65 76 65 6c 6f  70 65 64 20 62 79 20 42  d.develo ped.by.B
00000100  72 75 63 65 20 43 6c 6f  75 74 69 65 72 0d 0a 30  ruce.Clo utier..0
00000110  31 2f 30 34 2f 31 38 20  30 38 3a 31 39 3a 31 38  1/04/18. 08:19:18
00000120  2e 32 31 36 2c 20 53 65  72 69 61 6c 20 4e 75 6d  .216,.Se rial.Num
00000130  62 65 72 3a 20 36 31 34  30 37 30 35 30 30 0d 0a  ber:.614 070500..
00000140  30 31 2f 30 34 2f 31 38  20 30 38 3a 31 39 3a 31  01/04/18 .08:19:1
00000150  38 2e 32 33 36 2c 20 46  69 6c 65 20 53 79 73 74  8.236,.F ile.Syst
00000160  65 6d 20 6d 6f 75 6e 74  65 64 0d 0a 30 31 2f 30  em.mount ed..01/0
00000170  34 2f 31 38 20 30 38 3a  31 39 3a 31 38 2e 32 35  4/18.08: 19:18.25
00000180  37 2c 20 52 65 67 69 73  74 72 79 20 6d 6f 75 6e  7,.Regis try.moun
00000190  74 65 64 0d 0a 30 31 2f  30 34 2f 31 38 20 30 38  ted..01/ 04/18.08
000001A0  3a 31 39 3a 31 38 2e 33  30 36 2c 20 4e 65 74 77  :19:18.3 06,.Netw
000001B0  6f 72 6b 20 49 6e 69 74  69 61 6c 69 7a 65 64 0d  ork.Init ialized.
000001C0  0a 30 31 2f 30 34 2f 31  38 20 30 38 3a 31 39 3a  .01/04/1 8.08:19:
000001D0  31 38 2e 33 32 36 2c 20  45 74 68 65 72 6e 65 74  18.326,. Ethernet
000001E0  20 41 64 64 72 65 73 73  3a 20 39 63 3a 38 64 3a  .Address :.9c:8d:
000001F0  31 61 3a 30 30 3a 30 37  3a 65 65 0d 0a 30 31 2f  1a:00:07 :ee..01/
00000200  30 34 2f 31 38 20 30 38  3a 31 39 3a 31 38 2e 34  04/18.08 :19:18.4
00000210  34 37 2c 20 53 65 6e 73  6f 72 20 50 6f 72 74 20  47,.Sens or.Port.
00000220  69 6e 69 74 69 61 6c 69  7a 65 64 0d 0a 30 31 2f  initiali zed..01/
00000230  30 34 2f 31 38 20 30 38  3a 31 39 3a 31 38 2e 35  04/18.08 :19:18.5
00000240  30 32 2c 20 49 2f 4f 20  73 65 72 76 69 63 65 73  02,.I/O. services
00000250  20 69 6e 69 74 69 61 6c  69 7a 65 64 0d 0a 30 31  .initial ized..01
00000260  2f 30 34 2f 31 38 20 30  38 3a 31 39 3a 31 38 2e  /04/18.0 8:19:18.
00000270  35 33 35 2c 20 46 54 50  20 73 65 72 76 65 72 20  535,.FTP .server.
00000280  65 6e 61 62 6c 65 64 20  66 6f 72 20 70 6f 72 74  enabled. for.port
00000290  20 32 31 0d 0a 30 31 2f  30 34 2f 31 38 20 30 38  .21..01/ 04/18.08
000002A0  3a 31 39 3a 31 38 2e 35  35 36 2c 20 50 72 6f 74  :19:18.5 56,.Prot
000002B0  6f 63 6f 6c 20 73 65 72  76 65 72 20 65 6e 61 62  ocol.ser ver.enab
000002C0  6c 65 64 20 66 6f 72 20  70 6f 72 74 20 39 32 30  led.for. port.920
000002D0  30 0d 0a 30 31 2f 30 34  2f 31 38 20 30 38 3a 31  0..01/04 /18.08:1
000002E0  39 3a 31 38 2e 35 38 36  2c 20 57 65 62 53 65 72  9:18.586 ,.WebSer
000002F0  76 65 72 20 65 6e 61 62  6c 65 64 20 66 6f 72 20  ver.enab led.for.
00000300  70 6f 72 74 20 38 30 0d  0a 30 31 2f 30 34 2f 31  port.80. .01/04/1
00000310  38 20 30 38 3a 31 39 3a  31 38 2e 36 30 38 2c 20  8.08:19: 18.608,.
00000320  54 65 6c 6e 65 74 20 73  65 72 76 65 72 20 65 6e  Telnet.s erver.en
00000330  61 62 6c 65 64 20 66 6f  72 20 70 6f 72 74 20 32  abled.fo r.port.2
00000340  33 0d 0a 30 31 2f 30 34  2f 31 38 20 30 38 3a 31  3..01/04 /18.08:1
00000350  39 3a 31 38 2e 36 33 32  2c 20 50 4f 52 3a 20 35  9:18.632 ,.POR:.5
00000360  39 32 36 0d 0a 30 31 2f  30 34 2f 31 38 20 30 38  926..01/ 04/18.08
00000370  3a 31 39 3a 31 38 2e 36  35 33 2c 20 43 75 6d 75  :19:18.6 53,.Cumu
00000380  6c 61 74 69 76 65 20 52  75 6e 74 69 6d 65 3a 20  lative.R untime:.
00000390  38 20 57 65 65 6b 73 20  35 20 44 61 79 73 20 31  8.Weeks. 5.Days.1
000003A0  20 48 6f 75 72 20 32 34  3a 33 32 2e 32 38 31 0d  .Hour.24 :32.281.
000003B0  0a 30 31 2f 30 34 2f 31  38 20 30 38 3a 31 39 3a  .01/04/1 8.08:19:
000003C0  31 38 2e 36 37 38 2c 20  42 6f 6f 74 20 43 6f 6d  18.678,. Boot.Com
000003D0  70 6c 65 74 65 64 20 5b  32 2e 33 20 73 65 63 6f  pleted.[ 2.3.seco
000003E0  6e 64 73 5d 0d 0a                                 nds]..

bruce_dev />

CODE: SELECT ALL

bruce_dev /> jtest jniorboot.log
I=0044 C=0031 P=0000, D=49, L=19
0031 - 0044
|-----------------| 

I=0064 C=0061 P=001a, D=71, L=3
0061 - 0064
|-| 

I=0081 C=006c P=002f, D=61, L=21
006c - 0081
|-------------------| 

I=0086 C=0082 P=0045, D=61, L=3
0082 - 0085
|-| 

I=009d C=0098 P=0093, D=5, L=3
I=009d C=009a P=0074, D=38, L=3
0098 - 009d
|-|   
  |-| 

I=00dd C=00cf P=009a, D=53, L=3
I=00dd C=00c7 P=006c, D=91, L=21
00c7 - 00dc
        |-|           
|-------------------| 

I=00e6 C=00df P=005d, D=130, L=7
00df - 00e6
|-----| 

I=00f4 C=00f1 P=0020, D=209, L=3
00f1 - 00f4
|-| 

I=0121 C=0115 P=009a, D=123, L=3
I=0121 C=010d P=00c7, D=70, L=20
010d - 0121
        |-|          
|------------------| 

I=012b C=0125 P=0063, D=194, L=5
0125 - 012a
|---| 

I=0153 C=0146 P=009a, D=172, L=3
I=0153 C=013e P=00c7, D=119, L=20
I=0153 C=013e P=010d, D=49, L=21
013e - 0153
        |-|           
|------------------|  
|-------------------| 

I=0157 C=0154 P=0123, D=49, L=3
0154 - 0157
|-| 

I=017f C=0172 P=009a, D=216, L=3
I=017f C=0167 P=002c, D=315, L=23
I=017f C=016a P=013e, D=44, L=21
0167 - 017f
           |-|           
|---------------------|  
   |-------------------| 

I=0183 C=0180 P=00dd, D=163, L=3
0180 - 0183
|-| 

I=01a8 C=019b P=009a, D=257, L=3
I=01a8 C=018b P=0162, D=41, L=28
018b - 01a7
                |-|          
|--------------------------| 

I=01ad C=01a9 P=0154, D=85, L=3
01a9 - 01ac
|-| 

I=01bb C=01b3 P=00b1, D=258, L=3
I=01bb C=01b8 P=0129, D=143, L=3
01b3 - 01bb
|-|      
     |-| 

I=01d4 C=01c7 P=009a, D=301, L=3
I=01d4 C=01bd P=0168, D=85, L=22
I=01d4 C=01bd P=0191, D=44, L=23
01bd - 01d4
          |-|           
|--------------------|  
|---------------------| 

I=01d8 C=01d5 P=01a9, D=44, L=3
01d5 - 01d8
|-| 

I=01e8 C=01e5 P=00a7, D=318, L=3
01e5 - 01e8
|-| 

I=020f C=0203 P=009a, D=361, L=3
I=020f C=01fb P=01bf, D=60, L=20
01fb - 020f
        |-|          
|------------------| 

I=0217 C=0211 P=0180, D=145, L=3
I=0217 C=0212 P=0124, D=238, L=4
0211 - 0216
|-|   
 |--| 

I=023f C=0224 P=0129, D=251, L=3
I=023f C=0233 P=009a, D=409, L=3
I=023f C=0221 P=01b5, D=108, L=30
0221 - 023f
   |-|                         
                  |-|          
|----------------------------| 

I=0245 C=0242 P=00b0, D=402, L=3
0242 - 0245
|-| 

I=0271 C=024d P=00a6, D=423, L=3
I=0271 C=024e P=0068, D=486, L=3
I=0271 C=0255 P=0129, D=300, L=3
I=0271 C=0264 P=009a, D=458, L=3
I=0271 C=0252 P=01b5, D=157, L=30
I=0271 C=0250 P=021f, D=49, L=33
024d - 0271
|-|                                  
 |-|                                 
        |-|                          
                       |-|           
     |----------------------------|  
   |-------------------------------| 

I=0276 C=0273 P=0155, D=286, L=3
0273 - 0276
|-| 

I=0280 C=0278 P=0247, D=49, L=5
0278 - 027d
|---| 

I=0288 C=0285 P=00f9, D=396, L=3
0285 - 0288
|-| 

I=028c C=0289 P=0218, D=113, L=3
0289 - 028c
|-| 

I=0293 C=028d P=021c, D=113, L=4
028d - 0291
|--| 

I=02a8 C=029b P=009a, D=513, L=3
I=02a8 C=0293 P=01fb, D=152, L=20
I=02a8 C=0293 P=025c, D=55, L=21
0293 - 02a8
        |-|           
|------------------|  
|-------------------| 

I=02af C=02a9 P=01d5, D=212, L=3
I=02af C=02ab P=00a2, D=521, L=4
02a9 - 02af
|-|    
  |--| 

I=02ce C=02b4 P=0247, D=109, L=5
I=02ce C=02c1 P=00f9, D=456, L=3
I=02ce C=02c5 P=0218, D=173, L=3
I=02ce C=02b4 P=0278, D=60, L=25
02b4 - 02cd
|---|                     
             |-|          
                 |-|      
|-----------------------| 

I=02e7 C=02d9 P=009a, D=575, L=3
I=02e7 C=02cf P=013c, D=403, L=22
I=02e7 C=02d1 P=0293, D=62, L=21
02cf - 02e6
          |-|           
|--------------------|  
  |-------------------| 

I=02ea C=02e7 P=02a9, D=62, L=3
02e7 - 02ea
|-| 

I=0305 C=02ed P=0126, D=455, L=3
I=0305 C=02ee P=0249, D=165, L=3
I=0305 C=02f9 P=00f9, D=512, L=3
I=0305 C=02fd P=0218, D=229, L=3
I=0305 C=02ee P=02b6, D=56, L=23
02ed - 0305
|-|                      
 |-|                     
            |-|          
                |-|      
 |---------------------| 

I=031c C=030f P=009a, D=629, L=3
I=031c C=0306 P=02d0, D=54, L=21
0306 - 031b
         |-|          
|-------------------| 

I=0320 C=031d P=0082, D=667, L=3
031d - 0320
|-| 

I=0341 C=0323 P=01dd, D=326, L=4
I=0341 C=0326 P=0247, D=223, L=5
I=0341 C=0333 P=00f9, D=570, L=3
I=0341 C=0337 P=0218, D=287, L=3
I=0341 C=0326 P=02b4, D=114, L=25
I=0341 C=0326 P=0278, D=174, L=26
0323 - 0340
|--|                          
   |---|                      
                |-|           
                    |-|       
   |-----------------------|  
   |------------------------| 

I=0356 C=0349 P=009a, D=687, L=3
I=0356 C=0341 P=02d1, D=112, L=20
I=0356 C=0341 P=0307, D=58, L=21
0341 - 0356
        |-|           
|------------------|  
|-------------------| 

I=035b C=0357 P=0241, D=278, L=3
I=035b C=0358 P=02aa, D=174, L=3
0357 - 035b
|-|  
 |-| 

I=0378 C=036b P=009a, D=721, L=3
I=0378 C=0363 P=02d1, D=146, L=20
I=0378 C=0363 P=0341, D=34, L=21
0363 - 0378
        |-|           
|------------------|  
|-------------------| 

I=037d C=037a P=0083, D=759, L=3
037a - 037d
|-| 

I=038c C=0388 P=018e, D=506, L=3
0388 - 038b
|-| 

I=0395 C=0391 P=02e9, D=168, L=3
0391 - 0394
|-| 

I=03c4 C=03b7 P=009a, D=797, L=3
I=03c4 C=03ae P=0292, D=284, L=21
I=03c4 C=03af P=0363, D=76, L=21
03ae - 03c4
         |-|           
|-------------------|  
 |-------------------| 

I=03c9 C=03c4 P=0081, D=835, L=4
03c4 - 03c8
|--| 

I=03cf C=03cc P=0084, D=840, L=3
03cc - 03cf
|-| 

I=03d7 C=03d3 P=0190, D=579, L=3
I=03d7 C=03d4 P=0333, D=161, L=3
03d3 - 03d7
|-|  
 |-| 

I=03de C=03d9 P=0059, D=896, L=3
I=03de C=03db P=0326, D=181, L=3
03d9 - 03de
|-|   
  |-| 

Processing 80.766 seconds.
Source 998 bytes.
Result 0 bytes.
Ratio 100.00%

bruce_dev />

I think the first step is to filter out sequences that are completely covered by another. That seems to happen a lot.

I have filtered matched sequences that are eclipsed by another in the block. Since we know what to do when only one sequence remains (just use it) I filtered those from the output. So we are left with overlapping situations that remain to be studied.

CODE: SELECT ALL

bruce_dev /> jtest jniorboot.log
I=009d C=0098 P=0093, D=5, L=3
I=009d C=009a P=0074, D=38, L=3
0098 - 009d
|-|   
  |-| 

I=017f C=0167 P=002c, D=315, L=23
I=017f C=016a P=013e, D=44, L=21
0167 - 017f
|---------------------|  
   |-------------------| 

I=01bb C=01b3 P=00b1, D=258, L=3
I=01bb C=01b8 P=0129, D=143, L=3
01b3 - 01bb
|-|      
     |-| 

I=0217 C=0211 P=0180, D=145, L=3
I=0217 C=0212 P=0124, D=238, L=4
0211 - 0216
|-|   
 |--| 

I=0271 C=024d P=00a6, D=423, L=3
I=0271 C=024e P=0068, D=486, L=3
I=0271 C=0250 P=021f, D=49, L=33
024d - 0271
|-|                                  
 |-|                                 
   |-------------------------------| 

I=02af C=02a9 P=01d5, D=212, L=3
I=02af C=02ab P=00a2, D=521, L=4
02a9 - 02af
|-|    
  |--| 

I=02e7 C=02cf P=013c, D=403, L=22
I=02e7 C=02d1 P=0293, D=62, L=21
02cf - 02e6
|--------------------|  
  |-------------------| 

I=0305 C=02ed P=0126, D=455, L=3
I=0305 C=02ee P=02b6, D=56, L=23
02ed - 0305
|-|                      
 |---------------------| 

I=0341 C=0323 P=01dd, D=326, L=4
I=0341 C=0326 P=0278, D=174, L=26
0323 - 0340
|--|                          
   |------------------------| 

I=035b C=0357 P=0241, D=278, L=3
I=035b C=0358 P=02aa, D=174, L=3
0357 - 035b
|-|  
 |-| 

I=03c4 C=03ae P=0292, D=284, L=21
I=03c4 C=03af P=0363, D=76, L=21
03ae - 03c4
|-------------------|  
 |-------------------| 

I=03d7 C=03d3 P=0190, D=579, L=3
I=03d7 C=03d4 P=0333, D=161, L=3
03d3 - 03d7
|-|  
 |-| 

I=03de C=03d9 P=0059, D=896, L=3
I=03de C=03db P=0326, D=181, L=3
03d9 - 03de
|-|   
  |-| 

Processing 78.368 seconds.
Source 998 bytes.
Result 0 bytes.
Ratio 100.00%

bruce_dev />

There is one block where the remaining matches are already mutually exclusive. It is also obvious what to do in that case, but I would still need to identify it. Maybe the goal is to reduce the sequence set to one and, if you cannot, then to get to a set of mutually exclusive matches.

Added a step to force mutual exclusivity. The logic now appears as follows:

    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        boolean bFound = false;
 
        // process uncompressed stream
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            //System.out.print((char)ch);
            
            // process active Match objects
            Match best = null;
            for (int n = SEQ.size() - 1; 0 <= n; n--) {
                Match m = SEQ.get(n);
                if (!m.check(ch)) {
                    if (m.len >= 3) {
                        if (best == null)
                            best = m;
                        else if (m.len > best.len)
                            best = m;
                        else if (m.len == best.len && m.distance < best.distance)
                            best = m;
                    }
                            
                    SEQ.remove(n);
                }
            }
            if (best != null) {
                REPL.add(best);
                bFound = true;
            }
            
            if (bFound && SEQ.size() == 0) {
                
                // filter out sequences eclipsed by another
                for (int n = REPL.size() - 1; 0 <= n; n--) {
                    Match mn = REPL.get(n);
                    
                    int k;
                    for (k = REPL.size() - 1; 0 <= k; k--) {
                        if (k == n)
                            continue;
                        Match mk = REPL.get(k);
                        if (mn.curptr >= mk.curptr && mn.curptr + mn.len <= mk.curptr + mk.len)
                            break;
                    }
                    if (0 <= k)
                        REPL.remove(n);
                }
                
                // Force mutual exclusivity. Note that REPL at this point holds the matched
                //  sequences in increasing CURPTR order.
                for (int n = 0; n < REPL.size() - 1; n++) {
                    Match n1 = REPL.get(n);
                    Match n2 = REPL.get(n + 1);
                    if (n2.curptr < n1.curptr + n1.len) {
                        int adj = n1.curptr + n1.len - n2.curptr;
                        if (n2.len - adj < 3)
                            REPL.remove(n2);
                        else {
                            n2.curptr += adj;
                            if (n2.curptr >= WINDOW)
                                n2.curptr -= WINDOW;
                            n2.ptr += adj;
                            if (n2.ptr >= WINDOW)
                                n2.ptr -= WINDOW;
                            n2.distance -= adj;
                            n2.len -= adj;
                        }
                    }
                }
                
                // $$$$$$$$$$ - temporary only display when there are still choices
                if (REPL.size() > 1) {
 
                    // determine the overall affected range
                    int start = 0;
                    int end = 0;
                    for (int n = 0; n < REPL.size(); n++) {
                        Match m = REPL.get(n);
                        System.out.printf("I=%04x C=%04x P=%04x, D=%d, L=%d\n", 
                                INPTR, m.curptr, m.start, m.distance, m.len);
 
                        if (n == 0 || m.curptr < start)
                            start = m.curptr;
                        if (n == 0 || m.curptr + m.len > end)
                            end = m.curptr + m.len;
                    }
 
                    // plot
                    System.out.printf("%04x - %04x\n", start, end);
 
                    for (int n = 0; n < REPL.size(); n++) {
                        Match m = REPL.get(n);
                        for (int i = start; i <= end; i++) {
                            if (i < m.curptr || i >= m.curptr + m.len)
                                System.out.print(" ");
                            else if (i == m.curptr || i == m.curptr + m.len - 1)
                                System.out.print("|");
                            else
                                System.out.print("-");
                        }
                        System.out.println("");
                    }
                    System.out.println("");
                }
 
                REPL.clear();
                bFound = false;
            }
            
            // queue uncompressed DATA

As matches are located we select the best and add them to a list. Then later, when we reach that point where there are no more active matches and we can generate compressed output, we filter out those eclipsed sequences. Next we force the remaining sequences to be mutually exclusive.

Since these sequences appear in the REPL list in increasing CURPTR order (at least they appear to be) we take pairs of sequences and shift the starting point of the next one so it no longer overlaps. If this shrinks the match to less than 3 bytes it is removed.

After that there is code to display and plot the remaining sequences if there are more than one. Here we see that in every case we have created a mutually exclusive set.

CODE: SELECT ALL

bruce_dev /> jtest jniorboot.log
I=01bb C=01b3 P=00b1, D=258, L=3
I=01bb C=01b8 P=0129, D=143, L=3
01b3 - 01bb
|-|      
     |-| 

I=0271 C=024d P=00a6, D=423, L=3
I=0271 C=0250 P=021f, D=49, L=33
024d - 0271
|-|                                  
   |-------------------------------| 

I=02af C=02a9 P=01d5, D=212, L=3
I=02af C=02ac P=00a2, D=520, L=3
02a9 - 02af
|-|    
   |-| 

I=0305 C=02ed P=0126, D=455, L=3
I=0305 C=02f0 P=02b6, D=54, L=21
02ed - 0305
|-|                      
   |-------------------| 

I=0341 C=0323 P=01dd, D=326, L=4
I=0341 C=0327 P=0278, D=173, L=25
0323 - 0340
|--|                          
    |-----------------------| 

Processing 77.415 seconds.
Source 998 bytes.
Result 0 bytes.
Ratio 100.00%

bruce_dev /> 

So it would appear now that I have what I need to generate the compressed output stream.

Generating the compressed output using the [D=,L=] format just to make things visible actually enlarges the file of course. But here it is (some line breaks at the right margin manually inserted).

bruce_dev /> cat outfile.dat
01/04/18 08:19:18.111, ** OS CRC detail updated
[D=49,L=19]58, -- Model 410 v1.6.3 - JAN[D=71,L=3]Series 4[D=61,L=21]7[D=61,L=3]Copyright (c) 2012-[D=5,L=3]8
 INTEG Process Group, Inc., Gibsonia PA USA[D=91,L=21]97,[D=130,L=7]written and[D=209,L=3]veloped by Bruce Cl
outier[D=70,L=20]216,[D=194,L=5]al Number: 614070500[D=49,L=21]3[D=49,L=3]File System moun[D=315,L=23]25[D=16
3,L=3]Registry[D=41,L=28]30[D=85,L=3]Network[D=258,L=3]it[D=143,L=3]iz[D=44,L=23]2[D=44,L=3]Ethernet Addr[D=3
18,L=3]: 9c:8d:1a:00:07:ee[D=60,L=20]44[D=145,L=3]Sensor Port i[D=108,L=30]502[D=402,L=3]/O servi[D=423,L=3][
D=49,L=33]35[D=286,L=3]TP[D=49,L=5]er enabl[D=396,L=3]f[D=113,L=3]p[D=113,L=4]21[D=55,L=21]5[D=212,L=3][D=520
,L=3]tocol[D=60,L=25]92[D=403,L=22]58[D=62,L=3]Web[D=455,L=3][D=54,L=21]8[D=54,L=21]60[D=667,L=3]Tel[D=326,L=
4][D=173,L=25]3[D=58,L=21]3[D=278,L=3]POR: 5926[D=34,L=21]53[D=759,L=3]umulative R[D=506,L=3]ime: 8[D=168,L=3
]eks 5 Days 1 Hour 24:32.28[D=284,L=21]6[D=835,L=4]Boot[D=840,L=3]mple[D=579,L=3] [2[D=896,L=3]seconds]
bruce_dev />

So it appears to be time to convert this into a bit stream with the proper length and distance codes in preparation for Huffman coding.

Sticking in a couple of placeholder bytes for the length and distance codes this is representative of the pre-Huffman coding compression ratio of this file.

bruce_dev /> jtest jniorboot.log
Processing 79.124 seconds.
Source 998 bytes.
Result 533 bytes.
Ratio 46.59%

bruce_dev />

Oh it’ll be fast in C and in the JANOS kernel. Okay… Huffman.

So before getting too deep into generating the Huffman coding with dynamic tables I figured that it would make sense to write a quick decompressor for my interim LZ77 compression as a check. I modified the compressor to output length and distance codes as shorts using a 0xff prefix which I then escaped. This stream I will later be able to digest in performing the Huffman coding. The decompressor would take outfile.dat and generate the decompressed newfile.dat.
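
A rough sketch of such a decompressor follows, just to illustrate the interim format. It assumes what the outfile.dat dump further down shows: a 0xff marker followed by the length and then the distance, each as a 16-bit big-endian value, with everything else passed through as raw bytes. How a literal 0xff data byte gets escaped is not visible in these dumps, so the doubled-0xff case below is purely an assumption (the attached JTest2.java is presumably the actual code).

    import java.nio.file.*;
    import java.util.Arrays;

    // Sketch of a decompressor for the interim escaped format (format assumptions noted above)
    public class JUnpack {
        public static void main(String[] args) throws Exception {
            byte[] in = Files.readAllBytes(Paths.get("outfile.dat"));
            byte[] out = new byte[64 * 1024];           // ample for these small test files
            int n = 0;                                  // bytes produced so far

            int i = 0;
            while (i < in.length) {
                int b = in[i++] & 0xff;

                // ordinary literal byte
                if (b != 0xff) {
                    out[n++] = (byte) b;
                    continue;
                }

                // assumed escape for a literal 0xff byte (not confirmed by the dumps)
                if ((in[i] & 0xff) == 0xff) {
                    out[n++] = (byte) 0xff;
                    i++;
                    continue;
                }

                // length then distance stored as big-endian shorts after the 0xff marker
                int len = ((in[i] & 0xff) << 8) | (in[i + 1] & 0xff);
                int dist = ((in[i + 2] & 0xff) << 8) | (in[i + 3] & 0xff);
                i += 4;

                // copy from dist bytes back in the output; byte-by-byte so an
                // overlapping copy (dist < len) correctly repeats the data
                for (int k = 0; k < len; k++, n++)
                    out[n] = out[n - dist];
            }

            Files.write(Paths.get("newfile.dat"), Arrays.copyOf(out, n));
        }
    }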

Well, after compressing and then decompressing, the content of newfile.dat resembled jniorboot.log very closely but there were a few variances. First, I found the glitch in the step that eliminates any overlap in matched sequences (I shouldn’t have modified the distance when also modifying the CURPTR). Then I had to address the boundary conditions at the end of the file in order to properly process the entire file (I ended up a couple of bytes short initially). With that we achieved success.

You can see here how we can use MANIFEST to verify file size and content. Note that the MD5 hashes are identical.

bruce_dev /> jtest jniorboot.log
Processing 69.323 seconds.
Source 950 bytes.
Result 671 bytes.
Ratio 1.42:1

bruce_dev /> jtest2

bruce_dev /> manifest jniorboot.log
JNIOR Manifest      Fri Jan 05 11:02:57 EST 2018
  Size                  MD5                  File Specification
 950      dc425a0283e22944b463eeab9e625adb  [Modified] /jniorboot.log
End of Manifest (1 files listed)

bruce_dev /> manifest newfile.dat  
JNIOR Manifest      Fri Jan 05 11:02:59 EST 2018
  Size                  MD5                  File Specification
 950      dc425a0283e22944b463eeab9e625adb  [New] /newfile.dat
End of Manifest (1 files listed)

bruce_dev />

Here is the original content and the resulting compressed format that I have used in bread-boarding this.

CODE: SELECT ALL

bruce_dev /> cat jniorboot.log
01/05/18 07:39:52.913, -- Model 410 v1.6.3 - JANOS Series 4
01/05/18 07:39:52.960, Copyright (c) 2012-2018 INTEG Process Group, Inc., Gibsonia PA USA
01/05/18 07:39:52.980, JANOS written and developed by Bruce Cloutier
01/05/18 07:39:52.999, Serial Number: 614070500
01/05/18 07:39:53.018, File System mounted
01/05/18 07:39:53.039, Registry mounted
01/05/18 07:39:53.089, Network Initialized
01/05/18 07:39:53.109, Ethernet Address: 9c:8d:1a:00:07:ee
01/05/18 07:39:53.229, Sensor Port initialized
01/05/18 07:39:53.284, I/O services initialized
01/05/18 07:39:53.327, FTP server enabled for port 21
01/05/18 07:39:53.347, Protocol server enabled for port 9200
01/05/18 07:39:53.368, WebServer enabled for port 80
01/05/18 07:39:53.390, Telnet server enabled for port 23
01/05/18 07:39:53.414, POR: 5927
01/05/18 07:39:53.439, Cumulative Runtime: 8 Weeks 5 Days 9 Hours 32:22.102
01/05/18 07:39:53.460, Boot Completed [2.3 seconds]

bruce_dev />

CODE: SELECT ALL

bruce_dev /> cat outfile.dat -h
00000000  30 31 2f 30 35 2f 31 38  20 30 37 3a 33 39 3a 35  01/05/18 .07:39:5
00000010  32 2e 39 31 33 2c 20 2d  2d 20 4d 6f 64 65 6c 20  2.913,.- -.Model.
00000020  34 31 30 20 76 31 2e 36  2e 33 20 2d 20 4a 41 4e  410.v1.6 .3.-.JAN
00000030  4f 53 20 53 65 72 69 65  73 20 34 0d 0a ff 00 13  OS.Serie s.4.....
00000040  00 3d 36 30 2c 20 43 6f  70 79 72 69 67 68 74 20  .=60,.Co pyright.
00000050  28 63 29 20 32 30 31 32  2d ff 00 03 00 05 38 20  (c).2012 -.....8.
00000060  49 4e 54 45 47 20 50 72  6f 63 65 73 73 20 47 72  INTEG.Pr ocess.Gr
00000070  6f 75 70 2c 20 49 6e 63  2e 2c 20 47 69 62 73 6f  oup,.Inc .,.Gibso
00000080  6e 69 61 20 50 41 20 55  53 41 ff 00 15 00 5b 38  nia.PA.U SA....[8
00000090  ff 00 03 00 5b ff 00 06  00 82 77 72 69 74 74 65  ....[... ..writte
000000A0  6e 20 61 6e 64 20 64 65  76 65 6c 6f 70 65 64 20  n.and.de veloped.
000000B0  62 79 20 42 72 75 63 65  20 43 6c 6f 75 74 69 65  by.Bruce .Cloutie
000000C0  72 ff 00 15 00 46 39 39  2c ff 00 05 00 c2 61 6c  r....F99 ,.....al
000000D0  20 4e 75 6d 62 65 72 3a  20 36 31 34 30 37 30 35  .Number: .6140705
000000E0  30 30 ff 00 12 00 31 33  2e ff 00 03 00 b9 2c 20  00....13 ......,.
000000F0  46 69 6c 65 20 53 79 73  74 65 6d 20 6d 6f 75 6e  File.Sys tem.moun
00000100  74 65 64 ff 00 15 00 2c  33 ff 00 03 00 5d 52 65  ted...., 3....]Re
00000110  67 69 73 74 72 79 ff 00  1d 00 29 38 ff 00 03 00  gistry.. ..)8....
00000120  29 4e 65 74 77 6f 72 6b  ff 00 03 01 02 69 74 ff  )Network .....it.
00000130  00 03 00 8f 69 7a ff 00  16 00 2c 31 30 ff 00 03  ....iz.. ..,10...
00000140  00 2c 45 74 68 65 72 6e  65 74 20 41 64 64 72 ff  .,Ethern et.Addr.
00000150  00 03 01 3e 3a 20 39 63  3a 38 64 3a 31 61 3a 30  ...>:.9c :8d:1a:0
00000160  30 3a ff 00 03 00 2c 65  65 ff 00 14 00 3c 32 32  0:....,e e....<22
00000170  ff 00 05 00 ee 6e 73 6f  72 20 50 6f 72 74 20 69  .....nso r.Port.i
00000180  ff 00 1e 00 6c 32 38 34  ff 00 03 01 92 2f 4f 20  ....l284 ...../O.
00000190  73 65 72 76 69 ff 00 03  01 a7 ff 00 20 00 31 33  servi... ......13
000001A0  32 37 ff 00 03 01 1e 54  50 ff 00 05 00 31 65 72  27.....T P....1er
000001B0  20 65 6e 61 62 6c ff 00  03 01 8c 66 ff 00 03 00  .enabl.. ...f....
000001C0  71 70 ff 00 04 00 71 32  31 ff 00 15 00 37 34 ff  qp....q2 1....74.
000001D0  00 03 00 37 ff 00 03 02  09 74 6f 63 6f 6c ff 00  ...7.... .tocol..
000001E0  19 00 3c 39 32 ff 00 16  01 93 33 36 ff 00 03 01  ..<92... ..36....
000001F0  93 57 65 62 ff 00 03 01  c7 ff 00 15 00 38 38 ff  .Web.... .....88.
00000200  00 16 00 36 39 ff 00 03  02 40 54 65 6c ff 00 04  ...69... .@Tel...
00000210  01 46 ff 00 19 00 ae 33  ff 00 14 00 3a 34 31 ff  .F.....3 ....:41.
00000220  00 03 01 16 50 4f 52 3a  20 35 39 32 37 ff 00 15  ....POR: .5927...
00000230  00 22 ff 00 04 01 f9 43  75 6d 75 6c 61 74 69 76  .".....C umulativ
00000240  65 20 52 ff 00 03 01 fa  69 6d 65 3a 20 38 ff 00  e.R..... ime:.8..
00000250  03 00 a8 65 6b 73 20 35  20 44 61 79 73 20 39 20  ...eks.5 .Days.9.
00000260  48 6f 75 72 73 20 33 32  3a 32 32 ff 00 03 01 da  Hours.32 :22.....
00000270  32 ff 00 15 00 4d ff 00  04 03 44 42 6f 6f 74 ff  2....M.. ..DBoot.
00000280  00 03 03 49 6d 70 6c 65  ff 00 03 02 44 20 5b 32  ...Imple ....D.[2
00000290  ff 00 03 03 81 73 65 63  6f 6e 64 73 5d 0d 0a     .....sec onds]..

bruce_dev />
ATTACHMENTS: JTest2.java, JTest.java

So at this point I feel like this algorithm generates the optimum LZ77 compression for the data. This should even take into account the lazy matches, however the industry perceives those. When I cast it into C I will work on optimizing the execution.

The only question might be in optimizing distance codes to minimize extra bits. I didn’t consider that in pruning the matched sequence list for a block. When those situations occur there might be a bit or two to save if I were to retain the closer match. I am not going to worry about that. My feeling is that we won’t save anything noticeable if anything at all.
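
To put a rough number on that “bit or two”: in DEFLATE the distance carries extra bits beyond its Huffman code, and the count of extra bits grows as the distance grows. A quick sketch of that count, taken from the distance table in RFC 1951 (this ignores the fact that the distance code’s own Huffman length varies too):

    // Extra bits DEFLATE attaches to a distance, per the RFC 1951 distance table.
    // Distances 1..4 need none; beyond that the count goes up by one each time the
    // distance doubles, topping out at 13 for distances up to 32768.
    static int distanceExtraBits(int dist) {
        if (dist <= 4)
            return 0;
        return (31 - Integer.numberOfLeadingZeros(dist - 1)) - 1;   // floor(log2(dist-1)) - 1
    }

For example a distance of 49 needs 4 extra bits while 423 needs 7, so keeping the closer match saves at most a few bits per pointer.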

Now to handle the Huffman coding.

Well there are a couple of bugs in my coding which were discovered while testing the approach on much larger files. With those issues fixed I see that I need to focus on optimizing because this all-encompassing matching is much too slow (even when considering the Java breadboard).

The approach would find all of the sequence matches to data in the previous 32KB of the stream (sliding window) for a section of the input stream bounded by non-matching data. Once collected I would then end up trashing the vast majority of those. That is wasteful of processing time. It was a logical approach when you weren’t sure whether a better compression ratio could be obtained through careful selection of sequences. There is this suggestion that better compression is possible if lazy matches are allowed. Without really knowing what those are, the shotgun all-encompassing approach at least guaranteed that you had all of the information you needed to reach the optimum. Let’s actually look at this more closely.

Matching

We start a match upon receipt of a data byte. I’m keeping a bidirectional linked list for the occurrences of each byte value. This allows the routine to rapidly create an active match object for each. Subsequently, as each new byte is received, we check each active match for those that may be extended and those that are no longer useful. When we reach a point where none of the active matches have been extended we select the longest match completed as the best. For matches of equal length we pick the closest one (lowest distance).

The DEFLATE specification recognizes matches of 3 or more bytes (maximum 258). Why 3? That is because the compression is achieved by replacing a matched sequence with a pointer back to the same data found in the previous 32KB of data (the sliding window). That pointer consists of a distance and a length. That pointer in the worst case requires about 3 bytes. So replacing shorter sequences on average won’t buy you anything. That’s for DEFLATE. I am actually using something like 5 bytes for this breadboard but eventually we will be strictly DEFLATE. Obviously the longer the match the greater the savings. Therefore the best match is the longest and closest (uses a minimum pointer size).

So for a point in the incoming stream we seek the longest match. If there is no 3 byte match then we output that first byte as is and search using the next one. The results can be impressive especially for text and log files. It’s not LZW but it works. It turns out to be good enough for the JNIOR.

Lazy Matching

So what is with this lazy matching? Well, imagine a sequence of 3 or more matching bytes located someplace in the sliding window. Then consider that if we ignored that match and searched for matches starting with the next byte we might find a much longer match from someplace else in the sliding window. Do we miss an opportunity for better compression?

I can graphically show the overlap of the best matches. Say the first is 5 in length and the other some 15.

|---|
 |-------------|

Here vertical bars denote the first and last matching byte and the dashes bytes in between. The first would replace 5 data bytes and the second 15 starting a byte later.

If we were to strictly process matches as they are found we would encode the 5-byte match. And then we would still find the latter 11 bytes of the second sequence (or maybe even another better sequence). This would encode as two sequences one right after another and require two pointers for a total of maybe 6 bytes.

|---||---------|
2 pointers = 6 bytes

Note that we can always prune a match. We can ignore some of its leading bytes by incrementing the replace position (CURPTR) and decrementing the length. We can even ignore trailing bytes in a match merely by shortening its length. So here we drop the first 4 bytes of the 15-byte sequence that were eclipsed by the initial 5-byte sequence. We don’t have to actually do this manipulation as supposedly our search algorithm would find it directly for us.

Now those who get excited by such things would point out that if we absolutely ignored the first 5-byte sequence completely and outputted that first raw byte then we would use just one pointer.

.|-------------|
1 raw byte plus 1 pointer = 4 bytes

And, yes, there is a savings that depending on how often such a thing occurs will in fact lead to a better result. This is a lazy match. This is even true when the second sequence is further offset as seen here.

|---|
  |-------------|

..|-------------|
2 raw bytes plus 1 pointer = 5 bytes

But there is no benefit beyond that. If the two sequences were offset by 3 bytes then you might as well include the first 3 as a sequence and you end up using 6 bytes (or less) anyway with 2 pointers.

OK so

Alright for the JNIOR it is likely that these lazy matches aren’t necessary. After all we just want to create a file collection or a graphics file. We aren’t really worried about saving every byte. In fact, we are probably more concerned about it getting done quickly.

So using matches as they come works. But… if we can find a way to efficiently accommodate the lazy matching it would be cool.
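
For what it is worth, the usual way to accommodate lazy matching efficiently (roughly what zlib does) is a one byte lookahead rather than collecting everything: find the best match at the current position, peek at the best match one byte later, and if the later one is longer emit a literal and defer. Here is a self-contained sketch of just that decision. It is not the breadboard code above; longestMatch() is a deliberately naive stand-in for a real search, just to keep the sketch short.

    // One-step lazy matching sketch. Only the emit/defer decision is the point here;
    // longestMatch() is a naive O(n^2) search standing in for the linked-list matcher.
    static int longestMatch(byte[] d, int pos) {
        int best = 0;
        for (int s = Math.max(0, pos - 32768); s < pos; s++) {
            int l = 0;
            while (pos + l < d.length && l < 258 && d[s + l] == d[pos + l])
                l++;
            if (l > best)
                best = l;
        }
        return best;
    }

    static void lazyScan(byte[] d) {
        int i = 0;
        while (i < d.length) {
            int here = longestMatch(d, i);
            boolean defer = here >= 3 && i + 1 < d.length && longestMatch(d, i + 1) > here;
            if (here >= 3 && !defer) {
                System.out.printf("match at %04x, L=%d%n", i, here);
                i += here;
            } else {
                // deferred or no usable match; the next pass looks at i+1
                System.out.printf("literal at %04x%n", i);
                i++;
            }
        }
    }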

Optimized Program Code

Now that I have a little better understanding as to the lazy matching we can take what we have and move on to the next step in DEFLATE. Later, after I cast this into C, we can decide if it is worth handling the lazy matches. It represents a trade-off between an optimum compression ratio and processing time. For the JNIOR we really are more concerned about the processing time as the brute-force compression ratios appear more than acceptable. Note that I bet our original processing of all possible matches between runs of unmatched raw data would lead to an even better compression ratio than one including just the lazy matches, but that would be slow.

Speaking of slow I thought to take a little time to distill our algorithm down and to code it so it would execute faster. I know that it is still a breadboard but I would like to not waste as much time with iterations in debugging. So I have eliminated the Match object and automatic growing lists. And since we are identifying the best match for a single position I have eliminated the list of completed matches (RSEQ). I also made an adjustment so as to be able to replay bytes into the matcher should we need to output raw data.

The following is the LZ77 code. Hopefully the comments are sufficient for you to follow the algorithm.

// Our LZ77 compression engine
    static void do_compress(BufferedOutputStream outfile, BufferedInputStream infile, int filesize) 
            throws Throwable {
        
        int ch;
        
        // process uncompressed stream byte-by-byte
        while (filesize > 0) {
            
            // Make sure that there are bytes in the queue to work with. We process bytes from 
            //  the queue using SEQPTR. When SEQPTR reaches the INPTR then we add bytes from the input
            //  stream. The linked lists are updated. 
            if (SEQPTR == INPTR) {
                
                // obtain byte from uncompressed stream
                ch = infile.read();
                filesize--;
                
                // queue data and manage associated linked list
                DATA[INPTR] = (byte)ch;
 
                // Add byte to the head of the appropriate linked list. Note pointers are stored +1 so
                //  as to use 0 as an end of list marker. Lists are bi-directional so we can trim the 
                //  tail when data is dropped from the queue.
                int ptr = HEAD[ch];
                HEAD[ch] = INPTR + 1;
                FWD[INPTR] = ptr;
                BACK[INPTR] = 0;
                if (ptr != 0)
                    BACK[ptr - 1] = INPTR + 1;
 
                // advance entry pointer
                INPTR++;
                if (INPTR == WINDOW)
                    INPTR = 0;
 
                // drop old data from queue when the sliding window is full
                if (INPTR == OUTPTR) {
 
                    // trim linked list as byte is being dropped
                    if (BACK[OUTPTR] == 0)
                        HEAD[DATA[OUTPTR]] = 0;
                    else
                        FWD[BACK[OUTPTR] - 1] = 0;
 
                    // push end of queue
                    OUTPTR++;
                    if (OUTPTR == WINDOW)
                        OUTPTR = 0;
                }
            }
            
            // Obtain the next character to process. We are assured of a byte at SEQPTR now.
            //  SEQPTR allows us to replay bytes into the sequence matching.
            ch = DATA[SEQPTR++];
            if (SEQPTR == WINDOW)
                SEQPTR = 0;
            
            // Reset match state. These will define the best match should one be found for 
            //  the current CURPTR.
            int best_distance = 0;
            int best_length = 0;
            
            // If there are no active sequences we create a new set. This uses the linked list
            //  for the byte at CURPTR to initialize a series of potential sequence sites.
            if (MSIZE == 0) {
 
                // create new active matches for all CH in the queue (except last)
                int ptr = HEAD[ch];
                while (ptr != 0) {
                    if (ptr - 1 != CURPTR) {
                        int distance = CURPTR - ptr + 1;
                        if (distance < 0)
                            distance += WINDOW;
 
                        DISTANCE[MSIZE] = distance;
                        LENGTH[MSIZE] = 1;
                        MSIZE++;
                    }
 
                    ptr = FWD[ptr - 1];
                }
                
            }
                
            // Otherwise process the active sequence matches. Here we advance sequences as each
            //  new byte is processed. Of those matches that cannot be extended we keep the
            //  best (longest and closest to CURPTR). We will use the best match if all of the
            //  potential matches end.
            else {
                
                // each active match
                for (int n = MSIZE - 1; 0 <= n; n--) {
                    
                    int p = CURPTR - DISTANCE[n];
                    if (p < 0)
                        p += WINDOW;
                    p += LENGTH[n];
                    if (p >= WINDOW)
                        p -= WINDOW;
 
                    // Can we extend this match? If so we bump its length and move on to
                    //  the next match.
                    if (DATA[p] == ch && LENGTH[n] < 258) {
                        LENGTH[n]++;
 
                        if (DISTANCE[n] + LENGTH[n] < WINDOW && filesize > 0)
                            continue;
                    }
 
                    // Sequence did not get extended. See if it is the best found so far.
                    if (LENGTH[n] >= 3) {
                        
                        // first 
                        if (best_length == 0) {
                            best_distance = DISTANCE[n];
                            best_length = LENGTH[n];
                        }
                        
                        // longer
                        else if (LENGTH[n] > best_length) {
                            best_distance = DISTANCE[n];
                            best_length = LENGTH[n];
                        }
                        
                        // closer
                        else if (LENGTH[n] == best_length && DISTANCE[n] < best_distance) {
                            best_distance = DISTANCE[n];
                            best_length = LENGTH[n];
                        }
                    }

                    // Completed matches are eliminated from the active list. To be quick we
                    //  replace it with the last in the list and reduce the count.
                    MSIZE--;
                    DISTANCE[n] = DISTANCE[MSIZE];
                    LENGTH[n] = LENGTH[MSIZE];
                }
            }

            // If there are no active sequence matches at this point we can generate output.
            if (MSIZE == 0) {

                // If we have a completed sequence we can output a pointer. These are escaped
                //  into the output buffer for later processing into encoded length-distance
                //  pairs.
                if (best_length != 0) {
                    bufwrite(0xff, outfile);
                    bufint(best_length, outfile);
                    bufint(best_distance, outfile);

                    // Move CURPTR to the next byte after the replaced sequence
                    CURPTR += best_length;
                    if (CURPTR >= WINDOW)
                        CURPTR -= WINDOW;
                }                
 
                // Otherwise output a raw uncompressed byte. The unmatched byte is sent to
                //  the output stream and we move CURPTR to the next. 
                else {
                    bufbyte(DATA[CURPTR], outfile);
                    CURPTR++;
                    if (CURPTR == WINDOW)
                        CURPTR = 0;
                }
 
                // Here we reset SEQPTR to process from the next CURPTR location. In the case that
                //  we could not match this replays bytes previously processed so as to not miss
                //  an opportunity.
                SEQPTR = CURPTR;
            }
        }
        
        // If we are done and there are unprocessed bytes left we push them to the output stream.
        while (CURPTR != INPTR) {
            bufbyte(DATA[CURPTR], outfile);
            CURPTR++;
            if (CURPTR == WINDOW)
                CURPTR = 0;
        }
    }

With this in place we are going to move on to see what needs to be done next with our output stream.

Our LZ77 compression routine loads an output buffer with processed and hopefully compressed data. When this output buffer fills (say to 64KB) we must process it further. That data is then compressed again using a form of Huffman coding.

Huffman coding for DEFLATE

If you search you can find lots of useful descriptions of Huffman coding. Not all of those will provide the detail for constructing the required tree from the data. Of those that do, most do not lead you to create a Huffman table compatible with DEFLATE. This is because the Huffman table in DEFLATE is eventually stored using a form of shorthand. That is only possible if the Huffman encoding follows some additional rules. To meet those requirements we need to be careful in constructing our initial dynamic tree.

The DEFLATE specification cryptically defines it:

The Huffman codes used for each alphabet in "deflate" format have two additional rules:

   * All codes of a given bit length have lexicographically consecutive values, in the same order as the
     symbols they represent;

   * Shorter codes lexicographically precede longer codes.

It would be nice if they would avoid words like lexicographically but you can’t have everything. You can also get confused over the term codes versus the binary values of the bytes in the alphabet. And of course shorter refers to bit count. That is perhaps a little more obvious, but here again these must “lexicographically precede” others.

Alphabet

This refers to the set of values that we intend to compress. Obviously this needs to include byte values (0..255) since we are not constraining our input to ASCII or something. We include all of the possible values in the alphabet (in increasing value) even if some do not appear in the data. That seems obvious but DEFLATE also defines an end-of-block code (like an EOF) of 256 as well as special codes from 257..285 used to represent length codes (in the length-distance pointers we created).

So we will need to encode values from 0 thru 285. Okay, that set requires 9 bits and makes life in a world of bytes difficult. Remember how I had to escape my length-distance pointers in the buffer? Anyway, we can handle it in building our trees as we can define the value of a node as an integer. So for DEFLATE our “alphabet” consists of the numbers 0..285.

Don’t be confused if you notice that length codes and also distance codes generally include some “extra” bits. They do and those are simply slipped into the bit stream and are not subjected to Huffman coding. We’ll get into that later.

Length codes

These lie outside of the normal byte values 0..255 simply because in decompression we need to recognize them. These are flagged just as I have escaped the same in the output buffer. There are 29 of the length codes which are used with extra bits in some cases to encode lengths of 3 to 258. You may recall that we did not create matching sequences of less than 3 bytes and there is a maximum of a 258 byte length. The 258 maximum I bet results from storing the length minus 3 as a byte (0..255) someplace. But I would be very curious as to the thought process that breaks these 256 possible lengths into 29 codes. That is likely based upon some probability distribution or some such thing. It is what it is.

Distance codes

Unlike the length codes the distance codes do not need to be flagged. We expect a distance code after a length code and so those use normal byte values already represented in the alphabet (0..29). Here there are 30 distance codes some also requiring extra bits encoding distance from 1..32768. This allows the matched sequence to sit in that 32KB sliding window.

Huffman coding compresses data by representing frequent values with a small number of bits. If a space ' ' (0x20) appears in the data a tremendous number of times it might get encoded by just 2 bits. That saves 6 bits for every occurrence of a space. That can be a huge savings. The downside is that a rare byte that might occur only a few times might be encoded by 10 bits. That actually increases the storage from the original 8-bit byte but happens only a few times.

This implies then that we know the frequency of each member of our alphabet. That is the first step. We need to count each occurrence of each member of our alphabet that appears in the data.

Here we modify my bufflush() routine that is responsible for emptying the buffer. First we will add a routine to count. There are 286 members in the alphabet (256 byte values, the end-of-block code and 29 length codes). We create an integer array where we use the value as an index to count occurrences. There is one complication in that I need to convert my length-distance escaping into the DEFLATE encoding. That entails tables of length and distance ranges so we can decide which of the length and distance codes we need to use.

CODE: SELECT ALL

    // length code range maximums
    static int[] blen = { 
        4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31, 35, 
        43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258, 259 
    };

Here for each of the 29 length ranges we specify the largest size plus 1 that it can encode. For a given length we will loop through these ranges to determine the proper alphabet value to use. The DEFLATE lengths are encoded as follows (from RFC 1951):

                 Extra               Extra               Extra
            Code Bits Length(s) Code Bits Lengths   Code Bits Length(s)
            ---- ---- ------     ---- ---- -------   ---- ---- -------
             257   0     3       267   1   15,16     277   4   67-82
             258   0     4       268   1   17,18     278   4   83-98
             259   0     5       269   2   19-22     279   4   99-114
             260   0     6       270   2   23-26     280   4  115-130
             261   0     7       271   2   27-30     281   5  131-162
             262   0     8       272   2   31-34     282   5  163-194
             263   0     9       273   3   35-42     283   5  195-226
             264   0    10       274   3   43-50     284   5  227-257
             265   1  11,12      275   3   51-58     285   0    258
             266   1  13,14      276   3   59-66
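
For illustration, here is one way that lookup might be expressed in Java. The lengthCode() helper and the LBASE/LEXTRA arrays are my own names and are not part of the breadboard listings; their contents are simply the RFC table above transcribed into arrays (base length and extra bit count per code).

    // Hypothetical helper: translate a match length of 3..258 into its DEFLATE
    //  length code, the number of extra bits and the extra bits value.
    static int[] LBASE = {
        3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
        35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258
    };
    static int[] LEXTRA = {
        0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
        3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0
    };

    // returns { code, extra bit count, extra bits value }
    static int[] lengthCode(int len) {
        for (int k = LBASE.length - 1; k >= 0; k--) {
            if (len >= LBASE[k])
                return new int[] { 257 + k, LEXTRA[k], len - LBASE[k] };
        }
        throw new IllegalArgumentException("length out of range: " + len);
    }

For example lengthCode(18) yields code 268 with 1 extra bit whose value is 1, and lengthCode(258) yields code 285 with no extra bits.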

Similarly we create an array for the 30 distance codes.

CODE: SELECT ALL

    static int[] bdist = { 
        2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 
        257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 
        8193, 12289, 16385, 24577, 32769 
    };

The DEFLATE specification encodes distances as follows:

                  Extra           Extra               Extra
             Code Bits Dist  Code Bits   Dist     Code Bits Distance
             ---- ---- ----  ---- ----  ------    ---- ---- --------
               0   0    1     10   4     33-48    20    9   1025-1536
               1   0    2     11   4     49-64    21    9   1537-2048
               2   0    3     12   5     65-96    22   10   2049-3072
               3   0    4     13   5     97-128   23   10   3073-4096
               4   1   5,6    14   6    129-192   24   11   4097-6144
               5   1   7,8    15   6    193-256   25   11   6145-8192
               6   2   9-12   16   7    257-384   26   12  8193-12288
               7   2  13-16   17   7    385-512   27   12 12289-16384
               8   3  17-24   18   8    513-768   28   13 16385-24576
               9   3  25-32   19   8   769-1024   29   13 24577-32768
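
Distances can be translated the same way. Again the distanceCode() helper and the DBASE/DEXTRA arrays are my own invention for illustration, transcribed from the distance table above.

    // Hypothetical companion to lengthCode(): translate a distance of 1..32768
    //  into its DEFLATE distance code, extra bit count and extra bits value.
    static int[] DBASE = {
        1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
        257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
        8193, 12289, 16385, 24577
    };
    static int[] DEXTRA = {
        0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
        7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13
    };

    // returns { code, extra bit count, extra bits value }
    static int[] distanceCode(int dist) {
        for (int k = DBASE.length - 1; k >= 0; k--) {
            if (dist >= DBASE[k])
                return new int[] { k, DEXTRA[k], dist - DBASE[k] };
        }
        throw new IllegalArgumentException("distance out of range: " + dist);
    }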

So this buffer flush routine looks as follows. Note that we are not encoding the output in any way yet. This merely determines the counts.

    // length code range maximums
    static int[] blen = { 
        4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31, 35, 
        43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258, 259 
    };
    
    static int[] bdist = { 
        2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 
        257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 
        8193, 12289, 16385, 24577, 32769 
    };
 
    static void bufflush(BufferedOutputStream outfile) throws Throwable {
        
        // Determine frequencies by counting each occurrence of a byte value
        int[] freq = new int[286];
        for (int n = 0; n < BUFPTR; n++) {
            
            // Get the byte value. This may be escaped with 0xff.
            int ch = BUFR[n] & 0xff;
            
            // not escaped
            if (ch != 0xff)
                freq[ch]++;
            
            // escaped
            else {
                
                // Get next byte.
                ch = BUFR[++n] & 0xff;
                
                // may just be 0xff itself
                if (ch == 0xff)
                    freq[0xff]++;
                
                // length-distance pair
                else {
                    
                    // obtain balance of length and distance values
                    int len = (ch << 8) + (BUFR[++n] & 0xff);
                    int dist = ((BUFR[++n] & 0xff) << 8) + (BUFR[++n] & 0xff);
                    
                    // determine length code to use (257..285)
                    for (int k = 0; k < blen.length; k++) {
                        if (len < blen[k]) {
                            freq[257 + k]++;
                            break;
                        }
                    }
                    
                    // determine distance code to use (0..29)
                    for (int k = 0; k < bdist.length; k++) {
                        if (dist < bdist[k]) {
                            freq[k]++;
                            break;
                        }
                    }
                    
                }
            }
        }
        
        // dump the results
        for (int n = 0; n < 286; n++) {
            if (freq[n] > 0)
                System.out.printf("0x%03x %d\n", n, freq[n]);
        }

And the unsorted results. Here we list only those values that appear in the data.

CODE: SELECT ALL

jtest jniorboot.log
0x004 1
0x00a 9
0x00b 13
0x00c 4
0x00d 5
0x00e 8
0x00f 3
0x010 4
0x011 8
0x012 2
0x013 2
0x020 41
0x028 1
0x029 1
0x02c 6
0x02d 5
0x02e 7
0x02f 3
0x030 16
0x031 12
0x032 12
0x033 6
0x034 10
0x035 7
0x036 6
0x037 4
0x038 9
0x039 9
0x03a 10
0x041 4
0x042 2
0x043 3
0x044 1
0x045 2
0x046 1
0x047 3
0x048 1
0x049 3
0x04a 1
0x04d 1
0x04e 4
0x04f 3
0x050 4
0x052 3
0x053 4
0x054 3
0x055 1
0x057 1
0x05b 1
0x05d 1
0x061 8
0x062 5
0x063 7
0x064 9
0x065 29
0x066 1
0x067 2
0x068 2
0x069 14
0x06b 2
0x06c 10
0x06d 6
0x06e 9
0x06f 17
0x070 5
0x072 17
0x073 11
0x074 15
0x075 8
0x076 4
0x077 2
0x079 5
0x07a 1
0x101 30
0x102 2
0x103 3
0x105 1
0x10c 1
0x10d 13
0x10e 2
0x10f 2
0x110 1

Processing 6.129 seconds.
Source 954 bytes.
Result 680 bytes.
Ratio 1.40:1

bruce_dev /> 

There is one omission. We will in fact use one end-of-block code (0x100) and so we will need to force it into the table.
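
Assuming we seed it just before building the leaves (the placement is my assumption), that is a one-line addition:

        // force the end-of-block code into the table since every block ends with one
        freq[0x100] = 1;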

I will need to build a tree. So we need to create some kind of node which will have left and right references as well as a potential value and a weight (frequency). Here I will define a class.

    static ArrayList<Node> nodes = new ArrayList<Node>(512);
    
    static class Node {
        int left;
        int value;
        int weight;
        int right;
        int length;
        int code;
        
        Node() {
        }
        
        // instantiate leaf
        Node(int val, int w) {
            value = val;
            weight = w;
        }
        
        // instantiate node
        Node(int l, int r, int w) {
            left = l;
            right = r;
            weight = w;
        }                
    }

I know that later I will assign individual leaves a code length and a code so those are included in the class as well. Now I will take our array of value frequencies and create leaves for each of those that appear in the data. An ArrayList will store these nodes and grow as we define a full tree.

I will also create an ordering array that we will use to properly construct the tree. Each entry in this array will reference a node. Initially we will sort this by decreasing frequency. Also in keeping with the lexicographical requirement those nodes of the same frequency will be ordered by increasing alphabet value. This is where it gets tricky but we start here.

We replace the frequency dump loop with the following.

        // create node set (0 not used)
        int[] order = new int[286];
        int cnt = 0;
        nodes.add(new Node());  // not used (index 0 is terminator)
        for (int n = 0; n < 286; n++) {
            if (freq[n] > 0) {
                nodes.add(new Node(n, freq[n]));
                order[cnt++] = nodes.size() - 1;
            }
        }
        
        // sort
        for (int n = 0; n < cnt - 1; n++) {
            Node nd1 = nodes.get(order[n]);
            Node nd2 = nodes.get(order[n + 1]);
 
            if (nd1.weight < nd2.weight || nd1.weight == nd2.weight && nd1.value > nd2.value) {
                int k = order[n];
                order[n] = order[n + 1];
                order[n + 1] = k;
                n -= 2;
                if (n < -1)
                    n = -1;
            }
        }
        
        //dump
        System.out.println("");
        for (int n = 0; n < cnt; n++) {
            Node nd = nodes.get(order[n]);
            System.out.printf("%d 0x%03x %d\n", order[n], nd.value, nd.weight);
        }

And the results of the sort are displayed.

CODE: SELECT ALL

bruce_dev /> jtest jniorboot.log

12 0x020 41
75 0x101 30
55 0x065 29
64 0x06f 17
66 0x072 17
19 0x030 16
68 0x074 15
59 0x069 14
3 0x00b 13
80 0x10d 13
20 0x031 12
21 0x032 12
67 0x073 11
23 0x034 10
29 0x03a 10
61 0x06c 10
2 0x00a 9
27 0x038 9
28 0x039 9
54 0x064 9
63 0x06e 9
6 0x00e 8
9 0x011 8
51 0x061 8
69 0x075 8
17 0x02e 7
24 0x035 7
53 0x063 7
15 0x02c 6
22 0x033 6
25 0x036 6
62 0x06d 6
5 0x00d 5
16 0x02d 5
52 0x062 5
65 0x070 5
72 0x079 5
4 0x00c 4
8 0x010 4
26 0x037 4
30 0x041 4
41 0x04e 4
43 0x050 4
45 0x053 4
70 0x076 4
7 0x00f 3
18 0x02f 3
32 0x043 3
36 0x047 3
38 0x049 3
42 0x04f 3
44 0x052 3
46 0x054 3
77 0x103 3
10 0x012 2
11 0x013 2
31 0x042 2
34 0x045 2
57 0x067 2
58 0x068 2
60 0x06b 2
71 0x077 2
76 0x102 2
81 0x10e 2
82 0x10f 2
1 0x004 1
13 0x028 1
14 0x029 1
33 0x044 1
35 0x046 1
37 0x048 1
39 0x04a 1
40 0x04d 1
47 0x055 1
48 0x057 1
49 0x05b 1
50 0x05d 1
56 0x066 1
73 0x07a 1
74 0x100 1
78 0x105 1
79 0x10c 1
83 0x110 1

Processing 8.873 seconds.
Source 954 bytes.
Result 680 bytes.
Ratio 1.40:1

bruce_dev /> 

The first number is the node index. The next is the value and the last is the count of occurrences. Here we see that the space (0x20) is the most frequent in this file. You can see that I included the end-of-block code (0x100) once and it is as infrequent as a few others.

The next step is to construct a Huffman tree. So to simplify things at this point we are going to use a simple phrase as the data and disable the LZ77 aspect. There is a Duke University page describing Huffman coding that uses the phrase “go go gophers”. For lack of anything better we will use the same.

With LZ77 disabled and running only that phrase our frequency sort yields the following.

CODE: SELECT ALL

bruce_dev /> jtest flash/gogo.dat

3 0x067 3
5 0x06f 3
1 0x020 2
2 0x065 1
4 0x068 1
6 0x070 1
7 0x072 1
8 0x073 1

Processing 0.649 seconds.
Source 13 bytes.
Result 13 bytes.
Ratio 1.00:1

bruce_dev /> 

bruce_dev /> cat flash/gogo.dat
go go gophers
bruce_dev />

We also ignore any end-of-block code.

The Duke page demonstrates that this phrase can be encoded with just 37 bits. It also demonstrates that there are multiple possible Huffman trees that can be created to yield that result. Interestingly, none of the trees shown there meet the DEFLATE requirement. So I am going to determine the procedure that creates the right kind of table.

The game next is to combine pairs of leaves into nodes, and then pairs of leaves and nodes into other nodes. This procedure is repeated until there is only one node at the head of the tree. Generally one is directed to combine the lowest two weighted leaves or nodes into a single node of combined weight.

Each time a node is constructed it defines a bit 0/1 for left/right for the members the node combines. By working from the lowest weighted or least frequent leaves and then nodes, one ends up building longer codes for those than for the higher frequency values which are not touched right away.

The following procedure generates a tree (but not the kind we want just yet) using just such a procedure. The lowest two are combined and the list is resorted. We dump and repeat.

        // generate tree
        while (cnt > 1) {
            
            // take lowest weight nodes and create new
            int left = order[cnt - 2];
            int right = order[cnt - 1];
            Node nd1 = nodes.get(left);
            Node nd2 = nodes.get(right);
            nodes.add(new Node(left, right, nd1.weight + nd2.weight));
            order[cnt - 2] = nodes.size() - 1;
            cnt--;
 
            // sort
            for (int n = 0; n < cnt - 1; n++) {
                nd1 = nodes.get(order[n]);
                nd2 = nodes.get(order[n + 1]);
 
                if (nd1.weight < nd2.weight) {
                    int k = order[n];
                    order[n] = order[n + 1];
                    order[n + 1] = k;
                    n -= 2;
                    if (n < -1)
                        n = -1;
                }
            }
            
            //dump
            System.out.println("");
            for (int n = 0; n < cnt; n++) {
                Node nd = nodes.get(order[n]);
                if (nd.left == 0 && nd.right == 0)
                    System.out.printf("%d 0x%03x %d\n", order[n], nd.value, nd.weight);
                else
                    System.out.printf("%d %d-%d %d\n", order[n], nd.left, nd.right, nd.weight);
            }
        }

You can follow the procedure in the output although the resulting tree is not displayed. I might create a way to display the tree but I haven’t gone that far yet.

CODE: SELECT ALL

bruce_dev /> jtest flash/gogo.dat

3 0x067 3
5 0x06f 3
1 0x020 2
2 0x065 1
4 0x068 1
6 0x070 1
7 0x072 1
8 0x073 1

3 0x067 3
5 0x06f 3
1 0x020 2
9 7-8 2
2 0x065 1
4 0x068 1
6 0x070 1

3 0x067 3
5 0x06f 3
1 0x020 2
9 7-8 2
10 4-6 2
2 0x065 1

3 0x067 3
5 0x06f 3
11 10-2 3
1 0x020 2
9 7-8 2

12 1-9 4
3 0x067 3
5 0x06f 3
11 10-2 3

13 5-11 6
12 1-9 4
3 0x067 3

14 12-3 7
13 5-11 6

15 14-13 13

Processing 0.865 seconds.
Source 13 bytes.
Result 13 bytes.
Ratio 1.00:1

bruce_dev />

Here the new nodes display node reference indexes as left-right instead of a value.

Now with a tree built I create a recursive routine to walk the tree and assign leaves a code length and binary code. I then collect these leaves into a node array that would allow me to efficiently translate the raw data. Here that array is dumped. You will see the new code later. Here’s the table from the prior procedure.

0x020 ' ' count=2 length=3 code 000
0x065 'e' count=1 length=3 code 111
0x067 'g' count=3 length=2 code 01
0x068 'h' count=1 length=4 code 1100
0x06f 'o' count=3 length=2 code 10
0x070 'p' count=1 length=4 code 1101
0x072 'r' count=1 length=4 code 0010
0x073 's' count=1 length=4 code 0011
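
The recursive walk mentioned above might look something like this sketch. The walk() name is mine; it assumes the Node class and nodes list defined earlier, where a left or right reference of 0 means no child.

    // Descend the tree assigning each leaf its code length (depth) and binary code.
    //  A 0 bit is appended going left and a 1 bit going right.
    static void walk(int index, int depth, int code) {
        Node nd = nodes.get(index);

        // leaves carry a value; interior nodes only reference children
        if (nd.left == 0 && nd.right == 0) {
            nd.length = depth;
            nd.code = code;
            return;
        }

        walk(nd.left, depth + 1, code << 1);
        walk(nd.right, depth + 1, (code << 1) | 1);
    }

Calling walk() with the index of the head node from the dump (15 here), a depth of 0 and a code of 0 reproduces exactly the lengths and codes in the table above.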

If we manually encode the phrase we see that it does in fact compress to just 37 bits.

g  o  ' ' g  o  ' ' g  o  p    h    e   r    s
01 10 000 01 10 000 01 10 1101 1100 111 0010 0011
total 37 bits

But this table does not meet the DEFLATE requirements and cannot fit the shorthand.

By the way here are two other tables from the Duke University page. Both of these are different yet again from our table but each gets the job done. None of these meet the DEFLATE requirement.

Now that third tree is close. But here is the tree that we need to learn to generate from this data.

Why? Because the two additional requirements are met.

First the code lengths (depth of the tree) increase from left to right.

Second, for the same code length (depth) the values increase from left to right (lexicographically).

And with this tree we can apply the required shorthand to properly fit the DEFLATE format. The details of that shorthand I will get into shortly.

Let’s revise the procedure to consider pairs of nodes from right to left.

Here we first determine the combined weight of the rightmost pair. We will combine that pair and any prior pair whose combined weight is less than or equal to that. We repeat this until we have only one node that being the head of the tree.

        // generate tree
        while (cnt > 1) {
            
            // determine the combined weight of the lowest two nodes
            int left = order[cnt - 2];
            int right = order[cnt - 1];
            Node nd1 = nodes.get(left);
            Node nd2 = nodes.get(right);
            int weight = nd1.weight + nd2.weight;
            
            // Now combine node pairs equal to or less than this weight from
            //  right to left.
            int pos = cnt;
            while (pos >= 2) {
                
                // Get the combined weight of the pair preceding the pointer. We will
                //  combine the pair if its weight is less than or equal to that of 
                //  the rightmost (least) pair. We stop if not.
                left = order[pos - 2];
                right = order[pos - 1];
                nd1 = nodes.get(left);
                nd2 = nodes.get(right);
                int w = nd1.weight + nd2.weight;
                if (w > weight)
                    break;
                
                // Combine the pair and reduce the order array.
                nodes.add(new Node(left, right, w));
                order[pos - 2] = nodes.size() - 1;
                for (int n = pos; n < cnt; n++)
                    order[n - 1] = order[n];
                cnt--;
                
                // on to the next prior pair
                pos -= 2;
            }
            
            //dump
            System.out.println("");
            for (int n = 0; n < cnt; n++) {
                Node nd = nodes.get(order[n]);
                if (nd.left == 0 && nd.right == 0)
                    System.out.printf("%d 0x%03x %d\n", order[n], nd.value, nd.weight);
                else
                    System.out.printf("%d %d-%d %d\n", order[n], nd.left, nd.right, nd.weight);
            }
        }

Now when this is executed we obtain a different tree. This one actually is the one we seek.

0x020 ' ' count=2 length=3 code 100
0x065 'e' count=1 length=3 code 101
0x067 'g' count=3 length=2 code 00
0x068 'h' count=1 length=4 code 1100
0x06f 'o' count=3 length=2 code 01
0x070 'p' count=1 length=4 code 1101
0x072 'r' count=1 length=4 code 1110
0x073 's' count=1 length=4 code 1111

CODE: SELECT ALL

bruce_dev /> jtest flash/gogo.dat

3 0x067 3
5 0x06f 3
1 0x020 2
2 0x065 1
4 0x068 1
6 0x070 1
7 0x072 1
8 0x073 1

3 0x067 3
5 0x06f 3
1 0x020 2
2 0x065 1
10 4-6 2
9 7-8 2

3 0x067 3
5 0x06f 3
12 1-2 3
11 10-9 4

14 3-5 6
13 12-11 7

15 14-13 13
0x020 ' ' count=2 length=3 code 100
0x065 'e' count=1 length=3 code 101
0x067 'g' count=3 length=2 code 00
0x068 'h' count=1 length=4 code 1100
0x06f 'o' count=3 length=2 code 01
0x070 'p' count=1 length=4 code 1101
0x072 'r' count=1 length=4 code 1110
0x073 's' count=1 length=4 code 1111

Processing 0.840 seconds.
Source 13 bytes.
Result 13 bytes.
Ratio 1.00:1

bruce_dev /> 

I may not be ready to claim victory here but this appears to be very promising. Perhaps we should return to a more complicated situation.

Alright so we re-enable the LZ77 compression and remove much of the dump output. When we run this on jniorboot.log we get the following table.

CODE: SELECT ALL

bruce_dev /> jtest jniorboot.log
0x004 '.' count=1 length=11 code 10011110110
0x00a '.' count=9 length=6 code 110000
0x00b '.' count=13 length=6 code 110100
0x00c '.' count=4 length=8 code 10110011
0x00d '.' count=5 length=6 code 110010
0x00e '.' count=8 length=7 code 1010001
0x00f '.' count=3 length=8 code 10011101
0x010 '.' count=4 length=8 code 10111100
0x011 '.' count=8 length=7 code 1001100
0x012 '.' count=2 length=10 code 1001111000
0x013 '.' count=2 length=10 code 1001111001
0x020 ' ' count=41 length=2 code 00
0x028 '(' count=1 length=11 code 10011110111
0x029 ')' count=1 length=11 code 10110111110
0x02c ',' count=6 length=6 code 110110
0x02d '-' count=5 length=6 code 110011
0x02e '.' count=7 length=8 code 10110001
0x02f '/' count=3 length=9 code 100111110
0x030 '0' count=16 length=5 code 11111
0x031 '1' count=12 length=7 code 1011100
0x032 '2' count=12 length=7 code 1011101
0x033 '3' count=6 length=6 code 110111
0x034 '4' count=10 length=7 code 1001001
0x035 '5' count=7 length=8 code 10110100
0x036 '6' count=6 length=7 code 1001010
0x037 '7' count=4 length=8 code 10111101
0x038 '8' count=9 length=6 code 110001
0x039 '9' count=9 length=6 code 100000
0x03a ':' count=10 length=6 code 111000
0x041 'A' count=4 length=7 code 1000100
0x042 'B' count=2 length=9 code 111010000
0x043 'C' count=3 length=9 code 100111111
0x044 'D' count=1 length=11 code 10110111111
0x045 'E' count=2 length=9 code 111010001
0x046 'F' count=1 length=10 code 1110110110
0x047 'G' count=3 length=8 code 11101110
0x048 'H' count=1 length=10 code 1110110111
0x049 'I' count=3 length=8 code 11101111
0x04a 'J' count=1 length=10 code 1011111110
0x04d 'M' count=1 length=10 code 1011111111
0x04e 'N' count=4 length=7 code 1000101
0x04f 'O' count=3 length=8 code 11101010
0x050 'P' count=4 length=9 code 101101100
0x052 'R' count=3 length=8 code 11101011
0x053 'S' count=4 length=9 code 101101101
0x054 'T' count=3 length=7 code 1000110
0x055 'U' count=1 length=10 code 1110100110
0x057 'W' count=1 length=10 code 1110100111
0x05b '[' count=1 length=10 code 1110110100
0x05d ']' count=1 length=10 code 1110110101
0x061 'a' count=8 length=7 code 1001101
0x062 'b' count=5 length=7 code 1010010
0x063 'c' count=7 length=8 code 10110101
0x064 'd' count=9 length=6 code 100001
0x065 'e' count=29 length=4 code 0110
0x066 'f' count=1 length=10 code 1110100100
0x067 'g' count=2 length=10 code 1011011100
0x068 'h' count=2 length=10 code 1011011101
0x069 'i' count=14 length=6 code 101011
0x06b 'k' count=2 length=9 code 101111100
0x06c 'l' count=10 length=6 code 111001
0x06d 'm' count=6 length=7 code 1001011
0x06e 'n' count=9 length=7 code 1010000
0x06f 'o' count=17 length=4 code 0111
0x070 'p' count=5 length=7 code 1010011
0x072 'r' count=17 length=5 code 11110
0x073 's' count=11 length=7 code 1001000
0x074 't' count=15 length=6 code 101010
0x075 'u' count=8 length=8 code 10110000
0x076 'v' count=4 length=8 code 10011100
0x077 'w' count=2 length=9 code 101111101
0x079 'y' count=5 length=8 code 10110010
0x07a 'z' count=1 length=10 code 1110100101
0x100 '.' count=1 length=10 code 1011111100
0x101 '.' count=30 length=3 code 010

Processing 10.284 seconds.
Source 954 bytes.
Result 680 bytes.
Ratio 1.40:1

bruce_dev />

Well okay. Just about all you can say is that the stuff with the higher counts (frequency) does appear to use the shortest code lengths. Another good clue is the fact that the first alphabet entry that uses the smallest code is represented by a sequence of all zeroes.

I suppose now we get into what I have been calling the shorthand storage format for the Huffman table. If this table can be so represented and the table reconstructed from that then we are good to go.

Huffman table “Shorthand”

While “shorthand” is my term and no one else’s that I’ve seen, it still refers to efficiently conveying the table. To start I have defined an entry for each of the 286 possible codes with a count and a binary code. Even with some cute integer packing this is still a lot of bytes. Having to pass the table with the compressed file can painfully cut into the benefit of the compression.

It turns out that if the Huffman table conforms to the two special rules it can be reconstructed from only knowing the code length for each of the alphabet members. So we don’t need to include the actual code.

Once we have the code lengths for each alphabet member, that gets packed further. It’s a bit crazy. The array of 286 code lengths contains a lot of repetition. This is packed using a form of run-length encoding where sequences of the same length are defined by the count (run length). Then that data is again (ugh) run through a Huffman encoding whose own table reduces to just 19 bit lengths. Those are stored in a weird order which is intended to keep those alphabet members whose bit lengths are likely to be 0 near the end as trailing zeroes need not be included. So the entire Huffman table ends up being conveyed in just a handful of bytes. I guess people were really creative back then.
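
To make the run-length step a little less abstract, here is a sketch of how a code length array might be reduced to that 19 symbol alphabet. None of this is from the breadboard program; the symbol meanings come from RFC 1951 (16 repeats the previous length 3 to 6 times with 2 extra bits, 17 repeats zero 3 to 10 times with 3 extra bits, 18 repeats zero 11 to 138 times with 7 extra bits).

    // Hypothetical sketch: reduce an array of code lengths to (symbol, extra bits)
    //  pairs using the RFC 1951 repetition codes. Symbols 0..15 are literal lengths.
    static void rleLengths(int[] lengths, int count, ArrayList<int[]> out) {
        int n = 0;
        while (n < count) {
            int len = lengths[n];

            // measure the run of identical lengths
            int run = 1;
            while (n + run < count && lengths[n + run] == len)
                run++;

            if (len == 0 && run >= 3) {
                // long runs of zero use symbols 18 (11-138) and 17 (3-10)
                while (run >= 11) {
                    int take = Math.min(run, 138);
                    out.add(new int[] { 18, take - 11 });
                    run -= take;
                    n += take;
                }
                while (run >= 3) {
                    int take = Math.min(run, 10);
                    out.add(new int[] { 17, take - 3 });
                    run -= take;
                    n += take;
                }
            }
            else if (len != 0 && run >= 4) {
                // emit the length once then repeat it with symbol 16 (3-6 copies)
                out.add(new int[] { len, 0 });
                n++;
                run--;
                while (run >= 3) {
                    int take = Math.min(run, 6);
                    out.add(new int[] { 16, take - 3 });
                    run -= take;
                    n += take;
                }
            }

            // short runs and leftovers are emitted literally
            while (run > 0) {
                out.add(new int[] { len, 0 });
                run--;
                n++;
            }
        }
    }

The resulting stream of symbols 0..18 is what gets the second Huffman coding, with the extra bits simply following each 16, 17 or 18 in the bit stream.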

The procedure for reconstructing the Huffman table from the code lengths first requires a count of codes for each length. Let me add that to our table output. Here is the additional output from the execution.

Length=2 Count 1
Length=3 Count 1
Length=4 Count 2
Length=5 Count 2
Length=6 Count 13
Length=7 Count 15
Length=8 Count 14
Length=9 Count 8
Length=10 Count 15
Length=11 Count 4

From this table we can calculate the first binary code assigned to each code length group. Each alphabet member using that code length is then assigned an incremental binary value from that. I can add that calculation to this table.

Length=2 Count 1 Start Code 00
Length=3 Count 1 Start Code 010
Length=4 Count 2 Start Code 0110
Length=5 Count 2 Start Code 10000
Length=6 Count 13 Start Code 100100
Length=7 Count 15 Start Code 1100010
Length=8 Count 14 Start Code 11100010
Length=9 Count 8 Start Code 111100000
Length=10 Count 15 Start Code 1111010000
Length=11 Count 4 Start Code 11110111110

Okay so I can see that the Huffman table generated DOES NOT conform. As an exercise you can see for yourself. So I wonder where the error might be. Hmm…

CODE: SELECT ALL

bruce_dev /> jtest jniorboot.log
0x004 '.' count=1 length=11 code 10011110110
0x00a '.' count=9 length=6 code 110000
0x00b '.' count=13 length=6 code 110100
0x00c '.' count=4 length=8 code 10110011
0x00d '.' count=5 length=6 code 110010
0x00e '.' count=8 length=7 code 1010001
0x00f '.' count=3 length=8 code 10011101
0x010 '.' count=4 length=8 code 10111100
0x011 '.' count=8 length=7 code 1001100
0x012 '.' count=2 length=10 code 1001111000
0x013 '.' count=2 length=10 code 1001111001
0x020 ' ' count=41 length=2 code 00
0x028 '(' count=1 length=11 code 10011110111
0x029 ')' count=1 length=11 code 10110111110
0x02c ',' count=6 length=6 code 110110
0x02d '-' count=5 length=6 code 110011
0x02e '.' count=7 length=8 code 10110001
0x02f '/' count=3 length=9 code 100111110
0x030 '0' count=16 length=5 code 11111
0x031 '1' count=12 length=7 code 1011100
0x032 '2' count=12 length=7 code 1011101
0x033 '3' count=6 length=6 code 110111
0x034 '4' count=10 length=7 code 1001001
0x035 '5' count=7 length=8 code 10110100
0x036 '6' count=6 length=7 code 1001010
0x037 '7' count=4 length=8 code 10111101
0x038 '8' count=9 length=6 code 110001
0x039 '9' count=9 length=6 code 100000
0x03a ':' count=10 length=6 code 111000
0x041 'A' count=4 length=7 code 1000100
0x042 'B' count=2 length=9 code 111010000
0x043 'C' count=3 length=9 code 100111111
0x044 'D' count=1 length=11 code 10110111111
0x045 'E' count=2 length=9 code 111010001
0x046 'F' count=1 length=10 code 1110110110
0x047 'G' count=3 length=8 code 11101110
0x048 'H' count=1 length=10 code 1110110111
0x049 'I' count=3 length=8 code 11101111
0x04a 'J' count=1 length=10 code 1011111110
0x04d 'M' count=1 length=10 code 1011111111
0x04e 'N' count=4 length=7 code 1000101
0x04f 'O' count=3 length=8 code 11101010
0x050 'P' count=4 length=9 code 101101100
0x052 'R' count=3 length=8 code 11101011
0x053 'S' count=4 length=9 code 101101101
0x054 'T' count=3 length=7 code 1000110
0x055 'U' count=1 length=10 code 1110100110
0x057 'W' count=1 length=10 code 1110100111
0x05b '[' count=1 length=10 code 1110110100
0x05d ']' count=1 length=10 code 1110110101
0x061 'a' count=8 length=7 code 1001101
0x062 'b' count=5 length=7 code 1010010
0x063 'c' count=7 length=8 code 10110101
0x064 'd' count=9 length=6 code 100001
0x065 'e' count=29 length=4 code 0110
0x066 'f' count=1 length=10 code 1110100100
0x067 'g' count=2 length=10 code 1011011100
0x068 'h' count=2 length=10 code 1011011101
0x069 'i' count=14 length=6 code 101011
0x06b 'k' count=2 length=9 code 101111100
0x06c 'l' count=10 length=6 code 111001
0x06d 'm' count=6 length=7 code 1001011
0x06e 'n' count=9 length=7 code 1010000
0x06f 'o' count=17 length=4 code 0111
0x070 'p' count=5 length=7 code 1010011
0x072 'r' count=17 length=5 code 11110
0x073 's' count=11 length=7 code 1001000
0x074 't' count=15 length=6 code 101010
0x075 'u' count=8 length=8 code 10110000
0x076 'v' count=4 length=8 code 10011100
0x077 'w' count=2 length=9 code 101111101
0x079 'y' count=5 length=8 code 10110010
0x07a 'z' count=1 length=10 code 1110100101
0x100 '.' count=1 length=10 code 1011111100
0x101 '.' count=30 length=3 code 010

Length=2 Count 1 Start Code 00
Length=3 Count 1 Start Code 010
Length=4 Count 2 Start Code 0110
Length=5 Count 2 Start Code 10000
Length=6 Count 13 Start Code 100100
Length=7 Count 15 Start Code 1100010
Length=8 Count 14 Start Code 11100010
Length=9 Count 8 Start Code 111100000
Length=10 Count 15 Start Code 1111010000
Length=11 Count 4 Start Code 11110111110

Processing 10.395 seconds.
Source 954 bytes.
Result 680 bytes.
Ratio 1.40:1

bruce_dev /> 

Yeah I had at least one glitch but trying to generate the precise tree appropriate for the DEFLATE “shorthand” still eludes me. The search engines these days are much less effective for locating useful technical information than for finding ways to separate me from my money. It seems easier to reinvent the wheel and devise my own algorithm even though I know that a simple procedure is likely documented in numerous pages on the net.

It strikes me that all we need to do is determine the optimum bit length for each of the used alphabet members. It is almost irrelevant as to where in a tree a particular member ends up. Once we have the proper bit length a tree meeting the DEFLATE requirements can be directly created.

Perhaps the simple procedure for generating a valid Huffman tree ignoring the DEFLATE requirements can be employed and without actually building a tree structure. Note that when two leaves are combined you are simply assigning another bit to them regardless of which gets ‘0’ and which gets ‘1’. The bit length is incremented for the two leaves as it is combined into a node. In fact when you combine two nodes you need only increment the bit length (depth) for all of the members below it. So in creating a node I need only keep track of all of the leaves below it. A simple linked list suffices.

Such an implementation need not even retain intermediate nodes. You just need to maintain the node membership list. You need that so you can advance the bit count for all of the leaves below as the node is combined.

Maybe you follow me to this point or maybe not. I’ll go ahead and try an implementation.

Okay this new approach is golden! And it’s fast! Oh and I don’t need to build any damn tree!

    static void bufflush(BufferedOutputStream outfile) throws Throwable {
        
        // Determine frequencies by counting each occurrence of a byte value.
        //  Here we force the end-of-block code that we know we will use.
        int[] sym_cnt = new int[286];
//        sym_cnt[0x100] = 1;
        
        for (int n = 0; n < BUFPTR; n++) {
            
            // Get the byte value. This may be escaped with 0xff.
            int ch = BUFR[n] & 0xff;
            
            // not escaped
            if (ch != 0xff)
                sym_cnt[ch]++;
            
            // escaped
            else {
                
                // Get next byte.
                ch = BUFR[++n] & 0xff;
                
                // may just be 0xff itself
                if (ch == 0xff)
                    sym_cnt[0xff]++;
                
                // length-distance pair
                else {
                    
                    // obtain balance of length and distance values
                    int len = (ch << 8) + (BUFR[++n] & 0xff);
                    int dist = ((BUFR[++n] & 0xff) << 8) + (BUFR[++n] & 0xff);
                    
                    // determine length code to use (257..285)
                    for (int k = 0; k < blen.length; k++) {
                        if (len < blen[k]) {
                            sym_cnt[257 + k]++;
                            break;
                        }
                    }
                    
                    // determine distance code to use (0..29)
                    for (int k = 0; k < bdist.length; k++) {
                        if (dist < bdist[k]) {
                            sym_cnt[k]++;
                            break;
                        }
                    }
                    
                }
            }
        }
        
        // Create node list containing symbols in our alphabet that are found in the
        //  data. This will be sorted and used to assign bit lengths. Note list pointers
        //  are stored +1 to reserve 0 as a list terminator.
        int[] nodes = new int[286];
        int[] cnts = new int[286];
        int nodecnt = 0;
        for (int n = 0; n < 286; n++) {
            if (sym_cnt[n] > 0) {
                nodes[nodecnt] = n + 1;
                cnts[nodecnt] = sym_cnt[n];
                nodecnt++;                
            }
        }
        
        // Determine optimal bit lengths. Here we initialize a bit length array and a
        //  node membership list pointer array. These will be used as we generate
        //  the detail required for Huffman coding.
        int[] sym_len = new int[286];
        int[] sym_ptr = new int[286];
        
        // Perform Huffman optimization. This loops until we've folded all the leaves
        //  into a single head node.
        while (nodecnt > 1) {
            
            // The leaves are sorted by decreasing frequency (counts).
            for (int n = 0; n < nodecnt - 1; n++) {
                if (cnts[n] < cnts[n + 1]) {
                    int k = nodes[n];
                    nodes[n] = nodes[n + 1];
                    nodes[n + 1] = k;
                    k = cnts[n];
                    cnts[n] = cnts[n + 1];
                    cnts[n + 1] = k;
                    if (n > 0)
                        n -= 2;
                }
            }
 
            // The last two leaves/nodes have the lowest frequencies and are to 
            //  be combined. Here we increment the bit lengths for each and 
            //  merge leaves into a single list of node members.
            int ptr = nodes[nodecnt - 2];
            int add_ptr = nodes[nodecnt - 1];
            while (ptr > 0) {
                sym_len[ptr - 1]++;
                int p = sym_ptr[ptr - 1];
                if (p == 0 && add_ptr > 0) {
                    sym_ptr[ptr - 1] = add_ptr;
                    p = add_ptr;
                    add_ptr = 0;
                }
                ptr = p;
            }
            
            // Combine the last two nodes by adding their frequencies and dropping 
            //  the last.
            cnts[nodecnt - 2] += cnts[nodecnt - 1];
            nodecnt--;
            
        }
        
        // dump nonzero bit lengths
        for (int n = 0; n < 286; n++) {
            if (sym_len[n] > 0)
                System.out.printf("0x%03x '%c' count=%d optimal bits %d\n", n,
                        n >= 0x20 && n < 0x7f ? n : '.', sym_cnt[n], sym_len[n]);
        }
        
        outfile.write(BUFR, 0, BUFPTR);
        BUFPTR = 0;        
    }

Running this on the “go go gophers” test string again with LZ77 disabled yields the desired results.

bruce_dev /> jtest flash/gogo.dat
0x020 ' ' count=2 optimal bits 3
0x065 'e' count=1 optimal bits 3
0x067 'g' count=3 optimal bits 2
0x068 'h' count=1 optimal bits 4
0x06f 'o' count=3 optimal bits 2
0x070 'p' count=1 optimal bits 4
0x072 'r' count=1 optimal bits 4
0x073 's' count=1 optimal bits 4

Processing 0.540 seconds.
Source 13 bytes.
Result 13 bytes.
Ratio 1.00:1

bruce_dev />

Now I can use this to generate the DEFLATE compatible Huffman table.

If you multiply the frequency (count) times the bit length and add them for this example you get 37 bits which we know is the optimal for this example.
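
Spelled out using the counts and bit lengths from the table above:

    ' ' 2x3 + 'e' 1x3 + 'g' 3x2 + 'h' 1x4 + 'o' 3x2 + 'p' 1x4 + 'r' 1x4 + 's' 1x4
      =  6  +    3    +    6    +    4    +    6    +    4    +    4    +    4    =  37 bits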

The code with comments above should be reasonably understandable. If there are any questions I can describe what is going on. But basically the routine that counts occurrences of each alphabet symbol is as it was before. Unfortunately that is complicated a bit as I have to process the length-distance pointers to determine the encoding for the tally.

Next we create a list of leaves so we can combine the two with the lowest frequency of occurrence. Here’s where we simply increment the bit lengths for node members. The leaf list distills down as it would for the traditional Huffman process.

Next I can reinstate the LZ77 compressor and run this on real data. Then we can take the process further.

We can now use the procedure detailed in the DEFLATE specification to convert assigned bit lengths into binary codes for compression. We don’t need to build a tree.

First we tally the number of symbols in our alphabet that use each bit length.

        // count the occurrence of each bit length
        int[] bits = new int[19];
        for (int n = 0; n < 286; n++) {
            if (sym_len[n] > 0)
                bits[sym_len[n]]++;
        }

With this we can calculate the starting binary code for each bit length. Basically for each bit length we reserve N codes and use the next as a prefix for subsequent bit lengths.

        // determine starting bit code for each bit length
        int[] start = new int[19];
        int c = 0;
        for (int n = 0; n < 19; n++) {
            start[n] = c;
            c = (c + bits[n]) << 1;
        }

This gives us the correct first codes as we see here.

bruce_dev /> jtest flash/gogo.dat
bit length 2 count 2 first code 00
bit length 3 count 2 first code 100
bit length 4 count 4 first code 1100

Now we use these starting codes in assigning the binary codes to each symbol.

        // assign codes to used alphabet symbols
        int[] code = new int[286];
        for (int n = 0; n < 286; n++) {
            if (sym_len[n] > 0)
                code[n] = start[sym_len[n]]++;
        }

The results are displayed by the attached program.

CODE: SELECT ALL

bruce_dev /> jtest flash/gogo.dat
bit length 2 count 2 first code 00
bit length 3 count 2 first code 100
bit length 4 count 4 first code 1100

0x020 ' ' count=2 optimal bits 3 100
0x065 'e' count=1 optimal bits 3 101
0x067 'g' count=3 optimal bits 2 00
0x068 'h' count=1 optimal bits 4 1100
0x06f 'o' count=3 optimal bits 2 01
0x070 'p' count=1 optimal bits 4 1101
0x072 'r' count=1 optimal bits 4 1110
0x073 's' count=1 optimal bits 4 1111

Processing 0.654 seconds.
Source 13 bytes.
Result 13 bytes.
Ratio 1.00:1

bruce_dev />

This gives us all of the codes that we need to compress our data. In this case it is for the example string “go go gophers”. Happily we did not need to build any tree structure. And, this Huffman coding is compatible with the DEFLATE specification. We can move forward with the shorthand.

Curious? Here’s what I get using jniorboot.log. The content of that file has changed by the way as I have rebooted the JNIOR during the course of this topic. Here the LZ77 compression has also been re-enabled. The program however does not yet generate the bit stream compressed with these Huffman codes.

CODE: SELECT ALL

bruce_dev /> jtest jniorboot.log
bit length 4 count 3 first code 0000
bit length 5 count 8 first code 00110
bit length 6 count 20 first code 011100
bit length 7 count 22 first code 1100000
bit length 8 count 11 first code 11101100
bit length 9 count 18 first code 111101110

0x004 '.' count=1 optimal bits 9 111101110
0x00a '.' count=8 optimal bits 6 011100
0x00b '.' count=12 optimal bits 5 00110
0x00c '.' count=5 optimal bits 7 1100000
0x00d '.' count=5 optimal bits 7 1100001
0x00e '.' count=6 optimal bits 6 011101
0x00f '.' count=3 optimal bits 8 11101100
0x010 '.' count=5 optimal bits 7 1100010
0x011 '.' count=7 optimal bits 6 011110
0x012 '.' count=4 optimal bits 7 1100011
0x013 '.' count=4 optimal bits 7 1100100
0x020 ' ' count=41 optimal bits 4 0000
0x028 '(' count=1 optimal bits 9 111101111
0x029 ')' count=1 optimal bits 9 111110000
0x02c ',' count=6 optimal bits 6 011111
0x02d '-' count=5 optimal bits 7 1100101
0x02e '.' count=7 optimal bits 6 100000
0x02f '/' count=3 optimal bits 8 11101101
0x030 '0' count=18 optimal bits 5 00111
0x031 '1' count=15 optimal bits 5 01000
0x032 '2' count=9 optimal bits 6 100001
0x033 '3' count=8 optimal bits 6 100010
0x034 '4' count=9 optimal bits 6 100011
0x035 '5' count=5 optimal bits 7 1100110
0x036 '6' count=4 optimal bits 7 1100111
0x037 '7' count=5 optimal bits 7 1101000
0x038 '8' count=7 optimal bits 6 100100
0x039 '9' count=8 optimal bits 6 100101
0x03a ':' count=9 optimal bits 6 100110
0x041 'A' count=4 optimal bits 7 1101001
0x042 'B' count=2 optimal bits 8 11101110
0x043 'C' count=2 optimal bits 8 11101111
0x044 'D' count=1 optimal bits 9 111110001
0x045 'E' count=2 optimal bits 8 11110000
0x046 'F' count=1 optimal bits 9 111110010
0x047 'G' count=3 optimal bits 7 1101010
0x048 'H' count=1 optimal bits 9 111110011
0x049 'I' count=2 optimal bits 8 11110001
0x04a 'J' count=1 optimal bits 9 111110100
0x04d 'M' count=1 optimal bits 9 111110101
0x04e 'N' count=4 optimal bits 7 1101011
0x04f 'O' count=3 optimal bits 7 1101100
0x050 'P' count=5 optimal bits 7 1101101
0x052 'R' count=3 optimal bits 7 1101110
0x053 'S' count=4 optimal bits 7 1101111
0x054 'T' count=3 optimal bits 7 1110000
0x055 'U' count=1 optimal bits 9 111110110
0x057 'W' count=1 optimal bits 9 111110111
0x05b '[' count=1 optimal bits 9 111111000
0x05d ']' count=1 optimal bits 9 111111001
0x061 'a' count=8 optimal bits 6 100111
0x062 'b' count=5 optimal bits 7 1110001
0x063 'c' count=7 optimal bits 6 101000
0x064 'd' count=9 optimal bits 6 101001
0x065 'e' count=29 optimal bits 4 0001
0x066 'f' count=1 optimal bits 9 111111010
0x067 'g' count=2 optimal bits 8 11110010
0x068 'h' count=2 optimal bits 8 11110011
0x069 'i' count=14 optimal bits 5 01001
0x06b 'k' count=2 optimal bits 8 11110100
0x06c 'l' count=10 optimal bits 6 101010
0x06d 'm' count=6 optimal bits 6 101011
0x06e 'n' count=9 optimal bits 6 101100
0x06f 'o' count=17 optimal bits 5 01010
0x070 'p' count=5 optimal bits 7 1110010
0x072 'r' count=17 optimal bits 5 01011
0x073 's' count=11 optimal bits 6 101101
0x074 't' count=15 optimal bits 5 01100
0x075 'u' count=8 optimal bits 6 101110
0x076 'v' count=4 optimal bits 7 1110011
0x077 'w' count=2 optimal bits 8 11110101
0x079 'y' count=5 optimal bits 7 1110100
0x07a 'z' count=1 optimal bits 9 111111011
0x100 '.' count=1 optimal bits 9 111111100
0x101 '.' count=28 optimal bits 4 0010
0x102 '.' count=6 optimal bits 6 101111
0x103 '.' count=2 optimal bits 8 11110110
0x105 '.' count=1 optimal bits 9 111111101
0x10d '.' count=13 optimal bits 5 01101
0x10e '.' count=4 optimal bits 7 1110101
0x10f '.' count=1 optimal bits 9 111111110
0x110 '.' count=1 optimal bits 9 111111111

Processing 9.181 seconds.
Source 954 bytes.
Result 680 bytes.
Ratio 1.40:1

bruce_dev /> 

With the Huffman coding for DEFLATE there is one thing that we need to worry about. We need to limit the bit length to a maximum of 15. It is not very likely to occur I suspect. But this is because the bit length list for the alphabet is run-length encoded using a procedure that can only handle bit lengths of 0 to 15. Codes of 16, 17 and 18 are used to signal certain types of repetition. This is where I got the ’19’ I use in my breadboard code to dimension the bit length arrays. That need only be ’16’ as the 3 additional are repetition codes. If the Huffman coding results in a bit length exceeding 15 I will simply have to decrease the size of the block we are encoding until we are good to go.
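
A simple guard after the bit length optimization would catch that case. This snippet is only an illustration and is not part of the breadboard listing; sym_len[] is the bit length array from the code above.

        // DEFLATE cannot convey a code longer than 15 bits. If optimization produced
        //  one we would need to encode a smaller block and try again.
        for (int n = 0; n < 286; n++)
            if (sym_len[n] > 15)
                throw new IllegalStateException("bit length exceeds 15; reduce the block size");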

So now that we have the ability to reasonably compress our data using LZ77 and then to compress it further with Huffman coding, we are ready to generate the DEFLATE formatted payload. It is time to tackle the “craziness” and “shorthand” that I have referred to. We can apply the run-length encoding and the second iteration of Huffman coding that are required. We are ready to generate the DEFLATE format compressed bit stream.

JANOS has been able to process JAR/ZIP files since early in its development. This was required to meet our goal of executing Java directly from the JAR files generated by the compiler. So we have been able to decipher the DEFLATE format. I just hadn’t needed to generate it. But the advantage to this is that there is already proven code parsing the DEFLATE structure. Referring to that helps to remove any question when trying to figure out how to generate such a structure.

Rather than drop this topic now that I have the LZ77 and Huffman procedures that I need, I’ll take a moment to review the final steps. Let me see if I can clarify some of it here. For our Java breadboard to produce something usable I would not only have to complete the DEFLATE programming but also encapsulate the result in the JAR/ZIP library format. That’s more effort than I need given that I will be doing just that at the C level within the OS and I need to get to that soon.

DEFLATE Formatted Data

When file data is compressed using DEFLATE and included in a JAR/ZIP library it is represented as a bit stream. That sounds simple enough but because our micro-controller works with bytes and retrieves bytes from a file data stream we need to be concerned with bit order and byte order.

Normally bits in a byte are ordered from right to left with the least significant bit (the first in the bit stream) on the right. Like this:

+--------+
|76543210|
+--------+

The 9th bit in the stream then comes from the next byte and bytes are in sequence in increasing memory addresses (or increasing file position). This order of bits seems only natural as it should.

So if we were to retrieve a 5-bit value from the stream we would obtain the right 5 bits from the first byte using the mask 0x1f. Placing that in a byte of its own would give us the numeric value of the 5-bit data element. The next 5-bit element would use the remaining 3 bits in the first byte and 2 from the right side of the next. We would likely be using a right shift before applying the mask to pull those together.

Huffman codes will seem to contradict this. These codes are packed starting with the most-significant bit of the code. In other words the most significant bit of the first Huffman code would be found in the rightmost bit of the first byte. Once you realize that you must process Huffman codes a bit at a time and that you are reading single bit data elements this order makes sense. Pointing out that the Huffman code appears in the byte in reverse order serves to confuse us. But you are reading it a single bit at a time using each bit to decide which direction to descend through the Huffman tree. That means that you need the code’s most-significant bit first. We also never know in advance how many bits we are going to need to reach the first leaf of the tree and thus our first encoded member of the alphabet.
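
To keep the two orderings straight it may help to see a tiny bit writer. This is only a sketch with names of my own choosing (putBits and putCode), not code from the breadboard program. Ordinary data elements go in least significant bit first while a Huffman code is pushed starting from its most significant bit.

    // Hypothetical bit writer. Bits fill each output byte starting at bit 0.
    static int bitbuf = 0;
    static int bitcnt = 0;

    // write 'count' bits of 'value', least significant bit first (the header,
    //  HLIT/HDIST/HCLEN and any extra bits use this ordering)
    static void putBits(int value, int count, BufferedOutputStream out) throws Throwable {
        for (int n = 0; n < count; n++) {
            bitbuf |= ((value >> n) & 1) << bitcnt;
            if (++bitcnt == 8) {
                out.write(bitbuf);
                bitbuf = 0;
                bitcnt = 0;
            }
        }
    }

    // write a Huffman code of 'length' bits, most significant bit of the code first
    static void putCode(int code, int length, BufferedOutputStream out) throws Throwable {
        for (int n = length - 1; n >= 0; n--)
            putBits((code >> n) & 1, 1, out);
    }

At the end of the stream any partial byte left in bitbuf would still need to be written out.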

Block Format

The DEFLATE bit stream is broken down into a series of 1 or more blocks of bits. In our case when we flush our 64KB LZ77 data buffer we are going to construct a single block. Since we are compressing it you would expect that it will contain much less than 512Kb (64KB x 8 bits). For large files we will likely need to flush our buffer multiple times, creating a stream with multiple blocks.

Each block contains a 3-bit header. The first bit is called BFINAL and it is a 1 only for the last block in the stream. The next 2 bits are called BTYPE and these define the data encoding for this block. That means that we could use a different encoding for each block if we felt it to be beneficial. The 2 BTYPE bits gives us 4 options.

  00 - no compression
  01 - compressed with fixed Huffman codes
  10 - compressed with dynamic Huffman codes
  11 - reserved (error)

We have been working toward being able to generate blocks of BTYPE 10. We could have used the predefined Huffman tables in BTYPE 01 or not even bothered to compress using BTYPE 00. There may be times when our compression fails to reduce the size of our file data. In that case we could decide to include a block without compressing. But for now we will concern ourselves with BTYPE 10 that includes dynamic (not adaptive) Huffman tables (tables that change block to block).

So to start this defines our block so far. I’ll show the stream progressing from left to right with the number of bits for each element shown in parentheses.

 Bit 0         1         2         3         4         5         6         7
+---------+---------+---------+---------+---------+---------+---------+---------
| BFINAL  |     BTYPE (2)     |        Balance of Block . . .
+---------+---------+---------+---------+---------+---------+---------+---------

BFINAL set on the last block in the stream.

Now logically we know that somehow we need to get the Huffman table before we see compressed data so we can decompress that data. We also have seen that the Huffman table can be defined knowing the bit lengths for each of the symbols in our alphabet. In fact we expect that, since I made a big deal about having the right kind of Huffman table for DEFLATE that can be generated from just the bit lengths. So we are looking for that array of bit lengths. Here’s where the fun begins.

To start we know that not all of the 286 members of the alphabet will be used in the data. Some entries in that table will be assigned a 0 bit length. We have to assume that the majority of the literal bytes (0..255) will be represented in the data. We also know that the end-of-block code (256) will appear once. So we need an array of bit lengths at least 257 entries long. Beyond that we don’t know how many of the sequence length codes (257..285) will be used. But if some of the trailing codes aren’t used then the array doesn’t need to be 286 entries long. We just need the non-zero bit lengths. So this array will be 257 plus however many length codes we need to cover all of the non-zero ones.

The next element in the bit stream is HLIT. This is a 5-bit element that, when added to 257, defines the count of entries in the bit length array that will be provided. We will assume that the rest are 0. Since there are 29 length codes beyond the end-of-block code we need only know how many of those are covered to know the size of the array. That can be passed in a 5-bit element.

       Bit 0         1         2         3         4         5         6         7
      +---------+---------+---------+---------+---------+---------+---------+---------+-----
      | BFINAL  |     BTYPE (2)     |                     HLIT (5)                    |
      +---------+---------+---------+---------+---------+---------+---------+---------+-----

HLIT + 257 tells us how many of the 286 bit lengths will be provided. But, don’t expect that those will be forthcoming. At least not right away.

Next comes something that the breadboard program handled incorrectly. The distance codes for the length-distance pointers are Huffman coded using their own table. The length codes are Huffman coded using the same table as the literal data. This makes sense since when you retrieve the next code you don’t know if it is a literal or a length code. You do know that after the length code (and any extra bits) comes the distance code. So compressing those with their own table is a benefit. The DEFLATE specification (RFC 1951) states this but it isn’t all that obvious.

So wait! Now we need 2 Huffman tables and therefore 2 arrays of bit lengths. Yes we do.

Next we receive a 5-bit data element HDIST containing the number of distance code bit lengths minus 1. This defines the number of bit lengths to be supplied for the distance alphabet and the second Huffman table. The specification shows that there are 30 distance codes (0..29) but refers to 32 distance codes. It states that codes 30 and 31 will never occur in the data. These are perhaps reserved to allow for a larger sliding window in the future. There is also the need for a single 0 bit length which would be used as a flag to indicate that no distance codes are used at all and that the data is all literals. So to pass a bit length array size of up to 32 the value is stored minus 1.

       Bit 0         1         2         3         4         5         6         7
      +---------+---------+---------+---------+---------+---------+---------+---------+-----
      | BFINAL  |     BTYPE (2)     |                     HLIT (5)                    |
      +---------+---------+---------+---------+---------+---------+---------+---------+-----
           8         9        10        11        12        13        14        15
 -----+---------+---------+---------+---------+---------+---------+---------+---------+-----
      |                    HDIST (5)                    |
 -----+---------+---------+---------+---------+---------+---------+---------+---------+-----

Now we know the length of two arrays defining bit lengths for two alphabets. I had alluded to the fact that we would again use Huffman coding to compress the bit length array data. That is the case and so we need yet another Huffman table and therefore a third array of bit lengths. Will it never end??

The bit length alphabet includes 3 codes for repetition for a total of 19 codes. The size of the bit length array for this is conveyed in a 4-bit data element, HCLEN, to which we add 4. Note though that this array defines bit lengths for the codes in a very specific order. This was devised to keep those codes that typically have 0 bit length at the end of the array so they can be omitted. The order of the codes is as follows:

16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15

That means that we will have to arrange the bit lengths in this order before deciding how many to include in our stream. When reading these we would shuffle them into the proper location.

Our bit stream now looks like this.

       Bit 0         1         2         3         4         5         6         7
      +---------+---------+---------+---------+---------+---------+---------+---------+-----
      | BFINAL  |     BTYPE (2)     |                     HLIT (5)                    |
      +---------+---------+---------+---------+---------+---------+---------+---------+-----
           8         9        10        11        12
 -----+---------+---------+---------+---------+---------+-----
      |                    HDIST (5)                    |
 -----+---------+---------+---------+---------+---------+-----
          13        14        15        16
 -----+---------+---------+---------+---------+-----
      |               HCLEN (4)               |
 -----+---------+---------+---------+---------+-----

Well there is a lot of information so far conveyed in just over 2 bytes. Next we will finally receive some bit length data. Following the header we are supplied HCLEN + 4 data elements, each 3 bits (0..7), defining the code lengths for the Huffman table that we will use to decode the bit length arrays for the other tables. Note that these are in the predefined order and must be shuffled into the bit lengths for the 19 symbol alphabet. Now there are a variable number of these data elements and so I will no longer be able to number the bits nor can I show them all. HCLEN + 4 data elements, however many are required, follow in the stream.

          13        14        15        16        17        18        19   . . . 
 -----+---------+---------+---------+---------+---------+---------+---------+-----
      |               HCLEN (4)               |          CODE16 (3)         |
 -----+---------+---------+---------+---------+---------+---------+---------+-----

 -----+---------+---------+---------+---------+---------+---------+-----
      |          CODE17 (3)         |          CODE18 (3)         |
 -----+---------+---------+---------+---------+---------+---------+-----

 -----+---------+---------+---------+---------+-----
      |          CODE0 (3)          |       . . . 
 -----+---------+---------+---------+---------+-----
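To make the reading side concrete, here is a minimal sketch of pulling these header fields and the HCLEN + 4 code lengths. None of this is JANOS code; getbits() and the other names are just illustrative. Note that these fixed-width data elements are packed starting with their least-significant bit (unlike the Huffman codes which arrive most-significant bit first).

#include <stdint.h>

// Illustrative LSB-first bit reader for the fixed-width header elements.
static unsigned int getbits(const uint8_t *buf, uint32_t *bitpos, int nbits)
{
	unsigned int v = 0;
	int i;

	for (i = 0; i < nbits; i++, (*bitpos)++)
		v |= ((buf[*bitpos >> 3] >> (*bitpos & 7)) & 1) << i;
	return (v);
}

// The special ordering used for the 19 code length codes.
static const int clen_order[19] = {
	16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
};

// Sketch of reading the dynamic block header. Returns BFINAL, or -1 if the
//  block is not a dynamic Huffman block.
int read_header(const uint8_t *buf, uint32_t *bitpos,
	int *hlit, int *hdist, int clen_lens[19])
{
	int bfinal, btype, hclen, n;

	bfinal = getbits(buf, bitpos, 1);		// last block flag
	btype = getbits(buf, bitpos, 2);		// 2 = dynamic Huffman
	if (btype != 2)
		return (-1);

	*hlit = getbits(buf, bitpos, 5) + 257;	// literal/length code count
	*hdist = getbits(buf, bitpos, 5) + 1;	// distance code count
	hclen = getbits(buf, bitpos, 4) + 4;	// code length code count

	// read the 3-bit code lengths, shuffling each into its proper slot
	for (n = 0; n < 19; n++)
		clen_lens[n] = 0;
	for (n = 0; n < hclen; n++)
		clen_lens[clen_order[n]] = getbits(buf, bitpos, 3);

	return (bfinal);
}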

We now have a bit length array for our 19 symbol alphabet. At least we are familiar with this from the breadboard program. We can use the procedure to generate the Huffman code table. First we count the number of times each bit length is used. Then we calculate a starting code for each length. And then we assign sequential codes to the alphabet for each bit length group. So we can decipher Huffman codes. It’s a good thing too because now what follows in the bit stream are Huffman codes from this table.
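If seeing that as code helps, here is a rough sketch of reading one Huffman code a bit at a time using just the counts, no physical tree. It assumes a getbit() helper returning the next stream bit, a count[] array holding the number of codes of each bit length (the counting step above) and a symbol[] array listing the used symbols sorted by code length. These names are illustrative, not the JANOS routine.

#define MAX_BITS 15

extern int getbit(void);	// assumed: returns the next bit from the stream

// Sketch: decode one symbol using the canonical code properties. count[len]
//  is the number of codes of length len and symbol[] lists the used symbols
//  sorted by code length (shortest first, then by symbol value).
static int decode_symbol(const int *count, const int *symbol)
{
	int code = 0;	// bits collected so far (first bit read is most significant)
	int first = 0;	// first canonical code of the current length
	int index = 0;	// index in symbol[] of the first code of this length
	int len;

	for (len = 1; len <= MAX_BITS; len++)
	{
		code |= getbit();					// append the next bit

		// if the code falls within this length's range we have our symbol
		if (code - first < count[len])
			return (symbol[index + (code - first)]);

		index += count[len];				// skip past this length's symbols
		first = (first + count[len]) << 1;	// starting code for the next length
		code <<= 1;
	}
	return (-1);							// invalid code
}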

Keeping up? We are talking about what is involved in decompressing a DEFLATE stream but really we are doing this so we know what to do when we compress our own data. So now is the time to consider dropping some bread crumbs because if you are going to create a DEFLATE stream you will need to find your way back through this.

Supposedly at this point we can read Huffman codes, locate the symbol they represent and retrieve the bit length array data we were originally looking for. Well, almost. While some of the values we obtain will be bit lengths for the next entry in a bit length array, there are 3 special codes each requiring a different action. The specification describes them as follows.

               0 - 15: Represent code lengths of 0 - 15
                   16: Copy the previous code length 3 - 6 times.
                       The next 2 bits indicate repeat length
                             (0 = 3, ... , 3 = 6)
                          Example:  Codes 8, 16 (+2 bits 11),
                                    16 (+2 bits 10) will expand to
                                    12 code lengths of 8 (1 + 6 + 5)
                   17: Repeat a code length of 0 for 3 - 10 times.
                       (3 bits of length)
                   18: Repeat a code length of 0 for 11 - 138 times
                       (7 bits of length)

Here we see that when we encounter one of the codes 16, 17 or 18 we are required to pull 2, 3 or 7 additional bits respectively from the bit stream which define a repeat count.

First we will receive Huffman codes to define HLIT + 257 bit lengths for the literal Huffman table. Then we will receive codes defining HDIST + 1 bit lengths for the distance Huffman table. But don’t think the fun ends here.

HLIT and HDIST do not define the count of Huffman codes that follow. If you obtain a code that repeats a value, that one code counts for that many bit lengths. That perhaps makes sense. But to make things a little trickier, once you acquire the HLIT + 257 lengths you immediately start defining the HDIST + 1 lengths even if you are in the middle of a repetition. Yeah, a single repeat code can take you from one table to the next. If you are repeating some 0 bit lengths trailing in the HLIT bit length array you would just keep going to define any 0 bit lengths in the first part of the HDIST array. The specification says “code lengths form a single sequence of HLIT + HDIST + 258 values.”
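A sketch of that single sequence, assuming a next_clen_symbol() helper that returns the next symbol (0..18) decoded with the code length table and a next_bits() helper that pulls the extra repeat bits. Again these names are illustrative, not JANOS code.

extern int next_clen_symbol(void);		// assumed: next decoded code length symbol (0..18)
extern unsigned int next_bits(int nbits);	// assumed: next nbits from the stream, LSB first

// Sketch: read the HLIT + 257 literal lengths and the HDIST + 1 distance
//  lengths as one continuous sequence. The caller splits lens[] afterwards.
static int read_code_lengths(int *lens, int nlit, int ndist)
{
	int total = nlit + ndist;		// e.g. (HLIT + 257) + (HDIST + 1)
	int n = 0, sym, rep, prev;

	while (n < total)
	{
		sym = next_clen_symbol();
		if (sym < 16)
			lens[n++] = sym;				// a literal code length 0..15
		else if (sym == 16)
		{
			if (n == 0)
				return (0);					// nothing to repeat
			prev = lens[n - 1];
			rep = 3 + next_bits(2);			// repeat previous length 3..6 times
			while (rep-- && n < total)
				lens[n++] = prev;
		}
		else if (sym == 17)
		{
			rep = 3 + next_bits(3);			// repeat a zero length 3..10 times
			while (rep-- && n < total)
				lens[n++] = 0;
		}
		else
		{
			rep = 11 + next_bits(7);		// code 18: repeat zero 11..138 times
			while (rep-- && n < total)
				lens[n++] = 0;
		}
	}
	return (n == total);
}

The first nlit entries then become the literal table’s bit lengths and the remaining ndist entries the distance table’s.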

When you are generating these Huffman codes of course you don’t have to force it to be a single sequence. You might just be wasting a few bits. Today that’s not a big deal but it certainly must have been 40 years ago.

So start pulling Huffman codes. Remember you process these 1 bit at a time so you are starting with the most-significant bit of some code. With each bit you are either descending through a tree looking for a leaf and the corresponding symbol or otherwise collecting the code looking for a match in a code list. The former is faster but the latter easier to structure in memory (no tree). You proceed to process each symbol to define a bit length or repetition code.

Now you have 2 tables of bit lengths and you know how to generate the Huffman codes for the associated alphabets. What follows next in our bit stream are the actual Huffman codes for the compressed data block. Each Huffman code will either define a literal value (byte) or a length code. The byte you just push into your uncompressed stream and the sliding window. In the case of a length code you would retrieve the additional bits defining the length of a matched sequence. You would then use the distance Huffman table for the next code that together with extra bits defines a distance back into the sliding window. Push the referenced string into your uncompressed stream and the sliding window. This is repeated until you encounter the end-of-block code (256). If BFINAL was set for the block you can save your now uncompressed data and you are done. Otherwise another block will follow.
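One small detail worth seeing in code is the copy from the sliding window. Copying byte by byte naturally handles references where the length exceeds the distance (the match overlaps the bytes being written). A quick illustrative sketch, not the JANOS code:

// Sketch: expand a length-distance reference into the output. A distance of 1
//  with a length of 10 would repeat the last output byte 10 times.
static void copy_reference(unsigned char *out, unsigned int *optr,
	unsigned int length, unsigned int distance)
{
	unsigned int src = *optr - distance;

	while (length--)
		out[(*optr)++] = out[src++];
}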

Now we follow this logic backwards to figure out for our own compression effort what we need to do to generate the proper DEFLATE format for our data.

Okay, feel free to post questions, comments, corrections or whatever. I would be curious to know if I have helped anyone. I have written these posts as a form of review and preparation for myself. I am now ready to generate some C code in JANOS to make perhaps broad use of DEFLATE.

  • This will allow the existing JAR/ZIP JANOS command to create or modify compressed file collections. That will be helpful in applications that generate massive amounts of log data.
  • It will let JANOS create PNG graphics using drawing and plotting commands. This will allow us to easily display the data acquired by monitoring applications.
  • The WebServer can utilize DEFLATE to more efficiently transfer content. It can really help here as we already serve files (the DCP for example) directly out of a ZIP library. Whereas we presently decompress those files and send uncompressed content, the WebServer could forward the DEFLATE formatted data directly providing a bandwidth benefit.

The JANOS WebServer uniquely can locate and serve content directly from ZIP file collections. Generally files in a ZIP collection are compressed in DEFLATE format already. The WebServer can detect a browser’s ability to accept content in DEFLATE format and transfer the compressed content directly. Why spend the time to decompress before transfer?

Since this does not involve a DEFLATE compressor it was quick to implement. Starting with JANOS v1.6.4 (now in Beta) the WebServer will utilize DEFLATE content encoding when transferring files already compressed in that format provided that the browser can accept it. It works nicely.

I’ve been busy extending the JAR/ZIP command in JANOS to allow you to create, update and freshen an archive file. The first step was to just store files uncompressed. Getting all of that command line logic straight is enough of a headache. Once that was behind me I was ready to implement DEFLATE.

One difference in the JANOS implementation from the approach taken earlier in this thread is that I am going to work with buffers as opposed to a byte stream. Since the JNIOR uses a 64MB heap and generally consumes only a few MB of it I can load an entire file’s content in a memory buffer. Yeah, files on the JNIOR aren’t very large. This eliminates the queue approach to the sliding window. That helps with matching as it eliminates any need to test pointers for wrap around.

Where I used a bidirectional linked list before tracking occurrences of each byte value in the sliding window, I have gravitated to a series of queues tracking the last 256 matching bytes (or fewer if that be the case) in the 32KB preceding window. There is also no need now to keep any array of active matches since we are not streaming. So a pass through the appropriate queue for the current byte generally delivers a usable match and limits the search times in interest of speed. I get results within 2% or so of what the PC generates for the same file. This is certainly acceptable for JANOS.

I use the code length generator algorithm that I devised previously so as to not have to generate any Huffman tree physically. This for certain test files tended to hit the DEFLATE maximum code length limits. So I will describe a variation that seems to avoid that problem.

I will go through the implementation details here over the next day or so and maybe share some of the code. Remember that I am writing in C so…

DEFLATE Compressor Implementation (C Language)

The goal is to compress using DEFLATE a byte buffer filled with the entire contents of a file or other data. I want a simple function call like this:

FZIP_deflate(fbuff, filesize, &cbuff, &csize)

This function needs to return two things, a buffer containing the DEFLATE formatted bit stream and the compressed length. There are other ways to return these parameters but for JANOS which has to remain thread-safe (cannot use static variables) this works. This routine returns TRUE if we can successfully compress the data. It might return FALSE if in compressing the results we decide it isn’t going to be worth it. This function is used as follows in the JAR/ZIP command in JANOS.

				// optionally compress
				compressed = FALSE;
				compsize = filesize;
				if (filesize >= 128)
				{
					char *cbuff = NULL;
					uint32_t csize;
 
					// compress the content
					if (FZIP_deflate(fbuff, filesize, &cbuff, &csize))
					{
						compressed = TRUE;
						compsize = csize;
						MM_free(fbuff);
						fbuff = cbuff;
					}
					else
						MM_free(cbuff);
				}

Here we see that successful compression replaces the uncompressed buffer and modifies the compressed data size. It sets the compressed flag so the file can be properly saved in the JAR or ZIP archive. Like magic we have a compressed file!

Preface

Why am I doing this? I mean there is code out there and people have been able to construct archives and compress files literally for multiple decades. Why reinvent the wheel?

Well there are multiple reasons. First is a design goal for JANOS. This operating system uses no third-party written code. That sounds crazy but what it means is that there is no bug or performance issue that we cannot correct, change, improve or alter. And, this can be done quickly and in a timely fashion. Every bit of code is understood, clearly written and documented. It is written for a single target and not littered with conditional compilation blocks which obfuscate everything. If you support an operating system you might see how you could be envious of this.

Another reason is educational. Now that maybe is selfish but if I am going to be able to fully debug something I need to fully understand it. We cannot tolerate making what seems like a simple bug correction which later turns out to break some other part of the system. The only way to guarantee that this risk is minimized is for me to know everything that is going on and exactly what is going on. Yeah there is a JVM in here. Yeah it does TLS v1.2 encryption. It’s been fun.

The real problem though is that it is difficult to find good and complete technical information on the net. Yes there is RFC 1951 defining DEFLATE. It does not tell you how to do it, just what you need to do. And, some aspects of it are not clear until you encounter them (or recreate them) in action. It describes LZ77 but you don’t realize how difficult that is to implement without it taking 5 minutes to compress 100 KB.

There are numerous web pages discussing DEFLATE and some by reputable universities. These usually include a good discussion on Huffman coding. Yet I have found none that creates a Huffman table actually meeting the additional requirements for DEFLATE. If you are going to describe Huffman in connection with DEFLATE, shouldn’t it be compatible? Would it have helped if you actually had implemented DEFLATE before describing it?

The procedure to create a compatible Huffman tree is not described anywhere that I have found. Most don’t even mention that you need 3 separate Huffman tables (one for literals, one for distance codes and one for code lengths) and that there is a limit of a 15 bit code length for 2 of the tables and 7 bits for the third. Then they say only to “adjust the Huffman table” accordingly. So there is no procedure for generating a less than optimum Huffman tree meeting DEFLATE restrictions. I had to get creative myself.

Enough of my rant. The result really is that I have had to reinvent the wheel. I am not the only one to have done so. I am going to try to document it here for your edification.

Overview

First let me greatly over-simplify the process and provide an outline for the compression procedure.

  1. Perform efficient LZ77 scanning the uncompressed data byte-by-byte filling a 64 KB interim buffer with raw unmatched literal bytes and escaped length-distance references.
  2. Scan the interim LZ77 data creating two DEFLATE compatible Huffman tables, one for literals and length codes combined and one for distance codes.
  3. Assign code lengths (15 bits max) to the used alphabet for both tables.
  4. Determine the size of the alphabet required for the length and distance Huffman tables.
  5. Combine code lengths into a single run-length compressed array.
  6. Determine the DEFLATE compatible Huffman table needed to code this compressed code length array.
  7. Assign code lengths (7 bits max) to the used alphabet for code lengths.
  8. Sort the resulting code lengths for this 3rd Huffman table into the unique order specified for DEFLATE and determine the length of the array (trim trailing zeroes).
  9. Output the block header marking only the last as BFINAL and output the alphabet sizes.
  10. Output the reordered code lengths.
  11. Output the Huffman codes compressing the run-length encoded combined code length array. Insert extra bits where required.
  12. Output the Huffman codes compressing the LZ77 data. Use the literal table for literals and sequence lengths. Use the distance table for distance codes. Insert extra bits where required.
  13. Output end-of-block code.
  14. If not the final block keep the bit stream going and continue LZ77 at step #1.

Uh. That about summarizes it. All we can do is to push through this step by step. It amounts to two phases. The first compresses the data using LZ77 and determines the Huffman coding requirements. The second outputs a bit stream encoding the Huffman tables and then the actual data. We first determine everything that we need for the bit stream and then generate the bit stream itself.

Time-Efficient LZ77 Compression

First off let me note that we end up trading off compression ratio for processing speed. As we had experimented earlier in Java we could discover every possible matching sequence for a block of data and then analyze those matches selecting the optimum set. This arguably would create the best compression possible. This is certainly doable if we are running on a multiple core GHz processor and it is coded carefully. Still it would be lengthy and possibly not appropriate even then for some applications. The gain in compression ratio is expected to be only slight and not worth the processing cycles. It is certainly not critical for JANOS. So we will not go down this path.

Another approach is what is called lazy matching. Here we are concerned that a matched sequence might prevent a longer matching sequence starting in the next byte or two from being used. In the analysis it appears that 1 or 2 bytes may be unnecessarily forced into the output stream in these situations. Those may be rare but for certain types of data it could be more of a concern. Again the gain in compression ratio if we were to take the time to perform lazy matching is assumed not to be necessary for JNIOR.

As a result we are just going to go ahead and perform straight up sequence match detection for each byte position in the data. Even with this the amount of processing involved prevents us from using any kind of brute force scanning. Imagine how much processing would be involved if for each byte in the file we have to compare it directly against 32 KB of bytes in the preceding sliding window. For a large file this starts to become a very large number of processor cycles. It gets even worse when bytes match and you need to check following bytes to determine the usability of the sequence.

I had implemented the brute force search with no trickery at first. Small files produced proper LZ77 output in a short time but a 20 KB JSON file took almost 30 seconds (JNIOR remember). A large binary file basically stalled the system. The JSON performed more poorly due to the high occurrence of certain characters such as curly braces, quotes and colons. That code has long been discarded or I would include it here for better understanding.

In the prior Java experimentation I used a linked list to track the occurrences of matching byte values. This eliminated the need to scan the entire 32 KB sliding window saving time. For the JANOS implementation I decided to use a pointer queue for each byte value, each queue holding a maximum of 256 pointers. So basically we would test only the last 256 matching byte positions for usable sequences. This might miss some good sequence matches deep in the sliding window for very frequently occurring byte values but not for those appearing less often. Again it’s a trade off. It is an interesting approach.

I had figured that I could adjust the depth of these pointer queues. Since there are 256 possible literal values, each with 256 pointers of 4 bytes apiece, this results in a matrix of 65,536 pointers or 256 KB of memory. There is room for that in the JANOS heap. Increasing the depth increases the memory requirement as well as the time spent in sequence detection. I was pleased with the results at a depth of 256. Perhaps later I will conduct some experiments plotting the effects of this parameter on compression ratio and execution time.

I will present the resulting routine in the next post.

Here is the resulting routine. This handles only the LZ77 leaving all of the Huffman to the routine responsible for flushing the interim buffer. Note that I have made both the SLIDING_WINDOW and DEPTH parameters adjustable through the Registry. I will use this later to measure performance.

/* -- function ---------------------------------------------------------------
** FZIP_deflate()
**
** Compresses the supplied buffer.
**
** -------------------------------------------------------------------------*/
int FZIP_deflate(char *inb, uint32_t insize, char **outbuf, uint32_t *outsize)
{
	char *obuf;
	int optr;
	int err = FALSE;
	int curptr, len, seqlen;
	char *seqptr, *p1, *p2, *s1, *s2;
	struct bitstream_t stream;
	int *matrix, *mat, *track;
	int ch, trk;
	int window, depth;
 
	// obtain sliding window size
	window = REG_getRegistryInteger("Zip/Window", 16384);
	if (window < 2048)
		window = 2048;
	else if (window > 32768)
		window = 32768;
 
	// obtain tracking queue depth
	depth = REG_getRegistryInteger("Zip/Depth", 256);
	if (depth < 16)
		depth = 16;
	else if (depth > 1024)
		depth = 1024;
 
	// check call
	if (outbuf == NULL || outsize == NULL)
		return (FALSE);
 
	// initialize bit stream
	memset(&stream, 0, sizeof(struct bitstream_t));
	stream.buffer = MM_alloc(insize + 1024, &_bufflush);
 
	// create an output buffer
	obuf = MM_alloc(64 * 1024, &FZIP_deflate);
	optr = 0;
 
	// initialize matrix
	matrix = MM_alloc(256 * depth * sizeof(int), &FZIP_deflate);
	track = MM_alloc(256 * sizeof(int), &FZIP_deflate);
 
	// process uncompressed stream byte-by-byte
	curptr = 0;
	while (curptr < insize)
	{
		// get current byte value
		ch = inb[curptr];
 
		// Locate best match. This is the longest match located the closest to the curPtr. This
		//  is intended to be fast at the slight cost in compression ratio. We do not handle lazy
		//  matches or block optimization (selective matching). Only seqlen of 3 or more matter so
		//  we initialize seqlen to 2 to limit unnecessary best match updates. Try to limit cycles in
		//  this loop.
		mat = &matrix[depth * ch];
		trk = track[ch] - 1;
		if (trk < 0)
			trk = depth - 1;
		seqlen = 2;
		p2 = &inb[curptr];
		while (trk != track[ch])
		{
			if (mat[trk] == 0 || curptr - mat[trk] >= window)
				break;
 
			s1 = p1 = &inb[mat[trk] - 1];
			s2 = p2;
			while (*s1 == *s2)
				s1++, s2++;
 
			// check for improved match
			len = s1 - p1;
			if (len > seqlen)
			{
				seqptr = p1;
				seqlen = len;
			}
 
			trk--;
			if (trk < 0)
				trk = depth - 1;
		}

		// track the character
		mat[track[ch]] = curptr + 1;
		track[ch]++;
		if (track[ch] >= depth)
			track[ch] = 0;
 
		// check validity (match past end of buffer)
		if (curptr + seqlen > insize)
			seqlen = insize - curptr;
 
		// If we have a good sequence we output a pointer and advance curPtr
		if (seqlen >= 3)
		{
			// check maximum allowable sequence which is 258 bytes but we reserve one
			//  for 0xff escaping
			if (seqlen > 257)
				seqlen = 257;
 
			// escape length-distance pointer
			obuf[optr++] = 0xff;
			obuf[optr++] = seqlen - 3;
			*(short *)&obuf[optr] = p2 - seqptr;
			optr += 2;
 
			// advance curPtr
			curptr += seqlen;
			PROC_yield();
		}
 
		// otherwise we output the raw uncompressed byte and keep searching
		else
		{
			// escape 0xff
			if (*p2 == 0xff)
				obuf[optr++] = 0xff;
			obuf[optr++] = *p2;
			curptr++;
		}
 
		// flush output buffer as needed. Because we escape 0xff our length-pointer encoding
		//  requires 4 bytes. This at times replaces a 3-byte match and so the compression
		//  ration in this buffer is compromised. That will be corrected as we move into
		//  Huffman coding. Blocks will then be slightly less than 64KB. We also want to
		//  flush this before over-running it.
		if (optr > 65530)
		{
			if (!_bufflush(obuf, optr, &stream, FALSE))
				err = TRUE;
 
			// if compression seems fruitless
			if (stream.length > curptr)
				err = TRUE;
 
			optr = 0;
		}
 
		if (err)
			break;
	}
 
	// Flush remaining data
	if (!err && !_bufflush(obuf, optr, &stream, TRUE))
		err = TRUE;
 
	// if compression was fruitless
	if (stream.length > insize)
		err = TRUE;
 
	// clean up
	MM_free(track);
	MM_free(matrix);
 
	// return
	*outbuf = stream.buffer;
	*outsize = stream.length;
	return (!err);
}

So the preliminaries are over by line 50 in this code. You might notice that when JANOS allocates memory it retains a reference pointer. That is used to locate memory leaks among other things. I can see quickly where any block is allocated.

The main loop which processes byte-by-byte through the uncompressed data is done by line 150. After that we merely flush any partially compressed block and be done.

From lines 55 to 95 we search for the best match. From 95 to about 130 we either output the sequence length-distance reference or the raw unmatched data byte. After that we check to see if we need to flush the interim buffer.

The magic occurs in the search where we employ the pointer matrix. The track array keeps the current position in each byte value’s queue. This is where we add the pointer to the current byte once we’ve searched it. The search runs backward through the queue either until we’ve checked all of the pointers (wraps back to the starting position) or we run out of the sliding window (pointer reference too far back in the data). This checks for closer matches first. When a longer sequence of characters matches the current position we update our best match.

After that if we have a sequence of 3 or more bytes we output the length-distance reference otherwise we output the current byte and move on to process the next.

I know that I can optimize this some more. I haven’t gone through that step of trying to minimize processor cycles. We do use the RX compiler optimization. But… this is my approach for LZ77. There is no hash table or confusing prefix-suffix tricks. It seems to run fast enough at least for our needs at this point. The LZ77 search is where all of the time is consumed.

Next we will look at what happens when the interim buffer is flushed.

The FZIP_deflate() routine involves 90+ percent of the processing time and handles only step #1 in the prior outline. The buffer flush routine handles all of the remaining items.

We will be creating three different Huffman tables. These amount to a structure for each alphabet symbol.

struct huff_t {
	uint16_t code;
	uint16_t len;
	uint16_t cnt;
	uint16_t link;
};

Here the cnt will represent the symbol’s frequency. It is the count of occurrences of the symbol’s value in the data. The len will eventually be the code length in bits assigned to the symbol. The code will hold the bit pattern to be used in coding. And the link will be used in the code length determination routine.

The _bufflush() routine is called with a buffer of LZ77 compressed data. This includes raw data bytes that could not be included in a sequence and length-distance references for matches. Since data can take on all byte values I use a form of escaping. The escape character is 0xFF and a 0xFF in the data is represented by repeating the escape (e.g. 0xff 0xff). Otherwise the next byte is the sequence length minus 3 and the following short value is the distance. To make this work I disallow the sequence length of 258 since that would be confused with an escaped 0xFF. I don’t see this as having any significant impact. Later I could find a way around that but for now it works.

// Routine applies DEFLATE style Huffman coding to the buffer content.
static int _bufflush(char *obuf, int osize, struct bitstream_t *stream, int final)
{
	struct huff_t *littbl;
	struct huff_t *dsttbl;
	struct huff_t *cnttbl;
	int n, len, c, code, dst, lastc, ncnt;
	int *cntlist, *litcnts, *dstcnts, *bitcnts, *bitlens;
	int hlit, hdist, hclen;
	int *startcd;
	int totdist = 0;
	int err = FALSE;
 
	// Now we need to construct two Huffman trees (although I am going to avoid
	//  actual trees). One for the literal data and one for the distance codes.
	//  Note that extra bits are just extra bits inserted in the stream.
	littbl = MM_alloc(286 * sizeof(struct huff_t), &_bufflush);
	dsttbl = MM_alloc(30 * sizeof(struct huff_t), &_bufflush);
 
	// Not a loop. This allows the use of break.
	for (;;)
	{
 
		// Now we analyze the data to determine frequencies. Note that this is complicated
		//  just a bit because of the escaping that I have had to use. I will have to
		//  temporarily decode length and distance encoding. We'll have to do that again
		//  later when we stream the coding. We will also use one end-of-block code so we
		//  virtually count it first.
		littbl[256].cnt = 1;
		for (n = 0; n < osize; n++)
		{
			// tally literal if not escaped
			if (obuf[n] != 0xff)
				littbl[obuf[n]].cnt++;
			else
			{
				// check and tally escaped 0xff
				if (obuf[++n] == 0xff)
					littbl[0xff].cnt++;
				else
				{
					totdist++;
 
					// table defined above
					//static const int lcode_ofs[29] = {
					//	3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
					//	35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258
					//};
 
					// determine required length code for lengths (3..258). This code is
					//  coded in the literal table.
					len = (obuf[n++] & 0xff) + 3;
					for (c = 0; c < 29; c++)
						if (lcode_ofs[c] > len)
							break;
					code = 256 + c;
					littbl[code].cnt++;
 
					// table define above
					//static const int dcode_ofs[30] = {
					//	1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
					//	257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
					//	8193, 12289, 16385, 24577
					//};
 
					// determine required distance code for distances (1..32768). This code is
					//  coded in the distance table.
					dst = (obuf[n++] & 0xff) << 8;
					dst |= (obuf[n] & 0xff);
					for (c = 0; c < 30; c++)
						if (dcode_ofs[c] > dst)
							break;
					code = c - 1;
					dsttbl[code].cnt++;
				}
			}
		}

So here we create two of the Huffman tables, one for the literal alphabet (0..285) and one for the distance alphabet (0..29). Note that when JANOS allocates memory it is zero filled.

Don’t be confused by my use of for(;;) { }. This is not an infinite loop. In fact it is not a loop at all. Rather it allows me to exit the procedure at any point using break; and I just have to remember to place a break; at the very end. There are other ways to achieve the same thing.

The first step is to determine the frequency of symbols from both alphabets. Here we scan the supplied data and count the literals. The escaped length-distance references are translated temporarily into their length codes and distance codes. Those are tallied in the appropriate table. The extra bits are ignored. Length codes are combined with literal data since when they are read you don’t know which it will be. The distance codes use their own alphabet. We will have to do this same translation again later when we encode the references for output. Then, of course, we will insert the required extra bits.

Note that this also tallies one end-of-block code (0x100) as we will be using that.

If I were to dump these two tables after this they may look something like this. These are just the symbols that occur in a particular set of data – the jniorsys.log file. This is the symbol value followed by its count.


 0x00a 2
 0x00d 2
 0x020 70
 0x027 4
 0x028 11
 0x029 2
 0x02a 2
 0x02b 7
 0x02c 7
 0x02d 20
 0x02e 133
 0x02f 16
 0x030 105
 0x031 153
 0x032 124
 0x033 124
 0x034 103
 0x035 110
 0x036 97
 0x037 90
 0x038 93
 0x039 84
 0x03a 74
 0x03c 1
 0x03d 3
 0x03e 3
 0x041 5
 0x043 4
 0x044 1
 0x046 1
 0x048 1
 0x04a 3
 0x04c 2
 0x04d 4
 0x04e 4
 0x04f 4
 0x050 8
 0x052 8
 0x053 4
 0x054 4
 0x055 1
 0x057 3
 0x05a 1
 0x05f 3
 0x061 34
 0x062 4
 0x063 15
 0x064 26
 0x065 37
 0x066 9
 0x067 9
 0x068 8
 0x069 36
 0x06a 6
 0x06b 3
 0x06c 21
 0x06d 13
 0x06e 36
 0x06f 41
 0x070 24
 0x071 1
 0x072 29
 0x073 20
 0x074 29
 0x075 11
 0x076 6
 0x077 7
 0x078 3
 0x079 6
 0x07a 4
 0x100 1
 0x101 962
 0x102 457
 0x103 136
 0x104 127
 0x105 55
 0x106 71
 0x107 47
 0x108 14
 0x109 67
 0x10a 208
 0x10b 103
 0x10c 40
 0x10d 52
 0x10e 44
 0x10f 15
 0x110 42
 0x111 180
 0x112 287
 0x113 30
 0x114 5
 0x115 12
 0x116 6

 0x002 1
 0x003 2
 0x008 2
 0x009 2
 0x00a 13
 0x00b 150
 0x00c 166
 0x00d 40
 0x00e 127
 0x00f 63
 0x010 157
 0x011 112
 0x012 155
 0x013 116
 0x014 198
 0x015 169
 0x016 240
 0x017 197
 0x018 288
 0x019 215
 0x01a 321
 0x01b 226

We need to assign code lengths to these alphabet symbols. Here there are two tables and later we will process a third. So the procedure is handled by a separate routine.

The _bitlength() routine assigns code lengths creating possibly a slightly less than optimal Huffman table that does not exceed a 15-bit maximum. The assignments must make the Huffman tables compatible with the DEFLATE requirements.

Huffman Tables for DEFLATE

The buffer full of data that we have and which has been compressed using LZ77 will be compressed further using Huffman coding. The DEFLATE format specifies two separate Huffman code sets. One encodes the literal bytes in the data (0..255), the end-of-block code (256) and the sequence match length codes (257..285). The second Huffman code set encodes the distance codes (0..29). We can use a separate table for that because we know when we are reading a distance code as one always follows a length code. We never know whether we are reading a literal or a length code so those need to be decoded the same way and therefore from the same table.

Previously we scanned the data and counted the occurrences of each symbol. We now know which symbols occur in the data and how frequently. We have defined our alphabets for each Huffman table. Now we need to create the Huffman trees themselves. This is where things get tricky.

Creating a Huffman tree is not very difficult. But creating a Huffman tree that is compatible with the DEFLATE format is quite another thing altogether. The DEFLATE specification dictates that the Huffman trees must meet two additional rules. In fact they need to adhere to three rules. The third is mentioned later in the specification.

  1. All codes of a given bit length have lexicographically consecutive values, in the same order as the symbols they represent.
  2. Shorter codes lexicographically precede longer codes.
  3. The bit length cannot exceed 15 bits for the literal and distance code sets. It cannot exceed 7 bits for the code length set (comes into play much later).

The first trick is to not generate a tree at all. If you create a tree using the standard Huffman approach you are almost guaranteed to not have a tree that is usable for DEFLATE. All you need from that effort are the bit lengths that end up being assigned to each symbol. You can get those from the same procedure without dealing with right and left links and an actual tree structure. You then use the procedure defined in the DEFLATE specification to create the compatible tree.

The standard Huffman approach is to take a list of all the symbols that occur in the data and sort it in descending frequency. Combine the rightmost two least frequent symbols into a node whose frequency is the total of the two symbols. Next re-sort the list so this new node repositions itself according to its combined frequency. Now repeat the process combining the next two rightmost entries which may be symbols (leaves) or previously created nodes. This continues until you have just one node which is the head of your tree.

We are not going to bother to build the tree structure. We are only going to keep a list of the symbols that fall beneath a node. We are also going to realize that the combination of the two rightmost entries in the sorted list merely increases by one the bit length of each of the new node’s member symbols. When we finally reach the point where there is only one node in the list we would have assigned a bit length to every symbol based on its frequency. That is all we need to then create the codes for Huffman coding in DEFLATE format using the procedure for that outlined in the specification.

Here is where things really get confusing. This process doesn’t always create a DEFLATE compatible Huffman code set. Sometimes the bit length will exceed 15 (or 7 for the table later). We need a procedure for dealing with that. It amounts to being able to create a less than optimal Huffman tree with bit lengths limited to a maximum. This was a puzzle but I have a way to get it done.

So next I’ll take us through examples.

Let’s use an example. Here we have an alphabet of 19 symbols (0..18). The data set consists of 150 of these and after counting the occurrences of each we have the following. To make discussion simpler I will assign these symbols uppercase names. On the right are the results of the tally for each.

 
A  0x000 4
B  0x001 0
C  0x002 0
D  0x003 2
E  0x004 6
F  0x005 3
G  0x006 2
H  0x007 2
J  0x008 53
K  0x009 26
L  0x00a 5
M  0x00b 4
N  0x00c 3
P  0x00d 1
Q  0x00e 1
R  0x00f 1
S  0x010 37
T  0x011 0
U  0x012 0

In the standard approach to creating a Huffman table we ignore the symbols that do not appear in the data and arrange the others in order of decreasing frequency.

Used symbols:
  A    D    E    F    G    H    J    K    L    M    N    P    Q    R    S 
  4    2    6    3    2    2   53   26    5    4    3    1    1    1   37

Sorted by decreasing frequency:
  J    S    K    E    L    A    M    F    N    D    G    H    P    Q    R
 53   37   26    6    5    4    4    3    3    2    2    2    1    1    1

Next we combine the lowest two frequency symbols into a new node with a combined total. I will name the nodes with lowercase characters just so you can track them. The shorter list is then re-sorted before proceeding to repeat. The process continues until there is only one node. Here I will show only the combining action for each step. We will get into more detail afterwards.

Starting set:
  J    S    K    E    L    A    M    F    N    D    G    H    P    Q    R
 53   37   26    6    5    4    4    3    3    2    2    2    1    1    1

Step #1 combines Q and R into (a) with new frequency of 2. The list is resorted.
  J    S    K    E    L    A    M    F    N    D    G    H   (a)   P
 53   37   26    6    5    4    4    3    3    2    2    2    2    1

Step #2 combines (a) and P into (b) with new frequency of 3.
  J    S    K    E    L    A    M    F    N   (b)   D    G    H
 53   37   26    6    5    4    4    3    3    3    2    2    2

Step #3 combines G and H into (c) with new frequency of 4.
  J    S    K    E    L    A    M   (c)   F    N   (b)   D
 53   37   26    6    5    4    4    4    3    3    3    2

Step #4 combines (b) and D into (d) with new frequency of 5.
  J    S    K    E    L   (d)   A    M   (c)   F    N
 53   37   26    6    5    5    4    4    4    3    3

Step #5 combines F and N into (e) with new frequency of 6.
  J    S    K    E   (e)   L   (d)   A    M   (c)
 53   37   26    6    6    5    5    4    4    4

Step #6 combines M and (c) into (f) with new frequency of 8.
  J    S    K   (f)   E   (e)   L   (d)   A
 53   37   26    8    6    6    5    5    4

Step #7 combines (d) and A into (g) with new frequency of 9.
  J    S    K   (g)  (f)   E   (e)   L
 53   37   26    9    8    6    6    5

Step #8 combines (e) and L into (h) with new frequency of 11.
  J    S    K   (h)  (g)  (f)   E
 53   37   26   11    9    8    6

Step #9 combines (f) and E into (i) with new frequency of 14.
  J    S    K   (i)  (h)  (g)
 53   37   26   14   11    9

Step #10 combines (h) and (g) into (j) with new frequency of 20.
  J    S    K   (j)  (i)
 53   37   26   20   14

Step #11 combines (j) and (i) into (k) with new frequency of 34.
  J    S   (k)   K
 53   37   34   26

Step #12 combines (k) and K into (m) with new frequency of 60.
 (m)   J    S
 60   53   37

Step #13 combines J and S into (n) with new frequency of 90.
 (n)  (m)
 90   60

Step #14 combines (n) and (m) into our final node (p) with new frequency of 150.
 (p)
150

We are done.

Did you notice how nodes that we created were often quickly reused in another combination? This is what leads to a tree structure exceeding the maximum code length. Imagine a tree with one long branch down its right edge. It is very common when there are a few symbols that appear with high frequency and the balance are relatively low frequency symbols.

This combining nodes exercise was all well and good but something else needs to occur to make it useful. In the typical Huffman case in creating a node you would assign one of the combining nodes to the left link (bit = 0) and the other to the right link (bit = 1). This would then develop the tree structure that would work for you although most likely not one that is DEFLATE compatible.

I am going to avoid the tree but note that in the act of combination all of the symbols participating will have their code length increased by 1. Combining leaves into a node creates another level in the tree. That’s what we are doing. In the code I will use a linked list to simply collect the symbols that are a member of (or lie below) any given node. In combining I will concatenate the member lists for the two leaves/nodes being combined and then increment the code length for each member.

For each step I am going to show the node membership and the code length for our alphabet as we proceed through the process. Hopefully this table will make sense to you.

             Step   1    2    3    4    5    6    7    8    9    10   11   12   13   14  clen
A  0x000 4     0    0    0    0    0    0    0    1g   1g   1g   2j   3k   4m   4m   5p   5
D  0x003 2     0    0    0    0    1d   1d   1d   2g   2g   2g   3j   4k   5m   5m   6p   6
E  0x004 6     0    0    0    0    0    0    0    0    0    1i   1i   2k   3m   3m   4p   4
F  0x005 3     0    0    0    0    0    1e   1e   1e   2h   2h   3j   4k   5m   5m   6p   6
G  0x006 2     0    0    0    1c   1c   1c   2f   2f   2f   3i   3i   4k   5m   5m   6p   6
H  0x007 2     0    0    0    1c   1c   1c   2f   2f   2f   3i   3i   4k   5m   5m   6p   6
J  0x008 53    0    0    0    0    0    0    0    0    0    0    0    0    0    1n   2p   2
K  0x009 26    0    0    0    0    0    0    0    0    0    0    0    0    1m   1m   2p   2
L  0x00a 5     0    0    0    0    0    0    0    0    1h   1h   2j   3k   4m   4m   5p   5
M  0x00b 4     0    0    0    0    0    0    1f   1f   1f   2i   2i   3k   4m   4m   5p   5
N  0x00c 3     0    0    0    0    0    1e   1e   1e   2h   2h   3j   4k   5m   5m   6p   6
P  0x00d 1     0    0    1b   1b   2d   2d   2d   3g   3g   3g   4j   5k   6m   6m   7p   7
Q  0x00e 1     0    1a   2b   2b   3d   3d   3d   4g   4g   4g   5j   6k   7m   7m   8p   8
R  0x00f 1     0    1a   2b   2b   3d   3d   3d   4g   4g   4g   5j   6k   7m   7m   8p   8
S  0x010 37    0    0    0    0    0    0    0    0    0    0    0    0    0    1n   2p   2

In this table we follow the code length (clen) associated with each occurring symbol through the step by step combination process. Here as a symbol is combined into a new node (letter changes) we increment its bit depth or code length. The final column shows the resulting clen for this table.

So we mechanically have shown how to derive the code lengths for an alphabet symbol set with given frequencies. Next we create the DEFLATE Huffman code table for this.

My code to perform the node creation and code length incrementing looks something like this.

// Establish bit length for the Huffman table based upon frequencies
static int _bitlength(struct huff_t *tbl, int ncodes, int maxb)
{
	uint16_t *list = MM_alloc(ncodes * sizeof(uint16_t), &_bitlength);
	int *freq = MM_alloc(ncodes * sizeof(int), &_bitlength);
	int nlist = 0;
	int n, c, p;
	int ret = TRUE;
 
	// List all of the symbols used in the data along with their frequencies. Note that
	//  we store pointers +1 so as to keep 0 as a linked list terminator.
	for (n = 0; n < ncodes; n++)
		if (tbl[n].cnt > 0)
		{
			list[nlist] = n + 1;
			freq[nlist] = tbl[n].cnt;
			nlist++;
		}
 
	// Note that there is a special boundary case when only 1 code is used. In this case
	//  the single code is encoded using 1 bit and not 0.
	if (nlist == 1)
		tbl[list[0] - 1].len = 1;
 
	// process this list down to a single node
	while (nlist > 1)
	{
		// sort the list by decreasing frequency
		for (n = 0; n < nlist - 1; n++)
			if (freq[n] < freq[n + 1])
			{
				// swap order
				c = list[n];
				list[n] = list[n + 1];
				list[n + 1] = c;
				c = freq[n];
				freq[n] = freq[n + 1];
				freq[n + 1] = c;

				// need to sort back
				if (n > 0)
					n -= 2;
			}
 
		// Combine the member lists associated with the last two entries. We combine the
		//  linked lists for the two low frequency nodes.
		p = list[nlist - 2];
		while (tbl[p - 1].link)
			p = tbl[p - 1].link;
		tbl[p - 1].link = list[nlist - 1];
 
		// The new node has the combined frequency.
		freq[nlist - 2] += freq[nlist - 1];
		nlist--;
 
		// Increase the code length for members of this node.
		p = list[nlist - 1];
		while (p)
		{
			tbl[p - 1].len++;
			p = tbl[p - 1].link;
		}
 
	}
 
	MM_free(freq);
	MM_free(list);
	return (ret);
}

You might note the check at line 20 handling a special case when there is only one used item in our alphabet. Here we need to use a code length of 1.

Now to be fair there is a lot more that I will be adding to this routine before we are done.

Now that we know the lengths of the code that we will be using to compress the data we can predict the compression ratio.

        freq  clen     freq*clen
A  0x000 4     5           20
D  0x003 2     6           12
E  0x004 6     4           24
F  0x005 3     6           18
G  0x006 2     6           12
H  0x007 2     6           12
J  0x008 53    2          106
K  0x009 26    2           52
L  0x00a 5     5           25
M  0x00b 4     5           20
N  0x00c 3     6           18
P  0x00d 1     7            7
Q  0x00e 1     8            8
R  0x00f 1     8            8
S  0x010 37    2           74
                      -----------
                          416 bits (52 bytes)

If we multiply the frequency of a symbol times the code length and total that for the set we get the total number of bits required to encode the original message. Originally we had 150 bytes or 1200 bits. When we are done we can store that same message in only 52 bytes. We’ve reduced the data to about a third of its original size.
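In code this prediction is just a sum over the table, something like this sketch using the huff_t structure from earlier (the extra bits for length and distance codes are not counted here):

// Sketch: predicted size in bits of the Huffman coded output for one table.
static unsigned int _codedbits(const struct huff_t *tbl, int ncodes)
{
	unsigned int bits = 0;
	int n;

	for (n = 0; n < ncodes; n++)
		bits += tbl[n].cnt * tbl[n].len;
	return (bits);
}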

Let’s see how to derive the bit codes that we will use in encoding the data.

We want to derive the actual binary code patterns for encoding this symbol set. The DEFLATE specification tells us to first count the number of codes for each code length.

N  bl_count[N]
0      0
1      0
2      3
3      0
4      1
5      3
6      5
7      1
8      2

Next we find the numerical value of the smallest code for each code length. The following code is provided by the specification.

        code = 0;
        bl_count[0] = 0;
        for (bits = 1; bits <= MAX_BITS; bits++) {
            code = (code + bl_count[bits-1]) << 1;
            next_code[bits] = code;
        }

In performing this procedure we get the following. Here I will also show the codes in binary form.

N  bl_count[N]  next_code[N]
0      0
1      0            0
2      3            0     00
3      0            6     110
4      1           12     1100
5      3           26     11010
6      5           58     111010
7      1          126     1111110
8      2          254     11111110 

Now we assign codes to each symbol based upon its length. The DEFLATE specification provides this code snippet. Basically the above defines the starting code which we increment after each use.

        for (n = 0; n <= max_code; n++) {
            len = tree[n].Len;
            if (len != 0) {
                tree[n].Code = next_code[len];
                next_code[len]++;
            }
        }
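Putting those two snippets together against the huff_t structure used earlier, the complete code assignment might look like this sketch (MAX_BITS being 15, or 7 for the code length table; again this is illustrative rather than the JANOS routine):

// Sketch: derive the canonical DEFLATE codes from the assigned code lengths.
static void _assigncodes(struct huff_t *tbl, int ncodes, int maxbits)
{
	int bl_count[16] = { 0 };
	int next_code[16] = { 0 };
	int code = 0;
	int bits, n;

	// count the number of codes of each length
	for (n = 0; n < ncodes; n++)
		bl_count[tbl[n].len]++;
	bl_count[0] = 0;

	// find the smallest code for each code length
	for (bits = 1; bits <= maxbits; bits++)
	{
		code = (code + bl_count[bits - 1]) << 1;
		next_code[bits] = code;
	}

	// assign sequential codes to the symbols within each length
	for (n = 0; n < ncodes; n++)
		if (tbl[n].len != 0)
			tbl[n].code = next_code[tbl[n].len]++;
}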

And this ends up giving us the following codes for encoding this data.

        freq  clen     code
A  0x000 4     5      11010
D  0x003 2     6      111010
E  0x004 6     4      1100
F  0x005 3     6      111011
G  0x006 2     6      111100
H  0x007 2     6      111101
J  0x008 53    2      00
K  0x009 26    2      01
L  0x00a 5     5      11011
M  0x00b 4     5      11100
N  0x00c 3     6      111110
P  0x00d 1     7      1111110
Q  0x00e 1     8      11111110
R  0x00f 1     8      11111111
S  0x010 37    2      10

Take a few moments to picture what is going on. Basically with just 2 bits you can encode no more than 4 symbols. Since we have more than 4 symbols to encode we cannot use all 4 combinations of two bits. We reserve 1 or more bit combinations as a prefix indicating that an additional bit or more will be needed to identify other symbols. The decompressor will be processing the bit stream 1 bit at a time as it doesn’t know in advance how many bits will be needed to identify the next symbol.

In this symbol set it turns out that we use all but the last combination of two bits and 11b becomes the prefix accessing the rest of the symbol set. Note how this coincides with the 3 high frequency codes. It then turns out to be most efficient not to use any 3 bit codes and to jump right to 4 bits for the next possible symbol encoding. In fact, for any bit length, if you reserve only the last combination as a prefix then the next bit length has only two combinations (a node). For 3 bit codes here those would be 110b and 111b. If we reserve all of the combinations as prefixes then we can encode more symbols at the next length. Here for the 4 bit codes there are 4 combinations: 1100b, 1101b, 1110b, and 1111b. Again this tree decided to use only one of the 4 bit combinations for a symbol.

Another thing to notice is that the final code for the largest bit length corresponds to the rightmost leaf in the tree. For DEFLATE that lexicographically is the longest code and requires a series of 1 bits to reach. So for this 8 bit code the last symbol is identified by 11111111b. This last code should always be 2**N – 1 where N is the largest bit length (code length). Note that ‘**’ indicates exponentiation here. Two is raised to the power of N.

If you think about it, the set of code lengths has to be just right for codes to be assigned this way and to not overflow those available for any one bit length. This is assured by the procedure. If you try some random code lengths you will quickly see what I mean. In general you will be trying to create an impossible tree.

But wait!!

This table looks suspiciously like the code length encoding (third Huffman table that we have not discussed as yet). If it is, didn’t you mention that it would be limited to a bit depth or code length of 7? This one is 8 bits. Is that okay?

No. It is not okay. You are right. I purposely chose this real-world example which actually does violate that code length rule. So this is not a DEFLATE compatible table. At least not for that third Huffman coding. This table occurs in trying to compress the /etc/JanosClasses.jar file currently on my development JNIOR. And actually before this the literal table exceeds the 15 bit limit. The resulting compression fails.

Most of what you read now tells you to “adjust the Huffman tree” accordingly and presses on without a hint as to how you might do that. You can certainly detach and reattach nodes to get it done but how do you know that you haven’t significantly affected the compression ratio? You could punt and use the fixed Huffman tables afforded by another DEFLATE block type BTYPE 01. You know that you can’t just fiddle with the code lengths because you will end up trying to create an impossible tree. So what now?

Well, I can show you how to get it done.

Adjust the Huffman Tree Accordingly

We have created the optimum Huffman table for coding our data. Unfortunately we find out that it is not compatible for use with DEFLATE. The DEFLATE specification dictates that this particular table have a code length maximum of 7 bits. That being forced by the fact that these code lengths are stored in the DEFLATE stream using 3 bits each. That limits code lengths for this table to the set containing 0 thru 7.

Our table is too deep. It requires 8 bits to encode our data. What do we do about that? It seems that anything we do will reduce the efficiency of the compression. We need to create a less than optimum Huffman coding. How do we do that and keep the impact at a minimum? How do we not seriously damage the compression ratio?

We are going to adjust our Huffman table so that it does not exceed the maximum bit depth. Ideally we want to stop incrementing the code length of any symbols that reach the maximum. How can we properly do that? Well, we have to go back to the math. Let’s understand what makes a certain set of code lengths valid while others are not.

In the prior post we assigned code lengths to symbols in our alphabet based upon a tree construction algorithm. The steps in this algorithm are those that create a valid tree. It is not surprising then that our set of code lengths represents a real tree. As a result when we calculate the starting codes for each bit length the process ends up with usable codes. And as we noticed the last code in the tree is in fact 2**N – 1. In our case this is 2**8 – 1 or 255 and in binary that being 11111111b.

Let me expand the loop in the starting code generation so we can see what happens. Here we will generate starting codes (S0..S8) for our code lengths. We will use a shorthand for the code length counts (N0..N8). In our example those are N = {0 0 3 0 1 3 5 1 2}. Note too that a left shift of 1 is equivalent to multiplication by 2. The starting codes (S) are calculated as follows:

    S1 = (S0 + N0) << 1 = (0 + 0) << 1 = 2 * 0 = 0
    S2 = (S1 + N1) << 1 = (0 + 0) << 1 = 2 * 0 = 0         00
    S3 = (S2 + N2) << 1 = (0 + 3) << 1 = 2 * 3 = 6         110
    S4 = (S3 + N3) << 1 = (6 + 0) << 1 = 2 * 6 = 12        1100
    S5 = (S4 + N4) << 1 = (12 + 1) << 1 = 2 * 13 = 26      11010
    S6 = (S5 + N5) << 1 = (26 + 3) << 1 = 2 * 29 = 58      111010
    S7 = (S6 + N6) << 1 = (58 + 5) << 1 = 2 * 63 = 126     1111110
    S8 = (S7 + N7) << 1 = (126 + 1) << 1 = 2 * 127 = 254   11111110

There are two things to notice here other than the fact that this matches the table generated earlier. First, the count of 8 bit code lengths (N8) doesn’t come into play. Yet we know that there are 2 and the first will be assigned 11111110b and the second 11111111b. This being the 2**8 – 1 that we now expect. The second thing is that all of the starting codes are even numbers. That being driven by the fact that they are the product of multiplication by 2. We will use this fact later.

Now I can reverse this procedure to generate starting codes back from the maximum bit length knowing that the last code must be 2**N – 1. So for this table we get the following:

    S8 = 2**8 - N8 = 2**8 - 2 = 254
    S7 = S8/2 - N7 = 254/2 - 1 = 127 - 1 = 126
    S6 = S7/2 - N6 = 126/2 - 5 = 63 - 5 = 58
    S5 = S6/2 - N5 = 58/2 - 3 = 29 - 3 = 26
    S4 = S5/2 - N4 = 26/2 - 1 = 13 - 1 = 12
    S3 = S4/2 - N3 = 12/2 - 0 = 6
    S2 = S3/2 - N2 = 6/2 - 3 = 3 - 3 = 0
    S1 = 0
    S0 = 0

This is just a matter of running the calculations backwards and knowing (or realizing) that the last code has to be 2**N – 1. You can see it generates the same results.
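That reversed calculation can double as a quick validity test for a proposed set of code length counts. Here is a rough sketch (for a complete tree; the single-code special case mentioned earlier would be handled separately, and TRUE/FALSE follow the conventions of the earlier listings):

// Sketch: verify that the code length counts bl_count[1..maxbits] describe a
//  valid complete tree by running the starting code calculation backwards.
//  Every intermediate value must be even and the chain must land on zero.
static int _checklengths(const int *bl_count, int maxbits)
{
	int s = (1 << maxbits) - bl_count[maxbits];		// S for the maximum length
	int len;

	for (len = maxbits; len > 1; len--)
	{
		// forward starting codes are doubled so they are always even
		if (s < 0 || (s & 1))
			return (FALSE);
		s = s / 2 - bl_count[len - 1];
	}

	// the length 1 starting code must come out to exactly zero
	return (s == 0);
}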

Now what if we decide to not accept a bit depth exceeding the maximum? So we are going to force those two symbols wanting to be 8 bit codes to be two additional 7 bit codes. So our code length array will look like this: N = {0 0 3 0 1 3 5 3}. Here we combined N7 and N8 and eliminated 8 bit codes altogether. Legal? Of course not. You cannot visualize what that does to a tree. Let's try the calculations back from the new maximum of 2**7 - 1.

    S7 = 2**7 - N7 = 128 - 3 = 125

Here this fails immediately. We know that starting codes (S) must be even numbers and 125 is odd! Not surprising as we are kind of floating a pair of leaves up in the air somehow. Can we make a home for them?

Clearly if we were to reattach those leaves somewhere else in the tree structure other nodes must be increased in bit depth. We need another 7 bit symbol to get S7 to be an even 124. To do that with minimum impact on compression ratio we increase the 6 bit coded symbol with the lowest frequency to 7 bits. Our array now being N = {0 0 3 0 1 3 4 4}. Try again:

    S7 = 2**7 - N7 = 128 - 4 = 124
    S6 = 124/2 - N6 = 62 - 4 = 58
    S5 = S6/2 - N5 = 58/2 - 3 = 29 - 3 = 26
    S4 = S5/2 - N4 = 26/2 - 1 = 13 - 1 = 12
    S3 = S4/2 - N3 = 12/2 - 0 = 6
    S2 = S3/2 - N2 = 6/2 - 3 = 3 - 3 = 0
    S1 = 0
    S0 = 0

Um. Everything seemed to fit right in. Is this tree valid? Let’s see.

N  bl_count[N]  next_code[N]
0      0
1      0            0
2      3            0     00
3      0            6     110
4      1           12     1100
5      3           26     11010
6      4           58     111010
7      4          124     1111100
        freq  clen     code
A  0x000 4     5      11010
D  0x003 2     6      111010
E  0x004 6     4      1100
F  0x005 3     6      111011
G  0x006 2     6      111100
H  0x007 2     6      111101
J  0x008 53    2      00
K  0x009 26    2      01
L  0x00a 5     5      11011
M  0x00b 4     5      11100
N  0x00c 3     7      1111100
P  0x00d 1     7      1111101
Q  0x00e 1     7      1111110
R  0x00f 1     7      1111111
S  0x010 37    2      10

This appears to have successfully generated a Huffman tree that would work with DEFLATE format! Let’s look at the compression ratio for this.

        freq  clen     freq*clen
A  0x000 4     5           20
D  0x003 2     6           12
E  0x004 6     4           24
F  0x005 3     6           18
G  0x006 2     6           12
H  0x007 2     6           12
J  0x008 53    2          106
K  0x009 26    2           52
L  0x00a 5     5           25
M  0x00b 4     5           20
N  0x00c 3     7           21
P  0x00d 1     7            7
Q  0x00e 1     7            7
R  0x00f 1     7            7
S  0x010 37    2           74
                      -----------
                          417 bits (53 bytes)

Wait! This only cost us 1 bit? Yes it did, but to store it we would need another whole byte. So this procedure is likely (though not proven) to have a minimal impact on the compression ratio. Yet, it corrects the table to ensure that it is compatible with DEFLATE.

To generalize the process, the reversed starting code calculation is repeated from 2**N - 1, where N is the maximum bit depth, back down to bit length 0. If at any point the calculated starting code is not even, you increase the bit depth of the next least frequent symbol so that it is included at this code length, making the starting code even.

In my next post I will show code for this.

The complete procedure for generating a DEFLATE format compatible Huffman table limited to a maximum bit depth is shown here. I know that this is not optimized code. There is some unnecessary execution but I had wanted to keep steps separate and clear. You can be sure that over time I will optimize the coding and obfuscate it suitably for all future generations. ;)

This routine has the capacity to return FALSE if a table cannot be created. It was doing just that when the bit depth (code length) exceeded maximum. That has since been corrected. It will always return TRUE now.

// Establish bit length for the Huffman table based upon frequencies
static int _bitlength(struct huff_t *tbl, int ncodes, int maxb)
{
	uint16_t *list = MM_alloc(ncodes * sizeof(uint16_t), &_bitlength);
	int *freq = MM_alloc(ncodes * sizeof(int), &_bitlength);
	int nlist = 0;
	int n, c, p;
	int ret = TRUE;
	uint16_t *ptr;
 
	// List all of the symbols used in the data along with their frequencies. Note that
	//  we store pointers +1 so as to keep 0 as a linked list terminator.
	for (n = 0; n < ncodes; n++) if (tbl[n].cnt > 0)
		{
			list[nlist] = n + 1;
			freq[nlist] = tbl[n].cnt;
			nlist++;
		}
 
	// Note that there is a special boundary case when only 1 code is used. In this case
	//  the single code is encoded using 1 bit and not 0.
	if (nlist == 1)
		tbl[list[0] - 1].len = 1;
 
	// process this list down to a single node
	while (nlist > 1)
	{
		// sort the list by decreasing frequency
		for (n = 0; n < nlist - 1; n++)
			if (freq[n] < freq[n + 1])
			{
				// swap order
				c = list[n]; list[n] = list[n + 1]; list[n + 1] = c;
				c = freq[n]; freq[n] = freq[n + 1]; freq[n + 1] = c;

				// need to sort back
				if (n > 0)
					n -= 2;
			}
 
		// Combine the member lists associated with the last two entries. We combine the
		//  linked lists for the two low frequency nodes.
		p = list[nlist - 2];
		while (tbl[p - 1].link)
			p = tbl[p - 1].link;
		tbl[p - 1].link = list[nlist - 1];
 
		// The new node has the combined frequency.
		freq[nlist - 2] += freq[nlist - 1];
		nlist--;
 
		// Sort the members of this node by decreasing code length. Longer codes to the
		//  left. This will also sort the frequency of the symbols in increasing order
		//  when code lengths are equal. We need this arrangement for the next step should
		//  we be required to balance the tree and avoid exceeding the maximum code
		//  length (maxb).
		p = TRUE;
		while (p)
		{
			p = FALSE;
 
			ptr = &list[nlist - 1];
			while (*ptr && tbl[*ptr - 1].link)
			{
				c = tbl[*ptr - 1].link;
				if ((tbl[*ptr - 1].len < tbl[c - 1].len) || (tbl[*ptr - 1].len == tbl[c - 1].len && tbl[*ptr - 1].cnt > tbl[c - 1].cnt))
				{
					n = tbl[c - 1].link;
					tbl[*ptr - 1].link = n;
					tbl[c - 1].link = *ptr;
					*ptr = c;
					p = TRUE;
				}
				ptr = &tbl[*ptr - 1].link;
			}
		}
 
		// Increase the code length for members of this node. We cannot exceed the maximum
		//  code length (maxb).
		p = list[nlist - 1];
		while (p)
		{
			if (tbl[p - 1].len < maxb)
				tbl[p - 1].len++;
 
			p = tbl[p - 1].link;
		}
 
		// Now verify the structure. This should be absolutely proper up until the point when
		//  we prevent code lengths from exceeding the maximum. Once we do that we are likely
		//  creating an impossible tree. We will need to correct that.
		p = list[nlist - 1];
		c = tbl[p - 1].len;
		if (c == maxb)
		{
 
			n = 1 << c;
			while (p)
			{
				if (tbl[p - 1].len == c)
					n--;
				else
				{
					// n must be even at this point or we extend the length group
					if (n & 1)
					{
						tbl[p - 1].len = c;
						n--;
					}
					else
					{
						c--;
						n /= 2;
					}
				}
 
				p = tbl[p - 1].link;
			}
		}
	}
 
	MM_free(freq);
	MM_free(list);
	return (ret);
}

Here are the steps that it performs:

  1. Create an array for all used alphabet symbols (those with non-zero frequency).
  2. Check for the special case where there is only one symbol. In that case we use a 1 bit code where one is unused.
  3. Sort the list array by decreasing frequency. The least frequent symbols are then at the end of the list.
  4. Combine the rightmost two least frequent symbols or nodes into a single new node having the combined frequency. All members of the two combined nodes become members of the new node.
  5. Sort the member list for the new node by decreasing bit depth (current code length) and secondly by increasing frequency.
  6. Increase the bit depth for all members of the new node by 1. If a symbol will exceed the maximum bit depth do not increment it.
  7. If we have reached the maximum bit depth then confirm the tree structure using the reverse starting code length calculations. Elevate the next least frequent symbol to the current bit depth if the calculated starting code is not an even number. Check all code lengths.
  8. If there is more than one entry in the list array (not down to one node yet) then repeat at Step #3.

So to review….

First, we covered how to perform a reasonably fast version of LZ77. This fills a 64 KB buffer with compressed data which contains literals and length-distance codes.

When we need to we will flush the 64 KB buffer and generate a block of DEFLATE format. At that point we call a routine that first analyzes the data for code frequencies. There are to be two alphabets. One for literals and length codes. The other for distance codes.

Now we've developed a method for generating DEFLATE compatible Huffman tables for our two alphabets. The bulk of the complexity is now behind us but there is still some work to do before we can start generating our compressed bit stream.

Moving on…

Now we need the size of our literal alphabet (HLIT) and our distance alphabet (HDIST) since it is unlikely that we have utilized all literal symbols (0..285) or distance codes (0..29). So here we scan each to trim unused symbols.


		// Now we combine the bit length arrays into a single array for run-length-like repeat encoding.
		//  In DEFLATE this encoding can overlap for the literal table to the distance code table as
		//  if a single array. First determine the alphabet size for each table.
		for (hlit = 286; 0 < hlit; hlit--)
			if (littbl[hlit - 1].len > 0)
				break;

		for (hdist = 30; 0 < hdist; hdist--)
			if (dsttbl[hdist - 1].len > 0)
				break;

So we have the two alphabets. One for literals and length codes (0..HLIT) and one for distance codes (0..HDIST). We’ve seen that all we need are the code lengths for each symbol set to define the associated Huffman tree. We need to convey this information to the decompressor. The DEFLATE format combines the code length arrays for the two alphabets now into one long array of HLIT + HDIST code lengths.


		// Now create the bit length array
		cntlist = MM_alloc((hlit + hdist) * sizeof(int), &_bufflush);
		for (n = 0; n < hlit; n++)
			cntlist[n] = littbl[n].len;
		for (n = 0; n < hdist; n++)
			cntlist[hlit + n] = dsttbl[n].len;

Now we know that the end-of-block code (256) is used as well as at least a handful of length codes and distance codes. So this array of code lengths itself is fairly long. It will be somewhere between 260 and 315 entries. Each code length is in the set 0..15. So typically there is a lot of repetition. There can be a large number of 0's in a row. Consider data that is constrained to 7-bit ASCII. In that case there are 128 literal codes that never occur and would have a code length of 0.

The DEFLATE specification defines a kind of run-length encoding for this array. This encodes sequences of code lengths using three additional codes.

         The Huffman codes for the two alphabets appear in the block
         immediately after the header bits and before the actual
         compressed data, first the literal/length code and then the
         distance code.  Each code is defined by a sequence of code
         lengths, as discussed in Paragraph 3.2.2, above.  For even
         greater compactness, the code length sequences themselves are
         compressed using a Huffman code.  The alphabet for code lengths
         is as follows:

               0 - 15: Represent code lengths of 0 - 15
                   16: Copy the previous code length 3 - 6 times.
                       The next 2 bits indicate repeat length
                             (0 = 3, ... , 3 = 6)
                          Example:  Codes 8, 16 (+2 bits 11),
                                    16 (+2 bits 10) will expand to
                                    12 code lengths of 8 (1 + 6 + 5)
                   17: Repeat a code length of 0 for 3 - 10 times.
                       (3 bits of length)
                   18: Repeat a code length of 0 for 11 - 138 times
                       (7 bits of length)

So as you can see we are heading towards our third Huffman table. This one with an alphabet of 19 codes (0..18). That’s why the symbol set I used for the example in handling maximum code length has 19 members. I used one of these tables as an example.
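
A rough sketch of how the repeat codes can be applied is shown below (illustration only, not the JANOS routine; the name _packlens is made up). It stores each repeat code followed by its raw extra-bits value in the next array slot, which is the layout that the frequency counting and streaming loops shown later expect.

// Pack an array of code lengths using the repeat codes 16, 17 and 18.
// Each repeat code is followed by its raw extra-bits value in the next
// slot. Returns the packed count (NCNT).
static int _packlens(int *lens, int nlens, int *out)
{
	int n = 0, o = 0;
	int val, run, take;

	while (n < nlens)
	{
		val = lens[n];

		// measure the run of identical lengths starting here
		run = 1;
		while (n + run < nlens && lens[n + run] == val)
			run++;

		if (val == 0 && run >= 3)
		{
			// code 18 repeats a 0 length 11..138 times
			while (run >= 11)
			{
				take = (run > 138) ? 138 : run;
				out[o++] = 18;
				out[o++] = take - 11;
				n += take;
				run -= take;
			}

			// code 17 repeats a 0 length 3..10 times
			if (run >= 3)
			{
				out[o++] = 17;
				out[o++] = run - 3;
				n += run;
				run = 0;
			}
		}
		else if (val != 0 && run >= 4)
		{
			// emit the length once, then code 16 copies it 3..6 times
			out[o++] = val;
			n++;
			run--;

			while (run >= 3)
			{
				take = (run > 6) ? 6 : run;
				out[o++] = 16;
				out[o++] = take - 3;
				n += take;
				run -= take;
			}
		}

		// whatever is left (0, 1 or 2 entries) goes out as-is
		while (run-- > 0)
			out[o++] = lens[n++];
	}
	return (o);
}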

In my next post I will show the procedure I use for applying the repeat codes in this alphabet.

The current version of jniorsys.log on my development JNIOR compresses into a single block using the following literal and distance tables.


littbl
 0x00a 2 11 11111101110
 0x00d 2 12 111111111000
 0x020 93 6 101000
 0x027 4 11 11111101111
 0x028 14 9 111101010
 0x029 3 11 11111110000
 0x02a 2 12 111111111001
 0x02b 9 10 1111101000
 0x02c 9 10 1111101001
 0x02d 23 8 11101100
 0x02e 158 5 01100
 0x02f 18 9 111101011
 0x030 120 6 101001
 0x031 182 5 01101
 0x032 147 5 01110
 0x033 150 5 01111
 0x034 127 6 101010
 0x035 132 6 101011
 0x036 113 6 101100
 0x037 110 6 101101
 0x038 113 6 101110
 0x039 106 6 101111
 0x03a 91 6 110000
 0x03c 2 12 111111111010
 0x03d 3 11 11111110001
 0x03e 3 11 11111110010
 0x041 6 10 1111101010
 0x043 5 11 11111110011
 0x044 1 13 1111111111100
 0x046 2 12 111111111011
 0x048 1 13 1111111111101
 0x04a 3 11 11111110100
 0x04c 3 11 11111110101
 0x04d 5 11 11111110110
 0x04e 4 11 11111110111
 0x04f 5 10 1111101011
 0x050 11 9 111101100
 0x052 10 9 111101101
 0x053 6 10 1111101100
 0x054 5 10 1111101101
 0x055 2 12 111111111100
 0x057 3 11 11111111000
 0x05a 1 13 1111111111110
 0x05f 5 10 1111101110
 0x061 44 7 1101100
 0x062 7 10 1111101111
 0x063 19 9 111101110
 0x064 30 8 11101101
 0x065 52 7 1101101
 0x066 14 9 111101111
 0x067 13 9 111110000
 0x068 9 10 1111110000
 0x069 45 7 1101110
 0x06a 6 10 1111110001
 0x06b 3 11 11111111001
 0x06c 29 8 11101110
 0x06d 21 8 11101111
 0x06e 46 7 1101111
 0x06f 51 7 1110000
 0x070 31 8 11110000
 0x071 2 12 111111111101
 0x072 37 8 11110001
 0x073 27 8 11110010
 0x074 40 7 1110001
 0x075 16 9 111110001
 0x076 6 10 1111110010
 0x077 8 10 1111110011
 0x078 4 11 11111111010
 0x079 9 10 1111110100
 0x07a 4 11 11111111011
 0x100 1 13 1111111111111
 0x101 1248 2 00
 0x102 580 4 0100
 0x103 168 5 10000
 0x104 160 5 10001
 0x105 72 7 1110010
 0x106 119 6 110001
 0x107 82 6 110010
 0x108 17 9 111110010
 0x109 116 6 110011
 0x10a 253 5 10010
 0x10b 120 6 110100
 0x10c 52 7 1110011
 0x10d 69 7 1110100
 0x10e 79 6 110101
 0x10f 20 8 11110011
 0x110 54 7 1110101
 0x111 236 5 10011
 0x112 371 4 0101
 0x113 34 8 11110100
 0x114 6 10 1111110101
 0x115 17 9 111110011
 0x116 6 10 1111110110

dsttbl
 0x002 1 10 1111111100
 0x003 2 10 1111111101
 0x008 2 10 1111111110
 0x009 2 10 1111111111
 0x00a 14 8 11111110
 0x00b 204 4 0100
 0x00c 225 4 0101
 0x00d 46 7 1111110
 0x00e 168 5 11100
 0x00f 85 6 111110
 0x010 202 4 0110
 0x011 130 5 11101
 0x012 193 4 0111
 0x013 146 5 11110
 0x014 259 4 1000
 0x015 209 4 1001
 0x016 301 4 1010
 0x017 255 4 1011
 0x018 377 3 000
 0x019 293 4 1100
 0x01a 433 3 001
 0x01b 332 4 1101

We determine HLIT and HDIST for this and combine the code lengths into one HLIT + HDIST length array.

HLIT: 279
HDIST: 28
 0 0 0 0 0 0 0 0 0 0 11 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 11 9 11 12 10 10 8 5 9 6 5 5 5 6 6 6 6 6 6 6 0 12 11 11 0 0 10 0 11 13 0 12 0 13 0 11 0 11 11 11 10 9 0 9 10 10 12 0 11 0 0 13 0 0 0 0 10 0 7 10 9 8 7 9 9 10 7 10 11 8 8 7 7 8 12 8 8 7 9 10 10 11 10 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 2 4 5 5 7 6 6 9 6 5 6 7 7 6 8 7 5 4 8 10 9 10 0 0 10 10 0 0 0 0 10 10 8 4 4 7 5 6 4 5 4 5 4 4 4 4 3 4 3 4

So this array has 307 entries and you can see that there is a lot of repetition. We next apply the repeat codes as appropriate to shorten this array to 139 entries.

HLIT: 279
HDIST: 28
 0 0 0 0 0 0 0 0 0 0 11 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 11 9 11 12 10 10 8 5 9 6 5 5 5 6 6 6 6 6 6 6 0 12 11 11 0 0 10 0 11 13 0 12 0 13 0 11 0 11 11 11 10 9 0 9 10 10 12 0 11 0 0 13 0 0 0 0 10 0 7 10 9 8 7 9 9 10 7 10 11 8 8 7 7 8 12 8 8 7 9 10 10 11 10 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 2 4 5 5 7 6 6 9 6 5 6 7 7 6 8 7 5 4 8 10 9 10 0 0 10 10 0 0 0 0 10 10 8 4 4 7 5 6 4 5 4 5 4 4 4 4 3 4 3 4
NCNT: 139
 17/7 11 0 0 12 18/7 6 17/3 11 9 11 12 10 10 8 5 9 6 5 5 5 6 16/3 0 12 11 11 0 0 10 0 11 13 0 12 0 13 0 11 0 11 11 11 10 9 0 9 10 10 12 0 11 0 0 13 17/1 10 0 7 10 9 8 7 9 9 10 7 10 11 8 8 7 7 8 12 8 8 7 9 10 10 11 10 11 18/122 13 2 4 5 5 7 6 6 9 6 5 6 7 7 6 8 7 5 4 8 10 9 10 0 0 10 10 17/1 10 10 8 4 4 7 5 6 4 5 4 5 4 16/0 3 4 3 4

Here when one of the repeat codes is used I show the value of the extra bits we will insert following the Huffman code for the symbol.

Now we create the Huffman table for this array and determine frequencies. Note that extra bits are inserted later and are not part of the Huffman encoding. Here the maximum code length is 7 bits.


		// Ugh. Now we need yet another Huffman table for this run-length alphabet. First we establish
		//  the frequencies.  Note that we skip the byte defining the extra bits.
		cnttbl = MM_alloc(HUFF_HLEN * sizeof(struct huff_t), &_bufflush);
		for (n = 0; n < ncnt; n++)
		{
			cnttbl[cntlist[n]].cnt++;
			if (cntlist[n] >= 16)
				n++;
		}

		// We need to determine the bit lengths. 
		if (!_bitlength(cnttbl, HUFF_HLEN, 7))
		{
			err = TRUE;
			break;
		}

This results in the following third Huffman table that we will use to code the code lengths which in turn are used to generate the literal and distance tables for the data compression.

cnttbl
 0x000 17 3 000
 0x002 1 6 111100
 0x003 2 6 111101
 0x004 9 4 1000
 0x005 11 3 001
 0x006 9 4 1001
 0x007 11 4 1010
 0x008 10 4 1011
 0x009 10 4 1100
 0x00a 19 3 010
 0x00b 14 3 011
 0x00c 6 4 1101
 0x00d 4 5 11100
 0x010 2 6 111110
 0x011 4 5 11101
 0x012 2 6 111111

Are we there yet? :shock:

Well, we are getting close.

As before we will need to convey the code lengths in this ‘cnttbl’ to the decompressor. We have N = {3 0 6 6 4 3 4 4 4 4 3 3 4 5 0 0 6 5 6}. I mentioned earlier that these are stored each with 3 bits in the bit stream limiting the code length to 7 bits.

But DEFLATE wants to save every possible byte. Each of these code lengths has a probability of being used over some general set of data types. They decided to sequence these into an array in a custom order such that the least likely to be used code lengths fall at the end and can be trimmed. So we get the order from the specification and sequence these. Notice that our three code lengths of 0 do in fact get trimmed.


		// Finally (haha) we establish a custom order for these bit lengths
		// array defined above
		//static const char hclen_order[HUFF_HLEN] = {
		//	16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2,
		//	14, 1, 15
		//};
		bitlens = MM_alloc(HUFF_HLEN * sizeof(int), &_bufflush);
		for (n = 0; n < HUFF_HLEN; n++)
			bitlens[n] = cnttbl[hclen_order[n]].len;

		// Now the end of this array should be 0's so we find a length for the array
		for (hclen = HUFF_HLEN; 0 < hclen; hclen--)
			if (bitlens[hclen - 1] > 0)
				break;
 6 5 6 3 4 4 4 4 3 3 3 4 4 6 5 6 0 0 0
HCLEN: 16

It is hard to believe but we now have everything that we need to actually generate the compressed bit stream. Well, all except a couple of routines to actually do the serialization. So that will be the next step.

We’re going to output to a bit stream. Each write will likely involve a different number of bits. Someplace we have to properly sequence these into bytes to append to the output buffer.

To do this we need one routine to write called _writeb() and one to use at the end to flush any remaining unwritten bits called _flushb().


// Routine to stream bits
static void _writeb(int num, int code, struct bitstream_t *stream)
{
	// if no bits then nothing to do
	if (num == 0)
		return;

	// insert bits into the stream
	code &= ((1 << num) - 1);
	code <<= stream->nbits;
	stream->reg |= code;
	stream->nbits += num;

	// stream completed bytes
	while (stream->nbits >= 8)
	{
		stream->buffer[stream->length++] = (stream->reg & 0xff);
		stream->reg >>= 8;
		stream->nbits -= 8;
	}
}


// Routine flushes the bitstream
static void _flushb(struct bitstream_t *stream)
{
	if (stream->nbits)
	{
		stream->buffer[stream->length++] = (stream->reg & 0xff);
		stream->reg = 0;
		stream->nbits = 0;
	}
}

So to tie this together we use a structure.

// structure to assist with bit streaming
struct bitstream_t {
	char *buffer;
	int length;
	uint32_t reg;
	int nbits;
};

This seems simple enough but there are a couple things to understand. The DEFLATE specification gets into it right off the bat.

The first bit of the bit stream is the least significant bit of the first byte in the buffer. Once 8 bits are retrieved the 9th is the least significant bit of the second byte and so on.

A value appears in the stream starting with its least significant bit. That means that the bit order does not need to be reversed to pack the value at the tail of the current bit stream.

Huffman codes are processed a bit at a time. When you are reading a Huffman code you do not know how many bits you will need to retrieve the code for a valid symbol in the alphabet. So in this case you must insert the Huffman code so the most-significant bit is read first. The order of Huffman bits needs to be reversed. Armed with that fact, I have stored these codes in the tables in reverse order. That will be apparent in the following code to generate the tables for coding.

So the last thing we need to do is generate the actual Huffman codes for the three tables.


		// Now we need Huffman codes for these tables because eventually someday we will actually be
		//  generating a compressed bit stream.

		// Next we total the number of symbols using each bit length. These will be used to assign
		//  bit codes for each alphabet symbol.
		litcnts = MM_alloc(16 * sizeof(int), &_bufflush);
		for (n = 0; n < 286; n++)
			if (littbl[n].len > 0)
				litcnts[littbl[n].len]++;

		dstcnts = MM_alloc(16 * sizeof(int), &_bufflush);
		for (n = 0; n < 30; n++)
			if (dsttbl[n].len > 0)
				dstcnts[dsttbl[n].len]++;

		bitcnts = MM_alloc(16 * sizeof(int), &_bufflush);
		for (n = 0; n < HUFF_HLEN; n++)
			if (cnttbl[n].len)
				bitcnts[cnttbl[n].len]++;

		// Now we calculate starting codes for each bit length group. This procedure is defined in the
		//  DEFLATE specification. We can define the Huffman tables in a compressed format provided that
		//  the Huffman tables follow a couple of additional rules. Using these starting codes we
		//  can assign codes for each alphabet symbol. Note that Huffman codes are processed bit-by-bit
		//  and therefore must be generated here in reverse bit order.

		// Define codes for the literal Huffman table
		startcd = MM_alloc(16 * sizeof(int), &_bufflush);
		for (n = 0; n < 15; n++)
			startcd[n + 1] = (startcd[n] + litcnts[n]) << 1;
		for (n = 0; n < 286; n++)
		{
			len = littbl[n].len;
			if (len)
			{
				c = startcd[len]++;
				while (len--)
				{
					littbl[n].code <<= 1;
					littbl[n].code |= (c & 1);
					c >>= 1;
				}
			}
		}

		// Define codes for the distance Huffman table
		for (n = 0; n < 15; n++)
			startcd[n + 1] = (startcd[n] + dstcnts[n]) << 1;
		for (n = 0; n < 30; n++)
		{
			len = dsttbl[n].len;
			if (len)
			{
				c = startcd[len]++;
				while (len--)
				{
					dsttbl[n].code <<= 1;
					dsttbl[n].code |= (c & 1);
					c >>= 1;
				}
			}
		}

		// Define codes for the bit length Huffman table
		for (n = 0; n < 15; n++)
			startcd[n + 1] = (startcd[n] + bitcnts[n]) << 1;
		for (n = 0; n < HUFF_HLEN; n++)
		{
			len = cnttbl[n].len;
			if (len)
			{
				c = startcd[len]++;
				while (len--)
				{
					cnttbl[n].code <<= 1;
					cnttbl[n].code |= (c & 1);
					c >>= 1;
				}
			}
		}

Coming up next: Actually Generating the Bit Stream

The DEFLATE specification greatly oversimplifies the whole process by defining the block format in about a single page.

         We can now define the format of the block:

               5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
               5 Bits: HDIST, # of Distance codes - 1        (1 - 32)
               4 Bits: HCLEN, # of Code Length codes - 4     (4 - 19)

               (HCLEN + 4) x 3 bits: code lengths for the code length
                  alphabet given just above, in the order: 16, 17, 18,
                  0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15

                  These code lengths are interpreted as 3-bit integers
                  (0-7); as above, a code length of 0 means the
                  corresponding symbol (literal/length or distance code
                  length) is not used.

               HLIT + 257 code lengths for the literal/length alphabet,
                  encoded using the code length Huffman code

               HDIST + 1 code lengths for the distance alphabet,
                  encoded using the code length Huffman code

               The actual compressed data of the block,
                  encoded using the literal/length and distance Huffman
                  codes

               The literal/length symbol 256 (end of data),
                  encoded using the literal/length Huffman code

         The code length repeat codes can cross from HLIT + 257 to the
         HDIST + 1 code lengths.  In other words, all code lengths form
         a single sequence of HLIT + HDIST + 258 values.

There are a couple of reasons why the suggestion to teach JANOS how to compress files has been in Redmine for 5 years. That is about how long ago, during development, that JANOS began to directly use JAR files for application programming. The obvious reason that it took 5 years to implement is that there really isn't a huge need for compression in the JNIOR. Storage was limited in the previous series and if you had wanted to keep log files around for any serious length of time it would have helped to be able to compress them. The Series 4 has much more storage but still not a lot by today's standards.

The real reason you may now realize. It is a bit involved. JANOS uses no third party developed code, and I had not been able to find usable reference materials (search engines having been damaged by marketing greed). Those hurdles left this suggestion sitting on the list at a low priority for practically ever. Well, we've got it done now.

Let’s generate the bit stream and be done with this tome.

First we stream the Block Header. The very first bit indicates whether or not the block is the last for the compression. For the last block this BFINAL bit will be a 1. It is 0 otherwise. The next two bits indicate the block type BTYPE. We are using the dynamic Huffman table type, so here we use 10b as we will be providing our own Huffman tables. Yeah, we could have taken the easy path and used a fixed set of tables (BTYPE 01b) but that is no fun.


		// Now we have codes and everything that we need (Finally!) to generate the compressed bit
		//  stream.

		// set BFINAL and type BTYPE of 10
		_writeb(1, final ? 1 : 0, stream);
		_writeb(2, 0x02, stream);

We have already determined the sizes of our alphabets. We have HLIT defining the size of the literal alphabet that includes codes for sequence lengths. The complete alphabet space covers the range 0..285 but since we are not likely to use them all HLIT defines how many actually are used (0..HLIT). We also have HDIST playing a similar role for the distance alphabet which could range 0..29.

Those two parameters are supplied next. Here each value is conveyed in 5 bits. We can do that since we know that HLIT has to be at least 257 because we do have to use the end-of-block code of 256 in our alphabet. So really we supply only the count of length codes used, which is similar in magnitude to the count of distance codes. So HLIT is supplied as HLIT - 257 and HDIST as HDIST - 1.


		// Now we output HLIT, HDIST, and HCLEN
		_writeb(5, hlit - 257, stream);
		_writeb(5, hdist - 1, stream);
		_writeb(4, hclen - 4, stream);

		// Output the HCLEN counts from the bit length array. Note that we have already ordered it
		//  as required.
		for (n = 0; n < hclen; n++)
			_writeb(3, bitlens[n], stream);

Following the delivery of HLIT and HDIST we supply HCLEN as HCLEN - 4. Recall that HCLEN defines the size of the alphabet we are going to use to encode the array of code lengths for the main two Huffman tables. HCLEN defines the number of elements we are going to supply for that 19 element array that has the custom order. The balance beyond HCLEN elements is assumed to be 0. This is the array whose elements are supplied with 3 bits and thus limited to the range 0..7. We include HCLEN elements simply as shown above.

Next we stream the concatenated array of code lengths. This array has HLIT + HDIST elements that have been compressed using repeat codes. Here we output the compressed array using the 19 symbol alphabet that we just defined for the decompressor by sending the array above. We encounter the extra bits usage for the first time. You can see in the following that after streaming the repeat codes 16, 17 or 18 we insert the field specifying the repeat count using the required number of extra bits.


		// Output the run-length compressed HLIT + HDIST code lengths using the code length
		//  Huffman codes. The two tables are blended together in the run-length encoding.
		//  Note that we need to insert extra bits where appropriate.
		for (n = 0; n < ncnt; n++)
		{
			c = cntlist[n];
			_writeb(cnttbl[c].len, cnttbl[c].code, stream);

			if (c >= 16)
			{
				switch (c)
				{
				case 16:
					_writeb(2, cntlist[++n], stream);
					break;
				case 17:
					_writeb(3, cntlist[++n], stream);
					break;
				case 18:
					_writeb(7, cntlist[++n], stream);
					break;
				}
			}
		}

If you are the decompressor at this point you have everything you need to develop the Huffman tables for literals and distance codes. You had to expand the repeat compression and separate the two code length sets. You then tallied code length counts and calculated starting codes for each bit length. Finally you assigned codes to each used alphabet symbol. So we are ready to deal with the actual compressed data.

When we initially scanned the LZ77 compressed data to determine frequencies we had to temporarily expand the length-distance references to determine which codes from the length and distance alphabets we would be using. Well the following loop is practically identical because we again process the LZ77 compressed data expanding the length-distance references. This time we will output the codes to the bit stream. Here again we have the extra bits to insert. Where we ignored them before, now we insert them into the stream when necessary.


		// Unbelievable! We can actually now output the compressed data for the block! This is
		//  encoded using the literal and distance Huffman tables as required. Note we need to
		//  again process the escaping and length-distance codes.
		for (n = 0; n < osize; n++)
		{
			c = obuf[n];
			if (c != 0xff)
				_writeb(littbl[c].len, littbl[c].code, stream);
			else
			{
				// check and tally escaped 0xff
				if (obuf[++n] == 0xff)
					_writeb(littbl[0xff].len, littbl[0xff].code, stream);
				else
				{
					// table defined above
					//static const int lcode_ofs[29] = {
					//	3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
					//	35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258
					//};

					// determine required length code for lengths (3..258). This code is
					//  coded in the literal table.
					len = (obuf[n++] & 0xff) + 3;
					for (c = 0; c < 29; c++)
						if (lcode_ofs[c] > len)
							break;
					code = 256 + c;
					_writeb(littbl[code].len, littbl[code].code, stream);

					// insert extra bits as required by the code
					// table defined above
					//static const int lcode_bits[29] = {
					//	0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
					//	3, 4, 4, 4, 4, 5, 5, 5, 5, 0
					//};
					c = lcode_bits[code - 257];
					if (c)
						_writeb(c, len - lcode_ofs[code - 257], stream);

					// table define above
					//static const int dcode_ofs[30] = {
					//	1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
					//	257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
					//	8193, 12289, 16385, 24577
					//};

					// determine required distance code for distances (1..32768). This code is
					//  coded in the distance table.
					dst = (obuf[n++] & 0xff) << 8;
					dst |= (obuf[n] & 0xff);
					for (c = 0; c < 30; c++)
						if (dcode_ofs[c] > dst)
							break;
					code = c - 1;
					_writeb(dsttbl[code].len, dsttbl[code].code, stream);

					// insert extra bits as required by the code
					// table defined above
					//static const int dcode_bits[30] = {
					//	0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8,
					//	9, 9, 10, 10, 11, 11, 12, 12, 13, 13
					//};
					c = dcode_bits[code];
					if (c)
						_writeb(c, dst - dcode_ofs[code], stream);
				}
			}
		}

And finally we output that end-of-block code (256). If BFINAL were set and this were the last block in the compression (LZ77 flushing the last partial buffer of data) we would also flush the bit stream. This provides a final byte with the remaining bits for the stream.


			// And finally the end-of-block code and flush
			_writeb(littbl[256].len, littbl[256].code, stream);
			if (final)
				_flushb(stream);

At this point the buffer supplied by the LZ77 compression has been flushed to the bit stream. We would reset that buffer pointer and return. If there is more data the LZ77 compressor will then proceed with it. Remember that I had defined this buffer to be 64KB so there will likely be more in the case of larger files.

This DEFLATE compression capability will be part of JANOS v1.6.4 and later OS. I am likely to do some optimization before release. Here we see that it works. I’ll create an archive containing the files in my JNIOR’s root folder.

bruce_dev /> zip -c test.zip /
 6 files saved
bruce_dev /> 

bruce_dev /> zip test.zip
     Size   Packed          CRC32        Modified
    56096    10764   81%  0b779605  Jan 31 2018 08:42  jniorsys.log
    40302     6249   84%  b1dffc05  Jan 26 2018 07:38  web.log
    22434     9499   58%  059a09d9  Jan 25 2018 14:53  manifest.json
       89       89    0%  f97bbba2  Jan 28 2018 09:11  access.log
      990      460   54%  9699bbfe  Jan 30 2018 14:48  jniorboot.log.bak
      990      461   53%  8f3c0390  Jan 31 2018 08:42  jniorboot.log
 6 files listed
bruce_dev />

This archive verifies. This verification does decompress each file and confirm the CRC32.

bruce_dev /> zip -vl test.zip
  verifying: jniorsys.log (compressed)
  verifying: web.log (compressed)
  verifying: manifest.json (compressed)
  verifying: access.log
  verifying: jniorboot.log.bak (compressed)
  verifying: jniorboot.log (compressed)
 6 entries found - content verifies!
bruce_dev />

We can repeat the construction in verbose output mode. Here we see timing. Again, keep in mind that JANOS is running on a 100 MHz Renesas RX63N micro-controller.

bruce_dev /> zip -cl test.zip /
  deflate: /jniorsys.log (56096 bytes)
   saving: jniorsys.log (compressed 80.8%) 1.176 secs
  deflate: /web.log (40302 bytes)
   saving: web.log (compressed 84.5%) 0.671 secs
  deflate: /manifest.json (22434 bytes)
   saving: manifest.json (compressed 57.7%) 1.863 secs
   saving: access.log (stored) 0.011 secs
  deflate: /jniorboot.log.bak (990 bytes)
   saving: jniorboot.log.bak (compressed 53.5%) 0.054 secs
  deflate: /jniorboot.log (990 bytes)
   saving: jniorboot.log (compressed 53.4%) 0.054 secs
 6 files saved
bruce_dev />

The /etc/JanosClasses.jar file also compresses. This file is where I first encountered Huffman tables whose bit depth (code length) exceeded the DEFLATE maximums (15 bits for the literal and distance tables, 7 bits for the code length encoding).

bruce_dev /> zip -cl test.zip /etc
  deflate: /etc/JanosClasses.jar (266601 bytes)
   saving: etc/JanosClasses.jar (compressed 11.2%) 24.215 secs
 1 files saved
bruce_dev /> 

bruce_dev /> zip test.zip
     Size   Packed          CRC32        Modified
   266601   236758   11%  20916587  Jan 11 2018 09:58  etc/JanosClasses.jar
 1 files listed
bruce_dev /> 

bruce_dev /> zip -vl test.zip
  verifying: etc/JanosClasses.jar (compressed)
 1 entries found - content verifies!
bruce_dev />

I know that 24 seconds for a 1/4 megabyte file is nothing to write home about. Now that things are fully functional I can go back and work on the LZ77 where basically all of the time is consumed. I can certainly improve on this performance but as it is, it isn't that bad. The JNIOR is a controller after all and you likely wouldn't need to compress archives that often.

In order to breathe life into a being you must get its heart beating. For a generic operating system the heart is its multitasking capability. In developing JANOS one of the very first modules that was created was the one responsible for coordinating the execution of multiple independent processes.

Practically every microprocessor program that has been written is responsible for, and therefore juggles, more than one task. Even those that seem so specific as to be focused on one single task end up responsible for two or more sub-tasks. That doesn't imply though that every program need implement sophisticated multitasking algorithms. In most cases a simple loop can be employed to cycle through the various tasks at hand.

With a focused application a control loop is often sufficient. In this case tasks A, B, C, etc. are executed in sequence if they are needed. Once all tasks have been given the chance to run the loop is repeated. Depending on the needs of the application a loop may repeat as often as possible or it may have the luxury of sleeping. The latter being helpful in reducing power requirements.

The rate at which a control loop repeats is variable and dependent on the number of tasks that need to execute in each cycle and the amount of time each task takes to complete. If there is one task that is time sensitive there can be issues. If the total time required by the other tasks that need to run delays the loop too long, the time critical task may fail. The programmer has to take this into consideration. This can be a real problem if any task becomes dependent upon an external event and must wait for an indeterminate amount of time.

When a task has to wait it is prudent to move on and let other activities be performed in the meantime. We can employ a state machine. Here we break the current task down into a set of sub-tasks where a variable is used to keep track of the current sub-task. If a task has to wait it exits allowing other tasks in the main control loop a chance to run. When the main loop returns to the task at hand, the state machine takes us back to the point where we left off. If we still have to wait we exit again otherwise we handle the next step in the current task.
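
As a quick illustration (a generic sketch, not JANOS code, and the helper functions are hypothetical placeholders), a task in such a loop might be broken down like this:

// One task implemented as a state machine inside a simple control loop.
// The helpers (work_pending, send_request, reply_ready, handle_reply,
// task_a, task_c) are hypothetical placeholders.
extern int  work_pending(void);
extern void send_request(void);
extern int  reply_ready(void);
extern void handle_reply(void);
extern void task_a(void);
extern void task_c(void);

enum { ST_IDLE, ST_SEND, ST_WAIT_REPLY, ST_PROCESS };

void task_b(void)
{
	static int state = ST_IDLE;

	switch (state)
	{
	case ST_IDLE:
		if (work_pending())
			state = ST_SEND;
		break;

	case ST_SEND:
		send_request();
		state = ST_WAIT_REPLY;
		break;

	case ST_WAIT_REPLY:
		if (reply_ready())		// otherwise just exit and try again next pass
			state = ST_PROCESS;
		break;

	case ST_PROCESS:
		handle_reply();
		state = ST_IDLE;
		break;
	}
}

void main_loop(void)
{
	for (;;)
	{
		task_a();
		task_b();	// never blocks; its state variable remembers where it left off
		task_c();
	}
}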

With fairly complex applications, the main control loop can become complex. The programmer may need to implement some kind of priority scheme. This would increase the time allotted for a set of tasks and force others to take a backseat. Then if multiple tasks require state machine implementations the overall dynamics start to become fuzzy. The programmer suddenly finds situations where the system may behave strangely leading to tweaks and eventually a feeling of careful balance. It works but you are now cautious not to add much more to it. Not without emphasizing testing.

Consider a generic operating system such as JANOS. Here there is a spectrum of system tasks that may be optionally running. Then there can also be multiple programs developed by others (of varying skill levels) running in the system and requiring CPU time. The resulting set of tasks can be so unpredictable that a simple tasking loop is doomed to failure.

A program may need to wait on a communications channel and the third-party programmer does not realize the need to return to allow other tasks to run. At a minimum this slows the system down and suddenly the task handling network traffic might start to cause timeouts and protocol failures. You just cannot predict this let alone control it with the simple loop approach.

So rather than even try, JANOS employed a preemptive approach right from the start. Now every task becomes a process. Each process executes within its own context. Every context is isolated from other contexts creating independent processes. A timeslice is defined. For JANOS a timeslice is 32 milliseconds. JANOS allows each process to run for up to one timeslice before forcibly moving on to another. A process can relinquish its timeslice when it is waiting on something but it cannot monopolize the CPU.

In coding JANOS the main.c program was the first created. It performs some initial configuration and starts the Process Engine. From that moment on the system became capable of executing more than one process. The main.c procedure then dubbed itself the "Startup Process". Once it completes its required steps it renames itself as the "Idle Process" and falls into a tight wait loop. From that point forward it becomes solely responsible for collecting unused CPU time. This being the equivalent of twiddling your thumbs.

Before becoming the Idle Process the Startup Process initializes the Network Service. This process runs at all times and with priority to ensure that all network traffic is handled properly and in a timely fashion. The Startup Process also starts the System process. The System process consists of a simple control loop executing low-priority background tasks such as logging, memory cleanup and event notification. These are the three low-level processes in the OS that always run and cannot be stopped.

Here some of the complexity becomes apparent. In addition to the three base processes that I have described we see some others. The jaccess.jar program is running. This is an application that the HoneyPot uses to create the Mirai Botnet Map web page that it serves to the public. While the Web Server process is required for you to access those pages it is likely running here because I have the DCP open in my browser. In fact the Console process there is mine as well. This results from a session in the Console tab of the DCP and that is where I executed the PS command you see above.

Finally there is a Command process running for some other random IP address. That indicates that there is a Telnet session open. On the HoneyPot those are generally from a Mirai infected host trying (unsuccessfully) to log into this JNIOR. Under JANOS there can be as many as 16 separate processes (including the Idle Process) running at one time.

So what is in the Process Engine that creates this preemptive multitasking environment? It may or may not seem complex to you. I have managed to create a preemptive environment on a variety of processors including the Z80, HP’s early entry into the minicomputer market, CP/M running on early Intel processors and many others. Notably I added this to the Dallas DS80C400 which created the multitasking capability found in the Series 3 JNIOR. The TINI OS that forms the basis of the Series 3 OS is not originally capable of multitasking. JANOS runs preemptive multitasking quite successfully on the Renesas RX63N.

To create a preemptive multitasking environment you need a few basic things. The first is a good 1 millisecond interrupt forming a process tick. You also need a way to switch program context with a minimum of overhead. And finally there needs to be a good atomic means to create a mutex (mutual exclusion object). That being a synchronization technique to control access to system resources when multiple processes are contending for them.

The main.c program during the initialization of JANOS sets up one of the timer registers in the RX63N to divide down the system clock and generate an interrupt every millisecond. The Process Engine first uses this to tally milliseconds and thereby track uptime and to create a real-time software clock. Then on every 32nd tick the Process Engine moves forward to see if another process is pending execution. In that case the context is changed and the next process is allowed to run.
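
As a simplified sketch (not the actual JANOS tick handler), the bookkeeping amounts to something like this; the scheduler entry point named here is hypothetical:

#include <stdint.h>

extern void schedule_next_process(void);	// hypothetical scheduler entry

// Millisecond tick: advance uptime and, every 32nd tick, give the
// Process Engine a chance to swap in another process.
static volatile uint32_t uptime_ms;

void tick_isr(void)
{
	uptime_ms++;				// 1 millisecond heartbeat

	if ((uptime_ms & 31) == 0)		// every 32nd tick (32 ms timeslice)
		schedule_next_process();
}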

The RX63N uses two stack areas. There is a User Stack which programs normally use to PUSH and POP data. The C subroutine entry implementation also allocates local variables on that stack. For software to be re-entrant and therefore functional in such a multitasking environment it must store all of its variables on the stack. Now that can include pointers where each instance of a subroutine might allocate its own memory blocks and save the pointers. Later the routine needs to release those blocks to avoid memory leaks. The point being that the "context" is really the stack content.

In addition to the User Stack the RX63N also uses an Interrupt Stack. When an interrupt occurs, for instance when the internal timer register is ready to signal 1 millisecond, the program pointer and certain critical registers are saved on the Interrupt Stack before running the interrupt service routine.

So for JANOS to change processes it simply saves the User and Interrupt stack pointers in the Process Table for the current process and then locates the next process that is ready to execute. The User and Interrupt stack pointers are updated from the Process Table for that process and we return from the tick interrupt. Ahh, but now we are not returned to where we were just interrupted but to where the new task was interrupted before, and so it continues. The originally interrupted process then waits for its turn to run again.

To change processes and context we swap two stack pointers. This is an extremely efficient (time wise) and low-overhead task swap. We do have to push all of the other registers onto the stack before swapping to ensure a complete context swap. It happens very quickly. JANOS jumps cleanly from one process to the next without any process realizing it had been set aside temporarily.

This approach requires that we pre-assign 16 independent User Stacks and 16 independent Interrupt Stacks in the Process Table. Fortunately the RX63N has sufficient internal high speed RAM. The Process Engine initialization called by main.c gets that done before enabling the 1 millisecond tick.

Now to create a new process a PROC_create() method is provided that defines the starting point for the new process and passes a couple of useful parameters. The Web Server for instance starts at an entry point named HTTP_process(). A new entry is added to the Process Table for the new process. The name of the new process is one of the parameters supplied. The User and Interrupt stacks for the new process are initialized with content and initial values as would be expected when a return from the 1 millisecond interrupt is executed. Once everything is in order a flag for the new process is set allowing it to execute. We return from the creation routine to do other things but when our timeslice is up, the new process also proceeds from the point we defined.

If a program runs non-stop for its 32 millisecond timeslice it will be preempted. It then waits until it can continue after other tasks get their time in the sun. If a process needs to wait it can release its timeslice by calling PROC_yield(). This call generates a software interrupt that then performs the process swap and starts a new timeslice. The yielding process will continue once others have had a chance.

Often we know about how long we will need to wait. In this case we can release our timeslice for a certain number of milliseconds by calling PROC_sleep(). Here we not only allow our process to swap but also set a timer telling the Process Engine that we won’t be ready to execute for a while.

Along these lines to keep options open and flexible there is also a PROC_delayed() call. This sets the wait time so our process won’t run again until the specified time (uptime as opposed to RTC time). Add to this a PROC_pause() method that stops a process until another process specifically allows it to run again.

Finally there is a PROC_terminate() method which, like it says, terminates the process. This removes the process from the Process Table.
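
To give a feel for how these calls fit together, here is an illustrative sketch of a system process waiting on a device. The actual JANOS prototypes are not reproduced in this post, so the parameter lists and the helpers below are assumptions made only for the example.

// Illustration only -- the parameter lists for PROC_create() and
// PROC_sleep() are assumed here, as is the poll_device() helper.
extern int  PROC_create(void (*entry)(void), const char *name);	// assumed signature
extern void PROC_sleep(int millis);					// assumed signature
extern int  poll_device(void);						// hypothetical helper

static void worker_process(void)
{
	for (;;)
	{
		// nothing to do yet? give up the timeslice for a while
		if (!poll_device())
		{
			PROC_sleep(10);		// check again in roughly 10 milliseconds
			continue;
		}

		// ... handle the work ...
	}
}

// called once during initialization
void start_worker(void)
{
	PROC_create(worker_process, "Worker Process");
}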

A process may be preempted at any time. This is likely to be in the middle of the assembly language executing a C statement. There are times when critical system structures are being read and updated. An interruption at an inopportune moment can invalidate an update and cause system issues. This is handled through the implementation of a CRITICAL SECTION.

A process can execute an ENTER_CRITICALSECTION macro. This establishes a section of code that cannot be swapped. Once the critical system structures are updated the EXIT_CRITICALSECTION macro restores normal swapping. While this seems to violate our statement earlier that the CPU cannot be monopolized, the Critical Section functionality is only available for use in JANOS system code and not by external application programs. It is not available to third party programmers. Those programs cannot monopolize the system. At least not so easily.

Critical Sections can be nested. That is just handled as a counter which is incremented upon section entry and decremented on exit. As a result there are very few situations where system interrupts must be disabled to prevent issues. This keeps the interrupt processing system ready at all times to service interrupts. Interrupts are not missed.
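
One plausible shape for that counter (illustration only, not the actual JANOS macros) is simply:

// Illustration only -- not the JANOS implementation. The nesting depth is
// a counter; the scheduler declines to swap processes while it is non-zero.
static volatile int crit_depth;

#define ENTER_CRITICALSECTION()		(crit_depth++)
#define EXIT_CRITICALSECTION()		(crit_depth--)

// consulted by the tick handler before performing a process swap
static int swap_allowed(void)
{
	return (crit_depth == 0);
}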

As a result of all this, the Web Server for example can be written without concern for anything else that may be running. It can also execute processes of its own. An example here is the PHP Scripting Engine which is started by the Web Server. This allows the Web Server to service other requests while a PHP program is compiled and then executed. Such complex interactions occur very smoothly and cleanly.

When a tick generates an interrupt only some registers are preserved on the stack. JANOS uses some inline assembly to also push R6 thru R13 and the FP register. The two stack pointers are then passed to the scheduler which may or may not update them depending on what else needs to run.


/* -- inline asm -------------------------------------------------------------
** inline_call_tick()
**
** Saves processor state in a consistent manner and manages the stack 
** pointers independent of C optimization effects.
** -------------------------------------------------------------------------*/
#pragma inline_asm inline_call_tick
static void inline_call_tick(void)
{
	PUSHM	R6-R13		; save balance of registers
	PUSHC	FPSW		; save FP register

	MVFC	USP,R1		; snapshot both stack pointers
	MVFC	ISP,R2
	BSR.W	_tick_proc	; perform task processing

	MVTC	R2,ISP		; update both stack pointers
	MVTC	R1,USP

	POPC	FPSW		; restore
	POPM	R6-R13
}

These are the additional registers that I mentioned needed to be saved.

Once the Process Engine became available main.c proceeded to start the Network and System processes as I have mentioned. Eventually in the development of JANOS the Network process had to start receiving and then sending packets over the Ethernet LAN. That makes sense but we were kind of blind. That made it difficult to develop and debug.

So the next process that we created was designed to handle command line interaction through the serial RS-232 (COM) port. The main.c program had been outputting notification strings on this port as the various events were occurring but we couldn't interact with it. Just prior to becoming the Idle Process the main.c program outputs a "Hit any key to login" message. We are not going to start a process for command line interaction until we know that there is someone there to use it.

This established the need for Polling Routines. A polling routine is a routine that can optionally be associated with a process class. It is called once every millisecond tick. Obviously these routines cannot do anything lengthy or they would significantly add to the overhead in the process engine. We would use this to detect an “any key” condition on the port and to then start the command line process.

So a structure for process design had started to evolve. The main.c program calls an initialization routine for each subsystem. For the command line process it calls CMD_initialize(). This initialization routine would indicate that it is “Starting command services” and issue the “Hit any key” message. Eventually when we got the network up and running it would start a Telnet server. The CMD_initialize() routine registers the CMD_polling() routine before returning. That polling routine merely checks the status of the serial port buffer and if a character is pending (the ANY key) and the command line process is not running, the CMD_process() is started.

Eventually we will use polling routines to check and handle timeouts and watchdogs among other tasks. It is a key aspect of the process engine and perhaps even a necessity.

Probably the most basic function that the JNIOR can perform is to close and open one of its relays. This is one of the ways the JNIOR controls the world around it. The Model 410 has 8 relays, the Model 412 has 12 relays, and the Model 414 has just 4 relays. The number of inputs varies as 8, 4 and 12 respectively.

These are small signal relays. Early production JNIORs (with Omron relays) are limited to 1A. We currently use relays that can handle 2A. Voltages as high as 120VAC or more may be used. The limiting factor though is the current and the inductance of the load. If you use one of the relay outputs to drive something like a larger relay's coil you should make sure there is a flywheel diode for Back EMF Suppression in the circuit. Otherwise the spark that would occur across the contacts will seriously limit the life of the relay. A MOV device would be even better.

The Power 4ROUT external module that we have available uses 10A relays and has built-in MOV devices across each contact pair. The part is the Panasonic ERZ-V07D431 VARISTOR 430V 1.75KA DISC 7MM. These can be obtained from DigiKey among other places. If you are driving an inductive load you can purchase these or similar devices and place one across the relay outputs to extend the relay life.

ERZ-V07D431 VARISTOR

There are a number of good pages on the Internet that address this concern. You can search on “Back EMF Suppression” for more information.

Each Series 4 has at least two relays that can be modified by internal jumper to function Normally Closed (NC). The Model 412 has 4 that can be so configured.

Right out of the box you can control the relays. Once you have connected the JNIOR to your network and set the IP addressing you can point your browser at the unit and open the DCP. Log in as the administrator. The default username is ‘jnior’ and the password is ‘jnior’.

DCP Default Page

Here you can click on a “Toggle” button to turn on (close) and off (open) each relay. In the configuration section you can change the control to “Pulse” as I have done here for Relay Output 8. You can define the pulse duration in milliseconds. In fact, through configuration settings you can alter many aspects of this screen including the labels. The display can then reflect your application.

A “guest” login will only be able to view the status. The control buttons are not displayed.

So if you allow access to the JNIOR through your router you can control relays from anywhere in the world. It is highly recommended of course that you change the administrator’s password. I would also disable the “admin”, “user” and “guest” accounts if you do not need them. Use the USERS command in the Command Line Console to see the status of accounts. If you open your JNIOR up to the world you need to make sure that it is secure.

Command Line Relay Control

You can control the relays from the Command Line Console using the JRMON command. The Command Line Console is available as a tab in the DCP (see image in prior post), through a Telnet (Port 23) connection, or through connection to the Serial RS-232 (COM) port at the bottom next to the LAN port on the JNIOR.

The JRMON command will start an ASCII based real-time I/O status and command console. If you wish to issue commands you need to use the JRMON -C option to enable command input. Here a “C1” closes Relay Output 1, “O1” opens that relay, and “Q” exits the mode.

JRMON Control

The 0’s and 1’s reflect the real-time status of inputs and outputs. In this case we have a Model 410 with 8 inputs (all off) and 8 relays (all off to start). Additional relays will appear here if you have a Model 412 or add one or more external 4ROUT modules. The number of inputs also follows the model, with up to 12 inputs on a Model 414.

You can execute a JRMON -X command at the Command Line and not invoke the real-time display. The following command closes Relay Output 2 and opens Relay Output 1.

jrmon -x c2o1

The JRMON command is useful in debugging Digital I/O over the serial RS-232 (COM) port and through a Telnet connection. If you have network access, though, the DCP is perhaps more convenient as the Graphical User Interface is nicer. You would likely not need to use the JRMON command through the DCP Console tab.

JNIOR Protocol

The JNIOR Protocol is deprecated and not recommended for new development. It is still supported however. For the Series 3 this was the protocol to use for remotely controlling the JNIOR. It is a binary protocol and very efficient. Experienced programmers can develop drivers to handle the JNIOR Protocol. INTEG supplies an SDK and API allowing PC applications to utilize the JNIOR Protocol. The Series 4 provides a more modern approach through a built-in Websockets interface. Its use is preferred.

I have attached the current JNIOR Protocol specification. Relays can be controlled through this protocol using the Command Message (Message Type 10). The status of the JNIOR is obtained through the Monitor Message (Message Type 1). The JNIOR Protocol is not strictly Master-Slave. While there are commands and responses, there are unsolicited messages. The Monitor Message is supplied by exception whenever I/O status changes. You do not need to poll for status as you would with MODBUS for instance. You would just wait and consume Monitor Messages when supplied. The Websockets interface is similar.

JNIOR Protocol

Websockets Interface

With the Series 3, the configuration pages reached using the browser required the execution of Java Applets. The applet was the only way of making the JNIOR Protocol connection (Port 9200) for web use. Today both the applet and the JNIOR binary protocol are outdated.

The Series 4 Dynamic Configuration Pages (DCP) are also accessed through the browser. These use modern Dynamic HTML and JavaScript. This communicates with the JNIOR through a Websockets connection which is supported by all browsers over the same HTTP or HTTPS connections. The Series 4 built-in Websockets interface uses JSON messaging and provides for a wider set of capabilities. The DCP can manage files on the JNIOR, offer Command Line Console access, and monitor real-time Syslog status. There is much more. A PC Application can make a Websockets connection as well.

Both the Websockets Interface and the JNIOR Protocol can be used to remotely control relay outputs and maintain status of JNIOR I/O.

The current Websockets documentation is a Wiki in our Redmine system at INTEG. I have generated a PDF copy of that document and attached it here. We will help you make use of this protocol.

Builtin_Websocket_Access

Program Control

There is one more way to control relays. That is through an application program running on the JNIOR. Application programs are written in Java and can be generated by Netbeans or any Java IDE and compiler. The Series 4 operating system (JANOS) can execute Java JAR files directly. You write your program using the JNIOR runtime libraries, load it onto the JNIOR and execute it. Applications can be set to start at boot.

You can do just about anything with an application program on the JNIOR. Manipulating the JNIOR I/O is obviously one of the most important.

For example the following program when executed closes Relay Output 1.

package jtest;

import com.integpg.system.JANOS;

public class Main {

    public static void main(String[] args) throws Exception {
        // Close Relay Output 1 (relay channels are zero-indexed)
        JANOS.setOutputRelay(0, true);
    }
}

Relay Test Program

In this case I created a Project in Netbeans. I referenced the JanosClasses.jar runtime libraries (available in the JNIOR’s /etc folder). The program must be built to run in the JNIOR environment. It’s close to but not the same as a standard Java Runtime environment for the PC. It has additional classes like the JANOS class that provide for JNIOR functionality.

The program is compiled and the JAR copied from the dist folder into the /flash folder on the JNIOR. I just dragged and dropped it into the proper folder using the Folders tab of the DCP.

Then from the command line just type the program name “jtest” to execute. It takes a second for the JVM to load and start the program. The relay then closes.

By the way, I like to include the “throws Exception” or “throws Throwable” in my main method when I am first developing a program. Later I will remove it and then be forced by the compiler to place necessary try-catch clauses to get error handling as it needs to be. Uncaught exceptions are nicely reported by the JNIOR.

Other Control Methods

There are still other ways to control JNIOR I/O. For one you can enable the MODBUS protocol. The JNIOR can be configured as either a MODBUS Server or MODBUS Client. Addressing is defined for the JNIOR I/O. There are a number of standard applications that INTEG supplies. The CINEMA application for instance allows you to define macros that are triggered by certain events and those can manipulate the I/O.

Did I miss any? Maybe you know of another?

Welcome to my world…

JANOS Core:

Pictured is the core processing section of the first 412DMX hand-assembled prototype. This is typical of a core for the Series 4 JNIOR.

So there is hardware but it ain’t worth the FR4 it sits upon without Firmware.

No one today would argue that statement unless, of course, they just like to argue. But 50 years ago (yes, it’s about 50 years!) few would understand.

That Renesas RX63NE micro-controller’s internal high speed Flash initially is blank. This board would do nothing until we put something in there. I will tell you what we put in there, and how it came to be, in this forum section.

So we see here that the micro-controller has a basic RS-232 serial port (COM Diagnostic Port); it has access to the Ethernet LAN through an external PHY chip; and there is a variety of external memory components.

One of the key aspects of the hardware that the Operating System must determine is its configuration. The 412DMX pictured above supports 128MB of NAND Flash. The JNIORs in production at this point use 32MB Serial Flash. So this NAND is being introduced here. It provides for lower cost, larger storage capacity, faster access, and more flexibility. It may find its way into the entire Series 4 line when revisions come up.

Every JNIOR has that battery backed-up low power SRAM. In production today we supply the 2MB chip. Some early JNIORs were built with 4MB chips. This is non-volatile memory. It contains the root of the file system (everything outside of the /flash and /temp folders). It maintains the Registry as the Series 4 does not reload from the /flash/jnior.ini file on boot. And it holds other immutable memory blocks as might be needed for the system and its applications. All of that is at the discretion of the Operating System and we will discuss it a bit later.

The 64MB DRAM is high-speed volatile memory that serves as the memory heap for the system. Memory here is allocated, managed, freed, cleaned up and formatted at the pleasure of the Operating System. JANOS has mastered the art of garbage collection making Java applications almost real-time. We will get into all of that as well.

Early JNIOR Series 4 and the Series 3 versions did not utilize DRAM. These depended entirely on the SRAM for non-volatile storage and memory heap. Life back then was limited. The DRAM offered much greater capacity and lower cost. It also provided for faster access.

So that is the lay of the land where JANOS lives. Shortly we will get to know JANOS at an unprecedented level.

By the way, and as an aside, JANOS is only available in the Series 4 JNIOR at the moment. It will appear in other INTEG products as time goes on. It will also make itself at home on other processors (ARM for instance). In fact, it may even be available to anchor the operation of products developed by other companies. It was conceived of and born just a short 5 years ago. Only time will tell. Now, at least, it has us spoiled.

With hardware just waiting to be programmed we encountered the first hurdle.

Project Name?

The first thing the IDE wanted was a name for the project. This gave me pause since it would be the code base for the new operating system. So what do we call this OS?

Knowing that we could change the name at any point we decided to get creative. I came up with JANOS which is an acronym standing for (J)nior (A)utomation (N)etwork (O)perating (S)ystem. The ‘J’ being a bit questionable as it could stand for (J)ava like that in JNIOR. Or it could just be because we like starting names with ‘J’.

But JANOS started to make sense when we learned that Janus was an ancient Roman figure, spirit or god that guarded doorways and archways. Depicted with his face pointing in both directions he was the god that dealt with all things coming and going, past and future, beginning and ending. This could loosely be considered to be the controller or god of I/O, right? Bingo!

So unless something better were to come up everyone seemed cool with the name and we quickly made it over our first real hurdle. A quick scan of the Internet led us to believe that “JANOS” would be relatively unencumbered for this use. The first C files were cast with a Copyright header referring to the code as JANOS. The earliest inception date seems to be the 15th of December 2011. That would be for main.c.

Processor Selection

I suppose that I could get into the reasoning behind the selection of the Renesas RX600 processor. Initially we developed using the RX62N and then moved to the RX63N when that became available.

To start I would have to say that there certainly wasn’t an exhaustive search and very few processors were considered for JNIOR. Basically it was between some flavor of ARM and one or two micro-controllers that we had briefly looked into some years earlier. Luckily the local sales support for Renesas stepped in to tell us about the RX600 series. We had been inquiring about an earlier product. I had just wanted to pick something with the proper I/O mix that can handle the job and to pick it quickly.

Processors these days are quite complex. Most of the techniques employed in their design are now commonplace but 50 years ago things were somewhat simpler. Pipelining is an example. We want code to execute efficiently and quickly. Speed was somewhat of a concern as we were growing tired of the performance (or lack thereof) of the Series 3 JNIORs. Almost all processors these days employ a form of pipelining wherein multiple instructions at various stages are being processed simultaneously. This manages memory access and calculations making optimum use of each and every clock cycle.

I had been figuring on around a 100MHz system clock. That by itself isn’t fast given that PCs and smart phones all run an order of magnitude faster. But the JNIOR is just a controller and cost is a consideration. I also had wanted to avoid some of the concerns that come with high frequency circuit design.

At the same time I was concerned about code size. The operating system would surely be quite large and the ARM appeared to have a fixed 32-bit instruction size. That means that to simply increment a register that instruction would consume 4 bytes of program memory. I will have to admit that I have no experience with ARM at this point. I figure that migrating JANOS to it is inevitable. But I was attracted to the RX line as it boasted both fast execution and compact code size. Having variable length instructions and instructions that combine operations makes the pipeline design difficult if not nearly impossible. But the RX has apparently gotten it done.

Beyond the I/O pin count of the micro-controller and the included set of peripherals, I had also wanted a good amount of high-speed internal Flash for program storage. The processor also needed a good amount of high speed internal RAM for stacks. There also needed to be some form of internal EEPROM or erasable Data Flash for critical configuration settings. I had hoped to be less dependent upon the external memory devices which necessitate longer access timing. The RX62N seemed to cover my needs.

So a quick decision was made and I proceeded to pick up an RX62N evaluation kit. That was the sum and total of our (not so) elaborate processor selection process.

Considerations for Coding

Of concern is the IDE supplied for the RX line. Looking back I can’t give Renesas good grades on this front. Their home grown IDE was full of rookie GUI bugs among other things. They quickly moved to an Eclipse based IDE which caused a bit of a headache at one point. The IDE gets the job done but there are soooo many configuration settings that even creating a branch in code for development is a challenge. It works though.

But for operating system development you absolutely need good access to assembly code. Unlike 50 years ago the IDE now has you developing directly in C and hides any assembly aspect by supplying hidden initialization files and built-in libraries. That is all good if you are writing a simple single-threaded program but I was not. So to start I took the usual “Hello World” sample and had to dig into it to find all of the hidden code. We were not going to develop JANOS dependent on any one IDE or processor. So I needed access to each and every byte generated and incorporated in the resulting object files.

Once the startup code was located and I proved that I could pull that out of hiding, I had to deal with libraries. The IDE supplies its own “Standard C Libraries”. Again that is great for simple program development. But here again taking JANOS to another platform would mean different library code and therefore uncertainty in result. I would also be developing a preemptive multi-tasking environment and C libraries that are not guaranteed to be reentrant or that use even a single static variable spell disaster.
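
To illustrate why this matters, here is a generic example (not taken from JANOS) of the kind of library routine that spells disaster under preemption, alongside a re-entrant version where the caller supplies the storage.

#include <stdio.h>
#include <stddef.h>

/* Non-reentrant: every caller shares this single static buffer.  If one
 * process is preempted inside format_uptime() and another calls it, the
 * first result is silently overwritten. */
static char shared_buf[32];

static const char *format_uptime(unsigned long seconds)
{
    sprintf(shared_buf, "%lu:%02lu:%02lu",
            seconds / 3600, (seconds / 60) % 60, seconds % 60);
    return shared_buf;
}

/* Re-entrant: the caller supplies the buffer, so concurrent processes
 * cannot interfere with one another. */
static char *format_uptime_r(unsigned long seconds, char *buf, size_t len)
{
    snprintf(buf, len, "%lu:%02lu:%02lu",
             seconds / 3600, (seconds / 60) % 60, seconds % 60);
    return buf;
}

int main(void)
{
    char buf[32];
    puts(format_uptime(3725));                     /* fine single-threaded */
    puts(format_uptime_r(3725, buf, sizeof buf));  /* safe when preempted  */
    return 0;
}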

So JANOS required its own C Library code. Trying to prevent the IDE from supplying its own library code proved to be an issue. I managed to find the right balance that kept the built-in libraries out of the way. Even today it remains a concern. I’ve had to utilize uniquely named headers supplying standard named functions (printf, for example). It has been a challenge where the IDE should just have let me tell it that I did not want any libraries.

Even at the instruction level there are macros that need to be considered. If you add two 64-bit variables on a 32-bit processor there is no single instruction available. The IDE inserts a block of code that performs the operation. So that sum becomes several instructions, the code for which is IDE or manufacturer specific. This you cannot prevent, and it is a necessity, but I needed the source code for those macros. Again this is to ensure proper operation of the preemptive tasking system and to increase confidence in moving to other platforms. None of these seemingly personal requirements of mine seemed to be received by Renesas with any evidence of understanding. Simply put, reverse engineering is also a necessity.
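
As a concrete example, the single addition below is the kind of statement that pulls in one of those hidden macros. The comment sketches the general idea of the expansion (add the low words, then the high words with the carry); the exact instruction sequence is compiler specific, which is exactly why I wanted the source.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t a = 0x00000001FFFFFFFFULL;
    uint64_t b = 0x0000000000000001ULL;

    /* On a 32-bit core there is no single 64-bit add instruction, so the
     * compiler expands this to something like:
     *     add  low(a),  low(b)   -> low(sum),  sets carry
     *     addc high(a), high(b)  -> high(sum), consumes carry
     */
    uint64_t sum = a + b;

    printf("%llx\n", (unsigned long long) sum);   /* prints 200000000 */
    return 0;
}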

I also needed to understand the approach the Renesas compiler uses for function calls. What processor registers are used and how stack space is allocated and released. In addition I needed to understand the precise response to interrupts. This is key in creating a pure multi-tasking environment. Fortunately those topics are covered (briefly) in the Renesas compiler documentation. This does present a hurdle, say, if I want to use the IAR RX compiler at any point.

Obviously JANOS exists and I managed to find some level of comfort and control over the development environment.

If you are setting out to write an Operating System on par with Linux or JANOS you don’t want to do this. This is akin to standing at the edge of a cliff and looking straight down. If you realize how much work you have ahead of you, you will likely lose confidence in yourself. Then again, if you are staffing up to get the job done then understanding what needs to be done is critical.

Here I am going to list the sections of code that had to be developed to get JANOS to where it is. I would bet that this list will be missing some of it but here goes. This will also be in rough chronological order. At least as to when coding for the section was started.

Startup – Basically the main.c for the OS program. It is responsible for configuring hardware and each subsystem. Eventually this terminates and becomes the “Idle Process” executing an infinite wait loop and collecting unused process time.

Diagnostic Serial Port – The RS-232 (COM) port code so boot messages can be output and the command line interface developed.

Process Management – The ability to execute more than one process through preemptive multitasking. Requires a 1 millisecond tick interrupt and the ability to swap process context with minimum overhead. This also allows processes to terminate or be terminated. This subsystem maintains the uptime and realtime clocks.

Ethernet Interface – Configures the Ethernet peripherals and external PHY. This handles the delivery of outgoing packets and the reception of packets. This establishes the high-priority Network process needed to ensure that network traffic can be received and processed quickly and efficiently. The Network Process was the first process created for JANOS and the first to run outside of the Startup/Idle Process.

Command Process – The Command Line executes as its own process. This process is awakened by a keystroke on the RS-232 (COM) port and will handle all Command Line access. You will also be able to access this through a Telnet connection and via the Websockets Interface as well.

Command Line – Prompts for command entry and allows for convenient entry of textual commands. Includes a set of built-in commands and facilitates the execution of application programs. Maintains command history allowing commands to be easily repeated.

Memory Manager – Maintains Heap memory. This gives processes the ability to allocate memory for different purposes and then to return it for reuse. Efficiently manages the free memory block pool. Strives to minimize fragmentation and improve the availability.

Date & Time – Provides the ability to set and read the hardware RTC. Maintains the time of day in milliseconds since some epoch. Provides the ability to convert the millisecond count into year-month-day formats. Handles time zones and Daylight Saving Time.

System Process – The operating system itself has a number of background tasks it must attend to while other processes are busy with their own tasks. These are performed by the System Process.

Network Protocol Stack – To exist on the network the JNIOR must handle standard protocols at a low level such as ARP, ICMP, DHCP and DNS. Then once IP addressing is established the higher level protocols like IP, UDP/IP and TCP/IP are required.

Digital I/O – Subsystem that handles the Digital Inputs and Relay Outputs. Handles counters and usage metering.

File System – Provides the ability to store and retrieve data as files. This handles file storage as it is distributed across memory technologies. The JNIOR stores files in the non-volatile SRAM, in Flash (/flash), and in volatile Heap (/temp). This includes static content from the Program ROM (/etc). Provides for file access controls using permissions modeled after Unix.

C Libraries – JANOS is written in C and requires the standard C library. In order to ensure that the C Library is re-entrant and thread-safe we had to have the source code and therefore wrote a custom library.

AUX Serial Port – While very similar to the Diagnostics port, the AUX port includes handshaking capabilities and can support RS-485 communications.

FTP Server – Required to get files on to and off of the JNIOR.

WebServer – Used to serve files and create web sites.

Websockets – WebServer HTTP/HTTPS connections can be upgraded to handle the Websockets Protocol. This provides for the ability to remotely monitor and control the JNIOR.

JNIOR Protocol Server – Provides for legacy support of the JNIOR protocol which allows for remote monitoring and control of the JNIOR.

Registry System – Subsystem to manage name-value pairs for system configuration.

JVM – Java Virtual Machine permitting programs written in Java to be executed. This includes a complete stand-alone Java Runtime Environment for the JNIOR. Uses a Native interface with the operating system.

Library File Handling – The ability to access and decompress files stored in ZIP and JAR files. Necessary for the execution of externally compiled Java programs. Also expands the File System allowing the WebServer to serve files directly out of a ZIP library.

External Module Support – Configures and utilizes the Sensor Port to manage external expansion modules.

Network Time Protocol – NTP synchronizes the JNIOR clock with the rest of the world.

Beacon Protocol – Required to facilitate initial network configuration. Used by the Support Tool in managing collections of JNIORs.

Field Update – The ability to update JANOS in the field. Allows a UPD file to be loaded and the operating system updated on reboot. This must be a fault tolerant procedure.

Sockets – The means by which Java application programs integrate with the network stack.

Users – The ability to create users, maintain credentials and require login. Users can be created, disabled and removed. There are different user permissions/capabilities.

Email – JANOS has the ability to send email notifications.

Inter-process Messaging – JANOS supports a Message Pump which is a system by which messages can be circulated between processes. JANOS can notify applications of certain events.

I/O Logging – The ability to log and display Digital I/O events and serial communications.

Network Capture – The ability to capture network traffic for use in debugging. JANOS creates files compatible with Wireshark.

Security – Implements SSL/TLSv1.2 secure communications. This includes multi-precision math libraries, message digest calculations, encryption and decryption. Supports RSA, RC4, DES and others. Automatically generates self-signed certificates.

Server-side Scripting – Handle PHP-like server side scripting to support the development of sophisticated dynamic websites.

Regex Engine – Support the ability to perform search and replacement operations using Regular Expressions.

Watchdogs – The ability to detect when program operation has failed and then to take action to restore operation.

User Groups – Create user groups and assign users for use in handling file permissions.

If I am missing anything feel free to ask about it. Some of these items are extremely complex in and by themselves while others were not all that complicated. It has taken some 5 years in development. The OS consists of over 400 individual source files and countless lines of code. Does anyone really know how to count code lines? Anyway, slowly I will proceed to describe the implementation details.

 

In 2011 it became apparent that we needed to redesign the JNIOR. This was not because its features and functions were lacking in any serious way. It was more due to the fact that product performance was limited and starting to fall behind that expected by the market. We also could tell that certain devices we used were approaching some kind of End of Life. Specifically the Dallas/Maxim-IC DS400 processor.

We needed to develop hardware around a new processor. While that was certainly doable from a hardware perspective, it meant that none of the existing JNIOR operating system code and its applications would be usable. Regardless of processor selection we would be faced with a new real-time operating system (RTOS) and have to develop a new approach to applications. This would mean as well that the user would see a totally different user interface and would have to learn new procedures for the JNIOR.

Would the result actually be a JNIOR? Certainly we would not be creating a drop-in product revision. This was not going to be an acceptable approach. The main point in the specification of the next series of JNIOR demanded that it remain consistent and familiar to the customer base. If this was not the case then we would never be able to stop producing the Series 3 and move to the Series 4 in any seamless fashion.

So in order to develop a revised series of the JNIOR without disrupting product use we had to redesign both the hardware and the firmware. We needed an RTOS where we would have complete control over the user interface and product capabilities. Given that existing operating system offerings were bloated with generic code, burdened with licensing issues, and expensive, the decision was made to write our own.

Series 4 Product Specification

So the first step in any engineering project is to write the Specification document, right? It shouldn’t be any different here. But we never did create a specification document for this. Actually in this case there was no need. The Series 3 JNIOR in and by itself was our design specification.

The intent of the effort was to create a 100% drop-in revision to the Series 3 that utilized a new processor. That meant that it couldn’t look different from a Series 3. All of the external hardware connections had to remain consistent. The external expansion modules needed to continue to be usable. And, it still needed to execute the existing application programs. Basically, once the Series 3 was no longer in production we needed to be able to ship a customer a replacement JNIOR that didn’t involve re-engineering the application and modifying the installation.

That said, the intent also was for the new version of JNIOR to demonstrate “improved performance”. This mostly referred to processing speed. The Series 3 JNIORs were taking upwards of a minute just to boot and load applications. We had to improve on that. But by just how much? It was left unstated.

There were also (naturally) software issues with the existing Series 3 line. Many were rooted in 3rd party code for which we had no access and no ability to rectify. Some of those bugs we indeed started to consider as features. It would make no sense to replicate any of those. The revised JNIOR would be able to address those issues even if it meant some slight change in user interface or procedure. This was to be minimized however.

We had to consider licensing and copyrights. We could not directly convert the Dallas/Maxim-IC TINI operating system for our own use. We absolutely needed to develop our own new and improved code to avoid any legalities or added licensing expense.

Further we wanted to ensure that we would not encounter any deficiency or issue caused by 3rd party code. These tended to be costly in our experience as a solution generally was not forthcoming and we would spend a large effort in creating our own work-around. So the new operating system would not utilize 3rd party code if possible. We were to have 100% control of, and knowledge of, the source code. This allowed us to guarantee bug fixes with an unprecedentedly short turnaround.

Finally, the door had to remain open for the introduction of new functionality that would improve the value of the product.

So, I believe that this conveys the (previously unwritten) set of specifications behind the creation of the Series 4 JNIOR and the JANOS operating system. Now that the product revision is reaching its own maturity I’ll let you create the report card.

Processor Selection
(Fall 2011)

With our goals set we obviously first and foremost had to select a new processor to sit at the core of the new JNIOR series. This would be a good topic for the Hardware section as there were a number of considerations involved. We decided to try the Renesas RX600 line and specifically the 144-pin version of the RX62N.

We obtained an RX evaluation kit, installed the Renesas IDE and wrote some simple “Hello World” type programs. I then jumped into the assembly code and looked carefully at the register handling in C subroutine calls, the complexity of startup code, and the handling of interrupts. It also mattered how the program stack was handled. And there needed to be some atomic instructions (for mutex implementation). I could experiment with these things with the evaluation kit. It also mattered how code was optimized. Was the code compact and yet did it still execute efficiently?

The micro-controller’s complement of I/O pins and their flexibility were of interest. There had to be a good set of peripherals available. Enough to handle at least 3 serial streams. There needed to be flexibility in an external memory bus. Much had to be considered.

Once the decision was made new schematics were created and a board laid out for the first Model 410. The first prototype was hand assembled and made to execute a simple “Hello World” program through the RS-232 (COM) port. That was (to the best of my recollection) December of 2011.

The Registry Key Assignments document is now distributed as part of the Dynamic Configuration Pages (DCP) package. You access that built-in documentation using the F1 key in the DCP or the link displayed at the bottom of the DCP. This is distributed with every JNIOR.

If you do not have access to the DCP then your JNIOR needs to be updated.

I will try to maintain the current Registry document on our HoneyPot JNIOR which is located on the open network. Yes, this JNIOR is not behind a firewall or NAT router. I have placed the Registry documentation in the public file area on that unit. Here is the link:

http://honeypot.integpg.com/RegistryDoc.html

(Hint: Hold the Ctrl key when you click a link if you want it to open in a new tab.)

By the way, this Registry document is indexed and if you want to refer to a specific Key you can add it to the URL. For example if you want to see the documentation for the Device/Timezone Registry Key use the following custom link.

honeypot.integpg.com/RegistryDoc.html#device.timezone

Try it!
http://honeypot.integpg.com/RegistryDoc … e.timezone

And since this documentation is built into the DCP you can retrieve it from your own JNIOR. Just reference your JNIOR instead. Note that in this case since the DCP is not public you will have to log into your JNIOR.

http://[IP Address]/RegistryDoc.html

I will try to keep the most current version of the document on the HoneyPot though. That is because most JNIORs probably need to be updated (which is highly recommended).

As we approach the moment when we adjust the clocks as we head into winter it seems appropriate to point out the date and time tools that JANOS has to offer.

bruce_dev /> date -v
 utc: 1509109889
 Fri Oct 27 09:11:29 EDT 2017
 Current Timezone is EST for the America/New_York area.
 Abbreviated EDT when Daylight Savings is in effect.
 DST begins at 02:00 on Sun on or after Mar 8th.
 DST ends at 02:00 on Sun on or after Nov 1st.
 When in effect DST sets clocks ahead by 1 hour.
 DST is currently in effect.

bruce_dev />

The DATE -V command (V for verbose) offers a description of the current time status. Here we see that I have incorrectly referred to DST as “Daylight Savings Time” when in fact it is Daylight Saving Time. So, before posting this I’ll just update the JANOS code to correct that. My bad.

It is in fact Daylight Saving Time and it is designed to capture an hour of sunshine that we would otherwise be sleeping through in the morning. To do this we set our clocks ahead an hour in the Spring at a point when the Sun insists on rising an hour before we want to get up. Remember “Spring ahead. Fall back.” DST is in effect during the Summer. So instead of it being 6:00 AM when the Sun blares into the room and pisses you off by waking you early, it will be 7:00 AM and maybe you should get up. That hour of daylight then is shifted to the end of the day. So it could be almost 10:00 PM before you have to put that lawnmower away. :?

Seems like a reasonable thing to do. Well of course it causes all kinds of issues. Making matters worse, not all timezones observe DST and some do but use a different set of rules. And although the rules haven’t been changed recently, governments have been known to make adjustments. Sometimes DST usage is temporarily altered, say when we are in an energy crisis. Then there are the cities or areas of a timezone that decide not to follow DST like the rest of their timezone. All of this makes it difficult for a device to cope.

By the way, I’ve fixed my problem. You’ll have to wait for JANOS v1.6.3 for this. Sorry. :)

bruce_dev /> date -v
 utc: 1509110646
 Fri Oct 27 09:24:06 EDT 2017
 Current Timezone is EST for the America/New_York area.
 Abbreviated EDT when Daylight Saving Time is in effect.
 DST begins at 02:00 on Sun on or after Mar 8th.
 DST ends at 02:00 on Sun on or after Nov 1st.
 When in effect DST sets clocks ahead by 1 hour.
 DST is currently in effect.

bruce_dev />

Did you ever notice in Windows that your file last modification times all change when you enable or disable DST?

Windows apparently applies the timezone and current DST status to any time it displays. So if DST is in effect every time you see is adjusted even if DST wasn’t in effect at that point in the past. So if I save a file today (October) it will show the file was created today at such and such an hour. But if I check that file date next month (November) after we leave DST it will show it was created an hour earlier than I actually did it. Perhaps even before I got into work. Show your boss that and tell him that all summer long you consistently made it in early and deserve a bonus. ;)

Well the times will be off by an hour but trying to figure whether the times would be an hour earlier or later makes my head hurt.

JANOS has been programmed differently and perhaps correctly, at least in my opinion. When a time is displayed JANOS uses DST if and only if DST was in effect at that past moment. So when DST is enabled or disabled the file times remain consistent. You know, if you have written a Windows application that uses a file’s last modification time to check if the file has changed, you’ll go nuts. Twice a year you would think that batches of files suddenly have changed. Yeah I know Windows keeps a changed flag that you can and maybe need to use. But in general the file times seem like a reasonable indication of a change (or different version). You should use UTC anyway.

Of course if you change the DST rules or your timezone for that matter the reported file times will naturally all change. It’s all no big deal anyway.

We’ve also somehow moved from selecting your timezone to specifying a nearby city for getting your clock to display correctly. Well there are hundreds of major cities and that database becomes quite large given there is a lot of repetition (there are multiple major cities in a single timezone). Meanwhile the JNIOR is just a small device and doesn’t have the luxury of wasting storage space that you’ve spent your hard earned dollars on.

In JANOS you select the timezone using its abbreviation and there is a short list of timezones included by default.

bruce_dev /> date -t
Available timezones:
  IDLW (-1200) Date_Line                  WST  (-1100) Pacific/Midway
  HST  (-1000) Pacific/Honolulu           AKST (-0900) America/Anchorage(1)
  PST  (-0800) America/Los_Angeles(1)     MST  (-0700) America/Denver(1)
  CST  (-0600) America/Chicago(1)         IET  (-0500) America/Indianapolis
  EST  (-0500) America/New_York(1)        PRT  (-0400) America/Caracas
  CNT  (-0330) America/St_Johns(1)        AGT  (-0300) Argentina/Buenos_Aires
  BET  (-0300) Brazil/Brasilia            MAT  (-0200) Atlantic/Mid
  CAT  (-0100) Atlantic/Cape_Verde        GMT  (+0000) Greenwich_Mean
  UTC  (+0000) Universal_Coordinated      WET  (+0000) Western European(5)
  CET  (+0100) Central European(4)        EET  (+0200) Eastern European(3)
  EAT  (+0300) Asia/Riyadh                MET  (+0330) Iran/Tehran
  NET  (+0400) Russia/Moscow              AFT  (+0430) Afghanistan/Kabul
  PLT  (+0500) Russia/Ural_Mountains      IST  (+0530) India
  ALMT (+0600) Asia/Alma-Ata              CIT  (+0630) Burma
  VST  (+0700) Mongolia/West              AWST (+0800) Australia/Perth
  JST  (+0900) Japan                      ACST (+0930) Australia/Adelaide(2)
  AEST (+1000) Australia/Sydney(2)        LHT  (+1030) Lord Howe Island
  SST  (+1100) Russia/East                NIT  (+1130) Norfolk Island
  NST  (+1200) Pacific/Fiji             

Note 1:
 DST begins at 02:00 on Sun on or after Mar 8th. Clocks move forward by 60 minutes.
 DST ends at 02:00 on Sun on or after Nov 1st.
Note 2:
 DST begins at 02:00 on Sun on or after Oct 1st. Clocks move forward by 60 minutes.
 DST ends at 03:00 on Sun on or after Apr 1st.
Note 3:
 DST begins at 03:00 on Sun on or before Mar 31st. Clocks move forward by 60 minutes.
 DST ends at 04:00 on Sun on or before Oct 31st.
Note 4:
 DST begins at 02:00 on Sun on or before Mar 31st. Clocks move forward by 60 minutes.
 DST ends at 03:00 on Sun on or before Oct 31st.
Note 5:
 DST begins at 01:00 on Sun on or before Mar 31st. Clocks move forward by 60 minutes.
 DST ends at 02:00 on Sun on or before Oct 31st.

bruce_dev />

We’ve recently adjusted this list having learned of some additional DST rules. If you find that none of these standard timezones work for your location, tell me about it. I can add a new timezone (or fix an error) as easily as I changed “Daylight Savings Time” above.

If this doesn’t do it for you, you can define a custom timezone. This is all described in the Registry Key Assignments document the current version of which is built into the DCP and distributed with every JNIOR.

You can open the DCP by pointing your browser to your JNIOR and use the F1 key to bring up the documentation. A whole section on Custom Timezones and Timezone Corrections can be found in the Table of Contents.

In fact I can put the current version of RegistryDoc.html and RegistryDoc.css (and also inputs.png) in the public web area on the HoneyPot unit. The HoneyPot is a JNIOR Model 410 that we have connected directly to the Internet for security testing. That means that you can also retrieve pages from it. I’ll try to remember to keep that copy current.

Here’s a link to the Timezone section.

http://honeypot.integpg.com/RegistryDoc … e.timezone

So now on the HoneyPot we have the Registry Key Assignments document: http://honeypot.integpg.com/RegistryDoc.html and an application that maps all of the illegal login attempts that mostly result from malicious bots like that Mirai Botnet: http://honeypot.integpg.com/map.php

The JNIOR captures network traffic continuously. A large queue is established at boot and a record of network packets is kept. Once the queue fills the oldest packets are dropped. Depending on your network usage that queue might contain the past hour of traffic or much more.

The NETSTAT -C command generates a network capture file /temp/network.pcapng on demand containing the content of the capture queue. For example:

HoneyPot /> netstat -c
LAN connection active (100 Mbps)
 generating capture file...
 /temp/network.pcapng created
 collected 4791 packets from the previous 16 minutes
HoneyPot />

You can download this file to your personal computer. If you have Wireshark installed you will likely be able to double-click this PCAPNG file and it will open right up for viewing and analysis.

By the way you can download Wireshark for free from https://www.wireshark.org/.

Notice that in my example only the past 16 minutes were captured. First of all the capture begins at boot. So if your JNIOR has only been up for 16 minutes that is all that you will get. This, I think, was the case here.

It is important also to note that when you access the DCP you are also using the network. That means that all of the packets involved with your browser access will also be captured. If you leave the DCP up to monitor the status of your JNIOR all of that traffic takes up space in the capture buffer. If you are interested in analyzing the JNIOR interaction with another device that DCP traffic will limit the amount of information you can capture relative to that other device.

One solution is to not use the network to monitor the JNIOR, using the RS-232 (COM) port instead. But for that you have to be physically at the JNIOR and have the ability to use a serial connection. A better solution is to use capture filtering.

With the JNIOR you can both filter the packets that are to be retained in the capture buffer and filter the packets that you extract into the /temp/network.pcapng file. For example I will set this JNIOR up to ignore HTTP (Port 80) traffic and capture everything else. I can then use the DCP without loading the capture buffer with packets that we are not interested in.

HoneyPot /> netstat -f !80
LAN connection active (100 Mbps)
 capture filter set
HoneyPot />

This tells JANOS to only accept packets into the capture buffer that DO NOT reference port 80. This takes effect immediately (provided there isn’t a syntax error). The next /temp/network.pcapng file that I generate will no longer contain HTTP communications and therefore nothing to do with the DCP. There will be many more packets from a longer period of time all pertaining to other communications that might help with my debugging.

The NETSTAT -F command basically sets the IpConfig/CaptureFilter Registry key. If you mouseover that key in the Registry tab of the DCP and hit F1 you will see the documentation for filtering and the syntax for these filters. These details are in the Registry Key Assignments documentation. The current version of that document is distributed as part of the DCP. It will be present on every Series 4 JNIOR that is up to date.

Network Filtering

The content of a network capture or the network clients allowed to interact with the JNIOR can be controlled through dynamic filtering. These filters can be quite simple or, if needed, much more sophisticated.

The IpConfig/CaptureFilter Registry key may optionally define a filter which is applied to incoming packet data prior to capture. There is limited storage for captured information and by filtering you can extend the capture period and the amount of pertinent information collected.

A filter may also be used in generating the network.pcapng capture file from the capture buffer. Here the filter allows you to extract only pertinent information and otherwise keep file sizes at a manageable level.

The IpConfig/Allow Registry key may optionally define a filter which is applied to incoming connections. In this case the referenced IP addresses refer to the incoming source IP addresses, those of clients. Referenced port numbers refer only to destination ports, those available on the JNIOR.

IP Address Filters

To filter packets referencing a specific IP address you need only include the IP address in the format “nnn.nnn.nnn.nnn” in the filter string. Any packet that references this IP address either as the source or the destination address will be selected for inclusion. All other packets will be excluded unless covered by some other part of the filter.

To exclude packets referencing a certain IP address you can prefix a ‘!’ exclamation point to the address like this “!nnn.nnn.nnn.nnn”. All packets that do reference the IP address as either a source or destination address will NOT be selected for inclusion. This can also be written as “NOT nnn.nnn.nnn.nnn”.

Note that an IP address is identified by its format, four decimal values between 0 and 255 separated by the ‘.’ period.

The domain syntax allows a range of IP addresses associated with a netmask to be specified. The format is “nnn.nnn.nnn.nnn/mm” where ‘mm’ specifies the number of high order bits in the netmask. For example, “10.0.0.0/24” specifies the domain containing IP addresses 10.0.0.0 through 10.0.0.255 and corresponds to a netmask of “255.255.255.0”. This would be useful in selecting only local traffic for instance.

MAC Address Filters

Although less often required you can filter on a specific MAC address. The MAC address is included in the filter string in the format “hh:hh:hh:hh:hh:hh”. The format is six hexadecimal values (0-9a-f or 0-9A-F) separated by the ‘:’ colon. For instance most INTEG Series 4 JNIORs have a MAC address formatted as “9C:8D:1A:hh:hh:hh” where the lower three bytes are assigned in some sequence.

As with IP addressing, packets with MAC addresses may be excluded by writing the filter as “!hh:hh:hh:hh:hh:hh” or “NOT hh:hh:hh:hh:hh:hh”. Again a MAC address is identified by its format.

TCP/UDP Port Filters

A port is specified in the filter string as a decimal value between 1 and 65535 inclusive. No punctuation is used. The capture filter does not distinguish between a TCP or UDP port number. A port may be excluded using the negation “!nnn” or “NOT nnn”.

There are standard ports assigned for various functions. The capture filter knows some of them by name. Some may be reconfigured through the Registry. As a convenience the port may be specified using its protocol name. The capture will be filtered on the port as configured at the time the filter is compiled (at boot or upon NETSTAT command). JANOS recognizes these port names where the default values are shown in parentheses: SMTP (25), NTP (123), JNIOR (9200), FTP (21), HTTP (80), HTTPS (443), TELNET (23), and BEACON (4444). These ports may be excluded using the same negation syntax as previously shown.

Boolean Constants

The capture filter will also recognize the terms TRUE and FALSE. True indicates that the packet is to be included and False otherwise.

Logical Operations

To filter on a single IP address, MAC address or port (or to exclude a single item) the filter need only specify the address or port in the associated format. The following would select the communications involved in an email transfer. If this is used as an incoming filter, only email transactions would be captured. If this is used with NETSTAT –C in generating the PCAPNG file, the file would only include email communications.

            NETSTAT -C SMTP
            netstat -c 25

Note that filters (and also commands) are not case-sensitive. Either form above will create a PCAPNG file with just email communications. This assumes that you have not reconfigured the SMTP port. If you have set Email/Port to another port (587 for instance) then the first line will extract your email communications and the second will not. Although the second filter might show an application trying to use the incorrect port.

Filters often need to be slightly more complex in order to include the collection of communications needed. The syntax allows you to specify any number of addresses or ports in any combination using AND, OR and XOR logic. As an alternative you may use the notation ‘&&’ and ‘||’ for AND and OR respectively.

As an example perhaps you want to filter only email communications with the SERVER.

            netstat -c 10.0.0.4 && smtp

If you want to also include BEACON communications you might write the filter as:

            netstat -c 10.0.0.4 AND smtp OR beacon

But here you might question the order of precedence of the logical operations. The capture filters do not support an order of precedence but perform the operations from left to right. So this would be calculated as follows:

            netstat -c (10.0.0.4 && SMTP) || BEACON

And this would have done what we had said. If there is some question you can use the parentheses in the filter as shown. The following will create the same subset of packets but would not if we were to exclude the parentheses:

            netstat -c BEACON || (10.0.0.4 && SMTP)

A parentheses grouping can be negated as you would expect. The following will create a capture of all activity EXCEPT email communications with the SERVER.

            netstat -c !(10.0.0.4 && smtp)

Finally if we had wanted to mask these email communications from the overall capture buffer we can install this filter using the command:

            netstat -f !(10.0.0.4 && smtp)

This would result in the following Registry setting and would filter out matching communications until such time as the filter is removed.

            IpConfig/CaptureFilter = !(10.0.0.4 && smtp)

The JNIOR will only capture packets that are addressed to it. That would include broadcasts.

Ok. Well perhaps it is undocumented (and you didn’t hear it from me) but you can set the IpConfig/Promiscuous Registry Key to true. The JNIOR then will capture (subject to any filtering) all traffic including packets not addressed to the JNIOR. But for this to be of any use you will need to be using a Network Hub instead of a Network Switch. There is a difference. The latter “switch” will only pass packets to the JNIOR that are addressed to it in addition to any broadcasts. If you are interested in other traffic you need to use the “hub” which passes all traffic to all connected devices.
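
Following the same key = value convention shown above for the capture filter, that undocumented setting would look like this in the Registry:

            IpConfig/Promiscuous = true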

The Series 4 JNIOR with a hub then can be a valuable network debugging tool useful in sniffing traffic between two other devices (connected via hubs). It would allow you to remotely use the features of Wireshark.

The Series 4 JNIOR can be configured to send email. The approach is slightly different than what existed in the prior controller series. The differences are driven by the heightened concern for security and an increase in restrictions on email server use.

First of all, we now require valid user credentials (username and password) for authentication at the defined MailHost. You can only submit email to a Comcast SMTP server if you have a valid Comcast email account for example. Just having a valid Comcast email address or originating on a Comcast subnet is no longer sufficient identification. A login is typically required.

Secondly, the Series 4 can establish SSL/TLS secure connections. That means that your email content and credentials are protected by encryption. This was not possible in the prior series where such communications had to be in the clear and readable by anyone clever enough to sniff the network.

The attached document was written prior to the introduction of the DCP (Dynamic Configuration Pages) which are accessed using your browser. At that time one had to use the IPCONFIG command to enter email user credentials in order to encrypt and protect the account password. Most configuration can be also achieved through Registry key modifications. But the password entry requires the active step of encryption which is handled by IPCONFIG.

I will detail recent changes here.

Configuring JANOS for Email

Up to and including the release of JANOS v1.6.2 email can be delivered through an SMTP port (default 25) or MSA port (default 587). JANOS handles both of these ports in the same manner. The JNIOR will authenticate and then if the STARTTLS option is supported JANOS will establish an encrypted connection. The latter requires that SSL be enabled on the JNIOR which it is by default.

You will not get an indication as to whether or not an SSL/TLS connection is in use. You could enable SSL Required but that may have other ramifications (such as requiring the use of HTTPS protocols for web connections).

Beginning with JANOS v1.6.3 you will be able to use the SMTPS port (default 465) for guaranteed secure email delivery. The SMTPS port requires a SSL/TLS secure connection right from the start. The email submission procedure will not proceed until the connection is secure. So in this case you can be certain that content and credentials are fully protected. A new Registry key Email/SMTPS must be set TRUE or enabled so JANOS knows to immediately secure the connection.
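
For example, once running v1.6.3 or later, enabling that key follows the same key = value form used for the other Registry settings in this post (the documentation above notes TRUE or enabled both work):

            Email/SMTPS = true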

By the way, JANOS uses this same STARTTLS option to secure FTP. The JANOS Telnet Server which can be used to access the Command Line interface also supports STARTTLS. In this case the option had been proposed but not adopted. So, to our knowledge, the INTEG Telnet tools are the only ones that provide the secure Telnet channel. All of this becomes simple when the DCP is used over a secure connection (HTTPS).

So to setup email on the Series 4:

  1. Make sure that your IP configuration (IP address, Subnet, Gateway and DNS servers) is correct. If you are using the default NTP server for synchronizing the clock and you see that NTP is doing just that by the entry in the system log, then your IP configuration is most likely correct.
  2. Set the MailHost or Mail Server. This would be something along the lines of smtp.comcast.net.
  3. Enter your own email address as the From address. Emails will look like they come from you.
  4. Enter your Username. Depending on the system that may be your email address or just the prefix. Note that IPCONFIG and also the DCP will prompt for a password. Once that is confirmed it will be encrypted and stored securely.
  5. By default the email port is 25. That should work for you. Depending on your service they may ask you to use a different port. Set the port as needed. If it is port 465 (or other SMTPS port) you will need JANOS v1.6.3 and Email/SMTPS enabled.

A good test is to enable the Email On Boot and reboot the JNIOR. That setting is on the events page in the DCP. The document attached above tells you how to use the SENDMAIL command from the Command Line which is also useful for testing.

Since today you really need to keep the Login requirement enabled on your JNIORs, what if you want to serve some Web data publicly? You know, without having everyone use a password?

Well, you probably aren’t aware that in addition to the default WebServer root of /flash/www there is another called /flash/public. Yep. You can probably guess now that anything you put in /flash/public will be served by the WebServer without requiring that the client login/authenticate. Everything else remains secure and requires your login for access.

In fact that is how our HoneyPot unit that is sitting out directly on the Internet lets you access the following page:

http://honeypot.integpg.com/map.php

There you didn’t need to login and yet that JNIOR is secure and nothing else can be accessed or modified without securely logging in.

Pretty sweet, eh? :mrgreen:

Oh and you could rename /flash/www.zip as /flash/public.zip and serve the DCP publicly. Wait! Wouldn’t that be dangerous? :o

Well, not really. The DCP makes a secure Websockets connection back to the JNIOR. That Websockets interface requires a login (assuming you haven’t also disabled that). So when you open the public DCP you are again asked to login. You have to properly authenticate before you can really do anything. 8-)