Author Archives for Bruce Cloutier

In an earlier article we attempted to connect the JNIOR to a DMX512 network in a way that would allow a theatre stage crew to control relays and other outputs through their main lighting control panel. With the programmable facet of the JNIOR we could help create complex special effects that can be triggered in sync with lighting. In fact, DMX channel data can be used by the JNIOR application to trigger and manipulate activity on a LAN segment and thereby update computer screens and TV displays on stage. This can all occur with precise timing in concert with other lighting changes.

The DMX512 protocol and interface was developed over 30 years ago as a means to reduce wiring cost and to improve the mobility needed to support major concert tours in the music industry. It incorporated serial data transmission formats that were state-of-the-art at the time and that are still relevant today. Unfortunately the protocol employed a baud rate much higher than would be later recognized as standard in the computer industry. Also the limited capabilities for data synchronization and lack of error checking make interfacing difficult. Even though the JNIOR Model 410 is able to handle the RS-485 signals and was upgraded to receive 250 Kbaud data triggered on the specified serial Break Condition, we could not achieve reliable operation as a fixture through programming alone.

The issue relates to the design of the standard computer hardware component called the UART. The Break Condition that the DMX protocol uses to identify the start of a frame of data is not handled by the UART in a fashion that can be reliably used to synchronize data reception. At least it is not possible in a purely software implementation as we discussed previously. The answer lies in a simple hardware adapter that alters the DMX512 signals just enough to work correctly through the UART. We will see what that entails in the rest of this article.

 

 

DMX512 Fundamentals

Today DMX is defined by official standards. On the wire DMX512 uses EIA-485 differential signaling. We often refer to it as RS-485. You are probably more familiar with the term RS-232. As serial digital communications became more prevalent the distance limitations and noise susceptibility of RS-232 became more and more of concern. The solution, designated as RS-422, used balanced transmission lines over twisted pair which extended the distances that signals can traverse as well as the communication rates that can reliably be achieved. RS-422 connections however are point to point and as computer systems grew so did the need for networking. Soon RS-485 came along and added functionality that created a multi-drop environment where one talker can communicate with a number of listeners. RS-485 is what DMX512 needed where one (or more) lighting control panel must communicate with multiple lighting fixtures.

This basically amounts to a 3-wire network where there is one DATA+ (positive) data line and one DATA- (negative) data line along with a GND ground. The DMX standard details specific connectors (5-pin XLR) and wiring of sufficient gauge and quality. The world also wants to utilize 3-pin XLR connectors for DMX as these are prevalent in the industry through their use in audio applications. You will find a mixture of fixtures some with 3-pin and some with 5-pin connectors. Thankfully 3-pin to 5-pin adapters are readily available.

On the protocol side it was necessary to standardize. This would encourage lighting manufacturers to incorporate the technology. Over the years the protocol itself had been handled by different standardization groups and now is:

 

American National Standard
ANSI E1.11-2008 (R2013)
Entertainment Technology – USITT DMX512-A
Asynchronous Serial Digital Data Transmission Standard
for Controlling Lighting Equipment and Accessories.

 

In order to understand where we have difficulty we need to look deeper into the specifics of what is sent across the wires. You recall that DMX was invented to reduce wiring costs. Each channel in the protocol replaces a single power cable. The protocol allows for up to 512 channels and all of those multiplexed onto a single 3-wire network cable. That is a significant savings.

Each channel digitizes an analog signal into one 8-bit value. These can represent numbers from 0 to 255 and for lamps that is used to sufficiently cover the range from 0% to 100% brightness in 256 steps. Where once one channel controlled one lighting fixture, today multiple channels are assigned to each fixture to define attributes such as color and positioning in the case of motorized fixtures. In the serial world each channel is transmitted just as a byte of character data would with a start bit, 8 data bits and one or two stop bits. For DMX512 two stop bits are used. That is a total of 11 bits per channel.

 

 

The DMX protocol allows for the transmission of up to 512 channels. These are sent in sequence starting with channel 1 and up to channel 512. A Start Code is defined as channel 0 and the typical value for that is zero. The complete set of channels 0 through 512 (513 channels) is called a Frame. Frames are transmitted repeatedly one right after another with the start bit immediately following the prior stop bits at the defined rate of 250 Kbaud. We can now do some math. With 11 bits per channel and 513 channels that is a total of 5,643 bits. A bit at 250 Kbaud requires 4 microseconds and so the data part of the frame takes a total of 22.6 milliseconds. We can fit a little over 44 frames in each second.
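The arithmetic above can be checked with a few lines of Python. This is only a sketch of the math in the text; the constant names are ours and are not part of any JNIOR software.

```python
# Timing of the data portion of a full DMX512 frame, using the figures in the text.
BAUD = 250_000               # bits per second
BITS_PER_SLOT = 1 + 8 + 2    # start bit + data bits + stop bits
SLOTS = 513                  # START CODE plus 512 channels

bit_time_us = 1_000_000 / BAUD
frame_bits = BITS_PER_SLOT * SLOTS
frame_time_ms = frame_bits * bit_time_us / 1000
frames_per_second = 1000 / frame_time_ms

print(frame_bits, bit_time_us, round(frame_time_ms, 1), round(frames_per_second, 1))
```

Note that this ignores the Break Condition and Mark After Break described next, so the real frame rate for a full Universe is slightly lower.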

Now with a constant flow of channels each looking the same on the wire how do we know which one is channel 0 and which might be the one we need for our fixture? If we connect to an active DMX network we need a way to synchronize ourselves. For this the standard defines the use of a Break Condition to signal the beginning of a frame. In the old days this was called an Attention Signal. It was used to wake up remote teletype equipment and there was actually a key on the Teletype for this. So back in those days this was a natural choice for DMX synchronization. A Break Condition amounts to a period of time where the signalling is held at the level of a Start Bit (Space) long enough to violate the normal byte format and cause a reception error. The receiver simply fails to see the Stop Bits where they are expected and raises a flag. This is intended to signal the beginning of a frame.

 

 

By this definition a Break Condition need only last long enough to mask the expected Stop Bits (11 bits, 44 microseconds). This causes a UART to signal a Framing Error indicating that the data it received was not properly formatted and followed by the expected Stop Bits. This error can then be used for Break detection and the synchronization of the DMX Frame. The current DMX512 specification defines the minimum time for this Break Condition to be 88 microseconds which is exactly the transmission time of two channels. The timing of this Break Condition has been reduced from prior versions of DMX512 and it is unclear if there is any risk that existing lighting fixtures may fail to operate when presented with the shorter pulse. The specification does not specify a maximum duration for the Break Condition and states only a “typical” duration of 176 microseconds.

After the Break Condition terminates there must be a brief period of time before Channel 0 (START CODE 0x00) can be transmitted. This period is called Mark After Break and the current specification defines a minimum of 8 microseconds, the time period of two bits. This too has been reduced from prior versions of the specification. A maximum duration of 1 second is defined for Mark After Break and no typical value is given.

This defines the entire sequence. First a Break Condition of sufficient duration is issued. This is followed by a Mark After Break also of sufficient duration. After this each of 513 Channels is sent with proper serial formatting. The first is the START CODE which for normal DMX is always 0x00. Following the final Stop Bits for the last Channel there can be a gap before the next Break Condition. These Frames are sent continuously and according to the specification at least one Frame needs to be sent every 1.25 seconds.

Here is what this looks like in reality. The wider pulse in the center is a Break Condition and Frame data can be seen to the right and left of it. Note that in this case all Channels are set to 0x00 and so we see only the Stop Bits for each.

 

 

One side note is that shown above are the CMOS logic signals observed after the RS-485 receivers. This signal is typically fed to the receiving hardware or UART. This image is consistent with our prior diagrams where the Marking condition is a HIGH or 1 and the Spacing condition a LOW or 0. The signals on the actual wire demonstrate the same timing but would involve dramatically different voltage levels.

There is one point that I have not mentioned. The total number of Channels included in a Frame is a maximum of 513 but that number is optional. A lighting controller may decide to only send 128 Channels in addition to the START CODE assuming that others are unused and thereby increasing the number of Frames per second. The specification defines a maximum Frame rate of around 836 Hz (Frames per second). This can accommodate fixtures that need higher Refresh Rates. The Refresh Rate for a DMX512 Universe of a full 512 Channels is around 44 Hz.
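To get a feel for how the channel count affects the Refresh Rate, here is a rough estimate in Python. It assumes the minimum Break (88 microseconds) and minimum Mark After Break (8 microseconds) quoted earlier and no idle time between slots or frames. Real controllers usually add some slack, so treat the results as upper bounds. The function name is ours.

```python
BIT_US = 4                 # one bit at 250 Kbaud
SLOT_US = 11 * BIT_US      # start bit + 8 data bits + 2 stop bits

def refresh_rate_hz(channels, break_us=88, mab_us=8):
    """Best-case frames per second for a frame carrying `channels` channels.

    One extra slot is added for the START CODE. Break and Mark After Break
    default to the minimums quoted in the text (an assumption, not a measurement).
    """
    frame_us = break_us + mab_us + (channels + 1) * SLOT_US
    return 1_000_000 / frame_us

for channels in (512, 128):
    print(channels, "channels:", round(refresh_rate_hz(channels), 1), "Hz")
```

A full Universe comes out at about 44 Hz as stated, and trimming the frame to 128 channels roughly quadruples that.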

In the previous article we alluded to there being some difficulty in receiving these signals with standard computer hardware. We now have enough technical detail to understand what that reception problem might be. This is the topic for the next section.

 

The Issue with the DMX Break Condition

Now let’s look at the task of receiving a frame of DMX data from a software point of view. We will assume that you have configured the hardware to receive RS-485 signals. The JNIOR Model 410 AUX port has RS-485 capability when properly wired and configured. INTEG can provide those details and those are mentioned in other articles. The JANOS operating system has also been enhanced to accommodate DMX and includes the 250 Kbaud rate setting as standard.

In the previous article JNIOR as a DMX Fixture we covered both the proper hardware configuration for the JNIOR Model 410 and the simple Java application software that might be used to receive a frame of channel data. We took that a step further and let the JNIOR activate relays in response to changing channel levels. It is here that we noticed a reliability issue and then spent time to understand it. We discovered that it is a limitation in the design of the standard UART and not something specific to our software or even the design of the JNIOR. Here we will review those findings and then in the balance of this article present a solution.

As you can imagine it would not be acceptable for the JNIOR to receive an incorrect channel value and act upon it. On stage you might create a flicker and distract the audience or worse trigger some elaborate effect at a totally inappropriate moment. Neither contributes to receiving good performance reviews. Well our first attempt at having the Model 410 serve as a fixture worked but occasionally there was a glitch. A relay toggled inappropriately and this led to an investigation.

We started by attempting to validate the frame of channels. We know that the START CODE should be 0x00. So we decided to first test the START CODE. Well, 0x00 was not what we were seeing and this led us to understand that the UART is not able to properly receive data immediately following a Break Condition. Let’s look deeper into it.

Every description of the inner workings of the UART describes the reception of data beginning with the detection of the Start Bit. They state that “A Start Bit is detected as the falling edge of the signal” (which at that point is moving from a Mark condition to a Space condition from HIGH to LOW). This makes sense since this can be used to synchronize the baud rate and the UART once detecting the falling edge delays 1/2 bit time to begin sampling. The first reading is then LOW (0) since it is a start bit.

The UART then can proceed to receive data bits one after another starting with the LSB (Least Significant Bit or D0). Let’s assume that we are configured for 8 data bits and no parity as is the case for DMX. Once all 8 Data Bits are collected and the byte of data fully accumulated the next bit sampled is expected to be HIGH (1) as this is the necessary Stop Bit. The UART sees the Stop Bit and pushes the data to the output buffer and notifies the system that data can be read from the port. You can see how this follows from the first figure in this article repeated here.

 

Logically then you can see how the UART might go back into some ‘Start Bit Search Mode’ in order to pass over any number of remaining Stop Bits and dead time between character transmissions.

We mentioned previously that in the case of a Break Condition the Stop Bits are missing. Here the UART receives 8 Data Bits or what it hopes are 8 Data Bits and then fails to detect the Mark condition signalling the presence of a Stop Bit. The typical UART then pushes the data to the output buffer and signals the processor that data can be read from the port. It also sets a Framing Error flag and well-written serial drivers handle the error accordingly.

Here is where your logical thinking might fail you. Once the UART signals a framing error you would like to think that it reenters the aforementioned ‘Start Bit Search Mode’. This would take the UART through the balance of the Break Condition and past whatever Mark After Break is present to the very next falling edge of the signal. This would then allow it to reliably and properly receive the very next valid byte of data. Well, surprise, it does not work that way.

Instead the UART continues to sample bits based upon its internal baud rate clock. It accepts the very next LOW (0) bit as a Start Bit and proceeds to process the following 10 or 11 bit times as another byte of data. This is why from the software side a Break Condition usually presents itself as more than one 0x00 byte each with an associated Framing Error. By itself this is not an issue as you can accept one or more Framing Errors as an indication of the Break Condition. In fact you could use this to distinguish a short Break from a long Break. But let’s look closer at what happens at the very end of the Break.

At the conclusion of an arbitrarily long Break Condition the signal returns to a Marking state. We have reached the Mark After Break but it is highly likely that the UART is still accumulating hopeful data bytes. Eventually it sees the Mark as a valid Stop Bit. The result is that a byte is then output by the port and there is no Framing Error. There is no way for software to determine if this is the first valid data byte (START CODE) or a bogus trailing byte formed out of the end of the Break Condition.

In testing we can see that this indeed is occurring. Since data bits are transmitted from least significant to the most significant, depending on signal timing this bogus byte can be seen to contain any one of the following: 0x00, 0x80, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC, 0xFE or 0xFF. Perhaps you can see that this result depends on just what Data Bit the UART thinks it is receiving at the point when the Mark After Break begins. There is actually one additional case: the byte may really be the first valid data byte after the Break. This effect depends highly on the duration of the Mark After Break as well.
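The list of bogus values follows directly from the LSB-first bit order. As a small Python illustration (our own helper, not JNIOR code), suppose the line returns to Mark just as the UART samples data bit k. Every bit from k upward then reads as 1.

```python
def trailing_byte(first_mark_bit):
    """Byte a UART assembles when the line goes to Mark at data bit `first_mark_bit`.

    Bits are received LSB first, so bits below `first_mark_bit` are 0 (still in
    the Break) and bits at or above it are 1. A value of 8 means the Mark only
    arrives in time for the Stop Bit.
    """
    value = 0
    for bit in range(8):
        if bit >= first_mark_bit:
            value |= 1 << bit
    return value

values = [trailing_byte(k) for k in range(8, -1, -1)]
print(", ".join(f"0x{v:02X}" for v in values))
```

This prints the same nine values listed above, from 0x00 through 0xFF.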

Matters can be even worse. In DMX512 the Mark After Break can be as short as two bit times or 8 microseconds. That means that it is possible that the UART does not even look for a Stop Bit until the signal is deep into the START CODE byte. Since the START CODE is itself 0x00 the UART might then grab one of those data bits as a Start Bit and, well, fail to receive several channels of data. How can this be? Are UART designs defective?
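Both failure modes can be reproduced with a toy receiver in Python. This is a deliberately simplified model of the behavior described here, not of any particular UART. It assumes one sample per bit, perfectly aligned with the transmitter, and that any LOW bit seen while waiting is taken as a Start Bit. The Break length of 25 bit times (100 microseconds) and the channel values 0x55 and 0xAA are arbitrary choices for illustration.

```python
def slot(value):
    """One serial slot: start bit, 8 data bits LSB first, 2 stop bits."""
    return [0] + [(value >> n) & 1 for n in range(8)] + [1, 1]

def frame(break_bits, mab_bits, channels):
    """Line levels, one entry per bit time: idle, Break, MAB, START CODE, channels."""
    line = [1] * 3 + [0] * break_bits + [1] * mab_bits
    for value in [0x00] + channels:
        line += slot(value)
    return line + [1] * 12

def receive(line):
    """Return (byte, framing_error) pairs as the simplified UART would report them."""
    out = []
    i = 0
    while i + 9 < len(line):
        if line[i] == 1:               # Mark: keep waiting for a LOW bit
            i += 1
            continue
        data = line[i + 1:i + 9]       # 8 data bits, LSB first
        byte = sum(bit << n for n, bit in enumerate(data))
        framing_error = line[i + 9] == 0   # the Stop Bit should be HIGH
        out.append((byte, framing_error))
        i += 10                        # carry on from the bit after the Stop Bit
    return out

long_mab = receive(frame(25, 12, [0x55, 0xAA]))   # 48 microsecond Mark After Break
short_mab = receive(frame(25, 2, [0x55, 0xAA]))   # 8 microsecond Mark After Break

for name, result in (("long MAB", long_mab), ("short MAB", short_mab)):
    print(name, [(f"0x{b:02X}", "FE" if err else "ok") for b, err in result])
```

With the long Mark After Break the model reports two Framing Errors and then a clean-looking 0xF0 ahead of the real START CODE. With the two-bit Mark After Break it never regains byte alignment within this short frame, and neither the START CODE nor the two channels come through intact.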

Well let’s not jump to that conclusion just yet. Let’s give the hardware developers the benefit of the doubt at least initially. First we should understand that once the computer industry embraced serial data communications use of the Break Condition fell by the wayside. Protocols used Start of Header (SOH) bytes and other means for synchronizing data blocks. These tended to include byte counts, error checking and even error correction. These days there is even data compression and data encryption. The Break Condition not only fell out of use but also started to disappear from serial communications documentation.

Meanwhile the industry started to look into the reliability of data transmission lines. In the early days RS-232 cabling was not what it is today and it was very susceptible to external electrical noise. Also remember that we were still using machines developed in the early half of the 20th century. These were power hungry electro-mechanical contrivances that spewed out electrical noise with reckless abandon. RS-232 signals often experienced data disruptions some as brief as an incorrect value of a single bit up to the loss of entire blocks of bytes. There were even situations where communications were completely blocked out for lengthy periods of time. Perhaps for as long as the nearby noisy equipment remained in use. This led to better cabling. It led to improved communications standards like RS-422. It prompted the need for error detection and motivated a lot of creativity on the error correction front. Simultaneously the Federal Communications Commission (FCC) put forth standards for controlling RF and electrical noise emissions and prompted certifications making new equipment meet the stringent requirements.

A UART design that utilized a “Start Bit Search Mode” fails to perform properly in a noisy environment. In the case when a Stop Bit is damaged by noise and thereby missed by the UART, an approach that accepts the very next falling edge as a start bit falls on its face. It fails to re-synchronize quickly. Many bytes of data then are lost even though electrically the signals are undamaged. Error correcting schemes just could not handle it. They determined that the current UART approach performed much better and, well, what is this Break Condition anyway?

We can only guess at how we got to where we are today. It is actually a surprise that a 30-year old protocol would still be as active today and that it would still rely on synchronization techniques like the Break Condition. Of course it is not really an issue as DMX fixtures are developed to properly receive these signals. We are just a bit hampered in trying to use standard computer hardware. So what can we do about it? That is our next topic.

 

Correcting the DMX512 Protocol

We stand little chance of “correcting” the DMX512 protocol. With over 30 years worth of legacy DMX equipment out there a change in the design of the protocol is just not an option. The ability for DMX to inter-operate with standard computer hardware is just not that critical of a need. We have also exhausted any possible software solution to this issue. So what can we do? Some kind of hardware solution will be needed if we want to develop a JNIOR that can be used as a DMX fixture. What form will that take?

Obviously lighting fixtures all over the world read DMX channels seemingly without difficulty. So we know we can do it. Does this require a custom UART implementation in some FPGA component? Do we have to dedicate some PIC processor or other device to process incoming DMX bit by bit? Are there standard DMX chip level devices out there to make it easy? It certainly all cries out for some research. But first let’s see if there is some simple and creative way to address the issue.

The problem breaks down into two separate issues:

  1. The UART is unable to synchronize after a Break condition and reliably read the first START CODE byte.
  2. The trailing edge of the Break Condition can generate unexpected data not identified with a Framing Error.

Clearly if we use a Framing Error indication to signal the reception of a Break Condition and we are 100% assured that the next valid data byte is accurately representing the START CODE we are good to go. We assume then that the processor has enough horsepower to receive and buffer all 513 channels (if they are present) without error. This can be done with low-level interrupt routines avoiding any dependence on the processing load in the JNIOR. In fact all of this has already been implemented in JANOS and deployed into the field. Those techniques were tested in the prior article.
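The receive logic this implies is simple. The following Python sketch shows the idea only. It is not the JANOS interrupt code (which we have not reproduced here) and the function name and event format are assumptions. It relies on exactly the guarantee discussed above: the first clean byte after a Framing Error is the START CODE.

```python
def collect_frames(events, max_slots=513):
    """Group (byte, framing_error) events from a serial port into DMX frames.

    A Framing Error marks a Break and starts a new frame. The clean bytes that
    follow are the START CODE and then the channels. Bytes seen before the
    first Break are discarded since we are not yet synchronized.
    """
    frames = []
    current = None                      # None until the first Break is seen
    for byte, framing_error in events:
        if framing_error:
            if current:                 # close out the frame in progress
                frames.append(current)
            current = []
        elif current is not None and len(current) < max_slots:
            current.append(byte)
    if current:
        frames.append(current)
    return frames

events = [(0x12, False),                       # mid-frame junk before we sync
          (0x00, True), (0x00, False), (10, False), (200, False),
          (0x00, True), (0x00, False), (11, False)]
print(collect_frames(events))
```

Each returned list holds the START CODE at index 0 followed by the channel values, so channel n is simply element n. Fed the output of an unmodified UART, the same routine would occasionally pick up a bogus leading byte, which is exactly the reliability problem.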

The Mark After Break condition specified by the DMX512 protocol is a minimum of 8 microseconds. That is just two bit times. This is unfortunate. If this Mark After Break condition were to be held at least 11 bit times (44 microseconds) then any UART would be assured of seeing some part of the condition as valid Stop Bits. The UART then would be guaranteed ready to receive the very next data byte which is the START CODE.

If we somehow could extend the Mark After Break that would ensure the reliable reception of the channel data. That by itself is not enough since we know that the trailing edge of the Break Condition can generate a data byte that is not flagged with a Framing Error. This would confuse us as there is no real way to determine if this is the START CODE or a bogus addition to the stream.

Now if we could somehow limit the Break Condition to a single byte we would receive one and only one byte flagged with a Framing Error. The UART would not see an additional Start Bit and that would ensure that the next byte we see is in fact the START CODE. We know that the Break Condition is at least 88 microseconds at its shortest. That is exactly two data byte times (each of 11 bits = 1 start bit + 8 data bits + 2 stop bits). So if we truncated the Break Condition at exactly 44 microseconds which would generate 1 byte with a Framing Error, there is another whole byte’s worth of time that could be added to the Mark After Break. This would satisfy our interest in extending the Mark After Break for at least a byte time. At a minimum it would then be 52 microseconds (44 borrowed from the Break + 8 original microseconds).
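The time budget is easy to tabulate. In this Python sketch (the function name is ours) the Break is cut to a fixed length and whatever remains is credited to the Mark After Break. The total time before the START CODE is unchanged.

```python
SLOT_US = 44   # one 11-bit slot at 4 microseconds per bit

def shift_break_to_mab(break_us, mab_us, keep_us=SLOT_US):
    """Return (new_break_us, new_mab_us) after truncating the Break to `keep_us`."""
    if break_us < keep_us:
        raise ValueError("Break is shorter than the portion we want to keep")
    return keep_us, mab_us + (break_us - keep_us)

print(shift_break_to_mab(88, 8))     # minimum Break and Mark After Break: (44, 52)
print(shift_break_to_mab(176, 8))    # the "typical" Break from the specification
```

Even in the worst case the UART sees more than a full slot of Mark before the START CODE arrives.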

 

The Hardware Solution

Our hardware needs to truncate the Break Condition. We need only hold it long enough to mask the first Stop Bit and thereby force the Framing Error. This needs to be achieved without risk of causing other Framing Errors. With a little thought we developed the following logic.

 

This circuit is inserted in the CMOS logic stream after the RS-485 receivers and before input to the UART. The RXD signal from the receiver enters at the left. We will use the OR gate (U7) to truncate the Break Condition. This will pull the signal HIGH at the appropriate time but otherwise allow the normal serial stream through. The output then is TXD to be delivered to the UART on the right.

The CD4024B (U6) is a 7-stage counter and it is reset (RST) or restarted with any HIGH Marking condition on the incoming data. This means that the count is not allowed to progress very far except during the Break Condition where there is no reset (RST). This in essence times the duration of the Break Condition. Note that when the output of the 7th stage goes high (count >= 64) the OR gate then pulls the TXD signal high effectively truncating the Break Condition. The signal output is then held HIGH until the Mark After Break arrives and resets the counter. At this point the incoming signal is HIGH and we have effectively transferred time from the Break Condition to the Mark After Break. This is exactly what we want.

The UART will throw the Framing Error as soon as the first Stop Bit is not detected. We don’t want it to see any potential Start Bit after that. To be safe then we truncate the Break Condition after the 10th bit time right after the first Stop Bit. This means that we must truncate the Break Condition precisely after 40 microseconds and this must occur at the 64th count when the 7th stage goes high. As it turns out 40 microseconds divided by 64 is 0.625 microseconds the precise period of a 1.600 MHz clock. Those are readily available. So we clock the counter with a 1.600 MHz signal using oscillator U4.

One last detail to consider. There is no upper bound on the duration of the Break Condition. That means that once we truncate the Break Condition we need to hold it for potentially a very long time. So we cannot allow the counter to continue to run. Once the 7th stage goes HIGH we need to hold it there. A second OR gate (U5) masks the incoming 1.600 MHz clock signal at that point effectively stopping the counting. And, miraculously, we’ve got it covered.
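The behavior of the circuit can be modeled in Python at the resolution of the 1.600 MHz clock. This is a behavioral sketch only. It assumes the input is sampled once per clock period, with the counter advancing at the end of each period, so the real hardware may differ by a fraction of a clock period depending on clock phase. The test waveform (100 microsecond Break, 10 microsecond Mark After Break, one 0x00 slot) is made up for illustration.

```python
from itertools import groupby

CLOCK_HZ = 1_600_000
TICK_US = 1_000_000 / CLOCK_HZ        # 0.625 microseconds per clock period

def ticks(duration_us):
    """Nearest whole number of clock periods for a duration in microseconds."""
    return round(duration_us / TICK_US)

def truncate_break(rxd):
    """Model of U5/U6/U7: rxd is a list of 0/1 levels, one per clock period."""
    count = 0
    txd = []
    for level in rxd:
        if level == 1:
            count = 0                       # any Mark resets the counter (RST)
        q7 = 1 if count >= 64 else 0        # 7th stage output of the counter
        txd.append(level | q7)              # OR gate U7 forces the output HIGH
        if level == 0 and not q7:
            count += 1                      # U5 blocks the clock once Q7 is HIGH
    return txd

def runs(signal):
    """Collapse a sampled signal into (level, duration_us) runs."""
    return [(level, len(list(group)) * TICK_US) for level, group in groupby(signal)]

# Idle, a 100 us Break, a 10 us Mark After Break, then one 0x00 slot
# (9 LOW bit times for the start and data bits, 2 HIGH bit times for the stops).
rxd = ([1] * ticks(20) + [0] * ticks(100) + [1] * ticks(10)
       + [0] * ticks(36) + [1] * ticks(8))
txd = truncate_break(rxd)

print("RXD:", runs(rxd))
print("TXD:", runs(txd))
```

In the output the Break shrinks from 100 to 40 microseconds and the Mark After Break grows from 10 to 70 microseconds, while the 0x00 slot passes through untouched. Its 36 microseconds of LOW never let the counter reach 64.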

To test this we created a prototype adapter which implements an isolated DMX512 standard port. The signal passes through our circuit and then a standard RS-232 transceiver is used to communicate with the JNIOR AUX port. One advantage this has is that any Model JNIOR with an AUX port can be used as a DMX fixture, whereas only the Model 410 supports RS-485 directly and can be used as described in the prior article.

 

 

Here the few components involved sit in the lower left-hand corner. These Surface Mount devices take up almost no real estate and this is perfect for a new model of the JNIOR sporting a DMX512 input. Okay, so does the circuit work?

 

 

Here we see another oscilloscope trace. The yellow signal is RXD as received from the RS-485 receiver. This is what enters the schematic from the left. The blue signal is TXD and what we will supply to the UART. This exits on the right of our schematic. The wider pulse in the center is our Break Condition.

On the yellow trace we see that the Break Condition supplied by the DMX network is about 100 microseconds and it is followed by a brief Mark After Break which looks to be somewhat less than 50 microseconds. We know that this signal is problematic as it is from the same source used in the prior article. Note here that the START CODE and all channels happened to be 0x00. So those high pulses are Stop Bit pairs.

Now the blue trace is the result. The Break Condition has indeed been truncated. The vertical cursor lines are positioned to measure the duration of the new Break Condition. On the right we see the delta-t to be the precise 40 microseconds we had hoped for. We can also see that the balance of the time in the Break Condition has now been added to the Mark After Break.

Does this allow the JNIOR to be used as a DMX fixture? In fact it does and there is absolutely no glitch or reliability concern. Here is a short video showing relays responding to DMX channels. A channel value greater than 127 is used to signal closure of a relay. The JNIOR’s 8 relays are mapped to consecutive channels.
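The mapping from channels to relays is a one-liner. The JNIOR application itself is written in Java and is not reproduced here, so this Python sketch only illustrates the rule just described. The function name and the choice of starting channel are assumptions.

```python
def relay_states(frame, first_channel, relay_count=8, threshold=127):
    """Return True (closed) or False (open) for each relay.

    `frame` holds the START CODE at index 0 followed by the channel values, so
    channel n is frame[n]. Relays map to consecutive channels starting at
    `first_channel`, and a level above `threshold` closes the relay.
    """
    levels = frame[first_channel:first_channel + relay_count]
    return [level > threshold for level in levels]

# START CODE, then channels 1-8, then a few unused channels.
frame = [0x00, 0, 255, 127, 128, 10, 200, 0, 129, 0, 0]
print(relay_states(frame, first_channel=1))
```

Using the midpoint as the threshold means a fader pushed past halfway closes the relay, and it leaves plenty of margin on either side.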

Success!

Of course this would need to be evaluated with a wide range of DMX512 signal sources. The Java application on that JNIOR in the video is essentially that discussed in the previous article. It would be useful to reiterate its operation but that is beyond the scope of this article. If you should be interested in using a JNIOR as a DMX fixture just let us know. We can likely supply you with one of these prototype adapters and the application programming is open source from us. If you already have a JNIOR then you are almost there.

Now we have everything that we need to create another JNIOR Model with its own Isolated DMX512 input!

We should note that not all UARTs are created equal. There are likely UART implementations that do properly handle synchronization after the Break Condition. If you have one of those then you are golden. In the case of the JNIOR we were not so lucky. While the adapter was described above as prototype we do have several that we can supply. It is not clear that there would be any long term demand for this. A model of the JNIOR directly providing a DMX512 electrically isolated input is now feasible and would likely include the circuit as described here. Contact INTEG to find out more.

Once a JNIOR is up and running in an application it rarely needs attention. In fact many users forget that it is even there. It is however good practice to check the system logs once in a while. Fortunately there are a number of ways that you can do that. If you open the DCP (Dynamic Configuration Pages) by accessing the unit with a browser there is a ‘Syslog’ tab. This displays the log with the most recent events right at the top. You can also go to the ‘Console’ tab or otherwise open the Command Line through a Telnet connection or the serial port where you can list the log content directly to the screen or window. In this case the latest entries are the last listed.

If you are developing a new application and testing it on the JNIOR you may wish to monitor the log more closely. Naturally you can leave the DCP open on the ‘Syslog’ tab and new events would just pop up when they are logged. That might not be as convenient as it sounds especially if you are building a network-facing application and are constantly testing and working with the JNIOR using various browsers. Perhaps you are rebooting the unit causing the DCP to reconnect far too often. More typically we periodically make a Command Line connection through Telnet and manually list the log. Here we make use of the CAT command.

mqtt-test /> help cat
CAT filespec

Options:
 -H             Dump file in hex
 -J             Formats JSON
 -P             Displays last page

Displays the contents of a file.
Aliases: CAT, TYPE

mqtt-test /> cat -p jniorsys.log

Now the jniorsys.log file is limited to about 64 KB. That can represent a lot of lines. Do you really want to list those every time? Here we see that the CAT command also provides the -P option which will show only the last page (25 lines) of the file. Of course you do have to remember to include the option. One side note here is that with JANOS you can supply the options anywhere on the line. So if you at first forget to include the -P you can place it after the filename. No problem.

We can improve on this by making use of JANOS batch file capabilities. A batch file is a kind of script file that can contain commands to be executed one after another. Most operating systems, JANOS included, provide some facility for batch execution. But you might be wondering how do you create batch files or if you do that externally how do you get them onto the JNIOR? Well here’s a tip: You can create them on the fly. Of course, the simpler the file the easier that can be accomplished.

For example we will create a simple batch file called log.bat as a kind of shorthand that we can use when we want to display the System Log. Here we simply ECHO the command routing the text to the file. Immediately we can use the short batch file name to execute the command.

mqtt-test /> echo cat -p jniorsys.log > log.bat

mqtt-test /> log
cat -p jniorsys.log
10/03/18 18:54:15.850, FTP/10.0.0.27:58531 uploaded /flash/MQTT.jar [325.8 kbps]
10/03/18 18:54:24.571, ... (etc.)

Here we see that we used the ‘>’ redirection character to route the output to the specified file. If you need to build a batch file with more than one command you can append to it using ‘>>’ as well. The power here comes in combining commands to achieve some goal. The advantage is that you can repeat the procedure easily using the batch file name as shorthand. While batch files in JANOS are not as fully featured as you will find in Linux or MS-DOS systems most of the basics are there.

Suppose instead we want to list only today’s events. Here we create our batch file in a way that lists only those entries with today’s date. Check it out.

mqtt-test /> echo grep %DATE% jniorsys.log > log.bat

mqtt-test /> log
grep 10/04/18 jniorsys.log
10/04/18 12:10:48.657, Ending session Command/10.0.0.27:64514 (pid 132)
10/04/18 12:37:53.642, Command/10.0.0.27:64075 login 'jnior' (pid 405)
10/04/18 12:38:05.179, FTP/10.0.0.27:64118 login 'jnior'
10/04/18 12:38:05.711 ... (etc.)

And there you go. You now have a simple way to check recent System Log activity. Now I bet that you are thinking how you might alter this to be even more helpful. Well, if you find that you cannot achieve what you have in mind just let us know. An advantage that you have with INTEG is a direct link to the technical team and the advantage we have is the power to implement what you need.

The Model 412DMX generates a DMX512 Universe and allows the JNIOR to control DMX fixtures like those used in stage lighting. What if you needed a fixture with relays that can be controlled by DMX? Perhaps you need to output channels over 4-20 mA loops. Maybe you need a 10 VDC output signal to control LED house lighting. Can the JNIOR receive DMX? Can the JNIOR be a DMX Fixture?

We showed you how you could control DMX fixtures with a standard Model 410 in a White Paper available here:

http://www.integpg.com/downloads/docume … tation.pdf

Now we have the 412DMX JNIOR designed for that purpose. Can the Model 410 also serve as a DMX fixture? Yes, it can. I’ll show you how here and we’ll see how we manage to accommodate some of the unique aspects of the DMX512 format with the JNIOR.

Cabling

We can use the JNIOR Model 410 because the AUX port is compatible with RS-485. In the white paper explaining how the 410 can be used to control DMX fixtures we described an adapter cable taking the DB9 output from the JNIOR and presenting the proper female XLR connector for DMX. Now since a DMX fixture always has both male and female 5-pin XLR connectors, our cabling has to be slightly different. Note that you can do this with the 3-pin XLR (as I have) if that is appropriate for your situation.

Here is an example of one that we put together.

This can be constructed by splicing into a standard DMX extension cable. A number of DB9 adapters with screw terminals like the one pictured can be found on Amazon. Note that you will want one with large screws compatible with larger wire sizes. DMX wiring is typically of a larger diameter and you will need to successfully clamp two wires in each of three positions on the adapter.

Here is the pin numbering. Note that wire colors vary.

        Signal           XLR      DB-9 Male
--------------------  ---------  -----------
Signal Ground (GND)       1          5
Data (D-)                 2          3
Data (D+)                 3          7
Not Used (NC)            4,5     1,2,4,6,8,9

This cable allows the JNIOR to be a DMX FIXTURE.

THE RESULTING DMX CONNECTION IS NOT ISOLATED. We recommend using an isolated power supply for the JNIOR and not sharing that voltage with other circuits. Take great care in making ground connections. Note that the JNIOR relay outputs are naturally isolated.

Serial Connection

Connect the adapter to the Model 410 AUX serial port as I have in this photo and connect this to the DMX network. Note that the 412 and 414 are not RS-485 compatible and cannot be used for this purpose.

The serial port parameters should be set as follows. This is done through the Dynamic Configuration Pages (DCP) that should come up when accessing the JNIOR using your browser. You enable the RS-485 mode here so the AUX port output doesn’t disrupt the DMX communications before you have a chance to run the DMXFIXTURE application that I will describe. That application will also configure the AUX port just to make sure that all is well.

If you encounter “Applets” instead of the DCP then your Series 4 needs to be updated or you have a Series 3. The latter also cannot be used for this application. You will need to update your JNIOR to JANOS v1.6.6 or later for the functionality to be described here.

Data

With the JNIOR Model 410 wired to the DMX network and the AUX serial port properly configured the unit should be receiving data. There is a simple way to check that. You can see data without any application running just by using the IOLOG command. Here we enter the Console (or Command Line Interface) and use this command.


InfoComm_LED /> help iolog
IOLOG

Options:
 -T             Indicate transitions
 -R             Reset logs
 -A             AUX Serial log
 -S             Sensor Port log
 -O             Output to stdout

Generates jniorio.log file from available logs.

InfoComm_LED /> iolog -ao
--  07/02/18 15:42:46.098
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--80--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--FF--00--FF--80--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--83--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--80--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--FF--00--FF--80--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--83--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--80-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--FF--00--FF--80-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--83--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-80--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--FF--00--FF-    ................
-80--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--83--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-    ................
-00--00--00--00--00--00--00--00--00--00--00--00--00--00--00-        ...............

InfoComm_LED />

If you scan down in the above output and look through the data you will see that there are a couple of channels at 100% (0xFF) and a couple near half (0x80’ish). There are two pretty major issues in trying to read these bytes with standard library read() functions.

  1. How do we know how to find Channel 1? There are many more than 512 bytes shown here. If you read 512 bytes what you get could start anywhere.
  2. The data rate at 250 Kbaud supplies 44 complete channel sets per second! That absolutely will overrun the buffer before you can process any of it. The overrun would likely further obfuscate the data.
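The frame rate quoted above can be checked with a little arithmetic. The following is only a sketch. It assumes 11 bit times per slot (start bit, 8 data bits and 2 stop bits) and the minimum BREAK and Mark After Break times of 92 and 12 microseconds from the DMX512 specification. The class and method names are made up for illustration.

```java
final class DmxTiming {

    static final int BAUD = 250_000;        // DMX512 line rate
    static final int BITS_PER_SLOT = 11;    // start bit + 8 data bits + 2 stop bits

    /** Duration of one bit in microseconds (4 us at 250 Kbaud). */
    static double bitMicros() {
        return 1_000_000.0 / BAUD;
    }

    /** Duration of one slot (one byte on the wire) in microseconds. */
    static double slotMicros() {
        return BITS_PER_SLOT * bitMicros();
    }

    /** Length of a whole packet: BREAK + Mark After Break + the slots. */
    static double frameMicros(int slots, double breakMicros, double mabMicros) {
        return breakMicros + mabMicros + slots * slotMicros();
    }

    public static void main(String[] args) {
        // Assumed minimum timings: 92 us BREAK and 12 us Mark After Break.
        double frame = frameMicros(513, 92.0, 12.0);
        System.out.printf("slot  = %.1f us%n", slotMicros());
        System.out.printf("frame = %.1f us%n", frame);
        System.out.printf("rate  = %.1f frames/s%n", 1_000_000.0 / frame);
    }
}
```

With those assumptions a full packet takes a little under 23 milliseconds, which is where the figure of roughly 44 complete channel sets per second comes from.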

The fact is that the standard serial communications routines that you may be used to are just not usable here. JANOS will come to the rescue. But first let’s take a look at the data stream so we understand how this is to be resolved.

DMX Format

The DMX data on the RS-485 lines conforms to standard asynchronous serial data with 8 data bits, 2 stop bits and no parity. The bits are marched out from least significant (LSB) D0 to the most significant (MSB) D7. Each byte is called a “Slot”. The standard implementation transfers a START CODE and 512 channels in a total of 513 slots. The START CODE for normal DMX data is 0x00.
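To make the slot format concrete, here is a small sketch that writes out the bit sequence for one slot in the order it appears on the wire: a low start bit, then D0 through D7, then two high stop bits. The class name and the use of a string of 0 and 1 characters are just for illustration.

```java
final class DmxSlot {

    /** Returns the 11 bits of one slot in transmission order, e.g. "00000000111" for 0x80. */
    static String encode(int value) {
        StringBuilder bits = new StringBuilder();
        bits.append('0');                         // start bit (low)
        for (int b = 0; b < 8; b++) {
            bits.append((value >> b) & 1);        // data bits, least significant first
        }
        bits.append("11");                        // two stop bits (high)
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println("0x00 -> " + encode(0x00));
        System.out.println("0x80 -> " + encode(0x80));
        System.out.println("0xFF -> " + encode(0xFF));
    }
}
```

Note that with a value of 0x00 the only high bits in the slot are the two stop bits.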

The beginning of the sequence is signaled with a Break Condition. This BREAK can be detected by the fixtures which allows them to synchronize with the stream. After the BREAK comes the START CODE (0x00 – NULL START) followed by the value for Channel 1 on through to Channel 512. Not all 512 channels need to be part of the transmission. The number of channels may vary by DMX controller. The complete implementation provides all 512.

On the oscilloscope the BREAK looks like this. Here all of the channels are 0x00 and so you see only the STOP BITS. That long low pulse is the BREAK.

TEK0002.JPG

The issue is that the BREAK is difficult to handle with the standard serial port. It results in a FRAMING ERROR. During the break the signal is held at a low level. When the receiving serial UART expects the STOP BITS and they aren’t there it throws a FRAMING ERROR. While that can be detected and your application can be notified there still is no way to ensure that the next bytes read from the port are those that follow the break. They may have been buffered some time before. They may be overrun by oncoming data.

In order to handle this and properly capture a reliable channel set, there must be a special function for that purpose in the AUX port class (AUXSerialPort). Of course, being the author of JANOS, I have implemented exactly what we need. And, those details are next…

Packet Capture

To read the DMX Packet (START CODE plus up to 512 Slots/Channels) we need to detect the Break Condition and then reliably collect as many as 513 serial bytes that immediately follow. Under many other RTOS implementations we would need to write an interrupt driven routine both to detect the Break condition and then also to collect the data. The JNIOR executes application programs written in a managed language (Java) and one does not have low level access to write things like serial interrupt routines. That is actually a good thing as the user generally does not have the programming experience. Such low level user programming often leads to unstable/unpredictable operation.

Here we rely on JANOS to maintain reliable operation. Low level interrupt routines have already been implemented to buffer incoming serial data and otherwise issue a notification of errors. Recall that the Break Condition manifests itself as one or more FRAMING ERRORS. But we have already established that reading buffered serial data and receiving asynchronous notifications is not going to be sufficient for capturing a DMX packet. This is where we benefit from having developed JANOS in-house and having authored 100% of it. Here we identify a need and are able to promptly and correctly implement a solution.

AUXSerialPort.readAfterBreak(byte[] buffer)

I have added the readAfterBreak() method to the AUXSerialPort class in the JANOSClasses.jar library. From the naming its use is self-explanatory. Here you create a buffer as a byte array and pass it to JANOS. The operating system enables the capture and then blocks the thread until the data collection completes. At the low level JANOS sets up the buffer with a pointer and goes into a kind of ‘armed’ state. The interrupt routine that detects FRAMING errors has a tiny bit of code that checks for an armed capture and ‘triggers’ the collection of data. The interrupt routine that collects and buffers serial bytes from the port has code to set each byte aside into the buffer that you have provided. Once triggered the capture passes into a ‘collection’ mode. When the buffer is full (or when another Break Condition is detected) the capture is ‘complete’ and the application program can proceed now with a byte array containing the DMX data.
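The armed, triggered, collecting and complete sequence can be pictured as a small state machine. What follows is only a model of that description written in ordinary Java. It is not the JANOS implementation, and the class, the state names and the two callback methods are hypothetical. One assumption of mine is that repeated FRAMING ERRORS from the same BREAK are ignored until at least one byte has been collected.

```java
final class BreakCaptureModel {

    enum State { IDLE, ARMED, COLLECTING, COMPLETE }

    private State state = State.IDLE;
    private byte[] buffer;
    private int count;

    /** The application hands over its buffer and the capture is armed. */
    void arm(byte[] target) {
        buffer = target;
        count = 0;
        state = State.ARMED;
    }

    /** Stands in for the interrupt code that sees a FRAMING ERROR. */
    void onFramingError() {
        if (state == State.ARMED) {
            state = State.COLLECTING;               // the BREAK triggers the capture
        } else if (state == State.COLLECTING && count > 0) {
            state = State.COMPLETE;                 // the next BREAK ends a short packet
        }
    }

    /** Stands in for the interrupt code that receives each serial byte. */
    void onByte(int value) {
        if (state != State.COLLECTING) return;      // not triggered yet, or already done
        buffer[count++] = (byte) value;
        if (count == buffer.length) state = State.COMPLETE;
    }

    State state() { return state; }

    int count() { return count; }

    public static void main(String[] args) {
        BreakCaptureModel capture = new BreakCaptureModel();
        capture.arm(new byte[4]);
        capture.onByte(0x11);                       // ignored, no BREAK seen yet
        capture.onFramingError();                   // BREAK detected
        capture.onByte(0x00);                       // START CODE
        capture.onByte(0xFF);                       // Channel 1
        capture.onFramingError();                   // next BREAK, packet was short
        System.out.println(capture.state() + " with " + capture.count() + " bytes");
    }
}
```

In the real system the application thread simply blocks inside readAfterBreak() until the capture reaches its complete state.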

Now to benefit from this new feature, you will need to update your Series 4 to run JANOS v1.6.6 or later. At the moment this is Beta code. We would make it available if you were to want to try this before its release. All you need do is ask.

Next we need to try it out…

DMX Capture Test

Here we create a project in Netbeans (making the few settings needed to target it for JANOS) and create the following test program. This merely takes control of, and fully configures, the AUX port in case it has not been configured through the DCP. Lines 23 and 24 (the buffer allocation and the readAfterBreak() call) test our new method. The rest merely dumps the byte array content for review.

package dmxfixture;
 
import com.integpg.comm.AUXSerialPort;
import com.integpg.comm.SerialPort;
 
public class Dmxfixture {
 
    public static void main(String[] args) throws Throwable {
 
        // AUX port access and configuration. We need to open the port to gain exclusive access and
        //  set the proper baud rate and format. We enable RS-485 mode and make sure that the receivers
        //  are enabled. With normal RS-485 you would disable the transmit drivers. Our adapter doesn't
        //  bridge the transmit and receive lines anyway and the DCP configuration automatically disables
        //  the drivers. It is here for clarity.
        AUXSerialPort aux = new AUXSerialPort();
        aux.open();
        aux.setSerialPortParams(250000, 8, 1, SerialPort.PARITY_NONE);
        aux.setRS485(true);
        aux.enableReceivers(true);
        aux.enableDrivers(false);
        
        // capture a complete frame using our new method
        byte[] data = new byte[513];
        aux.readAfterBreak(data);
        
        // The remainder here is a fancy dump (skipping the START CODE). Note how JANOS implements the
        //  printf formatting for us.
        for (int i = 1; i < data.length; i++) { 
            if (i % 10 == 1)
                System.out.printf("%04d  ", i);
            System.out.printf("%4d ", data[ i ] & 0xff);
            if (i % 10 == 0)
                System.out.println("");
        }
        System.out.println("");
    }
    
}

To run this we first build it in Netbeans. Then using the DCP we open the Folders tab and select the /flash folder. We then drag the dmxfixture.jar file from the project to the /flash folder (it can be executed from the root too). Then under the Console tab we log in and execute the application. The following is the result.


InfoComm_LED /> dmxfixture
0001     0    0    0    0    0    0    0    0    0    0 
0011     0    0    0    0    0    0    0    0    0    0 
0021     0    0    0    0    0    0    0    0    0    0 
0031     0    0    0    0    0    0    0    0    0    0 
0041     0    0    0    0    0    0    0    0    0    0 
0051     0    0    0    0    0    0    0    0    0    0 
0061     0    0    0    0    0    0    0    0    0    0 
0071     0    0    0    0    0    0    0    0    0    0 
0081     0    0    0    0    0    0    0    0    0    0 
0091     0    0  255    0  255  128    0    0    0    0 
0101     0    0    0    0    0    0    0    0    0    0 
0111     0    0    0    0    0    0    0    0    0    0 
0121   131    0    0    0    0    0    0    0    0    0 
0131     0    0    0    0    0    0    0    0    0    0 
0141     0    0    0    0    0    0    0    0    0    0 
0151     0    0    0    0    0    0    0    0    0    0 
0161     0    0    0    0    0    0    0    0    0    0 
0171     0    0    0    0    0    0    0    0    0    0 
0181     0    0    0    0    0    0    0    0    0    0 
0191     0    0    0    0    0    0    0    0    0    0 
0201     0    0    0    0    0    0    0    0    0    0 
0211     0    0    0    0    0    0    0    0    0    0 
0221     0    0    0    0    0    0    0    0    0    0 
0231     0    0    0    0    0    0    0    0    0    0 
0241     0    0    0    0    0    0    0    0    0    0 
0251     0    0    0    0    0    0    0    0    0    0 
0261     0    0    0    0    0    0    0    0    0    0 
0271     0    0    0    0    0    0    0    0    0    0 
0281     0    0    0    0    0    0    0    0    0    0 
0291     0    0    0    0    0    0    0    0    0    0 
0301     0    0    0    0    0    0    0    0    0    0 
0311     0    0    0    0    0    0    0    0    0    0 
0321     0    0    0    0    0    0    0    0    0    0 
0331     0    0    0    0    0    0    0    0    0    0 
0341     0    0    0    0    0    0    0    0    0    0 
0351     0    0    0    0    0    0    0    0    0    0 
0361     0    0    0    0    0    0    0    0    0    0 
0371     0    0    0    0    0    0    0    0    0    0 
0381     0    0    0    0    0    0    0    0    0    0 
0391     0    0    0    0    0    0    0    0    0    0 
0401     0    0    0    0    0    0    0    0    0    0 
0411     0    0    0    0    0    0    0    0    0    0 
0421     0    0    0    0    0    0    0    0    0    0 
0431     0    0    0    0    0    0    0    0    0    0 
0441     0    0    0    0    0    0    0    0    0    0 
0451     0    0    0    0    0    0    0    0    0    0 
0461     0    0    0    0    0    0    0    0    0    0 
0471     0    0    0    0    0    0    0    0    0    0 
0481     0    0    0    0    0    0    0    0    0    0 
0491     0    0    0    0    0    0    0    0    0    0 
0501     0    0    0    0    0    0    0    0    0    0 
0511     0    0 

InfoComm_LED />  

We note that channels are correct. Here we go over to the 412DMX controlling this DMX network and check Kevin's DMX panel page for comparison.

Putting it to Work

Now that we can receive a DMX frame and read the individual channels, what can we do with it? I mean other than dump it?

Well Kevin has defined an eight channel fixture starting at DMX channel 121. The idea being that each channel would correspond to a JNIOR Relay Output. Channel settings from 0-127 would result in an open/off relay and values in the range 128-255 would close the relay. You can imagine any use that you would want given the flexibility that you now have in JNIOR programming. Let's implement this particular fixture.
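The mapping just described can be tried out away from the JNIOR as a plain function. This sketch uses only standard Java. The class name, the method name and the convention of returning -1 for an unusable starting address are assumptions of mine for illustration. The frame array is laid out as captured, with the START CODE at index 0 and Channel 1 at index 1.

```java
final class RelayFixture {

    static final int RELAYS = 8;

    /**
     * Builds an 8-bit relay mask from eight consecutive channels starting at addr.
     * Bit 0 corresponds to the first channel. Levels 128-255 close the relay.
     * Returns -1 when the eight channels would not fit within channels 1-512.
     */
    static int relayMask(byte[] frame, int addr) {
        if (addr < 1 || addr + RELAYS - 1 > 512 || frame.length < addr + RELAYS) {
            return -1;
        }
        int mask = 0;
        for (int i = 0; i < RELAYS; i++) {
            int level = frame[addr + i] & 0xff;     // undo Java's signed byte
            if (level >= 128) {
                mask |= 1 << i;
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        byte[] frame = new byte[513];
        frame[121] = (byte) 131;                    // Channel 121 just above half
        frame[123] = (byte) 255;                    // Channel 123 at full
        System.out.println(Integer.toBinaryString(relayMask(frame, 121)));
    }
}
```

Keeping the decision in one function like this makes it easy to change the threshold or the number of relays later.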

The approach will be to sample a DMX packet periodically and set the relays appropriately. There is no need to catch every DMX packet and in fact we are not likely going to be able to do that. We are also going to be considerate of the JNIOR CPU and anything else that the unit might want to be doing. We will sample say every 1/4 second and sleep in between.

Here is the program. This uses an infinite loop to sample the DMX stream about 4 times a second. The starting address must be defined in the Registry. This could be cached. With this implementation you can change the starting address without rebooting or restarting the DMXFIXTURE program. It is presumed that you would start the DMXFIXTURE program automatically at boot with a Registry Run key.

package dmxfixture;
 
import com.integpg.comm.AUXSerialPort;
import com.integpg.comm.SerialPort;
import com.integpg.system.JANOS;
 
public class Dmxfixture {
 
    public static void main(String[] args) throws Throwable {
 
        // AUX port access and configuration. We need to open the port to gain exclusive access and
        //  set the proper baud rate and format. We enable RS-485 mode and make sure that the receivers
        //  are enabled. With normal RS-485 you would disable the transmit drivers. Our adapter doesn't
        //  bridge the transmit and receive lines anyway and the DCP configuration automatically disables
        //  the drivers. It is here for clarity.
        AUXSerialPort aux = new AUXSerialPort();
        aux.open();
        aux.setSerialPortParams(250000, 8, 1, SerialPort.PARITY_NONE);
        aux.setRS485(true);
        aux.enableReceivers(true);
        aux.enableDrivers(false);
 
        // here we create an infinite loop to continuously process the DMX data
        byte[] data = new byte[513];
        for (;;) {
            
            // capture a complete frame
            aux.readAfterBreak(data);
            
            // Obtain the starting address. If it is invalid or not defined no action is taken.
            int addr = JANOS.getRegistryInt("DMX/Address", 0);
            if (addr > 0 && addr <= 505) {
                
                // Although we don't have to we are going to collect all of the relay states
                //  and set them simultaneously. This will also take advantage of signed values
                //  in Java. Values in the range 128-255 will appear to be negative if we don't
                //  mask them with 0xff.
                int bits = 0;
                for (int i = 0; i < 8; i++) {
                    if (data[addr++] < 0)
                        bits += (1 << i);
                }
                JANOS.setOutputStates(bits, 0xff);
            }
            
            // sleep for a quarter second
            Thread.sleep(250);
        }        
    }
    
}

This program should be pretty easy to follow. Let's test it.

Demonstration

A video can best demonstrate the operation of this program. Here we have a DMX application running on a 412DMX (10.0.0.242) allowing us to vary the channels that we associate with our 410 fixture. A separate Model 410 running our DMXFIXTURE (10.0.0.250) program can be monitored remotely through its DCP page. Here we overlap the two browser entities and we can see how modifying the channel fader results in the relay status change out across the DMX network.

Reliability

Let's look into potential error conditions and the reliability of this approach. The DMX format typically supplies nearly 44 frames per second. If there is a communications error, due to electrical noise for instance, one and possibly up to a few frames might be in error. For a light fixture this might cause a minute flicker or some small flinch in pointing. But, given the frame rate it is quickly corrected and might not be even noticeable. If we are interpreting a frame with our program we need to be extra careful not to trigger a chain of events based upon an error packet.

Typically in data protocols we would have some form of checksum or CRC which we can use to identify an erroneous transmission so it can be ignored. There is no such thing in the DMX512 protocol. So what steps can we take?

Well to start we should verify that the START CODE is the expected NULL START 0x00 and ignore any frame with a different code. The controller might actually be inserting those and we must ignore them. I will adjust the program to check this.
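A check of that kind could be as simple as the following sketch. The class and method names are hypothetical. It assumes the captured array holds the START CODE at index 0, as in the listings above.

```java
final class StartCodeCheck {

    static final int NULL_START = 0x00;

    /** True when slot 0 holds the NULL START code, meaning ordinary channel levels follow. */
    static boolean isNullStart(byte[] frame) {
        return frame.length > 0 && (frame[0] & 0xff) == NULL_START;
    }

    public static void main(String[] args) {
        byte[] normal = new byte[513];              // START CODE 0x00
        byte[] other = new byte[513];
        other[0] = (byte) 0x80;                     // some other START CODE
        System.out.println(isNullStart(normal));
        System.out.println(isNullStart(other));
    }
}
```

In the sampling loop the relays would then only be updated when isNullStart() returns true, and any other frame would be skipped.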

Well... The START CODE is returning 128 (0x80) and the channels appear to be properly registered (e.g. in the right place). Now to look into this.

Synchronization After Break

The DMX512 specification defines the width of the Break Condition as something greater than 92 microseconds. It is important to note that it is something greater than twice that of a single slot time (the time to receive a single byte) of 44 microseconds (11 bit times - start bit, 8 data bits and 2 stop bits). It is not a precise multiple of slot times or even bit times. This forces the receiver to synchronize with each and every packet.

Given this I could make the argument that the Mark After Break should be at least one slot time of 44 microseconds in order to ensure that the leading start bit of the first slot is successfully interpreted. The DMX512 specification however specifies the minimum Mark After Break of 12 microseconds. This puts us at the mercy of the UART design and its ability to synchronize following a Break Condition of arbitrary length. There are a number of possible outcomes that depend on what the UART decides is the first STOP BIT once the Break Condition passes.

  • For example, if the beginning of the Mark After Break is seen as a valid STOP BIT then a 0x00 byte is received AHEAD OF the normal NULL START code 0x00. This extra 0x00 can be interpreted as a valid START CODE but all of the channel slots are off by 1. Channel 2 would have the value for Channel 1. This is an ERROR!
  • If the Mark condition just slightly into the Mark After Break is interpreted as a valid start bit then an extra 0x80 is received AHEAD of the START CODE. This might be seen as a bad packet if the START CODE is verified. Channels are also shifted if values are used. This is an ERROR!
  • The above continues with each bit time advance into the Mark After Break generating an initial extra byte of 0xC0, 0xE0, 0xF0, 0xF8, 0xFC, 0xFE and 0xFF depending on the length of the Mark After Break. In each case the START CODE would then be considered the Channel 1 value. ERRORs result!
  • With a short Mark After Break the UART might look at a low bit value in the START CODE as a missing STOP BIT and generate yet another FRAMING ERROR. Again depending on the timing the START CODE might be returned as 0x80 with the first STOP BIT actually being interpreted as the MSB. In this case the Channel data is properly positioned. This is actually the most common mode I am seeing in the current set up. It is timing sensitive. This is also an ERROR!
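The family of extra bytes listed above can be reproduced with a toy model. The sketch below assumes the line is sampled exactly once per bit time and that the UART is willing to take any low sample inside the Break as a start bit. Real UARTs oversample, so treat this only as an illustration of the alignment effect. All names are hypothetical.

```java
final class BreakAlignment {
    static final int FRAMING_ERROR = -1;

    // Build a line of bit-time samples: 'breakBits' low samples (the Break)
    // followed by 'markBits' high samples (the Mark After Break).
    static boolean[] breakThenMark(int breakBits, int markBits) {
        boolean[] line = new boolean[breakBits + markBits];
        for (int i = breakBits; i < line.length; i++) {
            line[i] = true;
        }
        return line;
    }

    // Read one character whose start bit is taken at index 'start'.
    // Data bits arrive LSB first and the tenth sample must be a high stop bit.
    static int decode(boolean[] line, int start) {
        int value = 0;
        for (int bit = 0; bit < 8; bit++) {
            if (line[start + 1 + bit]) {
                value |= 1 << bit;
            }
        }
        return line[start + 9] ? value : FRAMING_ERROR;
    }
}
```

With a Break of 24 bit times and a Mark of 12, starting the character at sample 15 returns 0x00 because the stop bit lands on the first high sample. Each later starting point pulls one more high bit in from the top, giving 0x80, 0xC0 and so on up to 0xFF at sample 23. Start any earlier and the stop bit is still inside the Break, so a Framing Error results.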

If you follow this logic you might see that it is possible that it may take a couple of regular slot times before the UART grabs something it is happy about. It is all about the synchronization aspect of the hardware design.

The question is how to know when you are receiving valid data and properly aligned slots? Is there a solution to this?

A UART that requires a Marking Condition before attempting to detect a START BIT (falling edge) would function properly. Apparently they don't work this way. At least not all of them.

UART Issue

The problem that we run into is an ancient design flaw in serial ports.

A Framing Error results when the UART (RX SCI) expects a Stop Bit and none is detected. A Stop Bit is a high (1 Marking) and during a Break Condition the signal is held low (0 Space) so a Framing Error is quickly encountered. Now most descriptions of UART logic suggest that after a Break the UART locates the next Start Bit (0 Space) and that this is detected by a high to low transition of signal (1 -> 0). Logically it is done that way for asynchronous reception as the UART clock needs to synchronize and then sample the middle of each bit period.

In reality after a Framing Error the UART seems to see the next low (0 Space) as a Start Bit and continues to read bit data. As a result Framing Errors are repeated throughout the break period. A bogus byte value might appear to be properly read if the tail end of the Break Condition aligns with the UART in a way to make the high (1 Marking) after the Break look like a Stop Bit.

The likelihood of this bogus data byte and its content can vary depending on the length of the Break and the length of the Marking after the Break and before actual data is present. Since bytes are serialized LSB first these extra bytes look like one of 0xFF, 0xFE, 0xFC, 0xF8, 0xF0, 0xE0, 0xC0, 0x80 and even 0x00.

If the Marking after Break is brief (only a few bit times) and the alignment falls such that the UART looks at bits in the first byte of data for that magic Stop Bit, you will receive an incorrect value for the first byte. It is conceivable that the UART might take several bytes before synchronizing and providing real data.

If the UART simply fell into a mode whereby it actually did search for the next Start Bit by looking for a valid high to low transition (1 -> 0), you would get a single Framing Error followed by the proper collection of data. But no... after 50+ years we have not addressed this issue. I half recall struggling with this exact thing maybe 30 or so years back now. The fact that it is still a problem is not impressive.
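The difference between the two behaviors can be shown with a toy receiver. This is a sketch under simplifying assumptions and not a model of any particular UART: the line is sampled once per bit time, only the first stop bit is checked, and after each character (good or bad) the receiver resumes looking ten samples later. The requireEdge switch and all of the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class BreakResync {
    static final int FRAMING_ERROR = -1;

    // Decode a line of bit-time samples (true = high/Mark, false = low/Space).
    // requireEdge = true: after a Framing Error a high must be seen before a
    // low is accepted as a start bit. false: any low is taken as a start bit.
    static List<Integer> receive(boolean[] line, boolean requireEdge) {
        List<Integer> out = new ArrayList<Integer>();
        boolean armed = true;   // the line is assumed idle (high) before the capture
        int i = 0;
        while (i + 9 < line.length) {
            if (line[i]) { armed = true; i++; continue; }   // Marking, keep hunting
            if (requireEdge && !armed) { i++; continue; }   // still inside the Break
            int value = 0;
            for (int bit = 0; bit < 8; bit++) {
                if (line[i + 1 + bit]) value |= 1 << bit;   // LSB first
            }
            boolean stopOk = line[i + 9];
            out.add(stopOk ? value : FRAMING_ERROR);
            armed = stopOk;
            i += 10;
        }
        return out;
    }

    // One slot as a string of samples: start bit, 8 data bits LSB first, 2 stop bits.
    static String slot(int value) {
        StringBuilder sb = new StringBuilder("0");
        for (int bit = 0; bit < 8; bit++) sb.append((value >> bit) & 1);
        return sb.append("11").toString();
    }

    // Convert a string of '0' and '1' characters into line samples.
    static boolean[] toLine(String bits) {
        boolean[] line = new boolean[bits.length()];
        for (int i = 0; i < line.length; i++) line[i] = bits.charAt(i) == '1';
        return line;
    }
}
```

Feed it a Break of 25 bit times, a Mark After Break of 3 bit times (12 microseconds), a NULL START slot, a channel slot of 0x55 and some idle time. With requireEdge set, the result is one Framing Error followed by 0x00 and 0x55. Without it, this toy receiver reports four Framing Errors and then a bogus 0xD5, never delivering the START CODE at all.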

I guess I shouldn't be surprised in that these hi-tech MCU processors all still include the Real Time Clock (RTC) circuit first designed for the very first digital watches in the late 1960s. This forces us to parse time into Day, Month, Year, Hour, Minute and Seconds as if setting a watch on your wrist. In fact Seconds can only be reset to 00 and not directly set. On boot we have to read the time and reassemble it into Linux or Internet time as a tally of milliseconds since some epoch. Lots of work that causes loss of precision. And the ideal would be a non-volatile battery-backed 64-bit millisecond counter. Sometimes silicon space is limited and this counter would save lots of that. But no... these integrated circuit companies aren't as swift as we would like to think.

Since DMX512 signals can have different lengths of Break and Marking after Break and these can vary depending on source, and since the protocol has no leading header that can be used in identifying valid frames, we are NOT ABLE to reliably receive data. Note that if the DMX512 Standard had forced the Mark After Break to be at least one data Slot long (> 44 microseconds) then UARTs would likely properly synchronize and reliably present the first byte of data. But the spec does not and the problem is that changing the standard now does not correct all of the DMX controllers already in use all over the world. So it is what it is.

So for us to ensure that we read a valid frame, we need to resort to some trickery, filtering and indeed AI. While that can be fun, it's unfortunate.
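As one example of the kind of filtering I mean, a channel value can be required to repeat before it is acted upon. This is a minimal sketch and not the filter actually used. The rule of "same value in two consecutive frames" is an assumption chosen for illustration, as are the names.

```java
final class ChannelDebounce {
    private int stable = 0;       // last value accepted
    private int candidate = -1;   // raw value seen in the previous frame

    // Feed the raw channel value from each frame. The accepted value only
    // changes once the same raw value has been seen twice in a row.
    int update(int raw) {
        if (raw == candidate) {
            stable = raw;
        }
        candidate = raw;
        return stable;
    }
}
```

At nearly 44 frames per second this costs one frame of delay (about 23 milliseconds), and a single corrupted frame no longer triggers anything. It would not suit a channel that is being faded smoothly, since such a value changes on every frame.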

Beginning with JANOS v1.6.4 you will be able to adjust the Time-To-Live (TTL) parameter used by the network stack.

The IpConfig/TTL Registry key defines the lifespan of a network packet. The time-to-live value is a kind of upper bound on the time that an IP datagram can exist in the Internet system. The value is reduced by one each time the packet passes through a router. If it reaches 0 the packet is discarded. The default value has been increased to 128 from the value of 64 used prior to JANOS v1.6.4.

The TTL setting can be considered to limit the maximum radius (in terms of hops) of the network within reach of the JNIOR. The default setting should allow packets to reach the far end of the globe. A low setting would limit access to the unit as only those in the local vicinity could communicate. In this respect the TTL setting can be used to improve device security.

A very low setting of 1 or 2 would constrain the JNIOR to the local network. One must consider the need to reach Domain Name Servers (DNS) and Network Time Servers (NTP). There may also be the requirement for email transfers wherein the JNIOR needs to reach out to an SMTP Server. To help determine the minimum setting you may be able to use your PC’s TRACERT command to detect the hop count involved in reaching those destinations. The JNIOR does not support a route tracing function.

Real World Test

Luckily we have a neat way to test the effect of reducing TTL. We have a JNIOR we call HoneyPot sitting on the open Internet. Naturally it comes under a constant level of attack. For instance there is a fairly constant level of random login attempts on the Telnet port. On the JNIOR the Telnet port provides access to the JANOS command line interface. We log failed login attempts to the /access.log file.

Log files on the JNIOR roll over to BAK files when they reach 64 KB in size. We keep only one BAK file for each log. Typically an application would archive BAK files when longer term logging is desired. A syslog server can be used for the system log /jniorsys.log for longer term logging.
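JANOS handles the rollover itself, but the rule is simple enough to sketch for an application that keeps its own log. The following assumes the standard java.io.File class is available and that the backup is named by appending ".bak" to the log name. Neither the class nor the method names come from JANOS.

```java
import java.io.File;

final class LogRollover {
    static final long LIMIT = 64 * 1024;   // rollover threshold in bytes

    static boolean needsRollover(long sizeBytes) {
        return sizeBytes >= LIMIT;
    }

    static String backupPath(String logPath) {
        return logPath + ".bak";
    }

    // Moves the log to its single BAK file once it reaches the limit.
    // Any previous BAK file is discarded. Returns true if a rollover occurred.
    static boolean rollIfFull(File log) {
        if (!needsRollover(log.length())) {
            return false;
        }
        File bak = new File(backupPath(log.getPath()));
        if (bak.exists() && !bak.delete()) {
            return false;
        }
        return log.renameTo(bak);
    }
}
```

An application that wants longer term history would copy the BAK file elsewhere right after a successful rollover, before the next one replaces it.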

On HoneyPot we have an application that takes the access.log when it rolls over and analyzes the hosts attempting to log into the unit. IP addresses are added to a database (JSON based) covering data from the past 24 hours. The application uses a locating service to identify the geographical location of the host. A simple web page http://honeypot.integpg.com/map.php receives the database and uses the Google Maps API to plot these locations.

By default JANOS uses a TTL of 128. The map typically appears as follows:

If we reduce the TTL to 16 the map changes. Note that this seems to thin out the number of hosts able to communicate with the unit. It does not seem to create a geographical radius.

The thinning effect is useful but one gets the feeling that systems within our own country may no longer be able to communicate with the unit.

The further reduction of TTL to 12 begins to suggest a geographical radius. Note in the following how the unit now seems to be invisible in China. This might suggest that our friends in far away places might actually be using shortcuts in the network to gain access to systems in the United States.

Of course, for a controller the most important aspect of this kind of security is whether or not YOU can access your own unit. In that case you might also use the IP filtering functionality of the device and limit access to only YOU.

One note. With the TTL limited to 16 the HoneyPot unit had trouble reaching some of the pool.ntp.org NTP servers for synchronizing the clock. By limiting the radius of the network you may also lose access to useful services such as DNS and NTP.

 

So this test fails in that the service that is used to determine a location for an IP address is about 12 hops away. Here we see it is 13 from inside INTEG.

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Windows\system32>tracert ip-api.com

Tracing route to ip-api.com [69.195.146.130]
over a maximum of 30 hops:

  1    <1 ms    <1 ms    <1 ms  10.0.0.1
  2     1 ms     1 ms     1 ms  50-197-34-78-static.hfc.comcastbusiness.net [50.197.34.78]
  3    12 ms     9 ms     9 ms  96.120.62.245
  4    10 ms     9 ms     8 ms  te-0-1-1-1-sur01.westdeer.pa.pitt.comcast.net [68.86.146.225]
  5    32 ms    15 ms    15 ms  be-62-ar01.mckeesport.pa.pitt.comcast.net [69.139.195.37]
  6    28 ms    21 ms    37 ms  be-7016-cr02.ashburn.va.ibone.comcast.net [68.86.91.25]
  7    21 ms    20 ms    20 ms  be-10130-pe04.ashburn.va.ibone.comcast.net [68.86.82.214]
  8    20 ms    20 ms    19 ms  23.30.206.206
  9    61 ms    61 ms    72 ms  xe-0-2-1.cr2-kan1.ip4.gtt.net [213.254.215.121]
 10    62 ms    52 ms    51 ms  ip4.gtt.net [69.174.12.26]
 11    52 ms    53 ms    60 ms  10.0.1.137
 12     *        *        *     Request timed out.
 13    52 ms    51 ms    51 ms  us-mo-1.free.ip-api.com [69.195.146.130]

Trace complete.

C:\Windows\system32>

And as a result with TTL restricted to 10 I get a lot of these errors.

04/05/18 08:29:12.949
** Uncaught java/io/IOException thrown: "Unable to connect to remote host"
   in java/io/IOException.<init>:(Ljava/lang/String;)V
   in java/net/PlainSocketImpl.connect:(Ljava/net/InetAddress;I)V
   in java/net/Socket.<init>:(Ljava/net/InetAddress;ILjava/net/InetAddress;IZ)V
   in java/net/Socket.<init>:(Ljava/lang/String;I)V
   in jaccess/JAccess.main:([Ljava/lang/String;)V at line 71
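The hop arithmetic behind those failures can be sketched as follows. The parser simply takes the highest hop number found in TRACERT output of the form shown above, and the reach test assumes each router along the way reduces the TTL by one. The route from HoneyPot is not the same as the one from inside INTEG, so this is only indicative. The names are made up.

```java
final class HopCount {
    // Returns the highest hop number found in Windows TRACERT output.
    static int hopCount(String tracertOutput) {
        int hops = 0;
        String[] lines = tracertOutput.split("\n");
        for (int i = 0; i < lines.length; i++) {
            String[] tokens = lines[i].trim().split("\\s+");
            try {
                hops = Math.max(hops, Integer.parseInt(tokens[0]));
            } catch (NumberFormatException e) {
                // not a hop line (banner, blank line, "Trace complete.")
            }
        }
        return hops;
    }

    // A packet sent with this TTL arrives only if the TTL is at least the
    // hop number at which the destination answers.
    static boolean reachable(int ttl, int hops) {
        return ttl >= hops;
    }
}
```

For the trace above the count is 13, so a TTL of 10 cannot reach the locating service while a TTL of 16 can.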

Just a note that I generally create application programs that are not destined for customer deployment with a throws Throwable clause. This ensures that every exception is logged to the errors.log file and I don’t need to busy the code with try-catch structures. The application uses the com.integpg.system.Watchdog class which restarts the application after a timeout. You can see this in the system log up until I removed the TTL restriction.

In summary…

Reducing the TTL reduces the “radius” of the accessible Internet but that does not precisely correspond to a geographic radius. Sites in Russia appear to have access to our Internet vicinity through fewer hops than some citizens in this country. Still it is a good defense in limiting access to the JNIOR so long as the resources your application uses can still be reached.

I had been thinking about this.

In testing by running with a low TTL we ran into problems where the JNIOR had difficulty reaching services it requires (like NTP) while locations perhaps even in Russia could still reach us. It seems to me that the standard large TTL should still be used for all outgoing communications, with a reduced TTL applied only to incoming connections, specifically to UDP replies and TCP/IP SYN ACK responses. This would prevent distant (Internet radius wise) hosts from initiating connections or soliciting UDP replies.

The issue with UDP is that the original source TTL is unknown. So we cannot filter on it. The UDP would be received and would be processed. That packet would represent a vulnerability. All we can do is prevent any response from making it back to the malicious host.
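The idea amounts to choosing the TTL per packet instead of globally. Here is a sketch of that selection and nothing more. It is not how the JANOS network stack is written, and the packet categories are hypothetical names used to make the rule concrete.

```java
final class ReplyTtlPolicy {
    // Hypothetical packet categories, named only for this illustration.
    enum Kind { OUTBOUND, TCP_SYN_ACK, UDP_REPLY }

    static final int DEFAULT_TTL = 128;   // the JANOS v1.6.4 default

    // Responses to something a remote host initiated get the reduced TTL.
    // Everything the JNIOR initiates itself keeps the full default.
    static int ttlFor(Kind kind, int reducedTtl) {
        if (kind == Kind.TCP_SYN_ACK || kind == Kind.UDP_REPLY) {
            return reducedTtl;
        }
        return DEFAULT_TTL;
    }
}
```

With a reduced value of 12, for instance, an NTP request from the JNIOR would still leave with a TTL of 128 while the SYN ACK answering a distant login attempt would expire before it got back.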

The JNIOR Model 410, 412 and 414 each have two available serial ports. Each port provides at least a 3-wire RS-232 interface. A 3-wire connection contains only the Transmit (Tx), Receive (Rx) and Signal Ground (GND) circuits. This is the bare minimum for Duplex communication or interfaces utilizing software handshakes. The Rx line may be omitted if only sending data. Similarly the Tx might be omitted if only receiving data.

In addition to the 3-wire signals the AUX port supports optional hardware handshaking using the Request To Send (RTS) and Clear To Send (CTS) signals. The Model 410 AUX port also provides a configuration for RS-422 and RS-485 communications.

While there are a number of parameters that must be properly configured in order to achieve functional and reliable communication, the biggest issue is (and has always been) proper cabling. If an RS-232 connection is not working and it is the first time the connection has been made, the connections are probably not correct.

Originally the RS-232 standard was created to support the connection of a modem. Before networking the modem was used to extend communications over standard telephone lines. Typically a computer (an IBM 360 for instance) would connect to a modem. At home a user would connect their terminal to another modem and establish a remote connection via dial-up. There are two types of equipment in this scenario: the computer stuff and the communications stuff (modems). The RS-232 standard defines two acronyms for this: DTE and DCE. These are used extensively to define connector types and signal definitions.

This is where the confusion begins. The acronym DTE refers to Data Terminal Equipment and in our example above this includes both the Computer and the Terminal (CRT or Teletype). That would be the stuff that you would be trying to connect together had you not needed the modems. The term DCE is often confused and is meant to refer to Data Circuit-Terminating Equipment or Data Communications Equipment. That being the modem in the above example. It does not stand for Data Computing Equipment which implies the computer. These terms are often confused and, perhaps, never really understood. As a result even the engineers who design the equipment (including myself) often employed the incorrect connectors, signal terminology and pin assignments. So let’s not use these designations.

JNIOR Serial Ports

The JNIOR has a COM port (labelled RS-232) and an AUX port (labelled AUX Serial). Both are DB-9F Female 9-pin D-sub connectors. The AUX port has 4 active signals and the COM port 2. The pin assignments are as follows:

2 >> RS232 TX / RS485 TX-
3 << RS232 RX / RS485 RX-
5    GND
7 << RS232 RTS / RS485 RX+
8 >> RS232 CTS / RS485 TX+

Here is how it shows on the schematic. Note that even the pin numbering on the connector itself can be confused. The (>>) indicates an output. The JNIOR generates a voltage on this pin and it must be connected to an input at the other end. The (<<) indicates an input. This should be connected to an output at the other end. We will cover RS485 in a little bit.

You can see that we do not use DCD, DSR, DTR and RI. These are unconnected. The COM port follows the same assignments but ONLY pins 2, 3, and 5 are used.

Here is the source of additional confusion. The JNIOR transmits data on Pin 2 and therefore from the JNIOR’s point of view THAT is Transmit Data (TX or TxD). But when that signal reaches the other end (say your PC) it is incoming data or Receive Data (RX or RxD). That is because from the point of view at the PC it is data that would be received. So you connect RXD to TXD and vice versa.

Not everyone labels it that way. You will find an input pin labelled TxD. The thinking is that you would connect TxD to TxD. After all you do connect CTS to CTS as the signal is Clear To Send regardless as to who generated it and who is listening to it. The same goes for Request To Send (RTS).

It is not surprising that we sometimes have to grab a voltmeter to see if a pin is generating an RS232 voltage level (an output) or not (an input). Even that can be misleading when pull-up resistors are used. I used to have a couple of really sweet RS-232 break-out boxes. Those have gotten lost but were life savers back in the day. You know, nice colored LEDs showing outputs and jumper wires that you could use to test various cabling solutions before soldering the final cable.

JNIOR to PC Connection

Well today if you want to connect the JNIOR to your PC you will need a USB-To-Serial adapter. You would likely want to do that to gain access to the JANOS Console (command line interface) available over the COM port (115.2Kbaud, 8 data bits, 1 stop bit, no parity). The adapter will present you with a DB-9M Male connector identical to what you would have found on an older PC as a COM or AUX port connection. The connector (DTE) can be directly plugged into the JNIOR COM (or AUX) port (DCE).

Some USB-To-Serial adapters provide a length of cable and others are relatively short. If you need a longer cable then you either use a USB extension or a Male-To-Female Straight-Thru Serial Extension cable. The latter need only be 3-wire unless your application optionally employs the hardware handshake. I will cover that a little later.

You would use this same approach to connect the JNIOR’s AUX port to a PC-based media server or other system that uses the standard PC serial ports. An application on the JNIOR can then send and receive data or commands to the remote server.

Connecting a Device to the JNIOR

If you plan to connect a barcode scanner or other device to the JNIOR then you might need a little help. You may need a 9-pin Gender Changer. There are two kinds: F-F and M-M. You may need the Male-To-Male (M-M) Gender Changer. This has pins on both sides and when plugged into the JNIOR it changes the connector from a Female DB-9F to the equivalent of a Male DB-9M. Unfortunately this does not alter the pin assignments and if the device was designed to be plugged into a PC then you will need a cross-over adapter or cable. The cross-over exchanges pins 2 and 3 (as well as 7 and 8). Remember that you want to always connect an output to an input. Sometimes this is called a Null Modem adapter, the name coming from the need to interconnect two DTE devices without modems.
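The rule of always connecting an output to an input can be checked mechanically before any soldering. The sketch below uses the JNIOR pin directions listed earlier and the usual PC 9-pin port, where pins 3 and 7 are the outputs. It also assumes that a device designed to plug into a PC drives the same pins as the JNIOR does. All names are illustrative.

```java
final class SerialCabling {
    // Signal pins. Pin 5 (GND) always goes straight across.
    static final int[] PINS = {2, 3, 7, 8};

    // Pins on which the JNIOR generates a voltage (the >> pins).
    static boolean jniorDrives(int pin) { return pin == 2 || pin == 8; }

    // Pins on which a standard PC serial port generates a voltage.
    static boolean pcDrives(int pin) { return pin == 3 || pin == 7; }

    // A cross-over (Null Modem) exchanges 2 with 3 and 7 with 8.
    static int crossover(int pin) {
        switch (pin) {
            case 2: return 3;
            case 3: return 2;
            case 7: return 8;
            case 8: return 7;
            default: return pin;
        }
    }

    // True when every JNIOR output lands on an input and every input on an output.
    // farEndIsPc = false means the far end drives the same pins as the JNIOR.
    static boolean wiringOk(boolean farEndIsPc, boolean crossed) {
        for (int pin : PINS) {
            int farPin = crossed ? crossover(pin) : pin;
            boolean nearOut = jniorDrives(pin);
            boolean farOut = farEndIsPc ? pcDrives(farPin) : jniorDrives(farPin);
            if (nearOut == farOut) {
                return false;   // two outputs or two inputs tied together
            }
        }
        return true;
    }
}
```

A straight cable to a PC passes the check, and so does a cross-over to a device built to plug into a PC. The other two combinations fail, which matches the cabling advice above.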

Perhaps in hindsight it would seem that the JNIOR AUX port should have been DTE. In fact in the beginning we did not use a DB9 connector at all and provided screw terminals for the 5 signals since we would be required to connect to either DTE or DCE. The reality is that in Cinema (which was an early and big market for JNIORs) we connected often to media servers (which are essentially PCs) and the current DCE arrangement worked best for those customers. That stuck.

So as a result you end up with stuff like this.

Of course if you are handy with the soldering iron and get some solder-cup DB9 connectors and hoods from Digi-Key, you can clean this up nicely.
They had hoped to solve all of this with USB but that has created other issues.

It didn’t help RS-232 that from the beginning no one fully understood how to document it. Some of us might remember the detailed signal diagrams explaining plus and minus 12V states, start and stop bits, and little endian order in the back of manuals. That level of detail was just adding to the confusion.

Here is a modern day failure. This is from a product received in 2017. At first glance you would think this is good documentation.

Here only the boldface signals are available or can be used. Perhaps only those should be shown. But beyond that picky item the important piece that is missing is any indication of what is an output and what is an input. You can naturally make your own assumptions. You might correctly assume that Received Data (RxD) is information generated by (or output from) the remote connection and therefore an input at this connector. The TxD would then be an output. I mean you only have two choices here and chances of being correct are 50/50. If you are working a soldering iron though you won’t appreciate making the wrong guess.

It is not so obvious as to whether the CTS or RTS connection is an input or output. These signals are shown here but are they used? Are they required? Is there an option setting some place of which you should be aware?

So if you have the diagram for the other piece of equipment that you are connecting should you wire straight thru? Do you wire TxD to RxD and vice versa? If that ends up crossing over from pin 3 to pin 2 and vice versa should you also cross over RTS and CTS? Who knows. RS-232 failure.

My point though is that this nice little picture doesn’t eliminate the chance that your cabling or the cable you make might not work. And, if it doesn’t work you don’t have enough information to decide what to change. Come on man! You can do better.

Digital Files are a given entity in the programming world. They can contain 0 to many bytes each of 256 values. Those could be ordered to represent everything from common text (ASCII) to binary bit streams (compressed data) and everything in between. Files therefore have a size and they also have a timestamp. These days that timestamp represents the date and time of the last modification to the file. They also carry permissions which can control who can access the file or even know of its existence.

There is also location. In part that means where the file is positioned within a directory or folder structure. We are also concerned with the type of media in which the file is stored. You know, did you put the file on a memory stick, on the hard drive or in the Cloud? This aspect of “location” is what we are going to consider here in this topic. Files stored on your JNIOR end up in one of four different areas but yet all appear to be in the same place. Where your file is located can affect performance and the longevity of your data.

Each JNIOR contains multiple memory components each of which can provide for file storage. These are integrated into a single File System. There are 4 types of storage: SRAM, DRAM, Flash and ROM. You actually utilize them all in routine operation. Let’s look into it.

Non-volatile Battery-Backed Static Random Access Memory (SRAM)

When you enter the JNIOR’s Console (Command Line Interface) either through the serial port, using Telnet or by opening the DCP, your working directory is the root of the File System or “/”. By performing a DIR/LS command with the -L option you see content details generally containing the system’s basic log files. You can also see that there are a number of sub-directories or sub-folders. I struggle with terminology here. Do you use “directory” or “folder”? I think that I haphazardly vacillate between the two.

bruce_dev /> dir -l
total 10
drwxrwxrwx   1 root      root           8 Jan 26 08:22 .
drwxrwxrwx   1 root      root           8 Jan 26 08:22 ..
dr-xr-xr-x   1 root      root           1 Dec 31 1999  etc
drwxr-xr-x   1 root      root          58 Jan 26 07:38 flash
drwxrwxrwx   1 root      root           0 Jan 25 15:13 temp
-rw-r--r--   1 root      root       40968 Jan 26 08:13 jniorsys.log
-rw-r--r--   1 root      root         956 Jan 26 08:13 jniorboot.log
-rw-r--r--   1 root      root        1005 Jan 26 08:00 jniorboot.log.bak
-rw-r--r--   1 root      root       40302 Jan 26 07:38 web.log
-rw-r--r--   1 jnior     root       22434 Jan 25 14:53 manifest.json
  1763.2 KB available

bruce_dev />

Now you might immediately notice that there is only 1763 KB available. That’s not very much! Is that it?

No. But the File System root is located in a 2MB SRAM. This content is protected from loss by a battery. In fact, the battery is there for this purpose and to retain the current time and date during power outage. We built some JNIORs with a more expensive 4MB part but eventually realized that it wasn’t necessary. The bulk of your file storage will be located elsewhere.

The advantage of the SRAM is its speed and re-usability. In addition to the file system root, JANOS locates the Registry and other immutable memory blocks here. But space here is limited and it is best to preserve this area for system use. Data stored here does come with the risk of loss. This is a small probability but not an insignificant one. First of all the battery could die. If your JNIOR is powered 24/7 the battery should be there for you for 10 years and more. But if you power down the JNIOR routinely you may get 5 or 6 years out of it. Thankfully the Series 4 batteries are replaceable and you can get them at your local convenience store. Some customers though are happy to leave the dead battery not caring if their root folder is then volatile.

Perhaps more likely is that you decide to wipe the memory. You may have an application issue that gets the system into a problem condition. It is possible and we might recommend that you “business card” the battery. By that we mean that you remove power from the unit, open it and slip a piece of something (business cards work well) under the battery tab for a few seconds. This clears the SRAM (and the clock). Typically you only lose the logs. The Registry and therefore your configuration is backed up by another file stored in another area. But don’t worry, we recommend that procedure very very infrequently.

If you are programming your own JNIOR you might get yourself into a reboot loop. Basically your application starts up and performs something incorrectly that throws an assertion (system restart). The JNIOR reboots and restarts your application and another assertion ensues. Okay, not a great situation. JANOS eventually will detect some forms of reboot looping and it may decide to reformat the SRAM as a last ditch effort to restore access to your JNIOR. It sounds terrible but again it is a very very rare thing.

The point is that data stored in the root of the file system offers good performance and immediate data retention. It is not your best choice for long term storage. For that you want to use the Flash memory.

Flash File System

Flash memory retains data even in the absence of power. Files written in Flash memory are therefore retained even when the battery is removed. For that reason it is the best location for long term data storage. This is where you should place all of your programs, web site files and whatever else needs to be kept around. Everything under the /flash directory/folder is located in Flash memory.
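From an application, storing something for the long term is then just a matter of writing it under /flash. Here is a minimal sketch using the classic java.io classes, assuming they behave on the JNIOR as they do elsewhere. The class name, method name and the file name in the usage note are made up.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

final class FlashStore {
    // Writes the data to dir/name, replacing any existing file.
    // Returns false if the file could not be written.
    static boolean save(String dir, String name, byte[] data) {
        FileOutputStream out = null;
        try {
            out = new FileOutputStream(new File(dir, name));
            out.write(data);
            return true;
        } catch (IOException e) {
            return false;
        } finally {
            if (out != null) {
                try {
                    out.close();
                } catch (IOException ignored) {
                    // nothing useful can be done if the close fails
                }
            }
        }
    }
}
```

Calling FlashStore.save("/flash", "settings.dat", bytes) would place the file in Flash where it survives removal of the battery. Passing "/" instead would put it in the battery-backed SRAM root.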

bruce_dev /> dir -l flash
total 60
drwxr-xr-x   1 root      root          58 Jan 26 07:38 .
drwxrwxrwx   1 root      root           8 Jan 26 08:22 ..
drwxr-xr-x   1 root      root           1 Dec 06 11:15 cinema_backup
drwxr-xr-x   1 jnior     root           2 Dec 10 2015  generators
drwxr-xr-x   1 root      root           1 Jan 15 09:31 logs
drwxr-xr-x   1 root      root           2 Jan 15 09:32 public
drwxr-xr-x   1 root      root           2 Feb 06 2017  somepath
drwxr-xr-x   1 jnior     root          25 Jan 26 07:03 www
-rwxr-xr-x   1 jnior     root        1081 Jan 26 07:37 JTest.jar
-rw-r--r--   1 jnior     root       22434 Jan 25 14:53 manifest.json
-rw-r--r--   1 root      root        5449 Jan 23 08:33 jnior.ini
-rw-r--r--   1 jnior     root          13 Jan 11 15:16 gogo.dat
-rw-r--r--   1 jnior     root      183358 Jan 11 09:44 www.zip
-rwxr-xr-x   1 jnior     root        3043 Jan 05 10:21 JTest2.jar
-rw-r--r--   1 jnior     root         278 Dec 12 13:28 pubkey.pem
-rw-r--r--   1 jnior     root        1092 Dec 08 12:48 honeypot.cer
-rw-r--r--   1 jnior     root         272 Dec 06 13:27 key.pub
-rwxr-xr-x   1 jnior     root       20266 Dec 06 09:31 Cinekey.jar
-rwxr-xr-x   1 jnior     root      313835 Dec 04 13:44 Cinema.jar
-rwxr-xr-x   1 jnior     root        8329 Nov 21 12:04 Hmi.jar
-rwxr-xr-x   1 jnior     root        2189 Oct 04 14:24 JScan.jar
-rwxr-xr-x   1 jnior     root        3201 Sep 29 15:33 JUptime.jar
-rwxr-xr-x   1 jnior     root       58619 Aug 08 15:05 ModbusServer.jar
-rwxr-xr-x   1 jnior     root        4476 Jul 20 2017  Dmx.jar
-rw-r--r--   1 jnior     root         304 May 18 2017  test.txt
-rwxr-xr-x   1 jnior     root      169011 Apr 24 2017  snmp.jar
-rw-r--r--   1 jnior     root        1041 Feb 28 2017  key.pem
-rw-r--r--   1 jnior     root         902 Feb 15 2017  bruce_dev.cer
-rwxr-xr-x   1 root      root        4820 Jan 30 2017  jAccess.jar
-rwxr-xr-x   1 root      root        2174 Jan 23 2017  jPing.jar
-rwxr-xr-x   1 root      root        5651 Jan 23 2017  JManifest.jar
-rwxr-xr-x   1 root      root        1510 Dec 22 2016  ctrlc.jar
-rwxr-xr-x   1 jnior     root       74743 Oct 10 2016  Environ.jar
-rwxr-xr-x   1 jnior     root        9680 Oct 06 2016  ftp.jar
-rwxr-xr-x   1 jnior     root        4180 Aug 16 2016  TimeSearch.jar
-rwxr-xr-x   1 jnior     root        2616 Aug 03 2016  clktest.jar
-rwxr-xr-x   1 jnior     root       13079 Jul 27 2016  rz.jar
-rwxr-xr-x   1 jnior     root        2992 Jul 19 2016  Display.jar
-rwxr-xr-x   1 jnior     root       95325 Jun 30 2016  Buffer.jar
-rwxr-xr-x   1 jnior     root      112411 Jun 08 2016  slaveservice.jar
-rwxr-xr-x   1 jnior     root        5811 Jun 07 2016  UdpTest.jar
-rwxr-xr-x   1 jnior     root        5580 Jun 06 2016  jModule.jar
-rwxr-xr-x   1 jnior     root         969 Jun 02 2016  IntelliJ.jar
-rwxr-xr-x   1 jnior     root        1903 Jun 02 2016  Benchmark.jar
-rwxr-xr-x   1 jnior     root        4532 Mar 08 2016  SerialTest.jar
-rw-r--r--   1 root      root         898 Feb 10 2016  current.key
-rwxr-xr-x   1 jnior     root       32187 Dec 17 2015  serialcontrol.jar
-rwxr-xr-x   1 jnior     root      106794 Dec 10 2015  Utility.jar
-rwxr-xr-x   1 jnior     root      163902 Sep 04 2015  AnalogPresets.jar
-rwxr-xr-x   1 jnior     root        5053 Jul 28 2015  0-10vtest.jar
-rw-r--r--   1 jnior     root         898 Jul 24 2015  jnior1024.key
-rwxr-xr-x   1 jnior     root          56 Jul 10 2015  clean.bat
-rwxr-xr-x   1 jnior     root          17 Jun 30 2015  dirs.bat
-rwxr-xr-x   1 jnior     root        3862 Jun 18 2015  Test4to20.jar
-rwxr-xr-x   1 jnior     root       46590 Jun 18 2015  task.jar
-rwxr-xr-x   1 jnior     root        3601 Jun 18 2015  ThreadTest.jar
-rw-r--r--   1 jnior     root        4311 Jun 08 2015  task.ini
-rwxr-xr-x   1 jnior     root       25266 Jun 05 2015  serialethernet.jar
-rwxr-xr-x   1 jnior     root        2993 Jul 12 2013  4routtest.jar
-rwxr-xr-x   1 jnior     root        3142 Jan 17 2013  jPanel.jar
  26.85 MB flash available

bruce_dev />

Okay so my development unit is full of all kinds of stuff. Here you will notice that even so there is some 26 MB of file storage available. For the JNIOR that is a lot. You aren’t dealing with large graphics files and such on the JNIOR. But if you were to develop a really sophisticated website hosted by the JNIOR you might fill that. If that is the case you might want the new 412DMX.

412dmx_r00 /> dir -l flash
total 22
drwxr-xr-x   1 root      root          20 Jan 11 09:45 .
drwxrwxrwx   1 root      root          16 Jan 23 13:27 ..
drwxr-xr-x   1 jnior     root          13 Oct 17 14:06 www
-rw-r--r--   1 jnior     root      183358 Jan 11 09:45 www.zip
-rw-r--r--   1 root      root        2055 Dec 12 15:16 jnior.ini
-rwxr-xr-x   1 jnior     root        4526 Dec 05 14:15 Dmx.jar
-rwxr-xr-x   1 jnior     root        1597 Nov 17 07:34 ident.jar
-rw-r--r--   1 jnior     root       15584 Nov 07 09:04 manifest.json
-rw-r--r--   1 jnior     root       46000 Oct 12 12:37 string-test.dat
-rw-r--r--   1 jnior     root       20000 Oct 12 12:37 four-byte-test.dat
-rwxr-xr-x   1 jnior     root       42138 Oct 12 12:36 Benchmark.jar
-rw-r--r--   1 jnior     root       65481 Oct 11 11:10 lorem-ipsum.txt
-rwxr-xr-x   1 jnior     root       37110 Oct 05 14:51 MidNiteSolar.jar
-rwxr-xr-x   1 jnior     root       98569 Oct 03 13:51 ModbusClasses.jar
-rwxr-xr-x   1 jnior     root       58620 Oct 03 13:45 ModbusServer.jar
-rwxr-xr-x   1 jnior     root        3971 Oct 03 13:45 Simulator.jar
-rwxr-xr-x   1 jnior     root       95488 May 08 2017  SNMP.jar
-rwxr-xr-x   1 jnior     root      115448 May 08 2017  task.jar
-rwxr-xr-x   1 jnior     root       54247 Feb 03 2017  SlaveService.jar
-rwxr-xr-x   1 jnior     root       87637 Feb 03 2017  serialethernet.jar
-rwxr-xr-x   1 jnior     root       31640 Feb 03 2017  serialcontrol.jar
-rwxr-xr-x   1 jnior     root        9563 Feb 03 2017  ftp.jar
  509.70 MB flash available

412dmx_r00 />

Here there is close to 1/2 GB of file space. Actually we will be shipping the 412DMX with 1/4 GB capacity.

The existing JNIOR line uses a 32 MB serial Flash component. Data is written to and read from this Flash device using a serial (SPI) channel. This memory is therefore slower. This is not an issue though as JANOS uses a sophisticated caching system to handle Flash I/O. And if power is lost in the midst of a lengthy Flash write the device’s integrity is not damaged. The JANOS Flash File System uses a fault tolerant form of transaction processing. In the event of power loss (or crash) the Flash File System rolls back to the last known good configuration. As a result data stored here is likely to remain until purposely deleted. You can reformat the Flash File System but generally there is hardly ever a need to do so.

The 412DMX introduces a different Flash technology to the line. Here we employ a parallel NAND Flash memory. In addition to greater capacity the read and write access timing has significantly improved. Files stored here are accessed with almost the same performance as SRAM. In fact, in the future we may move the File System root to Flash and eliminate the SRAM altogether. Potentially the NAND Flash can be implemented on the 410, 421 and 414 and it will be considered when PCB revisions occur on those models.

Temporary Storage

Files stored in the /temp folder are considered temporary. That folder is actually located in the Heap which as I mentioned is DRAM memory. That memory is reformatted on boot. So the /temp folder always comes up being empty.

bruce_dev /> dir -l /temp
total 2
drwxrwxrwx   1 root      root           0 Jan 25 15:13 .
drwxrwxrwx   1 root      root           8 Jan 26 08:22 ..
  62.87 MB available (temporary)

bruce_dev />

The JNIORs are shipping now with 64 MB of Heap memory. The system normally utilizes only about 3 or 4 MB of that. So the /temp folder has reasonable capacity. This is twice what is available in the standard JNIOR Flash but much less than will be available in the 412DMX Flash. This is a great place to create temporary files. This provides the best performance as well.

We recommend that you first transfer UPD update files to the /temp folder. The advantage is that the file disappears once the update has completed. UPD files are quite large and generally don’t fit in the File System root, and you certainly wouldn’t want to leave one there for very long. Placing the UPD in Flash is unnecessary and slow.

An application might first create a file here and, should the procedure complete properly, then move it to long-term storage. This is also a great place for files that will be accessed randomly (using a lot of fseek). You might improve an application’s performance by copying a database to /temp first. It would remain there until reboot. Of course /temp is heap memory, the same memory in which a large byte buffer would be allocated. So to improve performance an application might instead read the entire file into a byte buffer and access that directly. The load on the heap would be the same and random access would be greatly simplified.
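The byte buffer approach could look something like the following. This is only a sketch: it is written against the standard java.io classes as found on a desktop JVM, and the RecordBuffer class, its fixed RECORD_SIZE and the idea of a record-oriented database file are all assumptions made for illustration, not part of JANOS.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

class RecordBuffer {
    // Hypothetical fixed record length, chosen only for this illustration.
    static final int RECORD_SIZE = 16;

    private final byte[] data;

    RecordBuffer(byte[] data) {
        this.data = data;
    }

    // Reads the whole file into one heap-allocated byte array.
    static RecordBuffer load(File file) throws IOException {
        byte[] buf = new byte[(int) file.length()];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readFully(buf);
        }
        return new RecordBuffer(buf);
    }

    int recordCount() {
        return data.length / RECORD_SIZE;
    }

    // Random access is an array copy at a computed offset; no seek is needed.
    byte[] record(int index) {
        if (index < 0 || index >= recordCount()) {
            throw new IndexOutOfBoundsException("record " + index);
        }
        byte[] out = new byte[RECORD_SIZE];
        System.arraycopy(data, index * RECORD_SIZE, out, 0, RECORD_SIZE);
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: RecordBuffer <file>");
            return;
        }
        // The path is whatever the caller supplies; nothing here is JNIOR specific.
        RecordBuffer db = load(new File(args[0]));
        System.out.println(db.recordCount() + " records loaded");
    }
}
```

The file is read once and every later lookup works on the array, which is the trade described above: the heap carries the same load as a copy in /temp would, but the application no longer has to seek.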

The /etc Folder

Lastly there is the /etc folder. This is not a writable area and it is actually built into JANOS. This is where JANOS provides system files as might be necessary for application execution. That is the case now for the JanosClasses.jar file.

bruce_dev /> dir -l /etc
total 3
dr-xr-xr-x   1 root      root           1 Dec 31 1999  .
drwxrwxrwx   1 root      root           8 Jan 26 08:22 ..
-r-----r--   1 root      root      266601 Jan 11 09:58 JanosClasses.jar
  0 KB available (read only)

bruce_dev />

So since this is read-only there is no space available. This is stored within the processor in its Program ROM. Access is very fast.

It is important, though, because you can download this JAR file and use it in compiling your applications for the JNIOR. I would recommend getting the JAR from us or from this site instead, since that version contains not only these classes but source stubs and JavaDoc as well. Clearly that would help you more in development.

JNIORs are shipped with a number of default files in /flash. Some of those should be updated when JANOS is updated. In the future there may be additional files included in /etc. So it is something to keep an eye on.

In Summary

The JANOS File System appears to be centrally located and of a single directory structure. Yet it covers storage in a variety of media. One needs to keep this in mind when deciding where to place files either for temporary use or long term availability. Files in different areas experience different performance levels and different risks. Keeping this in mind you can better manage your JNIOR controller and create great applications.

ZIP is an alias for the JAR command. The JAR command gives you the ability to check and extract files from a file collection. JAR and ZIP files are of the same format. JANOS uses JAR files for Java programs which are collections of class files best handled as a group. This is the HELP for the command:

JAR filespec [pattern]

Options:
 -C             Check integrity
 -T             Lists library contents
 -X             Extracts library contents
 -V             Verbose

List/Extract files from a ZIP/JAR library.
Aliases: JAR, ZIP

Even though JAR collections store content generally in a compressed format the files can be quite large. If you ever question the integrity of a JAR/ZIP file you can use this command to verify it. Remember that you can also use the MANIFEST command to verify a file’s checksum.

bruce_dev /> jar -c flash/jAccess.jar                  
 4 entries found
 content verifies!
bruce_dev /> 

bruce_dev /> jar -cv flash/jAccess.jar
  verifying: META-INF/
  verifying: META-INF/MANIFEST.MF
  verifying: jaccess/
  verifying: jaccess/JAccess.class
 4 entries found
 content verifies!
bruce_dev />

You can see that the -V verbose option enumerates the entries as they are verified.

The -T option displays the table of entries in the collection. Recently with JANOS v1.6.3 we have enhanced this listing. Here is an example with and without the verbose option.

bruce_dev /> jar -t flash/jAccess.jar
META-INF/MANIFEST.MF
jaccess/JAccess.class

bruce_dev /> jar -tv flash/jAccess.jar
     Size   Packed          CRC32        Modified
      227      227    0%  6180ffe5  Jan 30 2017 14:40  META-INF/MANIFEST.MF
     4143     4143    0%  639ebba5  Jan 30 2017 14:40  jaccess/JAccess.class

bruce_dev />

Recently I have been interested in implementing DEFLATE compression. The existing JAR/ZIP command in JANOS has been able to decompress DEFLATE (inflate?) for years. We just haven’t had a strong need for creating or modifying an archive on the JNIOR. Beginning with JANOS v1.6.4 which is now in Beta there will be some new capabilities involving DEFLATE.

New to v1.6.4 is a greatly improved JAR/ZIP command that not only can list or test an archive but that can create, update and even freshen them. This would be useful for those who need to retain log files for extended periods of time. The jniorsys.log file compresses some 80% for example. The available command options are as follows:

ZIP libraryfile [filespec]...

Options:
 -V             Verify archive
 -T             List contents
 -X             Extracts contents
 -C             Create new archive
 -U             Update archive
 -F             Freshen archive
 -S,-R          Recurse folders
 -L             Verbose format

List/Add/Extract files from a ZIP/JAR library.
Aliases: JAR, ZIP

Some options have been reassigned. For instance the -V option now means (V)erify as opposed to (V)erbose as it did previously. Hopefully those changes will not cause difficulties. It was our opinion that the JAR/ZIP command in the past was relatively obscure and unused.

With this new implementation one or more file specifications inclusive of wildcards may be specified when appropriate. Recursion through the directory/folder structure is now not assumed. You must use the -S (or -R alias) option for that. Relative paths in the archive are maintained and created as you might expect. I will provide some examples.

The root on my JNIOR contains a few typical files.

bruce_dev /> dir -l
total 10
drwxrwxrwx   1 root      root           8 Jan 25 14:21 .
drwxrwxrwx   1 root      root           8 Jan 25 14:21 ..
dr-xr-xr-x   1 root      root           1 Dec 31 1999  etc
drwxr-xr-x   1 root      root          59 Jan 25 14:21 flash
drwxrwxrwx   1 root      root           0 Jan 25 13:26 temp
-rw-r--r--   1 root      root       37994 Jan 25 14:21 jniorsys.log
-rw-r--r--   1 jnior     root       22280 Jan 25 14:21 manifest.json
-rw-r--r--   1 root      root         953 Jan 25 14:12 jniorboot.log
-rw-r--r--   1 root      root        1002 Jan 25 13:37 jniorboot.log.bak
-rw-r--r--   1 root      root       35938 Jan 25 09:16 web.log
  1853.9 KB available

bruce_dev />

I can now create an archive of these files using the ZIP command. I can use JAR as it is the very same command. It is just an alias. I tend to use the command name appropriate to the archive I am working with. If I am creating a ZIP I use the ZIP command but there is no particular requirement to do so.

bruce_dev /> zip -c test.zip /
 5 files saved
bruce_dev /> 

bruce_dev /> zip test.zip
     Size   Packed          CRC32        Modified
    37994     7545   80%  bce2daff  Jan 25 2018 14:21  jniorsys.log
    35938     5797   84%  d393e4a3  Jan 25 2018 09:16  web.log
     1002      472   53%  afae59c3  Jan 25 2018 13:37  jniorboot.log.bak
      953      458   52%  b473efb0  Jan 25 2018 14:12  jniorboot.log
    22280    10086   55%  06c9451f  Jan 25 2018 14:21  manifest.json
 5 files listed
bruce_dev />

Here I specified the root folder. No wildcard was needed since that is a folder, and in that case the command assumes that I mean all of its contents. When the command is issued without any option a verbose listing is assumed.

Note that the compression ratios are reasonable even though I have made some trade-offs in the interest of speed. The verbose output can provide interesting information. For example here is the same archive creation with the long/verbose output.

bruce_dev /> zip -cl test.zip /
  deflate: /jniorsys.log (37994 bytes)
   saving: jniorsys.log (compressed 80.1%) 0.758 secs
  deflate: /web.log (35938 bytes)
   saving: web.log (compressed 83.9%) 0.547 secs
  deflate: /jniorboot.log.bak (1002 bytes)
   saving: jniorboot.log.bak (compressed 52.9%) 0.044 secs
  deflate: /jniorboot.log (953 bytes)
   saving: jniorboot.log (compressed 51.9%) 0.044 secs
  deflate: /manifest.json (22280 bytes)
   saving: manifest.json (compressed 54.7%) 1.851 secs
 5 files saved
bruce_dev />

Keep in mind when you consider timing that the JNIOR runs on a 100 MHz 32-bit micro-controller and not a multi-core GHz processor.
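The percentages in these listings appear to be the fraction of the original size that compression saved. As a quick check, and only as a sketch, the arithmetic can be reproduced with the Size and Packed figures from the listing above. The PackedRatio class and percentSaved method are names invented for this example.

```java
import java.util.Locale;

class PackedRatio {
    // Percentage of the original size saved by compression.
    static double percentSaved(long size, long packed) {
        if (size <= 0) {
            return 0.0; // empty entries have nothing to save
        }
        return 100.0 * (size - packed) / size;
    }

    public static void main(String[] args) {
        // jniorsys.log from the listing: 37994 bytes packed into 7545 bytes.
        System.out.println(String.format(Locale.ROOT, "%.1f%%", percentSaved(37994, 7545))); // 80.1%
        // web.log rounded to a whole percent as in the short listing.
        System.out.println(Math.round(percentSaved(35938, 5797)) + "%"); // 84%
    }
}
```

An entry that is stored without compression, such as the small BAT files, has Packed equal to Size and so shows 0%.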

The (U)pdate option (-U) allows you to add or replace files in the archive. For example:

bruce_dev /> zip -us test.zip *.ini *.bat
 4 files saved
bruce_dev /> 

bruce_dev /> zip test.zip
     Size   Packed          CRC32        Modified
    37994     7545   80%  bce2daff  Jan 25 2018 14:21  jniorsys.log
    35938     5797   84%  d393e4a3  Jan 25 2018 09:16  web.log
     1002      472   53%  afae59c3  Jan 25 2018 13:37  jniorboot.log.bak
      953      458   52%  b473efb0  Jan 25 2018 14:12  jniorboot.log
    22280    10086   55%  06c9451f  Jan 25 2018 14:21  manifest.json
     4311      913   79%  36a57579  Jun 08 2015 12:47  flash/task.ini
     5449     2014   63%  88996b53  Jan 23 2018 08:33  flash/jnior.ini
       56       56    0%  3b661614  Jul 10 2015 08:54  flash/clean.bat
       17       17    0%  6a11f77a  Jun 30 2015 15:17  flash/dirs.bat
 9 files listed
bruce_dev />

Here I have added any INI and BAT files present on the JNIOR.

Yes, the JNIOR can do BAT batch files. These are not scripting files like you may know from MS-DOS but they are still useful. For example I do a lot of testing on my development JNIOR and that ends up creating error files and sometimes dump files. My clean.bat file creates a CLEAN command that removes any errors.log or dump.log file. It also resets the attention flag using the STATS command.

bruce_dev /> cat flash/clean.bat    
@rm errors.log
@rm dump.log
@stats -a
@echo Cleaned

bruce_dev />

If you are concerned that an archive may not have transferred to the JNIOR properly, you can use the (V)erify (-V) option. Here are both the normal and verbose versions of the command.

bruce_dev /> zip -v test.zip
 9 entries found - content verifies!
bruce_dev /> 

bruce_dev /> zip -vl test.zip
  verifying: jniorsys.log (compressed)
  verifying: web.log (compressed)
  verifying: jniorboot.log.bak (compressed)
  verifying: jniorboot.log (compressed)
  verifying: manifest.json (compressed)
  verifying: flash/task.ini (compressed)
  verifying: flash/jnior.ini (compressed)
  verifying: flash/clean.bat
  verifying: flash/dirs.bat
 9 entries found - content verifies!
bruce_dev />

Note that beginning with v1.6.4 this verification not only checks file integrity but also decompresses the entries and verifies their CRC32 checksums.
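If you want to run the same kind of check on a PC before transferring an archive, something along these lines should work. This is a sketch using the standard java.util.zip classes on a desktop JVM. It is not the JANOS implementation, and the ArchiveVerifier class name is made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ArchiveVerifier {

    // Decompresses every entry and compares its CRC32 with the stored value.
    // Returns the number of entries checked; throws IOException on a mismatch.
    static int verify(InputStream archive) throws IOException {
        int entries = 0;
        byte[] buf = new byte[4096];
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                CRC32 crc = new CRC32();
                int n;
                while ((n = zip.read(buf)) != -1) {
                    crc.update(buf, 0, n);
                }
                // Once the entry has been read to its end, getCrc() holds the
                // checksum recorded in the archive for that entry.
                if (entry.getCrc() != crc.getValue()) {
                    throw new IOException("CRC mismatch: " + entry.getName());
                }
                entries++;
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Build a small archive in memory so the example is self-contained.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("sample.log"));
            out.write("sample log line\n".getBytes("US-ASCII"));
            out.closeEntry();
        }
        int count = verify(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(count + " entries found");
    }
}
```

ZipInputStream itself also raises a ZipException when an entry fails its checksum, so the explicit comparison is there mainly to make the step visible.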

Here we see that JAR files can also be processed (regardless of command name).

bruce_dev /> zip -v flash/ModbusServer.jar
 42 entries found - content verifies!
bruce_dev /> 

bruce_dev /> jar -vl flash/ModbusServer.jar
  verifying: META-INF/
  verifying: META-INF/MANIFEST.MF (compressed)
  verifying: appinfo.ini (compressed)
  verifying: com/
  verifying: com/integpg/
  verifying: com/integpg/janoslib/
  verifying: com/integpg/janoslib/datastructures/
  verifying: com/integpg/janoslib/debug/
  verifying: com/integpg/janoslib/io/
  verifying: com/integpg/janoslib/system/
  verifying: com/integpg/janoslib/utils/

The (F)reshen command will update files in an archive ONLY if a newer version of the file is found. This does not add new files to the archive. If you do not provide a file specification the command will attempt to freshen all of the archive contents. For example, we haven’t changed anything and the freshen command does nothing.

bruce_dev /> zip -f test.zip
 nothing to do
bruce_dev />

But if we execute the MANIFEST command which adjusts the manifest.json database then we have a newer version. The archive can then be freshened.

bruce_dev /> manifest -ul
JNIOR Manifest      Thu Jan 25 14:52:55 EST 2018
  Size                  MD5                  File Specification
 37994    5627aaee400338b1b3479842cecabe29  [Updated] /jniorsys.log
 28304    2a8a593cc66fa62117497c28bf565d20  [Added] /test.zip
End of Manifest (2 files listed)

bruce_dev /> zip -f test.zip
 2 files saved
bruce_dev />

bruce_dev /> zip test.zip
     Size   Packed          CRC32        Modified
    35938     5797   84%  d393e4a3  Jan 25 2018 09:16  web.log
     1002      472   53%  afae59c3  Jan 25 2018 13:37  jniorboot.log.bak
      953      458   52%  b473efb0  Jan 25 2018 14:12  jniorboot.log
     4311      913   79%  36a57579  Jun 08 2015 12:47  flash/task.ini
     5449     2014   63%  88996b53  Jan 23 2018 08:33  flash/jnior.ini
       56       56    0%  3b661614  Jul 10 2015 08:54  flash/clean.bat
       17       17    0%  6a11f77a  Jun 30 2015 15:17  flash/dirs.bat
    38036     7559   80%  b2b18320  Jan 25 2018 14:53  jniorsys.log
    22434    10129   55%  059a09d9  Jan 25 2018 14:53  manifest.json
 9 files listed
bruce_dev />

The MANIFEST update both alters the database and posts to the system log file. So two files are updated.
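The freshen rule described above, where an entry is re-saved only if it is already in the archive and the file on disk is newer, can be sketched as a simple comparison of timestamps. This is an illustration of the rule only and not how JANOS implements it. The FreshenPlan class, its method and the timestamp values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FreshenPlan {

    // archived: entry name -> modification time recorded in the archive
    // onDisk:   file name  -> current modification time on disk
    // Returns the entries to re-save: already archived AND newer on disk.
    static List<String> toFreshen(Map<String, Long> archived, Map<String, Long> onDisk) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : archived.entrySet()) {
            Long diskTime = onDisk.get(e.getKey());
            if (diskTime != null && diskTime > e.getValue()) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative timestamps only; larger means more recent.
        Map<String, Long> archived = new TreeMap<>();
        archived.put("jniorsys.log", 100L);
        archived.put("manifest.json", 100L);
        archived.put("web.log", 50L);

        Map<String, Long> onDisk = new TreeMap<>();
        onDisk.put("jniorsys.log", 200L);   // changed since it was archived
        onDisk.put("manifest.json", 200L);  // changed since it was archived
        onDisk.put("web.log", 50L);         // unchanged
        onDisk.put("test.zip", 300L);       // not in the archive, so never added

        System.out.println(toFreshen(archived, onDisk));
    }
}
```

Files that exist on disk but not in the archive are ignored, which is the difference between freshening and updating.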

To demonstrate the E(X)tract option I will move the ZIP file to the /temp folder so we don’t overwrite any existing files. Here I will extract the manifest database and take a look at its content.

bruce_dev /> mv test.zip /temp

bruce_dev /> cd /temp

bruce_dev /temp> dir -l
total 3
drwxrwxrwx   1 root      root           1 Jan 25 14:58 .
drwxrwxrwx   1 root      root           8 Jan 25 14:58 ..
-rw-r--r--   1 jnior     root       28361 Jan 25 14:53 test.zip
  61.98 MB available (temporary)

bruce_dev /temp> zip -x test.zip *.json

bruce_dev /temp> dir -l
total 4
drwxrwxrwx   1 root      root           2 Jan 25 14:59 .
drwxrwxrwx   1 root      root           8 Jan 25 14:58 ..
-rw-r--r--   1 jnior     root       28361 Jan 25 14:53 test.zip
-rw-r--r--   1 jnior     root       22434 Jan 25 14:53 manifest.json
  61.95 MB available (temporary)

bruce_dev /temp> cat manifest.json -j
{
  "model":"410",
  "serno":614070500,
  "vers":"v1.6.4-b4",
  "date":"01/25/18 14:52:55",
  "files":{
    "/etc/janosclasses.jar":{
      "length":243492,
      "date":1515682735,
      "md5":"bb85898d4e208a388fb958f1fb90fcc5",
      "crc":"20916587",
      "sha":"a9eb59e9c709ff4ceba82b1e55c841ec5860cc42"
    },
    "/flash/serialcontrol.jar":{
      "length":31344,
      "date":1450364184,
      "md5":"b349e02b7efc64c0dfe5eb74292a5ee6",
      "crc":"3a005104"
    },
    "/flash/serialethernet.jar":{
      "length":25266,
      "date":1433505362,
      "md5":"ee5e266bb8418b4223a666bd046a8c56",
      "crc":"c3961df2"
    },
    "/flash/modbusserver.jar":{
      "length":51907,
      "date":1502219129,
      "md5":"77c16d6134dbd7ec93313fbad2b00d93",
      "crc":"b7456b42",
      "sha":"fad4ecc3d1607aafe0a385a10fb5ee90eff521bd"
    },
    "/flash/snmp.jar":{
      "length":239949,
      "date":1493062048,
      "md5":"b77d35c322ef6645f1eca9d22b29400b",
      "crc":"a4073dcb",
      "sha":"44a3c2b41a2375ef603063cc9b04642903dad973"
    },
    "/flash/www/base64.js":{
      "length":3493,
      "date":1433505378,
      "md5":"1138db1b5a6e165beae3ed81739dd2ec",
      "crc":"baceb6f6"
    },
    "/flash/www/configure/index.html":{
      "length":1349,
      "date":1433505382,
      "md5":"0454014aecfd0b7d9e4ce1efe0979139",
      "crc":"11ba5486"
    },
    "/flash/www/jr310applet.jar":{
      "length":287159,
      "date":1441207703,
      "md5":"f9c4840e7244824b75858a1a40dfb163",
      "crc":"3d1d0c72"
    },
    "/flash/www/jniorprotocol.jar":{
      "length":115148,
      "date":1441207710,
      "md5":"404b40c4293bf3c334e3b88e2fe0dd10",
      "crc":"5143ec4f"
    },
    "/flash/www/jniorprotocolhelpers.jar":{
      "length":34991,
      "date":1433505394,
      "md5":"b08e33e0c21e6c075b9b242bf092b68e",
      "crc":"48990308"
    },
    "/flash/www/task/index.html":{
      "length":1415,
      "date":1433505397,
      "md5":"bbdc32dce371881b3eebd15f5b3fce96",
      "crc":"cdbe02e4"
    },
    "/flash/www/taskmanagerinterface.jar":{
      "length":123052,
      "date":1433505400,
      "md5":"077cddccee476fab552d52a5eefd26a7",
      "crc":"647bb4b3"
    },
    "/flash/www/jquery/jquery-1.9.0.min.js":{
      "length":93071,
      "date":1433505404,
      "md5":"2b869ea9c8edd4c2243c5d44f665f632",
      "crc":"6a2a8434"
    },
    "/flash/www/jquery/jquery-ui.css":{
      "length":33441,
      "date":1433505405,
      "md5":"c6bd2971b8e625f2ae43ede9f655a27b",
      "crc":"0497b7a6"
    },
    "/flash/www/jquery/jquery-ui.min.js":{
      "length":96395,
      "date":1433505409,
      "md5":"8f636d4c90ea0abfcbb25528c635bf7d",
      "crc":"820662f5"
    },
    "/flash/www/vendor/bowser/bowser_0.7.2.min.js":{
      "length":3359,
      "date":1433505412,
      "md5":"61a36d48aad1298b17284b53f6ce3fd1",
      "crc":"22deb9e6"
    },
    "/flash/www/text":{
      "length":1336,
      "date":1434044220,
      "md5":"bab65804218b18b9e1a79f2d8e873259",
      "crc":"dda17d61"
    },
    "/flash/www/cycle":{
      "length":419,
      "date":1434044214,
      "md5":"9eb9bbdae70c1f994ebb7f51b18783b8",
      "crc":"9e496eb9"
    },
    "/flash/slaveservice.jar":{
      "length":73323,
      "date":1465435094,
      "md5":"cd6f5e177d75675607e9523d52e133f7",
      "crc":"9a871cd7"
    },
    "/flash/ftp.jar":{
      "length":9563,
      "date":1475783634,
      "md5":"793e460054f07867685e87f98fd402e6",
      "crc":"36fd641e"
    },
    "/flash/task.ini":{
      "length":4311,
      "date":1433782061,
      "md5":"b1f877ac198306b266311eab557ed1dd",
      "crc":"36a57579"
    },
    "/flash/task.jar":{
      "length":102655,
      "date":1434645611,
      "md5":"1979b16970127f2c38912777cb105133",
      "crc":"ed4d6ad7"
    },
    "/flash/jnior.ini":{
      "length":4874,
      "date":1516714407,
      "md5":"58d36d44e807564035fa88ad63e2b80c",
      "crc":"88996b53",
      "sha":"0f8b5112e66d27fcee64b8fdd9309e4e850f18c7"
    },
    "/jniorsys.log":{
      "length":32844,
      "date":1516908086,
      "md5":"5627aaee400338b1b3479842cecabe29",
      "crc":"bce2daff",
      "sha":"9c10cd81e308e594c47f2f9509721380b2648cdd"
    },
    "/jniorboot.log.bak":{
      "length":1041,
      "date":1516905441,
      "md5":"4f99b5c09ba93b48222183cddb9e7802",
      "crc":"afae59c3",
      "sha":"9442209de78327134b6ab0d87965d6e09c8bdc27"
    },
    "/jniorboot.log":{
      "length":995,
      "date":1516907554,
      "md5":"945b6dcbb03349fa9fd4ef8f91898bb6",
      "crc":"b473efb0",
      "sha":"4c17d7d0f6f2fa3bf7740541ec8104ade157a402"
    },
    "/flash/benchmark.jar":{
      "length":24351,
      "date":1464873509,
      "md5":"987f4044786771f31e0656cf91ed73f3",
      "crc":"1eed095a"
    },
    "/flash/threadtest.jar":{
      "length":3601,
      "date":1434645124,
      "md5":"902ce61cbd2524ca9b83dea335c395d3",
      "crc":"cd2479ff"
    },
    "/flash/test4to20.jar":{
      "length":3862,
      "date":1434659455,
      "md5":"a2e309c9d6dd112e5303aa76d2470740",
      "crc":"976f8208"
    },
    "/flash/dirs.bat":{
      "length":87,
      "date":1435691869,
      "md5":"531d655733ee668d829f9b3bdad96038",
      "crc":"6a11f77a"
    },
    "/flash/www/console/index.php":{
      "length":4347,
      "date":1438974987,
      "md5":"8728680bbc36d369429f7ca2c73cce7d",
      "crc":"c939c423"
    },
    "/flash/clean.bat":{
      "length":56,
      "date":1436532855,
      "md5":"ac9ce6553e1629412fb426b342440493",
      "crc":"3b661614"
    },
    "/flash/jnior1024.key":{
      "length":887,
      "date":1437746752,
      "md5":"b76b5351a92fdcc8d9b6b38ca62d8d71",
      "crc":"7983e14c"
    },
    "/flash/www/config/md5.js":{
      "length":5693,
      "date":1433505379,
      "md5":"a60fec5a81f207ff99ec1b97e3ccad0e",
      "crc":"e2a43d16"
    },
    "/flash/www/config/node.png":{
      "length":253,
      "date":1440435886,
      "md5":"1a8dbfaf1771a06e48dea0e3dc604392",
      "crc":"799c6dfc"
    },
    "/flash/www/config/tabs-styles.css":{
      "length":970,
      "date":1477590404,
      "md5":"68bca7015f51e26ab42199b5eb17a356",
      "crc":"f8870a33"
    },
    "/flash/www/config/tabs.js":{
      "length":3662,
      "date":1449678641,
      "md5":"ff728c86018341548ee70028062c89e0",
      "crc":"1a813112"
    },
    "/flash/www/config/styles.css":{
      "length":4450,
      "date":1504814044,
      "md5":"9ad78cca1b794dbcf9db3c55f1be5f1b",
      "crc":"acbd2e14",
      "sha":"3cf0bbc864840994a49f62d0ae00df6d8eb47ef3"
    },
    "/flash/www/config/comm.js":{
      "length":3541,
      "date":1507912287,
      "md5":"e7d2e56a443176d6150bbcc8b56e1911",
      "crc":"0ac0ed26",
      "sha":"5e66b96227779c5ef3736a7ca891a43cacffbbf1"
    },
    "/flash/www/config/console.js":{
      "length":5137,
      "date":1515680981,
      "md5":"58995da21198553a37d666ef043c289b",
      "crc":"ce8780d4",
      "sha":"bbe576a9bb28caa82306184ac38e8c5e0e1f1243"
    },
    "/flash/www/config/config.js":{
      "length":12639,
      "date":1515676686,
      "md5":"ae2d4b763f10adef65d65f9024ea809e",
      "crc":"cb109f41",
      "sha":"bb80d401bbc977695ee7c79a21487c2bbb3d7564"
    },
    "/flash/www/config/index.php":{
      "length":22103,
      "date":1515677508,
      "md5":"bdf0df657f4988b7e5abe86ac8ce6956",
      "crc":"6cd2ae57",
      "sha":"4d9883b4f3bf833831bb26a54b6b97698f074dd4"
    },
    "/flash/www/jnior.ico":{
      "length":3262,
      "date":1439548680,
      "md5":"1c3b3dda6b10c6259fcf7c068b760f09",
      "crc":"051803eb"
    },
    "/flash/www/favicon.ico":{
      "length":156790,
      "date":1486410493,
      "md5":"07cb90c7f3573eff80222269625ed1dd",
      "crc":"7e367afa",
      "sha":"284add71fe3d3ba48fba059b88ff5143d3964b1d"
    },
    "/flash/analogpresets.jar":{
      "length":163902,
      "date":1441372806,
      "md5":"25eacc647412535e320302d3680ce327",
      "crc":"e6b656fc"
    },
    "/flash/www/config/config.css.php":{
      "length":1045,
      "date":1475072901,
      "md5":"1692861e9abd7f8d81f5b7cf8a176046",
      "crc":"4c386a21"
    },
    "/flash/www/config/inputs.png":{
      "length":18047,
      "date":1443116143,
      "md5":"e2151c93b6cdeaa154d15fab486ae61b",
      "crc":"16290877"
    },
    "/flash/www/config/loading.gif":{
      "length":3236,
      "date":1264096270,
      "md5":"d96f6517e00399c37a9765e045eaaf22",
      "crc":"16f442ed"
    },
    "/flash/jtest.jar":{
      "length":1832,
      "date":1515959298,
      "md5":"051517cc7a8978d97746bb7acb0a57ed",
      "crc":"509a17f2",
      "sha":"beefc003bf3a076871b7eb0df2931db677b2bca1"
    },
    "/flash/www/vendor/angular_1.3.15/angular.min.js":{
      "length":125909,
      "date":1449498838,
      "md5":"ca1a58818682c3e858a585f283ab9beb",
      "crc":"9d8147d7"
    },
    "/flash/www/vendor/bootstrap_3.3.0/css/bootstrap-theme.css":{
      "length":21740,
      "date":1449498835,
      "md5":"c64043a3388612233d7eb947918a9bfc",
      "crc":"638f58a3"
    },
    "/flash/www/vendor/bootstrap_3.3.0/css/bootstrap-theme.css.map":{
      "length":41933,
      "date":1449498838,
      "md5":"c5da8241305bfe7e19919e6e943739eb",
      "crc":"11260772"
    },
    "/flash/www/vendor/bootstrap_3.3.0/css/bootstrap-theme.min.css":{
      "length":19199,
      "date":1449498840,
      "md5":"374df0ad5809a5314b0577802430a272",
      "crc":"8b3c47b7"
    },
    "/flash/www/vendor/bootstrap_3.3.0/css/bootstrap.css":{
      "length":137590,
      "date":1449498845,
      "md5":"ad6381ebfa541b55b0152349c6cabf76",
      "crc":"371e67da"
    },
    "/flash/www/vendor/bootstrap_3.3.0/css/bootstrap.css.map":{
      "length":366866,
      "date":1449498854,
      "md5":"4ba278e0c420d166e5a0eb71545f9509",
      "crc":"b7c9868d"
    },
    "/flash/www/vendor/bootstrap_3.3.0/css/bootstrap.min.css":{
      "length":114011,
      "date":1449498852,
      "md5":"78e7f91c0c4cca415e0683626aa23925",
      "crc":"34387388"
    },
    "/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.eot":{
      "length":20335,
      "date":1449498855,
      "md5":"7ad17c6085dee9a33787bac28fb23d46",
      "crc":"f171b590"
    },
    "/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.svg":{
      "length":62926,
      "date":1449498857,
      "md5":"ff423a4251cf2986555523dfe315c42b",
      "crc":"385cd4ad"
    },
    "/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.ttf":{
      "length":41280,
      "date":1449498858,
      "md5":"e49d52e74b7689a0727def99da31f3eb",
      "crc":"0617f1ff"
    },
    "/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.woff":{
      "length":23320,
      "date":1449498858,
      "md5":"68ed1dac06bf0409c18ae7bc62889170",
      "crc":"cec1a35c"
    },
    "/flash/www/vendor/bootstrap_3.3.0/js/bootstrap.min.js":{
      "length":34653,
      "date":1449498862,
      "md5":"281cd50dd9f58c5550620fc148a7bc39",
      "crc":"32d6c689"
    },
    "/flash/www/vendor/bootstrap_3.3.0/js/bootstrap.js":{
      "length":65813,
      "date":1449498862,
      "md5":"d5a03d9cca57637f008124916b86b585",
      "crc":"f504a7b3"
    },
    "/flash/www/vendor/bootstrap_3.3.0/js/npm.js":{
      "length":484,
      "date":1449498863,
      "md5":"ccb7f3909e30b1eb8f65a24393c6e12b",
      "crc":"cc50e34d"
    },
    "/flash/www/vendor/jquery_1.11.1/jquery-1.11.1.min.map":{
      "length":141680,
      "date":1449498870,
      "md5":"ffbeb16578d8cdf58104889baacbbef2",
      "crc":"e4e92bfd"
    },
    "/flash/www/vendor/jquery_1.11.1/jquery-1.11.1.min.js":{
      "length":95786,
      "date":1449498869,
      "md5":"8101d596b2b8fa35fe3a634ea342d7c3",
      "crc":"804ff984"
    },
    "/flash/www/config/integlogo.png":{
      "length":5773,
      "date":1449163436,
      "md5":"9111308273dadea73f5d09a5e02c7311",
      "crc":"60c4e184"
    },
    "/flash/utility.jar":{
      "length":106794,
      "date":1449773066,
      "md5":"ac559b91b537dfa70720a416f32f2960",
      "crc":"888936f1"
    },
    "/flash/generators/json/colour.js":{
      "length":4327,
      "date":1449774238,
      "md5":"c67e10d0e0e698fcdbbbadcaa55600d4",
      "crc":"19e8a38f"
    },
    "/flash/generators/json/ethernet.js":{
      "length":1409,
      "date":1449774238,
      "md5":"1b6bae08feb93f6bd345a3780c3acb69",
      "crc":"848097a7"
    },
    "/flash/generators/json/inputs.js":{
      "length":2825,
      "date":1449774239,
      "md5":"6959db5a769ff3ceea45bf606bda940a",
      "crc":"c544d780"
    },
    "/flash/generators/json/lists.js":{
      "length":12006,
      "date":1449774239,
      "md5":"5cc489ac77db7a3369b2ffc30cbd3a86",
      "crc":"ba761254"
    },
    "/flash/generators/json/logic.js":{
      "length":4404,
      "date":1449774239,
      "md5":"9cd1cf854976ebb69a6c20a7ac88d2f9",
      "crc":"6c2189f9"
    },
    "/flash/generators/json/loops.js":{
      "length":6040,
      "date":1449774239,
      "md5":"e8e9021b5d4eb2e0cc43f11ad5b3bfd7",
      "crc":"b30a758a"
    },
    "/flash/generators/json/math.js":{
      "length":14673,
      "date":1449774240,
      "md5":"fa22c29efc362e02d8f35838fcca46e5",
      "crc":"8fc62e67"
    },
    "/flash/generators/json/other.js":{
      "length":983,
      "date":1449774240,
      "md5":"dd77f555bc9b50ed17a215d7935f10ab",
      "crc":"3e07810d"
    },
    "/flash/generators/json/outputs.js":{
      "length":3861,
      "date":1449774240,
      "md5":"72a118cd7829b5a510e5a901d8863d6e",
      "crc":"bdd5e320"
    },
    "/flash/generators/json/procedures.js":{
      "length":3945,
      "date":1449774240,
      "md5":"cb9fb880bebb3375273353fafc12dc9c",
      "crc":"20d43aad"
    },
    "/flash/generators/json/text.js":{
      "length":1363,
      "date":1449774241,
      "md5":"a0bd39f638202a0800c100b4eac3cbc3",
      "crc":"b17b24d6"
    },
    "/flash/generators/json/timing.js":{
      "length":2638,
      "date":1449774241,
      "md5":"b1ee803dd8e6e00de74e0a3269f0a2ff",
      "crc":"489061b8"
    },
    "/flash/generators/json/variables.js":{
      "length":1500,
      "date":1449774241,
      "md5":"fecce79a400d5e4e1edbe521699fa604",
      "crc":"cb724c91"
    },
    "/flash/generators/json.js":{
      "length":4115,
      "date":1449774238,
      "md5":"cc72f2468eb970110f3f6f0278f43467",
      "crc":"25a98f30"
    },
    "/flash/www/config/link_to.png":{
      "length":259,
      "date":1450466976,
      "md5":"b1ed68183be4f97ce1793139496dbbb4",
      "crc":"a067876a"
    },
    "/flash/www/config/collapsed.png":{
      "length":232,
      "date":1452087215,
      "md5":"ef7dd392142824ec54b7b7188717411c",
      "crc":"c7bd8428"
    },
    "/flash/www/config/linked.png":{
      "length":174,
      "date":1452088114,
      "md5":"56d2755d08a0857ff6e7750c4b2822dd",
      "crc":"ff59187e"
    },
    "/flash/www/config/expanded.png":{
      "length":238,
      "date":1452097812,
      "md5":"905b26e96849524dd6c37e1878f66779",
      "crc":"68686921"
    },
    "/flash/www/config/registry.js":{
      "length":8276,
      "date":1452271284,
      "md5":"fc35855793b2bbfe577e420f34cb0dda",
      "crc":"6c73e25a"
    },
    "/flash/www/config/deletex.png":{
      "length":240,
      "date":1452284181,
      "md5":"2750f1e60d0222d7f3c0752207fb41e7",
      "crc":"386b823b"
    },
    "/flash/www/config/modules.js":{
      "length":13520,
      "date":1484149578,
      "md5":"5d79964a8ca70cc7dc0504c343be3e3c",
      "crc":"3c09b9e2",
      "sha":"d6f0b3ec60796662acd105694ef39543e3dc50a2"
    },
    "/flash/www/logging.php":{
      "length":4853,
      "date":1463582298,
      "md5":"170c17bd0962f434eebe699129491912",
      "crc":"dce15f4e"
    },
    "/flash/www/slaving.zip":{
      "length":113815,
      "date":1465493787,
      "md5":"b3e85080154b5a7dc10078a6c6fe75c7",
      "crc":"975c987e"
    },
    "/flash/0-10vtest.jar":{
      "length":5053,
      "date":1438104444,
      "md5":"3a7be82077e29c598bdd8694d47805f4",
      "crc":"05e27897"
    },
    "/flash/4routtest.jar":{
      "length":2993,
      "date":1373644405,
      "md5":"14381605ec8f2f0d0dbe34843b7178b8",
      "crc":"8240fc03"
    },
    "/flash/environ.jar":{
      "length":3881,
      "date":1476102546,
      "md5":"8d738f0145516d287174a00dda32dabc",
      "crc":"ff1ecc8b"
    },
    "/flash/current.key":{
      "length":898,
      "date":1455116261,
      "md5":"035a0d79bd6c8258c12111479fe7353e",
      "crc":"cbdd8ffe"
    },
    "/flash/serialtest.jar":{
      "length":4532,
      "date":1457448880,
      "md5":"48fc4bd9421a5cf275b42235d2f4e2cb",
      "crc":"6d86943b"
    },
    "/flash/intellij.jar":{
      "length":969,
      "date":1464918560,
      "md5":"aea445862e32190fa61abc5d97e5b25f",
      "crc":"959a1596"
    },
    "/flash/jmodule.jar":{
      "length":5580,
      "date":1465240063,
      "md5":"af7d42f427d0e711c4a79c8e1c1d341d",
      "crc":"40058988"
    },
    "/flash/udptest.jar":{
      "length":5811,
      "date":1465328251,
      "md5":"5bbc399b4eb1f5ec427ccbf93c8b135d",
      "crc":"3d976325"
    },
    "/flash/buffer.jar":{
      "length":95325,
      "date":1467321013,
      "md5":"0c66b2a130de483b64b91d87471eb952",
      "crc":"5d0819e2"
    },
    "/flash/display.jar":{
      "length":2992,
      "date":1468953410,
      "md5":"efcfc78470e98842f52579c81c088a2d",
      "crc":"5ec67fd0"
    },
    "/flash/rz.jar":{
      "length":13079,
      "date":1469638127,
      "md5":"c4b7e9f4072d64e3dde9fe5a62406a1e",
      "crc":"20367148"
    },
    "/flash/www/config/folder.png":{
      "length":329,
      "date":1454662486,
      "md5":"316b7810fa502618b4e85788a82617a8",
      "crc":"55f20187"
    },
    "/flash/www/config/file.png":{
      "length":286,
      "date":1454662486,
      "md5":"1b75c23448e9c6eed675404f6130491d",
      "crc":"d327c449"
    },
    "/flash/www/config/warning.png":{
      "length":3068,
      "date":1332275646,
      "md5":"9c96d831cfc50fdedfdc980bc2abb2cf",
      "crc":"e90bb05a"
    },
    "/flash/www/config/folders.js":{
      "length":19270,
      "date":1504815735,
      "md5":"c7a59ef1aea3aad95d3315627d3a3b29",
      "crc":"6b1adf25",
      "sha":"93d7e851c9a1a65ed45b7c1bbe4368d3d941b32f"
    },
    "/flash/clktest.jar":{
      "length":2616,
      "date":1470249535,
      "md5":"345b4a9a22ec05bc89bb291b7b047e0e",
      "crc":"270f1d8b"
    },
    "/flash/timesearch.jar":{
      "length":4180,
      "date":1471371624,
      "md5":"bf719e65d8f4be9d7348a621ac69bc2b",
      "crc":"25075aa7"
    },
    "/flash/www/config/relays.js":{
      "length":4189,
      "date":1484587793,
      "md5":"803af5c2431b8f58c110260b3f317838",
      "crc":"ee9ab3af",
      "sha":"21ec766fe220bd0618b43050851f9cd67dd1bf54"
    },
    "/flash/www/config/temperature.js":{
      "length":2870,
      "date":1475245816,
      "md5":"262c339513007cd746ee01da9a4a843f",
      "crc":"d062a444"
    },
    "/flash/www/config/dimmer.js":{
      "length":8255,
      "date":1475265861,
      "md5":"e7213c6fb8c263ac71acb766e62dc4ce",
      "crc":"b9edf051"
    },
    "/flash/www/config/range.css":{
      "length":2212,
      "date":1475499110,
      "md5":"6932c76ab79879ea4c5d826d9cb60db9",
      "crc":"3334dfd1"
    },
    "/flash/www/config/analog.js":{
      "length":7267,
      "date":1484587793,
      "md5":"87abcaf68dea5e2e203326a55bc2bca5",
      "crc":"9766b532",
      "sha":"dd788111904d41826164ea151f78dd4b3e3b84e6"
    },
    "/flash/www/config/ledon.png":{
      "length":626,
      "date":1475506220,
      "md5":"6018d69896fcba49da54c39d8ee19803",
      "crc":"32a65f15"
    },
    "/flash/www/config/panel.js":{
      "length":2038,
      "date":1475509052,
      "md5":"e0631cb06777f63f0a071f7aa5d198d0",
      "crc":"a38a7db3"
    },
    "/flash/www/config/ledoff.png":{
      "length":757,
      "date":1475509575,
      "md5":"4bb71e412a20ae6f098a29b195b10e13",
      "crc":"3fd16f7a"
    },
    "/flash/jpanel.jar":{
      "length":3142,
      "date":1358430294,
      "md5":"39825ccddf7b61c1ad41d261d84f4950",
      "crc":"446bee7f"
    },
    "/flash/www/config/syslog.js":{
      "length":1929,
      "date":1496773328,
      "md5":"4e8ecca50284c2aeae8e8b90db27ded8",
      "crc":"ac2a2541",
      "sha":"e413d70cc2bb6717448bc84c2980abc764bc3dd6"
    },
    "/flash/www/config/peers.js":{
      "length":5885,
      "date":1505835290,
      "md5":"2536fc521f916341b98183f6ce0b2453",
      "crc":"f2a44392",
      "sha":"5d949b8daa8e5081f19c88e42af968b24955e02c"
    },
    "/flash/www/index.php":{
      "length":356,
      "date":1477657721,
      "md5":"3ba20cf61f44f9ace09104261acf2711",
      "crc":"7f8eaed3"
    },
    "/flash/www/www.zip":{
      "length":85751,
      "date":1477663620,
      "md5":"296baa71d70bf40c1ad6ee0c71066c49",
      "crc":"69922bd1"
    },
    "/flash/www/download1.php":{
      "length":465,
      "date":1480616431,
      "md5":"1f69c84031dbdbe9aeecd634c0ab9607",
      "sha":"9770a8f6534f17f86eeb332309b7cbe07441022e",
      "crc":"c7b59619"
    },
    "/flash/www/short.php":{
      "length":273,
      "date":1516028524,
      "md5":"14687d4240d58955736ac2f6b31614a0",
      "sha":"2291bacbbd7aac09c488436efbe5c2be1f3936b6",
      "crc":"3cf41987"
    },
    "/flash/ctrlc.jar":{
      "length":1510,
      "date":1482421756,
      "md5":"b7ce2da5b761674e626ae62c4b9edbcc",
      "sha":"51a17a3f092333a0a48aa8e6dcebe0ce99cef3de",
      "crc":"bd2a0810"
    },
    "/flash/www.zip":{
      "length":87642,
      "date":1515681899,
      "md5":"c3cfda778bf0334684669fedb36180f7",
      "sha":"1aef18b365347aa0f13f38f315a04edbf7eb37d2",
      "crc":"1da88b8e"
    },
    "/flash/www/config/favicon.ico":{
      "length":766,
      "date":1486410493,
      "md5":"07cb90c7f3573eff80222269625ed1dd",
      "sha":"284add71fe3d3ba48fba059b88ff5143d3964b1d",
      "crc":"7e367afa"
    },
    "/flash/www/map.html":{
      "length":1170,
      "date":1485380108,
      "md5":"901c9971c3c591b3d736cd91516960de",
      "sha":"5ded94156ca71884af1afae0fcaf1e78d3bac23d",
      "crc":"71f8c837"
    },
    "/flash/jmanifest.jar":{
      "length":5651,
      "date":1485192866,
      "md5":"dfb84226c647a42295d9f671cfb99fa5",
      "sha":"a7331cca377c1f96e400ddd5044c01a175ee230f",
      "crc":"1a64c6d6"
    },
    "/flash/jping.jar":{
      "length":2174,
      "date":1485201152,
      "md5":"0d533008847888e0dfcf497c0cff1a96",
      "sha":"75fbff5a973b8dac3408fdda46e47e708b585e58",
      "crc":"f1203f43"
    },
    "/flash/jaccess.jar":{
      "length":4820,
      "date":1485805203,
      "md5":"29ce866873686dd133a724e4db29c690",
      "sha":"239bf75c1597a25fdbbbb78798fe72971ca15f63",
      "crc":"e5ae0d1c"
    },
    "/flash/somepath/path2/testx.php":{
      "length":5282,
      "date":1486397961,
      "md5":"ce1a071b258c936c65679d6bb67db198",
      "sha":"30342828ebaeb69cd8ecefd75f2dd01e80c6388b",
      "crc":"ecd9251a"
    },
    "/flash/bruce_dev.cer":{
      "length":902,
      "date":1487172768,
      "md5":"e9917f27384ddee36817c04c8cde9199",
      "sha":"4b2b82a042a0019679c1b071956278f6ddd1f27b",
      "crc":"115ed2ae"
    },
    "/flash/www/config/registrydoc.css":{
      "length":21460,
      "date":1504201641,
      "md5":"15423ca727b03e6b1581910c6ca2eab5",
      "sha":"f521b53a4518e7490768d2a8ae0e707c1dfb943b",
      "crc":"0d5fd8c9"
    },
    "/flash/www/config/registrydoc.html":{
      "length":169108,
      "date":1515600577,
      "md5":"f4b896b0cd0ead740985e4d8e8c20be4",
      "sha":"893b119002295f37afaa71c2f7f6d13fda14ea7c",
      "crc":"3b5a3493"
    },
    "/flash/www/panel/comm.js":{
      "length":4715,
      "date":1498074333,
      "md5":"44aa80868230fbfeee0a3c48c390896d",
      "sha":"37b479f65e7e8221d6fd9349439a8193cc645ba7",
      "crc":"0d5e92bd"
    },
    "/flash/www/panel/index.php":{
      "length":2648,
      "date":1501526934,
      "md5":"923ce6739971521191f9000662f38323",
      "sha":"a35d1d5f24da487be376595b46598e162e0f5310",
      "crc":"ffd86d7b"
    },
    "/flash/www/panel/panel.js":{
      "length":993,
      "date":1501527049,
      "md5":"9d9a2cbb435ffe8af5bd9d8c0598dccd",
      "sha":"2ef881dc8d90b4b0fb80a59d717c7125ca23fb04",
      "crc":"4fcd0f37"
    },
    "/flash/www/panel/panel.css":{
      "length":2586,
      "date":1501527291,
      "md5":"2a3a66d14d7bc6d4b01dfbd745205c7d",
      "sha":"886770297a07a594b88430d5db4ae9e23738d118",
      "crc":"2dd8a81d"
    },
    "/flash/www/graphr.zip":{
      "length":556637,
      "date":1506536442,
      "md5":"891b1dfa8d774b85aefcbd8791abe11f",
      "sha":"e5d204333658bd5c2f7c5b5ff682911124a10766",
      "crc":"62d153fb"
    },
    "/flash/public/dcp.zip":{
      "length":181914,
      "date":1504795829,
      "md5":"655e8587293f35f11c5c24fc38201d2f",
      "sha":"5fcfd8e38826e648f98f8d50f3613deb0d6312b6",
      "crc":"da99b7d0"
    },
    "/flash/test.txt":{
      "length":304,
      "date":1495131459,
      "md5":"fc9f1f5e67928ccb9be3aeaa66cd9e52",
      "sha":"6100d999f484f98ab476408c801dd000e579a62c",
      "crc":"765047c5"
    },
    "/flash/dmx.jar":{
      "length":4476,
      "date":1500567859,
      "md5":"3fd35bbe6bbf53a32aecf273275d1839",
      "sha":"4f702a87adb060294b553e6bd212672727d5d25f",
      "crc":"e81db9aa"
    },
    "/flash/juptime.jar":{
      "length":3201,
      "date":1506713589,
      "md5":"d4c2482fae18482727c1b2afabcf94b4",
      "sha":"86268b720b99760a4ebdb803db53f3f7fd18fd18",
      "crc":"44b0878c"
    },
    "/flash/jscan.jar":{
      "length":2189,
      "date":1507141493,
      "md5":"a0a42e17f003cedcac9c8e662ada6b36",
      "sha":"f1cafb56fdae33b66fff9b20cd2ff2705d96da9e",
      "crc":"60f00fe2"
    },
    "/flash/hmi.jar":{
      "length":8329,
      "date":1511283865,
      "md5":"1a1b247ccb5e3eb9623d12578c1ba833",
      "sha":"7a1f5868817e8a3e60fe8fb2c4d9ed168e53d141",
      "crc":"fb2a0367"
    },
    "/web.log":{
      "length":4735,
      "date":1516889801,
      "md5":"03febfe88d35e995a0d8a15f05e37f70",
      "sha":"4da80a3fb423a2e1ad8b05b6384326ef974a45f3",
      "crc":"d393e4a3"
    },
    "/flash/cinema.jar":{
      "length":313835,
      "date":1512413064,
      "md5":"45b29edcb85af51f58eda0f693b6c13e",
      "sha":"ba7f0da988e351b329e1c8af1929ab36dad99dec",
      "crc":"6e688a54"
    },
    "/flash/cinekey.jar":{
      "length":20266,
      "date":1512570698,
      "md5":"4b8adacc107abc577fae3c73db11d56a",
      "sha":"dde36076fe9a0613a40ccf78d9895bdfd92d93a2",
      "crc":"69db880f"
    },
    "/flash/key.pem":{
      "length":1041,
      "date":1488297708,
      "md5":"f643172f1cceb3703ce126df1f9293b9",
      "sha":"2cea702929e9cc04f6b4c003d2fb3ee507d5240e",
      "crc":"2e1cc611"
    },
    "/flash/key.pub":{
      "length":272,
      "date":1512584838,
      "md5":"344622d414a797bb9d992582c4d129b5",
      "sha":"1a45f21b80ee1ec8509d62fbfd5c71a96e400154",
      "crc":"4c1ce46a"
    },
    "/flash/honeypot.cer":{
      "length":1092,
      "date":1512755338,
      "md5":"51f65aaabc1f1f8d20c27dbe21389e8a",
      "sha":"d218400c2d82bb3766917e9139d0a21a54c56e4e",
      "crc":"ec194c40"
    },
    "/flash/pubkey.pem":{
      "length":278,
      "date":1513103302,
      "md5":"8077da7d24beedf7d0c56bd1d42bd062",
      "sha":"06631dbc5226ea3d3c3e6695c573877e351a7b72",
      "crc":"ce425129"
    },
    "/flash/jtest2.jar":{
      "length":3043,
      "date":1515165671,
      "md5":"c4b4ba07a459dd644abac99bbccbd31e",
      "sha":"35256db54659e900ffc9112bc0e769683ab8e818",
      "crc":"7beaf8b1"
    },
    "/flash/gogo.dat":{
      "length":13,
      "date":1515701808,
      "md5":"32201ddab35c4461b4cc8a555cc52125",
      "sha":"3a10b47bd880c61ab49b8d9c20a357ffb9905424",
      "crc":"c3d317fe"
    },
    "/flash/manifest.zip":{
      "length":8589,
      "date":1516717968,
      "md5":"cc9525181bd63a36f7a7c9bbdd263d52",
      "sha":"a9d9aa3d9f9e43bb77e00861cf1cae8c75307794",
      "crc":"aa1d1871"
    },
    "/flash/www/test.zip":{
      "length":183358,
      "date":1516103573,
      "md5":"c3cfda778bf0334684669fedb36180f7",
      "sha":"1aef18b365347aa0f13f38f315a04edbf7eb37d2",
      "crc":"1da88b8e"
    },
    "/flash/public/logs/file_list.php":{
      "length":1324,
      "date":1516026614,
      "md5":"dc00d3ff6e0dbde0d518cb031adb2ffc",
      "sha":"084e23a1c3920288fc77f5077af9e426d15a7070",
      "crc":"1619a010"
    },
    "/flash/logs/file_list.php":{
      "length":1324,
      "date":1516026614,
      "md5":"dc00d3ff6e0dbde0d518cb031adb2ffc",
      "sha":"084e23a1c3920288fc77f5077af9e426d15a7070",
      "crc":"1619a010"
    },
    "/flash/cinema_backup/macro_cineasia.csv":{
      "length":912,
      "date":1512576908,
      "md5":"3a9c04ed302b116828c6b1e34d90eee8",
      "sha":"0ba6c912592b8fcc94f325088bbf6e5e915b8095",
      "crc":"c08feb1b"
    },
    "/test.zip":{
      "length":28304,
      "date":1516908949,
      "md5":"2a8a593cc66fa62117497c28bf565d20",
      "sha":"d62543f024dfa510450d7be40ff5685269c042c9",
      "crc":"9c9d97ef"
    }
  }
}
bruce_dev /temp> 

We see here how the CAT command can format JSON for us.

Hmm… Perhaps before we release v1.6.4 JANOS I’ll have this command list the files it extracts. Seems like it should have at least indicated that it did what we wanted.

So in the past I have designed products that were capable of plotting collected data as a graphics file for display in the browser. This isn’t currently possible in the JNIOR and it is a feature that could be added to JANOS. It can be useful. I should also mention that JANOS executes Java and serves files out of JAR and ZIP files which generally utilize DEFLATE compression. So we already perform DEFLATE decompression. With a compressor we can add the ability to create/modify JAR and ZIP archives as well as to generate PNG graphics for plots.

The issue with the specifications is that they describe the compression and file formats but do not give you the algorithms. They don’t tell you how to do it. Just what you must do.

Sure, there is often reference code or an open source project that you can find. Those often are complete projects and it is difficult to find the precise code in them that you need to understand if you are to implement the core algorithms. Then that code has been optimized and complicated with options over the years that end up obfuscating the structure. And, no, we don’t just absorb 3rd party code into JANOS. That is not the way to maintain a stable embedded environment. Not in my book.

So far this is the case for DEFLATE. I am going to develop the algorithm from scratch so that I fully understand it, know precisely what every statement does, and know how the code will perform in the JANOS platform. Maybe more importantly, I will know that it will not misbehave and drag JANOS down.

Well I was thinking that I would openly do that here so you can play along. Don’t hesitate to create a legitimate account here and comment or even help. Some approaches to LZ77 have been patented. I have no clue and am not going to spend weeks trying to understand the patent prose. Supposedly the LZ77 implementation associated with DEFLATE is unencumbered. Maybe so. Still I might get creative and cross some line. I am not overly concerned about it but would like to know, at least before getting some sort of legal complaint.

Yeah so… let’s reinvent the wheel. That’s usually how I roll…

If you look into DEFLATE you will be quickly distracted by data structure and Huffman coding. There are some confusing but ingenious ways of efficiently conveying the Huffman tables, and the Huffman coding even seems to be recursively applied. All of that will boggle your mind even before you get to the LZ77 compression at the heart of it all. So, ignore all of that. We will get to it. I am going to start at the heart and build outwards.

LZ77 compression

The compression works by identifying sequences of bytes that occur in the data stream that are repeated later. The repeated sequence is replaced by a short and efficient reference back to the earlier data thereby reducing the size of the overall stream. A 20-year old document by Antaeus Feldspar describes it well by example.

As everything has a limitation, DEFLATE defines a 32KB sliding window which may contain any referenced prior sequence. It just is not feasible to randomly access the entire data set and allow you to reference sequences all the way back to the beginning. This also keeps distances under control allowing only smaller integers to appear in the stream.

It all sounds great but then you realize that in compression you have to search that entire 32KB window for matches to the current sequence of bytes each time a new byte is added. Lots of processor cycles are involved and the whole process could take forever. Of course, while the decompressor needs to be ready to reference the whole 32KB window, a compressor might use a smaller window. That would reduce the effort involved at the cost of compression efficiency. The specifications suggest window and sequence length factors that might be controlled in balancing speed and space efficiency.

It can all get more complex in that the prior sequence can actually overlap the current sequence (as in the example in the document). A further complication comes in if you consider that lazy matching might lead to better compression. A short sequence match might mask a potentially longer match which could have been more beneficial.
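To make the overlap case concrete, here is a minimal decoder sketch. It is not JANOS code. The Token class and its lit() and ref() helpers are names I made up for illustration. It expands a list of literals and [length, distance] references, copying one byte at a time so that a reference may reach into bytes it is itself producing.

```java
import java.nio.charset.StandardCharsets;

class Lz77DecodeSketch {

    // Hypothetical token: a literal byte when length == 0, otherwise a back-reference
    static final class Token {
        final int literal;   // byte value (used only when length == 0)
        final int length;    // number of bytes to copy, 0 for a literal
        final int distance;  // how far back from the current position the copy starts

        Token(int literal, int length, int distance) {
            this.literal = literal;
            this.length = length;
            this.distance = distance;
        }

        static Token lit(char c) { return new Token(c, 0, 0); }
        static Token ref(int length, int distance) { return new Token(0, length, distance); }
    }

    static byte[] decode(Token[] tokens) {
        // the output size is known from the tokens themselves
        int size = 0;
        for (Token t : tokens)
            size += (t.length == 0) ? 1 : t.length;

        byte[] out = new byte[size];
        int pos = 0;
        for (Token t : tokens) {
            if (t.length == 0) {
                out[pos++] = (byte) t.literal;
            } else {
                // byte-by-byte copy: when length > distance the source overlaps
                //  the bytes being written, which is exactly what we want
                for (int i = 0; i < t.length; i++) {
                    out[pos] = out[pos - t.distance];
                    pos++;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Token[] tokens = {
            Token.lit('B'), Token.lit('l'), Token.lit('a'), Token.lit('h'),
            Token.lit(' '), Token.lit('b'),
            Token.ref(13, 5)   // 13 bytes from 5 back, longer than the distance
        };
        System.out.println(new String(decode(tokens), StandardCharsets.US_ASCII));
    }
}
```

Six literals and one reference rebuild the 19 characters of "Blah blah blah blah". The reference is 13 bytes long but only reaches back 5, so most of what it copies did not exist when the copy started.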

So how do I want to proceed here? Some of this reminds me of the fun in creating JANOS’ Regex engine. Hmm…

The Regex engine ends up compiling the expression into one large state machine through which multiple pointers can progress simultaneously. Pointers are advanced as each character is obtained from the data stream. The first pointer (or last depending on the mode) to make it through the whole expression signals a match. If it sounds complicated, it sort of is but at the same time it’s pretty cool. As far as Regex goes I’ve been able to apply this to almost all of the standard Regex functionality. But JANOS hasn’t implemented every Regex syntax.

For DEFLATE there is somewhat of a similar situation where we want to examine bytes from the data stream one at a time and have the compression algorithm raise its hand when a sequence can be optimally encoded by a [length, distance] pointer. But we want to consider all possibilities and to try to do what leads to the most efficient compression.

I will start by implementing the sliding window as a queue using my standard approach to such things. The size of the queue will be 32K entries or less. In fact, to start out I’ll probably keep it very small. We can enlarge it later when we benchmark the compression algorithm.

Two index pointers will bound the data in the queue. Each new byte will be inserted at the INPTR and that pointer will be incremented and wrapped as required. The oldest byte will be located at the OUTPTR. Once the queue fills we will have to advance the OUTPTR to make room for the next entry. Older data will be dropped and the queue will run at maximum capacity. This is the sliding window.

The sliding window caches the uncompressed data stream. The compressed data stream will be generated by the algorithm separately. We need one more pointer in the sliding window indicating the start of the current sequence being evaluated. Call that CURPTR. If the current sequence cannot be matched we would output the byte from CURPTR and advance and wrap the pointer as necessary. If the sequence is matched we output the [length, distance] code and advance CURPTR to skip the entire matched sequence.

CURPTR then will lag behind INPTR. It will not get in the way of OUTPTR as DEFLATE specifies a maximum length for the sequence match of 258 bytes and our sliding window will be much larger.

Now let’s think about the sequence matching procedure…

I am going to prototype my algorithm in Java on the JNIOR just to make it easier to test and debug. Later, once I have the structure, I can recast it in C and embed it into JANOS.

We’re going to focus on the LZ77 part of the compression first. Our main goal is simply to be compatible with decompression. Our LZ77 algorithm then basically doesn’t have to do anything. We wouldn’t need to find a single repeated sequence nor replace anything with pointers back to any sliding window. Of course our compression ratio wouldn’t be all that impressive. We would still gain through the subsequent Huffman coding stages which I am leaving for later. But in the end we would still be able to create file collections and PNG graphics files that are universally usable.

But really, there’s no fun in a kludge. Let’s see if we can achieve an LZ77 implementation that we can be proud of. Well, at least one that works.

So for development I am going to create a program that will read a selected file and compress it using LZ77 into another. I’ll isolate all of the compression effort into one routine and have the outer program report some statistics upon completion.

Here’s our program for testing compression. All of the compression work will be done in do_compress() and this will report the results. At this point we just copy the source file. Yeah, this will be slow. But it will let us examine what we are trying to do more closely than if I went straight to C and used the debugger. In that case I couldn’t really share it with you.

package jtest;
 
import com.integpg.system.JANOS;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
 
public class Main {
    
    public static void main(String[] args) throws Throwable {
        
        // Requires a test file (use log as default)
        String filename = "/jniorsys.log";
        if (args.length > 0)
            filename = args[0];
        
        // Open the selected file for reading
        File src = new File(filename);
        long srclen = src.length();
        BufferedReader infile = new BufferedReader(new FileReader(src));
        if (!infile.ready())
            System.exit(1);
        
        // Create an output file
        BufferedWriter outfile = new BufferedWriter(new FileWriter("/outfile.dat"));
        
        // perform compression
        long timer = JANOS.uptimeMillis();
        do_compress(outfile, infile);
        timer = JANOS.uptimeMillis() - timer;
        
        // Close files
        outfile.close();
        infile.close();
        
        // Output statistics
        File dest = new File("/outfile.dat");
        long destlen = dest.length();
        
        System.out.printf("Processing %.3f seconds.\n", timer/1000.);
        System.out.printf("Source %lld bytes.\n", srclen);
        System.out.printf("Result %lld bytes.\n", destlen);
        System.out.printf("Ratio %.2f%%\n", 100. - (100. * destlen)/srclen);
    }
        
}

This uses my throws Throwable trick to avoid having to worry about try-catch for the time being.

    
    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        // simply copy at first
        while (infile.ready()) {
            int ch = infile.read();
            outfile.write(ch);
        }
        
    }
bruce_dev /> jtest
Processing 32.360 seconds.
Source 36737 bytes.
Result 36737 bytes.
Ratio 0.00%

bruce_dev /> echo Blah blah blah blah > blah.dat

bruce_dev /> cat blah.dat
Blah blah blah blah

bruce_dev /> jtest blah.dat
Processing 0.023 seconds.
Source 21 bytes.
Result 21 bytes.
Ratio 0.00%

bruce_dev />

Now back to thinking about the actual compression and matching sequences from a sliding window.

To start let’s implement our sliding window. Recall that I would use a queue for that. You can see here that depending on the window size we will retain the previous so many bytes. New bytes are queued at the INPTR position and when the queue fills we will push the OUTPTR discarding older data.

    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        // create queue (sliding window)
        int window = 1024;
        byte[] data = new byte[window];
        int inptr = 0;
        int outptr = 0;
        
        // simply copy at first
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            
            // matching (cannot yet so just output byte)
            outfile.write(ch);
            
            // queue uncompressed data
            data[inptr++] = (byte)ch;
            if (inptr == window)
                inptr = 0;
            if (inptr == outptr) {
                outptr++;
                if (outptr == window)
                    outptr = 0;
            }       
            
        }
        
    }

Now for a particular position in the input stream (CURPTR) we want to scan the queue for sequence matches. We could do that by brute force but for a large sliding window that would be very slow. Also there is the concept of a lazy match which if implemented might lead to better compression ratios. So how to approach the matching process?
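For reference, the brute force search might look something like the sketch below. This is only a baseline for comparison and not where I am headed. It works on a flat array instead of the circular queue to keep it short, and the longestMatch() name and its {length, distance} return value are my own assumptions. The 3 and 258 limits are DEFLATE’s minimum and maximum match lengths.

```java
import java.nio.charset.StandardCharsets;

class BruteForceMatchSketch {

    static final int MIN_MATCH = 3;    // shortest match DEFLATE can encode
    static final int MAX_MATCH = 258;  // longest match DEFLATE can encode

    // Returns {length, distance} of the longest match for the bytes starting at cur,
    //  looking back at most window bytes. Returns null if nothing reaches MIN_MATCH.
    static int[] longestMatch(byte[] data, int cur, int window) {
        int bestLen = 0;
        int bestDist = 0;
        int first = Math.max(0, cur - window);

        // try every earlier position, nearest first
        for (int start = cur - 1; start >= first; start--) {
            int len = 0;
            // the candidate may run past cur, which is the overlapping case
            while (len < MAX_MATCH && cur + len < data.length
                    && data[start + len] == data[cur + len])
                len++;
            if (len > bestLen) {
                bestLen = len;
                bestDist = cur - start;
            }
        }
        return (bestLen >= MIN_MATCH) ? new int[] { bestLen, bestDist } : null;
    }

    public static void main(String[] args) {
        byte[] data = "Blah blah blah blah".getBytes(StandardCharsets.US_ASCII);
        int[] m = longestMatch(data, 6, 1024);
        System.out.println(m == null ? "no match" : "length " + m[0] + ", distance " + m[1]);
    }
}
```

Every call walks the whole window and compares byte by byte from each candidate. With a 32KB window and a call for every input position you can see where the processor cycles go.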

So for some position in the input stream we will be searching prior data for a sequence match (3 or more bytes). So we create the CURPTR. If we cannot find a match (and right now we cannot because we haven’t implemented matching) we will just output the data byte and bump the current position. Searching will continue for a match starting at the new position.

Right now CURPTR will track INPTR. Later while we are watching for matches it will lag behind. Here we have created CURPTR. There is otherwise no major functional change.

    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        // create queue (sliding window)
        int window = 1024;
        byte[] data = new byte[window];
        int inptr = 0;
        int outptr = 0;
        int curptr = 0;
        
        // simply copy at first
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            
            // matching (cannot yet so just output byte)
            outfile.write(ch);
            curptr++;
            if (curptr == window)
                curptr = 0;
            
            // queue uncompressed data
            data[inptr++] = (byte)ch;
            if (inptr == window)
                inptr = 0;
            if (inptr == outptr) {
                outptr++;
                if (outptr == window)
                    outptr = 0;
            }       
            
        }
        
    }

Now how are we going to do this “watching for matches” thing?

Let’s create the concept of an active match. At any given time there will be from 0 to some number of active matches. Each will represent a match to the sequence of bytes appearing at an input stream position. As new bytes are retrieved from the uncompressed input stream we will check any active matches and advance them. If a match is no longer valid it will be removed and forgotten. At that point though maybe the match warrants replacement in the stream with a pointer. We will see.

I made the data queue and related pointers static members of the program and created the following class representing an active match.

    // An active match. At any given position in the sliding WINDOW we compare and track
    //  matches to the incoming DATA stream.
    class match {
        public int start;
        public int ptr;
        public int len;
        
        match(int pos) {
            start = pos;
            ptr = pos + 1;
            if (ptr == WINDOW)
                ptr = 0;
            len = 1;
        }
        
        public boolean check(int ch) {
            // mask to 0..255 as Java bytes are signed and ch comes from read()
            if ((DATA[ptr] & 0xFF) != ch)
                return (false);
            
            ptr++;
            if (ptr == WINDOW)
                ptr = 0;
            len++;
            
            return (true);
        }
    }

When a new data byte is entered into the queue we will want to create these active match objects for every matching byte that previously exists. Every one of those represents a potential sequence.
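Here is a small standalone sketch of how active matches might be driven, just to show the idea in motion. It is not the final structure. The order of operations (check existing matches first, then start new ones) is my assumption at this point, the prior occurrences are found with a plain scan rather than anything clever, and the queue never wraps so the test string has to be shorter than WINDOW. The longestSeen() name is made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ActiveMatchSketch {

    static final int WINDOW = 64;
    static final byte[] DATA = new byte[WINDOW];

    // Same shape as the match class above, made static so it can be created from static code
    static class Match {
        final int start;
        int ptr;
        int len;

        Match(int pos) {
            start = pos;
            ptr = (pos + 1) % WINDOW;
            len = 1;
        }

        boolean check(int ch) {
            if ((DATA[ptr] & 0xFF) != ch)
                return false;
            ptr = (ptr + 1) % WINDOW;
            len++;
            return true;
        }
    }

    // Feeds text through the queue and returns the longest length any match grew to
    static int longestSeen(String text) {
        List<Match> active = new ArrayList<>();
        int longest = 0;
        int inptr = 0;

        for (int i = 0; i < text.length(); i++) {
            int ch = text.charAt(i) & 0xFF;

            // advance the matches that still agree and forget the rest
            for (Iterator<Match> it = active.iterator(); it.hasNext();) {
                Match m = it.next();
                if (m.check(ch))
                    longest = Math.max(longest, m.len);
                else
                    it.remove();
            }

            // start a new match at every earlier occurrence of this byte
            for (int pos = 0; pos < inptr; pos++)
                if ((DATA[pos] & 0xFF) == ch)
                    active.add(new Match(pos));

            // queue the byte (no wrap in this sketch)
            DATA[inptr++] = (byte) ch;
        }
        return longest;
    }

    public static void main(String[] args) {
        System.out.println(longestSeen("Blah blah blah blah"));
    }
}
```

For "Blah blah blah blah" the match started when the second 'l' arrives keeps advancing to the end of the string and reaches a length of 13. The shorter matches started along the way are the ones we will have to decide what to do with.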

Okay, when we enter a new byte in the queue it becomes a candidate for replacement by a reference to a sequence starting with the same byte somewhere earlier. So we want to start a new active match for each of the earlier bytes. We will process these matches as additional bytes are received from the input stream. To do this though we don’t want to search the entire window for prior occurrences of each character. So to make things efficient I am going to maintain linked lists through the queue for each character.

Since a byte can have 256 values we create a HEAD pointer array with 256 entries. This is referenced using the byte value. Each queue position then will have both a forward (FWD) and a backward (BACK) pointer forming a bi-directional linked list. Yeah, this quintuples our memory requirement (a 1-byte entry now carries two 2-byte links) but with the benefit of processing speed.

The list has to be bi-directional because once the queue fills we are going to drop bytes. It is then necessary to trim the linked lists to remove pointers for data no longer in the queue. That can only be done efficiently if we can reference a previous entry in the linked list. So we need both directions.

Here are our static members so far. This is the memory usage.

    // create queue (sliding window)
    static final int WINDOW = 1024;
    static final byte[] DATA = new byte[WINDOW];
    static int INPTR = 0;
    static int OUTPTR = 0;
    static int CURPTR = 0;
    
    // data linked list arrays
    static final short[] HEAD = new short[256];
    static final short[] FWD = new short[WINDOW];
    static final short[] BACK = new short[WINDOW];

Now we maintain the linked lists as we add and remove data from the queue. We also can efficiently create new active match objects. Note that we store pointers in the links as +1 so as to keep 0 as a terminator.

            // queue uncompressed DATA
            DATA[INPTR] = (byte)ch;
            
            // Add byte to the head of the appropriate linked list. Note pointers are stored +1 so
            //  as to use 0 as an end of list marker. Lists are bi-directional so we can trim the 
            //  tail when data is dropped from the queue.
            short ptr = HEAD[ch];
            HEAD[ch] = (short)(INPTR + 1);
            FWD[INPTR] = ptr;
            BACK[INPTR] = 0;
            if (ptr != 0)
                BACK[ptr - 1] = (short)(INPTR + 1);
            
            // advance entry pointer
            INPTR++;
            if (INPTR == WINDOW)
                INPTR = 0;
            
            // drop data from queue when full
            if (INPTR == OUTPTR) {
                
                // trim linked list as byte is being dropped
                if (BACK[OUTPTR] == 0)
                    HEAD[DATA[OUTPTR] & 0xff] = 0;
                else
                    FWD[BACK[OUTPTR] - 1] = 0;
 
                // push end of queue
                OUTPTR++;
                if (OUTPTR == WINDOW)
                    OUTPTR = 0;
            }
 
            // create new active match for all CH in the queue (except last)
            while (ptr != 0) {
                
                // new match started (not doing anything with it yet)
                match m = new match(ptr - 1);
                
                ptr = FWD[ptr - 1];
            }

I adjusted the program to dump non-zero HEAD entries and each occupied queue position including the links as a check. Remember that links are stored in here +1.

bruce_dev /> jtest blah.dat

HEAD
 0x0a 21
 0x0d 20
 0x20 15
 0x42 1
 0x61 18
 0x62 16
 0x68 19
 0x6c 17

QUEUE
 0 0x42 0 0
 1 0x6c 0 7
 2 0x61 0 8
 3 0x68 0 9
 4 0x20 0 10
 5 0x62 0 11
 6 0x6c 2 12
 7 0x61 3 13
 8 0x68 4 14
 9 0x20 5 15
 10 0x62 6 16
 11 0x6c 7 17
 12 0x61 8 18
 13 0x68 9 19
 14 0x20 10 0
 15 0x62 11 0
 16 0x6c 12 0
 17 0x61 13 0
 18 0x68 14 0
 19 0x0d 0 0
 20 0x0a 0 0

STATS
Processing 0.078 seconds.
Source 21 bytes.
Result 21 bytes.
Ratio 0.00%

bruce_dev />
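
To make the +1 link convention easier to follow, here is a minimal standalone sketch of just the linked-list bookkeeping. The class name `LinkIndexSketch` and the `positions` helper are made up for this example and are not part of the program above. It also assumes fewer than WINDOW bytes are queued, so the wrap and drop logic is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch only: the class and method names are hypothetical.
class LinkIndexSketch {
    static final int WINDOW = 1024;
    final byte[] DATA = new byte[WINDOW];
    final short[] HEAD = new short[256];
    final short[] FWD = new short[WINDOW];
    final short[] BACK = new short[WINDOW];
    int INPTR = 0;

    // Queue one byte and link it at the head of the list for its value.
    // Links are stored +1 so that 0 can mean "end of list".
    void add(int ch) {
        ch &= 0xff;
        DATA[INPTR] = (byte) ch;
        short ptr = HEAD[ch];
        HEAD[ch] = (short) (INPTR + 1);
        FWD[INPTR] = ptr;
        BACK[INPTR] = 0;
        if (ptr != 0)
            BACK[ptr - 1] = (short) (INPTR + 1);
        INPTR++; // assumption: input is shorter than WINDOW, so no wrap or drop
    }

    // Collect the queue positions holding ch, most recent first.
    List<Integer> positions(int ch) {
        List<Integer> out = new ArrayList<>();
        for (short p = HEAD[ch & 0xff]; p != 0; p = FWD[p - 1])
            out.add(p - 1);
        return out;
    }

    public static void main(String[] args) {
        LinkIndexSketch idx = new LinkIndexSketch();
        for (byte b : "Blah blah blah blah\r\n".getBytes(StandardCharsets.US_ASCII))
            idx.add(b);
        System.out.println(idx.positions('l')); // prints [16, 11, 6, 1]
        // index, byte, FWD, BACK for queue position 6
        System.out.printf("%d 0x%02x %d %d%n", 6, idx.DATA[6] & 0xff, idx.FWD[6], idx.BACK[6]);
        // prints 6 0x6c 2 12
    }
}
```

The second line printed appears to agree with the row for queue position 6 in the dump above, reading the two link columns as FWD then BACK.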

Now as we create new active matches we are going to collect them in an ArrayList object.

    // active matching
    static ArrayList<match> SEQ = new ArrayList<match>();

            // create new active matches for all CH in the queue (except last)
            while (ptr != 0) {
                SEQ.add(new match(ptr - 1));
                ptr = FWD[ptr - 1];
            }

So as each new data byte is retrieved from the uncompressed input stream we will process all active matches. Those that continue to match will be retained and others dropped. That code looks like this:

        // process uncompressed stream
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            
            // process active match objects
            System.out.printf("New byte[%d]: 0x%02x\n", INPTR, ch & 0xff);
            for (int n = SEQ.size() - 1; 0 <= n; n--) {
                match m = SEQ.get(n);
                if (!m.check(ch))
                    SEQ.remove(n);
            }

If, following this, I dump the remaining active matches we can watch them progress. At this point, though, we have not interpreted the matching status so as to decide whether or not the stream can be altered.

            // dump remaining active matches
            Iterator i = SEQ.iterator();
            while (i.hasNext()) {
                match m = (match) i.next();
                System.out.printf(" Start: %d Ptr: %d Len: %d\n", m.start, m.ptr, m.len);
            }
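
The match class itself is not listed in this part of the write-up. As a rough idea of what `check()` has to do, here is a hypothetical stand-in. The name `MatchSketch`, the constructor arguments and the modulo wrap are assumptions for illustration; only the `start`, `ptr` and `len` fields are taken from the dump code above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the match class, which is not shown here.
class MatchSketch {
    final byte[] data; // the sliding window being matched against
    final int start;   // window position where the earlier sequence begins
    int ptr;           // window position the next input byte is compared with
    int len;           // number of bytes matched so far

    // Created when a newly queued byte equals the byte at window position start.
    MatchSketch(byte[] data, int start) {
        this.data = data;
        this.start = start;
        this.ptr = (start + 1) % data.length; // the first byte already matched
        this.len = 1;
    }

    // Returns true and extends the match if ch equals the next window byte.
    boolean check(int ch) {
        if ((data[ptr] & 0xff) != (ch & 0xff))
            return false;
        ptr = (ptr + 1) % data.length;
        len++;
        return true;
    }

    public static void main(String[] args) {
        byte[] window = new byte[16];
        byte[] text = "Blah bl".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(text, 0, window, 0, text.length);

        // the 'l' queued at position 6 starts a match against the 'l' at position 1
        MatchSketch m = new MatchSketch(window, 1);
        m.check('a');
        System.out.printf(" Start: %d Ptr: %d Len: %d%n", m.start, m.ptr, m.len);
        // prints " Start: 1 Ptr: 3 Len: 2"
        m.check('h');
        System.out.printf(" Start: %d Ptr: %d Len: %d%n", m.start, m.ptr, m.len);
        // prints " Start: 1 Ptr: 4 Len: 3"
        System.out.println(m.check('!')); // prints false, the match ends here
    }
}
```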

So, reviewing the paper An Explanation of the DEFLATE Algorithm from 1997:

Antaeus Feldspar wrote:

LZ77 compression

LZ77 compression works by finding sequences of data that are repeated. The term “sliding window” is used; all it really means is that at any given point in the data, there is a record of what characters went before. A 32K sliding window means that the compressor (and decompressor) have a record of what the last 32768 (32 * 1024) characters were. When the next sequence of characters to be compressed is identical to one that can be found within the sliding window, the sequence of characters is replaced by two numbers: a distance, representing how far back into the window the sequence starts, and a length, representing the number of characters for which the sequence is identical.

I realize this is a lot easier to see than to just be told. Let’s look at some highly compressible data:

        Blah blah blah blah blah!

Our datastream starts by receiving the following characters: “B,” “l,” “a,” “h,” ” ,” and “b.” However, look at the next five characters:

         vvvvv
        Blah blah blah blah blah!
              ^^^^^

There is an exact match for those five characters in the characters that have already gone into the datastream, and it starts exactly five characters behind the point where we are now. This being the case, we can output special characters to the stream that represent a number for length, and a number for distance.

The data so far:

	Blah blah b

The compressed form of the data so far:

	Blah b[D=5,L=5]

The compression can still be increased, though to take full advantage of it requires a bit of cleverness on the part of the compressor. Look at the two strings that we decided were identical. Compare the character that follows each of them. In both cases, it’s “l” — so we can make the length 6, and not just five. But if we continue checking, we find the next characters, and the next characters, and the next characters, are still identical — even if the so-called ‘previous’ string is overlapping the string we’re trying to represent in the compressed data!

It turns out that the 18 characters that start at the second character are identical to the 18 characters that start at the seventh character. It’s true that when we’re decompressing, and read the length, distance pair that describes this relationship, we don’t know what all those 18 characters will be yet — but if we put in place the ones that we know, we will know more, which will allow us to put down more… or, knowing that any length-and-distance pair where length > distance is going to be repeating (distance) characters again and again, we can set up the decompressor to do just that.

It turns out our highly compressible data can be compressed down to just this:

	Blah b[D=5, L=18]!

So if I feed this exact stream to what we have so far we can observe the sequencing:

bruce_dev /> echo Blah blah blah blah blah! > blah.dat

bruce_dev /> cat blah.dat
Blah blah blah blah blah!

bruce_dev /> jtest blah.dat                           
New byte[0]: 0x42
New byte[1]: 0x6c
New byte[2]: 0x61
New byte[3]: 0x68
New byte[4]: 0x20
New byte[5]: 0x62
New byte[6]: 0x6c
New byte[7]: 0x61
 Start: 1 Ptr: 3 Len: 2
New byte[8]: 0x68
 Start: 1 Ptr: 4 Len: 3
 Start: 2 Ptr: 4 Len: 2
New byte[9]: 0x20
 Start: 1 Ptr: 5 Len: 4
 Start: 2 Ptr: 5 Len: 3
 Start: 3 Ptr: 5 Len: 2
New byte[10]: 0x62
 Start: 1 Ptr: 6 Len: 5
 Start: 2 Ptr: 6 Len: 4
 Start: 3 Ptr: 6 Len: 3
 Start: 4 Ptr: 6 Len: 2
New byte[11]: 0x6c
 Start: 1 Ptr: 7 Len: 6
 Start: 2 Ptr: 7 Len: 5
 Start: 3 Ptr: 7 Len: 4
 Start: 4 Ptr: 7 Len: 3
 Start: 5 Ptr: 7 Len: 2
New byte[12]: 0x61
 Start: 1 Ptr: 8 Len: 7
 Start: 2 Ptr: 8 Len: 6
 Start: 3 Ptr: 8 Len: 5
 Start: 4 Ptr: 8 Len: 4
 Start: 5 Ptr: 8 Len: 3
 Start: 6 Ptr: 8 Len: 2
 Start: 1 Ptr: 3 Len: 2
New byte[13]: 0x68
 Start: 1 Ptr: 9 Len: 8
 Start: 2 Ptr: 9 Len: 7
 Start: 3 Ptr: 9 Len: 6
 Start: 4 Ptr: 9 Len: 5
 Start: 5 Ptr: 9 Len: 4
 Start: 6 Ptr: 9 Len: 3
 Start: 1 Ptr: 4 Len: 3
 Start: 7 Ptr: 9 Len: 2
 Start: 2 Ptr: 4 Len: 2
New byte[14]: 0x20
 Start: 1 Ptr: 10 Len: 9
 Start: 2 Ptr: 10 Len: 8
 Start: 3 Ptr: 10 Len: 7
 Start: 4 Ptr: 10 Len: 6
 Start: 5 Ptr: 10 Len: 5
 Start: 6 Ptr: 10 Len: 4
 Start: 1 Ptr: 5 Len: 4
 Start: 7 Ptr: 10 Len: 3
 Start: 2 Ptr: 5 Len: 3
 Start: 8 Ptr: 10 Len: 2
 Start: 3 Ptr: 5 Len: 2
New byte[15]: 0x62
 Start: 1 Ptr: 11 Len: 10
 Start: 2 Ptr: 11 Len: 9
 Start: 3 Ptr: 11 Len: 8
 Start: 4 Ptr: 11 Len: 7
 Start: 5 Ptr: 11 Len: 6
 Start: 6 Ptr: 11 Len: 5
 Start: 1 Ptr: 6 Len: 5
 Start: 7 Ptr: 11 Len: 4
 Start: 2 Ptr: 6 Len: 4
 Start: 8 Ptr: 11 Len: 3
 Start: 3 Ptr: 6 Len: 3
 Start: 9 Ptr: 11 Len: 2
 Start: 4 Ptr: 6 Len: 2
New byte[16]: 0x6c
 Start: 1 Ptr: 12 Len: 11
 Start: 2 Ptr: 12 Len: 10
 Start: 3 Ptr: 12 Len: 9
 Start: 4 Ptr: 12 Len: 8
 Start: 5 Ptr: 12 Len: 7
 Start: 6 Ptr: 12 Len: 6
 Start: 1 Ptr: 7 Len: 6
 Start: 7 Ptr: 12 Len: 5
 Start: 2 Ptr: 7 Len: 5
 Start: 8 Ptr: 12 Len: 4
 Start: 3 Ptr: 7 Len: 4
 Start: 9 Ptr: 12 Len: 3
 Start: 4 Ptr: 7 Len: 3
 Start: 10 Ptr: 12 Len: 2
 Start: 5 Ptr: 7 Len: 2
New byte[17]: 0x61
 Start: 1 Ptr: 13 Len: 12
 Start: 2 Ptr: 13 Len: 11
 Start: 3 Ptr: 13 Len: 10
 Start: 4 Ptr: 13 Len: 9
 Start: 5 Ptr: 13 Len: 8
 Start: 6 Ptr: 13 Len: 7
 Start: 1 Ptr: 8 Len: 7
 Start: 7 Ptr: 13 Len: 6
 Start: 2 Ptr: 8 Len: 6
 Start: 8 Ptr: 13 Len: 5
 Start: 3 Ptr: 8 Len: 5
 Start: 9 Ptr: 13 Len: 4
 Start: 4 Ptr: 8 Len: 4
 Start: 10 Ptr: 13 Len: 3
 Start: 5 Ptr: 8 Len: 3
 Start: 11 Ptr: 13 Len: 2
 Start: 6 Ptr: 8 Len: 2
 Start: 1 Ptr: 3 Len: 2
New byte[18]: 0x68
 Start: 1 Ptr: 14 Len: 13
 Start: 2 Ptr: 14 Len: 12
 Start: 3 Ptr: 14 Len: 11
 Start: 4 Ptr: 14 Len: 10
 Start: 5 Ptr: 14 Len: 9
 Start: 6 Ptr: 14 Len: 8
 Start: 1 Ptr: 9 Len: 8
 Start: 7 Ptr: 14 Len: 7
 Start: 2 Ptr: 9 Len: 7
 Start: 8 Ptr: 14 Len: 6
 Start: 3 Ptr: 9 Len: 6
 Start: 9 Ptr: 14 Len: 5
 Start: 4 Ptr: 9 Len: 5
 Start: 10 Ptr: 14 Len: 4
 Start: 5 Ptr: 9 Len: 4
 Start: 11 Ptr: 14 Len: 3
 Start: 6 Ptr: 9 Len: 3
 Start: 1 Ptr: 4 Len: 3
 Start: 12 Ptr: 14 Len: 2
 Start: 7 Ptr: 9 Len: 2
 Start: 2 Ptr: 4 Len: 2
New byte[19]: 0x20
 Start: 1 Ptr: 15 Len: 14
 Start: 2 Ptr: 15 Len: 13
 Start: 3 Ptr: 15 Len: 12
 Start: 4 Ptr: 15 Len: 11
 Start: 5 Ptr: 15 Len: 10
 Start: 6 Ptr: 15 Len: 9
 Start: 1 Ptr: 10 Len: 9
 Start: 7 Ptr: 15 Len: 8
 Start: 2 Ptr: 10 Len: 8
 Start: 8 Ptr: 15 Len: 7
 Start: 3 Ptr: 10 Len: 7
 Start: 9 Ptr: 15 Len: 6
 Start: 4 Ptr: 10 Len: 6
 Start: 10 Ptr: 15 Len: 5
 Start: 5 Ptr: 10 Len: 5
 Start: 11 Ptr: 15 Len: 4
 Start: 6 Ptr: 10 Len: 4
 Start: 1 Ptr: 5 Len: 4
 Start: 12 Ptr: 15 Len: 3
 Start: 7 Ptr: 10 Len: 3
 Start: 2 Ptr: 5 Len: 3
 Start: 13 Ptr: 15 Len: 2
 Start: 8 Ptr: 10 Len: 2
 Start: 3 Ptr: 5 Len: 2
New byte[20]: 0x62
 Start: 1 Ptr: 16 Len: 15
 Start: 2 Ptr: 16 Len: 14
 Start: 3 Ptr: 16 Len: 13
 Start: 4 Ptr: 16 Len: 12
 Start: 5 Ptr: 16 Len: 11
 Start: 6 Ptr: 16 Len: 10
 Start: 1 Ptr: 11 Len: 10
 Start: 7 Ptr: 16 Len: 9
 Start: 2 Ptr: 11 Len: 9
 Start: 8 Ptr: 16 Len: 8
 Start: 3 Ptr: 11 Len: 8
 Start: 9 Ptr: 16 Len: 7
 Start: 4 Ptr: 11 Len: 7
 Start: 10 Ptr: 16 Len: 6
 Start: 5 Ptr: 11 Len: 6
 Start: 11 Ptr: 16 Len: 5
 Start: 6 Ptr: 11 Len: 5
 Start: 1 Ptr: 6 Len: 5
 Start: 12 Ptr: 16 Len: 4
 Start: 7 Ptr: 11 Len: 4
 Start: 2 Ptr: 6 Len: 4
 Start: 13 Ptr: 16 Len: 3
 Start: 8 Ptr: 11 Len: 3
 Start: 3 Ptr: 6 Len: 3
 Start: 14 Ptr: 16 Len: 2
 Start: 9 Ptr: 11 Len: 2
 Start: 4 Ptr: 6 Len: 2
New byte[21]: 0x6c
 Start: 1 Ptr: 17 Len: 16
 Start: 2 Ptr: 17 Len: 15
 Start: 3 Ptr: 17 Len: 14
 Start: 4 Ptr: 17 Len: 13
 Start: 5 Ptr: 17 Len: 12
 Start: 6 Ptr: 17 Len: 11
 Start: 1 Ptr: 12 Len: 11
 Start: 7 Ptr: 17 Len: 10
 Start: 2 Ptr: 12 Len: 10
 Start: 8 Ptr: 17 Len: 9
 Start: 3 Ptr: 12 Len: 9
 Start: 9 Ptr: 17 Len: 8
 Start: 4 Ptr: 12 Len: 8
 Start: 10 Ptr: 17 Len: 7
 Start: 5 Ptr: 12 Len: 7
 Start: 11 Ptr: 17 Len: 6
 Start: 6 Ptr: 12 Len: 6
 Start: 1 Ptr: 7 Len: 6
 Start: 12 Ptr: 17 Len: 5
 Start: 7 Ptr: 12 Len: 5
 Start: 2 Ptr: 7 Len: 5
 Start: 13 Ptr: 17 Len: 4
 Start: 8 Ptr: 12 Len: 4
 Start: 3 Ptr: 7 Len: 4
 Start: 14 Ptr: 17 Len: 3
 Start: 9 Ptr: 12 Len: 3
 Start: 4 Ptr: 7 Len: 3
 Start: 15 Ptr: 17 Len: 2
 Start: 10 Ptr: 12 Len: 2
 Start: 5 Ptr: 7 Len: 2
New byte[22]: 0x61
 Start: 1 Ptr: 18 Len: 17
 Start: 2 Ptr: 18 Len: 16
 Start: 3 Ptr: 18 Len: 15
 Start: 4 Ptr: 18 Len: 14
 Start: 5 Ptr: 18 Len: 13
 Start: 6 Ptr: 18 Len: 12
 Start: 1 Ptr: 13 Len: 12
 Start: 7 Ptr: 18 Len: 11
 Start: 2 Ptr: 13 Len: 11
 Start: 8 Ptr: 18 Len: 10
 Start: 3 Ptr: 13 Len: 10
 Start: 9 Ptr: 18 Len: 9
 Start: 4 Ptr: 13 Len: 9
 Start: 10 Ptr: 18 Len: 8
 Start: 5 Ptr: 13 Len: 8
 Start: 11 Ptr: 18 Len: 7
 Start: 6 Ptr: 13 Len: 7
 Start: 1 Ptr: 8 Len: 7
 Start: 12 Ptr: 18 Len: 6
 Start: 7 Ptr: 13 Len: 6
 Start: 2 Ptr: 8 Len: 6
 Start: 13 Ptr: 18 Len: 5
 Start: 8 Ptr: 13 Len: 5
 Start: 3 Ptr: 8 Len: 5
 Start: 14 Ptr: 18 Len: 4
 Start: 9 Ptr: 13 Len: 4
 Start: 4 Ptr: 8 Len: 4
 Start: 15 Ptr: 18 Len: 3
 Start: 10 Ptr: 13 Len: 3
 Start: 5 Ptr: 8 Len: 3
 Start: 16 Ptr: 18 Len: 2
 Start: 11 Ptr: 13 Len: 2
 Start: 6 Ptr: 8 Len: 2
 Start: 1 Ptr: 3 Len: 2
New byte[23]: 0x68
 Start: 1 Ptr: 19 Len: 18
 Start: 2 Ptr: 19 Len: 17
 Start: 3 Ptr: 19 Len: 16
 Start: 4 Ptr: 19 Len: 15
 Start: 5 Ptr: 19 Len: 14
 Start: 6 Ptr: 19 Len: 13
 Start: 1 Ptr: 14 Len: 13
 Start: 7 Ptr: 19 Len: 12
 Start: 2 Ptr: 14 Len: 12
 Start: 8 Ptr: 19 Len: 11
 Start: 3 Ptr: 14 Len: 11
 Start: 9 Ptr: 19 Len: 10
 Start: 4 Ptr: 14 Len: 10
 Start: 10 Ptr: 19 Len: 9
 Start: 5 Ptr: 14 Len: 9
 Start: 11 Ptr: 19 Len: 8
 Start: 6 Ptr: 14 Len: 8
 Start: 1 Ptr: 9 Len: 8
 Start: 12 Ptr: 19 Len: 7
 Start: 7 Ptr: 14 Len: 7
 Start: 2 Ptr: 9 Len: 7
 Start: 13 Ptr: 19 Len: 6
 Start: 8 Ptr: 14 Len: 6
 Start: 3 Ptr: 9 Len: 6
 Start: 14 Ptr: 19 Len: 5
 Start: 9 Ptr: 14 Len: 5
 Start: 4 Ptr: 9 Len: 5
 Start: 15 Ptr: 19 Len: 4
 Start: 10 Ptr: 14 Len: 4
 Start: 5 Ptr: 9 Len: 4
 Start: 16 Ptr: 19 Len: 3
 Start: 11 Ptr: 14 Len: 3
 Start: 6 Ptr: 9 Len: 3
 Start: 1 Ptr: 4 Len: 3
 Start: 17 Ptr: 19 Len: 2
 Start: 12 Ptr: 14 Len: 2
 Start: 7 Ptr: 9 Len: 2
 Start: 2 Ptr: 4 Len: 2
New byte[24]: 0x21
New byte[25]: 0x0d
New byte[26]: 0x0a
Processing 2.494 seconds.
Source 27 bytes.
Result 27 bytes.
Ratio 0.00%

bruce_dev /> 

Now the ‘D’ in the paper refers to the distance between our CURPTR at the point when a match is started and the matched sequence position. If you persevere through the above you can verify that D=5 would be correct and that our best match ran through the length of 18.
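
As a quick check of the D=5, L=18 reference, here is a small sketch of the decompression side described in the paper. It is not part of the compressor; the class name `OverlapCopySketch` and the use of a StringBuilder are just conveniences for this example. Copying one byte at a time is what lets a length greater than the distance work, since the copy picks up bytes it has only just written.

```java
// Sketch only: expands one length/distance pair the way a decompressor would.
class OverlapCopySketch {
    // Append `length` characters, each copied from `distance` characters back.
    // The source position can run into characters written by this same copy.
    static void copy(StringBuilder out, int distance, int length) {
        for (int i = 0; i < length; i++)
            out.append(out.charAt(out.length() - distance));
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder("Blah b"); // literal bytes
        copy(out, 5, 18);                                // [D=5, L=18]
        out.append('!');                                 // literal byte
        System.out.println(out); // prints Blah blah blah blah blah!
    }
}
```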

So we now need the logic controlling the advancement of CURPTR and what then to output to the compressed stream.

So the strategy at this point is to process all active matches. We don’t move the CURPTR when there is at least one potential match still in the works. When a data byte is received that does not extend a match we remove that match from the array. We keep track of the best of the matches that terminate as that would be a candidate for using a reference pointer if none remain active.

    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        // process uncompressed stream
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            System.out.print((char)ch);
            
            // process active match objects
            boolean bActive = false;
            match best = null;
            for (int n = SEQ.size() - 1; 0 <= n; n--) {
                match m = SEQ.get(n);
                if (!m.check(ch)) {
                    if (m.curptr == CURPTR && best == null && m.len >= 3)
                        best = m;
                    SEQ.remove(n);
                }
                else if (m.curptr == CURPTR)
                    bActive = true;
            }
            

From the above bActive will be true if there remain potentially longer sequences. In that case we will not do anything with CURPTR or output anything. We just eventually move on to process the next byte from the uncompressed input stream.

If bActive is false and best remains null then the byte at CURPTR will be output and CURPTR advanced.

Otherwise we have a match to something in the sliding window and that can replace the uncompressed sequence.

            // If there is no active sequence then we need to generate some output
            if (!bActive) {
                
                // If there's been no match then we output data as is
                if (best == null) {
                    while (CURPTR != INPTR) {
                        outfile.write(DATA[CURPTR] & 0xff);
                        CURPTR++;
                        if (CURPTR == WINDOW)
                            CURPTR = 0;
 
                        int n;
                        for (n = SEQ.size() - 1; 0 <= n; n--) {
                            match m = SEQ.get(n);
                            if (m.curptr == CURPTR)
                                break;
                        }
                        if (0 <= n)
                            break;
                    }
                }
                
                // otherwise we can substitute
                else {
                    int distance = best.curptr - best.start;
                    if (distance < 0)
                        distance += WINDOW;
                    String msg = String.format("[D=%d, L=%d]", distance, best.len);
                    outfile.write(msg);

                    // flush active matches
                    int len = best.len;
                    while (len-- > 0) {
                        CURPTR++;
                        if (CURPTR == WINDOW)
                            CURPTR = 0;
 
                        // remove overlapped active sequences
                        for (int n = SEQ.size() - 1; 0 <= n; n--) {
                            match m = SEQ.get(n);
                            if (m.curptr == CURPTR)
                                SEQ.remove(n);
                        }
                    }
                    
                }
            }  

In the above when we are outputting data uncompressed from CURPTR we continue until we encounter an active match. When we replace a sequence we flush any active matches involving the data replaced. Those matches (which potentially could be more beneficial) are no longer valid. Note that for debugging I am outputting the [D= L=] format so I can see what is being replaced and how more clearly.

Later the best match would also be the one with the lowest distance. That saves bits and takes advantage of the Huffman Coding which we have yet to implement.

The output using the above logic and debugging looks like this for the blah blah test data.

bruce_dev /> jtest blah.dat
Blah blah blah blah blah!
Processing 0.856 seconds.
Source 27 bytes.
Result 19 bytes.
Ratio 29.63%

bruce_dev /> cat outfile.dat
Blah b[D=5, L=18]!
bruce_dev />
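
For cross-checking this kind of output on small inputs, a brute-force search is handy. To be clear, the sketch below is not the active-match approach used in this prototype. It simply tries every earlier start position at each point, takes the longest match of 3 or more bytes (preferring the lowest distance on a tie) and writes the same [D= L=] debug format. The class name `GreedyLz77Sketch` and its parameters are assumptions for this example.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: a slow brute-force reference, not the article's engine.
class GreedyLz77Sketch {
    static final int MIN_LEN = 3; // shortest match worth replacing

    static String compress(byte[] src, int window) {
        StringBuilder out = new StringBuilder();
        int cur = 0;
        while (cur < src.length) {
            int bestLen = 0;
            int bestDist = 0;
            // nearer starts are tried first, so a tie keeps the lowest distance
            for (int dist = 1; dist <= cur && dist <= window; dist++) {
                int len = 0;
                // the earlier sequence is allowed to overlap the current one
                while (cur + len < src.length && src[cur + len - dist] == src[cur + len])
                    len++;
                if (len > bestLen) {
                    bestLen = len;
                    bestDist = dist;
                }
            }
            if (bestLen >= MIN_LEN) {
                out.append(String.format("[D=%d, L=%d]", bestDist, bestLen));
                cur += bestLen;
            } else {
                out.append((char) (src[cur] & 0xff)); // no useful match, output as is
                cur++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] src = "Blah blah blah blah blah!".getBytes(StandardCharsets.US_ASCII);
        System.out.println(compress(src, 1024)); // prints Blah b[D=5, L=18]!
    }
}
```

This does far more comparisons than the linked-list approach, so it is only useful as a reference on short test data.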

This agrees nicely with the example from the paper. It works with more elaborate data as well. I know that this is a Java prototype, but it seems too slow for larger files even considering the platform. I get the feeling that I am creating far too many new active match objects. Perhaps there is some logic as to what is worth creating and what isn’t.

It turns out that I am likely creating three times as many active match objects as necessary. However, the overhead of detecting the unnecessary ones slows the process far too much. It seems better to create the unnecessary match objects and optimize them out later.

I decided to take a step back and not try to generate the compressed output stream just yet. Instead I want to look at the sequence matches that I detect to see what logic I would need to achieve an optimal compression. There is this lazy match optimization to consider.

You might recall that the approach is that when I retrieve a byte from the uncompressed input stream I process all of the active sequence match objects. If the new character extends a match then it remains active. Otherwise the sequence is complete and we decide whether it is useful or not before removing it from the active match list. A useful match is simply one that is 3 or more bytes in length.

The uncompressed data byte is queued (enters the sliding window) and we create new active sequence matches for every matching byte still in the window. We use a linked list to efficiently locate those characters.

If we simply list those potentially useful matches the code looks like what follows.

    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        // process uncompressed stream
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            //System.out.print((char)ch);
            
            // process active Match objects
            for (int n = SEQ.size() - 1; 0 <= n; n--) {
                Match m = SEQ.get(n);
                if (!m.check(ch)) {
                    if (m.len >= 3) {
                        System.out.printf("I=%04x C=%04x P=%04x, D=%d, L=%d\n", 
                                INPTR, m.curptr, m.start, m.distance, m.len);
                    }
                            
                    SEQ.remove(n);
                }
            }
            
            // queue uncompressed DATA
            int inp = INPTR;
            DATA[INPTR] = (byte)ch;
            
            // Add byte to the head of the appropriate linked list. Note pointers are stored +1 so
            //  as to use 0 as an end of list marker. Lists are bi-directional so we can trim the 
            //  tail when data is dropped from the queue.
            short ptr = HEAD[ch];
            HEAD[ch] = (short)(INPTR + 1);
            FWD[INPTR] = ptr;
            BACK[INPTR] = 0;
            if (ptr != 0)
                BACK[ptr - 1] = (short)(INPTR + 1);
            
            // advance entry pointer
            INPTR++;
            if (INPTR == WINDOW)
                INPTR = 0;
            
            // drop data from queue when full
            if (INPTR == OUTPTR) {
                
                // trim linked list as byte is being dropped
                if (BACK[OUTPTR] == 0)
                    HEAD[DATA[OUTPTR] & 0xff] = 0;
                else
                    FWD[BACK[OUTPTR] - 1] = 0;
 
                // push end of queue
                OUTPTR++;
                if (OUTPTR == WINDOW)
                    OUTPTR = 0;
            }
 
            // create new active matches for all CH in the queue (except last)
            while (ptr != 0) {
                SEQ.add(new Match(inp, ptr - 1));
                ptr = FWD[ptr - 1];
            }
        }
    }

Now we display the match results with pointers in hex so we can locate them in a hex dump. ‘I’ gives the uncompressed input position; ‘C’ shows the input position when the match was created; ‘P’ is the position in the sliding window or queue; ‘D’ is the distance (basically C minus P); and ‘L’ is the length of the match. We get the following for the blah blah data.

bruce_dev /> cat blah.dat -h
00000000  42 6c 61 68 20 62 6c 61  68 20 62 6c 61 68 20 62  Blah.bla h.blah.b
00000010  6c 61 68 20 62 6c 61 68  21 0d 0a                 lah.blah !..

bruce_dev /> jtest blah.dat
I=0018 C=0015 P=0001, D=20, L=3
I=0018 C=0015 P=0006, D=15, L=3
I=0018 C=0015 P=000b, D=10, L=3
I=0018 C=0015 P=0010, D=5, L=3
I=0018 C=0014 P=0005, D=15, L=4
I=0018 C=0014 P=000a, D=10, L=4
I=0018 C=0014 P=000f, D=5, L=4
I=0018 C=0013 P=0004, D=15, L=5
I=0018 C=0013 P=0009, D=10, L=5
I=0018 C=0013 P=000e, D=5, L=5
I=0018 C=0012 P=0003, D=15, L=6
I=0018 C=0012 P=0008, D=10, L=6
I=0018 C=0012 P=000d, D=5, L=6
I=0018 C=0011 P=0002, D=15, L=7
I=0018 C=0011 P=0007, D=10, L=7
I=0018 C=0011 P=000c, D=5, L=7
I=0018 C=0010 P=0001, D=15, L=8
I=0018 C=0010 P=0006, D=10, L=8
I=0018 C=0010 P=000b, D=5, L=8
I=0018 C=000f P=0005, D=10, L=9
I=0018 C=000f P=000a, D=5, L=9
I=0018 C=000e P=0004, D=10, L=10
I=0018 C=000e P=0009, D=5, L=10
I=0018 C=000d P=0003, D=10, L=11
I=0018 C=000d P=0008, D=5, L=11
I=0018 C=000c P=0002, D=10, L=12
I=0018 C=000c P=0007, D=5, L=12
I=0018 C=000b P=0001, D=10, L=13
I=0018 C=000b P=0006, D=5, L=13
I=0018 C=000a P=0005, D=5, L=14
I=0018 C=0009 P=0004, D=5, L=15
I=0018 C=0008 P=0003, D=5, L=16
I=0018 C=0007 P=0002, D=5, L=17
I=0018 C=0006 P=0001, D=5, L=18
Processing 1.063 seconds.
Source 27 bytes.
Result 0 bytes.
Ratio 100.00%

bruce_dev />
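
The D column can be checked by hand from C and P. A minimal sketch of that arithmetic follows, assuming the same wrap correction used earlier (adding WINDOW when the difference goes negative). The class name `DistanceSketch` is made up for this example.

```java
// Sketch only: the distance arithmetic behind the D column.
class DistanceSketch {
    static final int WINDOW = 1024;

    // Distance back from position c (where the match was created) to the
    // matched window position p, allowing for the queue having wrapped.
    static int distance(int c, int p) {
        int d = c - p;
        if (d < 0)
            d += WINDOW;
        return d;
    }

    public static void main(String[] args) {
        System.out.println(distance(0x0015, 0x0001)); // prints 20
        System.out.println(distance(0x0006, 0x0001)); // prints 5
        System.out.println(distance(0x0002, 0x03fe)); // prints 4 (wrapped case)
    }
}
```

The first two calls use C and P values taken from the first and last rows of the listing above.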

You can see from this that we are processing a lot of matches for what we know will be just one replacement. You can see that for each new byte received (at C) we create potential matches for all matching bytes (1 or more P) each with a fixed D. We then have advanced the length L as additional characters extend the match.

Of all these matches completed there will be one best match. That would be the longest match (largest L) and if there is a choice between multiple matches of the same length (L) we would take the one with lowest distance (D).

Now let me add logic to select the best match and only display that one. I am also going to display a marker (“----”) when, after having found at least one match, we reach a point in the input stream where no active matches remain in process. That point would be a good time to generate the compressed data based on the matching sequences we’ve found.

So that section of the code now looks like this.

    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        boolean bFound = false;
 
        // process uncompressed stream
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            //System.out.print((char)ch);
            
            // process active Match objects
            Match best = null;
            for (int n = SEQ.size() - 1; 0 <= n; n--) {
                Match m = SEQ.get(n);
                if (!m.check(ch)) {
                    if (m.len >= 3) {
                        if (best == null)
                            best = m;
                        else if (m.len > best.len)
                            best = m;
                        else if (m.len == best.len && m.distance < best.distance)
                            best = m;
                    }
                            
                    SEQ.remove(n);
                }
            }
            if (best != null) {
                System.out.printf("I=%04x C=%04x P=%04x, D=%d, L=%d\n", 
                        INPTR, best.curptr, best.start, best.distance, best.len);
                bFound = true;
            }
            
            if (bFound && SEQ.size() == 0) {
                System.out.println("----");
                bFound = false;
            }
            
            // queue uncompressed DATA

Processing the blah blah data yields the following, which we have seen is the replacement that we are hoping for.

bruce_dev /> cat blah.dat -h
00000000  42 6c 61 68 20 62 6c 61  68 20 62 6c 61 68 20 62  Blah.bla h.blah.b
00000010  6c 61 68 20 62 6c 61 68  21 0d 0a                 lah.blah !..

bruce_dev /> jtest blah.dat 
I=0018 C=0006 P=0001, D=5, L=18
----
Processing 0.819 seconds.
Source 27 bytes.
Result 0 bytes.
Ratio 100.00%

bruce_dev /> 

What about some more involved input data?

bruce_dev /> cat jniorboot.log -h
00000000  30 31 2f 30 34 2f 31 38  20 30 38 3a 31 39 3a 31  01/04/18 .08:19:1
00000010  38 2e 31 31 31 2c 20 2a  2a 20 4f 53 20 43 52 43  8.111,.* *.OS.CRC
00000020  20 64 65 74 61 69 6c 20  75 70 64 61 74 65 64 0d  .detail. updated.
00000030  0a 30 31 2f 30 34 2f 31  38 20 30 38 3a 31 39 3a  .01/04/1 8.08:19:
00000040  31 38 2e 31 35 38 2c 20  2d 2d 20 4d 6f 64 65 6c  18.158,. --.Model
00000050  20 34 31 30 20 76 31 2e  36 2e 33 20 2d 20 4a 41  .410.v1. 6.3.-.JA
00000060  4e 4f 53 20 53 65 72 69  65 73 20 34 0d 0a 30 31  NOS.Seri es.4..01
00000070  2f 30 34 2f 31 38 20 30  38 3a 31 39 3a 31 38 2e  /04/18.0 8:19:18.
00000080  31 37 38 2c 20 43 6f 70  79 72 69 67 68 74 20 28  178,.Cop yright.(
00000090  63 29 20 32 30 31 32 2d  32 30 31 38 20 49 4e 54  c).2012- 2018.INT
000000A0  45 47 20 50 72 6f 63 65  73 73 20 47 72 6f 75 70  EG.Proce ss.Group
000000B0  2c 20 49 6e 63 2e 2c 20  47 69 62 73 6f 6e 69 61  ,.Inc.,. Gibsonia
000000C0  20 50 41 20 55 53 41 0d  0a 30 31 2f 30 34 2f 31  .PA.USA. .01/04/1
000000D0  38 20 30 38 3a 31 39 3a  31 38 2e 31 39 37 2c 20  8.08:19: 18.197,.
000000E0  4a 41 4e 4f 53 20 77 72  69 74 74 65 6e 20 61 6e  JANOS.wr itten.an
000000F0  64 20 64 65 76 65 6c 6f  70 65 64 20 62 79 20 42  d.develo ped.by.B
00000100  72 75 63 65 20 43 6c 6f  75 74 69 65 72 0d 0a 30  ruce.Clo utier..0
00000110  31 2f 30 34 2f 31 38 20  30 38 3a 31 39 3a 31 38  1/04/18. 08:19:18
00000120  2e 32 31 36 2c 20 53 65  72 69 61 6c 20 4e 75 6d  .216,.Se rial.Num
00000130  62 65 72 3a 20 36 31 34  30 37 30 35 30 30 0d 0a  ber:.614 070500..
00000140  30 31 2f 30 34 2f 31 38  20 30 38 3a 31 39 3a 31  01/04/18 .08:19:1
00000150  38 2e 32 33 36 2c 20 46  69 6c 65 20 53 79 73 74  8.236,.F ile.Syst
00000160  65 6d 20 6d 6f 75 6e 74  65 64 0d 0a 30 31 2f 30  em.mount ed..01/0
00000170  34 2f 31 38 20 30 38 3a  31 39 3a 31 38 2e 32 35  4/18.08: 19:18.25
00000180  37 2c 20 52 65 67 69 73  74 72 79 20 6d 6f 75 6e  7,.Regis try.moun
00000190  74 65 64 0d 0a 30 31 2f  30 34 2f 31 38 20 30 38  ted..01/ 04/18.08
000001A0  3a 31 39 3a 31 38 2e 33  30 36 2c 20 4e 65 74 77  :19:18.3 06,.Netw
000001B0  6f 72 6b 20 49 6e 69 74  69 61 6c 69 7a 65 64 0d  ork.Init ialized.
000001C0  0a 30 31 2f 30 34 2f 31  38 20 30 38 3a 31 39 3a  .01/04/1 8.08:19:
000001D0  31 38 2e 33 32 36 2c 20  45 74 68 65 72 6e 65 74  18.326,. Ethernet
000001E0  20 41 64 64 72 65 73 73  3a 20 39 63 3a 38 64 3a  .Address :.9c:8d:
000001F0  31 61 3a 30 30 3a 30 37  3a 65 65 0d 0a 30 31 2f  1a:00:07 :ee..01/
00000200  30 34 2f 31 38 20 30 38  3a 31 39 3a 31 38 2e 34  04/18.08 :19:18.4
00000210  34 37 2c 20 53 65 6e 73  6f 72 20 50 6f 72 74 20  47,.Sens or.Port.
00000220  69 6e 69 74 69 61 6c 69  7a 65 64 0d 0a 30 31 2f  initiali zed..01/
00000230  30 34 2f 31 38 20 30 38  3a 31 39 3a 31 38 2e 35  04/18.08 :19:18.5
00000240  30 32 2c 20 49 2f 4f 20  73 65 72 76 69 63 65 73  02,.I/O. services
00000250  20 69 6e 69 74 69 61 6c  69 7a 65 64 0d 0a 30 31  .initial ized..01
00000260  2f 30 34 2f 31 38 20 30  38 3a 31 39 3a 31 38 2e  /04/18.0 8:19:18.
00000270  35 33 35 2c 20 46 54 50  20 73 65 72 76 65 72 20  535,.FTP .server.
00000280  65 6e 61 62 6c 65 64 20  66 6f 72 20 70 6f 72 74  enabled. for.port
00000290  20 32 31 0d 0a 30 31 2f  30 34 2f 31 38 20 30 38  .21..01/ 04/18.08
000002A0  3a 31 39 3a 31 38 2e 35  35 36 2c 20 50 72 6f 74  :19:18.5 56,.Prot
000002B0  6f 63 6f 6c 20 73 65 72  76 65 72 20 65 6e 61 62  ocol.ser ver.enab
000002C0  6c 65 64 20 66 6f 72 20  70 6f 72 74 20 39 32 30  led.for. port.920
000002D0  30 0d 0a 30 31 2f 30 34  2f 31 38 20 30 38 3a 31  0..01/04 /18.08:1
000002E0  39 3a 31 38 2e 35 38 36  2c 20 57 65 62 53 65 72  9:18.586 ,.WebSer
000002F0  76 65 72 20 65 6e 61 62  6c 65 64 20 66 6f 72 20  ver.enab led.for.
00000300  70 6f 72 74 20 38 30 0d  0a 30 31 2f 30 34 2f 31  port.80. .01/04/1
00000310  38 20 30 38 3a 31 39 3a  31 38 2e 36 30 38 2c 20  8.08:19: 18.608,.
00000320  54 65 6c 6e 65 74 20 73  65 72 76 65 72 20 65 6e  Telnet.s erver.en
00000330  61 62 6c 65 64 20 66 6f  72 20 70 6f 72 74 20 32  abled.fo r.port.2
00000340  33 0d 0a 30 31 2f 30 34  2f 31 38 20 30 38 3a 31  3..01/04 /18.08:1
00000350  39 3a 31 38 2e 36 33 32  2c 20 50 4f 52 3a 20 35  9:18.632 ,.POR:.5
00000360  39 32 36 0d 0a 30 31 2f  30 34 2f 31 38 20 30 38  926..01/ 04/18.08
00000370  3a 31 39 3a 31 38 2e 36  35 33 2c 20 43 75 6d 75  :19:18.6 53,.Cumu
00000380  6c 61 74 69 76 65 20 52  75 6e 74 69 6d 65 3a 20  lative.R untime:.
00000390  38 20 57 65 65 6b 73 20  35 20 44 61 79 73 20 31  8.Weeks. 5.Days.1
000003A0  20 48 6f 75 72 20 32 34  3a 33 32 2e 32 38 31 0d  .Hour.24 :32.281.
000003B0  0a 30 31 2f 30 34 2f 31  38 20 30 38 3a 31 39 3a  .01/04/1 8.08:19:
000003C0  31 38 2e 36 37 38 2c 20  42 6f 6f 74 20 43 6f 6d  18.678,. Boot.Com
000003D0  70 6c 65 74 65 64 20 5b  32 2e 33 20 73 65 63 6f  pleted.[ 2.3.seco
000003E0  6e 64 73 5d 0d 0a                                 nds]..

bruce_dev /> jtest jniorboot.log
I=0044 C=0031 P=0000, D=49, L=19
----
I=0064 C=0061 P=001a, D=71, L=3
----
I=0081 C=006c P=002f, D=61, L=21
----
I=0085 C=0082 P=0045, D=61, L=3
----
I=009b C=0098 P=0093, D=5, L=3
I=009d C=009a P=0074, D=38, L=3
----
I=00d2 C=00cf P=009a, D=53, L=3
I=00dc C=00c7 P=006c, D=91, L=21
----
I=00e6 C=00df P=005d, D=130, L=7
----
I=00f4 C=00f1 P=0020, D=209, L=3
----
I=0118 C=0115 P=009a, D=123, L=3
I=0121 C=010d P=00c7, D=70, L=20
----
I=012a C=0125 P=0063, D=194, L=5
----
I=0149 C=0146 P=009a, D=172, L=3
I=0152 C=013e P=00c7, D=119, L=20
I=0153 C=013e P=010d, D=49, L=21
----
I=0157 C=0154 P=0123, D=49, L=3
----
I=0175 C=0172 P=009a, D=216, L=3
I=017e C=0167 P=002c, D=315, L=23
I=017f C=016a P=013e, D=44, L=21
----
I=0183 C=0180 P=00dd, D=163, L=3
----
I=019e C=019b P=009a, D=257, L=3
I=01a7 C=018b P=0162, D=41, L=28
----
I=01ac C=01a9 P=0154, D=85, L=3
----
I=01b6 C=01b3 P=00b1, D=258, L=3
I=01bb C=01b8 P=0129, D=143, L=3
----
I=01ca C=01c7 P=009a, D=301, L=3
I=01d3 C=01bd P=0168, D=85, L=22
I=01d4 C=01bd P=0191, D=44, L=23
----
I=01d8 C=01d5 P=01a9, D=44, L=3
----
I=01e8 C=01e5 P=00a7, D=318, L=3
----
I=0206 C=0203 P=009a, D=361, L=3
I=020f C=01fb P=01bf, D=60, L=20
----
I=0214 C=0211 P=0180, D=145, L=3
I=0216 C=0212 P=0124, D=238, L=4
----
I=0227 C=0224 P=0129, D=251, L=3
I=0236 C=0233 P=009a, D=409, L=3
I=023f C=0221 P=01b5, D=108, L=30
----
I=0245 C=0242 P=00b0, D=402, L=3
----
I=0250 C=024d P=00a6, D=423, L=3
I=0251 C=024e P=0068, D=486, L=3
I=0258 C=0255 P=0129, D=300, L=3
I=0267 C=0264 P=009a, D=458, L=3
I=0270 C=0252 P=01b5, D=157, L=30
I=0271 C=0250 P=021f, D=49, L=33
----
I=0276 C=0273 P=0155, D=286, L=3
----
I=027d C=0278 P=0247, D=49, L=5
----
I=0288 C=0285 P=00f9, D=396, L=3
----
I=028c C=0289 P=0218, D=113, L=3
----
I=0291 C=028d P=021c, D=113, L=4
----
I=029e C=029b P=009a, D=513, L=3
I=02a7 C=0293 P=01fb, D=152, L=20
I=02a8 C=0293 P=025c, D=55, L=21
----
I=02ac C=02a9 P=01d5, D=212, L=3
I=02af C=02ab P=00a2, D=521, L=4
----
I=02b9 C=02b4 P=0247, D=109, L=5
I=02c4 C=02c1 P=00f9, D=456, L=3
I=02c8 C=02c5 P=0218, D=173, L=3
I=02cd C=02b4 P=0278, D=60, L=25
----
I=02dc C=02d9 P=009a, D=575, L=3
I=02e5 C=02cf P=013c, D=403, L=22
I=02e6 C=02d1 P=0293, D=62, L=21
----
I=02ea C=02e7 P=02a9, D=62, L=3
----
I=02f0 C=02ed P=0126, D=455, L=3
I=02f1 C=02ee P=0249, D=165, L=3
I=02fc C=02f9 P=00f9, D=512, L=3
I=0300 C=02fd P=0218, D=229, L=3
I=0305 C=02ee P=02b6, D=56, L=23
----
I=0312 C=030f P=009a, D=629, L=3
I=031b C=0306 P=02d0, D=54, L=21
----
I=0320 C=031d P=0082, D=667, L=3
----
I=0327 C=0323 P=01dd, D=326, L=4
I=032b C=0326 P=0247, D=223, L=5
I=0336 C=0333 P=00f9, D=570, L=3
I=033a C=0337 P=0218, D=287, L=3
I=033f C=0326 P=02b4, D=114, L=25
I=0340 C=0326 P=0278, D=174, L=26
----
I=034c C=0349 P=009a, D=687, L=3
I=0355 C=0341 P=02d1, D=112, L=20
I=0356 C=0341 P=0307, D=58, L=21
----
I=035a C=0357 P=0241, D=278, L=3
I=035b C=0358 P=02aa, D=174, L=3
----
I=036e C=036b P=009a, D=721, L=3
I=0377 C=0363 P=02d1, D=146, L=20
I=0378 C=0363 P=0341, D=34, L=21
----
I=037d C=037a P=0083, D=759, L=3
----
I=038b C=0388 P=018e, D=506, L=3
----
I=0394 C=0391 P=02e9, D=168, L=3
----
I=03ba C=03b7 P=009a, D=797, L=3
I=03c3 C=03ae P=0292, D=284, L=21
I=03c4 C=03af P=0363, D=76, L=21
----
I=03c8 C=03c4 P=0081, D=835, L=4
----
I=03cf C=03cc P=0084, D=840, L=3
----
I=03d6 C=03d3 P=0190, D=579, L=3
I=03d7 C=03d4 P=0333, D=161, L=3
----
I=03dc C=03d9 P=0059, D=896, L=3
I=03de C=03db P=0326, D=181, L=3
----
Processing 77.371 seconds.
Source 998 bytes.
Result 0 bytes.
Ratio 100.00%

bruce_dev /> 

Don’t let the execution times scare you. Remember we are just bread-boarding this in Java and this is running on the JNIOR after all.

At those points where we would generate compressed output and there is only one match, our task is obvious. But what about when there is more than one?

Look closely: these matches overlap! That means that if we had acted on the first one, we might have missed the opportunity to use one that follows, which sometimes would be a serious improvement. This is the benefit of performing the lazy matches.

There are some cases where a block has several usable sequences. We now need logic to select one or more of those sequences so as to end up with the absolute minimum length of generated compressed data. It is not simply a matter of taking the longest one, as there could be two or more that do not overlap and that together give the optimum outcome. This requires a little examination…

For example in this case:

I=009b C=0098 P=0093, D=5, L=3
I=009d C=009a P=0074, D=38, L=3
----

If we replace 3 bytes at position 0x98 with [D=5, L=3], that covers 0x98 thru 0x9a and collides with the second match, which starts at 0x9a, so the second match becomes unusable. Since both matches are L=3 and trimming either one would leave fewer than the 3-byte minimum, we really have no choice but to use only one. Here we should select the one with the shorter distance (D=5), as that will likely benefit the most from the Huffman coding yet to come.

The following case is a little more interesting:

I=00d2 C=00cf P=009a, D=53, L=3
I=00dc C=00c7 P=006c, D=91, L=21
----

Here we see that we can replace 21 bytes at 0xc7 with [D=91, L=21], and since this sequence completely contains the other, the shorter one isn’t needed at all. In this case going with the longest match happens to provide the best compression, but that cannot always be the rule. We need to be careful that our algorithm doesn’t just blindly go for the first replacement.

A little further into the file we have this case:

I=0149 C=0146 P=009a, D=172, L=3
I=0152 C=013e P=00c7, D=119, L=20
I=0153 C=013e P=010d, D=49, L=21
----

Here the longest is again the most beneficial as it completely overlaps the other two.

How about this one?

I=0175 C=0172 P=009a, D=216, L=3
I=017e C=0167 P=002c, D=315, L=23
I=017f C=016a P=013e, D=44, L=21
----

The 2nd and 3rd matches both eliminate the usefulness of the 1st. The 2nd replaces 23 bytes from addresses 0x167 thru 0x17d inclusive. The 3rd replaces 21 bytes from 0x16a thru 0x17e inclusive. There is not a complete overlap but since we still have to choose one over the other the longer one is the most beneficial.
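The inclusive ranges quoted above follow from simple arithmetic on C and L. As a minimal sketch (the class and method names are made up for illustration and are not part of the jtest program), the range, overlap and eclipse tests might be written like this:

```java
// Hypothetical helper, not taken from the jtest program.
final class MatchRange {
    // Address of the last byte replaced by a match of len bytes starting at curptr.
    static int lastByte(int curptr, int len) {
        return curptr + len - 1;
    }

    // True when the two matches share at least one byte.
    static boolean overlaps(int c1, int l1, int c2, int l2) {
        return c1 < c2 + l2 && c2 < c1 + l1;
    }

    // True when match 1 lies entirely inside match 2.
    static boolean eclipsedBy(int c1, int l1, int c2, int l2) {
        return c1 >= c2 && c1 + l1 <= c2 + l2;
    }

    public static void main(String[] args) {
        System.out.printf("2nd: %04x thru %04x%n", 0x167, lastByte(0x167, 23));
        System.out.printf("3rd: %04x thru %04x%n", 0x16a, lastByte(0x16a, 21));
        System.out.println("2nd and 3rd overlap: " + overlaps(0x167, 23, 0x16a, 21));
        System.out.println("3rd eclipsed by 2nd: " + eclipsedBy(0x16a, 21, 0x167, 23));
        System.out.println("1st eclipsed by 2nd: " + eclipsedBy(0x172, 3, 0x167, 23));
    }
}
```

With the three matches from this block, the 1st is eclipsed by the 2nd, while the 2nd and 3rd overlap without either containing the other.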

You can see how we might benefit from some careful implementation here. We do have some flexibility to partially implement one or both of the sequences.

Look at this case that occurs further into the file:

I=0250 C=024d P=00a6, D=423, L=3
I=0251 C=024e P=0068, D=486, L=3
I=0258 C=0255 P=0129, D=300, L=3
I=0267 C=0264 P=009a, D=458, L=3
I=0270 C=0252 P=01b5, D=157, L=30
I=0271 C=0250 P=021f, D=49, L=33
----

We should elect to use the first to replace bytes from 0x24d thru 0x24f inclusive with [D=423, L=3]. We then should use the 6th to replace bytes from 0x250 thru 0x270 inclusive with [D=49, L=33]. Thus we replace a total of 36 bytes with two references.
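One textbook way to make this kind of choice mechanically is weighted interval scheduling: give each match a savings value and pick the non-overlapping subset with the largest total. The sketch below is only an illustration of that idea and is not the logic the jtest program ends up using. The class name, the {curptr, len} array layout and the TOKEN_COST of 2 bytes per reference are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: picks non-overlapping matches with the largest total savings.
final class MatchChooser {
    // Assumed cost in bytes of one [D,L] reference (a placeholder, not a measured value).
    static final int TOKEN_COST = 2;

    // Each match is {curptr, len}. Returns the chosen matches in address order.
    static List<int[]> choose(List<int[]> matches) {
        int[][] m = matches.toArray(new int[0][]);
        // Sort by the address just past the end of each match.
        Arrays.sort(m, Comparator.comparingInt((int[] a) -> a[0] + a[1]));
        int n = m.length;
        int[] best = new int[n + 1]; // best[i]: max savings using the first i matches
        int[] prev = new int[n + 1]; // prev[i]: how many earlier matches end before match i starts
        for (int i = 1; i <= n; i++) {
            int start = m[i - 1][0];
            int p = i - 1;
            while (p > 0 && m[p - 1][0] + m[p - 1][1] > start)
                p--;
            prev[i] = p;
            int take = best[p] + m[i - 1][1] - TOKEN_COST;
            best[i] = Math.max(best[i - 1], take);
        }
        // Walk back through the table to recover the selection.
        List<int[]> chosen = new ArrayList<>();
        for (int i = n; i > 0; ) {
            int take = best[prev[i]] + m[i - 1][1] - TOKEN_COST;
            if (take > best[i - 1]) {
                chosen.add(0, m[i - 1]);
                i = prev[i];
            } else {
                i--;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<int[]> block = Arrays.asList(
                new int[]{0x24d, 3}, new int[]{0x24e, 3}, new int[]{0x255, 3},
                new int[]{0x264, 3}, new int[]{0x252, 30}, new int[]{0x250, 33});
        for (int[] match : choose(block))
            System.out.printf("C=%04x L=%d%n", match[0], match[1]);
    }
}
```

For the six matches in this block the selection is the 1st (C=024d, L=3) and the 6th (C=0250, L=33), the same pair picked by hand above. A different TOKEN_COST, or a cost that depends on distance, could change the outcome in other blocks.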

Okay, time to create some logic.

To help visualize I added a little plotting. I found this case as an example of partially applying a match.

I=0327 C=0323 P=01dd, D=326, L=4
I=032b C=0326 P=0247, D=223, L=5
I=0336 C=0333 P=00f9, D=570, L=3
I=033a C=0337 P=0218, D=287, L=3
I=033f C=0326 P=02b4, D=114, L=25
I=0340 C=0326 P=0278, D=174, L=26
0323 - 0340
|--|                          
   |---|                      
                |-|           
                    |-|       
   |-----------------------|  
   |------------------------| 

We can see from this that we can replace the entire range from 0x323 thru 0x33f inclusive using two sequences.

One option is to truncate the 1st, using only 3 of its 4 bytes, and follow it with the 6th sequence in its entirety. Remember that the minimum sequence is 3 bytes so we can do this. The replacement would be [D=326, L=3][D=174, L=26].

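The other option, which we would have to use if the 1st were only 3 bytes, is to use the 1st match in full and then skip the first byte of the 6th. Skipping a byte moves both the current position and the source position forward by one, so the distance stays at 174 and only the length drops. The replacement would then be [D=326, L=4][D=174, L=25].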

Is one preferable to the other? I am not sure. This might come down to how the logic is implemented. This is fun though.
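The two ways of trimming a match can be sketched as follows. This is a simplified model, not the Match class from the jtest program: Ref and its fields are hypothetical names, and the wrap-around at the window size is ignored. The distance is taken as the difference between the two pointers, so skipping a leading byte leaves it unchanged.

```java
// Simplified stand-in for a match; names are hypothetical and window wrap is ignored.
final class TrimDemo {
    static final class Ref {
        final int curptr; // where the replacement applies
        final int ptr;    // where the earlier copy of the data starts
        final int len;    // number of bytes replaced

        Ref(int curptr, int ptr, int len) {
            this.curptr = curptr;
            this.ptr = ptr;
            this.len = len;
        }

        int distance() {
            return curptr - ptr;
        }

        @Override
        public String toString() {
            return "[D=" + distance() + ", L=" + len + "]";
        }
    }

    // Drop n bytes from the end of the match; the pointers stay put.
    static Ref truncateTail(Ref r, int n) {
        return new Ref(r.curptr, r.ptr, r.len - n);
    }

    // Skip n bytes at the front; both pointers advance together.
    static Ref skipHead(Ref r, int n) {
        return new Ref(r.curptr + n, r.ptr + n, r.len - n);
    }

    public static void main(String[] args) {
        Ref first = new Ref(0x323, 0x1dd, 4);
        Ref sixth = new Ref(0x326, 0x278, 26);
        System.out.println("option 1: " + truncateTail(first, 1) + sixth);
        System.out.println("option 2: " + first + skipHead(sixth, 1));
    }
}
```

Both options cover the same 29 bytes with two references, so any preference between them would have to come from how the later coding stage treats the individual length and distance values.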

Here’s the jniorboot.log run with the sequences plotted if you are interested.

bruce_dev /> cat jniorboot.log -h 
00000000  30 31 2f 30 34 2f 31 38  20 30 38 3a 31 39 3a 31  01/04/18 .08:19:1
00000010  38 2e 31 31 31 2c 20 2a  2a 20 4f 53 20 43 52 43  8.111,.* *.OS.CRC
00000020  20 64 65 74 61 69 6c 20  75 70 64 61 74 65 64 0d  .detail. updated.
00000030  0a 30 31 2f 30 34 2f 31  38 20 30 38 3a 31 39 3a  .01/04/1 8.08:19:
00000040  31 38 2e 31 35 38 2c 20  2d 2d 20 4d 6f 64 65 6c  18.158,. --.Model
00000050  20 34 31 30 20 76 31 2e  36 2e 33 20 2d 20 4a 41  .410.v1. 6.3.-.JA
00000060  4e 4f 53 20 53 65 72 69  65 73 20 34 0d 0a 30 31  NOS.Seri es.4..01
00000070  2f 30 34 2f 31 38 20 30  38 3a 31 39 3a 31 38 2e  /04/18.0 8:19:18.
00000080  31 37 38 2c 20 43 6f 70  79 72 69 67 68 74 20 28  178,.Cop yright.(
00000090  63 29 20 32 30 31 32 2d  32 30 31 38 20 49 4e 54  c).2012- 2018.INT
000000A0  45 47 20 50 72 6f 63 65  73 73 20 47 72 6f 75 70  EG.Proce ss.Group
000000B0  2c 20 49 6e 63 2e 2c 20  47 69 62 73 6f 6e 69 61  ,.Inc.,. Gibsonia
000000C0  20 50 41 20 55 53 41 0d  0a 30 31 2f 30 34 2f 31  .PA.USA. .01/04/1
000000D0  38 20 30 38 3a 31 39 3a  31 38 2e 31 39 37 2c 20  8.08:19: 18.197,.
000000E0  4a 41 4e 4f 53 20 77 72  69 74 74 65 6e 20 61 6e  JANOS.wr itten.an
000000F0  64 20 64 65 76 65 6c 6f  70 65 64 20 62 79 20 42  d.develo ped.by.B
00000100  72 75 63 65 20 43 6c 6f  75 74 69 65 72 0d 0a 30  ruce.Clo utier..0
00000110  31 2f 30 34 2f 31 38 20  30 38 3a 31 39 3a 31 38  1/04/18. 08:19:18
00000120  2e 32 31 36 2c 20 53 65  72 69 61 6c 20 4e 75 6d  .216,.Se rial.Num
00000130  62 65 72 3a 20 36 31 34  30 37 30 35 30 30 0d 0a  ber:.614 070500..
00000140  30 31 2f 30 34 2f 31 38  20 30 38 3a 31 39 3a 31  01/04/18 .08:19:1
00000150  38 2e 32 33 36 2c 20 46  69 6c 65 20 53 79 73 74  8.236,.F ile.Syst
00000160  65 6d 20 6d 6f 75 6e 74  65 64 0d 0a 30 31 2f 30  em.mount ed..01/0
00000170  34 2f 31 38 20 30 38 3a  31 39 3a 31 38 2e 32 35  4/18.08: 19:18.25
00000180  37 2c 20 52 65 67 69 73  74 72 79 20 6d 6f 75 6e  7,.Regis try.moun
00000190  74 65 64 0d 0a 30 31 2f  30 34 2f 31 38 20 30 38  ted..01/ 04/18.08
000001A0  3a 31 39 3a 31 38 2e 33  30 36 2c 20 4e 65 74 77  :19:18.3 06,.Netw
000001B0  6f 72 6b 20 49 6e 69 74  69 61 6c 69 7a 65 64 0d  ork.Init ialized.
000001C0  0a 30 31 2f 30 34 2f 31  38 20 30 38 3a 31 39 3a  .01/04/1 8.08:19:
000001D0  31 38 2e 33 32 36 2c 20  45 74 68 65 72 6e 65 74  18.326,. Ethernet
000001E0  20 41 64 64 72 65 73 73  3a 20 39 63 3a 38 64 3a  .Address :.9c:8d:
000001F0  31 61 3a 30 30 3a 30 37  3a 65 65 0d 0a 30 31 2f  1a:00:07 :ee..01/
00000200  30 34 2f 31 38 20 30 38  3a 31 39 3a 31 38 2e 34  04/18.08 :19:18.4
00000210  34 37 2c 20 53 65 6e 73  6f 72 20 50 6f 72 74 20  47,.Sens or.Port.
00000220  69 6e 69 74 69 61 6c 69  7a 65 64 0d 0a 30 31 2f  initiali zed..01/
00000230  30 34 2f 31 38 20 30 38  3a 31 39 3a 31 38 2e 35  04/18.08 :19:18.5
00000240  30 32 2c 20 49 2f 4f 20  73 65 72 76 69 63 65 73  02,.I/O. services
00000250  20 69 6e 69 74 69 61 6c  69 7a 65 64 0d 0a 30 31  .initial ized..01
00000260  2f 30 34 2f 31 38 20 30  38 3a 31 39 3a 31 38 2e  /04/18.0 8:19:18.
00000270  35 33 35 2c 20 46 54 50  20 73 65 72 76 65 72 20  535,.FTP .server.
00000280  65 6e 61 62 6c 65 64 20  66 6f 72 20 70 6f 72 74  enabled. for.port
00000290  20 32 31 0d 0a 30 31 2f  30 34 2f 31 38 20 30 38  .21..01/ 04/18.08
000002A0  3a 31 39 3a 31 38 2e 35  35 36 2c 20 50 72 6f 74  :19:18.5 56,.Prot
000002B0  6f 63 6f 6c 20 73 65 72  76 65 72 20 65 6e 61 62  ocol.ser ver.enab
000002C0  6c 65 64 20 66 6f 72 20  70 6f 72 74 20 39 32 30  led.for. port.920
000002D0  30 0d 0a 30 31 2f 30 34  2f 31 38 20 30 38 3a 31  0..01/04 /18.08:1
000002E0  39 3a 31 38 2e 35 38 36  2c 20 57 65 62 53 65 72  9:18.586 ,.WebSer
000002F0  76 65 72 20 65 6e 61 62  6c 65 64 20 66 6f 72 20  ver.enab led.for.
00000300  70 6f 72 74 20 38 30 0d  0a 30 31 2f 30 34 2f 31  port.80. .01/04/1
00000310  38 20 30 38 3a 31 39 3a  31 38 2e 36 30 38 2c 20  8.08:19: 18.608,.
00000320  54 65 6c 6e 65 74 20 73  65 72 76 65 72 20 65 6e  Telnet.s erver.en
00000330  61 62 6c 65 64 20 66 6f  72 20 70 6f 72 74 20 32  abled.fo r.port.2
00000340  33 0d 0a 30 31 2f 30 34  2f 31 38 20 30 38 3a 31  3..01/04 /18.08:1
00000350  39 3a 31 38 2e 36 33 32  2c 20 50 4f 52 3a 20 35  9:18.632 ,.POR:.5
00000360  39 32 36 0d 0a 30 31 2f  30 34 2f 31 38 20 30 38  926..01/ 04/18.08
00000370  3a 31 39 3a 31 38 2e 36  35 33 2c 20 43 75 6d 75  :19:18.6 53,.Cumu
00000380  6c 61 74 69 76 65 20 52  75 6e 74 69 6d 65 3a 20  lative.R untime:.
00000390  38 20 57 65 65 6b 73 20  35 20 44 61 79 73 20 31  8.Weeks. 5.Days.1
000003A0  20 48 6f 75 72 20 32 34  3a 33 32 2e 32 38 31 0d  .Hour.24 :32.281.
000003B0  0a 30 31 2f 30 34 2f 31  38 20 30 38 3a 31 39 3a  .01/04/1 8.08:19:
000003C0  31 38 2e 36 37 38 2c 20  42 6f 6f 74 20 43 6f 6d  18.678,. Boot.Com
000003D0  70 6c 65 74 65 64 20 5b  32 2e 33 20 73 65 63 6f  pleted.[ 2.3.seco
000003E0  6e 64 73 5d 0d 0a                                 nds]..

bruce_dev />

bruce_dev /> jtest jniorboot.log
I=0044 C=0031 P=0000, D=49, L=19
0031 - 0044
|-----------------| 

I=0064 C=0061 P=001a, D=71, L=3
0061 - 0064
|-| 

I=0081 C=006c P=002f, D=61, L=21
006c - 0081
|-------------------| 

I=0086 C=0082 P=0045, D=61, L=3
0082 - 0085
|-| 

I=009d C=0098 P=0093, D=5, L=3
I=009d C=009a P=0074, D=38, L=3
0098 - 009d
|-|   
  |-| 

I=00dd C=00cf P=009a, D=53, L=3
I=00dd C=00c7 P=006c, D=91, L=21
00c7 - 00dc
        |-|           
|-------------------| 

I=00e6 C=00df P=005d, D=130, L=7
00df - 00e6
|-----| 

I=00f4 C=00f1 P=0020, D=209, L=3
00f1 - 00f4
|-| 

I=0121 C=0115 P=009a, D=123, L=3
I=0121 C=010d P=00c7, D=70, L=20
010d - 0121
        |-|          
|------------------| 

I=012b C=0125 P=0063, D=194, L=5
0125 - 012a
|---| 

I=0153 C=0146 P=009a, D=172, L=3
I=0153 C=013e P=00c7, D=119, L=20
I=0153 C=013e P=010d, D=49, L=21
013e - 0153
        |-|           
|------------------|  
|-------------------| 

I=0157 C=0154 P=0123, D=49, L=3
0154 - 0157
|-| 

I=017f C=0172 P=009a, D=216, L=3
I=017f C=0167 P=002c, D=315, L=23
I=017f C=016a P=013e, D=44, L=21
0167 - 017f
           |-|           
|---------------------|  
   |-------------------| 

I=0183 C=0180 P=00dd, D=163, L=3
0180 - 0183
|-| 

I=01a8 C=019b P=009a, D=257, L=3
I=01a8 C=018b P=0162, D=41, L=28
018b - 01a7
                |-|          
|--------------------------| 

I=01ad C=01a9 P=0154, D=85, L=3
01a9 - 01ac
|-| 

I=01bb C=01b3 P=00b1, D=258, L=3
I=01bb C=01b8 P=0129, D=143, L=3
01b3 - 01bb
|-|      
     |-| 

I=01d4 C=01c7 P=009a, D=301, L=3
I=01d4 C=01bd P=0168, D=85, L=22
I=01d4 C=01bd P=0191, D=44, L=23
01bd - 01d4
          |-|           
|--------------------|  
|---------------------| 

I=01d8 C=01d5 P=01a9, D=44, L=3
01d5 - 01d8
|-| 

I=01e8 C=01e5 P=00a7, D=318, L=3
01e5 - 01e8
|-| 

I=020f C=0203 P=009a, D=361, L=3
I=020f C=01fb P=01bf, D=60, L=20
01fb - 020f
        |-|          
|------------------| 

I=0217 C=0211 P=0180, D=145, L=3
I=0217 C=0212 P=0124, D=238, L=4
0211 - 0216
|-|   
 |--| 

I=023f C=0224 P=0129, D=251, L=3
I=023f C=0233 P=009a, D=409, L=3
I=023f C=0221 P=01b5, D=108, L=30
0221 - 023f
   |-|                         
                  |-|          
|----------------------------| 

I=0245 C=0242 P=00b0, D=402, L=3
0242 - 0245
|-| 

I=0271 C=024d P=00a6, D=423, L=3
I=0271 C=024e P=0068, D=486, L=3
I=0271 C=0255 P=0129, D=300, L=3
I=0271 C=0264 P=009a, D=458, L=3
I=0271 C=0252 P=01b5, D=157, L=30
I=0271 C=0250 P=021f, D=49, L=33
024d - 0271
|-|                                  
 |-|                                 
        |-|                          
                       |-|           
     |----------------------------|  
   |-------------------------------| 

I=0276 C=0273 P=0155, D=286, L=3
0273 - 0276
|-| 

I=0280 C=0278 P=0247, D=49, L=5
0278 - 027d
|---| 

I=0288 C=0285 P=00f9, D=396, L=3
0285 - 0288
|-| 

I=028c C=0289 P=0218, D=113, L=3
0289 - 028c
|-| 

I=0293 C=028d P=021c, D=113, L=4
028d - 0291
|--| 

I=02a8 C=029b P=009a, D=513, L=3
I=02a8 C=0293 P=01fb, D=152, L=20
I=02a8 C=0293 P=025c, D=55, L=21
0293 - 02a8
        |-|           
|------------------|  
|-------------------| 

I=02af C=02a9 P=01d5, D=212, L=3
I=02af C=02ab P=00a2, D=521, L=4
02a9 - 02af
|-|    
  |--| 

I=02ce C=02b4 P=0247, D=109, L=5
I=02ce C=02c1 P=00f9, D=456, L=3
I=02ce C=02c5 P=0218, D=173, L=3
I=02ce C=02b4 P=0278, D=60, L=25
02b4 - 02cd
|---|                     
             |-|          
                 |-|      
|-----------------------| 

I=02e7 C=02d9 P=009a, D=575, L=3
I=02e7 C=02cf P=013c, D=403, L=22
I=02e7 C=02d1 P=0293, D=62, L=21
02cf - 02e6
          |-|           
|--------------------|  
  |-------------------| 

I=02ea C=02e7 P=02a9, D=62, L=3
02e7 - 02ea
|-| 

I=0305 C=02ed P=0126, D=455, L=3
I=0305 C=02ee P=0249, D=165, L=3
I=0305 C=02f9 P=00f9, D=512, L=3
I=0305 C=02fd P=0218, D=229, L=3
I=0305 C=02ee P=02b6, D=56, L=23
02ed - 0305
|-|                      
 |-|                     
            |-|          
                |-|      
 |---------------------| 

I=031c C=030f P=009a, D=629, L=3
I=031c C=0306 P=02d0, D=54, L=21
0306 - 031b
         |-|          
|-------------------| 

I=0320 C=031d P=0082, D=667, L=3
031d - 0320
|-| 

I=0341 C=0323 P=01dd, D=326, L=4
I=0341 C=0326 P=0247, D=223, L=5
I=0341 C=0333 P=00f9, D=570, L=3
I=0341 C=0337 P=0218, D=287, L=3
I=0341 C=0326 P=02b4, D=114, L=25
I=0341 C=0326 P=0278, D=174, L=26
0323 - 0340
|--|                          
   |---|                      
                |-|           
                    |-|       
   |-----------------------|  
   |------------------------| 

I=0356 C=0349 P=009a, D=687, L=3
I=0356 C=0341 P=02d1, D=112, L=20
I=0356 C=0341 P=0307, D=58, L=21
0341 - 0356
        |-|           
|------------------|  
|-------------------| 

I=035b C=0357 P=0241, D=278, L=3
I=035b C=0358 P=02aa, D=174, L=3
0357 - 035b
|-|  
 |-| 

I=0378 C=036b P=009a, D=721, L=3
I=0378 C=0363 P=02d1, D=146, L=20
I=0378 C=0363 P=0341, D=34, L=21
0363 - 0378
        |-|           
|------------------|  
|-------------------| 

I=037d C=037a P=0083, D=759, L=3
037a - 037d
|-| 

I=038c C=0388 P=018e, D=506, L=3
0388 - 038b
|-| 

I=0395 C=0391 P=02e9, D=168, L=3
0391 - 0394
|-| 

I=03c4 C=03b7 P=009a, D=797, L=3
I=03c4 C=03ae P=0292, D=284, L=21
I=03c4 C=03af P=0363, D=76, L=21
03ae - 03c4
         |-|           
|-------------------|  
 |-------------------| 

I=03c9 C=03c4 P=0081, D=835, L=4
03c4 - 03c8
|--| 

I=03cf C=03cc P=0084, D=840, L=3
03cc - 03cf
|-| 

I=03d7 C=03d3 P=0190, D=579, L=3
I=03d7 C=03d4 P=0333, D=161, L=3
03d3 - 03d7
|-|  
 |-| 

I=03de C=03d9 P=0059, D=896, L=3
I=03de C=03db P=0326, D=181, L=3
03d9 - 03de
|-|   
  |-| 

Processing 80.766 seconds.
Source 998 bytes.
Result 0 bytes.
Ratio 100.00%

bruce_dev />

I think the first step is to filter out sequences that are completely covered by another. That seems to happen a lot.
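As a standalone illustration of that first step, here is a minimal sketch that drops every match lying completely inside another one. The class name and the {curptr, len} array layout are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: remove matches that are completely covered by another match.
final class EclipseFilter {
    // Each match is {curptr, len}. The input list is left untouched.
    static List<int[]> filter(List<int[]> matches) {
        List<int[]> kept = new ArrayList<>(matches);
        for (int n = kept.size() - 1; n >= 0; n--) {
            int[] mn = kept.get(n);
            for (int k = 0; k < kept.size(); k++) {
                if (k == n)
                    continue;
                int[] mk = kept.get(k);
                // mn is eclipsed when it starts no earlier and ends no later than mk
                if (mn[0] >= mk[0] && mn[0] + mn[1] <= mk[0] + mk[1]) {
                    kept.remove(n);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<int[]> block = Arrays.asList(
                new int[]{0x24d, 3}, new int[]{0x24e, 3}, new int[]{0x255, 3},
                new int[]{0x264, 3}, new int[]{0x252, 30}, new int[]{0x250, 33});
        for (int[] m : filter(block))
            System.out.printf("C=%04x L=%d%n", m[0], m[1]);
    }
}
```

For the six-match block at 0x24d this leaves the matches at 0x24d, 0x24e and 0x250. If two matches cover exactly the same range, only one of them is removed.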

I have now filtered out matched sequences that are eclipsed by another in the block. Since we know what to do when only one sequence remains (just use it), I also dropped those blocks from the output. That leaves the overlapping situations that still need to be studied.

bruce_dev /> jtest jniorboot.log
I=009d C=0098 P=0093, D=5, L=3
I=009d C=009a P=0074, D=38, L=3
0098 - 009d
|-|   
  |-| 

I=017f C=0167 P=002c, D=315, L=23
I=017f C=016a P=013e, D=44, L=21
0167 - 017f
|---------------------|  
   |-------------------| 

I=01bb C=01b3 P=00b1, D=258, L=3
I=01bb C=01b8 P=0129, D=143, L=3
01b3 - 01bb
|-|      
     |-| 

I=0217 C=0211 P=0180, D=145, L=3
I=0217 C=0212 P=0124, D=238, L=4
0211 - 0216
|-|   
 |--| 

I=0271 C=024d P=00a6, D=423, L=3
I=0271 C=024e P=0068, D=486, L=3
I=0271 C=0250 P=021f, D=49, L=33
024d - 0271
|-|                                  
 |-|                                 
   |-------------------------------| 

I=02af C=02a9 P=01d5, D=212, L=3
I=02af C=02ab P=00a2, D=521, L=4
02a9 - 02af
|-|    
  |--| 

I=02e7 C=02cf P=013c, D=403, L=22
I=02e7 C=02d1 P=0293, D=62, L=21
02cf - 02e6
|--------------------|  
  |-------------------| 

I=0305 C=02ed P=0126, D=455, L=3
I=0305 C=02ee P=02b6, D=56, L=23
02ed - 0305
|-|                      
 |---------------------| 

I=0341 C=0323 P=01dd, D=326, L=4
I=0341 C=0326 P=0278, D=174, L=26
0323 - 0340
|--|                          
   |------------------------| 

I=035b C=0357 P=0241, D=278, L=3
I=035b C=0358 P=02aa, D=174, L=3
0357 - 035b
|-|  
 |-| 

I=03c4 C=03ae P=0292, D=284, L=21
I=03c4 C=03af P=0363, D=76, L=21
03ae - 03c4
|-------------------|  
 |-------------------| 

I=03d7 C=03d3 P=0190, D=579, L=3
I=03d7 C=03d4 P=0333, D=161, L=3
03d3 - 03d7
|-|  
 |-| 

I=03de C=03d9 P=0059, D=896, L=3
I=03de C=03db P=0326, D=181, L=3
03d9 - 03de
|-|   
  |-| 

Processing 78.368 seconds.
Source 998 bytes.
Result 0 bytes.
Ratio 100.00%

bruce_dev />

There is one block (at 0x1b3) where the remaining matches are already mutually exclusive. It is obvious what to do for that one too (use both), but I would still need to identify it. Maybe the goal is to reduce the sequence set to one match and, where that is not possible, to get to a set of mutually exclusive matches.

Added a step to force mutual exclusivity. The logic now appears as follows:

    // Our LZ77 compression engine
    static void do_compress(BufferedWriter outfile, BufferedReader infile) throws Throwable {
        
        boolean bFound = false;
 
        // process uncompressed stream
        while (infile.ready()) {
            
            // obtain next byte
            int ch = infile.read();
            //System.out.print((char)ch);
            
            // process active Match objects
            Match best = null;
            for (int n = SEQ.size() - 1; 0 <= n; n--) {
                Match m = SEQ.get(n);
                if (!m.check(ch)) {
                    if (m.len >= 3) {
                        if (best == null)
                            best = m;
                        else if (m.len > best.len)
                            best = m;
                        else if (m.len == best.len && m.distance < best.distance)
                            best = m;
                    }
                            
                    SEQ.remove(n);
                }
            }
            if (best != null) {
                REPL.add(best);
                bFound = true;
            }
            
            if (bFound && SEQ.size() == 0) {
                
                // filter out sequences eclipsed by another
                for (int n = REPL.size() - 1; 0 <= n; n--) {
                    Match mn = REPL.get(n);
                    
                    int k;
                    for (k = REPL.size() - 1; 0 <= k; k--) {
                        if (k == n)
                            continue;
                        Match mk = REPL.get(k);
                        if (mn.curptr >= mk.curptr && mn.curptr + mn.len <= mk.curptr + mk.len)
                            break;
                    }
                    if (0 <= k)
                        REPL.remove(n);
                }
                
                // Force mutual exclusivity. Note that REPL at this point has matching sequences
                //  with increasing CURPTR.
                for (int n = 0; n < REPL.size() - 1; n++) {
                    Match n1 = REPL.get(n);
                    Match n2 = REPL.get(n + 1);
                    if (n2.curptr < n1.curptr + n1.len) {
                        int adj = n1.curptr + n1.len - n2.curptr;
                        if (n2.len - adj < 3)
                            REPL.remove(n2);
                        else {
                            n2.curptr += adj;
                            if (n2.curptr >= WINDOW)
                                n2.curptr -= WINDOW;
                            n2.ptr += adj;
                            if (n2.ptr >= WINDOW)
                                n2.ptr -= WINDOW;
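                            // NOTE: this line turned out to be a glitch. CURPTR and PTR have
                            //  both moved by adj, so the distance should be left unchanged.
                            n2.distance -= adj;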
                            n2.len -= adj;
                        }
                    }
                }
                
                // $$$$$$$$$$ - temporary only display when there are still choices
                if (REPL.size() > 1) {
 
                    // determine the overall affected range
                    int start = 0;
                    int end = 0;
                    for (int n = 0; n < REPL.size(); n++) {
                        Match m = REPL.get(n);
                        System.out.printf("I=%04x C=%04x P=%04x, D=%d, L=%d\n", 
                                INPTR, m.curptr, m.start, m.distance, m.len);
 
                        if (n == 0 || m.curptr < start)
                            start = m.curptr;
                        if (n == 0 || m.curptr + m.len > end)
                            end = m.curptr + m.len;
                    }
 
                    // plot
                    System.out.printf("%04x - %04x\n", start, end);
 
                    for (int n = 0; n < REPL.size(); n++) {
                        Match m = REPL.get(n);
                        for (int i = start; i <= end; i++) {
                            if (i < m.curptr || i >= m.curptr + m.len)
                                System.out.print(" ");
                            else if (i == m.curptr || i == m.curptr + m.len - 1)
                                System.out.print("|");
                            else
                                System.out.print("-");
                        }
                        System.out.println("");
                    }
                    System.out.println("");
                }
 
                REPL.clear();
                bFound = false;
            }
            
            // queue uncompressed DATA

As matches are located we select the best one (the loop over SEQ) and add it to the REPL list. Later, when we reach the point where there are no more active matches and we can generate compressed output (the bFound && SEQ.size() == 0 test), we filter out the eclipsed sequences. Next we force the remaining sequences to be mutually exclusive.

Since these sequences appear in the REPL list in increasing CURPTR order (at least they appear to be) we take pairs of sequences and shift the starting point of the next one so it no longer overlaps. If this shrinks the match to less than 3 bytes it is removed.

After that there is code to display and plot the remaining sequences if there is more than one. Here we see that in every case we have created a mutually exclusive set.

bruce_dev /> jtest jniorboot.log
I=01bb C=01b3 P=00b1, D=258, L=3
I=01bb C=01b8 P=0129, D=143, L=3
01b3 - 01bb
|-|      
     |-| 

I=0271 C=024d P=00a6, D=423, L=3
I=0271 C=0250 P=021f, D=49, L=33
024d - 0271
|-|                                  
   |-------------------------------| 

I=02af C=02a9 P=01d5, D=212, L=3
I=02af C=02ac P=00a2, D=520, L=3
02a9 - 02af
|-|    
   |-| 

I=0305 C=02ed P=0126, D=455, L=3
I=0305 C=02f0 P=02b6, D=54, L=21
02ed - 0305
|-|                      
   |-------------------| 

I=0341 C=0323 P=01dd, D=326, L=4
I=0341 C=0327 P=0278, D=173, L=25
0323 - 0340
|--|                          
    |-----------------------| 

Processing 77.415 seconds.
Source 998 bytes.
Result 0 bytes.
Ratio 100.00%

bruce_dev /> 
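The overlap-removal step can also be tried on its own. The following is a minimal sketch under a few assumptions: Ref is a hypothetical stand-in for the Match class, the window wrap-around is ignored, and the list is already sorted by increasing CURPTR. It differs from the listing above in two hedged ways: the distance is derived from the two pointers, so it is not reduced when a match is shifted, and after a removal the same first match is checked again against its new neighbor.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of forcing a sorted list of matches to be mutually exclusive.
final class ExclusiveDemo {
    static final int MIN_MATCH = 3; // shortest usable sequence

    static final class Ref {
        int curptr, ptr, len;

        Ref(int curptr, int ptr, int len) {
            this.curptr = curptr;
            this.ptr = ptr;
            this.len = len;
        }

        int distance() {
            return curptr - ptr;
        }
    }

    // refs must be ordered by increasing curptr.
    static void forceExclusive(List<Ref> refs) {
        int n = 0;
        while (n < refs.size() - 1) {
            Ref a = refs.get(n);
            Ref b = refs.get(n + 1);
            int adj = a.curptr + a.len - b.curptr; // bytes of b already covered by a
            if (adj > 0) {
                if (b.len - adj < MIN_MATCH) {
                    refs.remove(n + 1); // too short once trimmed
                    continue;           // compare a with its new neighbor
                }
                b.curptr += adj;        // both pointers advance together,
                b.ptr += adj;           // so distance() is unchanged
                b.len -= adj;
            }
            n++;
        }
    }

    public static void main(String[] args) {
        List<Ref> refs = new ArrayList<>();
        refs.add(new Ref(0x323, 0x1dd, 4));
        refs.add(new Ref(0x326, 0x278, 26));
        forceExclusive(refs);
        for (Ref r : refs)
            System.out.printf("C=%04x D=%d L=%d%n", r.curptr, r.distance(), r.len);
    }
}
```

For the block at 0x323 the second match is shifted to C=0327 with L=25 and its distance remains 174.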

So it would appear now that I have what I need to generate the compressed output stream.

Generating the compressed output using the [D=,L=] format just to make things visible actually enlarges the file of course. But here it is (some line breaks at the right margin manually inserted).

bruce_dev /> cat outfile.dat
01/04/18 08:19:18.111, ** OS CRC detail updated
[D=49,L=19]58, -- Model 410 v1.6.3 - JAN[D=71,L=3]Series 4[D=61,L=21]7[D=61,L=3]Copyright (c) 2012-[D=5,L=3]8
 INTEG Process Group, Inc., Gibsonia PA USA[D=91,L=21]97,[D=130,L=7]written and[D=209,L=3]veloped by Bruce Cl
outier[D=70,L=20]216,[D=194,L=5]al Number: 614070500[D=49,L=21]3[D=49,L=3]File System moun[D=315,L=23]25[D=16
3,L=3]Registry[D=41,L=28]30[D=85,L=3]Network[D=258,L=3]it[D=143,L=3]iz[D=44,L=23]2[D=44,L=3]Ethernet Addr[D=3
18,L=3]: 9c:8d:1a:00:07:ee[D=60,L=20]44[D=145,L=3]Sensor Port i[D=108,L=30]502[D=402,L=3]/O servi[D=423,L=3][
D=49,L=33]35[D=286,L=3]TP[D=49,L=5]er enabl[D=396,L=3]f[D=113,L=3]p[D=113,L=4]21[D=55,L=21]5[D=212,L=3][D=520
,L=3]tocol[D=60,L=25]92[D=403,L=22]58[D=62,L=3]Web[D=455,L=3][D=54,L=21]8[D=54,L=21]60[D=667,L=3]Tel[D=326,L=
4][D=173,L=25]3[D=58,L=21]3[D=278,L=3]POR: 5926[D=34,L=21]53[D=759,L=3]umulative R[D=506,L=3]ime: 8[D=168,L=3
]eks 5 Days 1 Hour 24:32.28[D=284,L=21]6[D=835,L=4]Boot[D=840,L=3]mple[D=579,L=3] [2[D=896,L=3]seconds]
bruce_dev />
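To show how such references are expanded, here is a minimal sketch of a decoder for this visible [D=,L=] notation. It is only an illustration: the class name is made up, it assumes well-formed input, and the notation itself is ambiguous if the original data happens to contain text that looks like a reference. A lone bracket, as in the boot time at the end of the log, passes through as literal text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative decoder for the human-readable [D=n,L=n] notation.
final class VisibleFormatDecoder {
    private static final Pattern TOKEN = Pattern.compile("\\[D=(\\d+),L=(\\d+)\\]");

    static String expand(String compressed) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(compressed);
        int pos = 0;
        while (m.find()) {
            out.append(compressed, pos, m.start()); // literal text before the reference
            int distance = Integer.parseInt(m.group(1));
            int length = Integer.parseInt(m.group(2));
            int from = out.length() - distance;
            // Copy one character at a time so that a reference may overlap its own output.
            for (int i = 0; i < length; i++)
                out.append(out.charAt(from + i));
            pos = m.end();
        }
        out.append(compressed, pos, compressed.length()); // trailing literal text
        return out.toString();
    }

    public static void main(String[] args) {
        String start = "01/04/18 08:19:18.111, ** OS CRC detail updated\r\n[D=49,L=19]58";
        System.out.print(expand(start));
        System.out.println();
    }
}
```

Expanding the first reference copies the 19 characters of date and time from the start of the previous line, which is 49 bytes back once the CR and LF are counted.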

So it appears to be time to convert this into a bit stream with the proper length and distance codes in preparation for Huffman coding.

With a couple of placeholder bytes standing in for the length and distance codes, the following is representative of the compression ratio for this file before Huffman coding.

bruce_dev /> jtest jniorboot.log
Processing 79.124 seconds.
Source 998 bytes.
Result 533 bytes.
Ratio 46.59%

bruce_dev />

Oh it’ll be fast in C and in the JANOS kernel. Okay… Huffman.

So before getting too deep into generating the Huffman coding with dynamic tables I figured that it would make sense to write a quick decompressor for my interim LZ77 compression as a check. I modified the compressor to output length and distance codes as shorts using a 0xff prefix which I then escaped. This stream I will later be able to digest in performing the Huffman coding. The decompressor would take outfile.dat and generate the decompressed newfile.dat.
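A minimal sketch of such an interim format follows. The byte order (0xff, then the length as a big-endian short, then the distance as a big-endian short) is taken from the hex dump of outfile.dat shown further down. The escape for a literal 0xff is not described here, so the doubling used in this sketch is purely an assumption, as are the class and method names. Doubling is unambiguous only as long as lengths stay small enough that the high byte of a length is never 0xff.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Illustrative reader/writer for a 0xff-prefixed interim LZ77 stream.
final class InterimFormat {
    static final int MARK = 0xff;

    // Write a literal byte; a literal 0xff is doubled (assumed escape).
    static void putLiteral(ByteArrayOutputStream out, int b) {
        out.write(b);
        if ((b & 0xff) == MARK)
            out.write(MARK);
    }

    // Write a reference: mark, 16-bit length, 16-bit distance, high byte first.
    static void putMatch(ByteArrayOutputStream out, int length, int distance) {
        out.write(MARK);
        out.write(length >> 8);
        out.write(length);
        out.write(distance >> 8);
        out.write(distance);
    }

    // Expand a well-formed stream back into the original bytes.
    static byte[] expand(byte[] in) {
        byte[] out = new byte[Math.max(16, in.length)];
        int n = 0;
        int i = 0;
        while (i < in.length) {
            int b = in[i++] & 0xff;
            int length = 1;
            int distance = 0; // 0 means "store b as a literal"
            if (b == MARK) {
                if ((in[i] & 0xff) == MARK) {
                    i++; // doubled mark: literal 0xff
                } else {
                    length = ((in[i] & 0xff) << 8) | (in[i + 1] & 0xff);
                    distance = ((in[i + 2] & 0xff) << 8) | (in[i + 3] & 0xff);
                    i += 4;
                }
            }
            if (n + length > out.length)
                out = Arrays.copyOf(out, Math.max(out.length * 2, n + length));
            if (distance == 0) {
                out[n++] = (byte) b;
            } else {
                for (int k = 0; k < length; k++, n++)
                    out[n] = out[n - distance]; // byte by byte, so overlap is fine
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream enc = new ByteArrayOutputStream();
        for (char c : "abc".toCharArray())
            putLiteral(enc, c);
        putMatch(enc, 6, 3);
        System.out.println(new String(expand(enc.toByteArray())));
    }
}
```

The sequence ff 00 13 00 3d in the dump reads, under this layout, as a length of 19 and a distance of 61.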

Well, after compressing and then decompressing, the content of newfile.dat resembled jniorboot.log very closely, but there were a few differences. First, I found the glitch in the step that eliminates any overlap in matched sequences (I shouldn’t have modified the distance when also modifying the CURPTR). Then I had to address the boundary conditions at the end of the file in order to properly process the entire file (I ended up a couple of bytes short initially). With that we achieved success.

You can see here how we can use MANIFEST to verify file size and content. Note that the MD5 hashes are identical.

bruce_dev /> jtest jniorboot.log
Processing 69.323 seconds.
Source 950 bytes.
Result 671 bytes.
Ratio 1.42:1

bruce_dev /> jtest2

bruce_dev /> manifest jniorboot.log
JNIOR Manifest      Fri Jan 05 11:02:57 EST 2018
  Size                  MD5                  File Specification
 950      dc425a0283e22944b463eeab9e625adb  [Modified] /jniorboot.log
End of Manifest (1 files listed)

bruce_dev /> manifest newfile.dat  
JNIOR Manifest      Fri Jan 05 11:02:59 EST 2018
  Size                  MD5                  File Specification
 950      dc425a0283e22944b463eeab9e625adb  [New] /newfile.dat
End of Manifest (1 files listed)

bruce_dev />
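The same round-trip check can be made off the JNIOR with standard Java. This is a minimal sketch that assumes a desktop JVM and copies of the two files; the class name is made up, and it uses only java.security.MessageDigest and java.nio.file.Files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative check that two files have the same MD5 digest.
final class Md5Check {
    // Returns the MD5 digest of data as 32 lowercase hex digits.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest)
                hex.append(String.format("%02x", b & 0xff));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java Md5Check.java <original> <decompressed>");
            return;
        }
        String a = md5Hex(Files.readAllBytes(Paths.get(args[0])));
        String b = md5Hex(Files.readAllBytes(Paths.get(args[1])));
        System.out.println(a + "  " + args[0]);
        System.out.println(b + "  " + args[1]);
        System.out.println(a.equals(b) ? "identical" : "DIFFERENT");
    }
}
```

Matching digests together with matching sizes give good confidence that the decompressor reproduced the original file byte for byte.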

Here is the original content and the resulting compressed format that I have used in bread-boarding this.

bruce_dev /> cat jniorboot.log
01/05/18 07:39:52.913, -- Model 410 v1.6.3 - JANOS Series 4
01/05/18 07:39:52.960, Copyright (c) 2012-2018 INTEG Process Group, Inc., Gibsonia PA USA
01/05/18 07:39:52.980, JANOS written and developed by Bruce Cloutier
01/05/18 07:39:52.999, Serial Number: 614070500
01/05/18 07:39:53.018, File System mounted
01/05/18 07:39:53.039, Registry mounted
01/05/18 07:39:53.089, Network Initialized
01/05/18 07:39:53.109, Ethernet Address: 9c:8d:1a:00:07:ee
01/05/18 07:39:53.229, Sensor Port initialized
01/05/18 07:39:53.284, I/O services initialized
01/05/18 07:39:53.327, FTP server enabled for port 21
01/05/18 07:39:53.347, Protocol server enabled for port 9200
01/05/18 07:39:53.368, WebServer enabled for port 80
01/05/18 07:39:53.390, Telnet server enabled for port 23
01/05/18 07:39:53.414, POR: 5927
01/05/18 07:39:53.439, Cumulative Runtime: 8 Weeks 5 Days 9 Hours 32:22.102
01/05/18 07:39:53.460, Boot Completed [2.3 seconds]

bruce_dev />

bruce_dev /> cat outfile.dat -h
00000000  30 31 2f 30 35 2f 31 38  20 30 37 3a 33 39 3a 35  01/05/18 .07:39:5
00000010  32 2e 39 31 33 2c 20 2d  2d 20 4d 6f 64 65 6c 20  2.913,.- -.Model.
00000020  34 31 30 20 76 31 2e 36  2e 33 20 2d 20 4a 41 4e  410.v1.6 .3.-.JAN
00000030  4f 53 20 53 65 72 69 65  73 20 34 0d 0a ff 00 13  OS.Serie s.4.....
00000040  00 3d 36 30 2c 20 43 6f  70 79 72 69 67 68 74 20  .=60,.Co pyright.
00000050  28 63 29 20 32 30 31 32  2d ff 00 03 00 05 38 20  (c).2012 -.....8.
00000060  49 4e 54 45 47 20 50 72  6f 63 65 73 73 20 47 72  INTEG.Pr ocess.Gr
00000070  6f 75 70 2c 20 49 6e 63  2e 2c 20 47 69 62 73 6f  oup,.Inc .,.Gibso
00000080  6e 69 61 20 50 41 20 55  53 41 ff 00 15 00 5b 38  nia.PA.U SA....[8
00000090  ff 00 03 00 5b ff 00 06  00 82 77 72 69 74 74 65  ....[... ..writte
000000A0  6e 20 61 6e 64 20 64 65  76 65 6c 6f 70 65 64 20  n.and.de veloped.
000000B0  62 79 20 42 72 75 63 65  20 43 6c 6f 75 74 69 65  by.Bruce .Cloutie
000000C0  72 ff 00 15 00 46 39 39  2c ff 00 05 00 c2 61 6c  r....F99 ,.....al
000000D0  20 4e 75 6d 62 65 72 3a  20 36 31 34 30 37 30 35  .Number: .6140705
000000E0  30 30 ff 00 12 00 31 33  2e ff 00 03 00 b9 2c 20  00....13 ......,.
000000F0  46 69 6c 65 20 53 79 73  74 65 6d 20 6d 6f 75 6e  File.Sys tem.moun
00000100  74 65 64 ff 00 15 00 2c  33 ff 00 03 00 5d 52 65  ted...., 3....]Re
00000110  67 69 73 74 72 79 ff 00  1d 00 29 38 ff 00 03 00  gistry.. ..)8....
00000120  29 4e 65 74 77 6f 72 6b  ff 00 03 01 02 69 74 ff  )Network .....it.
00000130  00 03 00 8f 69 7a ff 00  16 00 2c 31 30 ff 00 03  ....iz.. ..,10...
00000140  00 2c 45 74 68 65 72 6e  65 74 20 41 64 64 72 ff  .,Ethern et.Addr.
00000150  00 03 01 3e 3a 20 39 63  3a 38 64 3a 31 61 3a 30  ...>:.9c :8d:1a:0
00000160  30 3a ff 00 03 00 2c 65  65 ff 00 14 00 3c 32 32  0:....,e e....<22
00000170  ff 00 05 00 ee 6e 73 6f  72 20 50 6f 72 74 20 69  .....nso r.Port.i
00000180  ff 00 1e 00 6c 32 38 34  ff 00 03 01 92 2f 4f 20  ....l284 ...../O.
00000190  73 65 72 76 69 ff 00 03  01 a7 ff 00 20 00 31 33  servi... ......13
000001A0  32 37 ff 00 03 01 1e 54  50 ff 00 05 00 31 65 72  27.....T P....1er
000001B0  20 65 6e 61 62 6c ff 00  03 01 8c 66 ff 00 03 00  .enabl.. ...f....
000001C0  71 70 ff 00 04 00 71 32  31 ff 00 15 00 37 34 ff  qp....q2 1....74.
000001D0  00 03 00 37 ff 00 03 02  09 74 6f 63 6f 6c ff 00  ...7.... .tocol..
000001E0  19 00 3c 39 32 ff 00 16  01 93 33 36 ff 00 03 01  ..<92... ..36....
000001F0  93 57 65 62 ff 00 03 01  c7 ff 00 15 00 38 38 ff  .Web.... .....88.
00000200  00 16 00 36 39 ff 00 03  02 40 54 65 6c ff 00 04  ...69... .@Tel...
00000210  01 46 ff 00 19 00 ae 33  ff 00 14 00 3a 34 31 ff  .F.....3 ....:41.
00000220  00 03 01 16 50 4f 52 3a  20 35 39 32 37 ff 00 15  ....POR: .5927...
00000230  00 22 ff 00 04 01 f9 43  75 6d 75 6c 61 74 69 76  .".....C umulativ
00000240  65 20 52 ff 00 03 01 fa  69 6d 65 3a 20 38 ff 00  e.R..... ime:.8..
00000250  03 00 a8 65 6b 73 20 35  20 44 61 79 73 20 39 20  ...eks.5 .Days.9.
00000260  48 6f 75 72 73 20 33 32  3a 32 32 ff 00 03 01 da  Hours.32 :22.....
00000270  32 ff 00 15 00 4d ff 00  04 03 44 42 6f 6f 74 ff  2....M.. ..DBoot.
00000280  00 03 03 49 6d 70 6c 65  ff 00 03 02 44 20 5b 32  ...Imple ....D.[2
00000290  ff 00 03 03 81 73 65 63  6f 6e 64 73 5d 0d 0a     .....sec onds]..

bruce_dev />
Attachments: JTest2.java (3.04 KiB), JTest.java (9.42 KiB)

So at this point I feel like this algorithm generates the optimum LZ77 compression for the data. This should even take into account the lazy matches however that is perceived by the industry. When I cast it into C I will work on optimizing the execution.

The only question might be in optimizing distance codes to minimize extra bits. I didn’t consider that in pruning the matched sequence list for a block. When those situations occur there might be a bit or two to save if I were to retain the closer match. I am not going to worry about that. My feeling is that we won’t save anything noticeable if anything at all.

Now to handle the Huffman coding.

Well there are a couple of bugs in my coding which were discovered while testing the approach on much larger files. With those issues fixed I see that I need to focus on optimizing because this all-encompassing matching is much too slow (even when considering the Java breadboard).

The approach would find all of the sequence matches to data in the previous 32KB of the stream (sliding window) for a section of the input stream bound by non-matching data. Once collected I would then end up trashing the vast majority of those. That is wasteful of processing time. It was a logical approach while I wasn’t sure whether a better compression ratio could be obtained through careful selection of sequences. There is this suggestion that better compression is possible if lazy matches are allowed. Without really knowing what those are, the shotgun all-encompassing approach guaranteed at least that you had all of the information you needed to reach the optimum. Let’s actually look at this more closely.

Matching

We start a match upon receipt of a data byte. I’m keeping a bidirectional linked list for the occurrences of each byte value. This allows the routine to rapidly create an active match object for each. Subsequently as each new byte is received we check each active match for those that may be extended and those that are no longer useful. When we reach a point where none of the active matches have been extended we select the longest match completed as the best. For matches of equal length we pick the closest one (lowest distance).

The DEFLATE specification recognizes matches of 3 or more bytes (maximum 258). Why 3? That is because the compression is achieved by replacing a matched sequence with a pointer back to the same data found in the previous 32KB of data (the sliding window). That pointer consists of a distance and a length. That pointer in the worst case requires about 3 bytes. So replacing shorter sequences on average won’t buy you anything. That’s for DEFLATE. I am actually using 5 bytes for this breadboard but eventually we will be strictly DEFLATE. Obviously the longer the match the greater the savings. Therefore the best match is the longest and closest (uses a minimum pointer size).
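To make the 5-byte breadboard pointer concrete, here is a rough sketch of how the intermediate format could be expanded back into the original data. The layout is an assumption inferred from the hex dump above and from how the buffer is parsed later on: 0xff escapes a pointer made of a 2-byte length followed by a 2-byte distance (high byte first), and 0xff 0xff stands for a literal 0xff. The class and method names are made up for illustration and are not part of JTest.

```java
import java.util.Arrays;

// Sketch only: expands the escaped intermediate format. Assumed layout:
//   0xff 0xff          -> literal 0xff byte
//   0xff LH LL DH DL   -> copy (LH<<8|LL) bytes from (DH<<8|DL) bytes back
//   anything else      -> literal byte
// Input is assumed to be well formed; no bounds checking is attempted.
final class EscapedExpand {

    static byte[] expand(byte[] in) {
        byte[] out = new byte[Math.max(16, in.length * 2)];
        int size = 0;
        int n = 0;

        while (n < in.length) {
            int ch = in[n++] & 0xff;

            // literal byte (an escaped 0xff consumes one extra input byte)
            if (ch != 0xff || (in[n] & 0xff) == 0xff) {
                if (ch == 0xff)
                    n++;
                if (size == out.length)
                    out = Arrays.copyOf(out, size * 2);
                out[size++] = (byte) ch;
                continue;
            }

            // length-distance pointer
            int len = ((in[n] & 0xff) << 8) | (in[n + 1] & 0xff);
            int dist = ((in[n + 2] & 0xff) << 8) | (in[n + 3] & 0xff);
            n += 4;

            if (size + len > out.length)
                out = Arrays.copyOf(out, Math.max(out.length * 2, size + len));

            // byte-by-byte copy so that a match may overlap its own output
            for (int k = 0; k < len; k++) {
                out[size] = out[size - dist];
                size++;
            }
        }
        return Arrays.copyOf(out, size);
    }
}
```

The byte-by-byte copy matters: a pointer with distance 1 and length 5 simply repeats the previous byte five times.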

So for a point in the incoming stream we seek the longest match. If there is no 3 byte match then we output that first byte as is and search using the next one. The results can be impressive especially for text and log files. It’s not LZW but it works. It turns out to be good enough for the JNIOR.
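The selection rule (longest match, and for equal lengths the closest) can be shown with a brute-force search. This is only a sketch for clarity and assumes the whole input sits in one array; it scans every earlier position instead of using the per-byte linked lists, so it is far slower than what the breadboard does. The names are hypothetical.

```java
// Sketch only: brute-force search for the best match at a position.
final class LongestMatch {
    static final int MIN = 3;        // shortest match worth a pointer
    static final int MAX = 258;      // longest match DEFLATE can encode
    static final int WINDOW = 32768; // sliding window size

    // Returns {length, distance} for the best match of data[pos..], or null
    // when nothing of at least MIN bytes exists in the preceding window.
    static int[] find(byte[] data, int pos) {
        int bestLen = 0;
        int bestDist = 0;
        int start = Math.max(0, pos - WINDOW);

        // Scan from the closest candidate outward. Only a strictly longer
        // match replaces the current best, so ties keep the lowest distance.
        for (int cand = pos - 1; cand >= start; cand--) {
            int len = 0;
            while (len < MAX && pos + len < data.length
                    && data[cand + len] == data[pos + len])
                len++;

            if (len > bestLen) {
                bestLen = len;
                bestDist = pos - cand;
            }
        }
        return bestLen >= MIN ? new int[] { bestLen, bestDist } : null;
    }
}
```

For the input "abcabcabc" at position 3 this finds a length of 6 at distance 3, a match that overlaps the bytes it is replacing.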

Lazy Matching

So what is with this lazy matching? Well imagine a sequence of 3 or more matching bytes located someplace in the sliding window. Then consider that if we ignored that match and searched for matches starting with the next byte we might find a much longer match from someplace else in the sliding window. Do we miss an opportunity for better compression?

I can graphically show the overlap of the best matches. Say the first is 5 in length and the other some 15.

|---|
 |-------------|

Here vertical bars denote the first and last matching byte and the dashes bytes in between. The first would replace 5 data bytes and the second 15 starting a byte later.

If we were to strictly process matches as they are found we would encode the 5-byte match. And then we would still find the latter 11 bytes of the second sequence (or maybe even another better sequence). This would encode as two sequences one right after another and require two pointers for a total of maybe 6 bytes.

|---||---------|
2 pointers = 6 bytes

Note that we can always prune a match. We can ignore some of its leading bytes by incrementing the replace position (CURPTR) and decrementing the length. We can even ignore trailing bytes in a match merely by shortening its length. So here we drop the first 4 bytes of the 15-byte sequence that were eclipsed by the initial 5-byte sequence. We don’t have to actually do this manipulation as supposedly our search algorithm would find it directly for us.

Now those who get excited by such things would point out that if we absolutely ignored the first 5-byte sequence completely and outputted that first raw byte then we would use just one pointer.

.|-------------|
1 raw byte plus 1 pointer = 4 bytes

And, yes, there is a savings that depending on how often such a thing occurs will in fact lead to a better result. This is a lazy match. This is even true when the second sequence is further offset as seen here.

|---|
  |-------------|

..|-------------|
2 raw bytes plus 1 pointer = 5 bytes

But there is no benefit beyond that. If the two sequences were offset by 3 bytes then you might as well include the first 3 as a sequence and you end up using 6 bytes (or less) anyway with 2 pointers.
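The arithmetic behind those three cases fits in a few lines. This sketch assumes the worst-case figure of 3 bytes per pointer used above and counts whole bytes only, before any Huffman coding, so treat it as an estimate. The names are hypothetical.

```java
// Sketch only: byte-count estimate for greedy versus lazy matching.
final class LazyCost {
    static final int POINTER = 3; // assumed worst-case bytes per pointer

    // Take the early short match, then the remainder of the long one.
    static int greedy() {
        return 2 * POINTER;
    }

    // Skip the short match: emit 'offset' raw bytes, then one pointer.
    static int lazy(int offset) {
        return offset + POINTER;
    }

    public static void main(String[] args) {
        for (int offset = 1; offset <= 3; offset++)
            System.out.printf("offset %d: greedy %d bytes, lazy %d bytes%n",
                    offset, greedy(), lazy(offset));
    }
}
```

At an offset of 3 the two approaches cost the same 6 bytes, which is why the lazy match only pays off for offsets of 1 or 2.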

OK so

Alright for the JNIOR it is likely that these lazy matches aren’t necessary. After all we just want to create a file collection or a graphics file. We aren’t really worried about saving every byte. In fact, we are probably more concerned about it getting done quickly.

So using matches as they come works. But… if we can find a way to efficiently accommodate the lazy matching it would be cool.

Optimized Program Code

Now that I have a little better understanding as to the lazy matching we can take what we have and move on to the next step in DEFLATE. Later after I cast this into C we can decide if it is worth handling the lazy matches. It represents a trade off between an optimum compression ratio and processing time. For the JNIOR we really are more concerned about the processing time as the brute force compression ratios appear more than acceptable. Note that I bet that our original processing of all possible matches between unmatched raw data would lead to an even better compression ratio than that including just the lazy matches but that would be slow.

Speaking of slow I thought to take a little time to distill our algorithm down and to code it so it would execute faster. I know that it is still a breadboard but I would like to not waste as much time with iterations in debugging. So I have eliminated the Match object and automatic growing lists. And since we are identifying the best match for a single position I have eliminated the list of completed matches (RSEQ). I also made an adjustment so as to be able to replay bytes into the matcher should we need to output raw data.

The following is the LZ77 code. Hopefully the comments are sufficient for you to follow the algorithm.

// Our LZ77 compression engine
    static void do_compress(BufferedOutputStream outfile, BufferedInputStream infile, int filesize) 
            throws Throwable {
        
        int ch;
        
        // process uncompressed stream byte-by-byte
        while (filesize > 0) {
            
            // Make sure that there are bytes in the queue to work with. We process bytes from 
            //  the queue using SEQPTR. When SEQPTR reaches the INPTR then we add bytes from the input
            //  stream. The linked lists are updated. 
            if (SEQPTR == INPTR) {
                
                // obtain byte from uncompressed stream
                ch = infile.read();
                filesize--;
                
                // queue data and manage associated linked list
                DATA[INPTR] = (byte)ch;
 
                // Add byte to the head of the appropriate linked list. Note pointers are stored +1 so
                //  as to use 0 as an end of list marker. Lists are bi-directional so we can trim the 
                //  tail when data is dropped from the queue.
                int ptr = HEAD[ch];
                HEAD[ch] = INPTR + 1;
                FWD[INPTR] = ptr;
                BACK[INPTR] = 0;
                if (ptr != 0)
                    BACK[ptr - 1] = INPTR + 1;
 
                // advance entry pointer
                INPTR++;
                if (INPTR == WINDOW)
                    INPTR = 0;
 
                // drop old data from queue when the sliding window is full
                if (INPTR == OUTPTR) {
 
                    // trim linked list as byte is being dropped
                    if (BACK[OUTPTR] == 0)
                        HEAD[DATA[OUTPTR]] = 0;
                    else
                        FWD[BACK[OUTPTR] - 1] = 0;
 
                    // push end of queue
                    OUTPTR++;
                    if (OUTPTR == WINDOW)
                        OUTPTR = 0;
                }
            }
            
            // Obtain the next character to process. We are assured of a byte at SEQPTR now.
            //  SEQPTR allows us to replay bytes into the sequence matching.
            ch = DATA[SEQPTR++];
            if (SEQPTR == WINDOW)
                SEQPTR = 0;
            
            // Reset match state. These will define the best match should one be found for 
            //  the current CURPTR.
            int best_distance = 0;
            int best_length = 0;
            
            // If there are no active sequences we create a new set. This uses the linked list
            //  for the byte at CURPTR to initialize a series of potential sequence sites.
            if (MSIZE == 0) {
 
                // create new active matches for all CH in the queue (except last)
                int ptr = HEAD[ch];
                while (ptr != 0) {
                    if (ptr - 1 != CURPTR) {
                        int distance = CURPTR - ptr + 1;
                        if (distance < 0)
                            distance += WINDOW;
 
                        DISTANCE[MSIZE] = distance;
                        LENGTH[MSIZE] = 1;
                        MSIZE++;
                    }
 
                    ptr = FWD[ptr - 1];
                }
                
            }
                
            // Otherwise process the active sequence matches. Here we advance sequences as each
            //  new byte is processed. Of those matches that cannot be extended we keep the
            //  best (longest and closest to CURPTR). We will use the best match if all of the
            //  potential matches end.
            else {
                
                // each active match
                for (int n = MSIZE - 1; 0 <= n; n--) {
                    
                    int p = CURPTR - DISTANCE[n];
                    if (p < 0)
                        p += WINDOW;
                    p += LENGTH[n];
                    if (p >= WINDOW)
                        p -= WINDOW;
 
                    // Can we extend this match? If so we bump its length and move on to
                    //  the next match.
                    if (DATA[p] == ch && LENGTH[n] < 258) {
                        LENGTH[n]++;
 
                        if (DISTANCE[n] + LENGTH[n] < WINDOW && filesize > 0)
                            continue;
                    }
 
                    // Sequence did not get extended. See if it is the best found so far.
                    if (LENGTH[n] >= 3) {
                        
                        // first 
                        if (best_length == 0) {
                            best_distance = DISTANCE[n];
                            best_length = LENGTH[n];
                        }
                        
                        // longer
                        else if (LENGTH[n] > best_length) {
                            best_distance = DISTANCE[n];
                            best_length = LENGTH[n];
                        }
                        
                        // closer
                        else if (LENGTH[n] == best_length && DISTANCE[n] < best_distance) {
                            best_distance = DISTANCE[n];
                            best_length = LENGTH[n];
                        }
                    }
 
                    // Completed matches are eliminated from the active list. To be quick we
                    //  replace it with the last in the list and reduce the count.
                    MSIZE--;
                    DISTANCE[n] = DISTANCE[MSIZE];
                    LENGTH[n] = LENGTH[MSIZE];
                }
            }
            
            // If there are no active sequence matches at this point we can generate output.
            if (MSIZE == 0) {
                
                // If we have a completed sequence we can output a pointer. These are escaped
                //  into the output buffer for later processing into encoded length-distance
                //  pairs.
                // best_length = 0;     // forcing this disables pointers (raw bytes only)
                if (best_length != 0) {
                    bufwrite(0xff, outfile);
                    bufint(best_length, outfile);
                    bufint(best_distance, outfile);
                    
                    // Move CURPTR to the next byte after the replaced sequence
                    CURPTR += best_length;
                    if (CURPTR >= WINDOW)
                        CURPTR -= WINDOW;
                }                
 
                // Otherwise output a raw uncompressed byte. The unmatched byte is sent to
                //  the output stream and we move CURPTR to the next. 
                else {
                    bufbyte(DATA[CURPTR], outfile);
                    CURPTR++;
                    if (CURPTR == WINDOW)
                        CURPTR = 0;
                }
 
                // Here we reset SEQPTR to process from the next CURPTR location. In the case that
                //  we could not match this replays bytes previously processed so as to not miss
                //  an opportunity.
                SEQPTR = CURPTR;
            }
        }
        
        // If we are done and there are unprocessed bytes left we push them to the output stream.
        while (CURPTR != INPTR) {
            bufbyte(DATA[CURPTR], outfile);
            CURPTR++;
            if (CURPTR == WINDOW)
                CURPTR = 0;
        }
    }

With this in place we are going to move on to see what needs to be done next with our output stream.

Our LZ77 compression routine loads an output buffer with processed and hopefully compressed data. When this output buffer fills (say to 64KB) we must process it further. That data is then compressed again using a form of Huffman coding.

Huffman coding for DEFLATE

If you search you can find lots of useful descriptions of Huffman coding. Not all of those will provide the detail for constructing the required tree from the data. Of those that do, most do not lead you to creating a Huffman table compatible with DEFLATE. This is because the Huffman table in DEFLATE is eventually stored using a form of shorthand. That is only possible if the Huffman encoding follows some additional rules. To meet those requirements we need to be careful in constructing our initial dynamic tree.

The DEFLATE specification cryptically defines it:

The Huffman codes used for each alphabet in "deflate" format have two additional rules:

   * All codes of a given bit length have lexicographically consecutive values, in the same order as the
     symbols they represent;

   * Shorter codes lexicographically precede longer codes.

It would be nice if they would avoid words like lexicographically but you can’t have everything. You can also get confused over the term codes versus the binary values of the bytes in the alphabet. And of course shorter refers to bit count. That is perhaps a little more obvious but here again these must “lexicographically precede” others.
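One way to see what the two rules buy you: once every symbol has a bit length, the codes themselves follow mechanically. The sketch below follows the code assignment steps given in RFC 1951 (count the codes of each length, compute the first code for each length, then hand codes out in symbol order). The class and method names are mine and this is an illustration, not the JTest code.

```java
// Sketch only: assign DEFLATE-style codes from bit lengths (after RFC 1951).
final class CanonicalCodes {

    // lengths[n] is the bit length for symbol n (0 means the symbol is unused).
    // Returns the code value for each symbol, to be read as lengths[n] bits.
    static int[] assign(int[] lengths) {
        int maxBits = 0;
        for (int len : lengths)
            maxBits = Math.max(maxBits, len);

        // count the number of codes for each bit length
        int[] blCount = new int[maxBits + 1];
        for (int len : lengths)
            if (len != 0)
                blCount[len]++;

        // smallest code value for each bit length
        int[] nextCode = new int[maxBits + 1];
        int code = 0;
        for (int bits = 1; bits <= maxBits; bits++) {
            code = (code + blCount[bits - 1]) << 1;
            nextCode[bits] = code;
        }

        // consecutive values within a length, in increasing symbol order
        int[] codes = new int[lengths.length];
        for (int n = 0; n < lengths.length; n++)
            if (lengths[n] != 0)
                codes[n] = nextCode[lengths[n]]++;
        return codes;
    }
}
```

With the example from the RFC, symbols A..H with lengths 3, 3, 3, 3, 3, 2, 4, 4 receive the codes 010, 011, 100, 101, 110, 00, 1110 and 1111. This is why only the lengths need to be stored in the compressed stream.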

Alphabet

This refers to the set of values that we intend to compress. Obviously this needs to include byte values (0..255) since we are not constraining our input to ASCII or something. We include all of the possible values in the alphabet (in increasing value) even if some do not appear in the data. That seems obvious but DEFLATE also defines an end-of-block code (like an EOF) of 256 as well as special codes from 257..285 used to represent length codes (in the length-distance pointers we created).

So we will need to encode values from 0 thru 285. Okay, that set requires 9 bits and makes life in a world of bytes difficult. Remember how I had to escape my length-distance pointers in the buffer? Anyway, we can handle it in building our trees as we can define the value of a node as an integer. So for DEFLATE our “alphabet” consists of the numbers 0..285.

Don’t be confused if you notice that length codes and also distance codes generally include some “extra” bits. They do and those are simply slipped into the bit stream and are not subjected to Huffman coding. We’ll get into that later.

Length codes

These lie outside of the normal byte values 0..255 simply because in decompression we need to recognize them. These are flagged just as I have escaped the same in the output buffer. There are 29 of the length codes which are used with extra bits in some cases to encode lengths of 3 to 258. You may recall that we did not create matching sequences of less than 3 bytes and there is a maximum of a 258 byte length. The 258 maximum I bet results from storing the length-3 as a byte (0..255) someplace. But I would be very curious as to the thought process that breaks these 256 possible lengths into 29 codes. That is likely based upon some probability distribution or some such thing. It is what it is.

Distance codes

Unlike the length codes the distance codes do not need to be flagged. We expect a distance code after a length code and so those use normal byte values already represented in the alphabet (0..29). Here there are 30 distance codes some also requiring extra bits encoding distance from 1..32768. This allows the matched sequence to sit in that 32KB sliding window.

Huffman coding compresses data by representing frequent values with a small number of bits. If a space ' ' (0x20) appears in the data a tremendous number of times it might get encoded by just 2 bits. That saves 6 bits for every occurrence of a space. That can be a huge savings. The down side is that a rare byte that might occur only a few times might be encoded by 10 bits. That actually increases the storage from the original 8-bit byte but happens only a few times.

This implies then that we know the frequencies of each member of our alphabet. That is the first step. We need to proceed to count each occurrence of each member in our alphabet that appears in the data.

Here we modify my bufflush() routine that is responsible for emptying the buffer. First we will add a routine to count. There are 286 members in the alphabet (256 byte values, the end-of-block code and 29 length codes). We create an integer array where we use the value as an index to count occurrences. There is one complication in that I need to convert my length-distance escaping into the DEFLATE encoding. That entails tables of length and distance ranges so we can decide which of the length and distance codes we need to use.

    // length code range maximums
    static int[] blen = { 
        4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31, 35, 
        43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258, 259 
    };

Here for each of the 29 length ranges we specify the largest size plus 1 that it can encode. For a given length we will loop through these ranges to determine the proper alphabet value to use. The DEFLATE lengths are encoded as follows (from RFC 1951):

                 Extra               Extra               Extra
            Code Bits Length(s) Code Bits Lengths   Code Bits Length(s)
            ---- ---- ------     ---- ---- -------   ---- ---- -------
             257   0     3       267   1   15,16     277   4   67-82
             258   0     4       268   1   17,18     278   4   83-98
             259   0     5       269   2   19-22     279   4   99-114
             260   0     6       270   2   23-26     280   4  115-130
             261   0     7       271   2   27-30     281   5  131-162
             262   0     8       272   2   31-34     282   5  163-194
             263   0     9       273   3   35-42     283   5  195-226
             264   0    10       274   3   43-50     284   5  227-257
             265   1  11,12      275   3   51-58     285   0    258
             266   1  13,14      276   3   59-66
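As a sketch of how the range table is used, the lookup below returns the length code together with the value that would go into that code's extra bits (the length minus the first length of its range). It reuses the same table of range maximums; the class and method names are hypothetical.

```java
// Sketch only: map a match length (3..258) to its DEFLATE length code.
final class LengthCodes {

    // largest length plus 1 for each of the 29 length codes (257..285)
    static final int[] BLEN = {
        4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31, 35,
        43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258, 259
    };

    // Returns {code, extraValue}. The extra value is what would be written
    // into the extra bits that follow the code (0 when there are none).
    static int[] encode(int len) {
        if (len < 3 || len > 258)
            throw new IllegalArgumentException("length out of range: " + len);

        for (int k = 0; k < BLEN.length; k++) {
            if (len < BLEN[k]) {
                // the first length of a range is the previous range's limit
                int base = (k == 0) ? 3 : BLEN[k - 1];
                return new int[] { 257 + k, len - base };
            }
        }
        throw new IllegalStateException("unreachable for lengths 3..258");
    }
}
```

For example a length of 12 maps to code 265 with an extra value of 1, and a length of 258 maps to code 285 with no extra bits, in agreement with the table.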

Similarly we create an array for the 30 distance codes.

    static int[] bdist = { 
        2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 
        257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 
        8193, 12289, 16385, 24577, 32769 
    };

The DEFLATE specification encodes distances as follows:

                  Extra           Extra               Extra
             Code Bits Dist  Code Bits   Dist     Code Bits Distance
             ---- ---- ----  ---- ----  ------    ---- ---- --------
               0   0    1     10   4     33-48    20    9   1025-1536
               1   0    2     11   4     49-64    21    9   1537-2048
               2   0    3     12   5     65-96    22   10   2049-3072
               3   0    4     13   5     97-128   23   10   3073-4096
               4   1   5,6    14   6    129-192   24   11   4097-6144
               5   1   7,8    15   6    193-256   25   11   6145-8192
               6   2   9-12   16   7    257-384   26   12  8193-12288
               7   2  13-16   17   7    385-512   27   12 12289-16384
               8   3  17-24   18   8    513-768   28   13 16385-24576
               9   3  25-32   19   8   769-1024   29   13 24577-32768
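The distance lookup works the same way. In this sketch the number of extra bits is computed from the code itself, using the pattern visible in the table (codes 0 to 3 take none, and each following pair of codes takes one more bit). The class and method names are hypothetical.

```java
// Sketch only: map a distance (1..32768) to its DEFLATE distance code.
final class DistanceCodes {

    // largest distance plus 1 for each of the 30 distance codes (0..29)
    static final int[] BDIST = {
        2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
        257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
        8193, 12289, 16385, 24577, 32769
    };

    static int code(int dist) {
        if (dist < 1 || dist > 32768)
            throw new IllegalArgumentException("distance out of range: " + dist);

        for (int k = 0; k < BDIST.length; k++)
            if (dist < BDIST[k])
                return k;
        throw new IllegalStateException("unreachable for distances 1..32768");
    }

    // Extra bits that follow the given distance code in the bit stream.
    static int extraBits(int code) {
        return Math.max(0, code / 2 - 1);
    }
}
```

The first pointer in the hex dump above has a distance of 61 (0x3d), which falls in the 49-64 range and so uses code 11 with 4 extra bits.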

So this buffer flush routine looks as follows. Note that we are not encoding the output in any way yet. This merely determines the counts.

    // length code range maximums
    static int[] blen = { 
        4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31, 35, 
        43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258, 259 
    };
    
    static int[] bdist = { 
        2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 
        257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 
        8193, 12289, 16385, 24577, 32769 
    };
 
    static void bufflush(BufferedOutputStream outfile) throws Throwable {
        
        // Determine frequencies by counting each occurrence of a byte value
        int[] freq = new int[286];
        for (int n = 0; n < BUFPTR; n++) {
            
            // Get the byte value. This may be escaped with 0xff.
            int ch = BUFR[n] & 0xff;
            
            // not escaped
            if (ch != 0xff)
                freq[ch]++;
            
            // escaped
            else {
                
                // Get next byte.
                ch = BUFR[++n] & 0xff;
                
                // may just be 0xff itself
                if (ch == 0xff)
                    freq[0xff]++;
                
                // length-distance pair
                else {
                    
                    // obtain balance of length and distance values
                    int len = (ch << 8) + (BUFR[++n] & 0xff);
                    int dist = ((BUFR[++n] & 0xff) << 8) + (BUFR[++n] & 0xff);
                    
                    // determine length code to use (257..285)
                    for (int k = 0; k < blen.length; k++) {
                        if (len < blen[k]) {
                            freq[257 + k]++;
                            break;
                        }
                    }
                    
                    // determine distance code to use (0..29)
                    for (int k = 0; k < bdist.length; k++) {
                        if (dist < bdist[k]) {
                            freq[k]++;
                            break;
                        }
                    }
                    
                }
            }
        }
        
        // dump the results
        for (int n = 0; n < 286; n++) {
            if (freq[n] > 0)
                System.out.printf("0x%03x %d\n", n, freq[n]);
        }
    }

And the unsorted results. Here we list only those values that appear in the data.

jtest jniorboot.log
0x004 1
0x00a 9
0x00b 13
0x00c 4
0x00d 5
0x00e 8
0x00f 3
0x010 4
0x011 8
0x012 2
0x013 2
0x020 41
0x028 1
0x029 1
0x02c 6
0x02d 5
0x02e 7
0x02f 3
0x030 16
0x031 12
0x032 12
0x033 6
0x034 10
0x035 7
0x036 6
0x037 4
0x038 9
0x039 9
0x03a 10
0x041 4
0x042 2
0x043 3
0x044 1
0x045 2
0x046 1
0x047 3
0x048 1
0x049 3
0x04a 1
0x04d 1
0x04e 4
0x04f 3
0x050 4
0x052 3
0x053 4
0x054 3
0x055 1
0x057 1
0x05b 1
0x05d 1
0x061 8
0x062 5
0x063 7
0x064 9
0x065 29
0x066 1
0x067 2
0x068 2
0x069 14
0x06b 2
0x06c 10
0x06d 6
0x06e 9
0x06f 17
0x070 5
0x072 17
0x073 11
0x074 15
0x075 8
0x076 4
0x077 2
0x079 5
0x07a 1
0x101 30
0x102 2
0x103 3
0x105 1
0x10c 1
0x10d 13
0x10e 2
0x10f 2
0x110 1

Processing 6.129 seconds.
Source 954 bytes.
Result 680 bytes.
Ratio 1.40:1

bruce_dev /> 

There is one omission. We will in fact use one end-of-block code (0x100) and so we will need to force it into the table.

I will need to build a tree. So we need to create some kind of node which will have left and right references as well as a potential value and a weight (frequency). Here I will define a class.

    static ArrayList<Node> nodes = new ArrayList<Node>(512);
    
    static class Node {
        int left;
        int value;
        int weight;
        int right;
        int length;
        int code;
        
        Node() {
        }
        
        // instantiate leaf
        Node(int val, int w) {
            value = val;
            weight = w;
        }
        
        // instantiate node
        Node(int l, int r, int w) {
            left = l;
            right = r;
            weight = w;
        }                
    }

I know that later I will assign individual leaves a code length and a code so those are included in the class as well. Now I will take our array of value frequencies and create leaves for each of those that appear in the data. An ArrayList will store these nodes and grow as we define a full tree.

I will also create an ordering array that we will use to properly construct the tree. Each entry in this array will reference a node. Initially we will sort this by decreasing frequency. Also in keeping with the lexicographical requirement those nodes of the same frequency will be ordered by increasing alphabet value. This is where it gets tricky but we start here.

We replace the frequency dump loop with the following.

        // create node set (0 not used)
        int[] order = new int[286];
        int cnt = 0;
        nodes.add(new Node());  // not used (index 0 is terminator)
        for (int n = 0; n < 286; n++) {
            if (freq[n] > 0) {
                nodes.add(new Node(n, freq[n]));
                order[cnt++] = nodes.size() - 1;
            }
        }
        
        // sort
        for (int n = 0; n < cnt - 1; n++) {
            Node nd1 = nodes.get(order[n]);
            Node nd2 = nodes.get(order[n + 1]);
 
            if (nd1.weight < nd2.weight || nd1.weight == nd2.weight && nd1.value > nd2.value) {
                int k = order[n];
                order[n] = order[n + 1];
                order[n + 1] = k;
                n -= 2;
                if (n < -1)
                    n = -1;
            }
        }
        
        //dump
        System.out.println("");
        for (int n = 0; n < cnt; n++) {
            Node nd = nodes.get(order[n]);
            System.out.printf("%d 0x%03x %d\n", order[n], nd.value, nd.weight);
        }

And the results of the sort are displayed.

bruce_dev /> jtest jniorboot.log

12 0x020 41
75 0x101 30
55 0x065 29
64 0x06f 17
66 0x072 17
19 0x030 16
68 0x074 15
59 0x069 14
3 0x00b 13
80 0x10d 13
20 0x031 12
21 0x032 12
67 0x073 11
23 0x034 10
29 0x03a 10
61 0x06c 10
2 0x00a 9
27 0x038 9
28 0x039 9
54 0x064 9
63 0x06e 9
6 0x00e 8
9 0x011 8
51 0x061 8
69 0x075 8
17 0x02e 7
24 0x035 7
53 0x063 7
15 0x02c 6
22 0x033 6
25 0x036 6
62 0x06d 6
5 0x00d 5
16 0x02d 5
52 0x062 5
65 0x070 5
72 0x079 5
4 0x00c 4
8 0x010 4
26 0x037 4
30 0x041 4
41 0x04e 4
43 0x050 4
45 0x053 4
70 0x076 4
7 0x00f 3
18 0x02f 3
32 0x043 3
36 0x047 3
38 0x049 3
42 0x04f 3
44 0x052 3
46 0x054 3
77 0x103 3
10 0x012 2
11 0x013 2
31 0x042 2
34 0x045 2
57 0x067 2
58 0x068 2
60 0x06b 2
71 0x077 2
76 0x102 2
81 0x10e 2
82 0x10f 2
1 0x004 1
13 0x028 1
14 0x029 1
33 0x044 1
35 0x046 1
37 0x048 1
39 0x04a 1
40 0x04d 1
47 0x055 1
48 0x057 1
49 0x05b 1
50 0x05d 1
56 0x066 1
73 0x07a 1
74 0x100 1
78 0x105 1
79 0x10c 1
83 0x110 1

Processing 8.873 seconds.
Source 954 bytes.
Result 680 bytes.
Ratio 1.40:1

bruce_dev /> 

The first number is the node index. The next is the value and the last the count of occurrences. Here we see that the space (0x20) is the most frequent in this file. You can see that I included the end-of-block code (0x100) once and it is as infrequent as a few others.
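The ordering rule used for that listing (decreasing frequency, then increasing alphabet value for ties) can also be written as a comparator. This is just an alternative sketch of the same rule using the library sort; leaves are held as small {value, weight} arrays rather than Node objects, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: order leaves by decreasing weight, then increasing value.
final class LeafOrder {

    // Returns one {value, weight} entry per alphabet member that appears.
    static List<int[]> sorted(int[] freq) {
        List<int[]> leaves = new ArrayList<>();
        for (int v = 0; v < freq.length; v++)
            if (freq[v] > 0)
                leaves.add(new int[] { v, freq[v] });

        leaves.sort((a, b) -> a[1] != b[1]
                ? Integer.compare(b[1], a[1])   // heavier first
                : Integer.compare(a[0], b[0])); // ties by alphabet value
        return leaves;
    }
}
```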

The next step is to construct a Huffman tree. So to simplify things at this point we are going to use a simple phrase as the data and disable the LZ77 aspect. There is a Duke University page describing Huffman coding that uses the phrase “go go gophers”. For lack of anything better we will use the same.

With LZ77 disabled and running only that phrase our frequency sort yields the following.

bruce_dev /> jtest flash/gogo.dat

3 0x067 3
5 0x06f 3
1 0x020 2
2 0x065 1
4 0x068 1
6 0x070 1
7 0x072 1
8 0x073 1

Processing 0.649 seconds.
Source 13 bytes.
Result 13 bytes.
Ratio 1.00:1

bruce_dev /> 

bruce_dev /> cat flash/gogo.dat
go go gophers
bruce_dev />

We also ignore any end-of-block code.

The Duke page demonstrates that this phrase can be encoded with just 37 bits. It also demonstrates that there are multiple possible Huffman trees that can be created to yield that result. Interestingly, none of the trees shown there meets the DEFLATE requirement. So I am going to determine the procedure that creates the right kind of table.

The game next is to combine pairs of leaves into nodes, and then pairs of leaves and nodes into other nodes. This procedure is repeated until there is only one node at the head of the tree. Generally one is directed to combine the two lowest weighted leaves or nodes into a single node of combined weight.

Each time a node is constructed it defines a bit, 0/1 for left/right, for the one or two members the node contains. By working from the lowest weighted or least frequent leaves and then nodes, one ends up building longer codes for those than for the higher frequency values, which are not touched right away.

The following procedure generates a tree (but not the kind we want just yet) using just such a procedure. The lowest two are combined and the list is resorted. We dump and repeat.

        // generate tree
        while (cnt > 1) {
            
            // take lowest weight nodes and create new
            int left = order[cnt - 2];
            int right = order[cnt - 1];
            Node nd1 = nodes.get(left);
            Node nd2 = nodes.get(right);
            nodes.add(new Node(left, right, nd1.weight + nd2.weight));
            order[cnt - 2] = nodes.size() - 1;
            cnt--;
 
            // sort
            for (int n = 0; n < cnt - 1; n++) {
                nd1 = nodes.get(order[n]);
                nd2 = nodes.get(order[n + 1]);
 
                if (nd1.weight < nd2.weight) {
                    int k = order[n];
                    order[n] = order[n + 1];
                    order[n + 1] = k;
                    n -= 2;
                    if (n < -1)
                        n = -1;
                }
            }
            
            //dump
            System.out.println("");
            for (int n = 0; n < cnt; n++) {
                Node nd = nodes.get(order[n]);
                if (nd.left == 0 && nd.right == 0)
                    System.out.printf("%d 0x%03x %d\n", order[n], nd.value, nd.weight);
                else
                    System.out.printf("%d %d-%d %d\n", order[n], nd.left, nd.right, nd.weight);
            }
        }

You can follow the procedure in the output although the resulting tree is not displayed. I might create a way to display the tree but I haven’t gone that far yet.

bruce_dev /> jtest flash/gogo.dat

3 0x067 3
5 0x06f 3
1 0x020 2
2 0x065 1
4 0x068 1
6 0x070 1
7 0x072 1
8 0x073 1

3 0x067 3
5 0x06f 3
1 0x020 2
9 7-8 2
2 0x065 1
4 0x068 1
6 0x070 1

3 0x067 3
5 0x06f 3
1 0x020 2
9 7-8 2
10 4-6 2
2 0x065 1

3 0x067 3
5 0x06f 3
11 10-2 3
1 0x020 2
9 7-8 2

12 1-9 4
3 0x067 3
5 0x06f 3
11 10-2 3

13 5-11 6
12 1-9 4
3 0x067 3

14 12-3 7
13 5-11 6

15 14-13 13

Processing 0.865 seconds.
Source 13 bytes.
Result 13 bytes.
Ratio 1.00:1

bruce_dev />

Here the new nodes display node reference indexes as left-right instead of a value.

Now with a tree built I create a recursive routine to walk the tree and assign leaves a code length and binary code. I then collect these leaves into a node array that would allow me to efficiently translate the raw data. Here that array is dumped. You will see the new code later. Here’s the table from the prior procedure.

0x020 ' ' count=2 length=3 code 000
0x065 'e' count=1 length=3 code 111
0x067 'g' count=3 length=2 code 01
0x068 'h' count=1 length=4 code 1100
0x06f 'o' count=3 length=2 code 10
0x070 'p' count=1 length=4 code 1101
0x072 'r' count=1 length=4 code 0010
0x073 's' count=1 length=4 code 0011

If we manually encode the phrase we see that it does in fact compress to just 37 bits.

g  o  ' ' g  o  ' ' g  o  p    h    e   r    s
01 10 000 01 10 000 01 10 1101 1100 111 0010 0011
total 37 bits
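
That tally can be double-checked by letting Java do the concatenation. This is only a sketch: the class `GophersBitCount` and its `table` and `encode` helpers are made up for the illustration and are not part of the jtest program, and the codes are copied by hand from the table above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: class and method names are not part of the jtest program.
class GophersBitCount {

    // Code table copied from the listing above (first tree-building procedure).
    static Map<Character, String> table() {
        Map<Character, String> t = new HashMap<>();
        t.put(' ', "000");
        t.put('e', "111");
        t.put('g', "01");
        t.put('h', "1100");
        t.put('o', "10");
        t.put('p', "1101");
        t.put('r', "0010");
        t.put('s', "0011");
        return t;
    }

    // Concatenate the code for each character. Bits are kept as '0'/'1' text
    // so the result is easy to compare with the hand encoding.
    static String encode(String text, Map<Character, String> t) {
        StringBuilder sb = new StringBuilder();
        for (char ch : text.toCharArray())
            sb.append(t.get(ch));
        return sb.toString();
    }

    public static void main(String[] args) {
        String bits = encode("go go gophers", table());
        System.out.println(bits);
        System.out.println("total " + bits.length() + " bits");
    }
}
```

Keeping the bits as text is wasteful but makes the output directly comparable with the hand-encoded line above.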

But this table does not meet the DEFLATE requirements and cannot fit the shorthand.

By the way here are two other tables from the Duke University page. Both of these are different yet again from our table but each gets the job done. None of these meet the DEFLATE requirement.

Now that third tree is close. But here is the tree that we need to learn to generate from this data.

Why? Because the two additional requirements are met.

First the code lengths (depth of the tree) increase from left to right.

Second, for the same code length (depth) the values increase from left to right (lexicographically).

And with this tree we can apply the required shorthand to properly fit the DEFLATE format. The details of that shorthand I will get into.
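
The two rules are easy to test mechanically. The sketch below is only an illustration, and the class `DeflateRuleCheck` and its `meetsRules` method are hypothetical names. It orders the leaves left to right by sorting the codes lexicographically (0 = left, 1 = right) and then checks both rules exactly as stated above. The first table in `main` is the one from the earlier procedure; the second is the table that the revised procedure further down produces.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative only: checks the two ordering rules described above.
class DeflateRuleCheck {

    // symbols[i] is assigned the bit string codes[i].
    static boolean meetsRules(int[] symbols, String[] codes) {
        // Order leaf indexes left to right across the tree, which is the
        // lexicographic order of the codes (0 = left, 1 = right).
        Integer[] idx = new Integer[symbols.length];
        for (int i = 0; i < idx.length; i++)
            idx[i] = i;
        Arrays.sort(idx, Comparator.comparing((Integer i) -> codes[i]));

        for (int k = 1; k < idx.length; k++) {
            int prev = idx[k - 1];
            int cur = idx[k];
            int lenPrev = codes[prev].length();
            int lenCur = codes[cur].length();

            // Rule 1: code length never decreases moving left to right.
            if (lenCur < lenPrev)
                return false;

            // Rule 2: within one code length the symbol values increase.
            if (lenCur == lenPrev && symbols[cur] <= symbols[prev])
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] symbols = { 0x20, 0x65, 0x67, 0x68, 0x6f, 0x70, 0x72, 0x73 };

        // Table produced by the first tree-building procedure.
        String[] first = { "000", "111", "01", "1100", "10", "1101", "0010", "0011" };
        // Table with the same code lengths that satisfies both rules.
        String[] wanted = { "100", "101", "00", "1100", "01", "1101", "1110", "1111" };

        System.out.println("first table:  " + meetsRules(symbols, first));
        System.out.println("wanted table: " + meetsRules(symbols, wanted));
    }
}
```

The first table fails at once because the 3-bit code 000 sorts ahead of the 2-bit code 01.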

Let’s revise the procedure to consider pairs of nodes from right to left.

Here we first determine the combined weight of the rightmost pair. We will combine that pair and any prior pair whose combined weight is less than or equal to that. We repeat this until we have only one node that being the head of the tree.

        // generate tree
        while (cnt > 1) {
            
            // determine the combined weight of the lowest two nodes
            int left = order[cnt - 2];
            int right = order[cnt - 1];
            Node nd1 = nodes.get(left);
            Node nd2 = nodes.get(right);
            int weight = nd1.weight + nd2.weight;
            
            // Now combine node pairs equal to or less than this weight from
            //  right to left.
            int pos = cnt;
            while (pos >= 2) {
                
                // Get the combined weight of the pair preceding the pointer. We will
                //  combine the pair if its weight is less than or equal to that of 
                //  the rightmost (least) pair. We stop if not.
                left = order[pos - 2];
                right = order[pos - 1];
                nd1 = nodes.get(left);
                nd2 = nodes.get(right);
                int w = nd1.weight + nd2.weight;
                if (w > weight)
                    break;
                
                // Combine the pair and reduce the order array.
                nodes.add(new Node(left, right, w));
                order[pos - 2] = nodes.size() - 1;
                // close the gap by shifting the remaining entries down one slot
                for (int n = pos; n < cnt; n++)
                    order[n - 1] = order[n];
                cnt--;
                
                // on to the next prior pair
                pos -= 2;
            }
            
            //dump
            System.out.println("");
            for (int n = 0; n < cnt; n++) {
                Node nd = nodes.get(order[n]);
                if (nd.left == 0 && nd.right == 0)
                    System.out.printf("%d 0x%03x %d\n", order[n], nd.value, nd.weight);
                else
                    System.out.printf("%d %d-%d %d\n", order[n], nd.left, nd.right, nd.weight);
            }
        }

Now when this is executed we obtain a different tree. This one actually is the one we seek.

0x020 ' ' count=2 length=3 code 100
0x065 'e' count=1 length=3 code 101
0x067 'g' count=3 length=2 code 00
0x068 'h' count=1 length=4 code 1100
0x06f 'o' count=3 length=2 code 01
0x070 'p' count=1 length=4 code 1101
0x072 'r' count=1 length=4 code 1110
0x073 's' count=1 length=4 code 1111

bruce_dev /> jtest flash/gogo.dat

3 0x067 3
5 0x06f 3
1 0x020 2
2 0x065 1
4 0x068 1
6 0x070 1
7 0x072 1
8 0x073 1

3 0x067 3
5 0x06f 3
1 0x020 2
2 0x065 1
10 4-6 2
9 7-8 2

3 0x067 3
5 0x06f 3
12 1-2 3
11 10-9 4

14 3-5 6
13 12-11 7

15 14-13 13
0x020 ' ' count=2 length=3 code 100
0x065 'e' count=1 length=3 code 101
0x067 'g' count=3 length=2 code 00
0x068 'h' count=1 length=4 code 1100
0x06f 'o' count=3 length=2 code 01
0x070 'p' count=1 length=4 code 1101
0x072 'r' count=1 length=4 code 1110
0x073 's' count=1 length=4 code 1111

Processing 0.840 seconds.
Source 13 bytes.
Result 13 bytes.
Ratio 1.00:1

bruce_dev /> 

I may not be ready to claim victory here but this appears to be very promising. Perhaps we should return to a more complicated situation.

Alright so we re-enable the LZ77 compression and remove much of the dump output. When we run this on jniorboot.log we get the following table.

bruce_dev /> jtest jniorboot.log
0x004 '.' count=1 length=11 code 10011110110
0x00a '.' count=9 length=6 code 110000
0x00b '.' count=13 length=6 code 110100
0x00c '.' count=4 length=8 code 10110011
0x00d '.' count=5 length=6 code 110010
0x00e '.' count=8 length=7 code 1010001
0x00f '.' count=3 length=8 code 10011101
0x010 '.' count=4 length=8 code 10111100
0x011 '.' count=8 length=7 code 1001100
0x012 '.' count=2 length=10 code 1001111000
0x013 '.' count=2 length=10 code 1001111001
0x020 ' ' count=41 length=2 code 00
0x028 '(' count=1 length=11 code 10011110111
0x029 ')' count=1 length=11 code 10110111110
0x02c ',' count=6 length=6 code 110110
0x02d '-' count=5 length=6 code 110011
0x02e '.' count=7 length=8 code 10110001
0x02f '/' count=3 length=9 code 100111110
0x030 '0' count=16 length=5 code 11111
0x031 '1' count=12 length=7 code 1011100
0x032 '2' count=12 length=7 code 1011101
0x033 '3' count=6 length=6 code 110111
0x034 '4' count=10 length=7 code 1001001
0x035 '5' count=7 length=8 code 10110100
0x036 '6' count=6 length=7 code 1001010
0x037 '7' count=4 length=8 code 10111101
0x038 '8' count=9 length=6 code 110001
0x039 '9' count=9 length=6 code 100000
0x03a ':' count=10 length=6 code 111000
0x041 'A' count=4 length=7 code 1000100
0x042 'B' count=2 length=9 code 111010000
0x043 'C' count=3 length=9 code 100111111
0x044 'D' count=1 length=11 code 10110111111
0x045 'E' count=2 length=9 code 111010001
0x046 'F' count=1 length=10 code 1110110110
0x047 'G' count=3 length=8 code 11101110
0x048 'H' count=1 length=10 code 1110110111
0x049 'I' count=3 length=8 code 11101111
0x04a 'J' count=1 length=10 code 1011111110
0x04d 'M' count=1 length=10 code 1011111111
0x04e 'N' count=4 length=7 code 1000101
0x04f 'O' count=3 length=8 code 11101010
0x050 'P' count=4 length=9 code 101101100
0x052 'R' count=3 length=8 code 11101011
0x053 'S' count=4 length=9 code 101101101
0x054 'T' count=3 length=7 code 1000110
0x055 'U' count=1 length=10 code 1110100110
0x057 'W' count=1 length=10 code 1110100111
0x05b '[' count=1 length=10 code 1110110100
0x05d ']' count=1 length=10 code 1110110101
0x061 'a' count=8 length=7 code 1001101
0x062 'b' count=5 length=7 code 1010010
0x063 'c' count=7 length=8 code 10110101
0x064 'd' count=9 length=6 code 100001
0x065 'e' count=29 length=4 code 0110
0x066 'f' count=1 length=10 code 1110100100
0x067 'g' count=2 length=10 code 1011011100
0x068 'h' count=2 length=10 code 1011011101
0x069 'i' count=14 length=6 code 101011
0x06b 'k' count=2 length=9 code 101111100
0x06c 'l' count=10 length=6 code 111001
0x06d 'm' count=6 length=7 code 1001011
0x06e 'n' count=9 length=7 code 1010000
0x06f 'o' count=17 length=4 code 0111
0x070 'p' count=5 length=7 code 1010011
0x072 'r' count=17 length=5 code 11110
0x073 's' count=11 length=7 code 1001000
0x074 't' count=15 length=6 code 101010
0x075 'u' count=8 length=8 code 10110000
0x076 'v' count=4 length=8 code 10011100
0x077 'w' count=2 length=9 code 101111101
0x079 'y' count=5 length=8 code 10110010
0x07a 'z' count=1 length=10 code 1110100101
0x100 '.' count=1 length=10 code 1011111100
0x101 '.' count=30 length=3 code 010

Processing 10.284 seconds.
Source 954 bytes.
Result 680 bytes.
Ratio 1.40:1

bruce_dev />

Well okay. Just about all you can say is that the stuff with the higher counts (frequency) does appear to use the shortest code lengths. Another good clue is the fact that the first alphabet entry that uses the smallest code is represented by a sequence of all zeroes.

I suppose now we get into what I have been calling the shorthand storage format for the Huffman table. If this table can be so represented and the table reconstructed from that then we are good to go.

Huffman table “Shorthand”

While “shorthand” is my term and no one else’s that I’ve seen, it still refers to efficiently conveying the table. To start I have defined an entry for each of the 286 possible codes (0 through 285), each with a count and a binary code. Even with some cute integer packing this is still a lot of bytes. Having to pass the table with the compressed file can painfully cut into the benefit of the compression.

It turns out that if the Huffman table conforms to the two special rules it can be reconstructed from only knowing the code length for each of the alphabet members. So we don’t need to include the actual code.

Once we have the code lengths for the alphabet, that list gets packed further. It’s a bit crazy. The array of 286 code lengths contains a lot of repetition. This is packed using a form of run-length encoding where sequences of the same length are defined by the count (run length). Then that data is again (ugh) run through a Huffman encoding which results in just 19 bit lengths. Those are stored in a weird order which is intended to keep those alphabet members whose bit lengths are likely to be 0 near the end as trailing zeroes need not be included. So the entire Huffman table ends up being conveyed in just a handful of bytes. I guess people were really creative back then.
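
To make the run-length step a little more concrete, here is a rough sketch of it. The repeat codes follow my reading of the DEFLATE specification (RFC 1951, section 3.2.7): 16 copies the previous length 3 to 6 times, 17 repeats a zero length 3 to 10 times and 18 repeats a zero length 11 to 138 times. The class `CodeLengthRle`, its methods and the `{code, extra}` token layout are assumptions made for this illustration. It does not do the second Huffman pass or the odd storage order, and the sample lengths are those of the “go go gophers” table, cut off after symbol 0x73.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: run-length coding of a code length array using the
// DEFLATE repeat codes 16, 17 and 18. Names here are not from jtest.
class CodeLengthRle {

    // Each token is {code, extra}. For codes 0..15 extra is unused (0).
    static List<int[]> pack(int[] lens) {
        List<int[]> out = new ArrayList<>();
        int i = 0;
        while (i < lens.length) {
            // measure the run of identical lengths starting at i
            int v = lens[i];
            int run = 1;
            while (i + run < lens.length && lens[i + run] == v)
                run++;
            i += run;

            if (v == 0) {
                // 18 repeats zero 11..138 times, 17 repeats zero 3..10 times
                while (run >= 11) {
                    int r = Math.min(run, 138);
                    out.add(new int[] { 18, r - 11 });
                    run -= r;
                }
                if (run >= 3) {
                    out.add(new int[] { 17, run - 3 });
                    run = 0;
                }
            } else {
                // the length itself, then 16 copies the previous length 3..6 times
                out.add(new int[] { v, 0 });
                run--;
                while (run >= 3) {
                    int r = Math.min(run, 6);
                    out.add(new int[] { 16, r - 3 });
                    run -= r;
                }
            }
            // anything left over is too short for a repeat code
            while (run-- > 0)
                out.add(new int[] { v, 0 });
        }
        return out;
    }

    // Reverse of pack(), used here only to confirm nothing was lost.
    static int[] unpack(List<int[]> tokens, int size) {
        int[] lens = new int[size];
        int n = 0;
        for (int[] t : tokens) {
            if (t[0] < 16) {
                lens[n++] = t[0];
            } else if (t[0] == 16) {
                int prev = lens[n - 1];
                for (int k = t[1] + 3; k > 0; k--)
                    lens[n++] = prev;
            } else {
                // zero runs: the array is already zero filled, just skip ahead
                n += t[1] + (t[0] == 17 ? 3 : 11);
            }
        }
        return lens;
    }

    // Code lengths for symbols 0x00..0x73 of the "go go gophers" example.
    static int[] gophersLengths() {
        int[] lens = new int[0x74];
        lens[0x20] = 3; lens[0x65] = 3; lens[0x67] = 2; lens[0x68] = 4;
        lens[0x6f] = 2; lens[0x70] = 4; lens[0x72] = 4; lens[0x73] = 4;
        return lens;
    }

    public static void main(String[] args) {
        for (int[] t : pack(gophersLengths()))
            System.out.println(t[0] >= 16 ? t[0] + " (extra " + t[1] + ")" : "" + t[0]);
    }
}
```

For this small example the 116 lengths collapse to 13 tokens, most of the saving coming from the two long runs of zeroes.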

The procedure for reconstructing the Huffman table from the code lengths first requires a count of codes for each length. Let me add that to our table output. Here is the additional output from the execution.

Length=2 Count 1
Length=3 Count 1
Length=4 Count 2
Length=5 Count 2
Length=6 Count 13
Length=7 Count 15
Length=8 Count 14
Length=9 Count 8
Length=10 Count 15
Length=11 Count 4

From this table we can calculate the first binary code assigned to each code length group. Each alphabet member using that code length is then assigned an incremental binary value from that. I can add that calculation to this table.

Length=2 Count 1 Start Code 00
Length=3 Count 1 Start Code 010
Length=4 Count 2 Start Code 0110
Length=5 Count 2 Start Code 10000
Length=6 Count 13 Start Code 100100
Length=7 Count 15 Start Code 1100010
Length=8 Count 14 Start Code 11100010
Length=9 Count 8 Start Code 111100000
Length=10 Count 15 Start Code 1111010000
Length=11 Count 4 Start Code 11110111110

Okay so I can see that the Huffman table generated DOES NOT conform. As an exercise you can see for yourself. So I wonder where the error might be. Hmm…

bruce_dev /> jtest jniorboot.log
0x004 '.' count=1 length=11 code 10011110110
0x00a '.' count=9 length=6 code 110000
0x00b '.' count=13 length=6 code 110100
0x00c '.' count=4 length=8 code 10110011
0x00d '.' count=5 length=6 code 110010
0x00e '.' count=8 length=7 code 1010001
0x00f '.' count=3 length=8 code 10011101
0x010 '.' count=4 length=8 code 10111100
0x011 '.' count=8 length=7 code 1001100
0x012 '.' count=2 length=10 code 1001111000
0x013 '.' count=2 length=10 code 1001111001
0x020 ' ' count=41 length=2 code 00
0x028 '(' count=1 length=11 code 10011110111
0x029 ')' count=1 length=11 code 10110111110
0x02c ',' count=6 length=6 code 110110
0x02d '-' count=5 length=6 code 110011
0x02e '.' count=7 length=8 code 10110001
0x02f '/' count=3 length=9 code 100111110
0x030 '0' count=16 length=5 code 11111
0x031 '1' count=12 length=7 code 1011100
0x032 '2' count=12 length=7 code 1011101
0x033 '3' count=6 length=6 code 110111
0x034 '4' count=10 length=7 code 1001001
0x035 '5' count=7 length=8 code 10110100
0x036 '6' count=6 length=7 code 1001010
0x037 '7' count=4 length=8 code 10111101
0x038 '8' count=9 length=6 code 110001
0x039 '9' count=9 length=6 code 100000
0x03a ':' count=10 length=6 code 111000
0x041 'A' count=4 length=7 code 1000100
0x042 'B' count=2 length=9 code 111010000
0x043 'C' count=3 length=9 code 100111111
0x044 'D' count=1 length=11 code 10110111111
0x045 'E' count=2 length=9 code 111010001
0x046 'F' count=1 length=10 code 1110110110
0x047 'G' count=3 length=8 code 11101110
0x048 'H' count=1 length=10 code 1110110111
0x049 'I' count=3 length=8 code 11101111
0x04a 'J' count=1 length=10 code 1011111110
0x04d 'M' count=1 length=10 code 1011111111
0x04e 'N' count=4 length=7 code 1000101
0x04f 'O' count=3 length=8 code 11101010
0x050 'P' count=4 length=9 code 101101100
0x052 'R' count=3 length=8 code 11101011
0x053 'S' count=4 length=9 code 101101101
0x054 'T' count=3 length=7 code 1000110
0x055 'U' count=1 length=10 code 1110100110
0x057 'W' count=1 length=10 code 1110100111
0x05b '[' count=1 length=10 code 1110110100
0x05d ']' count=1 length=10 code 1110110101
0x061 'a' count=8 length=7 code 1001101
0x062 'b' count=5 length=7 code 1010010
0x063 'c' count=7 length=8 code 10110101
0x064 'd' count=9 length=6 code 100001
0x065 'e' count=29 length=4 code 0110
0x066 'f' count=1 length=10 code 1110100100
0x067 'g' count=2 length=10 code 1011011100
0x068 'h' count=2 length=10 code 1011011101
0x069 'i' count=14 length=6 code 101011
0x06b 'k' count=2 length=9 code 101111100
0x06c 'l' count=10 length=6 code 111001
0x06d 'm' count=6 length=7 code 1001011
0x06e 'n' count=9 length=7 code 1010000
0x06f 'o' count=17 length=4 code 0111
0x070 'p' count=5 length=7 code 1010011
0x072 'r' count=17 length=5 code 11110
0x073 's' count=11 length=7 code 1001000
0x074 't' count=15 length=6 code 101010
0x075 'u' count=8 length=8 code 10110000
0x076 'v' count=4 length=8 code 10011100
0x077 'w' count=2 length=9 code 101111101
0x079 'y' count=5 length=8 code 10110010
0x07a 'z' count=1 length=10 code 1110100101
0x100 '.' count=1 length=10 code 1011111100
0x101 '.' count=30 length=3 code 010

Length=2 Count 1 Start Code 00
Length=3 Count 1 Start Code 010
Length=4 Count 2 Start Code 0110
Length=5 Count 2 Start Code 10000
Length=6 Count 13 Start Code 100100
Length=7 Count 15 Start Code 1100010
Length=8 Count 14 Start Code 11100010
Length=9 Count 8 Start Code 111100000
Length=10 Count 15 Start Code 1111010000
Length=11 Count 4 Start Code 11110111110

Processing 10.395 seconds.
Source 954 bytes.
Result 680 bytes.
Ratio 1.40:1

bruce_dev /> 

Yeah I had at least one glitch but trying to generate the precise tree appropriate for the DEFLATE “shorthand” still eludes me. The search engines these days are much less effective for locating useful technical information than for finding ways to separate me from my money. It seems easier to reinvent the wheel and devise my own algorithm even though I know that a simple procedure is likely documented in numerous pages on the net.

It strikes me that all we need to do is determine the optimum bit length for each of the used alphabet members. It is almost irrelevant as to where in a tree a particular member ends up. Once we have the proper bit length a tree meeting the DEFLATE requirements can be directly created.

Perhaps the simple procedure for generating a valid Huffman tree ignoring the DEFLATE requirements can be employed, and without actually building a tree structure. Note that when two leaves are combined you are simply assigning another bit to them regardless of which gets ‘0’ and which gets ‘1’. The bit length is incremented for the two leaves as they are combined into a node. In fact when you combine two nodes you need only increment the bit length (depth) for all of the members below them. So in creating a node I need only keep track of all of the leaves below it. A simple linked list suffices.

Such an implementation need not even retain intermediate nodes. You just need to maintain the node membership list. You need that so you can advance the bit count for all of the leaves below as the node is combined.

Maybe you follow me to this point or maybe not. I’ll go ahead and try an implementation.

Okay this new approach is golden! And it’s fast! Oh and I don’t need to build any damn tree!

    static void bufflush(BufferedOutputStream outfile) throws Throwable {
        
        // Determine frequencies by counting each occurrence of a byte value. 
        //  Here we force the end-of-block code that we know we will use.
        int[] sym_cnt = new int[286];
//        sym_cnt[0x100] = 1;
        
        for (int n = 0; n < BUFPTR; n++) {
            
            // Get the byte value. This may be escaped with 0xff.
            int ch = BUFR[n] & 0xff;
            
            // not escaped
            if (ch != 0xff)
                sym_cnt[ch]++;
            
            // escaped
            else {
                
                // Get next byte.
                ch = BUFR[++n] & 0xff;
                
                // may just be 0xff itself
                if (ch == 0xff)
                    sym_cnt[0xff]++;
                
                // length-distance pair
                else {
                    
                    // obtain balance of length and distance values
                    int len = (ch << 8) + (BUFR[++n] & 0xff);
                    int dist = ((BUFR[++n] & 0xff) << 8) + (BUFR[++n] & 0xff);
                    
                    // determine length code to use (257..285)
                    for (int k = 0; k < blen.length; k++) {
                        if (len < blen[k]) {
                            sym_cnt[257 + k]++;
                            break;
                        }
                    }
                    
                    // determine distance code to use (0..29)
                    for (int k = 0; k < bdist.length; k++) {
                        if (dist < bdist[k]) {
                            sym_cnt[k]++;
                            break;
                        }
                    }
                    
                }
            }
        }
        
        // Create node list containing symbols in our alphabet that are found in the
        //  data. This will be sorted and used to assign bit lengths. Note list pointers
        //  are stored +1 to reserve 0 as a list terminator.
        int[] nodes = new int[286];
        int[] cnts = new int[286];
        int nodecnt = 0;
        for (int n = 0; n < 286; n++) {
            if (sym_cnt[n] > 0) {
                nodes[nodecnt] = n + 1;
                cnts[nodecnt] = sym_cnt[n];
                nodecnt++;                
            }
        }
        
        // Determine optimal bit lengths. Here we initialize a bit length array and a
        //  node membership list pointer array. These will be used as we generate
        //  the detail required for Huffman coding.
        int[] sym_len = new int[286];
        int[] sym_ptr = new int[286];
        
        // Perform Huffman optimization. This loops until we've folded all the leaves
        //  into a single head node.
        while (nodecnt > 1) {
            
            // The leaves are sorted by decreasing frequency (counts).
            for (int n = 0; n < nodecnt - 1; n++) {
                if (cnts[n] < cnts[n + 1]) {
                    int k = nodes[n];
                    nodes[n] = nodes[n + 1];
                    nodes[n + 1] = k;
                    k = cnts[n];
                    cnts[n] = cnts[n + 1];
                    cnts[n + 1] = k;
                    if (n > 0)
                        n -= 2;
                }
            }
 
            // The last two leaves/nodes have the lowest frequencies and are to 
            //  be combined. Here we increment the bit lengths for each and 
            //  merge leaves into a single list of node members.
            int ptr = nodes[nodecnt - 2];
            int add_ptr = nodes[nodecnt - 1];
            while (ptr > 0) {
                sym_len[ptr - 1]++;
                int p = sym_ptr[ptr - 1];
                if (p == 0 && add_ptr > 0) {
                    sym_ptr[ptr - 1] = add_ptr;
                    p = add_ptr;
                    add_ptr = 0;
                }
                ptr = p;
            }
            
            // Combine the last two nodes by adding their frequencies and dropping 
            //  the last.
            cnts[nodecnt - 2] += cnts[nodecnt - 1];
            nodecnt--;
            
        }
        
        // dump nonzero bit lengths
        for (int n = 0; n < 286; n++) {
            if (sym_len[n] > 0)
                System.out.printf("0x%03x '%c' count=%d optimal bits %d\n", n,
                        n >= 0x20 && n < 0x7f ? n : '.', sym_cnt[n], sym_len[n]);
        }
        
        outfile.write(BUFR, 0, BUFPTR);
        BUFPTR = 0;        
    }

Running this on the “go go gophers” test string again with LZ77 disabled yields the desired results.

bruce_dev /> jtest flash/gogo.dat
0x020 ' ' count=2 optimal bits 3
0x065 'e' count=1 optimal bits 3
0x067 'g' count=3 optimal bits 2
0x068 'h' count=1 optimal bits 4
0x06f 'o' count=3 optimal bits 2
0x070 'p' count=1 optimal bits 4
0x072 'r' count=1 optimal bits 4
0x073 's' count=1 optimal bits 4

Processing 0.540 seconds.
Source 13 bytes.
Result 13 bytes.
Ratio 1.00:1

bruce_dev />

Now I can use this to generate the DEFLATE compatible Huffman table.

If you multiply the frequency (count) times the bit length and add them for this example you get 37 bits which we know is the optimal for this example.

The code with comments above should be reasonably understandable. If there are any questions I can describe what is going on. But basically the routine that counts occurrences of each alphabet symbol is as it was before. Unfortunately that is complicated a bit as I have to process the length-distance pointers to determine the encoding for the tally.

Next we create a list of leaves so we can combine the two with the lowest frequency of occurrence. Here’s where we simply increment the bit lengths for node members. The leaf list distills down as it would for the traditional Huffman process.
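
For anyone who wants to experiment without the rest of the breadboard, the bit length pass can be lifted into a small stand-alone class. This is a sketch under a few assumptions: the names `BitLengthSketch`, `count`, `bitLengths` and `totalBits` are made up here, the counts come straight from a string rather than from the LZ77 buffer, and there is no length-distance handling at all. The sorting and list merging mirror the routine above.

```java
// Illustrative only: a stand-alone version of the bit length pass that
// takes its counts from a string instead of the LZ77 buffer.
class BitLengthSketch {

    // Tally how often each byte value occurs (alphabet sized as above).
    static int[] count(String text) {
        int[] cnt = new int[286];
        for (char ch : text.toCharArray())
            cnt[ch & 0xff]++;
        return cnt;
    }

    static int[] bitLengths(int[] symCnt) {
        int size = symCnt.length;

        // List heads (stored +1 so 0 can terminate a list) and their weights.
        int[] nodes = new int[size];
        int[] cnts = new int[size];
        int nodecnt = 0;
        for (int n = 0; n < size; n++) {
            if (symCnt[n] > 0) {
                nodes[nodecnt] = n + 1;
                cnts[nodecnt] = symCnt[n];
                nodecnt++;
            }
        }

        int[] symLen = new int[size];
        int[] symPtr = new int[size];   // next member of the same node, +1

        while (nodecnt > 1) {
            // Sort by decreasing weight using adjacent exchanges.
            for (int n = 0; n < nodecnt - 1; n++) {
                if (cnts[n] < cnts[n + 1]) {
                    int k = nodes[n]; nodes[n] = nodes[n + 1]; nodes[n + 1] = k;
                    k = cnts[n]; cnts[n] = cnts[n + 1]; cnts[n + 1] = k;
                    if (n > 0)
                        n -= 2;
                }
            }

            // Walk both member lists adding one bit to every leaf, and
            // append the second list to the end of the first.
            int ptr = nodes[nodecnt - 2];
            int addPtr = nodes[nodecnt - 1];
            while (ptr > 0) {
                symLen[ptr - 1]++;
                int p = symPtr[ptr - 1];
                if (p == 0 && addPtr > 0) {
                    symPtr[ptr - 1] = addPtr;
                    p = addPtr;
                    addPtr = 0;
                }
                ptr = p;
            }

            // The merged node takes the combined weight; drop the last entry.
            cnts[nodecnt - 2] += cnts[nodecnt - 1];
            nodecnt--;
        }
        return symLen;
    }

    // Sum of count times bit length, i.e. the size of the encoded data in bits.
    static int totalBits(int[] cnt, int[] len) {
        int total = 0;
        for (int n = 0; n < cnt.length; n++)
            total += cnt[n] * len[n];
        return total;
    }

    public static void main(String[] args) {
        int[] cnt = count("go go gophers");
        int[] len = bitLengths(cnt);
        for (int n = 0; n < len.length; n++) {
            if (len[n] > 0)
                System.out.printf("0x%03x count=%d bits %d\n", n, cnt[n], len[n]);
        }
        System.out.println("total " + totalBits(cnt, len) + " bits");
    }
}
```

Because only the member lists and bit lengths are kept, the memory needed is a few fixed arrays sized to the alphabet, which matters on a small device.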

Next I can reinstate the LZ77 compressor and run this on real data. Then we can take the process further.

We can now use the procedure detailed in the DEFLATE specification to convert assigned bit lengths into binary codes for compression. We don’t need to build a tree.

First we tally the number of symbols in our alphabet that use each bit length.

        // count the occurrence of each bit length
        int[] bits = new int[19];
        for (int n = 0; n < 286; n++) {
            if (sym_len[n] > 0)
                bits[sym_len[n]]++;
        }

With this we can calculate the starting binary code for each bit length. Basically for each bit length we reserve N codes and use the next as a prefix for subsequent bit lengths.

        // determine starting bit code for each bit length
        int[] start = new int[19];
        int c = 0;
        for (int n = 0; n < 19; n++) {
            start[n] = c;
            c = (c + bits[n]) << 1;
        }

This gives us the correct first codes as we see here.

bruce_dev /> jtest flash/gogo.dat
bit length 2 count 2 first code 00
bit length 3 count 2 first code 100
bit length 4 count 4 first code 1100

Now we use these starting codes in assigning the binary codes to each symbol.

        // assign codes to used alphabet symbols
        int[] code = new int[286];
        for (int n = 0; n < 286; n++) {
            if (sym_len[n] > 0)
                code[n] = start[sym_len[n]]++;
        }

The results are displayed by the attached program.

bruce_dev /> jtest flash/gogo.dat
bit length 2 count 2 first code 00
bit length 3 count 2 first code 100
bit length 4 count 4 first code 1100

0x020 ' ' count=2 optimal bits 3 100
0x065 'e' count=1 optimal bits 3 101
0x067 'g' count=3 optimal bits 2 00
0x068 'h' count=1 optimal bits 4 1100
0x06f 'o' count=3 optimal bits 2 01
0x070 'p' count=1 optimal bits 4 1101
0x072 'r' count=1 optimal bits 4 1110
0x073 's' count=1 optimal bits 4 1111

Processing 0.654 seconds.
Source 13 bytes.
Result 13 bytes.
Ratio 1.00:1

bruce_dev />

This gives us all of the codes that we need to compress our data. In this case it is for the example string “go go gophers”. Happily we did not need to build any tree structure. And, this Huffman coding is compatible with the DEFLATE specification. We can move forward with the shorthand.
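
As a sanity check that the bit lengths alone really are enough, the sketch below rebuilds the codes from the lengths, encodes the phrase and decodes it again. Everything named here (`CanonicalRoundTrip`, `assignCodes`, `encode`, `decode`, `gophersLengths`) is hypothetical and exists only for this illustration. The bits are kept as text, the decoder is a slow linear search, and nothing here says how DEFLATE packs the bits into bytes.

```java
// Illustrative only: names are made up and bits are kept as text.
class CanonicalRoundTrip {

    // Assign binary codes from bit lengths alone, following the same three
    // steps as the fragments above (tally, starting codes, assignment).
    static int[] assignCodes(int[] len) {
        int[] bits = new int[16];
        for (int n = 0; n < len.length; n++) {
            if (len[n] > 0)
                bits[len[n]]++;
        }

        int[] start = new int[16];
        int c = 0;
        for (int n = 0; n < 16; n++) {
            start[n] = c;
            c = (c + bits[n]) << 1;
        }

        int[] code = new int[len.length];
        for (int n = 0; n < len.length; n++) {
            if (len[n] > 0)
                code[n] = start[len[n]]++;
        }
        return code;
    }

    // Write each symbol's code most significant bit first.
    static String encode(String text, int[] len, int[] code) {
        StringBuilder sb = new StringBuilder();
        for (char ch : text.toCharArray()) {
            for (int b = len[ch] - 1; b >= 0; b--)
                sb.append((code[ch] >> b) & 1);
        }
        return sb.toString();
    }

    // Collect bits until they match a symbol of that exact length. This
    // works because no code is a prefix of another.
    static String decode(String bitText, int[] len, int[] code) {
        StringBuilder sb = new StringBuilder();
        int value = 0;
        int n = 0;
        for (char bit : bitText.toCharArray()) {
            value = (value << 1) | (bit - '0');
            n++;
            for (int s = 0; s < len.length; s++) {
                if (len[s] == n && code[s] == value) {
                    sb.append((char) s);
                    value = 0;
                    n = 0;
                    break;
                }
            }
        }
        return sb.toString();
    }

    // Bit lengths for the "go go gophers" example as listed above.
    static int[] gophersLengths() {
        int[] len = new int[286];
        len[0x20] = 3; len[0x65] = 3; len[0x67] = 2; len[0x68] = 4;
        len[0x6f] = 2; len[0x70] = 4; len[0x72] = 4; len[0x73] = 4;
        return len;
    }

    public static void main(String[] args) {
        int[] len = gophersLengths();
        int[] code = assignCodes(len);
        String bits = encode("go go gophers", len, code);
        System.out.println(bits + " (" + bits.length() + " bits)");
        System.out.println(decode(bits, len, code));
    }
}
```

A real decoder would use the per-length counts and starting codes to avoid searching the whole alphabet for every bit, but the search keeps the sketch short.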

Curious? Here’s what I get using jniorboot.log. The content of that file has changed by the way as I have rebooted the JNIOR during the course of this topic. Here the LZ77 compression has also been re-enabled. The program however does not yet generate the bit stream compressed with these Huffman codes.

bruce_dev /> jtest jniorboot.log
bit length 4 count 3 first code 0000
bit length 5 count 8 first code 00110
bit length 6 count 20 first code 011100
bit length 7 count 22 first code 1100000
bit length 8 count 11 first code 11101100
bit length 9 count 18 first code 111101110

0x004 '.' count=1 optimal bits 9 111101110
0x00a '.' count=8 optimal bits 6 011100
0x00b '.' count=12 optimal bits 5 00110
0x00c '.' count=5 optimal bits 7 1100000
0x00d '.' count=5 optimal bits 7 1100001
0x00e '.' count=6 optimal bits 6 011101
0x00f '.' count=3 optimal bits 8 11101100
0x010 '.' count=5 optimal bits 7 1100010
0x011 '.' count=7 optimal bits 6 011110
0x012 '.' count=4 optimal bits 7 1100011
0x013 '.' count=4 optimal bits 7 1100100
0x020 ' ' count=41 optimal bits 4 0000
0x028 '(' count=1 optimal bits 9 111101111
0x029 ')' count=1 optimal bits 9 111110000
0x02c ',' count=6 optimal bits 6 011111
0x02d '-' count=5 optimal bits 7 1100101
0x02e '.' count=7 optimal bits 6 100000
0x02f '/' count=3 optimal bits 8 11101101
0x030 '0' count=18 optimal bits 5 00111
0x031 '1' count=15 optimal bits 5 01000
0x032 '2' count=9 optimal bits 6 100001
0x033 '3' count=8 optimal bits 6 100010
0x034 '4' count=9 optimal bits 6 100011
0x035 '5' count=5 optimal bits 7 1100110
0x036 '6' count=4 optimal bits 7 1100111
0x037 '7' count=5 optimal bits 7 1101000
0x038 '8' count=7 optimal bits 6 100100
0x039 '9' count=8 optimal bits 6 100101
0x03a ':' count=9 optimal bits 6 100110
0x041 'A' count=4 optimal bits 7 1101001
0x042 'B' count=2 optimal bits 8 11101110
0x043 'C' count=2 optimal bits 8 11101111
0x044 'D' count=1 optimal bits 9 111110001
0x045 'E' count=2 optimal bits 8 11110000
0x046 'F' count=1 optimal bits 9 111110010
0x047 'G' count=3 optimal bits 7 1101010
0x048 'H' count=1 optimal bits 9 111110011
0x049 'I' count=2 optimal bits 8 11110001
0x04a 'J' count=1 optimal bits 9 111110100
0x04d 'M' count=1 optimal bits 9 111110101
0x04e 'N' count=4 optimal bits 7 1101011
0x04f 'O' count=3 optimal bits 7 1101100
0x050 'P' count=5 optimal bits 7 1101101
0x052 'R' count=3 optimal bits 7 1101110
0x053 'S' count=4 optimal bits 7 1101111
0x054 'T' count=3 optimal bits 7 1110000
0x055 'U' count=1 optimal bits 9 111110110
0x057 'W' count=1 optimal bits 9 111110111
0x05b '[' count=1 optimal bits 9 111111000
0x05d ']' count=1 optimal bits 9 111111001
0x061 'a' count=8 optimal bits 6 100111
0x062 'b' count=5 optimal bits 7 1110001
0x063 'c' count=7 optimal bits 6 101000
0x064 'd' count=9 optimal bits 6 101001
0x065 'e' count=29 optimal bits 4 0001
0x066 'f' count=1 optimal bits 9 111111010
0x067 'g' count=2 optimal bits 8 11110010
0x068 'h' count=2 optimal bits 8 11110011
0x069 'i' count=14 optimal bits 5 01001
0x06b 'k' count=2 optimal bits 8 11110100
0x06c 'l' count=10 optimal bits 6 101010
0x06d 'm' count=6 optimal bits 6 101011
0x06e 'n' count=9 optimal bits 6 101100
0x06f 'o' count=17 optimal bits 5 01010
0x070 'p' count=5 optimal bits 7 1110010
0x072 'r' count=17 optimal bits 5 01011
0x073 's' count=11 optimal bits 6 101101
0x074 't' count=15 optimal bits 5 01100
0x075 'u' count=8 optimal bits 6 101110
0x076 'v' count=4 optimal bits 7 1110011
0x077 'w' count=2 optimal bits 8 11110101
0x079 'y' count=5 optimal bits 7 1110100
0x07a 'z' count=1 optimal bits 9 111111011
0x100 '.' count=1 optimal bits 9 111111100
0x101 '.' count=28 optimal bits 4 0010
0x102 '.' count=6 optimal bits 6 101111
0x103 '.' count=2 optimal bits 8 11110110
0x105 '.' count=1 optimal bits 9 111111101
0x10d '.' count=13 optimal bits 5 01101
0x10e '.' count=4 optimal bits 7 1110101
0x10f '.' count=1 optimal bits 9 111111110
0x110 '.' count=1 optimal bits 9 111111111

Processing 9.181 seconds.
Source 954 bytes.
Result 680 bytes.
Ratio 1.40:1

bruce_dev /> 

With the Huffman coding for DEFLATE there is one thing that we need to worry about. We need to limit the bit length to a maximum of 15. I suspect that exceeding it is not very likely. The limit exists because the bit length list for the alphabet is run-length encoded using a procedure that can only handle bit lengths of 0 to 15. Codes of 16, 17 and 18 are used to signal certain types of repetition. This is where I got the ’19’ I use in my breadboard code to dimension the bit length arrays. That need only be ’16’ as the 3 additional are repetition codes. If the Huffman coding results in a bit length exceeding 15 I will simply have to decrease the size of the block we are encoding until we are good to go.

So now that we have the ability to reasonably compress our data using LZ77 and then to compress it further with Huffman coding, we are ready to generate the DEFLATE formatted payload. It is time to tackle the “craziness” and “shorthand” that I have referred to. We can apply the run-length encoding and the second iteration of Huffman coding that are required. We are ready to generate the DEFLATE format compressed bit stream.

JANOS has been able to process JAR/ZIP files since early in its development. This was required to meet our goal of executing Java directly from the JAR files generated by the compiler. So we have been able to decipher the DEFLATE format. I just hadn’t needed to generate it. But the advantage to this is that there is already proven code parsing the DEFLATE structure. Referring to that helps to remove any question when trying to figure out how to generate such a structure.

Rather than drop this topic now that I have the LZ77 and Huffman procedures that I need, I’ll take a moment to review the final steps. Let me see if I can clarify some of it here. For our Java breadboard to produce something usable I would not only have to complete the DEFLATE programming but also encapsulate the result in the JAR/ZIP library format. That’s more effort than I need given that I will be doing just that at the C level within the OS and I need to get to that soon.

DEFLATE Formatted Data

When file data is compressed using DEFLATE and included in a JAR/ZIP library it is represented as a bit stream. That sounds simple enough but because our micro-controller works with bytes and retrieves bytes from a file data stream we need to be concerned with bit order and byte order.

Normally bits in a byte are ordered from right to left with the least significant bit (the first in the bit stream) on the right. Like this:

+--------+
|76543210|
+--------+

The 9th bit in the stream then comes from the next byte and bytes are in sequence in increasing memory addresses (or increasing file position). This order of bits seems only natural as it should.

So if we were to retrieve a 5-bit value from the stream we would obtain the right 5 bits from the first byte using the mask 0x1f. Placing that in a byte of its own would give us the numeric value of the 5-bit data element. The next 5-bit element would use the remaining 3 bits in the first byte and 2 from the right side of the next. We would likely be using a right shift before applying the mask to pull those together.
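To make the shift-and-mask idea concrete, here is a minimal sketch of a bit reader that pulls ordinary data elements from the stream, least-significant bit first. The bitreader structure and get_bits() are hypothetical names for illustration only. They are not the JANOS bitstream_t code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state (not the JANOS bitstream_t structure). */
struct bitreader {
	const uint8_t *data;   /* source bytes */
	size_t size;           /* number of bytes in data */
	size_t bitpos;         /* index of the next bit in the stream */
};

/* Returns the next 'count' bits (1..16) as a number. The first bit read
** becomes the least significant bit of the result. Bits beyond the end
** of the data read as 0. */
static uint32_t get_bits(struct bitreader *br, int count)
{
	uint32_t value = 0;
	int n;

	assert(count >= 1 && count <= 16);
	for (n = 0; n < count; n++)
	{
		size_t byte = br->bitpos >> 3;      /* which byte */
		int bit = (int)(br->bitpos & 7);    /* which bit, 0 = rightmost */

		if (byte < br->size && ((br->data[byte] >> bit) & 1))
			value |= (uint32_t)1 << n;
		br->bitpos++;
	}
	return value;
}
```

Reading two 5-bit elements in a row from the bytes 0xB5, 0x02 takes the right 5 bits of the first byte and then the remaining 3 bits plus 2 bits from the next byte. A production reader would shift and mask whole groups of bits rather than loop bit by bit, but the bit order is the same.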

Huffman codes will seem to contradict this. These codes are packed starting with the most-significant bit of the code. In other words the most significant bit of the first Huffman code would be found in the rightmost bit of the first byte. Once you realize that you must process Huffman codes a bit at a time and that you are reading single bit data elements this order makes sense. Pointing out that the Huffman code appears in the byte in reverse order serves to confuse us. But you are reading it a single bit at a time using each bit to decide which direction to descend through the Huffman tree. That means that you need the code’s most-significant bit first. We also never know in advance how many bits we are going to need to reach the first leaf of the tree and thus our first encoded member of the alphabet.
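The difference between the two packing orders is easier to see from the writing side. The following is only a sketch with hypothetical names (bitwriter, put_bits, put_code), assuming the output buffer has been zeroed beforehand. Ordinary data elements go out least-significant bit first while Huffman codes go out most-significant bit first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical writer state. The buffer must start out zeroed. */
struct bitwriter {
	uint8_t *data;
	size_t size;
	size_t bitpos;   /* index of the next bit to write */
};

/* Append a single bit, filling each byte from the right (bit 0) up. */
static void put_bit(struct bitwriter *bw, int bit)
{
	size_t byte = bw->bitpos >> 3;

	assert(byte < bw->size);
	if (bit)
		bw->data[byte] |= (uint8_t)(1u << (bw->bitpos & 7));
	bw->bitpos++;
}

/* Ordinary data element (HLIT, extra bits, etc.): LSB goes first. */
static void put_bits(struct bitwriter *bw, uint32_t value, int count)
{
	int n;

	for (n = 0; n < count; n++)
		put_bit(bw, (int)((value >> n) & 1));
}

/* Huffman code: MSB goes first so a decoder can walk the tree. */
static void put_code(struct bitwriter *bw, uint32_t code, int length)
{
	int n;

	for (n = length - 1; n >= 0; n--)
		put_bit(bw, (int)((code >> n) & 1));
}
```

Writing the 3-bit pattern 110 with put_code() and then again with put_bits() leaves 0x33 in the first byte. The same three bits land in opposite orders.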

Block Format

The DEFLATE bit stream is broken down into a series of 1 or more blocks of bits. In our case when we flush our 64KB LZ77 data buffer we are going to construct a single block. Since we are compressing it you would expect that it will contain much less than 512Kb (64KB x 8 bits). For large files we will likely need to flush our buffer multiple times creating a stream with multiple blocks.

Each block contains a 3-bit header. The first bit is called BFINAL and it is a 1 only for the last block in the stream. The next 2 bits are called BTYPE and these define the data encoding for this block. That means that we could use a different encoding for each block if we felt it to be beneficial. The 2 BTYPE bits give us 4 options.

  00 - no compression
  01 - compressed with fixed Huffman codes
  10 - compressed with dynamic Huffman codes
  11 - reserved (error)

We have been working toward being able to generate blocks of BTYPE 10. We could have used the predefined Huffman tables in BTYPE 01 or not even bothered to compress using BTYPE 00. There may be times when our compression fails to reduce the size of our file data. In that case we could decide to include a block without compressing. But for now we will concern ourselves with BTYPE 10 that includes dynamic (not adaptive) Huffman tables (tables that change block to block).

So to start this defines our block so far. I’ll show the stream progressing from left to right with the number of bits for each element shown in parentheses.

 Bit 0         1         2         3         4         5         6         7
+---------+---------+---------+---------+---------+---------+---------+---------
| BFINAL  |     BTYPE (2)     |        Balance of Block . . .
+---------+---------+---------+---------+---------+---------+---------+---------

BFINAL set on the last block in the stream.

Now logically we know that somehow we need to get the Huffman table before we see compressed data so we can decompress that data. We also have seen that the Huffman table can be defined knowing the bit lengths for each of the symbols in our alphabet. In fact we expect that, since I made a big deal about having the right kind of Huffman table for DEFLATE that can be generated from just the bit lengths. So we are looking for that array of bit lengths. Here’s where the fun begins.

To start we know that not all of the 286 members of the alphabet will be used in the data. Some entries in that table will be assigned a 0 bit length. We have to assume that the majority of the literal bytes (0..255) will be represented in the data. We also know that the end-of-block code (256) will appear once. So we need an array of bit lengths at least 257 entries long. Beyond that we don’t know how many of the sequence length codes (257..285) will be used. But if some of the trailing codes aren’t used then the array doesn’t need to be 286 entries long. We just need the non-zero bit lengths. So this array will be 257 plus however many length codes we need to cover all of the non-zero ones.

The next element in the bit stream is HLIT. This is a 5-bit element that when added to 257 defines the count of entries in the bit length array that will be provided. We will assume that the rest are 0. Since there are 29 length codes beyond the end-of-block code we need only know how many of those to know the size of the array. That can be passed in a 5-bit element.

       Bit 0         1         2         3         4         5         6         7
      +---------+---------+---------+---------+---------+---------+---------+---------+-----
      | BFINAL  |     BTYPE (2)     |                     HLIT (5)                    |
      +---------+---------+---------+---------+---------+---------+---------+---------+-----

HLIT + 257 tells us how many of the 286 bit lengths will be provided. But, don’t expect that those will be forthcoming. At least not right away.

Next comes something that the breadboard program handled incorrectly. The distance codes for the length-distance pointers are Huffman coded using their own table. The length codes are Huffman coded using the same table as the literal data. This makes sense since when you retrieve the next code you don’t know if it is a literal or a length code. You do know that after the length code (and any extra bits) comes the distance code. So compressing those with their own table is a benefit. The DEFLATE specification (RFC 1951) states this but it isn’t all that obvious.

So wait! Now we need 2 Huffman tables and therefore 2 arrays of bit lengths. Yes we do.

Next we receive a 5-bit data element HDIST containing the number of distance codes minus 1. This defines the number of bit lengths to be supplied for the distance alphabet and the second Huffman table. The specification shows that there are 30 distance codes (0..29) but refers to 32 distance codes. It states that codes 30 and 31 will never occur in the data. These are perhaps reserved to allow for a larger sliding window in the future. At least one bit length is always supplied. A single distance code with a 0 bit length is used as a flag to indicate that no distance codes are used at all and that the data is all literals. So to pass a bit length array size of 1 to 32 in 5 bits the value is stored -1.

       Bit 0         1         2         3         4         5         6         7
      +---------+---------+---------+---------+---------+---------+---------+---------+-----
      | BFINAL  |     BTYPE (2)     |                     HLIT (5)                    |
      +---------+---------+---------+---------+---------+---------+---------+---------+-----
           8         9        10        11        12        13        14        15
 -----+---------+---------+---------+---------+---------+---------+---------+---------+-----
      |                    HDIST (5)                    |
 -----+---------+---------+---------+---------+---------+---------+---------+---------+-----
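Turning this around for the compressor, the HLIT and HDIST values fall out of trimming the trailing zero bit lengths from each array. This is a small sketch, assuming the bit lengths are held in plain byte arrays. The helper name count_used() is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns how many entries of 'lengths' must be sent: everything up to
** the last non-zero bit length, but never fewer than 'minimum'. */
static int count_used(const uint8_t *lengths, int total, int minimum)
{
	int count = total;

	assert(minimum >= 0 && minimum <= total);
	while (count > minimum && lengths[count - 1] == 0)
		count--;
	return count;
}

/* Usage (array names are assumptions):
**   hlit  = count_used(litlen_bits, 286, 257) - 257;   5-bit field
**   hdist = count_used(dist_bits,    30,   1) -   1;   5-bit field
*/
```

The minimum of 257 for the literal/length array reflects that the end-of-block code is always present, and the minimum of 1 for the distance array covers the all-literals case.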

Now we know the length of two arrays defining bit lengths for two alphabets. I had alluded to the fact that we would again use Huffman coding to compress the bit length array data. That is the case and so we need yet another Huffman table and therefore a third array of bit lengths. Will it never end??

The bit length alphabet includes 3 codes for repetition for a total of 19 codes. The size of the bit length array for this is conveyed in a 4-bit data element HCLEN, which holds the count minus 4. Note though that this array defines bit lengths for the codes in a very specific order. This was devised to keep those codes that typically have 0 bit length at the end of the array so they can be omitted. The order of the codes is as follows:

16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15

That means that we will have to arrange the bit lengths in this order before deciding how many to include in our stream. When reading these we would shuffle them into the proper location.
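On the compression side that shuffle might look something like the following. This is a sketch under the assumption that the 19 bit lengths are indexed by symbol (0..18). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Transmission order of the 19 code length symbols per RFC 1951. */
static const uint8_t clorder[19] = {
	16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
};

/* cl[] holds the bit length for each symbol 0..18. Fills ordered[] in
** transmission order and returns how many entries need to be sent
** (4..19). The HCLEN field is that count minus 4. */
static int order_code_lengths(const uint8_t cl[19], uint8_t ordered[19])
{
	int n, count = 19;

	for (n = 0; n < 19; n++)
	{
		assert(cl[clorder[n]] <= 7);   /* must fit in 3 bits */
		ordered[n] = cl[clorder[n]];
	}

	/* trim trailing zeroes but always send at least 4 */
	while (count > 4 && ordered[count - 1] == 0)
		count--;
	return count;
}
```

A decoder does the reverse: it reads the count, then stores the n-th 3-bit value it receives at index clorder[n] and leaves the rest at 0.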

Our bit stream now looks like this.

       Bit 0         1         2         3         4         5         6         7
      +---------+---------+---------+---------+---------+---------+---------+---------+-----
      | BFINAL  |     BTYPE (2)     |                     HLIT (5)                    |
      +---------+---------+---------+---------+---------+---------+---------+---------+-----
           8         9        10        11        12
 -----+---------+---------+---------+---------+---------+-----
      |                    HDIST (5)                    |
 -----+---------+---------+---------+---------+---------+-----
          13        14        15        16
 -----+---------+---------+---------+---------+-----
      |               HCLEN (4)               |
 -----+---------+---------+---------+---------+-----
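As an illustration of the layout above, this sketch pulls the five header fields out of the first 17 bits of a dynamic block. The structure and function names are hypothetical, and it assumes the block starts on a byte boundary, which is only guaranteed for the first block in a stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the decoded header fields. */
struct dynheader {
	int bfinal;   /* 1 on the last block */
	int btype;    /* 2 for dynamic Huffman */
	int nlit;     /* HLIT + 257 */
	int ndist;    /* HDIST + 1 */
	int nclen;    /* HCLEN + 4 */
};

/* Read 'count' bits LSB first starting at *bitpos. */
static uint32_t take_bits(const uint8_t *data, size_t *bitpos, int count)
{
	uint32_t value = 0;
	int n;

	for (n = 0; n < count; n++)
	{
		if ((data[*bitpos >> 3] >> (*bitpos & 7)) & 1)
			value |= (uint32_t)1 << n;
		(*bitpos)++;
	}
	return value;
}

/* Returns the number of bits consumed (17). The last three fields are
** only meaningful when btype is 2. */
static size_t read_dynamic_header(const uint8_t *data, size_t size,
	struct dynheader *h)
{
	size_t bitpos = 0;

	assert(size >= 3);   /* 17 bits span 3 bytes */
	h->bfinal = (int)take_bits(data, &bitpos, 1);
	h->btype = (int)take_bits(data, &bitpos, 2);
	h->nlit = (int)take_bits(data, &bitpos, 5) + 257;
	h->ndist = (int)take_bits(data, &bitpos, 5) + 1;
	h->nclen = (int)take_bits(data, &bitpos, 4) + 4;
	return bitpos;
}
```

For example the bytes 0x25, 0x24, 0x00 describe a final dynamic block with 261 literal/length bit lengths, 5 distance bit lengths and 5 code length entries.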

Well there is a lot of information so far conveyed in just over 2 bytes (17 bits). Next we will finally receive some bit length data. Following this we are supplied HCLEN + 4 data elements each 3 bits (0..7) defining the code lengths for the Huffman table that we will use to generate bit length arrays for the other tables. Note that these are in the predefined order and must be sorted into the bit lengths for the 19 symbol alphabet. Now there are a variable number of these data elements and so I will no longer be able to number the bits nor can I show them all. However many are required, HCLEN + 4 data elements follow in the stream.

          13        14        15        16        17        18        19   . . . 
 -----+---------+---------+---------+---------+---------+---------+---------+-----
      |               HCLEN (4)               |          CODE16 (3)         |
 -----+---------+---------+---------+---------+---------+---------+---------+-----

 -----+---------+---------+---------+---------+---------+---------+-----
      |          CODE17 (3)         |          CODE18 (3)         |
 -----+---------+---------+---------+---------+---------+---------+-----

 -----+---------+---------+---------+---------+-----
      |          CODE0 (3)          |       . . . 
 -----+---------+---------+---------+---------+-----

We now have a bit length array for our 19 symbol alphabet. At least we are familiar with this from the breadboard program. We can use the procedure to generate the Huffman code table. First we count the number of times each bit length is used. Then we calculate a starting code for each length. And then we assign sequential codes to the alphabet for each bit length group. So we can decipher Huffman codes. It’s a good thing too because now what follows in the bit stream are Huffman codes from this table.
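The three steps just described follow the procedure given in RFC 1951. Here is a compact sketch of it in C. The function name assign_codes() is my own for this illustration and is not taken from the breadboard or from JANOS.

```c
#include <assert.h>
#include <stdint.h>

#define MAXBITS 15

/* Given a bit length for each symbol (0 = unused), assign the Huffman
** codes that DEFLATE expects. codes[n] is only meaningful where
** lengths[n] is non-zero. */
static void assign_codes(const uint8_t *lengths, int count, uint16_t *codes)
{
	int bl_count[MAXBITS + 1] = {0};
	uint16_t next_code[MAXBITS + 1];
	uint16_t code = 0;
	int bits, n;

	/* 1. count the number of times each bit length is used */
	for (n = 0; n < count; n++)
	{
		assert(lengths[n] <= MAXBITS);
		bl_count[lengths[n]]++;
	}
	bl_count[0] = 0;

	/* 2. calculate the starting code for each bit length */
	next_code[0] = 0;
	for (bits = 1; bits <= MAXBITS; bits++)
	{
		code = (uint16_t)((code + bl_count[bits - 1]) << 1);
		next_code[bits] = code;
	}

	/* 3. assign sequential codes within each bit length group */
	for (n = 0; n < count; n++)
	{
		if (lengths[n] != 0)
			codes[n] = next_code[lengths[n]]++;
		else
			codes[n] = 0;
	}
}
```

Using the example from the specification, the bit lengths 3, 3, 3, 3, 3, 2, 4, 4 produce the codes 010, 011, 100, 101, 110, 00, 1110 and 1111.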

Keeping up? We are talking about what is involved in decompressing a DEFLATE stream but really we are doing this so we know what to do when we compress our own data. So now is the time to consider dropping some bread crumbs because if you are going to create a DEFLATE stream you will need to find your way back through this.

Supposedly at this point we can read Huffman codes, locate the symbol they represent and retrieve the bit length array data we were originally looking for. Well almost. While some of the values we obtain will be bit lengths for the next entry in a bit length array there are 3 special codes each requiring a different action. The specification describes them as follows.

               0 - 15: Represent code lengths of 0 - 15
                   16: Copy the previous code length 3 - 6 times.
                       The next 2 bits indicate repeat length
                             (0 = 3, ... , 3 = 6)
                          Example:  Codes 8, 16 (+2 bits 11),
                                    16 (+2 bits 10) will expand to
                                    12 code lengths of 8 (1 + 6 + 5)
                   17: Repeat a code length of 0 for 3 - 10 times.
                       (3 bits of length)
                   18: Repeat a code length of 0 for 11 - 138 times
                       (7 bits of length)

Here we see that when we encounter one of the codes 16, 17 or 18 we are required to pull 2, 3 or 7 additional bits respectively from the bit stream which define a repeat count.

First we will receive Huffman codes to define HLIT + 257 bit lengths for the literal Huffman table. Then we will receive codes defining HDIST + 1 bit lengths for the distance Huffman table. But don’t think the fun ends here.

HLIT and HDIST do not define the count of Huffman codes that follow. If you obtain a code that repeats a value that counts for that many bit lengths. That perhaps makes sense. But to just make things a little trickier, once you acquire the HLIT + 257 codes you immediately start defining the HDIST + 1 codes even if you are performing repetition. Yeah, a single repeat code can take you from one table to the next. If you are repeating some 0 bit lengths trailing in the HLIT bit length array you would just keep going to define any 0 bit lengths in the first part of the HDIST array. The specification says “code lengths form a single sequence of HLIT + HDIST + 258 values.”
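Here is a sketch of that expansion treated as one continuous sequence. To keep it short it assumes the Huffman decoding and the reading of the extra bits have already been done, so each input entry is a symbol (0..18) paired with its extra-bits value. The clsym structure and expand_lengths() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded code length symbol with its extra bits value (if any). */
struct clsym {
	uint8_t sym;     /* 0..15 literal bit length, 16/17/18 repeat codes */
	uint8_t extra;   /* value of the 2, 3 or 7 extra bits, else 0 */
};

/* Expands the symbols into 'out', which holds 'total' entries
** (HLIT + HDIST + 258). Returns the number of bit lengths produced or
** -1 if the sequence is invalid. */
static int expand_lengths(const struct clsym *in, int count,
	uint8_t *out, int total)
{
	int n, pos = 0;

	assert(in != NULL && out != NULL);
	for (n = 0; n < count; n++)
	{
		int repeat = 1;
		uint8_t value = in[n].sym;

		if (in[n].sym == 16)
		{
			if (pos == 0)
				return -1;             /* nothing to copy yet */
			value = out[pos - 1];      /* copy previous length */
			repeat = 3 + in[n].extra;  /* 3..6 times */
		}
		else if (in[n].sym == 17)
		{
			value = 0;
			repeat = 3 + in[n].extra;  /* 3..10 zeroes */
		}
		else if (in[n].sym == 18)
		{
			value = 0;
			repeat = 11 + in[n].extra; /* 11..138 zeroes */
		}
		else if (in[n].sym > 18)
			return -1;

		if (pos + repeat > total)
			return -1;                 /* ran past both tables */
		while (repeat-- > 0)
			out[pos++] = value;
	}
	return pos;
}
```

Once the array is filled the first HLIT + 257 entries belong to the literal/length table and the remaining HDIST + 1 entries to the distance table. Because the split happens afterward, a repeat that crosses from one table into the other needs no special handling.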

When you are generating these Huffman codes of course you don’t have to force it to be a single sequence. You might just be wasting a few bits. Today that’s not a big deal but it certainly must have been 40 years ago.

So start pulling Huffman codes. Remember you process these 1 bit at a time so you are starting with the most-significant bit of some code. With each bit you are either descending through a tree looking for a leaf and the corresponding symbol or otherwise collecting the code looking for a match in a code list. The former is faster but the latter easier to structure in memory (no tree). You proceed to process each symbol to define a bit length or repetition code.

Now you have 2 tables of bit lengths and you know how to generate the Huffman codes for the associated alphabets. What follows next in our bit stream are the actual Huffman codes for the compressed data block. Each Huffman code will either define a literal value (byte) or a length code. The byte you just push into your uncompressed stream and the sliding window. In the case of a length code you would retrieve the additional bits defining the length of a matched sequence. You would then use the distance Huffman table for the next code that together with extra bits defines a distance back into the sliding window. Push the referenced string into your uncompressed stream and the sliding window. This is repeated until you encounter the end-of-block code (256). If BFINAL was set for the block you can save your now uncompressed data and you are done. Otherwise another block will follow.
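One detail worth showing is the step that pushes the referenced string into the output. The length may exceed the distance, in which case the copy overlaps bytes it has only just written, so it has to proceed a byte at a time. This sketch assumes the output buffer itself doubles as the sliding window. The name copy_match() is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies 'length' bytes starting 'distance' bytes back from the current
** output position 'pos'. Returns the new output position. The caller
** must make sure the buffer has room for pos + length bytes. */
static size_t copy_match(uint8_t *out, size_t pos, size_t distance,
	size_t length)
{
	assert(distance >= 1 && distance <= pos);

	/* byte-by-byte on purpose: source and destination may overlap */
	while (length-- > 0)
	{
		out[pos] = out[pos - distance];
		pos++;
	}
	return pos;
}
```

With "ab" already in the output, a pointer of length 6 and distance 2 yields "abababab". A block copy such as memcpy() would not be safe here.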

Now we follow this logic backwards to figure out for our own compression effort what we need to do to generate the proper DEFLATE format for our data.

Okay, feel free to post questions, comments, corrections or whatever. I would be curious to know if I have helped anyone. I have written these posts as a form of review and preparation for myself. I am now ready to generate some C code in JANOS to make perhaps broad use of DEFLATE.

  • This will allow the existing JAR/ZIP JANOS command to create or modify compressed file collections. That will be helpful in applications that generate massive amounts of log data.
  • It will let JANOS create PNG graphics using drawing and plotting commands. This will allow us to easily display the data acquired by monitoring applications.
  • The WebServer can utilize DEFLATE to more efficiently transfer content. It can really help here as we already serve files (the DCP for example) directly out of a ZIP library. Whereas we presently decompress those files and send uncompressed content, the WebServer could forward the DEFLATE formatted data directly providing a bandwidth benefit.

The JANOS WebServer uniquely can locate and serve content directly from ZIP file collections. Generally files in a ZIP collection are compressed in DEFLATE format already. The WebServer can detect a browser’s ability to accept content in DEFLATE format directly and transfer the compressed content directly. Why spend the time to decompress before transfer?

Since this does not involve a DEFLATE compressor it was quick to implement. Starting with JANOS v1.6.4 (now in Beta) the WebServer will utilize DEFLATE content encoding when transferring files already compressed in that format provided that the browser can accept it. It works nicely.

I’ve been busy extending the JAR/ZIP command in JANOS to allow you to create, update and freshen an archive file. The first step was to just store files uncompressed. Getting all of that command line logic straight is enough of a headache. Once that was behind me I was ready to implement DEFLATE.

One difference in the JANOS implementation from the approach taken earlier in this thread is that I am going to work with buffers as opposed to a byte stream. Since the JNIOR uses a 64MB heap and generally consumes only a few MB of it I can load an entire file’s content in a memory buffer. Yeah, files on the JNIOR aren’t very large. This eliminates the queue approach to the sliding window. That helps with matching as it eliminates any need to test pointers for wrap around.

Where I used a bidirectional linked list before tracking occurrences of each byte value in the sliding window, I have gravitated to a series of queues tracking the last 256 matching bytes (or fewer if that be the case) in the 32KB preceding window. There is also no need now to keep any array of active matches since we are not streaming. So a pass through the appropriate queue for the current byte generally delivers a usable match and limits the search times in interest of speed. I get results within 2% or so of what the PC generates for the same file. This is certainly acceptable for JANOS.

I use the code length generator algorithm that I devised previously so as to not have to generate any Huffman tree physically. This for certain test files tended to hit the DEFLATE maximum code length limits. So I will describe a variation that seems to avoid that problem.

I will go through the implementation details here over the next day or so and maybe share some of the code. Remember that I am writing in C so…

DEFLATE Compressor Implementation (C Language)

The goal is to compress using DEFLATE a byte buffer filled with the entire contents of a file or other data. I want a simple function call like this:

FZIP_deflate(fbuff, filesize, &cbuff, &csize)

This function needs to return two things, a buffer containing the DEFLATE formatted bit stream and the compressed length. There are other ways to return these parameters but for JANOS which has to remain thread-safe (cannot use static variables) this works. This routine returns TRUE if we can successfully compress the data. It might return FALSE if in compressing the results we decide it isn’t going to be worth it. This function is used as follows in the JAR/ZIP command in JANOS.

				// optionally compress
				compressed = FALSE;
				compsize = filesize;
				if (filesize >= 128)
				{
					char *cbuff = NULL;
					uint32_t csize;
 
					// compress the content
					if (FZIP_deflate(fbuff, filesize, &cbuff, &csize))
					{
						compressed = TRUE;
						compsize = csize;
						MM_free(fbuff);
						fbuff = cbuff;
					}
					else
						MM_free(cbuff);
				}

Here we see that successful compression replaces the uncompressed buffer and modifies the compressed data size. It sets the compressed flag so the file can be properly saved in the JAR or ZIP archive. Like magic we have a compressed file!

Preface

Why am I doing this? I mean there is code out there and people have been able to construct archives and compress files literally for multiple decades. Why reinvent the wheel?

Well there are multiple reasons. First is a design goal for JANOS. This operating system uses no third-party written code. That sounds crazy but what it means is that there is no bug or performance that we cannot correct, change, improve or alter. And, this can be done quickly and in a timely fashion. Every bit of code is understood, clearly written and documented. It is written for a single target and not littered with conditional compilation blocks which obfuscate everything. If you support an operating system you might see how you could be envious of this.

Another reason is educational. Now that maybe is selfish but if I am going to be able to fully debug something I need to fully understand it. We cannot tolerate making what seems like a simple bug correction which later turns out to break some other part of the system. The only way to guarantee that this risk is minimized is for me to know everything that is going on and exactly what is going on. Yeah there is a JVM in here. Yeah it does TLS v1.2 encryption. It’s been fun.

The real problem though is that it is difficult to find good and complete technical information on the net. Yes there is RFC 1951 defining DEFLATE. It does not tell you how to do it. It just tells you what you need to do. And, some aspects of it are not clear until you encounter it (or recreate it) in action. It describes LZ77 but you don’t realize that this is very difficult to implement and not have it take 5 minutes to compress 100 KB.

There are numerous web pages discussing DEFLATE and some by reputable universities. These usually include a good discussion on Huffman coding. Yet I have found none that creates a Huffman table that actually meets the additional requirements for DEFLATE. If you are going to describe Huffman in connection with DEFLATE, shouldn’t it be compatible? Would it have helped if you actually had implemented DEFLATE before describing it?

The procedure to create a compatible Huffman tree is not described anywhere that I have found. Most don’t even mention that you need 3 separate Huffman tables (one for literals, one for distance codes and one for code lengths) and that there is a limit of a 15 bit code length for 2 of the tables and 7 bit for the third. Then they say only to “adjust the Huffman table” accordingly. So there is no procedure for generating a less than optimum Huffman tree meeting DEFLATE restrictions. I had to get creative myself.

Enough of my rant. The result really is that I have had to reinvent the wheel. I am not the only one to have done so. I am going to try to document it here for your edification.

Overview

First let me greatly over-simplify the process and provide an outline for the compression procedure.

  1. Perform efficient LZ77 scanning the uncompressed data byte-by-byte filling a 64 KB interim buffer with raw unmatched literal bytes and escaped length-distance references.
  2. Scan the interim LZ77 data creating two DEFLATE compatible Huffman tables, one for literals and length codes combined and one for distance codes.
  3. Assign code lengths (15 bits max) to the used alphabet for both tables.
  4. Determine the size of the alphabet required for the literal/length and distance Huffman tables.
  5. Combine code lengths into a single run-length compressed array.
  6. Determine the DEFLATE compatible Huffman table needed to code this compressed code length array.
  7. Assign code lengths (7 bits max) to the used alphabet for code lengths.
  8. Sort the resulting code lengths for this 3rd Huffman table into the unique order specified for DEFLATE and determine the length of the array (trim trailing zeroes).
  9. Output the block header marking only the last as BFINAL and output the alphabet sizes.
  10. Output the reordered code lengths.
  11. Output the Huffman codes compressing the run-length encoded combined code length array. Insert extra bits where required.
  12. Output the Huffman codes compressing the LZ77 data. Use the literal table for literals and sequence lengths. Use the distance table for distance codes. Insert extra bits where required.
  13. Output end-of-block code.
  14. If not the final block keep the bit stream going and continue LZ77 at step #1.

Uh. That about summarizes it. All we can do is to push through this step by step. It amounts to two phases. The first compresses the data using LZ77 and determines the Huffman coding requirements. The second outputs a bit stream encoding the Huffman tables and then the actual data. We first determine everything that we need for the bit stream and then generate the bit stream itself.
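Step 5 of the outline is the least obvious, so here is one possible sketch of run-length compressing the combined code length array into symbols 0..18. It is a simple greedy approach and not necessarily what JANOS does. The clsym structure and rle_lengths() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One code length symbol with the value for its extra bits (if any). */
struct clsym {
	uint8_t sym;     /* 0..15 bit length, 16/17/18 repeat codes */
	uint8_t extra;   /* value for the 2, 3 or 7 extra bits, else 0 */
};

/* Greedy run-length coding of 'count' bit lengths. 'out' must have room
** for 'count' entries. Returns the number of symbols produced. */
static int rle_lengths(const uint8_t *len, int count, struct clsym *out)
{
	int n = 0, used = 0;

	while (n < count)
	{
		int run = 1;

		assert(len[n] <= 15);
		while (n + run < count && len[n + run] == len[n])
			run++;

		if (len[n] == 0 && run >= 3)
		{
			/* run of zeroes: code 17 (3..10) or code 18 (11..138) */
			if (run > 138)
				run = 138;
			out[used].sym = (uint8_t)((run <= 10) ? 17 : 18);
			out[used].extra = (uint8_t)((run <= 10) ? run - 3 : run - 11);
			used++;
			n += run;
		}
		else if (len[n] != 0 && run >= 4)
		{
			/* send the length once then repeat it with code 16 (3..6) */
			out[used].sym = len[n];
			out[used].extra = 0;
			used++;
			n++;
			run--;
			while (run >= 3)
			{
				int rep = (run > 6) ? 6 : run;

				out[used].sym = 16;
				out[used].extra = (uint8_t)(rep - 3);
				used++;
				n += rep;
				run -= rep;
			}
			/* 1 or 2 left over are sent as plain lengths next pass */
		}
		else
		{
			out[used].sym = len[n];
			out[used].extra = 0;
			used++;
			n++;
		}
	}
	return used;
}
```

Twelve bit lengths of 8 come out as 8, 16 (+2 bits 11), 16 (+2 bits 10) which is the example given in the specification. Counting how often each of the 19 symbols appears in the result then drives the third Huffman table in steps 6 and 7.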

Time-Efficient LZ77 Compression

First off let me note that we end up trading off compression ratio for processing speed. As we had experimented earlier in Java we could discover every possible matching sequence for a block of data and then analyze those matches selecting the optimum set. This arguably would create the best compression possible. This is certainly doable if we are running on a multiple core GHz processor and it is coded carefully. Still it would be lengthy and possibly not appropriate even then for some applications. The gain in compression ratio is expected to be only slight and not worth the processing cycles. It is certainly not critical for JANOS. So we will not go down this path.

Another approach is what is called lazy matching. Here we are concerned that a matched sequence might prevent a longer matching sequence starting in the next byte or two from being used. In the analysis it appears that 1 or 2 bytes may be unnecessarily forced into the output stream in these situations. Those may be rare but for certain types of data it could be more of a concern. Again the gain in compression ratio if we were to take the time to perform lazy matching is assumed not to be necessary for JNIOR.

As a result we are just going to go ahead and perform straight up sequence match detection for each byte position in the data. Even with this the amount of processing involved prevents us from using any kind of brute force scanning. Imagine how much processing would be involved if for each byte in the file we have to compare it directly against 32 KB of bytes in the preceding sliding window. For a large file this starts to become a very large number of processor cycles. It gets even worse when bytes match and you need to check following bytes to determine the usability of the sequence.

I had implemented the brute force search with no trickery at first. Small files produced proper LZ77 output in a short time but a 20 KB JSON file took almost 30 seconds (JNIOR remember). A large binary file basically stalled the system. The JSON performed more poorly due to the high occurrence of certain characters such as curly braces, quotes and colons. That code has long been discarded or I would include it here for better understanding.
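For reference, a brute force search along these lines might look like the sketch below. To be clear this is a reconstruction for illustration and not the discarded code. The name brute_match() and its parameters are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tests every position in the preceding window for the longest match
** with the data at 'cur'. Returns the match length (3..258) and sets
** *dist, or returns 0 if no usable match exists. */
static int brute_match(const uint8_t *buf, size_t size, size_t cur,
	size_t window, size_t *dist)
{
	size_t start = (cur > window) ? cur - window : 0;
	size_t best = 0, bestdist = 0;
	size_t p;

	assert(cur < size);

	/* nearest candidates first so ties favor the shorter distance */
	for (p = cur; p-- > start; )
	{
		size_t len = 0;

		while (cur + len < size && len < 258 &&
			buf[p + len] == buf[cur + len])
			len++;

		if (len > best)
		{
			best = len;
			bestdist = cur - p;
		}
	}

	if (best < 3)
		return 0;
	*dist = bestdist;
	return (int)best;
}
```

With a 32 KB window this makes up to 32,768 comparisons for every byte position, plus the extra compares whenever a first byte matches. That is why data with a few very frequent characters, such as the JSON file, performs so badly.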

In the prior Java experimentation I used a linked list to track the occurrences of matching byte values. This eliminated the need to scan the entire 32 KB sliding window saving time. For the JANOS implementation I decided to use a pointer queue for each byte value. Each queue holding a maximum of 256 pointers. So basically we would test only the last 256 matching byte positions for usable sequences. This might miss some good sequence matches deep in the sliding window for very frequently occurring byte values but not for those appearing less often. Again it’s a trade off. It is an interesting approach.

I had figured that I could adjust the depth of these pointer queues. Since there are 256 possible literal values and each with 256 pointers which require 4 bytes this results in a matrix of 65,536 pointers or 256 KB of memory. There is room for that in the JANOS heap. Increasing the depth increases the memory requirement as well as the time spent in sequence detection. I was pleased with the results at a depth of 256. Perhaps later I will conduct some experiments plotting the effects of this parameter on compression ratio and execution time.

I will present the resulting routine in the next post.

Here is the resulting routine. This handles only the LZ77 leaving all of the Huffman to the routine responsible for flushing the interim buffer. Note that I have made both the SLIDING_WINDOW and DEPTH parameters adjustable through the Registry. I will use this later to measure performance.

/* -- function ---------------------------------------------------------------
** FZIP_deflate()
**
** Compresses the supplied buffer.
**
** -------------------------------------------------------------------------*/
int FZIP_deflate(char *inb, uint32_t insize, char **outbuf, uint32_t *outsize)
{
	char *obuf;
	int optr;
	int err = FALSE;
	int curptr, len, seqlen;
	char *seqptr, *p1, *p2, *s1, *s2;
	struct bitstream_t stream;
	int *matrix, *mat, *track;
	int ch, trk;
	int window, depth;
 
	// obtain sliding window size
	window = REG_getRegistryInteger("Zip/Window", 16384);
	if (window < 2048)
		window = 2048;
	else if (window > 32768)
		window = 32768;
 
	// obtain tracking queue depth
	depth = REG_getRegistryInteger("Zip/Depth", 256);
	if (depth < 16)
		depth = 16;
	else if (depth > 1024)
		depth = 1024;
 
	// check call
	if (outbuf == NULL || outsize == NULL)
		return (FALSE);
 
	// initialize bit stream
	memset(&stream, 0, sizeof(struct bitstream_t));
	stream.buffer = MM_alloc(insize + 1024, &_bufflush);
 
	// create an output buffer
	obuf = MM_alloc(64 * 1024, &FZIP_deflate);
	optr = 0;
 
	// initialize matrix
	matrix = MM_alloc(256 * depth * sizeof(int), &FZIP_deflate);
	track = MM_alloc(256 * sizeof(int), &FZIP_deflate);
 
	// process uncompressed stream byte-by-byte
	curptr = 0;
	while (curptr < insize)
	{
		// get current byte value
		ch = inb[curptr];
 
		// Locate best match. This is the longest match located the closest to the curPtr. This
		//  is intended to be fast at the slight cost in compression ratio. We do not handle lazy
		//  matches or block optimization (selective matching). Only seqlen of 3 or more matter so
		//  we initialize seqlen to 2 to limit unnecessary best match updates. Try to limit cycles in
		//  this loop.
		mat = &matrix[depth * ch];
		trk = track[ch] - 1;
		if (trk < 0)
			trk = depth - 1;
		seqlen = 2;
		p2 = &inb[curptr];
		while (trk != track[ch])
		{
			if (mat[trk] == 0 || curptr - mat[trk] >= window)
				break;
 
			s1 = p1 = &inb[mat[trk] - 1];
			s2 = p2;
			while (*s1 == *s2)
				s1++, s2++;
 
			// check for improved match
			len = s1 - p1;
			if (len > seqlen)
			{
				seqptr = p1;
				seqlen = len;
			}
 
			trk--;
			if (trk < 0)
				trk = depth - 1;
		}
 
		// track the character
		mat[track[ch]] = curptr + 1;
		track[ch]++;
		if (track[ch] >= depth)
			track[ch] = 0;
 
		// check validity (match past end of buffer)
		if (curptr + seqlen > insize)
			seqlen = insize - curptr;
 
		// If we have a good sequence we output a pointer and advance curPtr
		if (seqlen >= 3)
		{
			// check maximum allowable sequence which is 258 bytes but we reserve one
			//  for 0xff escaping
			if (seqlen > 257)
				seqlen = 257;
 
			// escape length-distance pointer
			obuf[optr++] = 0xff;
			obuf[optr++] = seqlen - 3;
			*(short *)&obuf[optr] = p2 - seqptr;
			optr += 2;
 
			// advance curPtr
			curptr += seqlen;
			PROC_yield();
		}
 
		// otherwise we output the raw uncompressed byte and keep searching
		else
		{
			// escape 0xff
			if (*p2 == 0xff)
				obuf[optr++] = 0xff;
			obuf[optr++] = *p2;
			curptr++;
		}
 
		// flush output buffer as needed. Because we escape 0xff our length-pointer encoding
		//  requires 4 bytes. This at times replaces a 3-byte match and so the compression
		//  ratio in this buffer is compromised. That will be corrected as we move into
		//  Huffman coding. Blocks will then be slightly less than 64KB. We also want to
		//  flush this before over-running it.
		if (optr > 65530)
		{
			if (!_bufflush(obuf, optr, &stream, FALSE))
				err = TRUE;
 
			// if compression seems fruitless
			if (stream.length > curptr)
				err = TRUE;
 
			optr = 0;
		}
 
		if (err)
			break;
	}
 
	// Flush remaining data
	if (!err && !_bufflush(obuf, optr, &stream, TRUE))
		err = TRUE;
 
	// if compression was fruitless
	if (stream.length > insize)
		err = TRUE;
 
	// clean up
	MM_free(track);
	MM_free(matrix);
 
	// return
	*outbuf = stream.buffer;
	*outsize = stream.length;
	return (!err);
}

So the preliminaries are over by line 50 in this code. You might notice that when JANOS allocates memory it retains a reference pointer. That is used to locate memory leaks among other things. I can see quickly where any block is allocated.

The main loop which processes byte-by-byte through the uncompressed data is done by line 150. After that we merely flush any partially compressed block and we are done.

From lines 55 to 95 we search for the best match. From 95 to about 130 we either output the sequence length-distance reference or the raw unmatched data byte. After that we check to see if we need to flush the interim buffer.

The magic occurs in the search where we employ the pointer matrix. The track array keeps the current position in each byte value’s queue. This is where we add the pointer to the current byte once we’ve searched it. The search runs backward through the queue either until we’ve checked all of the pointers (it wraps back to the starting position) or until we run out of the sliding window (the pointer reference is too far back in the data). This checks for closer matches first. When a longer sequence of characters matches the current position we update our best match.

After that if we have a sequence of 3 or more bytes we output the length-distance reference otherwise we output the current byte and move on to process the next.

I know that I can optimize this some more. I haven’t gone through that step of trying to minimize processor cycles. We do use the RX compiler optimization. But… this is my approach for LZ77. There is no hash table or confusing prefix-suffix tricks. It seems to run fast enough at least for our needs at this point. The LZ77 search is where all of the time is consumed.

Next we will look at what happens when the interim buffer is flushed.

The FZIP_deflate() routine involves 90+ percent of the processing time and handles only step #1 in the prior outline. The buffer flush routine handles all of the remaining items.

We will be creating three different Huffman tables. These amount to a structure for each alphabet symbol.

struct huff_t {
	uint16_t code;
	uint16_t len;
	uint16_t cnt;
	uint16_t link;
};

Here the cnt will represent the symbol’s frequency. It is the count of occurrences of the symbol’s value in the data. The len will eventually be the code length in bits assigned to the symbol. The code will hold the bit pattern to be used in coding. And the link will be used in the code length determination routine.

The _bufflush() routine is called with a buffer of LZ77 compressed data. This includes raw data bytes that could not be included in a sequence and length-distance references for matches. Since data can take on all byte values I use a form of escaping. The escape character is 0xFF and a 0xFF in the data is represented by repeating the escape (e.g. 0xff 0xff). Otherwise the next byte is the length of the sequence -3 and the following short value the distance. To make this work I disallow the sequence length of 258 since that would be confused with an escaped 0xFF. I don’t see this as having any significant impact. Later I could find a way around that but for now it works.

// Routine applies DEFLATE style Huffman coding to the buffer content.
static int _bufflush(char *obuf, int osize, struct bitstream_t *stream, int final)
{
	struct huff_t *littbl;
	struct huff_t *dsttbl;
	struct huff_t *cnttbl;
	int n, len, c, code, dst, lastc, ncnt;
	int *cntlist, *litcnts, *dstcnts, *bitcnts, *bitlens;
	int hlit, hdist, hclen;
	int *startcd;
	int totdist = 0;
	int err = FALSE;
 
	// Now we need to construct two Huffman trees (although I am going to avoid
	//  actual trees). One for the literal data and one for the distance codes.
	//  Note that extra bits are just extra bits inserted in the stream.
	littbl = MM_alloc(286 * sizeof(struct huff_t), &_bufflush);
	dsttbl = MM_alloc(30 * sizeof(struct huff_t), &_bufflush);
 
	// Not a loop. This allows the use of break.
	for (;;)
	{
 
		// Now we analyze the data to determine frequencies. Note that this is complicated
		//  just a bit because of the escaping that I have had to use. I will have to
		//  temporarily decode length and distance encoding. We'll have to do that again
		//  later when we stream the coding. We will also use one end-of-block code so we
		//  virtually count it first.
		littbl[256].cnt = 1;
		for (n = 0; n < osize; n++)
		{
			// tally literal if not escaped
			if (obuf[n] != 0xff)
				littbl[obuf[n]].cnt++;
			else
			{
				// check and tally escaped 0xff
				if (obuf[++n] == 0xff)
					littbl[0xff].cnt++;
				else
				{
					totdist++;
 
					// table defined above
					//static const int lcode_ofs[29] = {
					//	3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
					//	35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258
					//};
 
					// determine required length code for lengths (3..258). This code is
					//  coded in the literal table.
					len = (obuf[n++] & 0xff) + 3;
					for (c = 0; c < 29; c++)
						if (lcode_ofs[c] > len)
							break;
					code = 256 + c;
					littbl[code].cnt++;
 
					// table defined above
					//static const int dcode_ofs[30] = {
					//	1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
					//	257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
					//	8193, 12289, 16385, 24577
					//};
 
					// determine required distance code for distances (1..32768). This code is
					//  coded in the distance table.
					dst = (obuf[n++] & 0xff) << 8;
					dst |= (obuf[n] & 0xff);
					for (c = 0; c < 30; c++)
						if (dcode_ofs[c] > dst)
							break;
					code = c - 1;
					dsttbl[code].cnt++;
				}
			}
		}

So here we create two of the Huffman tables, one for the literal alphabet (0..285) and one for the distance alphabet (0..29). Note that when JANOS allocates memory it is zero filled.

Don’t be confused by my use of for(;;) { }. This is not an infinite loop. In fact it is not a loop at all. Rather it allows me to exit the procedure at any point using break; and I just have to remember to place a break; at the very end. There are other ways to achieve the same thing.

The first step is to determine the frequency of symbols from both alphabets. Here we scan the supplied data and count the literals. The escaped length-distance references are translated temporarily into their length codes and distance codes. Those are tallied in the appropriate table. The extra bits are ignored. Length codes are combined with literal data since when they are read you don’t know which it will be. The distance codes use their own alphabet. We will have to do this same translation again later when we encode the references for output. Then, of course, we will insert the required extra bits.

Note that this also tallies one end-of-block code (0x100) as we will be using that.

If I were to dump these two tables after this they may look something like this. These are just the symbols that occur in a particular set of data – the jniorsys.log file. This is the symbol value followed by its count.

 0x00a 2
 0x00d 2
 0x020 70
 0x027 4
 0x028 11
 0x029 2
 0x02a 2
 0x02b 7
 0x02c 7
 0x02d 20
 0x02e 133
 0x02f 16
 0x030 105
 0x031 153
 0x032 124
 0x033 124
 0x034 103
 0x035 110
 0x036 97
 0x037 90
 0x038 93
 0x039 84
 0x03a 74
 0x03c 1
 0x03d 3
 0x03e 3
 0x041 5
 0x043 4
 0x044 1
 0x046 1
 0x048 1
 0x04a 3
 0x04c 2
 0x04d 4
 0x04e 4
 0x04f 4
 0x050 8
 0x052 8
 0x053 4
 0x054 4
 0x055 1
 0x057 3
 0x05a 1
 0x05f 3
 0x061 34
 0x062 4
 0x063 15
 0x064 26
 0x065 37
 0x066 9
 0x067 9
 0x068 8
 0x069 36
 0x06a 6
 0x06b 3
 0x06c 21
 0x06d 13
 0x06e 36
 0x06f 41
 0x070 24
 0x071 1
 0x072 29
 0x073 20
 0x074 29
 0x075 11
 0x076 6
 0x077 7
 0x078 3
 0x079 6
 0x07a 4
 0x100 1
 0x101 962
 0x102 457
 0x103 136
 0x104 127
 0x105 55
 0x106 71
 0x107 47
 0x108 14
 0x109 67
 0x10a 208
 0x10b 103
 0x10c 40
 0x10d 52
 0x10e 44
 0x10f 15
 0x110 42
 0x111 180
 0x112 287
 0x113 30
 0x114 5
 0x115 12
 0x116 6

 0x002 1
 0x003 2
 0x008 2
 0x009 2
 0x00a 13
 0x00b 150
 0x00c 166
 0x00d 40
 0x00e 127
 0x00f 63
 0x010 157
 0x011 112
 0x012 155
 0x013 116
 0x014 198
 0x015 169
 0x016 240
 0x017 197
 0x018 288
 0x019 215
 0x01a 321
 0x01b 226

We need to assign code lengths to these alphabet symbols. Here there are two tables and later we will process a third. So the procedure is handled by a separate routine.

The _bitlength() routine assigns code lengths creating possibly a slightly less than optimal Huffman table that does not exceed a 15-bit maximum. The assignments must make the Huffman tables compatible with the DEFLATE requirements.

Huffman Tables for DEFLATE

The buffer full of data that we have, which has been compressed using LZ77, will be compressed further using Huffman coding. The DEFLATE format specifies two separate Huffman code sets. The first encodes the literal bytes in the data (0..255), the end-of-block code (256), and the sequence match length codes (257..285). The second Huffman code set encodes the distance codes (0..29). We can use a separate table for that because we know when we are reading a distance code, as one always follows a length code. We never know whether we are reading a literal or a length code so those need to be decoded the same way and therefore from the same table.

Previously we scanned the data and counted the occurrences of each symbol. We now know which symbols occur in the data and how frequently. We have defined our alphabets for each Huffman table. Now we need to create the Huffman trees themselves. This is where things get tricky.

Creating a Huffman tree is not very difficult. But creating a Huffman tree that is compatible with the DEFLATE format is quite another thing altogether. The DEFLATE specification dictates that the Huffman trees must meet two additional rules. In fact they need to adhere to three rules. The third is mentioned later in the specification.

  1. All codes of a given bit length have lexicographically consecutive values, in the same order as the symbols they represent.
  2. Shorter codes lexicographically precede longer codes.
  3. The bit length cannot exceed 15 bits for the literal and distance code sets. It cannot exceed 7 bits for the code length set (comes into play much later).

The first trick is to not generate a tree at all. If you create a tree using the standard Huffman approach you are almost guaranteed to not have a tree that is usable for DEFLATE. All you need from that effort are the bit lengths that end up being assigned to each symbol. You can get those from the same procedure without dealing with right and left links and an actual tree structure. You then use the procedure defined in the DEFLATE specification to create the compatible tree.

The standard Huffman approach is to take a list of all the symbols that occur in the data and sort it in descending frequency. Combine the rightmost two least frequent symbols into a node whose frequency is the total of the two symbols. Next re-sort the list so this new node repositions itself according to its combined frequency. Now repeat the process combining the next two rightmost entries which may be symbols (leaves) or previously created nodes. This continues until you have just one node which is the head of your tree.

We are not going to bother to build the tree structure. We are only going to keep a list of the symbols that fall beneath a node. We are also going to realize that the combination of the two rightmost entries in the sorted list merely increases by one the bit length of each of the new node’s member symbols. When we finally reach the point where there is only one node in the list we would have assigned a bit length to every symbol based on its frequency. That is all we need to then create the codes for Huffman coding in DEFLATE format using the procedure for that outlined in the specification.

Here is where things really get confusing. This process doesn’t always create a DEFLATE compatible Huffman code set. Sometimes the bit length will exceed 15 (or 7 for the table later). We need a procedure for dealing with that. It amounts to being able to create a less than optimal Huffman tree with bit lengths limited to a maximum. This was a puzzle but I have a way to get it done.

So next I’ll take us through examples.

Let’s use an example. Here we have an alphabet of 19 symbols (0..18). The data set consists of 150 of these and after counting the occurrences of each we have the following. To make discussion simpler I will assign these symbols uppercase names. On the right are the results of the tally for each.

 
A  0x000 4
B  0x001 0
C  0x002 0
D  0x003 2
E  0x004 6
F  0x005 3
G  0x006 2
H  0x007 2
J  0x008 53
K  0x009 26
L  0x00a 5
M  0x00b 4
N  0x00c 3
P  0x00d 1
Q  0x00e 1
R  0x00f 1
S  0x010 37
T  0x011 0
U  0x012 0

In the standard approach to creating a Huffman table we ignore the symbols that do not appear in the data and arrange the others in order of decreasing frequency.

Used symbols:
  A    D    E    F    G    H    J    K    L    M    N    P    Q    R    S 
  4    2    6    3    2    2   53   26    5    4    3    1    1    1   37

Sorted by decreasing frequency:
  J    S    K    E    L    A    M    F    N    D    G    H    P    Q    R
 53   37   26    6    5    4    4    3    3    2    2    2    1    1    1

Next we combine the two lowest frequency symbols into a new node with a combined total. I will name the nodes with lowercase characters just so you can track them. The shorter list is then re-sorted before proceeding to repeat. The process continues until there is only one node. Here I will show only the combining action for each step. We will get into more detail afterwards.

Starting set:
  J    S    K    E    L    A    M    F    N    D    G    H    P    Q    R
 53   37   26    6    5    4    4    3    3    2    2    2    1    1    1

Step #1 combines Q and R into (a) with new frequency of 2. The list is resorted.
  J    S    K    E    L    A    M    F    N    D    G    H   (a)   P
 53   37   26    6    5    4    4    3    3    2    2    2    2    1

Step #2 combines (a) and P into (b) with new frequency of 3.
  J    S    K    E    L    A    M    F    N   (b)   D    G    H
 53   37   26    6    5    4    4    3    3    3    2    2    2

Step #3 combines G and H into (c) with new frequency of 4.
  J    S    K    E    L    A    M   (c)   F    N   (b)   D
 53   37   26    6    5    4    4    4    3    3    3    2

Step #4 combines (b) and D into (d) with new frequency of 5.
  J    S    K    E    L   (d)   A    M   (c)   F    N
 53   37   26    6    5    5    4    4    4    3    3

Step #5 combines F and N into (e) with new frequency of 6.
  J    S    K    E   (e)   L   (d)   A    M   (c)
 53   37   26    6    6    5    5    4    4    4

Step #6 combines M and (c) into (f) with new frequency of 8.
  J    S    K   (f)   E   (e)   L   (d)   A
 53   37   26    8    6    6    5    5    4

Step #7 combines (d) and A into (g) with new frequency of 9.
  J    S    K   (g)  (f)   E   (e)   L
 53   37   26    9    8    6    6    5

Step #8 combines (e) and L into (h) with new frequency of 11.
  J    S    K   (h)  (g)  (f)   E
 53   37   26   11    9    8    6

Step #9 combines (f) and E into (i) with new frequency of 14.
  J    S    K   (i)  (h)  (g)
 53   37   26   14   11    9

Step #10 combines (h) and (g) into (j) with new frequency of 20.
  J    S    K   (j)  (i)
 53   37   26   20   14

Step #11 combines (j) and (i) into (k) with new frequency of 34.
  J    S   (k)   K
 53   37   34   26

Step #12 combines (k) and K into (m) with new frequency of 60.
 (m)   J    S
 60   53   37

Step #13 combines J and S into (n) with new frequency of 90.
 (n)  (m)
 90   60

Step #14 combines (n) and (m) into our final node (p) with new frequency of 150.
 (p)
150

We are done.

Did you notice how nodes that we created were often quickly reused in another combination? This is what leads to a tree structure exceeding the maximum code length. Imagine a tree with one long branch down its right edge. It is very common when there are a few symbols that appear with high frequency and the balance are relatively low frequency symbols.

This exercise in combining nodes was all well and good but something else needs to occur to make it useful. In the typical Huffman case, in creating a node you would assign one of the nodes being combined to the left link (bit = 0) and the other to the right link (bit = 1). This would then develop a tree structure that would work for you, although it would most likely not be DEFLATE compatible.

I am going to avoid the tree but note that in the act of combination all of the symbols participating will have their code length increased by 1. Combining leaves into a node creates another level in the tree. That’s what we are doing. In the code I will use a linked list to simply collect the symbols that are a member of (or lie below) any given node. In combining I will concatenate the member lists for the two leaves/nodes being combined and then increment the code length for each member.

For each step I am going to show the node membership and the code length for our alphabet as we proceed through the process. Hopefully this table will make sense to you.

             Step   1    2    3    4    5    6    7    8    9    10   11   12   13   14  clen
A  0x000 4     0    0    0    0    0    0    0    1g   1g   1g   2j   3k   4m   4m   5p   5
D  0x003 2     0    0    0    0    1d   1d   1d   2g   2g   2g   3j   4k   5m   5m   6p   6
E  0x004 6     0    0    0    0    0    0    0    0    0    1i   1i   2k   3m   3m   4p   4
F  0x005 3     0    0    0    0    0    1e   1e   1e   2h   2h   3j   4k   5m   5m   6p   6
G  0x006 2     0    0    0    1c   1c   1c   2f   2f   2f   3i   3i   4k   5m   5m   6p   6
H  0x007 2     0    0    0    1c   1c   1c   2f   2f   2f   3i   3i   4k   5m   5m   6p   6
J  0x008 53    0    0    0    0    0    0    0    0    0    0    0    0    0    1n   2p   2
K  0x009 26    0    0    0    0    0    0    0    0    0    0    0    0    1m   1m   2p   2
L  0x00a 5     0    0    0    0    0    0    0    0    1h   1h   2j   3k   4m   4m   5p   5
M  0x00b 4     0    0    0    0    0    0    1f   1f   1f   2i   2i   3k   4m   4m   5p   5
N  0x00c 3     0    0    0    0    0    1e   1e   1e   2h   2h   3j   4k   5m   5m   6p   6
P  0x00d 1     0    0    1b   1b   2d   2d   2d   3g   3g   3g   4j   5k   6m   6m   7p   7
Q  0x00e 1     0    1a   2b   2b   3d   3d   3d   4g   4g   4g   5j   6k   7m   7m   8p   8
R  0x00f 1     0    1a   2b   2b   3d   3d   3d   4g   4g   4g   5j   6k   7m   7m   8p   8
S  0x010 37    0    0    0    0    0    0    0    0    0    0    0    0    0    1n   2p   2

In this table we follow the code length (clen) associated with each occurring symbol through the step by step combination process. Here as a symbol is combined into a new node (letter changes) we increment its bit depth or code length. The final column shows the resulting clen for this table.

So we mechanically have shown how to derive the code lengths for an alphabet symbol set with given frequencies. Next we create the DEFLATE Huffman code table for this.

My code to perform the node creation and code length incrementing looks something like this.

// Establish bit length for the Huffman table based upon frequencies
static int _bitlength(struct huff_t *tbl, int ncodes, int maxb)
{
	uint16_t *list = MM_alloc(ncodes * sizeof(uint16_t), &_bitlength);
	int *freq = MM_alloc(ncodes * sizeof(int), &_bitlength);
	int nlist = 0;
	int n, c, p;
	int ret = TRUE;
 
	// List all of the symbols used in the data along with their frequencies. Note that
	//  we store pointers +1 so as to keep 0 as a linked list terminator.
	for (n = 0; n < ncodes; n++)
		if (tbl[n].cnt > 0)
		{
			list[nlist] = n + 1;
			freq[nlist] = tbl[n].cnt;
			nlist++;
		}
 
	// Note that there is a special boundary case when only 1 code is used. In this case
	//  the single code is encoded using 1 bit and not 0.
	if (nlist == 1)
		tbl[list[0] - 1].len = 1;
 
	// process this list down to a single node
	while (nlist > 1)
	{
		// sort the list by decreasing frequency
		for (n = 0; n < nlist - 1; n++)
			if (freq[n] < freq[n + 1])
			{
				// swap order
				c = list[n];
				list[n] = list[n + 1];
				list[n + 1] = c;
				c = freq[n];
				freq[n] = freq[n + 1];
				freq[n + 1] = c;
 
				// need to sort back
				if (n > 0)
					n -= 2;
			}
 
		// Combine the member lists associated with the last two entries. We combine the
		//  linked lists for the two low frequency nodes.
		p = list[nlist - 2];
		while (tbl[p - 1].link)
			p = tbl[p - 1].link;
		tbl[p - 1].link = list[nlist - 1];
 
		// The new node has the combined frequency.
		freq[nlist - 2] += freq[nlist - 1];
		nlist--;
 
		// Increase the code length for members of this node.
		p = list[nlist - 1];
		while (p)
		{
			tbl[p - 1].len++;
			p = tbl[p - 1].link;
		}
 
	}
 
	MM_free(freq);
	MM_free(list);
	return (ret);
}

You might note the check at line 20 handling a special case when there is only one used item in our alphabet. Here we need to use a code length of 1.

Now to be fair there is a lot more that I will be adding to this routine before we are done.

Now that we know the lengths of the code that we will be using to compress the data we can predict the compression ratio.

        freq  clen     freq*clen
A  0x000 4     5           20
D  0x003 2     6           12
E  0x004 6     4           24
F  0x005 3     6           18
G  0x006 2     6           12
H  0x007 2     6           12
J  0x008 53    2          106
K  0x009 26    2           52
L  0x00a 5     5           25
M  0x00b 4     5           20
N  0x00c 3     6           18
P  0x00d 1     7            7
Q  0x00e 1     8            8
R  0x00f 1     8            8
S  0x010 37    2           74
                      -----------
                          416 bits (52 bytes)

If we multiply the frequency of each symbol by its code length and total that for the set we get the total number of bits required to encode the original message. Originally we had 150 bytes or 1200 bits. When we are done we can store that same message in only 52 bytes. We’ve reduced the data to roughly one third of its original size.

Let’s see how to derive the bit codes that we will use in encoding the data.

We want to derive the actual binary code patterns for encoding this symbol set. The DEFLATE specification tells us to first count the number of codes for each code length.

N  bl_count[N]
0      0
1      0
2      3
3      0
4      1
5      3
6      5
7      1
8      2

Next we find the numerical value of the smallest code for each code length. The following code is provided by the specification.

        code = 0;
        bl_count[0] = 0;
        for (bits = 1; bits <= MAX_BITS; bits++) {
            code = (code + bl_count[bits-1]) << 1;
            next_code[bits] = code;
        }

In performing this procedure we get the following. Here I will also show the codes in binary form.

N  bl_count[N]  next_code[N]
0      0
1      0            0
2      3            0     00
3      0            6     110
4      1           12     1100
5      3           26     11010
6      5           58     111010
7      1          126     1111110
8      2          254     11111110 

Now we assign codes to each symbol based upon its length. The DEFLATE specification provides this code snippet. Basically the above defines the starting code which we increment after each use.

        for (n = 0; n <= max_code; n++) {
            len = tree[n].Len;
            if (len != 0) {
                tree[n].Code = next_code[len];
                next_code[len]++;
            }
        }

And this ends up giving us the following codes for encoding this data.

        freq  clen     code
A  0x000 4     5      11010
D  0x003 2     6      111010
E  0x004 6     4      1100
F  0x005 3     6      111011
G  0x006 2     6      111100
H  0x007 2     6      111101
J  0x008 53    2      00
K  0x009 26    2      01
L  0x00a 5     5      11011
M  0x00b 4     5      11100
N  0x00c 3     6      111110
P  0x00d 1     7      1111110
Q  0x00e 1     8      11111110
R  0x00f 1     8      11111111
S  0x010 37    2      10

Take a few moments to picture what is going on. Basically with just 2 bits you can encode no more than 4 symbols. Since we have more than 4 symbols to encode we cannot use all 4 combinations of two bits. We reserve 1 or more bit combinations as a prefix indicating that an additional bit or more will be needed to identify other symbols. The decompressor will be processing the bit stream 1 bit at a time as it doesn’t know in advance how many bits will be needed to identify the next symbol.

In this symbol set it turns out that we use all but the last combination of two bits and 11b becomes the prefix accessing the rest of the symbol set. Note how this coincides with the 3 high frequency codes. It then turns out to be most efficient not to use any 3 bit codes and to jump right to 4 bits for the next possible symbol encoding. In fact, if for any bit length you reserve only the last combination as a prefix then the next bit length has only two combinations (a node). For 3 bit codes here that would be 110b and 111b. If we save both for prefixes then we can encode more symbols. Here for the 4 bit codes there are 4 combinations: 1100b, 1101b, 1110b, and 1111b. Again this tree decided to use only one of the 4 bit combinations for a symbol.

Another thing to notice is that the final code for the largest bit length corresponds to the rightmost leaf in the tree. For DEFLATE that lexicographically is the longest code and requires a series of 1 bits to reach. So for this 8 bit code the last symbol is identified by 11111111b. This last code should always be 2**N – 1 where N is the largest bit length (code length). Note that ‘**’ indicates exponentiation here. Two is raised to the power of N.

If you think about it, the set of code lengths have to be just right to end up properly assigning codes to end up this way or to not overflow those available for any one bit length. This is assured by the procedure. If you try some random code lengths you will quickly see what I mean. In general you will be trying to create an impossible tree.

But wait!!

This table looks suspiciously like the code length encoding (third Huffman table that we have not discussed as yet). If it is, didn’t you mention that it would be limited to a bit depth or code length of 7? This one is 8 bits. Is that okay?

No. It is not okay. You are right. I purposely chose this real-world example which actually does violate that code length rule. So this is not a DEFLATE compatible table. At least not for that third Huffman coding. This table occurs in trying to compress the /etc/JanosClasses.jar file currently on my development JNIOR. And actually before this the literal table exceeds the 15 bit limit. The resulting compression fails.

Most of what you read now tells you to “adjust the Huffman tree” accordingly and plods on without a hint as to how you might do that. You can certainly detach and reattach nodes to get it done but how do you know that you haven’t significantly affected the compression ratio? You could punt and use the fixed Huffman tables afforded by another DEFLATE block type, BTYPE 01. You know that you can’t just fiddle with the code lengths because you will end up trying to create an impossible tree. So what now?

Well, I can show you how to get it done.

Adjust the Huffman Tree Accordingly

We have created the optimum Huffman table for coding our data. Unfortunately we find out that it is not compatible for use with DEFLATE. The DEFLATE specification dictates that this particular table have a code length maximum of 7 bits. That being forced by the fact that these code lengths are stored in the DEFLATE stream using 3 bits each. That limits code lengths for this table to the set containing 0 thru 7.

Our table is too deep. It requires 8 bits to encode our data. What do we do about that? It seems that anything we do will reduce the efficiency of the compression. We need to create a less than optimum Huffman coding. How do we do that and keep the impact at a minimum? How do we not seriously damage the compression ratio?

We are going to adjust our Huffman table so that it does not exceed the maximum bit depth. Ideally we want to stop incrementing the code length of any symbols that reach the maximum. How can we properly do that? Well, we have to go back to the math. Let’s understand what makes a certain set of code lengths valid while others are not.

In the prior post we assigned code lengths to symbols in our alphabet based upon a tree construction algorithm. The steps in this algorithm are those that create a valid tree. It is not surprising then that our set of code lengths represent a real tree. As a result when we calculate the starting codes for each bit length the process ends up with usable codes. And as we noticed the last code in the tree is in fact 2**N – 1. In our case this is 2**8 -1 or 255 and in binary that being 11111111b.

Let me expand the loop in the starting code generation so we can see what happens. Here we will generate starting codes (S0..S8) for our code lengths. We will use a shorthand for the code length counts (N0..N8). In our example those are N = {0 0 3 0 1 3 5 1 2}. Note too that a left shift of 1 is equivalent to multiplication by 2. The starting codes (S) are calculated as follows:

    S1 = (S0 + N0) << 1 = (0 + 0) << 1 = 2 * 0 = 0
    S2 = (S1 + N1) << 1 = (0 + 0) << 1 = 2 * 0 = 0         00
    S3 = (S2 + N2) << 1 = (0 + 3) << 1 = 2 * 3 = 6         110
    S4 = (S3 + N3) << 1 = (6 + 0) << 1 = 2 * 6 = 12        1100
    S5 = (S4 + N4) << 1 = (12 + 1) << 1 = 2 * 13 = 26      11010
    S6 = (S5 + N5) << 1 = (26 + 3) << 1 = 2 * 29 = 58      111010
    S7 = (S6 + N6) << 1 = (58 + 5) << 1 = 2 * 63 = 126     1111110
    S8 = (S7 + N7) << 1 = (126 + 1) << 1 = 2 * 127 = 254   11111110

There are two things to notice here other than the fact that this matches the table generated earlier. First, the count of 8 bit code lengths (N8) doesn’t come into play. Yet we know that there are 2 and the first will be assigned 11111110b and the second 11111111b. This is the 2**8 – 1 that we now expect. The second thing is that all of the starting codes are even numbers. That is driven by the fact that they are the product of multiplication by 2. We will use this fact later.
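As a rough sketch, the expanded loop above can be folded back into a few lines of C. The function name `start_codes` and its signature are illustrative assumptions and not part of JANOS; the arithmetic is the same shift-and-add shown in the table.

```c
#include <stdint.h>

/* Compute the starting code for each bit length from the per-length
 * symbol counts. Both count[] and start[] hold maxbits + 1 entries,
 * indexed by code length. Illustrative helper only. */
void start_codes(const int *count, int maxbits, uint32_t *start)
{
	int n;

	start[0] = 0;
	for (n = 0; n < maxbits; n++)
		start[n + 1] = (start[n] + (uint32_t)count[n]) << 1;
}
```

Fed with N = {0 0 3 0 1 3 5 1 2} and a maximum of 8 bits, this yields the S values listed above, and S8 + N8 – 1 comes out as 255.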

Now I can reverse this procedure to generate starting codes back from the maximum bit length knowing that the last code must be 2**N – 1. So for this table we get the following:

    S8 = 2**8 - N8 = 2**8 - 2 = 254
    S7 = S8/2 - N7 = 254/2 - 1 = 127 - 1 = 126
    S6 = S7/2 - N6 = 126/2 - 5 = 63 - 5 = 58
    S5 = S6/2 - N5 = 58/2 - 3 = 29 - 3 = 26
    S4 = S5/2 - N4 = 26/2 - 1 = 13 - 1 = 12
    S3 = S4/2 - N3 = 12/2 - 0 = 6
    S2 = S3/2 - N2 = 6/2 - 3 = 3 - 3 = 0
    S1 = 0
    S0 = 0

This is just a matter of running the calculations backwards and knowing (or realizing) that the last code has to be 2**N – 1. You can see it generates the same results.

Now what if we decide to not accept a bit depth exceeding the maximum? We are going to force those two symbols wanting to be 8 bit codes to be two additional 7 bit codes. So our code length count array will look like this: N = {0 0 3 0 1 3 5 3}. Here we combined N7 and N8 and eliminated 8 bit codes altogether. Legal? Of course not. You can’t visualize what that does to a tree. Let’s try the calculations back from the new maximum of 2**7 – 1.

    S7 = 2**7 - N7 = 128 - 3 = 125

Here this fails immediately. We know that starting codes (S) must be even numbers and 125 is odd! Not surprising as we are kind of floating a pair of leaves up in the air somehow. Can we make a home for them?

Clearly if we were to reattach those leaves somewhere else in the tree structure other nodes must be increased in bit depth. We need another 7 bit symbol to get S7 to be an even 124. To do that with minimum impact on compression ratio we increase the 6 bit coded symbol with the lowest frequency to 7 bits. Our array now being N = {0 0 3 0 1 3 4 4}. Try again:

    S7 = 2**7 - N7 = 128 - 4 = 124
    S6 = 124/2 - N6 = 62 - 4 = 58
    S5 = S6/2 - N5 = 58/2 - 3 = 29 - 3 = 26
    S4 = S5/2 - N4 = 26/2 - 1 = 13 - 1 = 12
    S3 = S4/2 - N3 = 12/2 - 0 = 6
    S2 = S3/2 - N2 = 6/2 - 3 = 3 - 3 = 0
    S1 = 0
    S0 = 0

Um. Everything seemed to fit right in. Is this tree valid? Let’s see.

N  bl_count[N]  next_code[N]
0      0
1      0            0
2      3            0     00
3      0            6     110
4      1           12     1100
5      3           26     11010
6      4           58     111010
7      4          124     1111100

        freq  clen     code
A  0x000 4     5      11010
D  0x003 2     6      111010
E  0x004 6     4      1100
F  0x005 3     6      111011
G  0x006 2     6      111100
H  0x007 2     6      111101
J  0x008 53    2      00
K  0x009 26    2      01
L  0x00a 5     5      11011
M  0x00b 4     5      11100
N  0x00c 3     7      1111100
P  0x00d 1     7      1111101
Q  0x00e 1     7      1111110
R  0x00f 1     7      1111111
S  0x010 37    2      10

This appears to have successfully generated a Huffman tree that would work with DEFLATE format! Let’s look at the compression ratio for this.

        freq  clen     freq*clen
A  0x000 4     5           20
D  0x003 2     6           12
E  0x004 6     4           24
F  0x005 3     6           18
G  0x006 2     6           12
H  0x007 2     6           12
J  0x008 53    2          106
K  0x009 26    2           52
L  0x00a 5     5           25
M  0x00b 4     5           20
N  0x00c 3     7           21
P  0x00d 1     7            7
Q  0x00e 1     7            7
R  0x00f 1     7            7
S  0x010 37    2           74
                      -----------
                          417 bits (53 bytes)

Wait! This only cost us 1 bit? Yes it did, but to store it we would need another whole byte. So this procedure is likely (though not proven) to have a minimal impact on compression ratio. Yet it corrects the table to ensure that it is compatible with DEFLATE.
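The cost comparison is just a sum of freq * clen rounded up to whole bytes. A minimal sketch follows; the helper names `coded_bits` and `bits_to_bytes` are made up for illustration.

```c
/* Total size in bits of the Huffman coded data: the sum of
 * frequency times code length over every symbol. */
long coded_bits(const int *freq, const int *clen, int nsym)
{
	long bits = 0;
	int n;

	for (n = 0; n < nsym; n++)
		bits += (long)freq[n] * clen[n];
	return bits;
}

/* Round a bit count up to whole bytes. */
long bits_to_bytes(long bits)
{
	return (bits + 7) / 8;
}
```

With the 15 symbols in the table above this gives 417 bits, which needs 53 bytes, whereas 416 bits would have fit in exactly 52.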

To generalize the process, the reversed starting code calculation is repeated from 2**N – 1, where N is the maximum bit depth, back to 0 for bit length 0. If at any point the calculated starting code is not even, you must raise the next least frequent symbol to this code length, which makes the starting code even.
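The backward walk can also be used on its own as a quick validity test for a set of code length counts. The following is only a sketch under that assumption; `lengths_valid` is a hypothetical helper and it checks a table rather than repairing one.

```c
/* Walk the starting codes back from 2**maxbits and report whether the
 * per-length symbol counts describe a complete prefix code.
 * count[] holds maxbits + 1 entries, indexed by code length.
 * Returns 1 when the set is valid and 0 otherwise. */
int lengths_valid(const int *count, int maxbits)
{
	long start = (1L << maxbits) - count[maxbits];
	int k;

	for (k = maxbits; k > 1; k--)
	{
		/* a starting code can be neither negative nor odd */
		if (start < 0 || (start & 1))
			return 0;
		start = start / 2 - count[k - 1];
	}

	/* the walk has to land on a starting code of 0 for 1 bit codes */
	return start == 0;
}
```

For the three arrays discussed above it accepts the original 8 bit table, rejects N = {0 0 3 0 1 3 5 3} because 125 is odd, and accepts the adjusted N = {0 0 3 0 1 3 4 4}.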

In my next post I will show code for this.

The complete procedure for generating a DEFLATE format compatible Huffman table limited to a maximum bit depth is shown here. I know that this is not optimized code. There is some unnecessary execution but I had wanted to keep steps separate and clear. You can be sure that over time I will optimize the coding and obfuscate it suitably for all future generations. ;)

This routine has the capacity to return FALSE if a table cannot be created. It was doing just that when the bit depth (code length) exceeded maximum. That has since been corrected. It will always return TRUE now.

// Establish bit length for the Huffman table based upon frequencies
static int _bitlength(struct huff_t *tbl, int ncodes, int maxb)
{
	uint16_t *list = MM_alloc(ncodes * sizeof(uint16_t), &_bitlength);
	int *freq = MM_alloc(ncodes * sizeof(int), &_bitlength);
	int nlist = 0;
	int n, c, p;
	int ret = TRUE;
	uint16_t *ptr;
 
	// List all of the symbols used in the data along with their frequencies. Note that
	//  we store pointers +1 so as to keep 0 as a linked list terminator.
	for (n = 0; n < ncodes; n++) if (tbl[n].cnt > 0)
		{
			list[nlist] = n + 1;
			freq[nlist] = tbl[n].cnt;
			nlist++;
		}
 
	// Note that there is a special boundary case when only 1 code is used. In this case
	//  the single code is encoded using 1 bit and not 0.
	if (nlist == 1)
		tbl[list[0] - 1].len = 1;
 
	// process this list down to a single node
	while (nlist > 1)
	{
		// sort the list by decreasing frequency
		for (n = 0; n < nlist - 1; n++)
			if (freq[n] < freq[n + 1])
			{
				// swap order
				c = list[n];
				list[n] = list[n + 1];
				list[n + 1] = c;
				c = freq[n];
				freq[n] = freq[n + 1];
				freq[n + 1] = c;

				// need to sort back
				if (n > 0)
					n -= 2;
			}
 
		// Combine the member lists associated with the last two entries. We combine the
		//  linked lists for the two low frequency nodes.
		p = list[nlist - 2];
		while (tbl[p - 1].link)
			p = tbl[p - 1].link;
		tbl[p - 1].link = list[nlist - 1];
 
		// The new node has the combined frequency.
		freq[nlist - 2] += freq[nlist - 1];
		nlist--;
 
		// Sort the members of this node by decreasing code length. Longer codes to the
		//  left. This will also sort the frequency of the symbols in increasing order
		//  when code lengths are equal. We need this arrangement for the next step should
		//  we be required to balance the tree and avoid exceeding the maximum code
		//  length (maxb).
		p = TRUE;
		while (p)
		{
			p = FALSE;
 
			ptr = &list[nlist - 1];
			while (*ptr && tbl[*ptr - 1].link)
			{
				c = tbl[*ptr - 1].link;
				if ((tbl[*ptr - 1].len < tbl[c - 1].len) || (tbl[*ptr - 1].len == tbl[c - 1].len && tbl[*ptr - 1].cnt > tbl[c - 1].cnt))
				{
					n = tbl[c - 1].link;
					tbl[*ptr - 1].link = n;
					tbl[c - 1].link = *ptr;
					*ptr = c;
					p = TRUE;
				}
				ptr = &tbl[*ptr - 1].link;
			}
		}
 
		// Increase the code length for members of this node. We cannot exceed the maximum
		//  code length (maxb).
		p = list[nlist - 1];
		while (p)
		{
			if (tbl[p - 1].len < maxb)
				tbl[p - 1].len++;
 
			p = tbl[p - 1].link;
		}
 
		// Now verify the structure. This should be absolutely proper up until the point when
		//  we prevent code lengths from exceeding the maximum. Once we do that we are likely
		//  creating an impossible tree. We will need to correct that.
		p = list[nlist - 1];
		c = tbl[p - 1].len;
		if (c == maxb)
		{
 
			n = 1 << c;
			while (p)
			{
				if (tbl[p - 1].len == c)
					n--;
				else
				{
					// n must be even at this point or we extend the length group
					if (n & 1)
					{
						tbl[p - 1].len = c;
						n--;
					}
					else
					{
						c--;
						n /= 2;
					}
				}
 
				p = tbl[p - 1].link;
			}
		}
	}
 
	MM_free(freq);
	MM_free(list);
	return (ret);
}

Here are the steps that it performs:

  1. Create an array for all used alphabet symbols (those with non-zero frequency).
  2. Check for the special case where there is only one symbol. In that case we use a 1 bit code where one is unused.
  3. Sort the list array by decreasing frequency. The least frequent symbols are then at the end of the list.
  4. Combine the rightmost two least frequent symbols or nodes into a single new node having the combined frequency. All members of the two combined nodes become members of the new node.
  5. Sort the member list for the new node by decreasing bit depth (current code length) and secondly by increasing frequency.
  6. Increase the bit depth for all members of the new node by 1. If a symbol will exceed the maximum bit depth do not increment it.
  7. If we have reached the maximum bit depth then confirm the tree structure using the reverse starting code length calculations. Elevate the next least frequent symbol to the current bit depth if the calculated starting code is not an even number. Check all code lengths.
  8. If there is more than one entry in the list array (not down to one node yet) then repeat at Step #3.

So to review….

First, we covered how to perform a reasonably fast version of LZ77. This fills a 64 KB buffer with compressed data containing literals and length-distance codes.

When we need to we will flush the 64 KB buffer and generate a block of DEFLATE format. At that point we call a routine that first analyzes the data for code frequencies. There are to be two alphabets. One for literals and length codes. The other for distance codes.

Now we’ve developed a method for generating DEFLATE compatible Huffman tables for our two alphabets. The bulk of the complexity is now behind us but there is still some work to do before we can start generating our compressed bit stream.

Moving on…

Now we need the size of our literal alphabet (HLIT) and our distance alphabet (HDIST) since it is unlikely that we have utilized all literal symbols (0..285) or distance codes (0..29). So here we scan each to trim unused symbols.

		// Now we combine the bit length arrays into a single array for run-length-like repeat encoding.
		//  In DEFLATE this encoding can overlap for the literal table to the distance code table as
		//  if a single array. First determine the alphabet size for each table.
		for (hlit = 286; 0 < hlit; hlit--)
			if (littbl[hlit - 1].len > 0)
				break;

		for (hdist = 30; 0 < hdist; hdist--)
			if (dsttbl[hdist - 1].len > 0)
				break;

So we have the two alphabets. One for literals and length codes (0..HLIT-1) and one for distance codes (0..HDIST-1). We’ve seen that all we need are the code lengths for each symbol set to define the associated Huffman tree. We need to convey this information to the decompressor. The DEFLATE format combines the code length arrays for the two alphabets into one long array of HLIT + HDIST code lengths.

		// Now create the bit length array
		cntlist = MM_alloc((hlit + hdist) * sizeof(int), &_bufflush);
		for (n = 0; n < hlit; n++)
			cntlist[n] = littbl[n].len;
		for (n = 0; n < hdist; n++)
			cntlist[hlit + n] = dsttbl[n].len;

Now we know that the end-of-block code (256) is used as well as at least a handful of length codes and distance codes. So this array of code lengths itself is fairly long. It will be somewhere between 260 and 315 entries. Each code length is in the set 0..15. So typically there is a lot of repetition. There can be a large number of 0’s in a row. Consider data that is constrained to 7-bit ASCII. In that case there are 128 literal codes that never occur and would have a code length of 0.

The DEFLATE specification defines a kind of run-length encoding for this array. This encodes sequences of code lengths using three additional codes.

         The Huffman codes for the two alphabets appear in the block
         immediately after the header bits and before the actual
         compressed data, first the literal/length code and then the
         distance code.  Each code is defined by a sequence of code
         lengths, as discussed in Paragraph 3.2.2, above.  For even
         greater compactness, the code length sequences themselves are
         compressed using a Huffman code.  The alphabet for code lengths
         is as follows:

               0 - 15: Represent code lengths of 0 - 15
                   16: Copy the previous code length 3 - 6 times.
                       The next 2 bits indicate repeat length
                             (0 = 3, ... , 3 = 6)
                          Example:  Codes 8, 16 (+2 bits 11),
                                    16 (+2 bits 10) will expand to
                                    12 code lengths of 8 (1 + 6 + 5)
                   17: Repeat a code length of 0 for 3 - 10 times.
                       (3 bits of length)
                   18: Repeat a code length of 0 for 11 - 138 times
                       (7 bits of length)

So as you can see we are heading towards our third Huffman table. This one with an alphabet of 19 codes (0..18). That’s why the symbol set I used for the example in handling maximum code length has 19 members. I used one of these tables as an example.
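To make the three repeat codes concrete, here is a small expansion sketch as a decompressor might perform it. The name `expand_lengths` is hypothetical, and so is the input layout: each repeat code is assumed to be followed by one array element carrying the value of its extra bits.

```c
/* Expand a code length sequence that uses the repeat codes 16, 17 and 18.
 * in[] holds symbols 0..18 and every repeat code is followed by one
 * element holding the value of its extra bits. Returns the number of
 * code lengths written to out[], or -1 if the input is malformed or
 * would overflow maxout. */
int expand_lengths(const int *in, int nin, int *out, int maxout)
{
	int n = 0, nout = 0;

	while (n < nin)
	{
		int sym = in[n++];
		int value, repeat;

		if (sym < 0 || sym > 18)
			return -1;

		if (sym < 16)
		{
			// a plain code length
			value = sym;
			repeat = 1;
		}
		else
		{
			// the extra bits value must be present
			if (n >= nin)
				return -1;

			if (sym == 16)
			{
				// copy the previous length 3 - 6 times
				if (nout == 0)
					return -1;
				value = out[nout - 1];
				repeat = 3 + in[n++];
			}
			else if (sym == 17)
			{
				// 3 - 10 zeros
				value = 0;
				repeat = 3 + in[n++];
			}
			else
			{
				// 11 - 138 zeros
				value = 0;
				repeat = 11 + in[n++];
			}
		}

		if (nout + repeat > maxout)
			return -1;
		while (repeat--)
			out[nout++] = value;
	}
	return nout;
}
```

The example from the specification, codes 8, 16 (+2 bits 11) and 16 (+2 bits 10), is the input {8, 16, 3, 16, 2} in this layout and expands to 12 code lengths of 8.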

In my next post I will show the procedure I use for applying the repeat codes in this alphabet.

The current version of jniorsys.log on my development JNIOR compresses into a single block using the following literal and distance tables.

littbl
 0x00a 2 11 11111101110
 0x00d 2 12 111111111000
 0x020 93 6 101000
 0x027 4 11 11111101111
 0x028 14 9 111101010
 0x029 3 11 11111110000
 0x02a 2 12 111111111001
 0x02b 9 10 1111101000
 0x02c 9 10 1111101001
 0x02d 23 8 11101100
 0x02e 158 5 01100
 0x02f 18 9 111101011
 0x030 120 6 101001
 0x031 182 5 01101
 0x032 147 5 01110
 0x033 150 5 01111
 0x034 127 6 101010
 0x035 132 6 101011
 0x036 113 6 101100
 0x037 110 6 101101
 0x038 113 6 101110
 0x039 106 6 101111
 0x03a 91 6 110000
 0x03c 2 12 111111111010
 0x03d 3 11 11111110001
 0x03e 3 11 11111110010
 0x041 6 10 1111101010
 0x043 5 11 11111110011
 0x044 1 13 1111111111100
 0x046 2 12 111111111011
 0x048 1 13 1111111111101
 0x04a 3 11 11111110100
 0x04c 3 11 11111110101
 0x04d 5 11 11111110110
 0x04e 4 11 11111110111
 0x04f 5 10 1111101011
 0x050 11 9 111101100
 0x052 10 9 111101101
 0x053 6 10 1111101100
 0x054 5 10 1111101101
 0x055 2 12 111111111100
 0x057 3 11 11111111000
 0x05a 1 13 1111111111110
 0x05f 5 10 1111101110
 0x061 44 7 1101100
 0x062 7 10 1111101111
 0x063 19 9 111101110
 0x064 30 8 11101101
 0x065 52 7 1101101
 0x066 14 9 111101111
 0x067 13 9 111110000
 0x068 9 10 1111110000
 0x069 45 7 1101110
 0x06a 6 10 1111110001
 0x06b 3 11 11111111001
 0x06c 29 8 11101110
 0x06d 21 8 11101111
 0x06e 46 7 1101111
 0x06f 51 7 1110000
 0x070 31 8 11110000
 0x071 2 12 111111111101
 0x072 37 8 11110001
 0x073 27 8 11110010
 0x074 40 7 1110001
 0x075 16 9 111110001
 0x076 6 10 1111110010
 0x077 8 10 1111110011
 0x078 4 11 11111111010
 0x079 9 10 1111110100
 0x07a 4 11 11111111011
 0x100 1 13 1111111111111
 0x101 1248 2 00
 0x102 580 4 0100
 0x103 168 5 10000
 0x104 160 5 10001
 0x105 72 7 1110010
 0x106 119 6 110001
 0x107 82 6 110010
 0x108 17 9 111110010
 0x109 116 6 110011
 0x10a 253 5 10010
 0x10b 120 6 110100
 0x10c 52 7 1110011
 0x10d 69 7 1110100
 0x10e 79 6 110101
 0x10f 20 8 11110011
 0x110 54 7 1110101
 0x111 236 5 10011
 0x112 371 4 0101
 0x113 34 8 11110100
 0x114 6 10 1111110101
 0x115 17 9 111110011
 0x116 6 10 1111110110

dsttbl
 0x002 1 10 1111111100
 0x003 2 10 1111111101
 0x008 2 10 1111111110
 0x009 2 10 1111111111
 0x00a 14 8 11111110
 0x00b 204 4 0100
 0x00c 225 4 0101
 0x00d 46 7 1111110
 0x00e 168 5 11100
 0x00f 85 6 111110
 0x010 202 4 0110
 0x011 130 5 11101
 0x012 193 4 0111
 0x013 146 5 11110
 0x014 259 4 1000
 0x015 209 4 1001
 0x016 301 4 1010
 0x017 255 4 1011
 0x018 377 3 000
 0x019 293 4 1100
 0x01a 433 3 001
 0x01b 332 4 1101

We determine HLIT and HDIST for this and combine the code lengths into one HLIT + HDIST length array.

HLIT: 279
HDIST: 28
 0 0 0 0 0 0 0 0 0 0 11 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 11 9 11 12 10 10 8 5 9 6 5 5 5 6 6 6 6 6 6 6 0 12 11 11 0 0 10 0 11 13 0 12 0 13 0 11 0 11 11 11 10 9 0 9 10 10 12 0 11 0 0 13 0 0 0 0 10 0 7 10 9 8 7 9 9 10 7 10 11 8 8 7 7 8 12 8 8 7 9 10 10 11 10 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 2 4 5 5 7 6 6 9 6 5 6 7 7 6 8 7 5 4 8 10 9 10 0 0 10 10 0 0 0 0 10 10 8 4 4 7 5 6 4 5 4 5 4 4 4 4 3 4 3 4

So this array has 307 entries and you can see that there is a lot of repetition. We next apply the repeat codes as appropriate to shorten this array to 139 entries.

HLIT: 279
HDIST: 28
 0 0 0 0 0 0 0 0 0 0 11 0 0 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 11 9 11 12 10 10 8 5 9 6 5 5 5 6 6 6 6 6 6 6 0 12 11 11 0 0 10 0 11 13 0 12 0 13 0 11 0 11 11 11 10 9 0 9 10 10 12 0 11 0 0 13 0 0 0 0 10 0 7 10 9 8 7 9 9 10 7 10 11 8 8 7 7 8 12 8 8 7 9 10 10 11 10 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 2 4 5 5 7 6 6 9 6 5 6 7 7 6 8 7 5 4 8 10 9 10 0 0 10 10 0 0 0 0 10 10 8 4 4 7 5 6 4 5 4 5 4 4 4 4 3 4 3 4
NCNT: 139
 17/7 11 0 0 12 18/7 6 17/3 11 9 11 12 10 10 8 5 9 6 5 5 5 6 16/3 0 12 11 11 0 0 10 0 11 13 0 12 0 13 0 11 0 11 11 11 10 9 0 9 10 10 12 0 11 0 0 13 17/1 10 0 7 10 9 8 7 9 9 10 7 10 11 8 8 7 7 8 12 8 8 7 9 10 10 11 10 11 18/122 13 2 4 5 5 7 6 6 9 6 5 6 7 7 6 8 7 5 4 8 10 9 10 0 0 10 10 17/1 10 10 8 4 4 7 5 6 4 5 4 5 4 16/0 3 4 3 4

Here, whenever one of the repeat codes is used, I show the value of the extra bits after a slash (for example 17/7). Those are the bits we will insert following the Huffman code for the symbol.
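The routine that produced this listing is not shown here. As a stand-in, a simple greedy pass along the following lines produces the same kind of output. Treat it as a sketch: `compress_lengths` is a hypothetical name, and storing each extra bits value in the array element right after its repeat code is an assumption modeled on how the output loop later reads `cntlist`.

```c
/* Greedy run-length pass over an array of code lengths. Each repeat
 * code (16, 17 or 18) written to out[] is followed by one element
 * holding the value of its extra bits. out[] needs room for nlen
 * entries. Returns the number of entries written. */
int compress_lengths(const int *len, int nlen, int *out)
{
	int n = 0, nout = 0;

	while (n < nlen)
	{
		// measure the run of identical lengths starting here
		int run = 1;
		while (n + run < nlen && len[n + run] == len[n])
			run++;

		if (len[n] == 0 && run >= 3)
		{
			// runs of zeros use code 17 (3 - 10) or code 18 (11 - 138)
			if (run > 138)
				run = 138;
			if (run >= 11)
			{
				out[nout++] = 18;
				out[nout++] = run - 11;
			}
			else
			{
				out[nout++] = 17;
				out[nout++] = run - 3;
			}
			n += run;
		}
		else
		{
			// emit the length itself, then copy it with code 16 while
			//  at least 3 repeats remain
			out[nout++] = len[n++];
			run--;
			while (run >= 3)
			{
				int r = (run > 6) ? 6 : run;
				out[nout++] = 16;
				out[nout++] = r - 3;
				n += r;
				run -= r;
			}
			// fewer than 3 leftovers are picked up by the next pass
		}
	}
	return nout;
}
```

Seven 6s in a row come out as 6 followed by 16 with an extra value of 3, and ten leading zeros come out as 17 with an extra value of 7, matching the notation used in the listing.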

Now we create the Huffman table for this array and determine frequencies. Note that extra bits are inserted later and are not part of the Huffman encoding. Here the maximum code length is 7 bits.

		// Ugh. Now we need yet another Huffman table for this run-length alphabet. First we establish
		//  the frequencies.  Note that we skip the byte defining the extra bits.
		cnttbl = MM_alloc(HUFF_HLEN * sizeof(struct huff_t), &_bufflush);
		for (n = 0; n < ncnt; n++)
		{
			cnttbl[cntlist[n]].cnt++;
			if (cntlist[n] >= 16)
				n++;
		}

		// We need to determine the bit lengths. 
		if (!_bitlength(cnttbl, HUFF_HLEN, 7))
		{
			err = TRUE;
			break;
		}

This results in the following third Huffman table that we will use to code the code lengths which in turn are used to generate the literal and distance tables for the data compression.

cnttbl
 0x000 17 3 000
 0x002 1 6 111100
 0x003 2 6 111101
 0x004 9 4 1000
 0x005 11 3 001
 0x006 9 4 1001
 0x007 11 4 1010
 0x008 10 4 1011
 0x009 10 4 1100
 0x00a 19 3 010
 0x00b 14 3 011
 0x00c 6 4 1101
 0x00d 4 5 11100
 0x010 2 6 111110
 0x011 4 5 11101
 0x012 2 6 111111
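The codes in this listing follow from the code lengths alone. A minimal sketch of that assignment is below; `assign_codes` is an illustrative name, and unlike the table-building code in this series it leaves the codes in normal (not bit-reversed) order so they can be compared directly with the listing.

```c
#include <stdint.h>

#define MAXBITS 15

/* Assign canonical Huffman codes from code lengths alone. len[] and
 * code[] each hold nsym entries and every length must be in the range
 * 0..MAXBITS. Unused symbols (length 0) get code 0. Codes are left
 * most significant bit first. */
void assign_codes(const int *len, int nsym, uint32_t *code)
{
	int count[MAXBITS + 1] = {0};
	uint32_t next[MAXBITS + 1] = {0};
	int n;

	// tally how many symbols use each code length
	for (n = 0; n < nsym; n++)
		count[len[n]]++;
	count[0] = 0;

	// starting code for each code length
	for (n = 0; n < MAXBITS; n++)
		next[n + 1] = (next[n] + (uint32_t)count[n]) << 1;

	// hand out consecutive codes within each length in symbol order
	for (n = 0; n < nsym; n++)
		code[n] = len[n] ? next[len[n]]++ : 0;
}
```

Given the lengths {3 0 6 6 4 3 4 4 4 4 3 3 4 5 0 0 6 5 6} for symbols 0 through 18, symbol 0x004 receives 1000b and symbol 0x012 receives 111111b, as in the table above.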

Are we there yet? :shock:

Well, we are getting close.

As before we will need to convey the code lengths in this ‘cnttbl’ to the decompressor. We have N = {3 0 6 6 4 3 4 4 4 4 3 3 4 5 0 0 6 5 6}. I mentioned earlier that these are stored each with 3 bits in the bit stream limiting the code length to 7 bits.

But DEFLATE wants to save every possible byte. Each of these code lengths has a probability of being used over some general set of data types. They decided to sequence these into an array in a custom order such that the least likely to be used code lengths fall at the end and can be trimmed. So we get the order from the specification and sequence these. Notice that our three 0 bit code lengths do in fact get trimmed.

		// Finally (haha) we establish a custom order for these bit lengths
		// array defined above
		//static const char hclen_order[HUFF_HLEN] = {
		//	16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2,
		//	14, 1, 15
		//};
		bitlens = MM_alloc(HUFF_HLEN * sizeof(int), &_bufflush);
		for (n = 0; n < HUFF_HLEN; n++)
			bitlens[n] = cnttbl[hclen_order[n]].len;

		// Now the end of this array should be 0's so we find a length for the array
		for (hclen = HUFF_HLEN; 0 < hclen; hclen--)
			if (bitlens[hclen - 1] > 0)
				break;

 6 5 6 3 4 4 4 4 3 3 3 4 4 6 5 6 0 0 0
HCLEN: 16

It is hard to believe but we now have everything that we need to actually generate the compressed bit stream. Well, all except a couple of routines to actually do the serialization. So that will be the next step.

We’re going to output to a bit stream. Each write will likely involve a different number of bits. Someplace we have to properly sequence these into bytes to append to the output buffer.

To do this we need one routine to write called _writeb() and one to use at the end to flush any remaining unwritten bits called _flushb().

// Routine to stream bits
static void _writeb(int num, int code, struct bitstream_t *stream)
{
	// if no bits then nothing to do
	if (num == 0)
		return;

	// insert bits into the stream
	code &= ((1 << num) - 1);
	code <<= stream->nbits;
	stream->reg |= code;
	stream->nbits += num;

	// stream completed bytes
	while (stream->nbits >= 8)
	{
		stream->buffer[stream->length++] = (stream->reg & 0xff);
		stream->reg >>= 8;
		stream->nbits -= 8;
	}
}


// Routine flushes the bitstream
static void _flushb(struct bitstream_t *stream)
{
	if (stream->nbits)
	{
		stream->buffer[stream->length++] = (stream->reg & 0xff);
		stream->reg = 0;
		stream->nbits = 0;
	}
}

So to tie this together we use a structure.

// structure to assist with bit streaming
struct bitstream_t {
	char *buffer;
	int length;
	uint32_t reg;
	int nbits;
};

This seems simple enough but there are a couple things to understand. The DEFLATE specification gets into it right off the bat.

The first bit of the bit stream is the least significant bit of the first byte in the buffer. Once 8 bits are retrieved the 9th is the least significant bit of the second byte and so on.

A value appears in the stream starting with its least significant bit. That means that the bit order does not need to be reversed to pack the value at the tail of the current bit stream.

Huffman codes are processed a bit at a time. When you are reading a Huffman code you do not know how many bits you will need to retrieve the code for a valid symbol in the alphabet. So in this case you must insert the Huffman code so the most-significant bit is read first. The order of Huffman bits needs to be reversed. Armed with that fact, I have stored these codes in the tables in reverse order. That will be apparent in the following code to generate the tables for coding.
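The reversal itself is a short loop. This sketch isolates it in a helper; the name `reverse_bits` is hypothetical, and the table-building code in this series performs the same shifts inline.

```c
#include <stdint.h>

/* Reverse the low len bits of code so that the most significant bit
 * of a Huffman code becomes the first bit written to an LSB-first
 * bit stream. */
uint32_t reverse_bits(uint32_t code, int len)
{
	uint32_t rev = 0;

	while (len-- > 0)
	{
		rev = (rev << 1) | (code & 1);
		code >>= 1;
	}
	return rev;
}
```

For example the 5 bit code 11010b is stored as 01011b, and reversing it a second time restores the original.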

So the last thing we need to do is generate the actual Huffman codes for the three tables.

		// Now we need Huffman codes for these tables because eventually someday we will actually be
		//  generating a compressed bit stream.

		// Next we total the number of symbols using each bit length. These will be used to assign
		//  bit codes for each alphabet symbol.
		litcnts = MM_alloc(16 * sizeof(int), &_bufflush);
		for (n = 0; n < 286; n++)
			if (littbl[n].len > 0)
				litcnts[littbl[n].len]++;

		dstcnts = MM_alloc(16 * sizeof(int), &_bufflush);
		for (n = 0; n < 30; n++)
			if (dsttbl[n].len > 0)
				dstcnts[dsttbl[n].len]++;

		bitcnts = MM_alloc(16 * sizeof(int), &_bufflush);
		for (n = 0; n < HUFF_HLEN; n++)
			if (cnttbl[n].len)
				bitcnts[cnttbl[n].len]++;

		// Now we calculate starting codes for each bit length group. This procedure is defined in the
		//  DEFLATE specification. We can define the Huffman tables in a compressed format provided that
		//  the Huffman tables follow a couple of additional rules. Using these starting codes we
		//  can assign codes for each alphabet symbol. Note that Huffman codes are processed bit-by-bit
		//  and therefore must be generated here in reverse bit order.

		// Define codes for the literal Huffman table
		startcd = MM_alloc(16 * sizeof(int), &_bufflush);
		for (n = 0; n < 15; n++)
			startcd[n + 1] = (startcd[n] + litcnts[n]) << 1;
		for (n = 0; n < 286; n++)
		{
			len = littbl[n].len;
			if (len)
			{
				c = startcd[len]++;
				while (len--)
				{
					littbl[n].code <<= 1;
					littbl[n].code |= (c & 1);
					c >>= 1;
				}
			}
		}

		// Define codes for the distance Huffman table
		for (n = 0; n < 15; n++)
			startcd[n + 1] = (startcd[n] + dstcnts[n]) << 1;
		for (n = 0; n < 30; n++)
		{
			len = dsttbl[n].len;
			if (len)
			{
				c = startcd[len]++;
				while (len--)
				{
					dsttbl[n].code <<= 1;
					dsttbl[n].code |= (c & 1);
					c >>= 1;
				}
			}
		}

		// Define codes for the bit length Huffman table
		for (n = 0; n < 15; n++)
			startcd[n + 1] = (startcd[n] + bitcnts[n]) << 1;
		for (n = 0; n < HUFF_HLEN; n++)
		{
			len = cnttbl[n].len;
			if (len)
			{
				c = startcd[len]++;
				while (len--)
				{
					cnttbl[n].code <<= 1;
					cnttbl[n].code |= (c & 1);
					c >>= 1;
				}
			}
		}

Coming up next: Actually Generating the Bit Stream

The DEFLATE specification greatly oversimplifies the whole process by defining the block format in about a single page.

         We can now define the format of the block:

               5 Bits: HLIT, # of Literal/Length codes - 257 (257 - 286)
               5 Bits: HDIST, # of Distance codes - 1        (1 - 32)
               4 Bits: HCLEN, # of Code Length codes - 4     (4 - 19)

               (HCLEN + 4) x 3 bits: code lengths for the code length
                  alphabet given just above, in the order: 16, 17, 18,
                  0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15

                  These code lengths are interpreted as 3-bit integers
                  (0-7); as above, a code length of 0 means the
                  corresponding symbol (literal/length or distance code
                  length) is not used.

               HLIT + 257 code lengths for the literal/length alphabet,
                  encoded using the code length Huffman code

               HDIST + 1 code lengths for the distance alphabet,
                  encoded using the code length Huffman code

               The actual compressed data of the block,
                  encoded using the literal/length and distance Huffman
                  codes

               The literal/length symbol 256 (end of data),
                  encoded using the literal/length Huffman code

         The code length repeat codes can cross from HLIT + 257 to the
         HDIST + 1 code lengths.  In other words, all code lengths form
         a single sequence of HLIT + HDIST + 258 values.

There are a couple of reasons why the suggestion to teach JANOS how to compress files has been in Redmine for 5 years. That’s about how long ago JANOS, then in development, began to directly use JAR files for application programming. The obvious reason that it took 5 years to implement is that there really isn’t a huge need for compression in JNIOR. Storage was limited in the previous series and if you had wanted to keep log files around for any serious length of time it would have helped if we could compress them. The Series 4 has much more storage but still not a lot by today’s standards.

The real reason you may now realize. It is a bit involved. JANOS uses no third party developed code, and I had not been able to find usable reference materials (search engines having been damaged by marketing greed). There were some hurdles that let this suggestion sit on the list at a low priority for practically ever. Well, we’ve got it done now.

Let’s generate the bit stream and be done with this tome.

First we stream the Block Header. The very first bit indicates whether or not the block is the last for the compression. For the last block this BFINAL bit will be a 1. It is 0 otherwise. We also are using the dynamic Huffman table type. The next two bits indicate the block type BTYPE. Here we use 10b as we will be providing our own Huffman tables. Yeah, we could have taken the easy path and used a fixed set of tables (BTYPE 01b) but that is no fun.

		// Now we have codes and everything that we need (Finally!) to generate the compressed bit
		//  stream.

		// set BFINAL and type BTYPE of 10
		_writeb(1, final ? 1 : 0, stream);
		_writeb(2, 0x02, stream);

We have already determined the sizes of our alphabets. We have HLIT defining the size of the literal alphabet that includes codes for sequence lengths. The complete alphabet space covers the range 0..285 but since we are not likely to use them all HLIT defines how many actually are used (0..HLIT-1). We also have HDIST playing a similar role for the distance alphabet which could range 0..29.

Those two parameters are supplied next. Here each value is conveyed in 5 bits. We can do that since we know that HLIT has to be at least 257, as we do have to use the end-of-block code of 256 in our alphabet. So what we really supply is the count of length codes used, which is similar in magnitude to the count of distance codes. So HLIT is supplied as HLIT – 257 and HDIST as HDIST – 1.

		// Now we output HLIT, HDIST, and HCLEN
		_writeb(5, hlit - 257, stream);
		_writeb(5, hdist - 1, stream);
		_writeb(4, hclen - 4, stream);

		// Output the HCLEN counts from the bit length array. Note that we have already ordered it
		//  as required.
		for (n = 0; n < hclen; n++)
			_writeb(3, bitlens[n], stream);

Following the delivery of HLIT and HDIST we supply HCLEN as HCLEN – 4. Recall that HCLEN defines the size of the alphabet we are going to use to encode the array of code lengths for the main two Huffman tables. HCLEN defines the number of elements we are going to supply for that 19 element array that has the custom order. The balance beyond HCLEN elements is assumed to be 0. This is the array whose elements are supplied with 3 bits and thus limited to the range 0..7. We include HCLEN elements simply as shown above.
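As a worked example of how these header fields pack into bytes, here is a self-contained sketch. The `bitwriter` structure and the `put_bits`, `flush_bits` and `put_block_header` functions are stand-ins written for this illustration (they mirror the LSB-first packing described earlier but are not the JANOS routines), and the block is assumed to be the final one.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal LSB-first bit writer used only for this illustration. */
struct bitwriter {
	uint8_t *buf;
	size_t len;
	uint32_t reg;
	int nbits;
};

/* Append the low num bits of value (num must be 24 or less). */
void put_bits(struct bitwriter *w, int num, uint32_t value)
{
	if (num == 0)
		return;

	value &= (1u << num) - 1;
	w->reg |= value << w->nbits;
	w->nbits += num;

	// move completed bytes to the buffer
	while (w->nbits >= 8)
	{
		w->buf[w->len++] = (uint8_t)(w->reg & 0xff);
		w->reg >>= 8;
		w->nbits -= 8;
	}
}

/* Write out any partial byte, padding it with zero bits. */
void flush_bits(struct bitwriter *w)
{
	if (w->nbits)
	{
		w->buf[w->len++] = (uint8_t)(w->reg & 0xff);
		w->reg = 0;
		w->nbits = 0;
	}
}

/* BFINAL, BTYPE 10b, then HLIT - 257, HDIST - 1 and HCLEN - 4. */
void put_block_header(struct bitwriter *w, int final, int hlit, int hdist, int hclen)
{
	put_bits(w, 1, final ? 1 : 0);
	put_bits(w, 2, 2);
	put_bits(w, 5, (uint32_t)(hlit - 257));
	put_bits(w, 5, (uint32_t)(hdist - 1));
	put_bits(w, 4, (uint32_t)(hclen - 4));
}
```

With the values from the example (HLIT 279, HDIST 28, HCLEN 16) these 17 bits fill two bytes, 0xB5 and 0x9B, and leave a single 1 bit pending. In a real block the 3 bit code length entries would continue from that bit position rather than being flushed.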

Next we stream the concatenated array of code lengths. This array has HLIT + HDIST elements that have been compressed using repeat codes. Here we output the compressed array using the 19 symbol alphabet that we just defined for the decompressor by sending the array above. We encounter the extra bits usage for the first time. You can see in the following that after streaming the repeat codes 16, 17 or 18 we insert the field specifying the repeat count using the required number of extra bits.

		// Output the run-length compressed HLIT + HDIST code lengths using the code length
		//  Huffman codes. The two tables are blended together in the run-length encoding.
		//  Note that we need to insert extra bits where appropriate.
		for (n = 0; n < ncnt; n++)
		{
			c = cntlist[n];
			_writeb(cnttbl[c].len, cnttbl[c].code, stream);

			if (c >= 16)
			{
				switch (c)
				{
				case 16:
					_writeb(2, cntlist[++n], stream);
					break;
				case 17:
					_writeb(3, cntlist[++n], stream);
					break;
				case 18:
					_writeb(7, cntlist[++n], stream);
					break;
				}
			}
		}
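For reference, the list consumed by the loop above could be produced along the following lines. This is only a sketch in Java under the assumption that each repeat code in the list is immediately followed by the value for its extra bits, which is what the loop expects. The names are hypothetical, and the greedy run handling is one simple choice, not necessarily what JANOS does.

```java
import java.util.ArrayList;
import java.util.List;

class CodeLengthRle {
    // Run-length encodes code lengths (0..15) using the DEFLATE repeat codes.
    // Every 16, 17 or 18 in the result is followed by its extra bits value.
    static List<Integer> encode(int[] lens) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < lens.length) {
            int v = lens[i];
            int run = 1;
            while (i + run < lens.length && lens[i + run] == v)
                run++;
            i += run;

            if (v == 0) {
                // 18 repeats zero 11..138 times (7 extra bits)
                while (run >= 11) {
                    int r = Math.min(run, 138);
                    out.add(18);
                    out.add(r - 11);
                    run -= r;
                }
                // 17 repeats zero 3..10 times (3 extra bits)
                if (run >= 3) {
                    out.add(17);
                    out.add(run - 3);
                    run = 0;
                }
            } else {
                // 16 copies the previous length, so the length itself goes first
                out.add(v);
                run--;
                // 16 repeats the previous length 3..6 times (2 extra bits)
                while (run >= 3) {
                    int r = Math.min(run, 6);
                    out.add(16);
                    out.add(r - 3);
                    run -= r;
                }
            }

            // whatever is left (0..2 entries) is sent literally
            for (; run > 0; run--)
                out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        // five 8s, twelve 0s and a single 5
        int[] lens = new int[18];
        for (int n = 0; n < 5; n++)
            lens[n] = 8;
        lens[17] = 5;
        System.out.println(encode(lens));
    }
}
```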

If you are the decompressor at this point you have everything you need to develop the Huffman tables for literals and distance codes. You had to expand the repeat compression and separate the two code length sets. You then tallied code length counts and calculated starting codes for each bit length. Finally you assigned codes to each used alphabet symbol. So we are ready to deal with the actual compressed data.
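The tally, starting code and assignment steps just described follow the procedure in RFC 1951 section 3.2.2. A minimal sketch in Java, with hypothetical names, might look like this. Whether the resulting codes need to be bit-reversed before being handed to a routine like _writeb depends on how that routine packs bits, which this sketch does not assume.

```java
import java.util.Arrays;

class CanonicalHuffman {
    // Assigns canonical Huffman codes from code lengths. A length of 0 marks
    // an unused symbol, which keeps code 0 and must never be emitted.
    static int[] assignCodes(int[] lens, int maxBits) {
        // 1. Tally how many symbols use each code length.
        int[] blCount = new int[maxBits + 1];
        for (int len : lens)
            blCount[len]++;
        blCount[0] = 0;

        // 2. Calculate the starting code for each bit length.
        int[] nextCode = new int[maxBits + 1];
        int code = 0;
        for (int bits = 1; bits <= maxBits; bits++) {
            code = (code + blCount[bits - 1]) << 1;
            nextCode[bits] = code;
        }

        // 3. Hand out consecutive codes to the used symbols in symbol order.
        int[] codes = new int[lens.length];
        for (int n = 0; n < lens.length; n++)
            if (lens[n] != 0)
                codes[n] = nextCode[lens[n]]++;
        return codes;
    }

    public static void main(String[] args) {
        // The eight symbol example from RFC 1951 section 3.2.2.
        int[] lens = {3, 3, 3, 3, 3, 2, 4, 4};
        System.out.println(Arrays.toString(assignCodes(lens, 15)));
    }
}
```

Because both sides run the same procedure, only the code lengths ever need to be transmitted.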

When we initially scanned the LZ77 compressed data to determine frequencies we had to temporarily expand the length-distance references to determine which codes from the length and distance alphabets we would be using. Well the following loop is practically identical because we again process the LZ77 compressed data expanding the length-distance references. This time we will output the codes to the bit stream. Here again we have the extra bits to insert. Where we ignored them before, now we insert them into the stream when necessary.

		// Unbelievable! We can actually now output the compressed data for the block! This is
		//  encoded using the literal and distance Huffman tables as required. Note we need to
		//  again process the escaping and length-distance codes.
		for (n = 0; n < osize; n++)
		{
			c = obuf[n];
			if (c != 0xff)
				_writeb(littbl[c].len, littbl[c].code, stream);
			else
			{
				// check and tally escaped 0xff
				if (obuf[++n] == 0xff)
					_writeb(littbl[0xff].len, littbl[0xff].code, stream);
				else
				{
					// table defined above
					//static const int lcode_ofs[29] = {
					//	3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
					//	35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258
					//};

					// determine required length code for lengths (3..258). This code is
					//  coded in the literal table.
					len = (obuf[n++] & 0xff) + 3;
					for (c = 0; c < 29; c++)
						if (lcode_ofs[c] > len)
							break;
					code = 256 + c;
					_writeb(littbl[code].len, littbl[code].code, stream);

					// insert extra bits as required by the code
					// table defined above
					//static const int lcode_bits[29] = {
					//	0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
					//	3, 4, 4, 4, 4, 5, 5, 5, 5, 0
					//};
					c = lcode_bits[code - 257];
					if (c)
						_writeb(c, len - lcode_ofs[code - 257], stream);

					// table defined above
					//static const int dcode_ofs[30] = {
					//	1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
					//	257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
					//	8193, 12289, 16385, 24577
					//};

					// determine required distance code for distances (1..32768). This code is
					//  coded in the distance table.
					dst = (obuf[n++] & 0xff) << 8;
					dst |= (obuf[n] & 0xff);
					for (c = 0; c < 30; c++)
						if (dcode_ofs[c] > dst)
							break;
					code = c - 1;
					_writeb(dsttbl[code].len, dsttbl[code].code, stream);

					// insert extra bits as required by the code
					// table defined above
					//static const int dcode_bits[30] = {
					//	0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8,
					//	9, 9, 10, 10, 11, 11, 12, 12, 13, 13
					//};
					c = dcode_bits[code];
					if (c)
						_writeb(c, dst - dcode_ofs[code], stream);
				}
			}
		}
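To make the table lookups in that loop easier to follow in isolation, here is a small Java sketch that maps a length or a distance to its code, its extra bit count and its extra bits value. The four tables are copied from the comments in the listing above. The class and method names are hypothetical.

```java
import java.util.Arrays;

class LengthDistanceCodes {
    static final int[] LCODE_OFS = {
        3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
        35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258
    };
    static final int[] LCODE_BITS = {
        0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
        3, 4, 4, 4, 4, 5, 5, 5, 5, 0
    };
    static final int[] DCODE_OFS = {
        1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
        257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
        8193, 12289, 16385, 24577
    };
    static final int[] DCODE_BITS = {
        0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8,
        9, 9, 10, 10, 11, 11, 12, 12, 13, 13
    };

    // Index of the last table entry that does not exceed the value.
    static int find(int[] ofs, int value) {
        int c = 0;
        while (c < ofs.length && ofs[c] <= value)
            c++;
        return c - 1;
    }

    // Returns {literal alphabet code, extra bit count, extra bits value}
    // for a match length in the range 3..258.
    static int[] lengthSymbol(int len) {
        int idx = find(LCODE_OFS, len);
        return new int[] {257 + idx, LCODE_BITS[idx], len - LCODE_OFS[idx]};
    }

    // Returns {distance code, extra bit count, extra bits value}
    // for a distance in the range 1..32768.
    static int[] distanceSymbol(int dst) {
        int idx = find(DCODE_OFS, dst);
        return new int[] {idx, DCODE_BITS[idx], dst - DCODE_OFS[idx]};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lengthSymbol(14)));
        System.out.println(Arrays.toString(distanceSymbol(32768)));
    }
}
```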

And finally we output that end-of-block code (256). If BFINAL were set and this were the last block in the compression (LZ77 flushing the last partial buffer of data) we would flush the serial buffer. This provides a final byte with the remaining bits for the stream.

			// And finally the end-of-block code and flush
			_writeb(littbl[256].len, littbl[256].code, stream);
			if (final)
				_flushb(stream);

At this point the buffer supplied by the LZ77 compression has been flushed to the bit stream. We would reset that buffer pointer and return. If there is more data the LZ77 compressor will then proceed with it. Remember that I had defined this buffer to be 64KB so there will likely be more in the case of larger files.

This DEFLATE compression capability will be part of JANOS v1.6.4 and later. I am likely to do some optimization before release. Here we see that it works. I’ll create an archive containing the files in my JNIOR’s root folder.

bruce_dev /> zip -c test.zip /
 6 files saved
bruce_dev /> 

bruce_dev /> zip test.zip
     Size   Packed          CRC32        Modified
    56096    10764   81%  0b779605  Jan 31 2018 08:42  jniorsys.log
    40302     6249   84%  b1dffc05  Jan 26 2018 07:38  web.log
    22434     9499   58%  059a09d9  Jan 25 2018 14:53  manifest.json
       89       89    0%  f97bbba2  Jan 28 2018 09:11  access.log
      990      460   54%  9699bbfe  Jan 30 2018 14:48  jniorboot.log.bak
      990      461   53%  8f3c0390  Jan 31 2018 08:42  jniorboot.log
 6 files listed
bruce_dev />

This archive verifies. Verification decompresses each file and confirms its CRC32.

bruce_dev /> zip -vl test.zip
  verifying: jniorsys.log (compressed)
  verifying: web.log (compressed)
  verifying: manifest.json (compressed)
  verifying: access.log
  verifying: jniorboot.log.bak (compressed)
  verifying: jniorboot.log (compressed)
 6 entries found - content verifies!
bruce_dev />
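The same kind of check can be sketched off the JNIOR with the standard java.util.zip classes. This is not the JANOS implementation, just an illustration of the idea: inflate a raw DEFLATE stream (the form stored in a ZIP entry) and compare the CRC32 of the result with the expected value. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DeflateVerify {
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Produces raw DEFLATE data (nowrap = true, so no zlib header or trailer).
    static byte[] deflate(byte[] data) {
        Deflater d = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        d.setInput(data);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished())
            out.write(buf, 0, d.deflate(buf));
        d.end();
        return out.toByteArray();
    }

    // Expands raw DEFLATE data back into the original bytes.
    static byte[] inflate(byte[] packed) {
        Inflater inf = new Inflater(true);
        inf.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && inf.needsInput())
                    break; // input ran out before the final block ended
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt DEFLATE data", e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }

    // An entry verifies when the expanded content has the recorded CRC32.
    static boolean verifies(byte[] packed, long expectedCrc) {
        return crc32(inflate(packed)) == expectedCrc;
    }

    public static void main(String[] args) {
        byte[] text = "jniorboot.log jniorboot.log jniorboot.log"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] packed = deflate(text);
        System.out.println("verifies: " + verifies(packed, crc32(text)));
    }
}
```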

We can repeat the construction in verbose output mode. Here we see timing. Again, keep in mind that JANOS is running on a 100 MHz Renesas RX63N micro-controller.

bruce_dev /> zip -cl test.zip /
  deflate: /jniorsys.log (56096 bytes)
   saving: jniorsys.log (compressed 80.8%) 1.176 secs
  deflate: /web.log (40302 bytes)
   saving: web.log (compressed 84.5%) 0.671 secs
  deflate: /manifest.json (22434 bytes)
   saving: manifest.json (compressed 57.7%) 1.863 secs
   saving: access.log (stored) 0.011 secs
  deflate: /jniorboot.log.bak (990 bytes)
   saving: jniorboot.log.bak (compressed 53.5%) 0.054 secs
  deflate: /jniorboot.log (990 bytes)
   saving: jniorboot.log (compressed 53.4%) 0.054 secs
 6 files saved
bruce_dev />

The /etc/JanosClasses.jar file also compresses. This file is where I first encountered Huffman tables whose bit depth (code length) exceeded the DEFLATE maximums (15 bits for the literal and distance tables, 7 bits for the code length encoding).

bruce_dev /> zip -cl test.zip /etc
  deflate: /etc/JanosClasses.jar (266601 bytes)
   saving: etc/JanosClasses.jar (compressed 11.2%) 24.215 secs
 1 files saved
bruce_dev /> 

bruce_dev /> zip test.zip
     Size   Packed          CRC32        Modified
   266601   236758   11%  20916587  Jan 11 2018 09:58  etc/JanosClasses.jar
 1 files listed
bruce_dev /> 

bruce_dev /> zip -vl test.zip
  verifying: etc/JanosClasses.jar (compressed)
 1 entries found - content verifies!
bruce_dev />

I know that 24 seconds for a 1/4 megabyte file is nothing to write home about. Now that things are fully functional I can go back and work on the LZ77 stage, where basically all of the time is consumed. I can certainly improve on this performance but, as it is, it isn’t that bad. The JNIOR is a controller after all and you likely wouldn’t need to compress other archives.

I noticed that the JANOS runtime library for applications did not support a means of data encryption and decryption. It isn’t a problem to expose a cipher algorithm for use by applications. I have added the Security.rc4cipher() method for this purpose. I know that RC4 has been rumored to have been broken. For our purposes it remains plenty secure.

Here’s a test program. This requires JANOS v1.6.3-rc4 or later.

package jtest;
 
import com.integpg.system.Security;
 
public class Main {
    
    public static void main(String[] args) throws Exception {
        
        // source text and cipher key
        String text = "Best thing since sliced bread.";
        byte[] key = "Piece of cake".getBytes();
        
        // encrypt
        byte[] coded = Security.rc4cipher(text.getBytes(), text.length(), key);
        
        // encrypted content
        for (int n = 0; n < coded.length; n++) {
            System.out.printf(" %02x", coded[n] & 0xff);
            if (n % 16 == 15 || n == coded.length - 1)
                System.out.println("");
        }
        
        // decrypt
        byte[] result = Security.rc4cipher(coded, coded.length, key);
        
        // received message
        String msg = new String(result);
        System.out.println(msg);      
        
    }
}

This program outputs the following when run.

bruce_dev /> jtest
 ae 87 ae 84 bc 3e c2 b6 92 0f 25 c0 30 42 03 ef
 96 39 c5 cd b3 99 6f aa 36 ba c8 58 5b fd
Best thing since sliced bread.

bruce_dev />

To be honest I have not confirmed that the encoded string is in fact RC4. But JANOS uses the underlying cipher in many places and it has proven to be accurate there.
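One way to confirm it would be to run the same key and text through an independent RC4 implementation and compare the hex output with the dump above. The following is a plain textbook RC4 in Java, written for that purpose as a sketch. It makes no assumption about how JANOS implements the cipher, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Rc4Check {
    // Textbook RC4: key scheduling followed by keystream generation.
    // The same call both encrypts and decrypts.
    static byte[] rc4(byte[] data, byte[] key) {
        int[] s = new int[256];
        for (int i = 0; i < 256; i++)
            s[i] = i;

        // key scheduling algorithm
        for (int i = 0, j = 0; i < 256; i++) {
            j = (j + s[i] + (key[i % key.length] & 0xff)) & 0xff;
            int t = s[i];
            s[i] = s[j];
            s[j] = t;
        }

        // pseudo-random generation, XORed with the data
        byte[] out = new byte[data.length];
        int i = 0, j = 0;
        for (int n = 0; n < data.length; n++) {
            i = (i + 1) & 0xff;
            j = (j + s[i]) & 0xff;
            int t = s[i];
            s[i] = s[j];
            s[j] = t;
            out[n] = (byte) (data[n] ^ s[(s[i] + s[j]) & 0xff]);
        }
        return out;
    }

    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data)
            sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same text and key as the test program above.
        byte[] text = "Best thing since sliced bread.".getBytes(StandardCharsets.US_ASCII);
        byte[] key = "Piece of cake".getBytes(StandardCharsets.US_ASCII);
        System.out.println(hex(rc4(text, key)));
    }
}
```

If the printed hex matches the two lines output by jtest, the JANOS result is standard RC4.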

Remember PGP? I think that stood for (or stands for) Pretty Good Privacy. This basically was a simple approach to encrypting data for transfer through the email system. It used the RSA Private/Public Key technology. Well JANOS does RSA as part of my SSL/TLSv1.2 implementation. Why shouldn’t I expose that for use by applications? You may need to securely pass information.

Hypothetically the JNIOR could be monitoring doors and conveyors collecting numbers that might be directly related to sales or something that you might consider to be proprietary and quite sensitive. Each day you would like to forward the results to an email account. While the email transfer from the JNIOR is done over a secure connection the data is not stored at the other end in any encrypted format. Nor are you sure that the data is then transferred over any remaining connections securely.

The solution is to encrypt the data at the source and later decrypt. Well you can do that now with RC4 provided that you keep the key private. The same key is used to encrypt and then at the other location to decrypt. It is a risk.

Here the RSA key pair comes to the rescue. So I have exposed it in JANOS v1.6.3. Basically you can encrypt data using a public key, and that data can only be decrypted some time later by the corresponding private key.

So you can use OpenSSL to generate an RSA key pair. Use a 1024-bit key as anything larger will tax the JNIOR a little too much. From that you can export the RSA Public Key in PEM format. It will look like this.

-----BEGIN PUBLIC KEY-----
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDEwEHsRkk592MEFyZXvvfsDkaF
u169uXwKugo2J7JMh8fkruiKe7B2tbuA143JSYeI0o4mpqWwd06CbjDG2gVEMgbf
5SK7quMdflJ5mW7t3ZPQZdMdryttPq3C4pzTfuH6/MGMzaNdobXSOQ7+SkH7goRd
sUYx6flLXn1KnQjPCQIDAQAB
-----END PUBLIC KEY-----

I will show you how you can use this PEM formatted Public Key to encrypt data for transfer. Later you can use the corresponding Private Key that you have kept secret and sequestered to access the data.

The following program uses new extensions to the com.integpg.system.Security class.

Here we are demonstrating encryption using our internal Public Key and then successful decryption using the internal Private Key.

package jtest;
 
import com.integpg.system.Debug;
import com.integpg.system.Security;
 
public class Main {
    
    public static void main(String[] args) throws Exception {
 
        String msg = "The quick brown fox jumped over the lazy dog.";
        System.out.println(msg);
 
        byte[] data = Security.encrypt(msg.getBytes(), msg.length(), Security.PUBKEY);
        System.out.println("encrypted: ");
        Debug.dump(data);
        
        byte[] result = Security.decrypt(data, data.length, Security.PRIVKEY);
        System.out.println("decrypted: ");
        Debug.dump(result);
        
    }
        
}

bruce_dev /> jtest
The quick brown fox jumped over the lazy dog.
encrypted: 
 17 66 0a 66 d8 aa 67 7c-a6 41 81 69 b1 c9 d2 82    .f.f..g| .A.i....
 ab a6 9d ef fd 31 7b 67-2a 3a 23 82 05 55 3d dd    .....1{g *:#..U=.
 8a 33 36 2d 5c 61 ae 25-39 b6 40 28 5f 1f de d2    .36-\a.% 9.@(_...
 77 b4 47 9d 53 6c ee 7a-4b e2 29 8c e0 79 06 9f    w.G.Sl.z K.)..y..
 30 3c 2e 6e d0 41 cf 40-a2 2b e5 bd 03 dd d4 b4    0<.n.A.@ .+......
 a2 b4 d1 8b 33 31 f1 2e-84 e0 8d 01 b0 4d 7b 54    ....31.. .....M{T
 65 61 56 44 ee f4 45 fb-4a 39 96 c1 c9 0e 2a 2a    eaVD..E. J9....**
 3d 2b a6 71 a8 89 91 c0-cf 80 0b 3d e3 dc dc 8e    =+.q.... ...=....
decrypted: 
 54 68 65 20 71 75 69 63-6b 20 62 72 6f 77 6e 20    The.quic k.brown.
 66 6f 78 20 6a 75 6d 70-65 64 20 6f 76 65 72 20    fox.jump ed.over.
 74 68 65 20 6c 61 7a 79-20 64 6f 67 2e             the.lazy .dog.

bruce_dev />

By the way, the dump() method in the com.integpg.system.Debug class is also new. I got tired of coding a dump so it will be available now.

I will show you how to use an external Public Key for encryption next.

To show you how to encrypt using a supplied Public Key I will extract the internal public key and apply it as you would one obtained from a file let’s say. The following program uses a method in the class that supplies the Public Key.

package jtest;

import com.integpg.system.Debug;
import com.integpg.system.Security;

public class Main {
    
    public static void main(String[] args) throws Exception {
        
        // Let's see the Public Key
        byte[] pubkey = Security.pubkey();
        System.out.println(new String(pubkey));
 
        String msg = "The quick brown fox jumped over the lazy dog.";
        System.out.println(msg);
 
        byte[] data = Security.encrypt(msg.getBytes(), msg.length(), pubkey, 0);
        System.out.println("encrypted: ");
        Debug.dump(data);
        
        byte[] result = Security.decrypt(data, data.length, Security.PRIVKEY);
        System.out.println("decrypted: ");
        Debug.dump(result);
        
    }
        
}

bruce_dev /> jtest
-----BEGIN PUBLIC KEY-----
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDEwEHsRkk592MEFyZXvvfsDkaF
u169uXwKugo2J7JMh8fkruiKe7B2tbuA143JSYeI0o4mpqWwd06CbjDG2gVEMgbf
5SK7quMdflJ5mW7t3ZPQZdMdryttPq3C4pzTfuH6/MGMzaNdobXSOQ7+SkH7goRd
sUYx6flLXn1KnQjPCQIDAQAB
-----END PUBLIC KEY-----

The quick brown fox jumped over the lazy dog.
encrypted: 
 a8 35 44 4a 15 4e 1f fe-b4 30 c3 e6 51 38 90 be    .5DJ.N.. .0..Q8..
 e4 4f 7c 5d fb e6 38 16-63 f1 93 ba a5 3f 24 00    .O|]..8. c....?$.
 eb 46 5d 27 25 f1 5a b1-bf 0e 46 f9 5b 1b e9 13    .F]'%.Z. ..F.[...
 ac 6c 77 db bd 1e 22 be-b5 32 6b 5c cc 0b 46 d7    .lw...". .2k\..F.
 3f 1b 30 4c 61 03 eb 2f-dd 84 54 d5 35 86 32 56    ?.0La../ ..T.5.2V
 16 56 7c 41 a3 ef 2f 70-2d 67 3f a5 97 fb 60 c2    .V|A../p -g?...`.
 df 61 5f 5a 76 90 56 db-21 66 6f f3 00 af aa a8    .a_Zv.V. !fo.....
 71 a2 a1 2e 31 7d 82 ab-34 e2 cc 3b 52 64 32 09    q...1}.. 4..;Rd2.
decrypted: 
 54 68 65 20 71 75 69 63-6b 20 62 72 6f 77 6e 20    The.quic k.brown.
 66 6f 78 20 6a 75 6d 70-65 64 20 6f 76 65 72 20    fox.jump ed.over.
 74 68 65 20 6c 61 7a 79-20 64 6f 67 2e             the.lazy .dog.

bruce_dev />

You can export the JNIOR’s Public Key now using the CERTMGR command.

bruce_dev /> help certmgr
CERTMGR

 -V             Verify installed keys and certificate
 -C [file]      Regenerate Certificate [Install file]
 -S file        Verify signature on certificate
 -K file        Install RSA Key Pair
 -D [file]      Decode and dump certificate [file]
 -E file        Export certificate to file
 -P file        Export public key to file
 -B             Export in binary
 -G [len]       Generate key pair [bit length]
 -R             Restore default credentials

SSL Certificate Management.

bruce_dev />

Here I will export the public key to a file. I’ll show you what is in the file and I’ll use CERTMGR to dump the encoded ASN.1 format for the key.

bruce_dev /> certmgr -p mykey.pub

bruce_dev /> cat mykey.pub
-----BEGIN PUBLIC KEY-----
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDEwEHsRkk592MEFyZXvvfsDkaF
u169uXwKugo2J7JMh8fkruiKe7B2tbuA143JSYeI0o4mpqWwd06CbjDG2gVEMgbf
5SK7quMdflJ5mW7t3ZPQZdMdryttPq3C4pzTfuH6/MGMzaNdobXSOQ7+SkH7goRd
sUYx6flLXn1KnQjPCQIDAQAB
-----END PUBLIC KEY-----

bruce_dev /> certmgr -d mykey.pub

0000  30 81 9F       SEQUENCE {  (159 bytes)
0003  30 0D          |  SEQUENCE {  (13 bytes)
0005  06 09          |  |  OBJECT IDENTIFIER 1.2.840.113549.1.1.1
0010  05 00          |  |  NULL 
                     |  }
0012  03 81 8D       |  BITSTRING[140] Encapsulates {
0000  30 81 89       |  |  SEQUENCE {  (137 bytes)
0003  02 81 81       |  |  |  INTEGER 
                     |  |  |     C4C041EC464939F76304172657BEF7EC0E4685BB5EBDB97C
                     |  |  |     0ABA0A3627B24C87C7E4AEE88A7BB076B5BB80D78DC94987
                     |  |  |     88D28E26A6A5B0774E826E30C6DA05443206DFE522BBAAE3
                     |  |  |     1D7E5279996EEDDD93D065D31DAF2B6D3EADC2E29CD37EE1
                     |  |  |     FAFCC18CCDA35DA1B5D2390EFE4A41FB82845DB14631E9F9
                     |  |  |     4B5E7D4A9D08CF09
0087  02 03          |  |  |  INTEGER 010001
                     |  |  }
                     |  }
                     }

bruce_dev />

You might see now that you can take mykey.pub and send it to another JNIOR that can load it as the pubkey for encryption as demonstrated.
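The exported file is an ordinary X.509 SubjectPublicKeyInfo in PEM form, so standard tools can read it too. As a rough sketch, this is how a desktop Java program (not a JANOS application) might load the key shown above and encrypt with it. The PKCS#1 v1.5 padding chosen here is an assumption made for illustration. The padding that the JANOS Security class expects is not stated in this article, so data encrypted this way may not decrypt on the JNIOR. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.interfaces.RSAPublicKey;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;
import javax.crypto.Cipher;

class PemPublicKeyDemo {
    // The public key exported to mykey.pub above.
    static final String PEM =
          "-----BEGIN PUBLIC KEY-----\n"
        + "MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDEwEHsRkk592MEFyZXvvfsDkaF\n"
        + "u169uXwKugo2J7JMh8fkruiKe7B2tbuA143JSYeI0o4mpqWwd06CbjDG2gVEMgbf\n"
        + "5SK7quMdflJ5mW7t3ZPQZdMdryttPq3C4pzTfuH6/MGMzaNdobXSOQ7+SkH7goRd\n"
        + "sUYx6flLXn1KnQjPCQIDAQAB\n"
        + "-----END PUBLIC KEY-----\n";

    // Strips the PEM armor, decodes the Base64 and parses the DER content.
    static RSAPublicKey load(String pem) {
        String b64 = pem.replace("-----BEGIN PUBLIC KEY-----", "")
                        .replace("-----END PUBLIC KEY-----", "")
                        .replaceAll("\\s", "");
        byte[] der = Base64.getDecoder().decode(b64);
        try {
            KeyFactory factory = KeyFactory.getInstance("RSA");
            return (RSAPublicKey) factory.generatePublic(new X509EncodedKeySpec(der));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("not a usable RSA public key", e);
        }
    }

    // Encrypts one short message. The padding scheme is an assumption.
    static byte[] encrypt(RSAPublicKey key, byte[] msg) {
        try {
            Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            return cipher.doFinal(msg);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RSA encryption failed", e);
        }
    }

    public static void main(String[] args) {
        RSAPublicKey key = load(PEM);
        System.out.println("modulus bits: " + key.getModulus().bitLength());
        System.out.println("exponent: " + key.getPublicExponent());

        byte[] msg = "The quick brown fox jumped over the lazy dog."
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println("encrypted bytes: " + encrypt(key, msg).length);
    }
}
```

The encrypted block is as long as the key modulus, which is why the dumps above are 128 bytes for the 1024-bit key.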

NO. THERE IS NO WAY TO EXPORT THE JNIOR’S PRIVATE KEY.

And the encryption and decryption methods do not support the use of a private key in PEM format.

Why limit key size to 1024-bits on the JNIOR?

A 1024-bit Private Key operation (encrypting a single block of 128 bytes) on the JNIOR takes about 3.4 seconds. The same operation using a 2048-bit key takes almost 26 seconds. That will cause browsers to time out when trying to use HTTPS, among other things.

A 2048-bit key can be installed on the JNIOR. You need a 2048-bit key pair, which you can generate with OpenSSL.

OpenSSL> genpkey -out private.pem -des3 -algorithm rsa rsa_keygen_bits:2048
.................++++++
....................++++++
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
OpenSSL> genpkey -out private.pem -des3 -algorithm rsa -pkeyopt rsa_keygen_bits:
2048
................................................................................
..............................................................................++
+
................................+++
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
OpenSSL>

Move the resulting private.pem file onto the JNIOR and run the CERTMGR -K command to load it.

bruce_dev /> certmgr -k private.pem
Passphrase: *****
keys installed

bruce_dev />

Now let’s validate that it works.

bruce_dev /> certmgr -v            
2048-bit key pair verifies
private key operation requires about 25.7 seconds
certificate verifies 
certificate not valid with current keys

bruce_dev />

Oh, and we can update the certificate. That would likely happen automatically at some point but we can force it.

bruce_dev /> certmgr -c
certificate updated

bruce_dev /> certmgr -v
2048-bit key pair verifies
private key operation requires about 25.7 seconds
certificate verifies 

bruce_dev />

Let’s run the program mentioned earlier to see if it succeeds.

bruce_dev /> jtest
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEArvnTH4JvTzzVW76iFOKf
akQ2EqbXVhEEoDZ0d1x2Q/8R8jvwZAdvvlcV63ixvTBSR+xInCfVAsjQDzeOVQq/
kKsQm7VNeqTHAZ4TobKYpcG2N3n4PGQRhT1H0bwqfopEWg/iqauCejKX6ivInZC6
kPD1rkCbr6HRSgnKKNbnmIL37nrx3XlhkrHeOV1/jEBGTFm1KIpNmVkaN83PuZSk
Xj2RJ9TGTdVwtrSNVe+VnQ4s+66BlPZrXBfi4P8lSyNd4J0eIVujfXor3Kxz2TAD
zmgyMylyJuHO/Ss/3PdGwnXIx7fbNKEK4OnMwRdz0DtvSMcJE6NhBiimD/Jm2F0H
awIDAQAB
-----END PUBLIC KEY-----

The quick brown fox jumped over the lazy dog.
encrypted: 
 41 2b 02 92 30 d5 50 7c-92 b6 95 eb 8c 8d f4 76    A+..0.P| .......v
 f1 22 0a c5 63 48 f7 1b-af 85 47 4e 1d b2 0d bc    ."..cH.. ..GN....
 5b 6a f9 6d c7 1a c5 90-69 f9 28 4c 93 e1 8c 2e    [j.m.... i.(L....
 3f 5b 95 26 9d c4 ae 15-15 84 74 1e c4 a5 21 29    ?[.&.... ..t...!)
 e2 e0 c8 f7 f0 3e 99 aa-ed a9 36 ab 18 4f f8 ca    .....>.. ..6..O..
 cc 23 b3 57 d2 5c d6 6f-fa 83 2b 44 82 a5 ab ef    .#.W.\.o ..+D....
 c7 44 98 14 6d 8e 58 a2-05 b9 e0 9c 87 fc 52 22    .D..m.X. ......R"
 ee 46 38 2e 32 4e 4d c1-92 cd fc 3d 80 1c 81 19    .F8.2NM. ...=....
 1b 95 56 93 ff 4a 06 e0-9e c2 30 0c 83 ee 01 08    ..V..J.. ..0.....
 8f 98 d7 f3 50 b5 2b 80-0c 9b 23 8b 45 df 56 85    ....P.+. ..#.E.V.
 60 06 30 e3 35 a1 3c 82-19 57 b6 7e cf a2 02 e4    `.0.5.<. .W.~....
 55 3f b4 3c a8 39 77 79-0f f0 d6 aa da 1d b4 73    U?.<.9wy .......s
 7e ef 13 54 a8 d7 b0 a1-d2 67 0a 66 08 b9 81 13    ~..T.... .g.f....
 11 17 c2 d4 be 98 b5 fe-50 34 49 ab da cf 75 d7    ........ P4I...u.
 c1 b5 18 4e 32 27 2f e4-81 35 51 4a 62 42 6e a1    ...N2'/. .5QJbBn.
 47 67 e5 e4 c4 2c 70 c2-9b ea d8 09 5a 52 fd cb    Gg...,p. ....ZR..
decrypted: 
 54 68 65 20 71 75 69 63-6b 20 62 72 6f 77 6e 20    The.quic k.brown.
 66 6f 78 20 6a 75 6d 70-65 64 20 6f 76 65 72 20    fox.jump ed.over.
 74 68 65 20 6c 61 7a 79-20 64 6f 67 2e             the.lazy .dog.

bruce_dev />

If you are *REAL* patient the 2048-bit key works with SSL/TLS.

This took a couple of minutes to come up and the browser did once tell me that the site was taking too long to respond. This was with Chrome.

The bottom line is that a 1024-bit key is really secure enough for a controller device like the JNIOR.

NOTICE

As we are concerned about export restrictions the exposed cryptography functions demonstrated in this topic will remain unavailable for the time being. This may change should we decide to create a version of the product for shipment and use in the USA only.

Please feel free to offer your perspectives on this issue.

In the Programming Tips I show you how to make an outgoing secure connection using SSL/TLSv1.2. A secure connection encrypts data with a 128-bit key that is itself securely negotiated. So the information that you exchange cannot be read from the wire as it passes. But Windows still says “Not Secure” sometimes. Why is that?

By “Not Secure” Windows is really telling you that the connection is Not Trusted. The data is still encrypted but Windows doesn’t recognize the connected client. If you have connected to a JNIOR using the HTTPS URL before you might have seen a Privacy Error page that you must bypass in order to access the unit. This occurs as the browser receives a certificate from the JNIOR which is not traceable through a Root Certificate Authority in its database. The JNIOR certificates are self-signed and not issued by such an authority. Ergo the privacy concern.

When your application makes an outgoing connection the destination sends a copy of its certificate. Can you verify that certificate as the browser does? Well, with limitations you can. I have added a method in the Socket class allowing you to retrieve that certificate.

Once you indicate that the connection should be secure there is a brief period in which the negotiations transpire. The following code waits for a certificate. The getCertificate() method returns an empty byte array until a certificate has been received.


bruce_dev /> jtest
 30 82 02 ed 30 82 02 56-a0 03 02 01 02 02 04 24    0...0..V .......$
 99 a9 00 30 0d 06 09 2a-86 48 86 f7 0d 01 01 0b    ...0...* .H......
 05 00 30 81 81 31 20 30-1e 06 03 55 04 0a 0c 17    ..0..1.0 ...U....
 49 4e 54 45 47 20 50 72-6f 63 65 73 73 20 47 72    INTEG.Pr ocess.Gr
 6f 75 70 20 49 6e 63 31-17 30 15 06 03 55 04 0b    oup.Inc1 .0...U..
 0c 0e 4a 4e 49 4f 52 20-43 6f 6e 74 72 6f 6c 73    ..JNIOR. Controls
 31 1d 30 1b 06 03 55 04-03 0c 14 68 6f 6e 65 79    1.0...U. ...honey
 70 6f 74 2e 69 6e 74 65-67 70 67 2e 63 6f 6d 31    pot.inte gpg.com1
 25 30 23 06 09 2a 86 48-86 f7 0d 01 09 01 16 16    %0#..*.H ........
 62 63 6c 6f 75 74 69 65-72 32 40 63 6f 6d 63 61    bcloutie r2@comca
 73 74 2e 6e 65 74 30 1e-17 0d 31 37 30 33 32 32    st.net0. ..170322
 31 37 33 30 32 33 5a 17-0d 31 39 30 33 32 32 31    173023Z. .1903221
 37 33 30 32 33 5a 30 81-81 31 20 30 1e 06 03 55    73023Z0. .1.0...U
 04 0a 0c 17 49 4e 54 45-47 20 50 72 6f 63 65 73    ....INTE G.Proces
 73 20 47 72 6f 75 70 20-49 6e 63 31 17 30 15 06    s.Group. Inc1.0..
 03 55 04 0b 0c 0e 4a 4e-49 4f 52 20 43 6f 6e 74    .U....JN IOR.Cont
 72 6f 6c 73 31 1d 30 1b-06 03 55 04 03 0c 14 68    rols1.0. ..U....h
 6f 6e 65 79 70 6f 74 2e-69 6e 74 65 67 70 67 2e    oneypot. integpg.
 63 6f 6d 31 25 30 23 06-09 2a 86 48 86 f7 0d 01    com1%0#. .*.H....
 09 01 16 16 62 63 6c 6f-75 74 69 65 72 32 40 63    ....bclo utier2@c
 6f 6d 63 61 73 74 2e 6e-65 74 30 81 9f 30 0d 06    omcast.n et0..0..
 09 2a 86 48 86 f7 0d 01-01 01 05 00 03 81 8d 00    .*.H.... ........
 30 81 89 02 81 81 00 a9-94 83 17 4b 2e bc 85 78    0....... ...K...x
 ec ea 5b e9 f7 58 40 70-3b 06 ea 49 d9 33 3d 49    ..[..X@p ;..I.3=I
 3d 03 5a 8d 84 db 5a b7-e5 49 1d 33 4b af 1b 59    =.Z...Z. .I.3K..Y
 a3 a2 71 e2 5c 42 76 d4-10 f3 b3 c9 0e 80 1e 89    ..q.\Bv. ........
 a1 62 c6 a2 82 ec 51 ab-05 cf 97 31 56 1a 95 22    .b....Q. ...1V.."
 a0 b3 03 9d f7 2f a2 5b-a1 06 1e 6b bb 7a 1a a6    ...../.[ ...k.z..
 b2 87 a3 14 fd db b9 e1-03 4b 45 d5 e1 ff c1 5a    ........ .KE....Z
 59 c4 0d 77 2d 3c da d6-14 2a 70 76 50 f1 1e bc    Y..w-<.. .*pvP...
 d3 0c ff 75 e6 5e 91 02-03 01 00 01 a3 70 30 6e    ...u.^.. .....p0n
 30 1d 06 03 55 1d 0e 04-16 04 14 29 cb 03 57 bc    0...U... ...)..W.
 dd 26 e7 8a d5 e5 64 c1-d0 87 b0 3b 58 30 82 30    .&....d. ...;X0.0
 0c 06 03 55 1d 13 04 05-30 03 01 01 ff 30 3f 06    ...U.... 0....0?.
 03 55 1d 11 04 38 30 36-87 04 32 c5 22 4b 82 14    .U...806 ..2."K..
 68 6f 6e 65 79 70 6f 74-2e 69 6e 74 65 67 70 67    honeypot .integpg
 2e 63 6f 6d 82 08 68 6f-6e 65 79 70 6f 74 82 0e    .com..ho neypot..
 68 6f 6e 65 79 70 6f 74-5f 6a 6e 69 6f 72 30 0d    honeypot _jnior0.
 06 09 2a 86 48 86 f7 0d-01 01 0b 05 00 03 81 81    ..*.H... ........
 00 2b 42 e0 5e 33 1a ee-b2 65 f4 da c1 18 df 73    .+B.^3.. .e.....s
 e7 f5 55 d7 26 05 f6 ec-ab 67 d8 60 32 4a 7c 50    ..U.&... .g.`2J|P
 56 14 c5 20 33 37 a9 8c-21 57 d8 5c 57 a7 36 b8    V...37.. !W.\W.6.
 2d da 88 47 5e 93 a6 c9-fc 2c 59 83 67 8c 8d 46    -..G^... .,Y.g..F
 1a 9c e7 f5 3a 27 66 db-bd 26 c0 b9 9c e1 f4 51    ....:'f. .&.....Q
 4f 6b ac 3d 09 c3 30 00-bc 7e 5f 61 51 c0 ba 17    Ok.=..0. .~_aQ...
 5f 29 b6 e7 3b 8e 7f eb-ae 10 99 26 9a 9a fd 70    _)..;... ...&...p
 67 17 c6 7c f9 c7 f1 7e-bb 3f 8d b2 ed 43 53 c2    g..|...~ .?...CS.
 d1                                                 .


bruce_dev /> 

So you can see here that we receive something that looks to have the company name in it. This is the certificate and unfortunately it is in a binary ASN.1 format. That at this point is not very useful. You would have a lot of work to do if you were to parse information out of that.
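To give a feel for what that parsing involves, here is a minimal Java sketch of the very first step: reading the tag and length header that precedes every DER element. It assumes single-byte tags and definite lengths of at most four length bytes, which covers the certificate shown here. The names are hypothetical and this is not the JANOS parser.

```java
import java.util.Arrays;

class DerHeader {
    // Reads one DER header at pos and returns {tag, content length, header size}.
    static int[] read(byte[] buf, int pos) {
        int tag = buf[pos] & 0xff;
        int first = buf[pos + 1] & 0xff;

        // short form: the length fits in the low 7 bits
        if (first < 0x80)
            return new int[] {tag, first, 2};

        // long form: the low 7 bits give the number of length bytes that follow
        int count = first & 0x7f;
        if (count < 1 || count > 4)
            throw new IllegalArgumentException("unsupported length encoding");
        int len = 0;
        for (int n = 0; n < count; n++)
            len = (len << 8) | (buf[pos + 2 + n] & 0xff);
        return new int[] {tag, len, 2 + count};
    }

    static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int n = 0; n < values.length; n++)
            out[n] = (byte) values[n];
        return out;
    }

    public static void main(String[] args) {
        // The first 16 bytes of the certificate dumped above.
        byte[] cert = bytes(0x30, 0x82, 0x02, 0xed, 0x30, 0x82, 0x02, 0x56,
                            0xa0, 0x03, 0x02, 0x01, 0x02, 0x02, 0x04, 0x24);

        int pos = 0;
        for (int depth = 0; depth < 3; depth++) {
            int[] h = read(cert, pos);
            System.out.println("offset " + pos + ": " + Arrays.toString(h));
            pos += h[2]; // step inside the element to its first child
        }
    }
}
```

Everything else in the certificate is a nesting of such elements, so a full parser is mostly a recursive walk plus a decoder for each tag type.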

So let’s see if I can help in that department.

Since in this example we attempt to connect to the HoneyPot I can separately pull its certificate in PEM format. In that form it looks like this.

bruce_dev /> cat flash/honeypot.cer
-----BEGIN CERTIFICATE-----
MIIC7TCCAlagAwIBAgIEJJmpADANBgkqhkiG9w0BAQsFADCBgTEgMB4GA1UECgwX
SU5URUcgUHJvY2VzcyBHcm91cCBJbmMxFzAVBgNVBAsMDkpOSU9SIENvbnRyb2xz
MR0wGwYDVQQDDBRob25leXBvdC5pbnRlZ3BnLmNvbTElMCMGCSqGSIb3DQEJARYW
YmNsb3V0aWVyMkBjb21jYXN0Lm5ldDAeFw0xNzAzMjIxNzMwMjNaFw0xOTAzMjIx
NzMwMjNaMIGBMSAwHgYDVQQKDBdJTlRFRyBQcm9jZXNzIEdyb3VwIEluYzEXMBUG
A1UECwwOSk5JT1IgQ29udHJvbHMxHTAbBgNVBAMMFGhvbmV5cG90LmludGVncGcu
Y29tMSUwIwYJKoZIhvcNAQkBFhZiY2xvdXRpZXIyQGNvbWNhc3QubmV0MIGfMA0G
CSqGSIb3DQEBAQUAA4GNADCBiQKBgQCplIMXSy68hXjs6lvp91hAcDsG6knZMz1J
PQNajYTbWrflSR0zS68bWaOiceJcQnbUEPOzyQ6AHomhYsaiguxRqwXPlzFWGpUi
oLMDnfcvoluhBh5ru3oaprKHoxT927nhA0tF1eH/wVpZxA13LTza1hQqcHZQ8R68
0wz/deZekQIDAQABo3AwbjAdBgNVHQ4EFgQUKcsDV7zdJueK1eVkwdCHsDtYMIIw
DAYDVR0TBAUwAwEB/zA/BgNVHREEODA2hwQyxSJLghRob25leXBvdC5pbnRlZ3Bn
LmNvbYIIaG9uZXlwb3SCDmhvbmV5cG90X2puaW9yMA0GCSqGSIb3DQEBCwUAA4GB
ACtC4F4zGu6yZfTawRjfc+f1VdcmBfbsq2fYYDJKfFBWFMUgMzepjCFX2FxXpza4
LdqIR16Tpsn8LFmDZ4yNRhqc5/U6J2bbvSbAuZzh9FFPa6w9CcMwALx+X2FRwLoX
Xym25zuOf+uuEJkmmpr9cGcXxnz5x/F+uz+Nsu1DU8LR
-----END CERTIFICATE-----

bruce_dev />

There is a nice option in the CERTMGR command to dump that in some meaningful form.


bruce_dev /> certmgr -d flash/honeypot.cer

0000  30 82 02 ED    SEQUENCE {  (749 bytes)
0004  30 82 02 56    |  SEQUENCE {  (598 bytes)
0008  A0 03          |  |  [0] EXPLICIT {  (3 bytes)
000A  02 01          |  |  |  INTEGER 02
                     |  |  }
000D  02 04          |  |  INTEGER 2499A900
0013  30 0D          |  |  SEQUENCE {  (13 bytes)
0015  06 09          |  |  |  OBJECT IDENTIFIER 1.2.840.113549.1.1.11
0020  05 00          |  |  |  NULL 
                     |  |  }
0022  30 81 81       |  |  SEQUENCE {  (129 bytes)
0025  31 20          |  |  |  SET {  (32 bytes)
0027  30 1E          |  |  |  |  SEQUENCE {  (30 bytes)
0029  06 03          |  |  |  |  |  OBJECT IDENTIFIER 2.5.4.10
002E  0C 17          |  |  |  |  |  UTF8STRING 'INTEG Process Group Inc'
                     |  |  |  |  }
                     |  |  |  }
0047  31 17          |  |  |  SET {  (23 bytes)
0049  30 15          |  |  |  |  SEQUENCE {  (21 bytes)
004B  06 03          |  |  |  |  |  OBJECT IDENTIFIER 2.5.4.11
0050  0C 0E          |  |  |  |  |  UTF8STRING 'JNIOR Controls'
                     |  |  |  |  }
                     |  |  |  }
0060  31 1D          |  |  |  SET {  (29 bytes)
0062  30 1B          |  |  |  |  SEQUENCE {  (27 bytes)
0064  06 03          |  |  |  |  |  OBJECT IDENTIFIER 2.5.4.3
0069  0C 14          |  |  |  |  |  UTF8STRING 'honeypot.integpg.com'
                     |  |  |  |  }
                     |  |  |  }
007F  31 25          |  |  |  SET {  (37 bytes)
0081  30 23          |  |  |  |  SEQUENCE {  (35 bytes)
0083  06 09          |  |  |  |  |  OBJECT IDENTIFIER 1.2.840.113549.1.9.1
008E  16 16          |  |  |  |  |  IA5STRING 'bcloutier2@comcast.net'
                     |  |  |  |  }
                     |  |  |  }
                     |  |  }
00A6  30 1E          |  |  SEQUENCE {  (30 bytes)
00A8  17 0D          |  |  |  UTCTIME[13] 170322173023Z
00B7  17 0D          |  |  |  UTCTIME[13] 190322173023Z
                     |  |  }
00C6  30 81 81       |  |  SEQUENCE {  (129 bytes)
00C9  31 20          |  |  |  SET {  (32 bytes)
00CB  30 1E          |  |  |  |  SEQUENCE {  (30 bytes)
00CD  06 03          |  |  |  |  |  OBJECT IDENTIFIER 2.5.4.10
00D2  0C 17          |  |  |  |  |  UTF8STRING 'INTEG Process Group Inc'
                     |  |  |  |  }
                     |  |  |  }
00EB  31 17          |  |  |  SET {  (23 bytes)
00ED  30 15          |  |  |  |  SEQUENCE {  (21 bytes)
00EF  06 03          |  |  |  |  |  OBJECT IDENTIFIER 2.5.4.11
00F4  0C 0E          |  |  |  |  |  UTF8STRING 'JNIOR Controls'
                     |  |  |  |  }
                     |  |  |  }
0104  31 1D          |  |  |  SET {  (29 bytes)
0106  30 1B          |  |  |  |  SEQUENCE {  (27 bytes)
0108  06 03          |  |  |  |  |  OBJECT IDENTIFIER 2.5.4.3
010D  0C 14          |  |  |  |  |  UTF8STRING 'honeypot.integpg.com'
                     |  |  |  |  }
                     |  |  |  }
0123  31 25          |  |  |  SET {  (37 bytes)
0125  30 23          |  |  |  |  SEQUENCE {  (35 bytes)
0127  06 09          |  |  |  |  |  OBJECT IDENTIFIER 1.2.840.113549.1.9.1
0132  16 16          |  |  |  |  |  IA5STRING 'bcloutier2@comcast.net'
                     |  |  |  |  }
                     |  |  |  }
                     |  |  }
014A  30 81 9F       |  |  SEQUENCE {  (159 bytes)
014D  30 0D          |  |  |  SEQUENCE {  (13 bytes)
014F  06 09          |  |  |  |  OBJECT IDENTIFIER 1.2.840.113549.1.1.1
015A  05 00          |  |  |  |  NULL 
                     |  |  |  }
015C  03 81 8D       |  |  |  BITSTRING[140] Encapsulates {
0000  30 81 89       |  |  |  |  SEQUENCE {  (137 bytes)
0003  02 81 81       |  |  |  |  |  INTEGER 
                     |  |  |  |  |     A99483174B2EBC8578ECEA5BE9F75840703B06EA49D9333D
                     |  |  |  |  |     493D035A8D84DB5AB7E5491D334BAF1B59A3A271E25C4276
                     |  |  |  |  |     D410F3B3C90E801E89A162C6A282EC51AB05CF9731561A95
                     |  |  |  |  |     22A0B3039DF72FA25BA1061E6BBB7A1AA6B287A314FDDBB9
                     |  |  |  |  |     E1034B45D5E1FFC15A59C40D772D3CDAD6142A707650F11E
                     |  |  |  |  |     BCD30CFF75E65E91
0087  02 03          |  |  |  |  |  INTEGER 010001
                     |  |  |  |  }
                     |  |  |  }
                     |  |  }
01EC  A3 70          |  |  [3] EXPLICIT {  (112 bytes)
01EE  30 6E          |  |  |  SEQUENCE {  (110 bytes)
01F0  30 1D          |  |  |  |  SEQUENCE {  (29 bytes)
01F2  06 03          |  |  |  |  |  OBJECT IDENTIFIER 2.5.29.14
01F7  04 16          |  |  |  |  |  OCTETSTRING[22] Encapsulates {
0000  04 14          |  |  |  |  |  |  OCTETSTRING[20] 
                     |  |  |  |  |  |     29CB0357BCDD26E78AD5E564C1D087B0  )..W..&....d....
                     |  |  |  |  |  |     3B583082                          ;X0.
                     |  |  |  |  |  }
                     |  |  |  |  }
020F  30 0C          |  |  |  |  SEQUENCE {  (12 bytes)
0211  06 03          |  |  |  |  |  OBJECT IDENTIFIER 2.5.29.19
0216  04 05          |  |  |  |  |  OCTETSTRING[5] Encapsulates {
0000  30 03          |  |  |  |  |  |  SEQUENCE {  (3 bytes)
0002  01 01          |  |  |  |  |  |  |  BOOLEAN TRUE(255)
                     |  |  |  |  |  |  }
                     |  |  |  |  |  }
                     |  |  |  |  }
021D  30 3F          |  |  |  |  SEQUENCE {  (63 bytes)
021F  06 03          |  |  |  |  |  OBJECT IDENTIFIER 2.5.29.17
0224  04 38          |  |  |  |  |  OCTETSTRING[56] Encapsulates {
0000  30 36          |  |  |  |  |  |  SEQUENCE {  (54 bytes)
0002  87 04          |  |  |  |  |  |  |  [7] 32C5224B  2."K
0008  82 14          |  |  |  |  |  |  |  [2] 
                     |  |  |  |  |  |  |     686F6E6579706F742E696E7465677067  honeypot.integpg
                     |  |  |  |  |  |  |     2E636F6D                          .com
001E  82 08          |  |  |  |  |  |  |  [2] 686F6E6579706F74  honeypot
0028  82 0E          |  |  |  |  |  |  |  [2] 686F6E6579706F745F6A6E696F72  honeypot_jnior
                     |  |  |  |  |  |  }
                     |  |  |  |  |  }
                     |  |  |  |  }
                     |  |  |  }
                     |  |  }
                     |  }
025E  30 0D          |  SEQUENCE {  (13 bytes)
0260  06 09          |  |  OBJECT IDENTIFIER 1.2.840.113549.1.1.11
026B  05 00          |  |  NULL 
                     |  }
026D  03 81 81       |  BITSTRING[128]  0 unused bits
                     |     2B42E05E331AEEB265F4DAC118DF73E7  +B.^3...e.....s.
                     |     F555D72605F6ECAB67D860324A7C5056  .U.&....g.`2J|PV
                     |     14C5203337A98C2157D85C57A736B82D  .. 37..!W.\W.6.-
                     |     DA88475E93A6C9FC2C5983678C8D461A  ..G^....,Y.g..F.
                     |     9CE7F53A2766DBBD26C0B99CE1F4514F  ...:'f..&.....QO
                     |     6BAC3D09C33000BC7E5F6151C0BA175F  k.=..0..~_aQ..._
                     |     29B6E73B8E7FEBAE1099269A9AFD7067  )..;......&...pg
                     |     17C67CF9C7F17EBB3F8DB2ED4353C2D1  ..|...~.?...CS..
                     }

bruce_dev />

Uh, this is still likely quite cryptic for your use. ASN.1 is fun. You might notice that the hexadecimal in this dump matches what our application dumped in the prior post. This demonstrates the inherent structure in the ASN.1 certificate format.

So if I am going to be of any help there is more work to be done.

After thinking about some kind of conversion from ASN.1 to JSON I have decided to stick with ASN.1 for this purpose. I’ll develop an ASN1 class that will help with parsing. The reason to hang with ASN.1 is that you will be able to confirm signatures.

That reminds me too that I should write something about Signing, since you now have access to RSA cryptography…

Alright. A couple of posts back we extracted the certificate from our secure connection. I dumped it in binary and also using CERTMGR to see the ASN.1 structure.

First of all, the certificate is delivered in DER format. This defines the binary encoding used to transfer the signed certificate, and it is what we see in the dump. A standard ASN.1 definition for a signed certificate is encoded using DER. The format defined for these certificates is X.509, which is specified in RFC 5280. You may also need information contained in RFC 5246, which is the specification for TLSv1.2.

Okay so that is a lot of work and if you have to read all of that then forget it, right? Let me try to gloss over it and drive to doing something meaningful with this binary certificate stuff.

I have started to pull together an Asn1 class which will help us work with the DER encoded binary data. It was apparent from the CERTMGR dump that there is some structure to it. I’ll try to vaguely describe that from the top down.

First notice that the whole signed certificate as obtained from the connection is enclosed in a SEQUENCE. That is an ASN.1 object which in DER has a tag (ASN_SEQUENCE), a length, and data or content. From the RFCs we expect the following structure.

Certificate  ::=  SEQUENCE  {
    tbsCertificate       TBSCertificate,
    signatureAlgorithm   AlgorithmIdentifier,
    signatureValue       BIT STRING  }

So the top SEQUENCE contains three objects. Here “TBS” stands for To Be Signed. So the tbsCertificate is the Certificate to be signed or that has been signed. It is information that by itself is a SEQUENCE of objects. The signatureAlgorithm defines the procedure used in the signing. That is a SEQUENCE too with some objects within. And, the signatureValue turns out to be some tacked on bit data in a BITSTRING. That we will see is the actual signature.

So let’s modify our little program that gets the target host’s certificate to use my prototype Asn1 class. We will first confirm that the initial SEQUENCE covers all of the signed certificate and then itemize its content.

package jtest;
 
import com.integpg.system.Debug;
import java.net.Socket;
 
public class Main {
    
    public static void main(String[] args) throws Exception {
 
        // Establish a Secure Socket, get streams, and set a timeout
        Socket dataSocket = new Socket("50.197.34.75", 443);
        dataSocket.setSecure(true);
        
        // Obtain the certificate
        byte[] cert;
        while ((cert = dataSocket.getCertificate()).length == 0)
            System.sleep(100);
        dataSocket.close();
 
        // analyze
        Asn1 asn = new Asn1(cert);
 
        // details about the object
        System.out.println("Overall Signed Certificate Length: " + cert.length);
        System.out.println("ASN.1 Object tag: " + asn.getTag());
        System.out.printf("ASN.1 Object flags: 0x%02x\n", asn.getFlags());
        System.out.println("ASN.1 Object content size: " + asn.getLength());
        
        // skip the object and check for more data (should be only 1 object)
        asn.skip();
        if (!asn.hasMoreData())
            System.out.println("Signed Certificate is a single object as expected");
        else
            System.out.println("Something is wrong!");
    }
        
}
bruce_dev /> jtest
Overall Signed Certificate Length: 753
ASN.1 Object tag: 16
ASN.1 Object flags: 0x20
ASN.1 Object content size: 749
Signed Certificate is a single object as expected

bruce_dev />

This demonstrates that the SEQUENCE object contains the entire signed certificate. 753 bytes were delivered and aside from the 4-byte header (tag and length) the content covers the rest of the data. The tag of 16 tells us it is a SEQUENCE and the flag 0x20 tells us it is a CONSTRUCT.
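
To make the header arithmetic concrete, here is a minimal sketch in standard desktop Java of how a DER identifier octet and length field can be decoded. It is not the Asn1 class used above (its internals are not shown in this post); the class name DerHeader and its fields are made up for illustration. The four sample bytes are the header implied by the numbers above: a constructed SEQUENCE whose 749-byte length needs the two-byte long form.

```java
/** Decodes the identifier and length octets at the start of a DER object (illustrative only). */
final class DerHeader {
    final int flags;          // class bits (0xC0) plus the constructed bit (0x20)
    final int tagNumber;      // low five bits of the identifier octet
    final int headerLength;   // identifier octet plus all length octets
    final int contentLength;  // number of content bytes that follow the header

    private DerHeader(int flags, int tagNumber, int headerLength, int contentLength) {
        this.flags = flags;
        this.tagNumber = tagNumber;
        this.headerLength = headerLength;
        this.contentLength = contentLength;
    }

    static DerHeader parse(byte[] der, int offset) {
        int id = der[offset] & 0xFF;
        int tagNumber = id & 0x1F;
        if (tagNumber == 0x1F)
            throw new IllegalArgumentException("high tag number form is not handled in this sketch");

        int first = der[offset + 1] & 0xFF;
        if (first < 0x80) {
            // Short form: the octet itself is the content length (0..127)
            return new DerHeader(id & 0xE0, tagNumber, 2, first);
        }

        // Long form: the low seven bits give the number of length octets that follow
        int count = first & 0x7F;
        if (count == 0 || count > 3)
            throw new IllegalArgumentException("length form is not handled in this sketch");
        int length = 0;
        for (int i = 0; i < count; i++)
            length = (length << 8) | (der[offset + 2 + i] & 0xFF);
        return new DerHeader(id & 0xE0, tagNumber, 2 + count, length);
    }

    public static void main(String[] args) {
        // 0x30 = SEQUENCE (16) with the constructed bit, 0x82 = two length octets, 0x02ED = 749
        byte[] start = {0x30, (byte) 0x82, 0x02, (byte) 0xED};
        DerHeader h = parse(start, 0);
        System.out.printf("tag=%d flags=0x%02x header=%d content=%d%n",
                h.tagNumber, h.flags, h.headerLength, h.contentLength);
    }
}
```

The 4-byte header plus 749 content bytes accounts for all 753 bytes delivered.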

Here are tags and flags that I have defined in the Asn1 class.


    static public final int ASN_BOOLEAN = 1;
    static public final int ASN_INTEGER = 2;
    static public final int ASN_BITSTRING = 3;
    static public final int ASN_OCTETSTRING = 4;
    static public final int ASN_NULL = 5;
    static public final int ASN_OBJECTID = 6;
    static public final int ASN_OBJECTDESC = 7;
    static public final int ASN_INSTANCEOF = 8;
    static public final int ASN_REAL = 9;
    static public final int ASN_ENUM = 10;
    static public final int ASN_EMBEDDED = 11;
    static public final int ASN_UTF8STRING = 12;
    static public final int ASN_RELATIVEOID = 13;
    static public final int ASN_SEQUENCE = 16;
    static public final int ASN_SET = 17;
    static public final int ASN_NUMERIC = 18;
    static public final int ASN_PRINTABLE = 19;
    static public final int ASN_T61 = 20;
    static public final int ASN_VIDEOTEX = 21;
    static public final int ASN_IA5STRING = 22;
    static public final int ASN_UTCTIME = 23;
    static public final int ASN_GENTIME = 24;
    static public final int ASN_GRAPHIC = 25;
    static public final int ASN_VISIBLESTR = 26;
    static public final int ASN_GENSTRING = 27;
    static public final int ASN_UNIVSTRING = 28;
    static public final int ASN_CHARSTR = 29;
    static public final int ASN_BMPSTR = 30;
    static public final int ASN_HIGHFORM = 31;

    static public final int ASN_CONSTRUCT = 0x20;
    static public final int ASN_APPLICATION = 0x40;
    static public final int ASN_CONTEXT = 0x80;
    static public final int ASN_PRIVATE = 0xC0;

So let’s look into the overall SEQUENCE and see that those three objects are to be found. We’ll just list the tags for the objects we find. Here are the changes to our test program.

        // analyze
        Asn1 asn = new Asn1(cert);
        
        // descend into the SEQUENCE object and itemize the objects it contains.
        asn.descend();
        
        while (asn.hasMoreData()) {
            System.out.println("ASN.1 Object tag: " + asn.getTag());
            System.out.println("ASN.1 Object length: " + asn.getLength());
            System.out.println("");
            asn.skip();
        }
bruce_dev /> jtest
ASN.1 Object tag: 16
ASN.1 Object length: 598

ASN.1 Object tag: 16
ASN.1 Object length: 13

ASN.1 Object tag: 3
ASN.1 Object length: 129


bruce_dev />

So there are three parts: two SEQUENCEs and a BITSTRING. Those correspond to tbsCertificate, signatureAlgorithm and signatureValue respectively, which is what is expected.

Certificate  ::=  SEQUENCE  {
    tbsCertificate       TBSCertificate,
    signatureAlgorithm   AlgorithmIdentifier,
    signatureValue       BIT STRING  }
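
For readers without a JNIOR at hand, the same descend-and-skip walk can be sketched in standard Java. This is only an illustration under assumptions: the class DerWalk is hypothetical, it handles low tag numbers and lengths up to three octets only, and the sample bytes are a tiny hand-built structure with the same shape as a signed certificate (two SEQUENCEs followed by a BIT STRING), not real certificate data.

```java
import java.util.ArrayList;
import java.util.List;

/** Lists the objects found directly inside an outer DER object (illustrative only). */
final class DerWalk {

    /** Reads the length field starting at der[pos]; returns {contentLength, lengthOctetsUsed}. */
    static int[] readLength(byte[] der, int pos) {
        int first = der[pos] & 0xFF;
        if (first < 0x80)
            return new int[] {first, 1};
        int count = first & 0x7F;
        if (count == 0 || count > 3)
            throw new IllegalArgumentException("length form is not handled in this sketch");
        int length = 0;
        for (int i = 0; i < count; i++)
            length = (length << 8) | (der[pos + 1 + i] & 0xFF);
        return new int[] {length, 1 + count};
    }

    /** Returns {tagNumber, contentLength} for each child of the outer object. */
    static List<int[]> children(byte[] der) {
        List<int[]> found = new ArrayList<>();
        int[] outer = readLength(der, 1);
        int pos = 1 + outer[1];          // step past the outer header ("descend")
        int end = pos + outer[0];
        while (pos < end) {
            int tag = der[pos] & 0x1F;
            int[] len = readLength(der, pos + 1);
            found.add(new int[] {tag, len[0]});
            pos += 1 + len[1] + len[0];  // step over header and content ("skip")
        }
        return found;
    }

    public static void main(String[] args) {
        byte[] der = {
            0x30, 0x0E,                                // outer SEQUENCE, 14 content bytes
            0x30, 0x03, 0x02, 0x01, 0x05,              // SEQUENCE { INTEGER 5 }
            0x30, 0x02, 0x05, 0x00,                    // SEQUENCE { NULL }
            0x03, 0x03, 0x00, (byte) 0xAA, (byte) 0xBB // BIT STRING, 0 unused bits
        };
        for (int[] child : children(der))
            System.out.println("ASN.1 Object tag: " + child[0] + "  length: " + child[1]);
    }
}
```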

Let’s extract the key parts of this signed certificate and dump the signatureValue.

        // analyze
        Asn1 asn = new Asn1(cert);
        asn.descend();
        
        // obtain the certificate
        Asn1 tbsCertificate = new Asn1(asn.getData());
        asn.skip();
        Asn1 signatureAlgorithm = new Asn1(asn.getData());
        asn.skip();
        byte[] bitstring = asn.getData();
        
        // remove leading unused bit count supplied with BITSTRING
        byte[] signatureValue = new byte[bitstring.length - 1];
        ArrayUtils.arraycopy(bitstring, 1, signatureValue, 0, signatureValue.length);
        
        // dump the signature
        Debug.dump(signatureValue);
bruce_dev /> jtest
 2b 42 e0 5e 33 1a ee b2-65 f4 da c1 18 df 73 e7    +B.^3... e.....s.
 f5 55 d7 26 05 f6 ec ab-67 d8 60 32 4a 7c 50 56    .U.&.... g.`2J|PV
 14 c5 20 33 37 a9 8c 21-57 d8 5c 57 a7 36 b8 2d    ...37..! W.\W.6.-
 da 88 47 5e 93 a6 c9 fc-2c 59 83 67 8c 8d 46 1a    ..G^.... ,Y.g..F.
 9c e7 f5 3a 27 66 db bd-26 c0 b9 9c e1 f4 51 4f    ...:'f.. &.....QO
 6b ac 3d 09 c3 30 00 bc-7e 5f 61 51 c0 ba 17 5f    k.=..0.. ~_aQ..._
 29 b6 e7 3b 8e 7f eb ae-10 99 26 9a 9a fd 70 67    )..;.... ..&...pg
 17 c6 7c f9 c7 f1 7e bb-3f 8d b2 ed 43 53 c2 d1    ..|...~. ?...CS..

bruce_dev />

We see from the CERTMGR dump a few posts back that this is correct.

How can we check the signature?

To start, since I know this is from our HoneyPot unit, I will grab the public key directly from the JNIOR and save it in a pubkey.pem file. Since this is a self-signed certificate this public key is already in the tbsCertificate, but to avoid the complexity of digging in to get it we’ll start with a handy copy of the key. We can also tell from the signatureAlgorithm OID (1.2.840.113549.1.1.11) that this certificate’s signature was done with RSA encryption and the SHA256 (SHA-2) hash. There are other signature algorithms. This is the one that the JNIOR used, so to keep it simple we’ll just work with that right now.

The Certificate Signing procedure is “simple”. When the certificate was signed the JNIOR

  1. computed the SHA256 hash over the ASN.1 DER encoded tbsCertificate object
  2. built a simple ASN.1 structure defining the algorithm with an OID and storing the hash as an OCTET STRING
  3. encrypted that DER encoded structure using the JNIOR’s RSA Private Key
  4. appended the signatureAlgorithm information and the signatureValue to the tbsCertificate creating the signed certificate.

To verify the Signed Certificate we can reverse the process. We will do the following:

  1. extract the tbsCertificate ASN.1 DER encoding from the signed certificate
  2. calculate the SHA256 over the tbsCertificate block
  3. obtain the BIT STRING appended to the signed certificate
  4. decrypt the BIT STRING using the JNIOR’s RSA Public Key
  5. look into the resulting ASN.1 structure for the stored copy of the hash
  6. if our calculated hash matches that stored then the certificate verifies
        // analyze
        Asn1 asn = new Asn1(cert);
        asn.descend();
        
        // obtain the certificate
        byte[] tbsCertificate = asn.getObject();
        asn.skip();
        byte[] signatureAlgorithm = asn.getData();
        asn.skip();
        byte[] bitstring = asn.getData();
        
        // remove leading unused bit count supplied with BITSTRING
        byte[] signatureValue = new byte[bitstring.length - 1];
        ArrayUtils.arraycopy(bitstring, 1, signatureValue, 0, signatureValue.length);

Here we parse the signed certificate to extract both the tbsCertificate and the signatureValue. Note that I used getObject() from the Asn1 class to get not only the certificate content but also the header for the ASN.1 SEQUENCE. The hash includes all of it.
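
The difference between taking the whole object and taking only its content is easy to get wrong, so here is a small standard-Java sketch of the two slices. The class TlvSlice and its method names are hypothetical stand-ins (they are not the Asn1 API), and the sample bytes are a tiny hand-built structure rather than a real certificate.

```java
import java.security.MessageDigest;
import java.util.Arrays;

/** Shows "object" (header + content) versus "data" (content only) for the first child (illustrative only). */
final class TlvSlice {

    /** Returns {headerLength, contentLength} for the DER object starting at pos. */
    static int[] measure(byte[] der, int pos) {
        int first = der[pos + 1] & 0xFF;
        if (first < 0x80)
            return new int[] {2, first};
        int count = first & 0x7F;
        if (count == 0 || count > 3)
            throw new IllegalArgumentException("length form is not handled in this sketch");
        int length = 0;
        for (int i = 0; i < count; i++)
            length = (length << 8) | (der[pos + 2 + i] & 0xFF);
        return new int[] {2 + count, length};
    }

    /** First child of the outer object with its own header: this is what the hash must cover. */
    static byte[] firstChildObject(byte[] der) {
        int start = measure(der, 0)[0];
        int[] child = measure(der, start);
        return Arrays.copyOfRange(der, start, start + child[0] + child[1]);
    }

    /** First child of the outer object, content bytes only. */
    static byte[] firstChildContent(byte[] der) {
        int start = measure(der, 0)[0];
        int[] child = measure(der, start);
        return Arrays.copyOfRange(der, start + child[0], start + child[0] + child[1]);
    }

    public static void main(String[] args) throws Exception {
        // SEQUENCE { SEQUENCE { INTEGER 5 }, NULL }
        byte[] der = {0x30, 0x07, 0x30, 0x03, 0x02, 0x01, 0x05, 0x05, 0x00};
        byte[] object = firstChildObject(der);
        byte[] content = firstChildContent(der);
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        boolean same = Arrays.equals(sha.digest(object), sha.digest(content));
        System.out.println("object bytes: " + object.length + ", content bytes: " + content.length);
        System.out.println("hashes equal: " + same);
    }
}
```

Hashing the content alone produces a different digest, so a verifier that drops the header would reject a perfectly good signature.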

Next we calculate the SHA256 for the tbsCertificate block. The SHA256 methods are exposed in JANOS v1.6.3 and later.

        // calculate SHA-256 over the DER encoded tbsCertificate (header included)
        byte[] hash = Security.hashMessage256(tbsCertificate);
        Debug.dump(hash);
        System.out.println("");
bruce_dev /> jtest
 db 67 e8 3b 8a 7e c1 ab-ef 76 16 0b 2b 45 e1 26    .g.;.~.. .v..+E.&
 c6 fa eb 31 4a 1c d0 5f-23 b0 a7 0f 7a 03 5b e6    ...1J.._ #...z.[.

Finally we read the HoneyPot’s public key from the file and perform the RSA decryption. This dumps the decrypted BIT STRING content.

        // fetch the HoneyPot public key
        File keyfile = new File("/flash/pubkey.pem");
        DataInputStream fin = new DataInputStream(new FileInputStream(keyfile));
        byte[] pubkey = new byte[fin.available()];
        fin.readFully(pubkey);
        fin.close();
        
        byte[] sig = Security.decrypt(signatureValue, 0, pubkey, 0);
        Debug.dump(sig);
bruce_dev /> jtest
 db 67 e8 3b 8a 7e c1 ab-ef 76 16 0b 2b 45 e1 26    .g.;.~.. .v..+E.&
 c6 fa eb 31 4a 1c d0 5f-23 b0 a7 0f 7a 03 5b e6    ...1J.._ #...z.[.

 30 31 30 0d 06 09 60 86-48 01 65 03 04 02 01 05    010...`. H.e.....
 00 04 20 db 67 e8 3b 8a-7e c1 ab ef 76 16 0b 2b    ....g.;. ~...v..+
 45 e1 26 c6 fa eb 31 4a-1c d0 5f 23 b0 a7 0f 7a    E.&...1J .._#...z
 03 5b e6                                           .[.

bruce_dev />

If the public key properly decrypts the signatureValue you will see a valid ASN.1 DER encoded structure. Manually we see that it starts with a SEQUENCE and the length is 49 bytes. In that SEQUENCE there is another of just 13 bytes. That contains the OID. After that there is a 32 byte OCTET STRING containing the hash.

So just by eye we see that the last 32 bytes of the decrypted signatureValue do match the calculated SHA256 hash. We have verified the signature!
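
The same by-hand check can be tried on a desktop JVM. The following is a sketch under stated assumptions, not the JANOS Security API: it uses the standard java.security and javax.crypto classes, generates a throwaway RSA key pair, and signs an arbitrary byte string that stands in for the tbsCertificate. The class and method names are hypothetical. It relies on the standard provider applying PKCS#1 v1.5 signature padding when an RSA cipher is used with a public key in decrypt mode.

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Arrays;
import javax.crypto.Cipher;

/** Verifies an RSA/SHA-256 signature the long way, mirroring the steps above (illustrative only). */
final class SignatureByHand {

    /** Signs the way a certificate issuer would: SHA-256 hash wrapped in ASN.1, then RSA private key. */
    static byte[] sign(byte[] signedBytes, PrivateKey key) throws Exception {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(key);
        signer.update(signedBytes);
        return signer.sign();
    }

    /** Applies the RSA public key to the signature to recover the small ASN.1 structure holding the hash. */
    static byte[] recoverDigestInfo(byte[] signature, PublicKey key) throws Exception {
        Cipher rsa = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        rsa.init(Cipher.DECRYPT_MODE, key);
        return rsa.doFinal(signature);
    }

    /** Compares our own SHA-256 with the hash stored in the last 32 bytes of the recovered structure. */
    static boolean verifies(byte[] signedBytes, byte[] signature, PublicKey key) throws Exception {
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(signedBytes);
        byte[] digestInfo = recoverDigestInfo(signature, key);
        if (digestInfo.length < hash.length)
            return false;
        byte[] stored = Arrays.copyOfRange(digestInfo, digestInfo.length - hash.length, digestInfo.length);
        // A production verifier would also check the algorithm OID that precedes the hash
        return MessageDigest.isEqual(hash, stored);
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair pair = generator.generateKeyPair();

        byte[] tbs = "stand-in for the DER encoded tbsCertificate".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(tbs, pair.getPrivate());

        System.out.println("recovered structure length: " + recoverDigestInfo(signature, pair.getPublic()).length);
        System.out.println("verifies: " + verifies(tbs, signature, pair.getPublic()));
    }
}
```

In practice you would let Signature.verify() do all of this in one call. The long way is shown only because it follows the manual procedure in this post step for step.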

One of the parts of the tbsCertificate defines the Issuer and the other the Subject of the certificate. Since the JNIOR creates a self-signed certificate the Issuer and Subject are the same.

If you look back to the CERTMGR dump of the certificate you see that INTEG Process Group Inc appears twice. The first is for the Issuer and the second the Subject. There is a SEQUENCE following that which contains a BIT STRING that encapsulates two INTEGERs. That is the Subject’s RSA Public Key. That would match the HoneyPot’s Public Key. We could have gone into the certificate for that key. But that works ONLY for a self-signed certificate like this.

More generally the Issuer signs the Certificate using the Issuer’s RSA Private (and highly secret) Key and the Issuer is not the same as the Subject. In that case the Issuer’s RSA Public Key is NOT in the certificate. We would need to find an independent source for the key. Windows, for instance, looks to the Trusted Root Certificate Authorities store for another certificate, one for the Issuer where the public key can be found.

It can even be more complex as there might be a chain of trust. If the certificate is signed by an Issuer that is likely not to be found in the system’s certificate store then an additional one or more certificates might be transmitted during TLS negotiation. We would have to follow the chain verifying each certificate until we reached a trusted certificate from the system’s store or otherwise.

The JNIOR does not contain a specific trusted certificate store for this purpose. If we were to be verifying certificates in this way we would need to create something or otherwise rely on a remote system.

To demonstrate an outgoing HTTP request I am going to use the IP Address Location service that our HoneyPot unit uses. This creates the JSON database used to generate the map at http://honeypot.integpg.com/map.php .

The JANOS Runtime Library does not provide classes to handle different web requests. Perhaps over time we will supply external libraries for that. But, you can easily do that directly. And, it is probably more educational to know how things work at the low level.

The procedure is straightforward.

  1. Establish an outgoing socket. (Lines 19-22)
  2. Issue a minimally formatted HTTP request. (Lines 25-27)
  3. Read the response. (Lines 45-50)
  4. Use the data. (Line 53)
package jtest;
 
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.Socket;
 
public class Main {
    
    public static void main(String[] args) throws Exception {
 
        // IP Address query
        String ipaddr = "50.197.34.75";
        
        // Location services
        String serverHostname = "ip-api.com";
        int port = 80;
 
        // Establish a Socket, get streams, and set a timeout
        Socket dataSocket = new Socket(serverHostname, port);
        DataOutputStream sockout = new DataOutputStream(dataSocket.getOutputStream());
        DataInputStream sockin = new DataInputStream(dataSocket.getInputStream());
        dataSocket.setSoTimeout(5000);
 
        // Issue the HTTP request
        sockout.writeBytes("GET /json/" + ipaddr + " HTTP/1.1\r\n");
        sockout.writeBytes("Host: " + serverHostname + "\r\n");
        sockout.writeBytes("\r\n");
 
        // Process the response header
        int length = 0;
        String response;
        while ((response = sockin.readLine()) != null) {
            
            // Header ends with blank line
            if (response.length() == 0)
                    break;
            
            System.out.println(response);
            if (response.startsWith("Content-Length: ")) 
                length = Integer.parseInt(response.substring(16));
        }
        System.out.println();
 
        // Obtain the entire response (if any)
        response = "";
        if (length > 2) {
            byte[] resp = new byte[length];
            sockin.readFully(resp);
            response = new String(resp, "UTF8");
        }
 
        // Data (should be JSON)
        System.out.println(response);
 
        // Close the Socket
        sockout.close();
        sockin.close();
        dataSocket.close();
    }
        
}
bruce_dev /> jtest
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json; charset=utf-8
Date: Fri, 08 Dec 2017 14:01:58 GMT
Content-Length: 321

{"as":"AS7922 Comcast Cable Communications, LLC","city":"Pittsburgh","country":"United States","countryCode":"US","isp":"Comcast Business","lat":40.4406,"lon":-79.9959,"org":"Comcast Business","query":"50.197.34.75","region":"PA","regionName":"Pennsylvania","status":"success","timezone":"America/New_York","zip":"15282"}

bruce_dev />

So you can see that the response is JSON and can be easily used.
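
If you want to experiment with the header handling without a network connection, the response parsing can be exercised against a canned response. This is a sketch in standard Java and not part of the JANOS library; the class HttpBodyReader is hypothetical. It differs from the program above in one respect: the Content-Length header name is matched without regard to case, since servers are free to vary the capitalization. Chunked transfer encoding is not handled.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Reads an HTTP response header block and then the body sized by Content-Length (illustrative only). */
final class HttpBodyReader {

    /** Reads one header line, dropping the CR LF terminator; returns null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r')
                line.append((char) c);
        }
        return (c == -1 && line.length() == 0) ? null : line.toString();
    }

    /** Skips the status line and headers, then returns the body as text. */
    static String readBody(InputStream in) throws IOException {
        int length = 0;
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length"))
                length = Integer.parseInt(line.substring(colon + 1).trim());
        }
        byte[] body = new byte[length];
        new DataInputStream(in).readFully(body);
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"status\":\"success\",\"countryCode\":\"US\"}";
        String canned = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: application/json; charset=utf-8\r\n"
                + "content-length: " + json.getBytes(StandardCharsets.UTF_8).length + "\r\n"
                + "\r\n"
                + json;
        InputStream in = new ByteArrayInputStream(canned.getBytes(StandardCharsets.UTF_8));
        System.out.println(readBody(in));
    }
}
```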

If you replace line 53 with Debug.dump(response.getBytes()); (the new dump method in the library), the data can be more easily reviewed.

bruce_dev /> jtest
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json; charset=utf-8
Date: Fri, 08 Dec 2017 14:27:52 GMT
Content-Length: 321

 7b 22 61 73 22 3a 22 41-53 37 39 32 32 20 43 6f    {"as":"A S7922.Co
 6d 63 61 73 74 20 43 61-62 6c 65 20 43 6f 6d 6d    mcast.Ca ble.Comm
 75 6e 69 63 61 74 69 6f-6e 73 2c 20 4c 4c 43 22    unicatio ns,.LLC"
 2c 22 63 69 74 79 22 3a-22 50 69 74 74 73 62 75    ,"city": "Pittsbu
 72 67 68 22 2c 22 63 6f-75 6e 74 72 79 22 3a 22    rgh","co untry":"
 55 6e 69 74 65 64 20 53-74 61 74 65 73 22 2c 22    United.S tates","
 63 6f 75 6e 74 72 79 43-6f 64 65 22 3a 22 55 53    countryC ode":"US
 22 2c 22 69 73 70 22 3a-22 43 6f 6d 63 61 73 74    ","isp": "Comcast
 20 42 75 73 69 6e 65 73-73 22 2c 22 6c 61 74 22    .Busines s","lat"
 3a 34 30 2e 34 34 30 36-2c 22 6c 6f 6e 22 3a 2d    :40.4406 ,"lon":-
 37 39 2e 39 39 35 39 2c-22 6f 72 67 22 3a 22 43    79.9959, "org":"C
 6f 6d 63 61 73 74 20 42-75 73 69 6e 65 73 73 22    omcast.B usiness"
 2c 22 71 75 65 72 79 22-3a 22 35 30 2e 31 39 37    ,"query" :"50.197
 2e 33 34 2e 37 35 22 2c-22 72 65 67 69 6f 6e 22    .34.75", "region"
 3a 22 50 41 22 2c 22 72-65 67 69 6f 6e 4e 61 6d    :"PA","r egionNam
 65 22 3a 22 50 65 6e 6e-73 79 6c 76 61 6e 69 61    e":"Penn sylvania
 22 2c 22 73 74 61 74 75-73 22 3a 22 73 75 63 63    ","statu s":"succ
 65 73 73 22 2c 22 74 69-6d 65 7a 6f 6e 65 22 3a    ess","ti mezone":
 22 41 6d 65 72 69 63 61-2f 4e 65 77 5f 59 6f 72    "America /New_Yor
 6b 22 2c 22 7a 69 70 22-3a 22 31 35 32 38 32 22    k","zip" :"15282"
 7d                                                 }

bruce_dev />

By the way, the Lat and Lon returned by these sites vary in accuracy. We use the above as a free service. I believe that some services will provide more precise locations when used in a paid mode. The free data however is just fine when mapped on the globe (http://honeypot.integpg.com/map.php).

We showed you how to make an Outgoing HTTP Request. If you would like to make a secure connection you need only add a single line of code.

        dataSocket.setSecure(true);

Here I will securely connect from my development JNIOR to the external HoneyPot JNIOR. From the example in the other topic I have modified the host and the request to attempt to access the JNIOR.

package jtest;
 
import com.integpg.system.Debug;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.Socket;
 
public class Main {
    
    public static void main(String[] args) throws Exception {
 
        // HoneyPot JNIOR (HTTPS port)
        String serverHostname = "50.197.34.75";
        int port = 443;
 
        // Establish a Socket, get streams, and set a timeout
        Socket dataSocket = new Socket(serverHostname, port);
        DataOutputStream sockout = new DataOutputStream(dataSocket.getOutputStream());
        DataInputStream sockin = new DataInputStream(dataSocket.getInputStream());
        dataSocket.setSoTimeout(5000);
        
        // Negotiate a secure connection
        dataSocket.setSecure(true);
 
        // Issue the HTTP request
        sockout.writeBytes("GET / HTTP/1.1\r\n");
        sockout.writeBytes("Host: " + serverHostname + "\r\n");
        sockout.writeBytes("\r\n");
 
        // Process the response header
        int length = 0;
        String response;
        while ((response = sockin.readLine()) != null) {
            
            // Header ends with blank line
            if (response.length() == 0)
                    break;
            
            System.out.println(response);
            if (response.startsWith("Content-Length: ")) 
                length = Integer.parseInt(response.substring(16));
        }
        System.out.println();
 
        // Obtain the entire response (if any)
        response = "";
        if (length > 2) {
            byte[] resp = new byte[length];
            sockin.readFully(resp);
            response = new String(resp, "UTF8");
        }
 
        // Data (an HTML page in this case)
        Debug.dump(response.getBytes());
 
        // Close the Socket
        sockout.close();
        sockin.close();
        dataSocket.close();
    }
        
}
bruce_dev /> jtest
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Digest realm="JANOS Web Server", qop=auth, nonce="66d0bb430469f01e9153358cfa7f"
Content-Length: 99

 3c 48 54 4d 4c 3e 3c 48-45 41 44 3e 3c 54 49 54    <HTML><H EAD><TIT
 4c 45 3e 34 30 31 20 55-6e 61 75 74 68 6f 72 69    LE>401.U nauthori
 7a 65 64 3c 2f 54 49 54-4c 45 3e 0d 0a 3c 2f 48    zed</TIT LE>..</H
 45 41 44 3e 3c 42 4f 44-59 3e 3c 68 31 3e 34 30    EAD><BOD Y><h1>40
 31 20 55 6e 61 75 74 68-6f 72 69 7a 65 64 3c 2f    1.Unauth orized</
 68 31 3e 3c 2f 42 4f 44-59 3e 3c 2f 48 54 4d 4c    h1></BOD Y></HTML
 3e 0d 0a                                           >..

bruce_dev />

So we get the response expected. But was it done securely? Here’s the transaction from the Wireshark point of view.

Timeouts are often required. JANOS supports timeouts on most inputs. Sometimes though you just need to implement an overall timeout. For example, if you leave a UI unattended for some period of time you might want it to return to a default Splash Screen or something along those lines.

This is easily achieved and we usually use the uptime in milliseconds as a reference. The approach is to add the timeout period to the current uptime and, when the uptime exceeds that, perform the timeout action. In the meantime if events occur (such as UI keypad activity) you reset the timeout by recalculating.

Here is example code:

import com.integpg.system.JANOS;
import java.util.Date;

public class Main {
    
    static final int TIMEOUT = 120000;
    
    public static void main(String[] args) throws Exception {
        
        // Print time
        System.out.println(new Date());
        
        // establish and initialize a timeout
        long timer = JANOS.uptimeMillis() + TIMEOUT;
        
        while (JANOS.uptimeMillis() < timer) {
            
            // some event would reset the timer
            // timer = JANOS.uptimeMillis() + TIMEOUT;
            
            System.sleep(1000);
        }
        
        System.out.println("Timeout expired.");
        System.out.println(new Date());
    }
        
}
bruce_dev /> jtest
Wed Dec 06 09:16:31 EST 2017
Timeout expired.
Wed Dec 06 09:18:32 EST 2017

bruce_dev />

And there you go. Close enough to 2 minutes (120000 milliseconds). Typically such timeouts need not be precise. You do not want to load the CPU doing little else than checking for timeout. If you are trying to implement an accurate delay or scheduling an event at a precise time there are other approaches. This is your quick and dirty timeout.

Note that you do not want to use the RTC time of day for this kind of timeout. That can jump discontinuously when for instance the clock is adjusted manually or through the NTP protocol.
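
The same pattern carries over to a desktop JVM, where System.nanoTime() plays the role of the uptime counter because it also does not jump when the wall clock is adjusted. The sketch below is an illustration only: the class IdleTimeout is hypothetical, and the clock is passed in as a parameter simply so the logic can be exercised without waiting.

```java
import java.util.function.LongSupplier;

/** A resettable inactivity timeout driven by a monotonic millisecond clock (illustrative only). */
final class IdleTimeout {
    private final long periodMillis;
    private final LongSupplier clockMillis;
    private long deadline;

    IdleTimeout(long periodMillis, LongSupplier clockMillis) {
        this.periodMillis = periodMillis;
        this.clockMillis = clockMillis;
        reset();
    }

    /** Uses System.nanoTime(), which is unaffected by time-of-day adjustments. */
    static IdleTimeout monotonic(long periodMillis) {
        return new IdleTimeout(periodMillis, () -> System.nanoTime() / 1_000_000L);
    }

    /** Call whenever activity occurs (a key press, for instance) to push the deadline out again. */
    void reset() {
        deadline = clockMillis.getAsLong() + periodMillis;
    }

    /** True once the clock has reached the deadline. */
    boolean expired() {
        return clockMillis.getAsLong() - deadline >= 0;
    }

    public static void main(String[] args) throws InterruptedException {
        IdleTimeout timeout = IdleTimeout.monotonic(300);
        while (!timeout.expired()) {
            // some event would call timeout.reset() here
            Thread.sleep(50);   // sleep between checks so the loop does not load the CPU
        }
        System.out.println("Timeout expired.");
    }
}
```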

It seems that more often than not these days (at INTEG at least) when we need to store or transfer data we are using the JavaScript Object Notation (JSON). This is also known as ECMA-404 The JSON Data Interchange Standard. I have attached the document. If you are not at all interested in JSON you should at least visit http://www.json.org/ if for no other reason than to see how amazingly clearly and impressively concisely the format is described in just a single page.

JANOS uses JSON in numerous ways. Firstly, it is the data format that we use for the built-in Websockets interface that replaces the original binary JNIOR Protocol. That interface provides remote monitoring and control of the JNIOR. It is the underlying connection that makes the DCP (Dynamic Configuration Pages) possible. Secondly, the MANIFEST command uses JSON as a database format for the manifest.json reference files. And also, the JSON functionality is exposed through the JANOS Runtime Environment where it has been employed by applications for communications, databases, and configuration. The CAT command even has an option (-J) that will format a JSON text file (including manifest.json), making it readable.

From the applications point of view, JSON is handled through the java.util.Json class supplied by the etc/JanosClasses.jar runtime library. The JANOS programming environment is not your standard Java environment. This is one of many classes that we have provided allowing you to benefit from the underlying operating system. I am going to dive into this JSON class to show you what can be achieved and some of the tricks.

ECMA-404.pdf - 1 MB - MD5: a2e87492ab7c03f557c98c8ecde9c79a

I have to admit that I have been recently developing applications that utilize JSON and this forced me to refresh my memory of the java.util.Json class. There are a few enhancements to that class that I might implement. So I thought that I would do that here and out in the open. I will be clear though as to what is new for JANOS v1.6.3 and what you already have available in your JNIORs.

So to start I have attached the current JavaDoc for the class. In this there is one new method getBoolean() that I have just added. Before this you needed to use the get() method and cast the returned Object to a boolean to retrieve the logical. That by itself is not a problem. But, it wouldn’t be clear to you that you could do that.

Ok, in the process I will give the JavaDoc some attention too. Please feel free to jump in and comment.

Json-JanosRuntime-JavaDOC.pdf - 190 KB - MD5: 7ad7bc53e195b896aba20960e0013d18

By the way, if you are programming and want access to this JavaDoc you need to get the runtime JAR from us. You can develop using the etc/JanosClasses.jar runtime but that does not contain the JavaDoc. Elsewhere I describe how to configure Netbeans to build your application specifically for our runtime. Here I have attached the enhanced Runtime JAR for JANOS v1.6.2. If you configure for it you can confirm that the Json getBoolean() method is absent from the class.

JanosRuntime_1.6.2.jar - 259 KB - MD5: fc229e178cd2c76cc0c3fdf0631a7ea2

To get started we could use some examples of JSON with which to work.

You can run the MANIFEST -U command and generate an updated manifest.json for your JNIOR. That file actually is stored in two places: /manifest.json and /flash/manifest.json. One is the backup for the other. Here is what it looks like. I have to apologize as my development JNIOR has a crap load (technical term) of files on it so it’s a long listing. We’ll use CAT -J to format it.

bruce_dev /> cat -j manifest.json
{
  "model":"410",
  "serno":614070500,
  "vers":"v1.6.3-rc4",
  "date":"12/01/17 13:04:04",
  "files":{
    "/etc/janosclasses.jar":{
      "length":243492,
      "date":1512145376,
      "md5":"ece8d0ebf9b6882e488d1f7c9e764ce0",
      "crc":"bde463c8",
      "sha":"373b88f011b49f65eafd8e293fc185cc2563892e"
    },
    "/flash/serialcontrol.jar":{
      "length":31344,
      "date":1450364184,
      "md5":"b349e02b7efc64c0dfe5eb74292a5ee6",
      "crc":"3a005104"
    },
    "/flash/serialethernet.jar":{
      "length":25266,
      "date":1433505362,
      "md5":"ee5e266bb8418b4223a666bd046a8c56",
      "crc":"c3961df2"
    },
    "/flash/modbusserver.jar":{
      "length":51907,
      "date":1502219129,
      "md5":"77c16d6134dbd7ec93313fbad2b00d93",
      "crc":"b7456b42",
      "sha":"fad4ecc3d1607aafe0a385a10fb5ee90eff521bd"
    },
    "/flash/snmp.jar":{
      "length":239949,
      "date":1493062048,
      "md5":"b77d35c322ef6645f1eca9d22b29400b",
      "crc":"a4073dcb",
      "sha":"44a3c2b41a2375ef603063cc9b04642903dad973"
    },
    "/flash/www/base64.js":{
      "length":3493,
      "date":1433505378,
      "md5":"1138db1b5a6e165beae3ed81739dd2ec",
      "crc":"baceb6f6"
    },
    "/flash/www/configure/index.html":{
      "length":1349,
      "date":1433505382,
      "md5":"0454014aecfd0b7d9e4ce1efe0979139",
      "crc":"11ba5486"
    },
    "/flash/www/jr310applet.jar":{
      "length":287159,
      "date":1441207703,
      "md5":"f9c4840e7244824b75858a1a40dfb163",
      "crc":"3d1d0c72"
    },
    "/flash/www/jniorprotocol.jar":{
      "length":115148,
      "date":1441207710,
      "md5":"404b40c4293bf3c334e3b88e2fe0dd10",
      "crc":"5143ec4f"
    },
    "/flash/www/jniorprotocolhelpers.jar":{
      "length":34991,
      "date":1433505394,
      "md5":"b08e33e0c21e6c075b9b242bf092b68e",
      "crc":"48990308"
    },
    "/flash/www/task/index.html":{
      "length":1415,
      "date":1433505397,
      "md5":"bbdc32dce371881b3eebd15f5b3fce96",
      "crc":"cdbe02e4"
    },
    "/flash/www/taskmanagerinterface.jar":{
      "length":123052,
      "date":1433505400,
      "md5":"077cddccee476fab552d52a5eefd26a7",
      "crc":"647bb4b3"
    },
    "/flash/www/jquery/jquery-1.9.0.min.js":{
      "length":93071,
      "date":1433505404,
      "md5":"2b869ea9c8edd4c2243c5d44f665f632",
      "crc":"6a2a8434"
    },
    "/flash/www/jquery/jquery-ui.css":{
      "length":33441,
      "date":1433505405,
      "md5":"c6bd2971b8e625f2ae43ede9f655a27b",
      "crc":"0497b7a6"
    },
    "/flash/www/jquery/jquery-ui.min.js":{
      "length":96395,
      "date":1433505409,
      "md5":"8f636d4c90ea0abfcbb25528c635bf7d",
      "crc":"820662f5"
    },
    "/flash/www/vendor/bowser/bowser_0.7.2.min.js":{
      "length":3359,
      "date":1433505412,
      "md5":"61a36d48aad1298b17284b53f6ce3fd1",
      "crc":"22deb9e6"
    },
    "/flash/www/text":{
      "length":1336,
      "date":1434044220,
      "md5":"bab65804218b18b9e1a79f2d8e873259",
      "crc":"dda17d61"
    },
    "/flash/www/cycle":{
      "length":419,
      "date":1434044214,
      "md5":"9eb9bbdae70c1f994ebb7f51b18783b8",
      "crc":"9e496eb9"
    },
    "/flash/slaveservice.jar":{
      "length":73323,
      "date":1465435094,
      "md5":"cd6f5e177d75675607e9523d52e133f7",
      "crc":"9a871cd7"
    },
    "/flash/ftp.jar":{
      "length":9563,
      "date":1475783634,
      "md5":"793e460054f07867685e87f98fd402e6",
      "crc":"36fd641e"
    },
    "/flash/task.ini":{
      "length":4311,
      "date":1433782061,
      "md5":"b1f877ac198306b266311eab557ed1dd",
      "crc":"36a57579"
    },
    "/flash/task.jar":{
      "length":102655,
      "date":1434645611,
      "md5":"1979b16970127f2c38912777cb105133",
      "crc":"ed4d6ad7"
    },
    "/flash/jnior.ini":{
      "length":4874,
      "date":1512052838,
      "md5":"90740fe1ddcf0ddf0774c2574e234dfe",
      "crc":"c78e61d7",
      "sha":"76aa475db28479a22e748e6181cf11423988c266"
    },
    "/jniorsys.log":{
      "length":32844,
      "date":1512145587,
      "md5":"be4968cceb2fe0b2bebf50daac17d739",
      "crc":"637fb821",
      "sha":"2b6b56f5e3a731b933cf6e1594dfe1e003674d6b"
    },
    "/jniorboot.log.bak":{
      "length":1041,
      "date":1512074628,
      "md5":"8261c4f9cd12695626755ba6d1b0b9ad",
      "crc":"03e23ea1",
      "sha":"761a2b3fa74a921778d1c6fc438b5bfd0d51bc29"
    },
    "/jniorboot.log":{
      "length":995,
      "date":1512145452,
      "md5":"f053bbba44bea8f6333702fef922d950",
      "crc":"fa976c1c",
      "sha":"c2665669c49028a549a2e30e10b27e8f2aba5861"
    },
    "/flash/benchmark.jar":{
      "length":24351,
      "date":1464873509,
      "md5":"987f4044786771f31e0656cf91ed73f3",
      "crc":"1eed095a"
    },
    "/flash/threadtest.jar":{
      "length":3601,
      "date":1434645124,
      "md5":"902ce61cbd2524ca9b83dea335c395d3",
      "crc":"cd2479ff"
    },
    "/flash/test4to20.jar":{
      "length":3862,
      "date":1434659455,
      "md5":"a2e309c9d6dd112e5303aa76d2470740",
      "crc":"976f8208"
    },
    "/flash/dirs.bat":{
      "length":87,
      "date":1435691869,
      "md5":"531d655733ee668d829f9b3bdad96038",
      "crc":"6a11f77a"
    },
    "/flash/www/console/index.php":{
      "length":4347,
      "date":1438974987,
      "md5":"8728680bbc36d369429f7ca2c73cce7d",
      "crc":"c939c423"
    },
    "/flash/clean.bat":{
      "length":56,
      "date":1436532855,
      "md5":"ac9ce6553e1629412fb426b342440493",
      "crc":"3b661614"
    },
    "/flash/jnior1024.key":{
      "length":887,
      "date":1437746752,
      "md5":"b76b5351a92fdcc8d9b6b38ca62d8d71",
      "crc":"7983e14c"
    },
    "/flash/www/config/md5.js":{
      "length":5693,
      "date":1433505379,
      "md5":"a60fec5a81f207ff99ec1b97e3ccad0e",
      "crc":"e2a43d16"
    },
    "/flash/www/config/node.png":{
      "length":253,
      "date":1440435886,
      "md5":"1a8dbfaf1771a06e48dea0e3dc604392",
      "crc":"799c6dfc"
    },
    "/flash/www/config/tabs-styles.css":{
      "length":970,
      "date":1477590404,
      "md5":"68bca7015f51e26ab42199b5eb17a356",
      "crc":"f8870a33"
    },
    "/flash/www/config/tabs.js":{
      "length":3662,
      "date":1449678641,
      "md5":"ff728c86018341548ee70028062c89e0",
      "crc":"1a813112"
    },
    "/flash/www/config/styles.css":{
      "length":4450,
      "date":1504814044,
      "md5":"9ad78cca1b794dbcf9db3c55f1be5f1b",
      "crc":"acbd2e14",
      "sha":"3cf0bbc864840994a49f62d0ae00df6d8eb47ef3"
    },
    "/flash/www/config/comm.js":{
      "length":3541,
      "date":1507912287,
      "md5":"e7d2e56a443176d6150bbcc8b56e1911",
      "crc":"0ac0ed26",
      "sha":"5e66b96227779c5ef3736a7ca891a43cacffbbf1"
    },
    "/flash/www/config/console.js":{
      "length":5137,
      "date":1504815652,
      "md5":"33289e4b09f462efdb50e8d30d22d791",
      "crc":"b89fe380",
      "sha":"c2f3ea4fc0344d43b0c30b7f60b2b6c79c1f4817"
    },
    "/flash/www/config/config.js":{
      "length":12639,
      "date":1507912576,
      "md5":"75bf22a88d8a23b17de267607b88a14c",
      "crc":"d693e2f4",
      "sha":"cf9e9bcf7cc7d79ae648b241af16ee194199d7b3"
    },
    "/flash/www/config/index.php":{
      "length":22103,
      "date":1510262716,
      "md5":"6fe98e5238c5834d55b0140a7172fec6",
      "crc":"81f11698",
      "sha":"b5b440d43bb19da0396e8ab615161be9200e6180"
    },
    "/flash/www/jnior.ico":{
      "length":3262,
      "date":1439548680,
      "md5":"1c3b3dda6b10c6259fcf7c068b760f09",
      "crc":"051803eb"
    },
    "/flash/www/favicon.ico":{
      "length":156790,
      "date":1486410493,
      "md5":"07cb90c7f3573eff80222269625ed1dd",
      "crc":"7e367afa",
      "sha":"284add71fe3d3ba48fba059b88ff5143d3964b1d"
    },
    "/flash/analogpresets.jar":{
      "length":163902,
      "date":1441372806,
      "md5":"25eacc647412535e320302d3680ce327",
      "crc":"e6b656fc"
    },
    "/flash/www/config/config.css.php":{
      "length":1045,
      "date":1475072901,
      "md5":"1692861e9abd7f8d81f5b7cf8a176046",
      "crc":"4c386a21"
    },
    "/flash/www/config/inputs.png":{
      "length":18047,
      "date":1443116143,
      "md5":"e2151c93b6cdeaa154d15fab486ae61b",
      "crc":"16290877"
    },
    "/flash/www/config/loading.gif":{
      "length":3236,
      "date":1264096270,
      "md5":"d96f6517e00399c37a9765e045eaaf22",
      "crc":"16f442ed"
    },
    "/flash/jtest.jar":{
      "length":1832,
      "date":1511984925,
      "md5":"89f28d11945790915112f0a4880b6adc",
      "crc":"cf00edbe",
      "sha":"df53eab9f4eb1360c7ab48f30298ce7c48b0e440"
    },
    "/flash/www/vendor/angular_1.3.15/angular.min.js":{
      "length":125909,
      "date":1449498838,
      "md5":"ca1a58818682c3e858a585f283ab9beb",
      "crc":"9d8147d7"
    },
    "/flash/www/vendor/bootstrap_3.3.0/css/bootstrap-theme.css":{
      "length":21740,
      "date":1449498835,
      "md5":"c64043a3388612233d7eb947918a9bfc",
      "crc":"638f58a3"
    },
    "/flash/www/vendor/bootstrap_3.3.0/css/bootstrap-theme.css.map":{
      "length":41933,
      "date":1449498838,
      "md5":"c5da8241305bfe7e19919e6e943739eb",
      "crc":"11260772"
    },
    "/flash/www/vendor/bootstrap_3.3.0/css/bootstrap-theme.min.css":{
      "length":19199,
      "date":1449498840,
      "md5":"374df0ad5809a5314b0577802430a272",
      "crc":"8b3c47b7"
    },
    "/flash/www/vendor/bootstrap_3.3.0/css/bootstrap.css":{
      "length":137590,
      "date":1449498845,
      "md5":"ad6381ebfa541b55b0152349c6cabf76",
      "crc":"371e67da"
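    },
    "/flash/www/vendor/bootstrap_3.3.0/css/bootstrap.css.map":{
      "length":366866,
      "date":1449498854,
      "md5":"4ba278e0c420d166e5a0eb71545f9509",
      "crc":"b7c9868d"
    },
    "/flash/www/vendor/bootstrap_3.3.0/css/bootstrap.min.css":{
      "length":114011,
      "date":1449498852,
      "md5":"78e7f91c0c4cca415e0683626aa23925",
      "crc":"34387388"
    },
    "/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.eot":{
      "length":20335,
      "date":1449498855,
      "md5":"7ad17c6085dee9a33787bac28fb23d46",
      "crc":"f171b590"
    },
    "/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.svg":{
      "length":62926,
      "date":1449498857,
      "md5":"ff423a4251cf2986555523dfe315c42b",
      "crc":"385cd4ad"
    },
    "/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.ttf":{
      "length":41280,
      "date":1449498858,
      "md5":"e49d52e74b7689a0727def99da31f3eb",
      "crc":"0617f1ff"
    },
    "/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.woff":{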
      "length":23320,
      "date":1449498858,
      "md5":"68ed1dac06bf0409c18ae7bc62889170",
      "crc":"cec1a35c"
    },
    "/flash/www/vendor/bootstrap_3.3.0/js/bootstrap.min.js":{
      "length":34653,
      "date":1449498862,
      "md5":"281cd50dd9f58c5550620fc148a7bc39",
      "crc":"32d6c689"
    },
    "/flash/www/vendor/bootstrap_3.3.0/js/bootstrap.js":{
      "length":65813,
      "date":1449498862,
      "md5":"d5a03d9cca57637f008124916b86b585",
      "crc":"f504a7b3"
    },
    "/flash/www/vendor/bootstrap_3.3.0/js/npm.js":{
      "length":484,
      "date":1449498863,
      "md5":"ccb7f3909e30b1eb8f65a24393c6e12b",
      "crc":"cc50e34d"
    },
    "/flash/www/vendor/jquery_1.11.1/jquery-1.11.1.min.map":{
      "length":141680,
      "date":1449498870,
      "md5":"ffbeb16578d8cdf58104889baacbbef2",
      "crc":"e4e92bfd"
    },
    "/flash/www/vendor/jquery_1.11.1/jquery-1.11.1.min.js":{
      "length":95786,
      "date":1449498869,
      "md5":"8101d596b2b8fa35fe3a634ea342d7c3",
      "crc":"804ff984"
    },
    "/flash/www/config/integlogo.png":{
      "length":5773,
      "date":1449163436,
      "md5":"9111308273dadea73f5d09a5e02c7311",
      "crc":"60c4e184"
    },
    "/flash/utility.jar":{
      "length":106794,
      "date":1449773066,
      "md5":"ac559b91b537dfa70720a416f32f2960",
      "crc":"888936f1"
    },
    "/flash/generators/json/colour.js":{
      "length":4327,
      "date":1449774238,
      "md5":"c67e10d0e0e698fcdbbbadcaa55600d4",
      "crc":"19e8a38f"
    },
    "/flash/generators/json/ethernet.js":{
      "length":1409,
      "date":1449774238,
      "md5":"1b6bae08feb93f6bd345a3780c3acb69",
      "crc":"848097a7"
    },
    "/flash/generators/json/inputs.js":{
      "length":2825,
      "date":1449774239,
      "md5":"6959db5a769ff3ceea45bf606bda940a",
      "crc":"c544d780"
    },
    "/flash/generators/json/lists.js":{
      "length":12006,
      "date":1449774239,
      "md5":"5cc489ac77db7a3369b2ffc30cbd3a86",
      "crc":"ba761254"
    },
    "/flash/generators/json/logic.js":{
      "length":4404,
      "date":1449774239,
      "md5":"9cd1cf854976ebb69a6c20a7ac88d2f9",
      "crc":"6c2189f9"
    },
    "/flash/generators/json/loops.js":{
      "length":6040,
      "date":1449774239,
      "md5":"e8e9021b5d4eb2e0cc43f11ad5b3bfd7",
      "crc":"b30a758a"
    },
    "/flash/generators/json/math.js":{
      "length":14673,
      "date":1449774240,
      "md5":"fa22c29efc362e02d8f35838fcca46e5",
      "crc":"8fc62e67"
    },
    "/flash/generators/json/other.js":{
      "length":983,
      "date":1449774240,
      "md5":"dd77f555bc9b50ed17a215d7935f10ab",
      "crc":"3e07810d"
    },
    "/flash/generators/json/outputs.js":{
      "length":3861,
      "date":1449774240,
      "md5":"72a118cd7829b5a510e5a901d8863d6e",
      "crc":"bdd5e320"
    },
    "/flash/generators/json/procedures.js":{
      "length":3945,
      "date":1449774240,
      "md5":"cb9fb880bebb3375273353fafc12dc9c",
      "crc":"20d43aad"
    },
    "/flash/generators/json/text.js":{
      "length":1363,
      "date":1449774241,
      "md5":"a0bd39f638202a0800c100b4eac3cbc3",
      "crc":"b17b24d6"
    },
    "/flash/generators/json/timing.js":{
      "length":2638,
      "date":1449774241,
      "md5":"b1ee803dd8e6e00de74e0a3269f0a2ff",
      "crc":"489061b8"
    },
    "/flash/generators/json/variables.js":{
      "length":1500,
      "date":1449774241,
      "md5":"fecce79a400d5e4e1edbe521699fa604",
      "crc":"cb724c91"
    },
    "/flash/generators/json.js":{
      "length":4115,
      "date":1449774238,
      "md5":"cc72f2468eb970110f3f6f0278f43467",
      "crc":"25a98f30"
    },
    "/flash/www/config/link_to.png":{
      "length":259,
      "date":1450466976,
      "md5":"b1ed68183be4f97ce1793139496dbbb4",
      "crc":"a067876a"
    },
    11124a10766",
      "crc":"62d153fb"
    },
    "/flash/public/dcp.zip":{
      "length":181914,
      "date":1504795829,
      "md5":"655e8587293f35f11c5c24fc38201d2f",
      "sha":"5fcfd8e38826e648f98f8d50f3613deb0d6312b6",
      "crc":"da99b7d0"
    },
    "/flash/test.txt":{
      "length":304,
      "date":1495131459,
      "md5":"fc9f1f5e67928ccb9be3aeaa66cd9e52",
      "sha":"6100d999f484f98ab476408c801dd000e579a62c",
      "crc":"765047c5"
    },
    "/flash/dmx.jar":{
      "length":4476,
      "date":1500567859,
      "md5":"3fd35bbe6bbf53a32aecf273275d1839",
      "sha":"4f702a87adb060294b553e6bd212672727d5d25f",
      "crc":"e81db9aa"
    },
    "/flash/juptime.jar":{
      "length":3201,
      "date":1506713589,
      "md5":"d4c2482fae18482727c1b2afabcf94b4",
      "sha":"86268b720b99760a4ebdb803db53f3f7fd18fd18",
      "crc":"44b0878c"
    },
    "/flash/jscan.jar":{
      "length":2189,
      "date":1507141493,
      "md5":"a0a42e17f003cedcac9c8e662ada6b36",
      "sha":"f1cafb56fdae33b66fff9b20cd2ff2705d96da9e",
      "crc":"60f00fe2"
    },
    "/access.log":{
      "length":177,
      "date":1510081848,
      "md5":"914113dd52c4e74d2675eb1094ba10c6",
      "sha":"0212252f4f04ab136ce74ab0425cd7fce26b7c47",
      "crc":"e9a7f8d8"
    },
    "/auxio.log":{
      "length":1589,
      "date":1511288557,
      "md5":"a52713575d5c449ff8e8cdbeb7e10ba6",
      "sha":"22106e83ff429cc08fe16f21dc32623850f5673c",
      "crc":"a29ad191"
    },
    "/jniorio.log":{
      "length":3332,
      "date":1511289076,
      "md5":"d3c685fde34b343f2ba53dd60e4bf11d",
      "sha":"dd001970b69d61ab619745853addaf2910aabb31",
      "crc":"1bbc78de"
    },
    "/flash/hmi.jar":{
      "length":8329,
      "date":1511283865,
      "md5":"1a1b247ccb5e3eb9623d12578c1ba833",
      "sha":"7a1f5868817e8a3e60fe8fb2c4d9ed168e53d141",
      "crc":"fb2a0367"
    },
    "/flash/ckeypad.jar":{
      "length":11194,
      "date":1512145569,
      "md5":"71288ea4ffa40e936dbecfd010fff785",
      "sha":"23f944b627705716697ece761c6c95f8c1f873bb",
      "crc":"3d9fc092"
    }
  }
}
bruce_dev /> 

I can also get some smaller ones, say from the protocol we are defining for the Cinema UI project. Here’s the response to a GetInfo request. I’ll get the response to a MacroList request in a bit.

bruce_dev /> cat -j getinfo.json                                                                
{
  "Message":"GetInfoResponse",
  "Information":"Cinema v2.4.0.473"
}
bruce_dev />

Okay so here is a response to a MacroListRequest that we are using in the Cinema UI implementation. It is just another JSON example that we can work with here.

bruce_dev /> cat inforesponse.json -j
{
  "Message":"GetMacroListResponse",
  "MacroList":[
    "Preshow Start",
    "Preshow End",
    "Flat Start Trailers",
    "Scope Start Trailers",
    "Feature Start",
    "Feature Credits",
    "Feature End",
    "Start Intermission",
    "Stop Intermission",
    "Extend Intermission 15 Seconds",
    "Shorten Intermission 15 Seconds",
    "Intermission End",
    "Fire Alarm",
    "Fire Alarm Clear",
    "Lights Dim",
    "Lights Half",
    "Lights Full",
    "Multiple",
    "Test",
    "ticket sold",
    "no ticket sold",
    "violet",
    "LED Off",
    "LED Green",
    "Core Command 1",
    "Core Command 2",
    "RWB"
  ]
}
bruce_dev /> 

The file itself actually contains one long unformatted string. Here it is, manually wrapped.

{"Message":"GetMacroListResponse","MacroList":["Preshow Start","Preshow End","Flat Start Trailers",
  "Scope Start Trailers","Feature Start","Feature Credits","Feature End","Start Intermission",
  "Stop Intermission","Extend Intermission 15 Seconds","Shorten Intermission 15 Seconds","Intermission End",
  "Fire Alarm","Fire Alarm Clear","Lights Dim","Lights Half","Lights Full","Multiple","Test","ticket sold",
  "no ticket sold","violet","LED Off","LED Green","Core Command 1","Core Command 2","RWB"]}

Let’s access the MacroListResponse and fetch data from it. The following program creates a Json object from the content of the specified File. We need to use a File object here because a Json object can also be instantiated directly from a String containing JSON text, so we can’t simply pass the filename as a string. Once we have the object we look up the “Message” name-value pair and retrieve its String content.

package jtest;
 
import java.io.File;
import java.util.Json;
 
public class Main {
    
    public static void main(String[] args) throws Exception {
        
        // Obtain Json Object from a file
        Json jdb = new Json(new File("inforesponse.json"));
        
        // Fetch String content
        String msg = jdb.getString("Message");
        System.out.println(msg);
        
    }
}
bruce_dev /> jtest
GetMacroListResponse

bruce_dev />
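The reason for wrapping the filename in a File can be sketched in plain Java. The JsonSource class below is hypothetical and is not the JANOS Json class. It only assumes the same pair of constructors, one taking the JSON text as a String and one taking a File, to show how the argument type selects the behavior.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class JsonSource {

    public final String text;

    // A String argument is taken to be the JSON text itself
    public JsonSource(String json) {
        this.text = json;
    }

    // A File argument names a file whose content is the JSON text
    public JsonSource(File file) throws IOException {
        this.text = new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Create a small temporary file to read back
        File tmp = File.createTempFile("getinfo", ".json");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(),
                "{\"Message\":\"GetInfoResponse\"}".getBytes(StandardCharsets.UTF_8));

        // Passing the path as a String treats the path itself as the JSON text
        JsonSource fromString = new JsonSource(tmp.getPath());

        // Passing a File reads the file content
        JsonSource fromFile = new JsonSource(tmp);

        System.out.println(fromString.text.equals(tmp.getPath()));
        System.out.println(fromFile.text);
    }
}
```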

In this JSON object we know that the MacroList is an array of Strings. So it is a simple matter to itemize or list them.

package jtest;
 
import java.io.File;
import java.util.Json;
 
public class Main {
    
    public static void main(String[] args) throws Exception {
        
        // Obtain Json Object from a file
        Json jdb = new Json(new File("inforesponse.json"));
        
        // List the macros
        String[] macros = (String[]) jdb.get("MacroList");
        for (int n = 0; n < macros.length; n++)
            System.out.println(macros[n]);
        
    }
}
bruce_dev /> jtest
Preshow Start
Preshow End
Flat Start Trailers
Scope Start Trailers
Feature Start
Feature Credits
Feature End
Start Intermission
Stop Intermission
Extend Intermission 15 Seconds
Shorten Intermission 15 Seconds
Intermission End
Fire Alarm
Fire Alarm Clear
Lights Dim
Lights Half
Lights Full
Multiple
Test
ticket sold
no ticket sold
violet
LED Off
LED Green
Core Command 1
Core Command 2
RWB

bruce_dev />

But if we didn’t know it was an array could we figure it out?

package jtest;
 
import java.io.File;
import java.util.Json;
 
public class Main {
    
    public static void main(String[] args) throws Exception {
        
        // Obtain Json Object from a file
        Json jdb = new Json(new File("inforesponse.json"));
        
        // Fetch the value without assuming its type
        Object obj = jdb.get("MacroList");
 
        if (obj instanceof Object[]) {
            System.out.println("MacroList is an array.");
            
            // first few entries
            for (int n = 0; n < ((Object[]) obj).length && n < 4; n++)
                System.out.println( " " + (n + 1) + ". " + ((Object[]) obj)[n] );
        }  
    }
}
bruce_dev /> jtest
MacroList is an array.
 1. Preshow Start
 2. Preshow End
 3. Flat Start Trailers
 4. Scope Start Trailers

bruce_dev />

It seems that I could make this a little cleaner, for instance by providing an isArray() method or something along those lines.
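As a rough idea of what such a check could look like, here is a minimal sketch in plain Java. The ArrayCheck class and its isArray() helper are hypothetical and are not part of the JANOS runtime. The helper takes the generically fetched value and tests it by reflection, so it does not matter whether the array holds Strings or other Objects.

```java
public class ArrayCheck {

    // Hypothetical helper: true when a generically fetched value is an array
    public static boolean isArray(Object value) {
        return value != null && value.getClass().isArray();
    }

    public static void main(String[] args) {
        Object macros = new String[] { "Preshow Start", "Preshow End" };
        Object message = "GetMacroListResponse";

        System.out.println(isArray(macros));   // an array of strings
        System.out.println(isArray(message));  // a plain string
        System.out.println(isArray(null));     // a missing value
    }
}
```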

When you are confronted with some JSON and you don’t have an official schema, how do you know what is there? The get() methods require that you supply the name (or key) for an object, but you might not know what name:value pairs are present. You can enumerate the structure.

Every JSON object encloses 0 or more name:value or string:value pairs in curly braces {}.
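Because a value may itself be another object, the structure is nested and can be walked recursively. The sketch below is plain Java and does not use the JANOS Json class: nested Maps stand in for parsed objects, and the StructureWalk class with its kindOf() and walk() helpers is hypothetical. The sample data is a small hand-built subset shaped like the manifest shown earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StructureWalk {

    // Name the kind of value held under a key
    public static String kindOf(Object value) {
        if (value == null) return "null";
        if (value instanceof Map) return "object";
        if (value.getClass().isArray()) return "array";
        if (value instanceof String) return "string";
        if (value instanceof Number) return "number";
        if (value instanceof Boolean) return "boolean";
        return value.getClass().getName();
    }

    // Append each key with its kind, descending into nested objects
    public static void walk(Map<String, Object> obj, String indent, StringBuilder out) {
        for (Map.Entry<String, Object> entry : obj.entrySet()) {
            Object value = entry.getValue();
            out.append(indent).append(entry.getKey())
               .append(": ").append(kindOf(value)).append('\n');
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                walk(child, indent + "  ", out);
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in data; a LinkedHashMap keeps the keys in insertion order
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("length", 4476);
        entry.put("md5", "3fd35bbe6bbf53a32aecf273275d1839");

        Map<String, Object> files = new LinkedHashMap<>();
        files.put("/flash/dmx.jar", entry);

        Map<String, Object> manifest = new LinkedHashMap<>();
        manifest.put("model", "410");
        manifest.put("serno", 614070500);
        manifest.put("files", files);

        StringBuilder out = new StringBuilder();
        walk(manifest, "", out);
        System.out.print(out);
    }
}
```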

The following enumerates the keys, which are guaranteed to be strings identifying each name:value pair. Let’s work now with the MANIFEST database file to decipher its structure.

    // Requires imports of java.io.File, java.util.Enumeration and java.util.Json
    public static void main(String[] args) throws Exception {
        
        // Obtain Json Object from a file
        Json jdb = new Json(new File("manifest.json"));
        
        // List top level entries
        Enumeration e = jdb.keys();
        while (e.hasMoreElements()) 
            System.out.println(e.nextElement());
    }
bruce_dev /> jtest
model
serno
vers
date
files

bruce_dev />

We can expand this to show the values associated with these keys. Note that we generically fetch the value Object and explicitly convert it to a string for display.

    public static void main(String[] args) throws Exception {
        
        // Obtain Json Object from a file
        Json jdb = new Json(new File("manifest.json"));
        
        // List top level entries
        Enumeration e = jdb.keys();
        while (e.hasMoreElements()) {
            String name = (String) e.nextElement();
            System.out.printf("%s = %s\n", name, jdb.get(name).toString());
        }
    }
bruce_dev /> jtest
model = 410
serno = 614070500
vers = v1.6.3-rc4
date = 12/01/17 13:04:04
files = {"/etc/janosclasses.jar":{"length":243492,"date":1512145376,"md5":"ece8d0ebf9b6882e488d1f7c9e764ce0","crc":"bde463c8","sha":"373b88f011b49f65eafd8e293fc185cc2563892e"},"/flash/serialcontrol.jar":{"length":31344,"date":1450364184,"md5":"b349e02b7efc64c0dfe5eb74292a5ee6","crc":"3a005104"},"/flash/serialethernet.jar":{"length":25266,"date":1433505362,"md5":"ee5e266bb8418b4223a666bd046a8c56","crc":"c3961df2"},"/flash/modbusserver.jar":{"length":51907,"date":1502219129,"md5":"77c16d6134dbd7ec93313fbad2b00d93","crc":"b7456b42","sha":"fad4ecc3d1607aafe0a385a10fb5ee90eff521bd"},"/flash/snmp.jar":{"length":239949,"date":1493062048,"md5":"b77d35c322ef6645f1eca9d22b29400b","crc":"a4073dcb","sha":"44a3c2b41a2375ef603063cc9b04642903dad973"},"/flash/www/base64.js":{"length":3493,"date":1433505378,"md5":"1138db1b5a6e165beae3ed81739dd2ec","crc":"baceb6f6"},"/flash/www/configure/index.html":{"length":1349,"date":1433505382,"md5":"0454014aecfd0b7d9e4ce1efe0979139","crc":"11ba5486"},"/flash/www/jr310applet.jar":{"length":287159,"date":1441207703,"md5":"f9c4840e7244824b75858a1a40dfb163","crc":"3d1d0c72"},"/flash/www/jniorprotocol.jar":{"length":115148,"date":1441207710,"md5":"404b40c4293bf3c334e3b88e2fe0dd10","crc":"5143ec4f"},"/flash/www/jniorprotocolhelpers.jar":{"length":34991,"date":1433505394,"md5":"b08e33e0c21e6c075b9b242bf092b68e","crc":"48990308"},"/flash/www/task/index.html":{"length":1415,"date":1433505397,"md5":"bbdc32dce371881b3eebd15f5b3fce96","crc":"cdbe02e4"},"/flash/www/taskmanagerinterface.jar":{"length":123052,"date":1433505400,"md5":"077cddccee476fab552d52a5eefd26a7","crc":"647bb4b3"},"/flash/www/jquery/jquery-1.9.0.min.js":{"length":93071,"date":1433505404,"md5":"2b869ea9c8edd4c2243c5d44f665f632","crc":"6a2a8434"},"/flash/www/jquery/jquery-ui.css":{"length":33441,"date":1433505405,"md5":"c6bd2971b8e625f2ae43ede9f655a27b","crc":"0497b7a6"},"/flash/www/jquery/jquery-ui.min.js":{"length":96395,"date":1433505409,"md5":"8f636d4c90ea0abfcbb25528c635bf7d","crc":"82
0662f5"},"/flash/www/vendor/bowser/bowser_0.7.2.min.js":{"length":3359,"date":1433505412,"md5":"61a36d48aad1298b17284b53f6ce3fd1","crc":"22deb9e6"},"/flash/www/text":{"length":1336,"date":1434044220,"md5":"bab65804218b18b9e1a79f2d8e873259","crc":"dda17d61"},"/flash/www/cycle":{"length":419,"date":1434044214,"md5":"9eb9bbdae70c1f994ebb7f51b18783b8","crc":"9e496eb9"},"/flash/slaveservice.jar":{"length":73323,"date":1465435094,"md5":"cd6f5e177d75675607e9523d52e133f7","crc":"9a871cd7"},"/flash/ftp.jar":{"length":9563,"date":1475783634,"md5":"793e460054f07867685e87f98fd402e6","crc":"36fd641e"},"/flash/task.ini":{"length":4311,"date":1433782061,"md5":"b1f877ac198306b266311eab557ed1dd","crc":"36a57579"},"/flash/task.jar":{"length":102655,"date":1434645611,"md5":"1979b16970127f2c38912777cb105133","crc":"ed4d6ad7"},"/flash/jnior.ini":{"length":4874,"date":1512052838,"md5":"90740fe1ddcf0ddf0774c2574e234dfe","crc":"c78e61d7","sha":"76aa475db28479a22e748e6181cf11423988c266"},"/jniorsys.log":{"length":32844,"date":1512145587,"md5":"be4968cceb2fe0b2bebf50daac17d739","crc":"637fb821","sha":"2b6b56f5e3a731b933cf6e1594dfe1e003674d6b"},"/jniorboot.log.bak":{"length":1041,"date":1512074628,"md5":"8261c4f9cd12695626755ba6d1b0b9ad","crc":"03e23ea1","sha":"761a2b3fa74a921778d1c6fc438b5bfd0d51bc29"},"/jniorboot.log":{"length":995,"date":1512145452,"md5":"f053bbba44bea8f6333702fef922d950","crc":"fa976c1c","sha":"c2665669c49028a549a2e30e10b27e8f2aba5861"},"/flash/benchmark.jar":{"length":24351,"date":1464873509,"md5":"987f4044786771f31e0656cf91ed73f3","crc":"1eed095a"},"/flash/threadtest.jar":{"length":3601,"date":1434645124,"md5":"902ce61cbd2524ca9b83dea335c395d3","crc":"cd2479ff"},"/flash/test4to20.jar":{"length":3862,"date":1434659455,"md5":"a2e309c9d6dd112e5303aa76d2470740","crc":"976f8208"},"/flash/dirs.bat":{"length":87,"date":1435691869,"md5":"531d655733ee668d829f9b3bdad96038","crc":"6a11f77a"},"/flash/www/console/index.php":{"length":4347,"date":1438974987,"md5":"8728680bbc36d369429
f7ca2c73cce7d","crc":"c939c423"},"/flash/clean.bat":{"length":56,"date":1436532855,"md5":"ac9ce6553e1629412fb426b342440493","crc":"3b661614"},"/flash/jnior1024.key":{"length":887,"date":1437746752,"md5":"b76b5351a92fdcc8d9b6b38ca62d8d71","crc":"7983e14c"},"/flash/www/config/md5.js":{"length":5693,"date":1433505379,"md5":"a60fec5a81f207ff99ec1b97e3ccad0e","crc":"e2a43d16"},"/flash/www/config/node.png":{"length":253,"date":1440435886,"md5":"1a8dbfaf1771a06e48dea0e3dc604392","crc":"799c6dfc"},"/flash/www/config/tabs-styles.css":{"length":970,"date":1477590404,"md5":"68bca7015f51e26ab42199b5eb17a356","crc":"f8870a33"},"/flash/www/config/tabs.js":{"length":3662,"date":1449678641,"md5":"ff728c86018341548ee70028062c89e0","crc":"1a813112"},"/flash/www/config/styles.css":{"length":4450,"date":1504814044,"md5":"9ad78cca1b794dbcf9db3c55f1be5f1b","crc":"acbd2e14","sha":"3cf0bbc864840994a49f62d0ae00df6d8eb47ef3"},"/flash/www/config/comm.js":{"length":3541,"date":1507912287,"md5":"e7d2e56a443176d6150bbcc8b56e1911","crc":"0ac0ed26","sha":"5e66b96227779c5ef3736a7ca891a43cacffbbf1"},"/flash/www/config/console.js":{"length":5137,"date":1504815652,"md5":"33289e4b09f462efdb50e8d30d22d791","crc":"b89fe380","sha":"c2f3ea4fc0344d43b0c30b7f60b2b6c79c1f4817"},"/flash/www/config/config.js":{"length":12639,"date":1507912576,"md5":"75bf22a88d8a23b17de267607b88a14c","crc":"d693e2f4","sha":"cf9e9bcf7cc7d79ae648b241af16ee194199d7b3"},"/flash/www/config/index.php":{"length":22103,"date":1510262716,"md5":"6fe98e5238c5834d55b0140a7172fec6","crc":"81f11698","sha":"b5b440d43bb19da0396e8ab615161be9200e6180"},"/flash/www/jnior.ico":{"length":3262,"date":1439548680,"md5":"1c3b3dda6b10c6259fcf7c068b760f09","crc":"051803eb"},"/flash/www/favicon.ico":{"length":156790,"date":1486410493,"md5":"07cb90c7f3573eff80222269625ed1dd","crc":"7e367afa","sha":"284add71fe3d3ba48fba059b88ff5143d3964b1d"},"/flash/analogpresets.jar":{"length":163902,"date":1441372806,"md5":"25eacc647412535e320302d3680ce327","crc":"e6b656fc
"},"/flash/www/config/config.css.php":{"length":1045,"date":1475072901,"md5":"1692861e9abd7f8d81f5b7cf8a176046","crc":"4c386a21"},"/flash/www/config/inputs.png":{"length":18047,"date":1443116143,"md5":"e2151c93b6cdeaa154d15fab486ae61b","crc":"16290877"},"/flash/www/config/loading.gif":{"length":3236,"date":1264096270,"md5":"d96f6517e00399c37a9765e045eaaf22","crc":"16f442ed"},"/flash/jtest.jar":{"length":1832,"date":1511984925,"md5":"89f28d11945790915112f0a4880b6adc","crc":"cf00edbe","sha":"df53eab9f4eb1360c7ab48f30298ce7c48b0e440"},"/flash/www/vendor/angular_1.3.15/angular.min.js":{"length":125909,"date":1449498838,"md5":"ca1a58818682c3e858a585f283ab9beb","crc":"9d8147d7"},"/flash/www/vendor/bootstrap_3.3.0/css/bootstrap-theme.css":{"length":21740,"date":1449498835,"md5":"c64043a3388612233d7eb947918a9bfc","crc":"638f58a3"},"/flash/www/vendor/bootstrap_3.3.0/css/bootstrap-theme.css.map":{"length":41933,"date":1449498838,"md5":"c5da8241305bfe7e19919e6e943739eb","crc":"11260772"},"/flash/www/vendor/bootstrap_3.3.0/css/bootstrap-theme.min.css":{"length":19199,"date":1449498840,"md5":"374df0ad5809a5314b0577802430a272","crc":"8b3c47b7"},"/flash/www/vendor/bootstrap_3.3.0/css/bootstrap.css":{"length":137590,"date":1449498845,"md5":"ad6381ebfa541b55b0152349c6cabf76","crc":"371e67da"},"/flash/www/vendor/bootstrap_3.3.0/css/bootstrap.css.map":{"length":366866,"date":1449498854,"md5":"4ba278e0c420d166e5a0eb71545f9509","crc":"b7c9868d"},"/flash/www/vendor/bootstrap_3.3.0/css/bootstrap.min.css":{"length":114011,"date":1449498852,"md5":"78e7f91c0c4cca415e0683626aa23925","crc":"34387388"},"/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.eot":{"length":20335,"date":1449498855,"md5":"7ad17c6085dee9a33787bac28fb23d46","crc":"f171b590"},"/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.svg":{"length":62926,"date":1449498857,"md5":"ff423a4251cf2986555523dfe315c42b","crc":"385cd4ad"},"/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-r
egular.ttf":{"length":41280,"date":1449498858,"md5":"e49d52e74b7689a0727def99da31f3eb","crc":"0617f1ff"},"/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.woff":{"length":23320,"date":1449498858,"md5":"68ed1dac06bf0409c18ae7bc62889170","crc":"cec1a35c"},"/flash/www/vendor/bootstrap_3.3.0/js/bootstrap.min.js":{"length":34653,"date":1449498862,"md5":"281cd50dd9f58c5550620fc148a7bc39","crc":"32d6c689"},"/flash/www/vendor/bootstrap_3.3.0/js/bootstrap.js":{"length":65813,"date":1449498862,"md5":"d5a03d9cca57637f008124916b86b585","crc":"f504a7b3"},"/flash/www/vendor/bootstrap_3.3.0/js/npm.js":{"length":484,"date":1449498863,"md5":"ccb7f3909e30b1eb8f65a24393c6e12b","crc":"cc50e34d"},"/flash/www/vendor/jquery_1.11.1/jquery-1.11.1.min.map":{"length":141680,"date":1449498870,"md5":"ffbeb16578d8cdf58104889baacbbef2","crc":"e4e92bfd"},"/flash/www/vendor/jquery_1.11.1/jquery-1.11.1.min.js":{"length":95786,"date":1449498869,"md5":"8101d596b2b8fa35fe3a634ea342d7c3","crc":"804ff984"},"/flash/www/config/integlogo.png":{"length":5773,"date":1449163436,"md5":"9111308273dadea73f5d09a5e02c7311","crc":"60c4e184"},"/flash/utility.jar":{"length":106794,"date":1449773066,"md5":"ac559b91b537dfa70720a416f32f2960","crc":"888936f1"},"/flash/generators/json/colour.js":{"length":4327,"date":1449774238,"md5":"c67e10d0e0e698fcdbbbadcaa55600d4","crc":"19e8a38f"},"/flash/generators/json/ethernet.js":{"length":1409,"date":1449774238,"md5":"1b6bae08feb93f6bd345a3780c3acb69","crc":"848097a7"},"/flash/generators/json/inputs.js":{"length":2825,"date":1449774239,"md5":"6959db5a769ff3ceea45bf606bda940a","crc":"c544d780"},"/flash/generators/json/lists.js":{"length":12006,"date":1449774239,"md5":"5cc489ac77db7a3369b2ffc30cbd3a86","crc":"ba761254"},"/flash/generators/json/logic.js":{"length":4404,"date":1449774239,"md5":"9cd1cf854976ebb69a6c20a7ac88d2f9","crc":"6c2189f9"},"/flash/generators/json/loops.js":{"length":6040,"date":1449774239,"md5":"e8e9021b5d4eb2e0cc43f11ad5b3bfd7","crc":"b30a7
58a"},"/flash/generators/json/math.js":{"length":14673,"date":1449774240,"md5":"fa22c29efc362e02d8f35838fcca46e5","crc":"8fc62e67"},"/flash/generators/json/other.js":{"length":983,"date":1449774240,"md5":"dd77f555bc9b50ed17a215d7935f10ab","crc":"3e07810d"},"/flash/generators/json/outputs.js":{"length":3861,"date":1449774240,"md5":"72a118cd7829b5a510e5a901d8863d6e","crc":"bdd5e320"},"/flash/generators/json/procedures.js":{"length":3945,"date":1449774240,"md5":"cb9fb880bebb3375273353fafc12dc9c","crc":"20d43aad"},"/flash/generators/json/text.js":{"length":1363,"date":1449774241,"md5":"a0bd39f638202a0800c100b4eac3cbc3","crc":"b17b24d6"},"/flash/generators/json/timing.js":{"length":2638,"date":1449774241,"md5":"b1ee803dd8e6e00de74e0a3269f0a2ff","crc":"489061b8"},"/flash/generators/json/variables.js":{"length":1500,"date":1449774241,"md5":"fecce79a400d5e4e1edbe521699fa604","crc":"cb724c91"},"/flash/generators/json.js":{"length":4115,"date":1449774238,"md5":"cc72f2468eb970110f3f6f0278f43467","crc":"25a98f30"},"/flash/www/config/link_to.png":{"length":259,"date":1450466976,"md5":"b1ed68183be4f97ce1793139496dbbb4","crc":"a067876a"},"/flash/www/config/collapsed.png":{"length":232,"date":1452087215,"md5":"ef7dd392142824ec54b7b7188717411c","crc":"c7bd8428"},"/flash/www/config/linked.png":{"length":174,"date":1452088114,"md5":"56d2755d08a0857ff6e7750c4b2822dd","crc":"ff59187e"},"/flash/www/config/expanded.png":{"length":238,"date":1452097812,"md5":"905b26e96849524dd6c37e1878f66779","crc":"68686921"},"/flash/www/config/registry.js":{"length":8276,"date":1452271284,"md5":"fc35855793b2bbfe577e420f34cb0dda","crc":"6c73e25a"},"/flash/www/config/deletex.png":{"length":240,"date":1452284181,"md5":"2750f1e60d0222d7f3c0752207fb41e7","crc":"386b823b"},"/flash/www/config/modules.js":{"length":13520,"date":1484149578,"md5":"5d79964a8ca70cc7dc0504c343be3e3c","crc":"3c09b9e2","sha":"d6f0b3ec60796662acd105694ef39543e3dc50a2"},"/flash/www/logging.php":{"length":4853,"date":1463582298,"md5":"170
c17bd0962f434eebe699129491912","crc":"dce15f4e"},"/flash/www/slaving.zip":{"length":113815,"date":1465493787,"md5":"b3e85080154b5a7dc10078a6c6fe75c7","crc":"975c987e"},"/flash/0-10vtest.jar":{"length":5053,"date":1438104444,"md5":"3a7be82077e29c598bdd8694d47805f4","crc":"05e27897"},"/flash/4routtest.jar":{"length":2993,"date":1373644405,"md5":"14381605ec8f2f0d0dbe34843b7178b8","crc":"8240fc03"},"/flash/environ.jar":{"length":3881,"date":1476102546,"md5":"8d738f0145516d287174a00dda32dabc","crc":"ff1ecc8b"},"/flash/current.key":{"length":898,"date":1455116261,"md5":"035a0d79bd6c8258c12111479fe7353e","crc":"cbdd8ffe"},"/flash/serialtest.jar":{"length":4532,"date":1457448880,"md5":"48fc4bd9421a5cf275b42235d2f4e2cb","crc":"6d86943b"},"/flash/intellij.jar":{"length":969,"date":1464918560,"md5":"aea445862e32190fa61abc5d97e5b25f","crc":"959a1596"},"/flash/jmodule.jar":{"length":5580,"date":1465240063,"md5":"af7d42f427d0e711c4a79c8e1c1d341d","crc":"40058988"},"/flash/udptest.jar":{"length":5811,"date":1465328251,"md5":"5bbc399b4eb1f5ec427ccbf93c8b135d","crc":"3d976325"},"/flash/buffer.jar":{"length":95325,"date":1467321013,"md5":"0c66b2a130de483b64b91d87471eb952","crc":"5d0819e2"},"/flash/display.jar":{"length":2992,"date":1468953410,"md5":"efcfc78470e98842f52579c81c088a2d","crc":"5ec67fd0"},"/flash/rz.jar":{"length":13079,"date":1469638127,"md5":"c4b7e9f4072d64e3dde9fe5a62406a1e","crc":"20367148"},"/flash/www/config/folder.png":{"length":329,"date":1454662486,"md5":"316b7810fa502618b4e85788a82617a8","crc":"55f20187"},"/flash/www/config/file.png":{"length":286,"date":1454662486,"md5":"1b75c23448e9c6eed675404f6130491d","crc":"d327c449"},"/flash/www/config/warning.png":{"length":3068,"date":1332275646,"md5":"9c96d831cfc50fdedfdc980bc2abb2cf","crc":"e90bb05a"},"/flash/www/config/folders.js":{"length":19270,"date":1504815735,"md5":"c7a59ef1aea3aad95d3315627d3a3b29","crc":"6b1adf25","sha":"93d7e851c9a1a65ed45b7c1bbe4368d3d941b32f"},"/flash/clktest.jar":{"length":2616,"date":14702
49535,"md5":"345b4a9a22ec05bc89bb291b7b047e0e","crc":"270f1d8b"},"/flash/timesearch.jar":{"length":4180,"date":1471371624,"md5":"bf719e65d8f4be9d7348a621ac69bc2b","crc":"25075aa7"},"/flash/janosruntime_1.5.1.jar":{"length":1621696,"date":1472744987,"md5":"b8beb71b94b36129534ef4d6ec13f5ab","crc":"abc7b327"},"/flash/www/config/relays.js":{"length":4189,"date":1484587793,"md5":"803af5c2431b8f58c110260b3f317838","crc":"ee9ab3af","sha":"21ec766fe220bd0618b43050851f9cd67dd1bf54"},"/flash/www/config/temperature.js":{"length":2870,"date":1475245816,"md5":"262c339513007cd746ee01da9a4a843f","crc":"d062a444"},"/flash/www/config/dimmer.js":{"length":8255,"date":1475265861,"md5":"e7213c6fb8c263ac71acb766e62dc4ce","crc":"b9edf051"},"/flash/www/config/range.css":{"length":2212,"date":1475499110,"md5":"6932c76ab79879ea4c5d826d9cb60db9","crc":"3334dfd1"},"/flash/www/config/analog.js":{"length":7267,"date":1484587793,"md5":"87abcaf68dea5e2e203326a55bc2bca5","crc":"9766b532","sha":"dd788111904d41826164ea151f78dd4b3e3b84e6"},"/flash/www/config/ledon.png":{"length":626,"date":1475506220,"md5":"6018d69896fcba49da54c39d8ee19803","crc":"32a65f15"},"/flash/www/config/panel.js":{"length":2038,"date":1475509052,"md5":"e0631cb06777f63f0a071f7aa5d198d0","crc":"a38a7db3"},"/flash/www/config/ledoff.png":{"length":757,"date":1475509575,"md5":"4bb71e412a20ae6f098a29b195b10e13","crc":"3fd16f7a"},"/flash/jpanel.jar":{"length":3142,"date":1358430294,"md5":"39825ccddf7b61c1ad41d261d84f4950","crc":"446bee7f"},"/flash/www/config/syslog.js":{"length":1929,"date":1496773328,"md5":"4e8ecca50284c2aeae8e8b90db27ded8","crc":"ac2a2541","sha":"e413d70cc2bb6717448bc84c2980abc764bc3dd6"},"/flash/www/config/peers.js":{"length":5885,"date":1505835290,"md5":"2536fc521f916341b98183f6ce0b2453","crc":"f2a44392","sha":"5d949b8daa8e5081f19c88e42af968b24955e02c"},"/flash/www/index.php":{"length":356,"date":1477657721,"md5":"3ba20cf61f44f9ace09104261acf2711","crc":"7f8eaed3"},"/flash/www/www.zip":{"length":85751,"date":1477
663620,"md5":"296baa71d70bf40c1ad6ee0c71066c49","crc":"69922bd1"},"/flash/www/download1.php":{"length":465,"date":1480616431,"md5":"1f69c84031dbdbe9aeecd634c0ab9607","sha":"9770a8f6534f17f86eeb332309b7cbe07441022e","crc":"c7b59619"},"/flash/www/short.php":{"length":273,"date":1481120537,"md5":"2fb318c42bd07c0ec34551502bc20c73","sha":"9b9831ca6abda2a14a922e058430fe114b8b34e0","crc":"fbca8ae2"},"/flash/ctrlc.jar":{"length":1510,"date":1482421756,"md5":"b7ce2da5b761674e626ae62c4b9edbcc","sha":"51a17a3f092333a0a48aa8e6dcebe0ce99cef3de","crc":"bd2a0810"},"/flash/www.zip":{"length":87642,"date":1510262835,"md5":"b2cb6f23c2fc91d8f5d79a6e62c5ee10","sha":"a24717087011d3b87b1135ad27556c2337598d98","crc":"df0db0a9"},"/flash/www/config/favicon.ico":{"length":766,"date":1486410493,"md5":"07cb90c7f3573eff80222269625ed1dd","sha":"284add71fe3d3ba48fba059b88ff5143d3964b1d","crc":"7e367afa"},"/flash/www/map.html":{"length":1170,"date":1485380108,"md5":"901c9971c3c591b3d736cd91516960de","sha":"5ded94156ca71884af1afae0fcaf1e78d3bac23d","crc":"71f8c837"},"/flash/jmanifest.jar":{"length":5651,"date":1485192866,"md5":"dfb84226c647a42295d9f671cfb99fa5","sha":"a7331cca377c1f96e400ddd5044c01a175ee230f","crc":"1a64c6d6"},"/flash/jping.jar":{"length":2174,"date":1485201152,"md5":"0d533008847888e0dfcf497c0cff1a96","sha":"75fbff5a973b8dac3408fdda46e47e708b585e58","crc":"f1203f43"},"/flash/jaccess.jar":{"length":4820,"date":1485805203,"md5":"29ce866873686dd133a724e4db29c690","sha":"239bf75c1597a25fdbbbb78798fe72971ca15f63","crc":"e5ae0d1c"},"/flash/somepath/path2/testx.php":{"length":5282,"date":1486397961,"md5":"ce1a071b258c936c65679d6bb67db198","sha":"30342828ebaeb69cd8ecefd75f2dd01e80c6388b","crc":"ecd9251a"},"/flash/bruce_dev.cer":{"length":902,"date":1487172768,"md5":"e9917f27384ddee36817c04c8cde9199","sha":"4b2b82a042a0019679c1b071956278f6ddd1f27b","crc":"115ed2ae"},"/flash/www/config/registrydoc.css":{"length":21460,"date":1504201641,"md5":"15423ca727b03e6b1581910c6ca2eab5","sha":"f521b53a
4518e7490768d2a8ae0e707c1dfb943b","crc":"0d5fd8c9"},"/flash/www/config/registrydoc.html":{"length":169108,"date":1509114127,"md5":"e30fe001dfab22c848d1d440f79c96db","sha":"a6edec26f5cbfd249e9f1b3947e92a4766d3bacb","crc":"25c25ccc"},"/flash/www/panel/comm.js":{"length":4715,"date":1498074333,"md5":"44aa80868230fbfeee0a3c48c390896d","sha":"37b479f65e7e8221d6fd9349439a8193cc645ba7","crc":"0d5e92bd"},"/flash/www/panel/index.php":{"length":2648,"date":1501526934,"md5":"923ce6739971521191f9000662f38323","sha":"a35d1d5f24da487be376595b46598e162e0f5310","crc":"ffd86d7b"},"/flash/www/panel/panel.js":{"length":993,"date":1501527049,"md5":"9d9a2cbb435ffe8af5bd9d8c0598dccd","sha":"2ef881dc8d90b4b0fb80a59d717c7125ca23fb04","crc":"4fcd0f37"},"/flash/www/panel/panel.css":{"length":2586,"date":1501527291,"md5":"2a3a66d14d7bc6d4b01dfbd745205c7d","sha":"886770297a07a594b88430d5db4ae9e23738d118","crc":"2dd8a81d"},"/flash/www/graphr.zip":{"length":556637,"date":1506536442,"md5":"891b1dfa8d774b85aefcbd8791abe11f","sha":"e5d204333658bd5c2f7c5b5ff682911124a10766","crc":"62d153fb"},"/flash/public/dcp.zip":{"length":181914,"date":1504795829,"md5":"655e8587293f35f11c5c24fc38201d2f","sha":"5fcfd8e38826e648f98f8d50f3613deb0d6312b6","crc":"da99b7d0"},"/flash/test.txt":{"length":304,"date":1495131459,"md5":"fc9f1f5e67928ccb9be3aeaa66cd9e52","sha":"6100d999f484f98ab476408c801dd000e579a62c","crc":"765047c5"},"/flash/dmx.jar":{"length":4476,"date":1500567859,"md5":"3fd35bbe6bbf53a32aecf273275d1839","sha":"4f702a87adb060294b553e6bd212672727d5d25f","crc":"e81db9aa"},"/flash/juptime.jar":{"length":3201,"date":1506713589,"md5":"d4c2482fae18482727c1b2afabcf94b4","sha":"86268b720b99760a4ebdb803db53f3f7fd18fd18","crc":"44b0878c"},"/flash/jscan.jar":{"length":2189,"date":1507141493,"md5":"a0a42e17f003cedcac9c8e662ada6b36","sha":"f1cafb56fdae33b66fff9b20cd2ff2705d96da9e","crc":"60f00fe2"},"/access.log":{"length":177,"date":1510081848,"md5":"914113dd52c4e74d2675eb1094ba10c6","sha":"0212252f4f04ab136ce74ab042
5cd7fce26b7c47","crc":"e9a7f8d8"},"/auxio.log":{"length":1589,"date":1511288557,"md5":"a52713575d5c449ff8e8cdbeb7e10ba6","sha":"22106e83ff429cc08fe16f21dc32623850f5673c","crc":"a29ad191"},"/jniorio.log":{"length":3332,"date":1511289076,"md5":"d3c685fde34b343f2ba53dd60e4bf11d","sha":"dd001970b69d61ab619745853addaf2910aabb31","crc":"1bbc78de"},"/flash/hmi.jar":{"length":8329,"date":1511283865,"md5":"1a1b247ccb5e3eb9623d12578c1ba833","sha":"7a1f5868817e8a3e60fe8fb2c4d9ed168e53d141","crc":"fb2a0367"},"/flash/ckeypad.jar":{"length":11194,"date":1512145569,"md5":"71288ea4ffa40e936dbecfd010fff785","sha":"23f944b627705716697ece761c6c95f8c1f873bb","crc":"3d9fc092"}}

bruce_dev />

So beyond identification header information such as model, serial number, JANOS version and timestamp, the last entry, files, appears to be another JSON Object. That seems to be the database content.

In manifest.json the files element is another JSON Object. So let’s load that and enumerate the keys it contains.

    public static void main(String[] args) throws Exception {
        
        // Obtain Json Object from a file
        Json j = new Json(new File("manifest.json"));
        Json jdb = (Json) j.get("files");
        
        // List top level entries
        Enumeration e = jdb.keys();
        while (e.hasMoreElements()) {
            String name = (String) e.nextElement();
            System.out.println(name);
        }
    }

bruce_dev /> jtest
/etc/janosclasses.jar
/flash/serialcontrol.jar
/flash/serialethernet.jar
/flash/modbusserver.jar
/flash/snmp.jar
/flash/www/base64.js
/flash/www/configure/index.html
/flash/www/jr310applet.jar
/flash/www/jniorprotocol.jar
/flash/www/jniorprotocolhelpers.jar
/flash/www/task/index.html
/flash/www/taskmanagerinterface.jar
/flash/www/jquery/jquery-1.9.0.min.js
/flash/www/jquery/jquery-ui.css
/flash/www/jquery/jquery-ui.min.js
/flash/www/vendor/bowser/bowser_0.7.2.min.js
/flash/www/text
/flash/www/cycle
/flash/slaveservice.jar
/flash/ftp.jar
/flash/task.ini
/flash/task.jar
/flash/jnior.ini
/jniorsys.log
/jniorboot.log.bak
/jniorboot.log
/flash/benchmark.jar
/flash/threadtest.jar
/flash/test4to20.jar
/flash/dirs.bat
/flash/www/console/index.php
/flash/clean.bat
/flash/jnior1024.key
/flash/www/config/md5.js
/flash/www/config/node.png
/flash/www/config/tabs-styles.css
/flash/www/config/tabs.js
/flash/www/config/styles.css
/flash/www/config/comm.js
/flash/www/config/console.js
/flash/www/config/config.js
/flash/www/config/index.php
/flash/www/jnior.ico
/flash/www/favicon.ico
/flash/analogpresets.jar
/flash/www/config/config.css.php
/flash/www/config/inputs.png
/flash/www/config/loading.gif
/flash/jtest.jar
/flash/www/vendor/angular_1.3.15/angular.min.js
/flash/www/vendor/bootstrap_3.3.0/css/bootstrap-theme.css
/flash/www/vendor/bootstrap_3.3.0/css/bootstrap-theme.css.map
/flash/www/vendor/bootstrap_3.3.0/css/bootstrap-theme.min.css
/flash/www/vendor/bootstrap_3.3.0/css/bootstrap.css
/flash/www/vendor/bootstrap_3.3.0/css/bootstrap.css.map
/flash/www/vendor/bootstrap_3.3.0/css/bootstrap.min.css
/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.eot
/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.svg
/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.ttf
/flash/www/vendor/bootstrap_3.3.0/fonts/glyphicons-halflings-regular.woff
/flash/www/vendor/bootstrap_3.3.0/js/bootstrap.min.js
/flash/www/vendor/bootstrap_3.3.0/js/bootstrap.js
/flash/www/vendor/bootstrap_3.3.0/js/npm.js
/flash/www/vendor/jquery_1.11.1/jquery-1.11.1.min.map
/flash/www/vendor/jquery_1.11.1/jquery-1.11.1.min.js
/flash/www/config/integlogo.png
/flash/utility.jar
/flash/generators/json/colour.js
/flash/generators/json/ethernet.js
/flash/generators/json/inputs.js
/flash/generators/json/lists.js
/flash/generators/json/logic.js
/flash/generators/json/loops.js
/flash/generators/json/math.js
/flash/generators/json/other.js
/flash/generators/json/outputs.js
/flash/generators/json/procedures.js
/flash/generators/json/text.js
/flash/generators/json/timing.js
/flash/generators/json/variables.js
/flash/generators/json.js
/flash/www/config/link_to.png
/flash/www/config/collapsed.png
/flash/www/config/linked.png
/flash/www/config/expanded.png
/flash/www/config/registry.js
/flash/www/config/deletex.png
/flash/www/config/modules.js
/flash/www/logging.php
/flash/www/slaving.zip
/flash/0-10vtest.jar
/flash/4routtest.jar
/flash/environ.jar
/flash/current.key
/flash/serialtest.jar
/flash/intellij.jar
/flash/jmodule.jar
/flash/udptest.jar
/flash/buffer.jar
/flash/display.jar
/flash/rz.jar
/flash/www/config/folder.png
/flash/www/config/file.png
/flash/www/config/warning.png
/flash/www/config/folders.js
/flash/clktest.jar
/flash/timesearch.jar
/flash/janosruntime_1.5.1.jar
/flash/www/config/relays.js
/flash/www/config/temperature.js
/flash/www/config/dimmer.js
/flash/www/config/range.css
/flash/www/config/analog.js
/flash/www/config/ledon.png
/flash/www/config/panel.js
/flash/www/config/ledoff.png
/flash/jpanel.jar
/flash/www/config/syslog.js
/flash/www/config/peers.js
/flash/www/index.php
/flash/www/www.zip
/flash/www/download1.php
/flash/www/short.php
/flash/ctrlc.jar
/flash/www.zip
/flash/www/config/favicon.ico
/flash/www/map.html
/flash/jmanifest.jar
/flash/jping.jar
/flash/jaccess.jar
/flash/somepath/path2/testx.php
/flash/bruce_dev.cer
/flash/www/config/registrydoc.css
/flash/www/config/registrydoc.html
/flash/www/panel/comm.js
/flash/www/panel/index.php
/flash/www/panel/panel.js
/flash/www/panel/panel.css
/flash/www/graphr.zip
/flash/public/dcp.zip
/flash/test.txt
/flash/dmx.jar
/flash/juptime.jar
/flash/jscan.jar
/access.log
/auxio.log
/jniorio.log
/flash/hmi.jar
/flash/ckeypad.jar

bruce_dev />

So here we see that there is basically an entry for every file on the JNIOR. The files are defined by their absolute path from the root.

Let’s enumerate the content of the first file entry. I’ll just break in the loop to do one.

    public static void main(String[] args) throws Exception {
        
        // Obtain Json Object from a file
        Json j = new Json(new File("manifest.json"));
        Json jdb = (Json) j.get("files");
        
        // Show the first top level entry
        Enumeration e = jdb.keys();
        while (e.hasMoreElements()) {
            String name = (String) e.nextElement();
            System.out.printf("%s = %s\n", name, jdb.get(name).toString());
            break;
        }
    }

bruce_dev /> jtest
/etc/janosclasses.jar = {"length":243492,"date":1512145376,"md5":"ece8d0ebf9b6882e488d1f7c9e764ce0","crc":"bde463c8","sha":"373b88f011b49f65eafd8e293fc185cc2563892e"}

bruce_dev />

So the file entry is yet another JSON Object. It looks to contain file information.

Okay. Let’s see what it has to say about the first file.

    public static void main(String[] args) throws Exception {
        
        // Obtain Json Object from a file
        Json j = new Json(new File("manifest.json"));
        Json jdb = (Json) j.get("files");
        
        // Get first entry whatever it is and enumerate it
        Enumeration e = jdb.keys();
        while (e.hasMoreElements()) {
            String name = (String) e.nextElement();
            System.out.println(name);
 
            Json jfile = (Json) jdb.get(name);
            Enumeration efile = jfile.keys();
            while (efile.hasMoreElements()) {
                String name2 = (String) efile.nextElement();
                System.out.printf("   %s = %s\n", name2, jfile.get(name2));
            }            
            
            break;
        }
    }

We see the file size in bytes, some kind of encoded date, and three different checksums or digests (MD5, CRC, and SHA). The CRC is CRC32 by the way and SHA is SHA1.

The date is in Unix format (seconds since January 1, 1970), GMT timezone.
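
To make that concrete, here is a minimal sketch that turns a manifest date into readable text. It uses the standard java.util.Date and SimpleDateFormat classes, and it assumes they behave on your target as they do in desktop Java. The class name ManifestDate is made up for the example.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class ManifestDate {

    // Converts a manifest "date" value (seconds since Jan 1, 1970 GMT) to text
    public static String format(long unixSeconds) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        // Date works in milliseconds, so scale the seconds up
        return fmt.format(new Date(unixSeconds * 1000L)) + " GMT";
    }

    public static void main(String[] args) {
        // "date" value from the /etc/janosclasses.jar entry shown earlier
        System.out.println(format(1512145376L));   // prints 2017-12-01 16:22:56 GMT
    }
}
```

That places the database entry a few days before the MANIFEST run shown below.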

So that pretty much is the structure of the MANIFEST database. That is a good example of using the JSON format for database storage.

By the way, the JanosClasses.jar file was just changed. The database has an earlier reference. So the date and checksum don’t match. This is how MANIFEST knows to indicate that the file has been “Modified”.

bruce_dev /> manifest etc
JNIOR Manifest      Tue Dec 05 15:26:12 EST 2017
  Size                  MD5                  File Specification
 265756   f26d0f63dd5cc055dad699f117bb4f17  [Modified] /etc/JanosClasses.jar
End of Manifest (1 files listed)

bruce_dev />

We could update our database with a MANIFEST -U command and examine it again. If we did you would see that the MD5 then matches.
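
The comparison behind a “Modified” flag can be sketched roughly as follows. This is not the MANIFEST command's actual code, just an illustration under the assumption that a file counts as changed when its size or MD5 no longer matches the database entry. ManifestCheck, md5Hex and isModified are hypothetical names, and the sample entry is invented.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ManifestCheck {

    // MD5 of the data as a 32 character lowercase hexadecimal string
    public static String md5Hex(byte[] data) {
        try {
            byte[] hash = MessageDigest.getInstance("MD5").digest(data);
            return String.format("%032x", new BigInteger(1, hash));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Assumption: a file is "modified" when its size or MD5 differs from the database
    public static boolean isModified(byte[] content, long dbLength, String dbMd5) {
        return content.length != dbLength || !md5Hex(content).equalsIgnoreCase(dbMd5);
    }

    public static void main(String[] args) {
        String dbMd5 = "900150983cd24fb0d6963f7d28e17f72";   // MD5 of "abc"
        byte[] same = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] changed = "abd".getBytes(StandardCharsets.US_ASCII);
        System.out.println(isModified(same, 3, dbMd5));      // unchanged content
        System.out.println(isModified(changed, 3, dbMd5));   // same size, different digest
    }
}
```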

 

We’ve been experimenting with ride-thru power supplies. Basically the JNIOR continues to run after power is removed even though the unit is not battery powered. The hold time is relatively short and only a matter of seconds depending on the load on the supply at the time. But it is sufficient to weather short power outages and glitches that would otherwise cause the JNIOR to reboot. This ride-thru supply design makes sense for controller applications. It may save you the cost of a UPS if dirty power appears to be your issue. It will debut for INTEG in our 412DMX JNIOR.
One advantage of this is that we know when power has been removed or lost. We also know when it has been restored. That means we can now tell if the JNIOR has been left powered off and then booted up by looking at the log. With products without this technology you cannot tell from the log if there was a spontaneous reboot or what may have occurred.
This 412DMX prototype (the first) had been off for a couple of days and, well, it can’t hide that fact.

We see now that the unit was powered down for roughly 74 hours. When NTP re-synchronized the clock it had gotten 2.87 seconds fast. That is 2.87 seconds over about 266,400 seconds, an error of roughly 0.001% (about 11 parts per million). It goes to the importance of clock synchronization through NTP. If your JNIOR does not have access to the Internet and the host of NTP servers out there, maybe there is one on your internal network. If not perhaps there is another approach you can use to keep accurate time.
It is surprising that in this day and age clocks in our computers are not much more precise. Some devices now get accurate time through the GPS system. Others over the cellular networks. Neither are generally available to a JNIOR. Don’t ask me about the RTC in the Renesas RX63N. I’ll go off on a rant!
 
So here is what you would see for a brief power outage. In this case the JNIOR never skipped a beat. There was no reboot and the DCP never had to reconnect.

Here power was out for some 7 to 8 seconds (I yanked the plug). Subsequently the ride-thru supply recharged itself so it would be ready to do it all over again.
Okay, not so impressive given that we are spoiled by all of our battery devices like phones, tablets and laptops. But there isn’t the cost of the battery nor the risk of fire. And, given that the JNIOR controller is generally powered 24/7 that extra hardware isn’t justified. But it does piss you off when the power company decides to reboot all of your unprotected equipment when they reconfigure the Grid. This handles that.
 
By the way, this R00 prototype has a 5F capacitor in which it stores the holding energy. The R01 prototype has 10F and that is likely what we will be shipping. You can see here using the STATS command that the average hold time for the 5F unit is about 12.5 seconds (no real load). The 10F unit pushes 20 seconds.

Applications receive command line arguments just as any standard Java program would. You can use these to establish new default configuration settings. For example we are developing a UI for our Cinema application. The UI application (called “Cinekey” right now) when run makes a TCP/IP connection to a host running the Cinema application. By default localhost is used and the default port is 9610.

The command line syntax is:

cinekey [Ip_address [port]]

You can even include these parameters in a Run key. If the optional host Ip_address is specified it will be recorded as the new default. If you specify the host you can also optionally specify the port. That will also be saved as the default. That means that if you do that once then you can just use the program name from that point forward to run it and it will use the last specified socket default.

Here is the code for that. You can see how it can be adapted perhaps to your configuration needs.

    public static void main(String[] args) throws Throwable {
        
        String host = "";
        int port;
        
        // obtain port
        if (args.length > 1) {
            port = Integer.parseInt(args[1]);
            JANOS.setRegistryString("Cinekey/Port", Integer.toString(port));
        }
        else
            port = JANOS.getRegistryInt("Cinekey/Port", 9610);
        
        // Obtain host
        if (args.length > 0) {
            host = args[0];
            JANOS.setRegistryString("Cinekey/Host", host);
        }
        else
            host = JANOS.getRegistryString("Cinekey/Host", "localhost");
        
        // Establish connection
        conn = new CineConnect(host, port);
        new Thread(conn).start();
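
The same pattern can be tried off the JNIOR with a plain Map standing in for the Registry. The following is only a sketch: ArgDefaults, resolve and parsePort are hypothetical names, and the range check on the port is an addition that the code above does not perform. Values are remembered only after they have been accepted.

```java
import java.util.HashMap;
import java.util.Map;

public class ArgDefaults {

    // Stand-in for the Registry; on the JNIOR these would be Registry keys
    public static final Map<String, String> settings = new HashMap<String, String>();

    // Returns the argument at index if supplied, else the saved value, else the built-in default
    public static String resolve(String[] args, int index, String key, String builtIn) {
        if (args.length > index)
            return args[index];
        String saved = settings.get(key);
        return saved != null ? saved : builtIn;
    }

    // Parses a TCP port and rejects anything outside 1 to 65535
    public static int parsePort(String text) {
        int port = Integer.parseInt(text.trim());
        if (port < 1 || port > 65535)
            throw new IllegalArgumentException("invalid port: " + text);
        return port;
    }

    public static void main(String[] args) {
        String host = resolve(args, 0, "Cinekey/Host", "localhost");
        int port = parsePort(resolve(args, 1, "Cinekey/Port", "9610"));

        // only remember values once they have been accepted
        settings.put("Cinekey/Host", host);
        settings.put("Cinekey/Port", Integer.toString(port));
        System.out.println(host + ":" + port);
    }
}
```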

While JANOS strives to create a secure environment we generally fall short in that arena when it comes to applications. An application can listen for connections and process its own custom protocol. That is not so complicated to do but it is another big step to ensure some level of security. We hardly ever get that done.

If the custom protocol first requires a username and password, you can use the method that JANOS provides User.validate() to validate the login. The issue here is that the username and password are transferred in the clear unless the protocol requires a secure SSL/TLS connection.

When an application implements the client side of a connection there is the User.digestMD5() method that can be used in combination with a NONCE to transfer credentials securely. Unfortunately we don’t have a method available to validate digest encoded credentials on the server side.

By the way, I think it is better to transfer the username and password in the clear than to not implement authentication at all. Note also that the vast majority of our applications run on a physically secure network.

Still we can certainly ramp this up a level.

Also, applications run with Administrator privileges and merely authenticate the supplied username and password. The supplied account then does not limit the application. So you can define a guest level account solely for authenticating access. If those credentials are then compromised perhaps by being transferred in the clear they are really not a security issue otherwise. In other words, you shouldn’t use an administrator account to log into these application protocols. In addition, an application protocol shouldn’t implement capabilities that compromise the security of the JNIOR by allowing configuration changes.

We’ve been using a “nonce” string to encode credentials for transfer over clear text channels. The approach was first employed as an option in the JNIOR Protocol. It works like this.

Nonce String

The “nonce string” is any string of random (usually printable) characters. It is generated by the server and supplied to the client either upon request or as part of an announcement on connection. The nonce can only be used once to authenticate a set of credentials. It should only be valid for a brief period of time, usually 1 or 2 minutes.
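
A server might hand out and retire nonces along these lines. This is a sketch only. NonceStore, issue and consume are hypothetical names, the 32 character length and 2 minute lifetime are example choices, and the alphabet deliberately leaves out the colon since that character separates fields later on.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

public class NonceStore {

    private static final String CHARS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final long LIFETIME_MS = 2 * 60 * 1000L;   // valid for 2 minutes

    private final SecureRandom random = new SecureRandom();
    private final Map<String, Long> issued = new HashMap<String, Long>();   // nonce -> expiry

    // Creates a new random nonce and remembers when it expires
    public synchronized String issue(long now) {
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < 32; n++)
            sb.append(CHARS.charAt(random.nextInt(CHARS.length())));
        String nonce = sb.toString();
        issued.put(nonce, now + LIFETIME_MS);
        return nonce;
    }

    // Returns true only the first time an unexpired nonce is presented
    public synchronized boolean consume(String nonce, long now) {
        Long expires = issued.remove(nonce);
        return expires != null && now <= expires;
    }

    public static void main(String[] args) {
        NonceStore store = new NonceStore();
        long now = System.currentTimeMillis();
        String nonce = store.issue(now);
        System.out.println(store.consume(nonce, now));   // first use
        System.out.println(store.consume(nonce, now));   // second use of the same nonce
    }
}
```

Removing the nonce on its first presentation is what makes it single use. A real implementation would also want to purge nonces that expire without ever being presented.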

The Hash

The client uses an MD5 message digest function to obtain a hash from a combination of username, password and nonce. Our procedure combines the username followed by the nonce followed by the password each separated by a colon ‘:’. Therefore:

hash = MD5( username + ":" + nonce + ":" + password )

The hash produced here is a 16 byte binary array. It is converted to a 32 character hexadecimal (case-insensitive) representation before it is used.

Encoded Credentials

The credentials are then supplied with the username in plain text as follows:

encoded_credentials = username + ":" + hash_hexadecimal

The encoded credentials string is supplied to the server for authentication. The server takes the username from the string and looks up the password for the account. It then uses the nonce it supplied to calculate the digest as defined above. If the calculated digest matches that sent by the client the login is valid.
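
Both ends of that exchange can be sketched in standard Java as follows. It assumes the server can look up the clear text password and that strings are hashed as UTF-8 bytes; neither is stated above. DigestLogin, encode and verify are hypothetical names and the credentials are invented.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestLogin {

    // MD5 of the text as a 32 character lowercase hexadecimal string
    public static String md5Hex(String text) {
        try {
            byte[] hash = MessageDigest.getInstance("MD5")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash)
                sb.append(String.format("%02x", b & 0xff));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Client side: username + ":" + hex of MD5(username:nonce:password)
    public static String encode(String username, String password, String nonce) {
        return username + ":" + md5Hex(username + ":" + nonce + ":" + password);
    }

    // Server side: recompute the digest from the stored password and compare
    public static boolean verify(String creds, String nonce, String storedPassword) {
        int sep = creds.indexOf(':');
        if (sep < 0)
            return false;
        String username = creds.substring(0, sep);
        String digest = creds.substring(sep + 1);
        String expected = md5Hex(username + ":" + nonce + ":" + storedPassword);
        return expected.equalsIgnoreCase(digest);   // hex is case-insensitive
    }

    public static void main(String[] args) {
        String nonce = "k3J9x0Qw7Lm2Zp5T";                 // would come from the server
        String creds = encode("tech", "secret", nonce);    // made-up account
        System.out.println(verify(creds, nonce, "secret"));   // matching password
        System.out.println(verify(creds, nonce, "other"));    // wrong password
    }
}
```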

Since it is practically impossible to reverse the hash to determine the password for the account this limits risk when transmitted in the clear. It does not matter if the attacker knows the nonce. It is imperative that the nonce be single use and if possible only valid for the one socket connection. This is to prevent a replay attack where the encoded credentials are repeated by the attacker to gain access.

An issue with the above is that applications do not have access to user account passwords in clear text in order to calculate the digest. JANOS needs to provide some assistance here. A validation method is needed.

So it can be done. The following takes encoded credentials as if they were from a client to which we had supplied the random nonce. We then process the authentication without access to the password for the user.

package jtest;
 
import com.integpg.system.ArrayUtils;
import com.integpg.system.User;
import java.util.StringTokenizer;
 
public class Main {
    
    public static void main(String[] args) throws Exception {
        
        // supplied encoded credentials and original nonce
        String creds = "jnior:4f163e3fdaee54babdc0a8aaad7df1c1";
        String nonce = "jhfjh23k4k3489ysf989(*(98a98a9835h2k3";
        
        // parse credentials
        StringTokenizer tokenizer = new StringTokenizer(creds, ":");
        String username = tokenizer.nextToken();
        String digest = tokenizer.nextToken();
        
        // obtain binary digest
        byte[] hash = new byte[16];
        for (int n = 0; n < 16; n++)
            hash[n] = (byte)Integer.parseInt(digest.substring(2*n, 2*(n+1)), 16);
        
        // obtain digest using digestMD5()
        int userid = User.getUserID(username);
        byte[] hash2 = User.digestMD5(userid, username + ":" + nonce + ":", "");
 
        // compare hashes
        if (ArrayUtils.arrayComp(hash, 0, hash2, 0, hash.length))
            System.out.println("Login successful!");
        
    }
}

bruce_dev /> jtest
Login successful!

bruce_dev />

I am sure there are other ways to parse the credentials and to convert the hexadecimal string to a byte[]. JANOS does not implement the String.split() method. You can use Regex.

As an alternative to processing the hexadecimal string you could convert hash2 into the hex string and compare. I am not sure which is faster.
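
That alternative might look like the sketch below, which turns the binary digest into hexadecimal with a lookup table and compares strings. HexCompare, toHex and matches are hypothetical names and the four byte sample is only for show. I have not timed it against the other approach.

```java
public class HexCompare {

    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Converts a binary digest into its lowercase hexadecimal representation
    public static String toHex(byte[] data) {
        char[] out = new char[data.length * 2];
        for (int n = 0; n < data.length; n++) {
            out[2 * n] = DIGITS[(data[n] >> 4) & 0x0f];   // high nibble
            out[2 * n + 1] = DIGITS[data[n] & 0x0f];      // low nibble
        }
        return new String(out);
    }

    // Compares a computed digest against the hexadecimal digest from the client
    public static boolean matches(byte[] computed, String digestHex) {
        return toHex(computed).equalsIgnoreCase(digestHex);
    }

    public static void main(String[] args) {
        byte[] sample = { 0x4f, 0x16, (byte) 0xda, (byte) 0xee };
        System.out.println(toHex(sample));                 // prints 4f16daee
        System.out.println(matches(sample, "4F16DAEE"));   // prints true
    }
}
```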

Here’s the split done using Regex.

        // parse credentials
        String[] parts = Pattern.compile(":").split(creds);
        String username = parts[0];
        String digest = parts[1];

I had thought that I had implemented the String.split() method but no. I am probably thinking of the split() function in the PHP scripting.

And here’s another way to convert the hexadecimal digest string into the byte array.

        // obtain binary digest
        byte[] hash = new byte[16];
        for (int n = 0; n < 32; n++)
            hash[n/2] = (byte)(16 * hash[n/2] + Character.digit(digest.charAt(n), 16));

JANOS implements a file permission scheme modeled after Unix file permissions. Those familiar with Linux will recognize the permissions in JANOS file listings.

bruce_dev /> ls -v
total 12
drwxrwxrwx   1 root      root          10 Nov 28 10:23 .
drwxrwxrwx   1 root      root          10 Nov 28 10:23 ..
dr-xr-xr-x   1 root      root           1 Dec 31 1999  etc
drwxr-xr-x   1 root      root          49 Nov 28 10:16 flash
drwxrwxrwx   1 root      root           0 Dec 31 1999  temp
-rw-r--r--   1 root      root       39023 Nov 28 14:50 jniorsys.log
-rw-r--r--   1 root      root        1011 Nov 28 10:23 jniorboot.log
-rw-r--r--   1 root      root        1082 Nov 28 10:16 jniorboot.log.bak
-rw-r--r--   1 jnior     root       20585 Nov 22 11:52 manifest.json
-rw-r--r--   1 jnior     root        3332 Nov 21 13:31 jniorio.log
-rw-r--r--   1 jnior     root        1589 Nov 21 13:22 auxio.log
-rw-r--r--   1 root      root         177 Nov 07 14:10 access.log
  1891.7 KB available

bruce_dev />

There are 3 groups of ‘rwx’ permissions. The first is for the file owner. The second for the group associated with the file. And, the third is for everyone else. This implies some kind of User Groups. Note that on the Series 3 there are no User Groups and so file permissions were somewhat shortened.
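
To illustrate how the permission field breaks down, here is a small sketch that picks the three groups out of a listing field. It only parses the text as displayed and says nothing about how JANOS stores or enforces permissions. Permissions, triplet and canWrite are hypothetical names.

```java
public class Permissions {

    // Returns the 'rwx' group for "owner", "group" or anything else (everyone else)
    // from a listing field such as "drwxr-xr-x"; the first character is the entry type
    public static String triplet(String field, String who) {
        int start;
        if (who.equals("owner"))
            start = 1;
        else if (who.equals("group"))
            start = 4;
        else
            start = 7;
        return field.substring(start, start + 3);
    }

    // True when the given class of user may write to the entry
    public static boolean canWrite(String field, String who) {
        return triplet(field, who).charAt(1) == 'w';
    }

    // True when the entry is a folder
    public static boolean isDirectory(String field) {
        return field.charAt(0) == 'd';
    }

    public static void main(String[] args) {
        String field = "drwxr-xr-x";   // the flash folder in the listing above
        System.out.println(triplet(field, "owner"));    // prints rwx
        System.out.println(triplet(field, "group"));    // prints r-x
        System.out.println(canWrite(field, "other"));   // prints false
    }
}
```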

JANOS allows you to define a User Group using the GROUPADD command. There is a root group by default to which no one belongs.

The GROUPS command lists the defined user groups and any users associated with each.

bruce_dev /> groups
 root        0    
 techadmin   2    
 techs       1    tech      

bruce_dev />

Back during JANOS development Kevin brought it to my attention how the TAB key was being used on the command line in other systems. Basically it served as an auto-complete function.

The TAB has been implemented in JANOS with some twists. Once you work with a Series 4 at the command line you just can’t handle a Series 3 where you have to type every character.

Where a file path or file name is expected the TAB will cycle through all of the valid names. For example, in the following video I will type CAT and space and then hit TAB a few times slowly. When the desired file name appears I can hit ENTER. Let’s see jniorboot.log without typing jniorboot.log.

In the above post Bruce showed us how to use the TAB as an autocomplete for the commands that are available from the command line.

My favorite feature of the TAB autocomplete is filling out file names. In this quick video you will see that there are two file names that begin with ‘jn’. You will see that I start typing the filename for ‘jniorsys.log’. I use the TAB key to cycle through the file names that start with ‘jn’.

Another great place that the TAB works is when working with the registry from the command line. “Who does that?” you ask. I do. Yes the registry tab in the DCP is wonderful but for some, yours truly, the command line is faster. Especially when the TAB is utilized!

In this video I want to change the hostname. Yes, there is a hostname command but I want to show how to use the registry from the command line.

You will see that I use ‘reg’ which is the alias for registry. I type ‘i’ then TAB to look through registry keys that start with ‘i’. I select the ‘IpConfig’ folder. Now I use TAB to cycle through the available registry keys. Once I find hostname I type ‘ =’ and TAB again to see the value.

Use of the up arrow will enter the previous command and then I edit the key value using backspace and enter my change. Now the hostname is ‘kev-dev’.

Take a look

In general the TAB performs context specific auto-complete.

By using TAB repeatedly each valid completion is displayed. If you find a form of the entry that is appropriate you simply continue to build the command line and hit ENTER to execute. Generally TAB offers matching file and folder names from the current working folder or other folder if specified by preceding content on the command line.

You may begin to type an entry and then use TAB. Only those completions which incorporate the starting characters are shown. So if you wish to filter the possibilities you can enter the first or first couple of characters. Similarly you can enter a path to a folder and completions will be content from that folder.

A TAB used within the first word on a line will auto-complete valid commands and lines from those previously entered. Recently entered command lines are preserved in a history (See HISTORY command) which you can normally access using UP-ARROW and DN-ARROW. The TAB auto-complete will include your history. So if you want to execute a MANIFEST command with the same options that you had previously run, you could hit ‘m’ followed by TAB and that complete command line will be one of the completions offered. Completions for the first word on a line will also include normal file entries which may be useful if you want to execute a program.

If the REGISTRY or REG command is being built, TAB offers completion options from the set of matching Registry Keys instead of folder content. In this case too the TAB can be used immediately after the ‘=’ to complete the balance of the REGISTRY command with the current content of the specified key.

With some experience you learn to use TAB efficiently and rarely need to enter an entire filename or Registry Key.

For example once you have copied a UPD file to the /temp folder you can generally execute the JRUPDATE command very quickly with the following keystrokes. This assumes that the UPD file is the sole occupant of that temporary folder.

jru[TAB][SP]-fup[SP]t[TAB]/[TAB][ENTER]

Note that TAB presents you with optional auto-completion text alphabetically.
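
The cycling behavior described above can be modeled in a few lines. This is only a sketch of the idea and not how JANOS implements it. TabCompleter and tab are hypothetical names, and matching without regard to case is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TabCompleter {

    private final List<String> matches = new ArrayList<String>();
    private int next = 0;

    // Collects the candidates that start with the typed prefix, sorted alphabetically
    public TabCompleter(List<String> candidates, String prefix) {
        for (String c : candidates)
            if (c.toLowerCase().startsWith(prefix.toLowerCase()))
                matches.add(c);
        Collections.sort(matches, String.CASE_INSENSITIVE_ORDER);
    }

    // Models one press of TAB: returns the next completion, wrapping around at the end
    public String tab() {
        if (matches.isEmpty())
            return null;
        String result = matches.get(next);
        next = (next + 1) % matches.size();
        return result;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("jniorsys.log", "access.log",
                "jniorboot.log", "jniorio.log", "manifest.json");
        TabCompleter completer = new TabCompleter(files, "jn");
        System.out.println(completer.tab());   // prints jniorboot.log
        System.out.println(completer.tab());   // prints jniorio.log
        System.out.println(completer.tab());   // prints jniorsys.log
    }
}
```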

JANOS remembers the last 16 unique commands entered during a single session. This allows you to use the UP_ARROW and DOWN_ARROW keys to scroll through the recent commands. A command can easily be re-executed by scrolling back to it and hitting ENTER. A prior command can be first edited. That may be useful when wanting to add an option to its execution.

The HISTORY command displays the recorded commands. They are numbered but the HISTORY command does not give you the means to select from the list.

bruce_dev /> history
1: history
2: whoami
3: help passwd
4: passwd tech
5: passwd
6: users
7: useradd -cd tech
8: useradd
9: userdel
10: usermod
11: help usermod
12: help

bruce_dev />

Note that in addition to scrolling using the up and down keys you can use the TAB key to retrieve from the list.

TAB when used at the beginning of the command line offers valid commands in addition to file names. It includes those from the recent HISTORY. If you want to recall the ‘help usermod’ command given the above HISTORY you can simply type ‘h’ followed by TAB until you get the line that you are seeking. Note that TAB presents options alphabetically.
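
A history that keeps only the last 16 unique commands could be modeled like this. It is a sketch, not the JANOS implementation. CommandHistory is a hypothetical name, and moving a repeated command back to the front of the list is an assumption about what “unique” means here.

```java
import java.util.ArrayList;
import java.util.List;

public class CommandHistory {

    private static final int LIMIT = 16;
    private final List<String> entries = new ArrayList<String>();   // most recent first

    // Records a command; a repeat moves to the front rather than being stored twice
    public void add(String command) {
        entries.remove(command);                  // drop any earlier copy
        entries.add(0, command);
        if (entries.size() > LIMIT)
            entries.remove(entries.size() - 1);   // forget the oldest
    }

    // Returns a copy of the history, most recent command first
    public List<String> list() {
        return new ArrayList<String>(entries);
    }

    public static void main(String[] args) {
        CommandHistory history = new CommandHistory();
        history.add("help");
        history.add("users");
        history.add("help");
        List<String> list = history.list();
        for (int n = 0; n < list.size(); n++)
            System.out.println((n + 1) + ": " + list.get(n));
    }
}
```

Run as shown, this lists ‘help’ first and ‘users’ second, since the repeated command replaces its earlier copy.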