'cng2jpg' Decoder Notes

 
A description of a Tcl script
to convert 'cng' files to JPEG files.


The 'Complete National Geographic' (CNG)
box of DVD's

! Preliminary ! More notes and images may be added.


Introduction :

In 2011 Dec, while Christmas shopping, I ran across the 'Complete National Geographic' (called CNG for short). This is a set of 6 DVD's containing images of the pages of National Geographic Magazines published from 1888 to 2008.

The DVD's contain over 1,400 issues, 8,000 articles, and 200,000 photos.

The DVD's consist of images of the pages of the magazines. At about 160 pages per magazine (on average), this means there are on the order of 160 x 1400 = 224,000 page images on the DVD's. If you ignore the many advertising pages (about 40 or more per issue), there may be on the order of 150,000 actual article pages.

The 'reader' software in the collection --- software for reading the encrypted '.cng' files which contain images of the pages of the magazines --- is designed to run on Microsoft Windows or Mac OS X. A Linux version of the 'Adobe Air' reader software is not shipped on the DVD's, and according to reports (see the links at the bottom of this web page), 'Adobe Air' was not implemented well for Linux.

In fact, many comments on the web (see the links at the bottom of this web page), indicate the 'reader' was not implemented well on MS-Windows or Mac OS X either.

In doing some web searches on terms like 'cng file complete national geographic', I found that even people who had installed the software on MS-Windows or Mac OS X were often left high and dry --- not being able to read the magazine pages, even after some had spent $200 for the collection.

In doing some searching on the nature of the '.cng' files, I found the following quote.

    "The cng files are all jpegs, XOR'd bitwise with 239."


Devising a 'cng2jpg' converter :

Using the Gnome calculator, 'gcalctool', one can see that decimal 239 is hex 'EF' --- which is binary 11101111.
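The same check can be done in Tcl itself, without a calculator. This is only a small sketch; the variable names are illustrative and are not taken from 'cng2jpg.tcl'.

```tcl
# Show decimal 239 in hex and in binary.
set key 239

# 'format %02X' prints the value as two upper-case hex digits.
set hexText [format %02X $key]

# 'binary format c' packs the value into a single byte;
# 'binary scan ... B8' reads that byte back as 8 bit characters, high bit first.
binary scan [binary format c $key] B8 bitText

puts "decimal $key = hex $hexText = binary $bitText"
# prints: decimal 239 = hex EF = binary 11101111
```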

In looking at the top of a '.jpg' file, I saw that among the 'unreadable' binary bytes, in characters 7 through 10 of the first part of the typical JPEG file, is the 'human-readable' string 'JFIF'.

I brought up one of the '.cng' files, from one of the CNG DVD's, in the binary editor 'bless'. Sure enough, the eighth and tenth characters of the '.cng' file were identical hex codes --- corresponding to the location of the two 'F' characters of JFIF. But they were not the hex code for the letter 'F' (hex 46).

The 'bless' editor has a feature that lets you pick a hex byte and do a logical operation on the byte (like an XOR operation, with a code like hex 'EF' or binary 11101111). Sure enough, when I performed that operation on the 8th (or 10th) character of the '.cng' file, I got the character 'F'.
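The experiment in the hex editor can be reproduced in a few lines of Tcl. The sketch below is not part of 'cng2jpg.tcl'; the helper name 'xorBytes' is hypothetical. It XORs the four bytes of 'JFIF' with 239 to show what they look like inside a '.cng' file, then XORs them again to get the readable string back.

```tcl
# Hypothetical helper: XOR every byte of a binary string with a key (default 239).
proc xorBytes {data {key 239}} {
    # 'cu*' scans all bytes as unsigned 8-bit integers (needs Tcl 8.5 or later).
    set codes {}
    binary scan $data cu* codes
    set out {}
    foreach code $codes {
        lappend out [expr {$code ^ $key}]
    }
    # 'c*' packs the list of integer codes back into bytes.
    return [binary format c* $out]
}

# The plain marker, as found at bytes 7 through 10 of a typical JPEG file.
set scrambled [xorBytes "JFIF"]

# Show the scrambled bytes in hex: the 2nd and 4th bytes (the two 'F's) match.
binary scan $scrambled H* hexText
puts "scrambled: $hexText"
# prints: scrambled: a5a9a6a9

# XOR is its own inverse, so applying it a second time restores the marker.
puts "restored:  [xorBytes $scrambled]"
# prints: restored:  JFIF
```

Because XOR with a fixed key undoes itself, the same operation converts in either direction.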

I knew that the Tcl script language supported 'bit-wise' operations, so I decided I would try to make a script that converted a '.cng' file to a '.jpg' file.

There aren't many good examples, either in the various Tcl-Tk textbooks or on the Internet, of doing an XOR operation on byte codes (whether hex or binary or decimal), but I was able to devise a 'cng2jpg.tcl' script. (The source is linked from this page.)

After the script 'cng2jpg.tcl' opens the input and output files, the following statements do the main work.

   ## NOTE: We are reading one byte at a time here.
   set BYTEin [read $f1 1]

   ## In 'binary scan', 'c' indicates we want the new variable CHARin
   ## to hold the byte as a signed 8-bit integer code (-128 to 127).
   binary scan "$BYTEin" c CHARin

   ## TRANSLATION OF THE BYTE IS HERE -- XOR with hex 'EF' = decimal 239 for each byte.
   set XORout [expr { $CHARin ^ 239 }]

   ## Making sure the byte is in binary form, for writing out.
   set BYTEout [binary format c $XORout]

   ## We write out one byte at a time here.
   puts -nonewline $f2 $BYTEout

The speed of the conversion process might be improved by seeing if we can eliminate or combine some of these 5 statements. (In 2012, I may see if I can eliminate the 'binary scan' or the 'binary format' commands --- and perhaps eliminate the intermediate variables like 'CHARin', 'XORout', and 'BYTEout'. But I like the almost-self-documenting readability of the five statement approach.)
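One possible way to cut down the per-byte command calls is to read the whole file, scan all the bytes in one 'binary scan', and pack them again in one 'binary format'. The following is only a sketch under that assumption, not the script described on this page: the proc name 'cngToJpg' is hypothetical, and it has not been timed against the five-statement loop.

```tcl
# Hypothetical whole-file variant of the per-byte loop (names are illustrative).
proc cngToJpg {inPath outPath {key 239}} {
    # Read the entire input file as raw bytes.
    set fin [open $inPath r]
    fconfigure $fin -translation binary
    set data [read $fin]
    close $fin

    # Scan all bytes at once as unsigned 8-bit integers (Tcl 8.5 or later).
    set codes {}
    binary scan $data cu* codes

    # XOR each byte code with the key.
    set out {}
    foreach code $codes {
        lappend out [expr {$code ^ $key}]
    }

    # Pack the codes back into bytes and write them out in one call.
    set fout [open $outPath w]
    fconfigure $fout -translation binary
    puts -nonewline $fout [binary format c* $out]
    close $fout

    # Return the number of bytes converted.
    return [llength $out]
}
```

This holds the whole file in memory as a list of integers, which should be acceptable for page images of a few hundred Kilobytes but trades memory for fewer command calls.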

The comments in the script point out that, although we appear to be reading from and writing to the two disk files a byte at a time, the reading and writing really go quite fast, because Tcl buffers file channels by default.

The file 'reads' are buffered: the CNG file is read from disk in blocks, and each one-byte 'read' just fetches a byte from an in-memory input buffer.

Likewise, the file 'writes' are buffered: each one-byte 'puts' places a byte in an in-memory output buffer, and the actual writes to the resulting JPEG file are done in blocks.
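The buffering can also be made explicit when the files are opened. The sketch below is an illustration only: the proc name 'openBinary' and the 65536-byte buffer size are assumptions, not settings taken from 'cng2jpg.tcl', and whether a larger buffer actually speeds up the conversion has not been measured here.

```tcl
# Hypothetical helper: open a file for raw byte access with explicit buffering.
proc openBinary {path mode {bufSize 65536}} {
    set chan [open $path $mode]
    # '-translation binary' turns off end-of-line and encoding translation,
    # which matters because every byte of the image must pass through unchanged.
    # '-buffering full' collects output until the buffer fills (or the channel
    # is flushed or closed); '-buffersize' sets the buffer size in bytes.
    fconfigure $chan -translation binary -buffering full -buffersize $bufSize
    return $chan
}

# Example (paths are placeholders):
#   set f1 [openBinary page.cng r]
#   set f2 [openBinary page.jpg w]
#   puts [fconfigure $f1 -buffersize]
```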

It also helps that the files are small: the large-sized (non-thumbnail) CNG files (and the corresponding output JPEG files, which are exactly the same size) are on the order of 60 to 450 Kilobytes in size --- depending mainly on the variety of colors and the nature of the color changes in the page image.

In terms of pixel dimensions rather than bytes:
The JPEG files are generally about 1340x2000 pixels in size. The height, 2000, is generally constant, but the width varies from about 1310 to 1350.

    Note that these 1340x2000 images are about 2.7 Megapixel images, whereas today's (2011) digital cameras take pictures of resolution 6 Megapixels or even 14 Megapixels. So these page images are about the resolution (that is, picture quality) of today's very low-end digital cameras.

This translator script converts each CNG file to a JPEG in a few seconds --- on the order of 50 to 100 Kilobytes per second --- on a typical 2011-era mid-range PC.

To make it easy to apply that script to many '.cng' files in a directory, I made a 'wrapper shell script' (the 'multiFile' script source is linked from this page) to call on the 'cng2jpg.tcl' script for each '.cng' file selected in the directory. This wrapper script is suited to being implemented as a Nautilus script, where Nautilus is the GUI file manager available on Linux systems.
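For readers who would rather stay in Tcl than use a shell wrapper, the per-directory loop might look something like the sketch below. This is not the 'multiFile' script: the proc name 'convertDir' is hypothetical, and the 'converter' argument is assumed to be a command prefix that accepts an input path and an output path.

```tcl
# Hypothetical Tcl stand-in for a per-directory wrapper.
# 'converter' is a command prefix called as: converter inPath outPath
proc convertDir {dir converter} {
    set done {}
    # Process the '.cng' files in sorted order so the pages come out in sequence.
    foreach cngPath [lsort [glob -nocomplain -types f -directory $dir -- *.cng]] {
        # Same name, same directory, with the extension changed to '.jpg'.
        set jpgPath "[file rootname $cngPath].jpg"
        {*}$converter $cngPath $jpgPath
        lappend done $jpgPath
    }
    # Return the list of '.jpg' paths that were written.
    return $done
}
```

Any two-argument conversion command can be passed in, for example a proc that applies the XOR to one file. The '{*}' expansion syntax needs Tcl 8.5 or later.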


Magazine Issue Directories ('month directories') :

(converting many months of files)

It turns out that, even with the 'multiFile' cng2jpg wrapper script, it is tedious to convert the '.cng' files to '.jpg' files because the '.cng' files are in about 1,400 month-of-the-year directories --- with about 100 to 250 '.cng' page-image files in each month-directory --- for a total of about 225,000 '.cng' (magazine page) files.

See the images of the directories on the Disk1 and Disk3 DVD's below.


Disk 1 top level directories.
The 'month-directories' are under the 'disk1/images' directory,
inside the '199x' and '200x' directories ---
which contain the 1995 through 2008 month-directories.


Disk 1 monthly directories, under the 'disk1/images/200x' directories.
Here we see the 12 year-2000 month-directories --- and
the start of the year-2001 month directories.


Disk 1 CNG files in a monthly directory.
Here we see the start of the page image files --- '.cng' files
--- for month 2000_01 = Jan 2000 --- pages 1, 2, 3, etc.


Disk 3 top level directories.
Here we see that Disk 3 does not have the Mac 'osx' installer directory,
nor the Microsoft 'windows' installer files. They were only on Disk 1.
There is just the 'disk3/images' directory path,
with the '196x' and '197x' subdirectories.


Disk 3 monthly directories.
Here we see the start of the 197x 'month-directories' ---
--- for months 1970 Jan, 1970 Feb, etc. --- into 1971.

If one is copying the '.cng' files to a USB disk drive or 'pen' drive(s) --- about 40 Gigabytes of '.cng' files on the 6 DVD's --- it seems handy to keep the '.cng' files (and the translated '.jpg' files) in month directories with the same names as on the DVD's. Example month-directory names for the year 1964:

  • 19640101
  • 19640201
  • 19640301
  • ...
  • 19641201

(They could have dropped the '01' on the end, but it was probably left there to indicate that the magazine was intended to be distributed near the beginning of the indicated month, rather than near the end.)

To be able to simply select a bunch of month-directories in the Nautilus file manager and then choose a script to run to convert the 100-plus '.cng' files in each of the selected directories, I made another 'wrapper' shell script (the 'multiDir' script source is linked from this page) that calls on the 2 scripts above to do the 'bulk' conversion.

So now, after I copy many years of '.cng' files to a USB disk drive --- or to multiple USB 'pen' drives --- I am ready to convert massive numbers of '.cng' files to '.jpg' files.
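Since a bulk run over many month-directories takes a long time, it can help to see which directories exist and which '.cng' files still lack a '.jpg' before starting or resuming. The sketch below is not the 'multiDir' script. It assumes the layout described above (an 'images' directory, decade directories such as '196x' beneath it, and eight-digit month-directories such as '19640101' beneath those), and the proc names are hypothetical.

```tcl
# Hypothetical helper: list the month-directories under an 'images' directory.
# Assumes decade directories (e.g. 196x) that hold eight-digit month-directories.
proc monthDirs {imagesDir} {
    set found {}
    foreach decadeDir [glob -nocomplain -types d -directory $imagesDir -- *] {
        foreach dir [glob -nocomplain -types d -directory $decadeDir -- *] {
            # Keep only names made of exactly eight digits, such as 19640101.
            if {[regexp {^\d{8}$} [file tail $dir]]} {
                lappend found $dir
            }
        }
    }
    return [lsort $found]
}

# Hypothetical helper: list the '.cng' files in one month-directory
# that do not yet have a '.jpg' file of the same name beside them.
proc pendingCng {monthDir} {
    set pending {}
    foreach cngPath [lsort [glob -nocomplain -types f -directory $monthDir -- *.cng]] {
        if {![file exists "[file rootname $cngPath].jpg"]} {
            lappend pending $cngPath
        }
    }
    return $pending
}
```

Looping over the result of 'monthDirs' and converting only what 'pendingCng' reports would let an interrupted run pick up where it left off.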


Time required for copying and conversion :

    Note that both

    • the copying from DVD's to USB drive(s), and
    • the cng-to-jpg conversion process

    will take many hours to perform. It typically takes on the order of an hour to copy the month-directories (about 7.7 Gigabytes) from one of the DVD's to a USB drive --- so on the order of 6 hours for all 6 DVD's.

    Furthermore, since the conversion of each magazine's '.cng' files (about 160 page images per magazine issue, on average) takes about 3 secs/page x 160 pages = 480 secs = 8 minutes per magazine-issue, for about 1,400 magazine issues --- the entire cng-to-jpg conversion process takes about 8 x 1,400 = 11,200 minutes = about 187 hours (nearly 8 days).
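The estimate above can be recomputed with different assumptions (a faster machine, fewer issues) using a few lines of Tcl. The three input values are the rough figures quoted in the text, not measurements, and the variable names are illustrative.

```tcl
# Rough inputs taken from the text above.
set secsPerPage 3
set pagesPerIssue 160
set issues 1400

# Minutes to convert one issue, then hours for the whole collection.
set minutesPerIssue [expr {$secsPerPage * $pagesPerIssue / 60.0}]
set totalHours [expr {$minutesPerIssue * $issues / 60.0}]
set totalDays [expr {$totalHours / 24.0}]

puts [format "%.0f minutes per issue, about %.0f hours (%.1f days) in total" \
    $minutesPerIssue $totalHours $totalDays]
# prints: 8 minutes per issue, about 187 hours (7.8 days) in total
```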

    That's a big investment in time for images,

    • many of which are ads (for cars, drugs, cameras, dog food, etc. --- about 30 ad pages at the front and about 10 ad pages at the back, of each issue), and

    • many of which were scanned at unbelievably poor quality.

    Here are some examples of the poor quality :

     
    On the left is a reduced size image of a NatGeo magazine page
    containing a painting of a Mars landscape.
    On the right is a full-sized image (the actual scan or
    delivered image) of a portion of the sky ---
    clipped from the 1311x2000 pixel full-size image of the page.
    Note the blotchy rectangles of color, rather than a smooth color gradient.
    I can see why many people feel that they were 'ripped off'.

    I ran across a picture of a fox whose fur, on moderately
    close examination (of the full-sized image, about 1300x2000 pixels),
    looked like the hairs were matted over a mesh.
    The mesh showed through the fur in spots. Strange!
    Surely there was not an actual fabric mesh on the fox.
    A fox with a toupee??? More likely, a strange form of image dithering???
    If I find that picture again, I will put a pair of images here.

    I ran across a picture of the Golden Gate bridge shrouded
    in a low fog, a common occurrence in the Bay Area.
    At the top of the picture was clear blue sky, and on moderately
    close examination (of the full-sized image, about 1300x2000 pixels),
    the sky was made up of vertical stripes, of blue and light-blue.
    Another strange form of image dithering???
    If I find that picture again, I will put a pair of images here.


Images crossing multiple pages :

After converting a couple of years of '.cng' files and browsing through the resulting '.jpg' files, I noticed that some images were separated across two adjacent pages.

I did a little web searching on terms like 'imagemagick convert (merge|join|append) images' and found that I could paste two image files together 'horizontally' (side-by-side, rather than over-under) by using the 'convert +append' command.

So that I could simply select a pair (or more) of image files in the Nautilus file manager and then choose a script to 'append' the selected '.jpg' files together (horizontally or vertically), I made a 'multiAPPEND' script (its source is linked from this page) that makes a new '.jpg' file in the same directory as the selected '.jpg' files. It worked like a charm on the first test that I chose --- a picture of a border collie spread across 2 pages.

However, on some subsequent images, I found it did not work so well, because, apparently, many of the page pairs were not carefully scanned, and hence do not match up well.
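For reference, the ImageMagick call behind this kind of joining can be sketched in Tcl as follows. This is not the 'multiAPPEND' script: the proc names are hypothetical, and the sketch assumes the ImageMagick 'convert' program is installed and on the PATH. '+append' joins images side by side and '-append' stacks them top to bottom.

```tcl
# Hypothetical helper: build the ImageMagick command line for joining images.
# direction is 'horizontal' (side by side) or 'vertical' (stacked).
proc appendCommand {outPath direction inputs} {
    if {[llength $inputs] < 2} {
        error "need at least two input images"
    }
    switch -- $direction {
        horizontal { set op +append }
        vertical   { set op -append }
        default    { error "direction must be 'horizontal' or 'vertical'" }
    }
    # Input files first, then the append operator, then the output file.
    return [list convert {*}$inputs $op $outPath]
}

# Hypothetical helper: run the command (requires ImageMagick to be installed).
proc appendImages {outPath direction inputs} {
    exec {*}[appendCommand $outPath $direction $inputs]
    return $outPath
}

# Example (file names are placeholders):
#   appendImages spread.jpg horizontal {page_030.jpg page_031.jpg}
```

Joining the files this way does nothing about pages that were scanned out of alignment. It simply places the images edge to edge.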


What next?

So now I am prepared to copy many 'month-directories' of '.cng' files from the DVD's to an external USB disk drive --- or to USB sticks --- and convert many or all of the '.cng' files to '.jpg' files.

Then, instead of using the Adobe Air reader (which has implementation problems and super-frustrating performance problems, as noted in external web links referenced at the bottom of this page), I can use an image viewer on Linux (the PC operating system that I use) to browse through the '.jpg' files.

For example, I can use the 'eog' (Eye of Gnome) image viewer to quickly 'page through' the image files in a directory --- quickly skipping over the many advertising images in the month-directory. (I could then delete the ad images or move them to a sub-directory of that month-directory.)

Unfortunately, as noted above with some sample images and as noted in the first external web link in the 'EXTERNAL LINKS' section below, the people who assembled the CNG DVD's did not scan the magazine pages at very high definition --- and they did not capture the colors very well. So the quality of the images is nowhere near the quality seen on the printed magazine pages.

Apparently, instead of using a minimum of 300 dots (or pixels) per inch --- low-end ink-jet printer resolution --- they decided to publish the images with a resolution of about 200 pixels per inch. And they may have used a poor JPEG compression scheme that 'dithered' the images.

    A National Geographic page is 10 inches by 7 inches (or actually 10 inches by about 6 and 7/8 inches) --- about 25.4 centimeters by 17.46 centimeters. At least, that was the page size in a 1994 magazine issue.

    So the vertical resolution of the page images is 2000 pixels / 10 inches = 200 pixels per inch. The horizontal resolution is about 1340 pixels / 6.875 inches = about 195 pixels per inch.
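The same arithmetic, written out in Tcl so that other page or image sizes can be tried. The values are the approximate ones given above, and the variable names are illustrative.

```tcl
# Approximate image size in pixels and page size in inches, from the text above.
set widthPx 1340
set heightPx 2000
set widthIn 6.875
set heightIn 10.0

# Pixels per inch in each direction, and the total pixel count in megapixels.
set ppiVertical [expr {$heightPx / $heightIn}]
set ppiHorizontal [expr {$widthPx / $widthIn}]
set megapixels [expr {$widthPx * $heightPx / 1.0e6}]

puts [format "vertical %.0f ppi, horizontal %.0f ppi, %.2f megapixels" \
    $ppiVertical $ppiHorizontal $megapixels]
# prints: vertical 200 ppi, horizontal 195 ppi, 2.68 megapixels
```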

Hopefully they have the photos stored at a higher resolution (and not compressed and not dithered). In particular, hopefully, for posterity, they will publish the photos (or at least the best photos) at a higher resolution (and a good compression scheme, if any) someday.

And hopefully they will someday store the text as actual character data (plain text, such as ASCII), not just as text images embedded in scanned page images.

In any case, at least I will be able to preserve my investment in this DVD collection in a readable form --- relatively immune to future operating system changes and immune to the fact that Adobe Air (the reader) will probably not run in future versions of the Microsoft, Mac, and Linux OSes.


Text search of the magazine articles :

Someday I could even try to implement, on Linux, the text search facility provided with the CNG DVD's (as outlined in one of the links below). But I am not highly motivated to do so, because, as some have pointed out, the text search facility is pretty lame.

Apparently someone just chose some keywords for many of the pages, and created a cross-reference (via an SQL database) between those keywords and the magazine pages. It is nowhere near a complete database of all the text in the magazine articles.


A tedious task remains :   (to be documented here?)

I may slowly accumulate the 'good' pages --- and better quality images --- from among all the advertising pages of the magazine issues. But this 'gleaning process' is going to be a long, tedious task --- and I have many better things to do.

If I make some progress in determining a way of preserving (that is, organizing) my investment in these DVD's (their image files), then I may document that progress here at some future date.

Since I find maps quite appealing, and those images may not be too garbled by the poor quality of the published images, I may try to collect many of the map images together, in some sort of organized form.


Intent of this page :

This page is mainly meant as a personal page --- to remind me of how I can preserve my investment in these DVD's. That is, these scripts make it possible for me to access the page images even though the Adobe Air viewer is, reportedly, a pain to get running on Linux --- and an EXTREME-(performance)-pain to use on Linux.

In fact, people have had 'show-stopping' problems with the viewer on various versions of the Microsoft Windows and Mac operating systems, indicating that the viewer will probably not be runnable on the computers of the coming years.

If anyone has found that they cannot access their copy of the DVD's using the viewing software delivered on the DVD's, AND, if they stumble across this web page, they may want to try implementing these scripts (or these techniques) on their operating system, to make at least some of their CNG files viewable.

NOTE that this page is not intended to make available a copy of the JPEG files translated from the CNG files. A person should buy a copy of the DVD's before using these scripts.

I am at the end of my documentation of this (mis)adventure.

SOME EXTERNAL LINKS on 'cng' files and the CNG package :

  • See a 2010 April post at a blog of 'Andrei' (of Purdue U.) --- blog title: "The Complete National Geographic".

    Note that several people complain that they can no longer view their files because of operating system changes that are incompatible with the reader software.

    Andrei, in his post, points out that "My main complaint is that they used jpeg to compress the images and did so at really high compression leading to lots of artifacts. Looking at a page that has a black background can be painful. Which is a bit of a shame as one of the greatest things about NGM [National Geographic Magazine] is the high quality of the photographs and printing.

    The viewer [Adobe Air] also has way too many needless animations and there's sadly no way to disable them. So I wouldn't mind knowing the file formats involved ... more importantly a viewer that just does its job and gets out of the way would be great." [This latter 'viewer' statement is what 'Bob' was responding to, with his statement below.]

    Andrei also says "Note that, as with most Adobe technologies on Linux, it [Adobe Air] is needlessly CPU intensive, it pegs my CPU at 100% ...".

    So you see several reasons for having a way to backup and view the cng files --- other than via the 'flaky' Adobe viewing system provided.

    Note that 'Bob', in a 2011 October response to this posting, gives the hint "The cng files are all jpegs, XOR'd bitwise with 239. If anyone wants to hack up a viewer, feel free."

    This 'Andrei' post on the CNG is also available via the URL
    csclub.uwaterloo.ca/~abarbu/.

  • See blog Notes of 'Gordon' - dated March 2007, which express similar frustrations :

    "It's exasperating to think that all that historical data, all those articles, all those photographs, are sitting on my shelf and cannot be viewed with today's operating system. ( The Reader/Searcher looked to my naive eye like a kissing cousin to [Adobe] Acrobat.)

    If only the source code for the Complete National Geographic CD-ROM set were available and could be updated to run natively on OS-X and other contemporary platforms ...

    Let's learn from this. Don't invest in products dependent on closed source solutions."

  • This 2009 October page at blogs.nationalgeographic.com indicates that others have bought the product and were not able to install it on their computer.

    Here's a quote from that page indicating more problems with the way the 'Complete National Geographic' was implemented :

    "@ NatGeo: Shame on you for selling such a bloated, slow, crapulous piece of software. I can't believe I paid $200 for this."

    And :

    "vince said: This product is not ready for prime time (i.e. should not be sold to anyone in its present form). The time to move from page to page using the DVDs is ridiculously slow. It is virtually unuseable in its present form."

    In addition, there are many postings by people saying they can't install the reader software on their operating system --- or the reader quit working after a reader upgrade or an operating system upgrade.

  • The 'cng.htm' page at bilbo.online.bg gives some insight on how the (rather limited) search facility in the Complete National Geographic (CNG) can be implemented based on an SQL database.

    That page may be dead, but here is a backup copy --- preserved here since the Google-cached page will probably disappear sometime around 2012.

  • A thread at ubuntuforums.org documents some attempts --- successful and not --- to get the CNG reader running on Linux.

  • Here is an ad for The Complete National Geographic, at amazon.com --- list price $79.99. (Every issue since 1888 --- on six DVD's --- over 1,400 issues, 8,000 articles, and 200,000 photos.)

  • Here is info on the cng file extension, at file-extensions.org. This page says "Currently there is no specific information available, about how to convert this file extension." But the scripts on THIS web page change that.

  • A Google search on terms like 'cng file convert national geographic' was used to find URL's like those above.

  • You may be able to find more web pages on the CNG via this query at linuxidx.com


Page created 2011 Dec 23. Changed 2011 Dec 31.