Next stop, clay tablets

Publication date: 19 June 2009
Originally published 2008 in Atomic: Maximum Power Computing
Last modified 03-Dec-2011.

 

I just spent a while playing with an unusual data storage option.

That unusual storage option is... paper.

But not paper with boring old plain text on it. Paper with arbitrary digital data on it.

Paper is a great format on which to store really important information. Thieves seldom bother to steal it. Magnetic fields and power surges don't damage it. Paper can also tolerate much higher temperatures than any digital storage system. And if those high temperatures are created by a house fire, paper in a simple wooden box, like the bottom drawer of a chest of drawers, is actually very likely to survive.

If your house burns to the ground then any paper not in a very fireproof safe (read the small print before you buy one of those, to see what "fireproof" means to this particular manufacturer...) will of course be gone along with everything else. And metal filing cabinets pass heat through, sacrificing their contents to save themselves. But even if the fire brigade take half an hour to turn up, paper in a wooden bottom-drawer will probably survive a house fire.

And most ordinary cheap printer paper today is acid-free, so fifty years from now it won't be brown and flaky like an old paperback book.

OK, if the roof leaks over your backup, then a flash drive will probably come out better than paper would. The shelf life of modern flash memory ought to be at least a few decades, too - but there's no guarantee that the hyperconductive thinking aluminum computers of the year 2075 will have USB ports, or support for current filesystems. Paper could, therefore, actually be more compatible in the future than any of today's conventional data-storage options. Paper is really pretty awesome stuff.

You can fit something in the order of twelve thousand characters of small-but-legible eight-point text on one sheet of A4 paper. At one byte per character, that's 11.7 kilobytes (in the powers-of-two sense) of data per one-sided page, or more than 87 pages per megabyte. You can fit considerably more on if you print it all tiny and squinty, but no human-legible text gives you very good data capacity per page.
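If you want to check that arithmetic, here's a minimal Python sketch. It assumes the rough figure above of twelve thousand one-byte characters per page; the function names are just mine, made up for illustration.

```python
# Rough capacity of plain printed text, using the estimate quoted above.
CHARS_PER_PAGE = 12_000   # eight-point text on one side of A4 (an estimate)
KIB = 1024                # kilobyte, in the powers-of-two sense
MIB = 1024 * KIB


def page_capacity_kib(chars_per_page=CHARS_PER_PAGE):
    """Capacity of one page in KiB, at one byte per character."""
    return chars_per_page / KIB


def pages_per_mib(chars_per_page=CHARS_PER_PAGE):
    """Number of text pages it takes to hold one MiB."""
    return MIB / chars_per_page


if __name__ == "__main__":
    print(f"{page_capacity_kib():.1f} KiB per page")
    print(f"{pages_per_mib():.1f} pages per MiB")
```

Change CHARS_PER_PAGE to suit your own font size and eyesight.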

You can do a lot better than text, though.

"Matrix codes" are the two-dimensional "square barcodes" that're popping up all over the place these days.

Well, they're usually square. UPS's released-into-the-public-domain "MaxiCode" uses little circles with a bullseye in the middle, Microsoft have as usual invented their own standard, and there's also the very distinctive-looking Palo Alto Research Center "DataGlyphs", the standard version of which encodes data as a rectangle of little slashes and backslashes. The little lines can be printed in different weights and in different colours without changing the data they encode, so you can make a halftone image that contains "hidden" digital data. (For some reason, PARC seem to have abandoned dataglyphs.com and scrubbed all mention of the things from parc.com. If you're reading this a while after I wrote it, perhaps they'll have sorted themselves out.)

All of the matrix codes made for barcode sorts of jobs are, of course, only meant to be used to store bar-code-y sorts of data. This means they usually have hard format limits that make sure the matrix will fit on product packaging, and will be coarse-grained enough to be "scanned" with a low-res camera, like those in cheap mobile phones. The maximum capacity of an alphanumeric QR Code, for instance, is 4,296 characters. Data Matrix tops out at 3,116 characters if they're all numeric digits (2,335 if they're alphanumeric), and Aztec Code can do 3,067 alphabetic characters with no numbers or punctuation, or 1,914 bytes of arbitrary data.

(For comparison, a standard IBM punched card, such as still survives here and there, has 80 columns of 12 punch locations. That gives a theoretical maximum capacity of 960 bits, or 120 of today's conventional 8-bit bytes, each of which more or less equals an alphanumeric character. In practice this full capacity was unattainable, though, partly because no encoding system supported using every location for user-data storage - 80 characters of user data was actually the most that anybody ever got from an 80-column card - and partly because a card with too many holes punched in it, also known as a "lace card", would jam in the reader. And if you've enjoyed this digression, see also "Rainbow Storage", a bold step forward for information theory into the realm of utter bollocks.)
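The punched-card sums, as a small Python sketch. Treating one character as one 8-bit byte is the same loose equivalence used above, not a property of any real card encoding, and the names here are my own.

```python
# Theoretical versus practical capacity of an 80-column punched card.
COLUMNS = 80          # columns per card
ROWS = 12             # punch locations per column
PRACTICAL_CHARS = 80  # one character per column was the real-world limit


def theoretical_bits():
    """Every punch location treated as one bit."""
    return COLUMNS * ROWS


def theoretical_bytes():
    """The same capacity expressed in 8-bit bytes."""
    return theoretical_bits() // 8


def practical_fraction():
    """Share of the theoretical byte capacity that 80 characters use,
    assuming (loosely) one byte per character."""
    return PRACTICAL_CHARS / theoretical_bytes()


if __name__ == "__main__":
    print(theoretical_bits(), "bits, or", theoretical_bytes(), "bytes")
    print(f"{practical_fraction():.0%} of that was usable in practice")
```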

Let's stick for the moment with the job of storing plain text. English words generally average about 5.5 characters each, plus one for a space or punctuation; that means about 660 words for QR Code, about 480 for Data Matrix, or about 300 for Aztec Code. (Here's a neat online encoder that lets you create a Data Matrix or QR Code.)
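Here's how those word counts fall out, as a quick Python sketch. It uses the three capacity figures that the estimates above work out from (4,296, 3,116 and 1,914), and the 5.5-plus-one characters per word. Note that the Aztec Code estimate matches the 1,914-byte figure, not the 3,067-letter one. The dictionary and function names are mine.

```python
# Approximate English words per matrix code symbol.
CHARS_PER_WORD = 5.5 + 1  # average word length, plus a space or punctuation mark

# Maximum capacities, in characters or bytes, as used for the estimates above.
CAPACITY = {
    "QR Code": 4296,
    "Data Matrix": 3116,
    "Aztec Code": 1914,
}


def words_that_fit(capacity, chars_per_word=CHARS_PER_WORD):
    """Whole words that fit into the given number of characters."""
    return int(capacity / chars_per_word)


if __name__ == "__main__":
    for name, capacity in CAPACITY.items():
        print(f"{name}: roughly {words_that_fit(capacity)} words")
```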

A capacity of a few hundred words is actually quite useful, for some kinds of everyday text. Newspaper stories, for instance, commonly come in at less than 400 words. The Sunday paper would be a lot smaller if we were all able to read stories encoded as blocks of dots.

And these capacity numbers are also very approximate. That's partly because of the variability of text, but also because smarter encoding systems - a widely-understood compression system, like the gzip used by some Web servers, for instance - can push capacity up considerably. And, at the same time, error-correction code can push capacity down, but make the data resistant to damage. Many data-matrix systems use Reed-Solomon error correction, and allow you to dial the error-correction content up to 90% or more of the total encoded data. That gives you a lot less space for user data, but makes the data extremely hard to destroy.
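To get a feel for that tug-of-war, here's a rough Python sketch. It uses the standard zlib module (the same DEFLATE compression that gzip uses) to shrink some text, and a simple percentage to stand in for error-correction overhead. The sample text, the 1,914-byte symbol size and the function names are all my assumptions. Real matrix codes reserve error-correction space in their own fixed steps, not as an arbitrary percentage.

```python
import zlib


def compressed_size(text):
    """Bytes needed for the text after DEFLATE compression at maximum effort."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def user_capacity(raw_bytes, ecc_percent):
    """Bytes left for user data when ecc_percent of the symbol
    is given over to error correction."""
    return raw_bytes * (100 - ecc_percent) // 100


if __name__ == "__main__":
    # Very repetitive sample text: real prose will compress much less well.
    sample = "It was the best of times, it was the worst of times. " * 40
    print("raw text:", len(sample.encode("utf-8")), "bytes")
    print("compressed:", compressed_size(sample), "bytes")
    for ecc in (0, 25, 50, 90):
        left = user_capacity(1914, ecc)
        print(f"{ecc}% error correction leaves {left} of 1914 bytes")
```

Compression and error correction pull in opposite directions, so the useful number is how much original text fits after both have been applied.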

Error-correction makes data matrices suitable for another job - "paper keys" for strong encryption.

You probably only need 128 bits of entropy for functionally unbreakable encryption. That's a tiny amount by computer standards, but makes for a fairly cumbersome password or passphrase. If it's OK to turn the key into a physical object, though, you can encode it as some kind of matrix code. You can easily fit 128 bits of key into the area of a postage stamp, and still have room for enough error-correction data to make the key highly resistant to folding, spindling or mutilation.
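To see why 128 bits makes an awkward password, here's a minimal Python sketch that generates a random 128-bit key with the standard secrets module and writes it out as text. The function names are mine. Turning the result into an actual matrix code needs a separate encoder, which I've left out.

```python
import base64
import secrets


def new_key(bits=128):
    """Generate a random key of the given size from the OS's secure RNG."""
    return secrets.token_bytes(bits // 8)


def as_hex(key):
    """Key as hexadecimal text: two characters per byte."""
    return key.hex()


def as_base32(key):
    """Key as base32 text (A-Z and 2-7), with the '=' padding stripped."""
    return base64.b32encode(key).decode("ascii").rstrip("=")


if __name__ == "__main__":
    key = new_key()
    print("hex:   ", as_hex(key), f"({len(as_hex(key))} characters)")
    print("base32:", as_base32(key), f"({len(as_base32(key))} characters)")
```

Either string is a pain to memorise or type, but it's only 16 bytes of payload for a matrix code.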

(You can even tattoo matrix codes on yourself. Persons of ordinary dimensions are likely to find it difficult, not to mention painful, to fit more than a very short message.)

But never mind all that. What about general-purpose backups?

Even if all you want to back up is a few megabytes of accounts data, a system that can only store a few kilobytes per data matrix is useless.

This is a great shame, though, when you realise that fitting two or three kilobytes into a one-inch square means a single sheet of A4 paper could hold at least a couple of hundred kilobytes, even if you include plenty of error-correction redundancy to minimise the chance of silverfish-related data loss.
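A back-of-the-envelope version of that estimate, in Python. It just counts how many whole one-inch squares fit on an A4 sheet, then multiplies by two or three kilobytes per square. The optional margin is my own assumption, and the result is before any extra error-correction redundancy.

```python
import math

MM_PER_INCH = 25.4
A4_MM = (210, 297)  # width and height of an A4 sheet


def one_inch_squares(page_mm=A4_MM, margin_in=0.0):
    """Whole one-inch squares that fit on the page inside the given margin."""
    width_in = page_mm[0] / MM_PER_INCH - 2 * margin_in
    height_in = page_mm[1] / MM_PER_INCH - 2 * margin_in
    return math.floor(width_in) * math.floor(height_in)


def page_capacity_kb(kb_per_square, margin_in=0.0):
    """Page capacity in kilobytes for a given payload per square."""
    return one_inch_squares(margin_in=margin_in) * kb_per_square


if __name__ == "__main__":
    print(one_inch_squares(), "squares with no margin")
    print(page_capacity_kb(2), "to", page_capacity_kb(3), "kilobytes per page")
```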

There's no upper limit to the amount of data you can store as matrix codes, if you've got the space. Look at Dolby Digital and Sony Dynamic Digital Sound movie audio, for instance; they're encoded optically, just like any other matrix code, on the edge of the film.

200 kilobytes ain't much if you're backing up your whole hard drive. But it's actually pretty decent for a lot of really important files. Financial data, program source code. The novel you're writing. Your university thesis.

There are already at least two data backup utilities that use matrix codes, expanded to cover the whole of an arbitrary number of pages.

One of them is Twibright's "Optar" (OPTical ARchiver), which can reliably pack 200 kilobytes of data onto one laser-printed A4 page. Optar doesn't come as ready-to-go software, though; you have to compile the C source code yourself.

This can actually be a plus, though. If the computing world as we know it, with x86 CPUs and USB ports, still exists when you have to restore your Optar backup, you can just use the same software you compiled last time. And if you package a printout of the 20-odd pages of "unoptar" C source code with your Optar backup, people fifty or a hundred years from now will probably still be able to compile it. People are still working in Fortran and Lisp today (though not always by choice...), and the original versions of those languages are more than 50 years old; C isn't quite middle-aged yet, but I don't think it's a stretch to say that C will still be compilable in 2075, if we're not all busy fighting the rad-zombies for Soylent.

All this is, of course, a bit much for someone who just wants to play with the technology. Fortunately, there's also a ready-to-go free-software Windows paper-backup program, inventively named "PaperBack".

The PaperBack source is downloadable too (C++, this time), so PaperBack is another real option for long-term backups. And it can cram about half a megabyte of data onto a 600dpi A4 page, though I wouldn't trust my cheap laser printer with more than 180k per page.

PaperBack includes compression tuned to work very well with plain text, so it's an ideal solution for backing up written works, program source code, lists of passwords and exported data from your accounting program. I found that with compression turned on, a 746-kilobyte plain-text version of Charles Dickens' A Tale Of Two Cities only took up about one and a quarter PaperBack pages...

(Image: a page of PaperBacked data)

...even using my crummy laser printer. Printed as tight-packed eight-point text, it would have been more than sixty pages.

(General-purpose compression like Zip or 7-Zip will give the best results with most files, but PaperBack's compression is clearly better for plain text.)

PaperBack even has built-in encryption, though you can of course also encrypt your data in some other way before backing it up. However you encrypt any backup, you should of course make sure you remember the password, or separately back up the key certificates, or whatever the key for the encryption scheme you're using happens to be. If you don't, encryption can more accurately be called the "delayed Recycle Bin". If your data doesn't need "real" encryption, data-matrix encoding just by itself will stymie casual snoopers.

PaperBack also has error-correction, adjustable from enough for your data to survive the loss of one little square block of dots in every ten, to enough to tolerate the loss of one block in every two. I did my capacity tests with the default one-in-five redundancy, and also tested the correction with a bit of hole-punching and scribbling.
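To show how one spare block can stand in for a lost one, here's a toy Python sketch that uses plain XOR parity: one parity block per group of data blocks. This is only an illustration of the general idea. I'm not claiming it's how PaperBack does it, and the block contents, group size and function names are all made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def add_parity(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(group, missing_index):
    """Rebuild the block at missing_index from every other block in the group."""
    survivors = [b for i, b in enumerate(group) if i != missing_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl", b"mnop", b"qrst"]  # five data blocks
    group = add_parity(data)                              # plus one parity block
    # Pretend a hole punch took out the third block.
    print(recover(group, 2))
```

A scheme like this can replace any one missing block per group, but not two. That's why more redundancy means smaller groups and less room for your data.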

At 180 kilobytes per page, you'll need 5,826 pages of A4 copy paper to back up a gigabyte of data. And a few toner cartridges. And a paper-slave to keep feeding the printer.

All of my passwords and other login info are only 24,074 bytes, though. Even without compression, PaperBack can fit that on an A7 index card.

And ten years of my business accounts zip down to about 2.7MB. That's only fifteen or sixteen cheap-laser pages.
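If you want to run the numbers for your own files, here's a small Python sketch. It assumes my cautious 180 kilobytes per page, with kilobytes and gigabytes in the powers-of-two sense, and it rounds up to whole sheets, so it can come out a page higher than a figure that was rounded down. The function name is mine.

```python
import math

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(size_bytes, kib_per_page=180):
    """Whole pages needed to hold size_bytes at the given per-page capacity."""
    return math.ceil(size_bytes / (kib_per_page * KIB))


if __name__ == "__main__":
    print("one gigabyte:", pages_needed(GIB), "pages")
    print("24,074 bytes of passwords:", pages_needed(24_074), "page")
    print("2.7 megabytes of accounts:", pages_needed(int(2.7 * MIB)), "pages")
```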

Works for me!
