New ways to be wrong

Originally published 2002 in Atomic: Maximum Power Computing
Last modified 03-Dec-2011.

 

A popular objection to the use of the Internet as a research tool is that the information you find there isn't reliable. Neither are books, of course, but the Web has fewer editors and librarians, so there's some validity to the complaint.

Unreliability of Internet information takes two forms.

First, there's plain old wrongness - but it's often in the middle of a bunch of correct stuff that's lulled you into a gullible state.

There's a term for this phenomenon. It's "database pollution".

Take, for instance, Penn & Teller's Swedish Lemon Angels. The Angels are un-makeable biscuits from P&T's excellent book How To Play With Your Food. The recipe includes both baking soda and lemon juice, and when you're instructed to "add the lemon juice all at once and blend into the mixture", said mixture will foam merrily out of the bowl, for elementary chemical reasons. Hilarity ensues.

The Swedish Lemon Angels recipe can be found in various on-line recipe books. I found the deadly Angels lurking on RecipeLand, RecipeSource and Chef2Chef when I first wrote this column for Atomic magazine. Fizzy, lemony database pollution, kids.

Now, there seems to be a "volcano" disclaimer on the end of all three of those Angels recipes, which I could have sworn wasn't there when I first looked. Hold that thought.

Database pollution can be a protest gesture. If you object to some company's marketing behaviour, filling their user database with 107-year-old grandmothers from North Yemen who make more than a million US dollars a year and use the company's software 26 hours a day will probably reduce that database's value. It's getting to the point where automated tools to do this are turning up; consider the now-slightly-harder-to-use New York Times Random Login Generator, for instance.

The second kind of Internet info unreliability is actually good, in a way. It's mutability of information. A page that said one thing when you first looked at it may now have been corrected to say something else. Like the above Angels recipes. When this column showed up in print in Atomic, the recipe links above led to un-warninged versions of the Angels. Well, I think they did, anyway; now there are warnings on the end of all three of them, and there's no way to prove that they weren't always that way.

There's another fine example at - cover your ears, children - fuckmicrosoft.com.

That site contains some pretty good information about why you shouldn't like the Dark Lord Bill's empire.

But it also contains "Microsoft's Really Hidden Files", which has been linked from the site's front page for ages now.

This latter screed, written by a person glorying in the name "The Riddler", tells you about all sorts of apparently privacy-infringing secret data collection by Microsoft.

The first version of it was, to two significant digits, a bunch of misleading hooey.

It's not perfect now, but it's better.

But if you believed version 1.0, you're going to look like a doofus if you point to version 2.6b (November 3, 2001) as evidence for something that the page doesn't say any more. The good old Internet Archive Wayback Machine will provide you with an explanation for your mistake (though it only goes back to v2.0 of the page), but it won't give you an excuse.

By The Riddler's definition in all versions of the page so far, the computer I'm using at the moment has well over a gigabyte of stuff in "folders that Microsoft has tried hard to keep secret".

This is, in my view, a rather uncharitable way to describe temporary files, the swap file, the cookie file, the browser cache, URL auto-completion, and so on. There is quite a bit of Windows data that's not quite as deletable as you might think, and that may be a security risk for those of us with meth labs in our garage or inquisitive younger siblings. But for most people, it's more of a disk space wastage issue than a privacy one, and not much of a problem either way.

The Riddler still tells you that directories that have the System attribute must have it for nefarious reasons, rather than to strongly discourage the uninformed from blundering around in there "making space". And he still implies that the reason for Internet Explorer cache files being inside weird alphanumeric-named subfolders must be because Microsoft doesn't want you getting at them, rather than the fact that there was an exploit some years ago in which a l337 h4XX0r would take rapeyourpc.exe and rename it to bunny.jpg, then stick <IMG SRC="innocentfiles/bunny.jpg"> in a Web page. Try to load that page and the renamed program wouldn't display, but it would end up in your browser cache, from which it could be executed by other software. The random-named cache directories are a kludge to stop that sort of thing from happening.
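The idea behind that kludge can be sketched in a few lines of Python. This is only an illustration of the principle, not how Internet Explorer actually does it; the function names, the folder count and the name length are all made up for the example.

```python
import secrets
import string

# Hypothetical alphabet for folder names; the real cache's naming rules are not known here.
ALPHABET = string.ascii_uppercase + string.digits


def make_cache_dirs(count=4, name_len=8):
    """Invent a set of random alphanumeric folder names, once per cache."""
    return [
        "".join(secrets.choice(ALPHABET) for _ in range(name_len))
        for _ in range(count)
    ]


def cache_location(cache_root, dirs, filename):
    """Pick one of the random folders for a newly downloaded file."""
    return f"{cache_root}/{secrets.choice(dirs)}/{filename}"


dirs = make_cache_dirs()
path = cache_location("cache", dirs, "bunny.jpg")
print(path)  # the middle component differs from one cache to the next
```

A Web page can still get bunny.jpg into the cache, but whoever wrote the page can't know the folder name in advance, so other software can't be pointed at a predictable path.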

The Riddler's also still unhappy about the fact that Outlook Express doesn't automatically compact mail folders after things are deleted, with the result that even after you delete e-mail from the Trash folder, the messages will still be there in the DBX file. OK, maybe Microsoft should have put in an auto-compact feature for when someone's just deleted the entire contents of a folder, because compaction should be quite fast then, instead of the usual lengthy drive-flog. But then the data wouldn't be recoverable in the event that the deletion was an accident, of the sort suffered strangely often by Outlook Express users.
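Here's a toy Python model of why "deleted" mail hangs around until a folder is compacted. It is a sketch of the general delete-then-compact pattern only; it has nothing to do with the real DBX file layout, and the class and message text are invented for the example.

```python
class ToyMailStore:
    """Toy append-only message store. Not the real DBX format."""

    def __init__(self):
        self.data = bytearray()   # stands in for the file on disk
        self.index = {}           # message id -> (offset, length)
        self.next_id = 1

    def add(self, text):
        raw = text.encode("utf-8")
        msg_id = self.next_id
        self.next_id += 1
        self.index[msg_id] = (len(self.data), len(raw))
        self.data += raw
        return msg_id

    def delete(self, msg_id):
        # Only the index entry goes; the bytes stay where they were.
        del self.index[msg_id]

    def compact(self):
        # Rewrite the "file" with only the messages still in the index.
        fresh = bytearray()
        new_index = {}
        for msg_id, (offset, length) in self.index.items():
            new_index[msg_id] = (len(fresh), length)
            fresh += self.data[offset:offset + length]
        self.data, self.index = fresh, new_index


store = ToyMailStore()
keep = store.add("Lunch on Friday?")
secret = store.add("Surprise party for Sam.")
store.delete(secret)
still_there = b"Surprise party" in store.data   # deleted, but not gone
store.compact()
gone_now = b"Surprise party" not in store.data  # compaction drops the bytes
```

Deleting is cheap because it only touches the index. Compacting has to copy every surviving message, which is where the drive-flogging comes from, and also why it's quick when nothing survives.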

The Riddler also still tells you that Cookies Are Bad, m'kay. But, like a bunch of other cookie-phobes, he doesn't tell you why. Lots of people seem to be under the impression that cookies let Web sites find out things about you that you haven't already told them. That's not the problem; this page gives a less alarmist explanation of what the problem really is.

Personally, I rather like not having to log in to various low-security Web sites. When I used to use SpamCop a lot (I don't, any more), I would have gone bananas without cookies turned on.

But all of these errors may well be tidied up in the near future, if The Riddler writes v3.0. In that case, this page of mine won't make me look stupid (well, no stupider than I look all the time, anyway), because I clearly state that I'm talking about v2.6b of Really Hidden Files. But most people aren't in the habit of specifying that, and many Web pages don't even have a last-updated date.

Heck, I'm not generally in the habit of clearly timestamping my off-site references, and it doesn't necessarily help anyway. When I put this page up on the Web, I failed to notice that the above-linked Swedish Lemon Angels recipes now had a disclaimer on the end of them, and used them as straight examples of database pollution, thereby making myself look as if I hadn't noticed in the first place. Well, maybe I hadn't; there's no way for me to be sure that the recipes weren't this way all along.
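One cheap way to get some certainty is to record a fingerprint of a page along with the date you read it. The Python sketch below assumes you already have the page text in hand; the URL, the date and both versions of the text are placeholders invented for the example, not the real recipe pages.

```python
import hashlib
from datetime import date


def cite(url, page_text, retrieved=None):
    """Record what a page said (as a hash) on the day it was read."""
    return {
        "url": url,
        "retrieved": (retrieved or date.today()).isoformat(),
        "sha256": hashlib.sha256(page_text.encode("utf-8")).hexdigest(),
    }


def has_changed(citation, page_text_now):
    """True if the page text no longer matches the recorded fingerprint."""
    digest = hashlib.sha256(page_text_now.encode("utf-8")).hexdigest()
    return digest != citation["sha256"]


# Made-up page contents, before and after a hypothetical edit.
old_text = "Add the lemon juice all at once."
new_text = old_text + " WARNING: this recipe is a joke."

ref = cite("http://example.com/angels", old_text, date(2002, 1, 1))
changed = has_changed(ref, new_text)
unchanged = not has_changed(ref, old_text)
```

A hash only tells you that the page changed, not what it used to say. For that you need to keep your own copy, or hope the Wayback Machine did.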

Often, if you have a problem with a link to a Web page that's been around for a while, it's just a broken link; the page has moved, or vanished entirely. But sometimes it's still there, but different.

Usually, this is pretty obvious. For instance, Penn and Teller's site, which I link to above, used to be www.sincity.com; as I write this, that's still the number one Google hit when you search for "penn and teller". But that URL is now owned, as you might expect just by looking at it, by a porn site.

If a couple of magicians sell their domain to a smut site (I don't know if that's what they did, but I don't see anyone complaining about their domain being stolen...), or an accounting firm goes broke and its domain is re-registered by a porn site (or, worse yet, the reverse happens), then you're unlikely to mistake the new site for the old one.

But if someone just rewrites their page, then your references to it can look as if you're deliberately misquoting them, or worse.

Startlingly, there's a point to this rant which doesn't involve grand sociological statements about two-edged swords and the slippery slope of revisionism.

It's that the changeability of Web info, as well as its reliability at any given time, is, simply, something that it pays to remember.

Because it gives you a new way to be wrong.
