If it looks random, it probably isn't

Publication date: 13 November 2012
Originally published 2012 in Atomic: Maximum Power Computing
Last modified 13-Nov-2012.

 

Are you ready for another episode of Fun With Conditional Probability? Of course you are!

[See also: Probability and hard-drive failures here and here, probability as it applies to game-NPC dialogue and to second-hand smoke statistics, and this piece on transitive and nontransitive relationships. Pay attention, I'll be asking questions later.]

Suppose that there is some event that has the same chance of happening in any given period of time. Say, for instance, that it's a lightning strike near enough to your house to fry most or all of your electronics. Let's make the "given period of time" a day. Let's say the chance of a lightning strike on any given day is one in ten thousand, ten thousand days being 27.4 years. And, for the sake of simplicity, let's say that more than one strike in a day is impossible.

Now wait until lightning actually strikes. (You may be waiting decades.)

Now, which day is the most likely one for the next strike?

The obvious answer to this question is "there is no most likely day; the chance of a strike on a given day is 1/10,000."

That answer is wrong. The most probable day for the next strike is tomorrow.

I have the great advantage, in writing this, of just being able to assure you that this is the case and not having to try to explain it in eight different ways. But here's a basic way in:

How likely is it that the next strike will be ten million years from now?

Not very, obviously. You wouldn't be very surprised if the next strike took fifty years to occur. But it would be extremely amazing if lightning failed to strike for millions of years, even with a daily strike probability as low as one in ten thousand.

This is where the "conditional probability" thing kicks in. The "condition" required for the next strike to be ten million years from now is that lightning must not strike on any of the days in between. The chance of lightning not striking on any given day is 1 minus the chance that it will, which in this case means 1 minus 0.0001. That gives a probability of 0.9999, in the standard statistical form where 0 means impossibility and 1 means certainty.

On any given day, a 0.9999 probability of no lightning strike is very close to certainty. But if you look at a very long run of days, it becomes close to certain that the 0.0001-probability event will happen long, long before you've made it to even one million days, let alone ten million years.
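
To put rough numbers on that, here's a minimal Python sketch. The function name and the day counts are my own picks for illustration, and it assumes every day is independent of every other day, with the same 0.0001 strike chance used above.

```python
# Sketch: chance of getting through a run of days with no strike at all,
# assuming each day independently has a 1-in-10,000 strike chance.
P_STRIKE = 0.0001  # daily strike probability used in the article


def prob_no_strike(days: int, p: float = P_STRIKE) -> float:
    """Probability that none of `days` consecutive days has a strike."""
    return (1 - p) ** days


# Day counts picked purely for illustration.
for days in (1, 10_000, 100_000, 1_000_000):
    print(f"{days:>9,} days without a strike: {prob_no_strike(days):.3g}")
```

By a million days the result is already a number with more than forty zeroes after the decimal point, and a million days is only about 2,700 years.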

This same conditional-probability argument, though, also applies to the day after tomorrow.

There's a 0.0001 chance of lightning tomorrow. But for the next strike to be the day after tomorrow, lightning must strike on that day and not strike tomorrow. So the probability becomes 0.9999 for no strike tomorrow, times 0.0001 for a strike the next day, which is 0.00009999.

This is only very slightly less than 0.0001, but it is less. The probability of the next strike - not any strike, but the next strike - occurring the day after tomorrow is thus very slightly lower than the probability of the next strike being tomorrow.

And the further you go into the future, the smaller the number gets. In a week it's about 0.00009994, in a year it's about 0.00009643, in ten years it's down to about 0.00006943 - until it becomes ridiculously small, millions of years in the future.
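
If you'd like to check those figures yourself, the sum is easy to sketch in Python. The function name is made up for this example, and I'm assuming "a year" means day 365 and "ten years" means day 3,650, counting tomorrow as day 1.

```python
# Sketch: chance that the NEXT strike lands exactly on a given day
# (tomorrow = day 1), assuming independent days with a 0.0001 daily chance.
P_STRIKE = 0.0001


def prob_next_strike_on(day: int, p: float = P_STRIKE) -> float:
    """(day - 1) strike-free days, followed by a strike on `day` itself."""
    return (1 - p) ** (day - 1) * p


# "A year" and "ten years" are taken as 365 and 3,650 days (an assumption).
for label, day in [("tomorrow", 1), ("day after tomorrow", 2),
                   ("in a week", 7), ("in a year", 365), ("in ten years", 3650)]:
    print(f"{label:>18}: {prob_next_strike_on(day):.8f}")
```

Every extra day of waiting multiplies the previous day's figure by another 0.9999, so the number can only go down.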

So the most likely day for the next lightning strike - whether or not it actually even struck today - is tomorrow. It's only a tiny bit more likely that the next strike will be tomorrow than that it will be the next day, but it is more likely.
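
If you'd rather see that than take my word for it, here's a rough simulation sketch in Python. The trial count, the bucket size and the seed are all arbitrary choices of mine. It uses Python's built-in pseudo-random generator, and draws each waiting time with a logarithm trick instead of rolling the dice once per day, which would take forever.

```python
import math
import random

P_STRIKE = 0.0001   # daily strike chance from the article
TRIALS = 100_000    # number of simulated waits (arbitrary)
BUCKET = 1_000      # days per histogram bucket (arbitrary)


def days_until_next_strike(rng: random.Random, p: float = P_STRIKE) -> int:
    """Draw a waiting time in days (1 = tomorrow) without looping day by day."""
    u = 1.0 - rng.random()                  # uniform in (0, 1], so log() is safe
    return int(math.log(u) / math.log(1 - p)) + 1


rng = random.Random(2012)                   # fixed seed so runs are repeatable
buckets = [0] * 5                           # only the first 5,000 days are tallied
for _ in range(TRIALS):
    index = (days_until_next_strike(rng) - 1) // BUCKET
    if index < len(buckets):
        buckets[index] += 1

for i, count in enumerate(buckets):
    print(f"days {i * BUCKET + 1:>5}-{(i + 1) * BUCKET:>5}: {count}")
```

Single days are far too improbable to tally one at a time, so the sketch lumps them into thousand-day buckets. The earliest bucket should collect the most strikes, with each later bucket getting somewhat fewer.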

At this point you may be wondering why I'm injuring your brain with this stuff. It's because this is a really important thing you need to know about the world. This statistical bias for chance events to happen closer to each other than seems intuitively likely means that all sorts of chance phenomena have "clusters" that people naturally think don't look very random at all.
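
One way to put a number on that bias, sticking with the lightning example: work out how many of the gaps between strikes are shorter than the 10,000-day average gap. This is a sketch under the same independent-days assumption, with names I've made up for the purpose.

```python
import math

P_STRIKE = 0.0001                 # daily strike chance from the article
MEAN_GAP = round(1 / P_STRIKE)    # average days between strikes: 10,000


def prob_gap_at_most(days: int, p: float = P_STRIKE) -> float:
    """Chance that the next strike arrives within `days` days."""
    return 1 - (1 - p) ** days


# Smallest number of days by which the next strike is at least 50% likely.
median_gap = math.ceil(math.log(0.5) / math.log(1 - P_STRIKE))

print(f"gaps no longer than the {MEAN_GAP}-day average: "
      f"{prob_gap_at_most(MEAN_GAP):.1%}")
print(f"median gap: {median_gap} days")
```

Roughly 63 per cent of gaps come out shorter than the average, and the halfway point is a bit under 7,000 days. Lots of short gaps, balanced by a few very long ones, is exactly what a cluster looks like.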

We are surrounded at all times by things that have a somewhat random distribution in space and/or time. Computer hardware failures. Car crashes. Disease outbreaks. The distribution of stars in the sky. Individual kills, and personal and team victories, in all sorts of games, sports and real-world wars.

None of these things are entirely random - actually achieving true, robust randomness is surprisingly difficult. But all of them have a chance component. And the stronger that chance component is, the more clusters you'll see, and the easier it'll be to incorrectly attribute those clusters to some non-chance phenomenon.

"This "xxx_SupaFly69_xxx" dude's fragged me three times in a row! He must be stalking me!"

"Stars seem to clump together into constellations, rather than appear in a more evenly "random" scattering across the sky! That must mean something!"

"I've told the dealer to hit me on the last three hands, and each time that took me straight to 21! Better raise my bets, I've got hot hands tonight!"

(Note: This is exactly what you should do, but only if you're playing New Vegas and have a really high Luck.)

"My usually-hopeless favourite football team just had a four-match winning streak! Clearly their luck has turned around!"

"The average incidence of autism in the Western world is about six cases per thousand children, but this little town of 1,037 people has seventeen cases! This cannot possibly just be a coincidence!"

"That fellow charting where the V-1s and V-2s land on London says they're just following a "Poisson distribution", whatever that is, but if he knows where they're going to land, why won't he tell us?!"

"In the six hours I've spent peering at this roulette table and making notes, I'm pretty darn sure I've discovered some patterns!"

(And yes, in the past, you actually might have done. Today, though, not so much. Casinos love note-takers, because they usually sooner or later come up with a bold new gambling system, or just reinvent an old one, and then lose all of their money.)

I just went to random.org, which delivers high-quality random numbers, created from atmospheric radio noise, and asked it for eight random bytes, expressed in binary as 64 zeroes and ones.

What'd I get?

1111000010101110011001000100100111101111010011111011100010100010, that's what.

The first byte is 11110000. Doesn't look very damn random, does it? Byte five is 11101111. There are five runs of three, four runs of four and one run of five repeated digits in just these 64 bits.

I did it again. This time, I got 1111110111101111010000001011010110111111101001110010100001010000. Only one zero in the first byte. One run of three, four runs of four, two runs of six and one run of seven repeated bits, for pity's sake.
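
Counting runs by eye is error-prone, so here's a small Python sketch that does the tally for both strings. The helper name is my own invention; it simply groups consecutive identical characters and counts how long each group is.

```python
from collections import Counter
from itertools import groupby


def run_lengths(bits: str) -> Counter:
    """Map each run length to how many runs of exactly that length occur."""
    return Counter(len(list(group)) for _, group in groupby(bits))


first = "1111000010101110011001000100100111101111010011111011100010100010"
second = "1111110111101111010000001011010110111111101001110010100001010000"

for name, bits in (("first", first), ("second", second)):
    tally = run_lengths(bits)
    # Only report runs of three or more repeated bits.
    long_runs = {length: tally[length] for length in sorted(tally) if length >= 3}
    print(name, long_runs)
```

The tallies agree with the counts above: five threes, four fours and one five for the first string, and one three, four fours, two sixes and one seven for the second.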

But these strings of bits really are robustly random. Random.org is not running some perverse scam.
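
You can work out exactly how often a batch of fair random bits should contain a long run, without trusting anybody's generator. The following is a sketch of one way to do it, by counting all the bit strings that avoid long runs. The function name is hypothetical, and it assumes each bit is an independent fair coin flip.

```python
def prob_run_at_least(n_bits: int, run_len: int) -> float:
    """Chance that n_bits fair bits contain a run of >= run_len equal bits."""
    if n_bits < 1 or run_len < 1:
        raise ValueError("n_bits and run_len must both be at least 1")
    if run_len == 1:
        return 1.0                      # any single bit is already a run of one
    # counts[r] = number of strings so far that contain no long run and
    # whose final run has length r (only indices 1 .. run_len - 1 are used).
    counts = [0] * run_len
    counts[1] = 2                       # the one-bit strings "0" and "1"
    for _ in range(n_bits - 1):
        new = [0] * run_len
        new[1] = sum(counts)            # next bit differs: a fresh run of one
        for r in range(1, run_len - 1):
            new[r + 1] = counts[r]      # next bit matches: the run grows by one
        counts = new
    return 1 - sum(counts) / 2 ** n_bits


for length in (4, 5, 6, 7):
    print(f"run of {length}+ in 64 bits: {prob_run_at_least(64, length):.3f}")
```

For 64 bits, a run of five or more turns up in well over half of all possible strings, so the output above is nothing special.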

Ask random.org to pick numbers from one to ten and, over time, you'll get every number about the same number of times (though not exactly the same number of times, any more than it's reasonable to expect 100 tosses of a coin to give you exactly fifty heads and fifty tails).
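
Here's a quick Python sketch of both halves of that claim. It is not talking to random.org; Python's built-in pseudo-random generator stands in for it, and the seed and the number of draws are arbitrary choices of mine.

```python
import math
import random
from collections import Counter

# Exact chance of getting exactly 50 heads in 100 fair coin tosses.
p_fifty_fifty = math.comb(100, 50) / 2 ** 100
print(f"exactly 50 heads in 100 tosses: {p_fifty_fifty:.4f}")

# Stand-in for random.org: Python's pseudo-random generator, fixed seed.
rng = random.Random(7)
tally = Counter(rng.randint(1, 10) for _ in range(10_000))
for number in range(1, 11):
    print(f"{number:>2}: {tally[number]}")
```

The perfectly even fifty-fifty split only happens about eight per cent of the time. The ten tallies should each land somewhere near 1,000, without any of them being obliged to hit it exactly.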

Ask people to choose a number from one to ten, and especially if you make sure to specifically tell them to pick a random number from one to ten, almost no-one will choose one or ten. Of the other numbers, humans (in Western nations at least) have a tendency to pick seven.

(If asked for a "random" number from 1 to 20, a surprising number of people will pick 17.)

If your random-number generator gives you nothing but nines then, yes, there probably is something wrong with it. But clusters that seem far too common to the human mind are actually an indicator that something really is random.

So the next time a boss drops crap loot in four consecutive raids, or three of your friends all have their hard drive fail in the same week, bear in mind that this certainly may indicate something fishy is going on. But it can just as easily be a complete fluke.

And now you can prove that mathematically!
