Really Big Data

I always dread the question "What do you do?"  On the surface, it would seem like I should be able to answer that question pretty easily.  But like everyone else, I'm a delicate snowflake of interests and abilities.  Picking a single five-second snippet is seldom satisfying.

Depending on whom I am addressing, I might pick any of the following:

"I run a software company"

"I, um, well, that's a good question..."

"I'm a witty raconteur, a man-about-town and a man of action."

"I've got nothing to hide.  What do you do?"

"I'm b-b-b-b-b-bad to the bone"

None of that answers the question, really, and beyond the first answer, nobody cares.  But for a middle-aged white guy, "what do you do" is beyond small talk.  It is the existential question.  At the most basic level, I make copies of myself.  That's the principle of The Selfish Gene.  But now that that's done (I have two sons), the universe is pretty much done with me, so it's programmed my eventual destruction.  Whatever.

I'm a set of algorithms.  And so are you.  Our atomic structure is governed by the physical laws of the universe.  Our DNA is a computer program that arranges the molecules that define how our kidneys, sphincters and sacroiliacs work.  On top of that, electrochemical signals give us the illusion of free will and consciousness and our tendency toward solipsism.  We are made of information.

What I do is an aggregation of all these processes.  What I do is defined by literally trillions of data points being processed in parallel by me and by the machines and people around me.  The act of writing this blog posting is made possible by the fact that I am not suffering from an aneurysm and my basement is not being overrun by wolverines.  Through environment and genes, I've built up the ability to use Squarespace and type pretty fast.  Most of it is pure chance and probability.

Compared with the rest of the universe, all this data is pretty paltry, but it is still a metric shit-ton of data.

The meme du jour in computer science is this concept of "Big Data".  You can't swing a dead cat without slamming into a software vendor who claims to handle "big data" better than everybody else.  But it's really just another marketing buzzword, and the hype is starting to fade.  Like "cloud" a few years ago, big data is not going away, but it will be replaced soon by the next fad that everybody will claim to have been doing for the past 10 years.   Gosh, I hope it's "Data Search" software, 'cause that's what Epinomy is.

Speaking of dead cats, the amount of data stored in your average house cat is staggering.  Trillions of atoms and molecules expressing proteins that build the cat.  The cat's brain is a miasma of mouse-hunting, catnip-loving, chair-rubbing hormones and electrical signals conspiring with one evolutionary purpose: to make more cats.  If you were able to attach signal monitors to every single process happening in that cat, you would have more data in 10 minutes than the entire Internet has produced in the past 30 years.  Granted, most of that data would be completely useless.  But imagine having enough data to know exactly how frustrated your cat is when he can't murder you in your sleep.  The psychosomatic processes that feed that seething rage are, by themselves, enough to fill a Wikipedia's worth of data in a few minutes.

And that's just your average Felis catus domesticus.  Imagine your eccentric uncle with the neon beer sign collection in his basement.  Those are some complex signals ripe for capturing.  But we're not even close.  All the MRI, EKG, EEG and other TLA diagnostic instruments do not capture even a tiny fraction of the data that is generated internally by your average toll booth attendant at 3:00 AM on Sunday.

When we say "Big Data" nowadays, what we really mean is "Now that we have lots of cheap storage, memory and processing, we can be a lot more reckless with the trade-offs of space and performance."  There *is* a brave new world opening up with increased processing power, but like artificial intelligence, it has been 5 years away for the past 60 years and will be 5 years away for the foreseeable future.

I bet you thought I was going to say "nobody can handle that kind of data, except of course Epinomy".   Nope.   We can handle some humongous data sets, but nothing of the scale that I'm talking about here.  Maybe we should start calling that something other than "big data". 

I propose "humongodata".