Big Data 1997
Another argument for the Ducks
From the archives
Dateline February 2016
I just came across this information on the interwebz. The past's vision of the future always looks awkward from the POV of the actual current future. These are all applications I could pretty much build single-handedly on Full360's Data Platform in AWS.
The Winter Corporation, a consulting firm specializing in large-scale databases, data warehousing, and strategic information management, conducts a yearly survey of large corporations and then announces winners in a number of very large database (VLDB) categories. In 1997, the largest database was Knight Ridder's DIALOG, a text database, with 7 terabytes (seven thousand gigabytes) of storage. Copies of databases held in redundant array of independent disks (RAID) systems and at mirror sites were not counted in the survey.
Our table summarizes their results for 1997:
I ran like hell away from raised floors and mainframe computers when I came out of college in the mid-'80s, but they held dominance over UNIX systems for a dozen years afterward when it came to big data. I was always impressed with Britton Lee, which later became part of Teradata, but there was always a huge gap between the having of such data and the presentation of it to the minds that would make sense of it. Nothing is so foolish as Batman in his cave waiting for a printout from the Bat Computer, no matter how much the Boy Wonder assents. Caped data crusaders always get snagged in the real world when faced with Riddlers and Jokers.
So I'm saying that it is the collaboration of minds with access to large data sets that makes the big difference, not just data scientists sitting in their expensive hideouts plotting world domination. That's what's new and possible now that the cost of computing has come down: the ability to spread seven terabytes among hundreds or even thousands of analysts makes for a new dynamic in decision making.
Dateline December 2024
Bigger isn’t always better. But the real question is how the marginal increase in infrastructure complexity and compute cost needed to process truly big data compares with the marginal improvement in the decisions possible with x amount of data. I know this intuitively, but I have yet to do the kind of sampling and PCA that would prove my point.
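Here’s a minimal sketch of that experiment in Python, under stated assumptions: the data is synthetic (a wide table driven by a handful of underlying factors plus noise), and the 95% variance threshold is a placeholder rather than a real benchmark. What it illustrates is the intuition above: the variance structure you’d act on stabilizes at a tiny fraction of the rows.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for a wide table: 200 columns, but only ~10
# real underlying factors, plus noise. Real data would come from
# the warehouse, not a random generator.
n_rows, n_cols, true_rank = 100_000, 200, 10
factors = rng.standard_normal((n_rows, true_rank))
loadings = rng.standard_normal((true_rank, n_cols))
X = factors @ loadings + 0.5 * rng.standard_normal((n_rows, n_cols))

# How quickly does the variance picture stabilize as the sample grows?
for frac in (0.001, 0.01, 0.1):
    n = int(n_rows * frac)
    sample = X[rng.choice(n_rows, size=n, replace=False)]
    pca = PCA(n_components=0.95, svd_solver="full")  # keep 95% of variance
    pca.fit(sample)
    print(f"{frac:6.1%} of rows ({n:>6,}): "
          f"{pca.n_components_} components explain 95% of variance")
```

If a 0.1% sample and a 10% sample tell the same story, the extra infrastructure needed to churn through the rest is cost without decision value.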
Back in the day (2016), when I wrote this, our motto at Full 360 was Big, Wide, & Fast Data. Not just big data. You should have seen the look on my face when I imported a Stata file with 6,300 columns (as variables) into DuckDB. Fast, we can take for granted.
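That import was roughly the following shape. A hedged sketch, assuming a placeholder file name; as far as I know DuckDB has no native Stata reader, so pandas does the decoding before DuckDB takes over.

```python
import duckdb
import pandas as pd

# Placeholder path: the real file was a wide extract with
# ~6,300 Stata variables, one per column.
df = pd.read_stata("survey_6300_cols.dta")
print(df.shape)  # (rows, ~6300)

con = duckdb.connect("warehouse.duckdb")
# DuckDB's Python client can scan the in-memory DataFrame by name.
con.execute("CREATE OR REPLACE TABLE survey AS SELECT * FROM df")
print(con.execute("SELECT count(*) FROM survey").fetchone())
```

Once it lands, 6,300 columns are just another table to query; wide stops being scary.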
I’m ready for the regime of rearchitecting oversized data. We shall see.