I think it's partly tongue-in-cheek. When "big data" was overhyped, everyone claimed they were working with big data, or tried to sell expensive solutions for it, and some reasonable minds spoke up and pointed out that a standard laptop could process more "big data" than people thought.
> For our first experiment, we used ClickBench, an analytical database benchmark. ClickBench has 43 queries that focus on aggregation and filtering operations. The operations run on a single wide table with 100M rows, which uses about 14 GB when serialized to Parquet and 75 GB when stored in CSV format.
Processing data that cannot fit on a single machine is a fundamentally different problem from processing data that can. It's useful to have a term for that.
As you say, single machines can scale up incredibly far. That just means 16 TB datasets no longer demand big data solutions.
I get your point, but I don’t know if big data is the right term anymore.
Many people like to think they have big data, and you kinda have to agree with them if you want their money. At least in consulting.
Also, you could go well beyond a 16 TB dataset on a single machine. You're assuming the whole uncompressed dataset has to fit in memory, but many workloads don't need that.
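To illustrate the point: many aggregation workloads can stream through a file row by row, keeping only a running aggregate in memory. Here's a minimal sketch using only the standard library; the file path and `value` column name are hypothetical examples, not anything from the article.

```python
import csv

def streaming_sum(path, column):
    """Sum a numeric column without loading the file into memory.

    Memory use is O(1) in the dataset size: only the current row
    and the running totals are held at any time, so the dataset
    can be far larger than RAM.
    """
    total = 0.0
    count = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            total += float(row[column])
            count += 1
    return total, count
```

The same pattern (stream, aggregate, discard) is what lets single-node tools like DuckDB or `awk` chew through datasets much bigger than memory, as long as the query doesn't need the whole dataset resident at once.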
How many people in the world have such big datasets to analyse within a reasonable time?
I think the definition of big is smaller than that. Mine was "too big to fit on a maxed-out laptop", effectively >8 TB. Our photo collection is bigger than that, but it's not 'big data'.
Or one could define it as too big to fit on a single SSD/HDD, maybe >30 TB. Still within the reach of a hobbyist, but too large to process in memory, so it needs special tools to work with. It doesn't have to be petabyte scale to need 'big data' tooling.
I guess they’re using a different definition?