You probably need a database

When organizations talk about their data, they love to show off the tools they use to handle and wrangle it. You’ve probably heard terms like Hadoop, Spark, Shark, PostgreSQL, MySQL, MongoDB, and, rarely, Excel. (If you haven’t, there’s a good list to look up on Wikipedia.)

Taming data certainly takes good tools, but I will argue that which tools you need depends on the scale of your data.

I like to think of the following rough categories of data scale:

  • Small data – dataset fits in RAM (anywhere from 1 MB to 8 GB)
  • Medium data – dataset fits on a single hard drive (8 GB to 1 TB)
  • Big data – dataset takes multiple hard drives to store (anything above 1 TB)
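To make the cutoffs concrete, here is a minimal sketch that sorts a dataset into these buckets by its size in bytes. The function name `data_scale` is hypothetical, and the 8 GB and 1 TB thresholds are simply the rough ones from the list above; your own RAM and disk sizes may move them.

```python
# Rough thresholds from the list above (binary units); adjust for your hardware.
GB = 1024 ** 3
TB = 1024 ** 4


def data_scale(size_bytes: int) -> str:
    """Return 'small', 'medium', or 'big' for a dataset of the given size."""
    if size_bytes <= 8 * GB:
        return "small"   # fits in RAM
    if size_bytes <= 1 * TB:
        return "medium"  # fits on a single hard drive
    return "big"         # needs multiple hard drives


if __name__ == "__main__":
    for size in (500 * 1024 ** 2, 200 * GB, 5 * TB):
        print(size, data_scale(size))
```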

Now, I’m a big believer in the Pareto principle, which leads me to expect that of all the tech companies out there, only about 20% (or fewer) need tools suited for big data. Here’s a look at some counts from Indeed.com that roughly confirm that relationship:

  • Spark – 8,701
  • Hadoop – 13,723
  • Oracle Database – 27,177
  • MySQL – 21,770
  • PostgreSQL – 4,285
  • Microsoft Access – 67,538
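As a quick check of the arithmetic, the sketch below adds up the counts above and computes what share goes to the big data tools. Treating Spark and Hadoop as the "big data" group is my reading of the list, not something the counts themselves state.

```python
# Counts copied from the list above.
counts = {
    "Spark": 8_701,
    "Hadoop": 13_723,
    "Oracle Database": 27_177,
    "MySQL": 21_770,
    "PostgreSQL": 4_285,
    "Microsoft Access": 67_538,
}

# Assumption: Spark and Hadoop stand in for "tools suited for big data".
big_data_tools = {"Spark", "Hadoop"}

total = sum(counts.values())
big = sum(n for tool, n in counts.items() if tool in big_data_tools)
share = big / total

print(f"{big} of {total} = {share:.1%}")  # 22424 of 143194 = 15.7%
```

That comes out a bit under 16%, which sits inside the "20% or fewer" range.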

So what does that mean for the tools you adopt? First, outgrowing Excel, Python, PHP, R, or your machine’s memory doesn’t mean it’s time to adopt Hadoop and hire a team to set it up. It means you should look into using something like a relational database to interact with and investigate your data. Ideally you’re already thinking about how to transform your data into something like a spreadsheet anyway, which makes an RDBMS a natural fit.
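Here is a minimal sketch of what that looks like in practice: spreadsheet-like rows go into a table, and the database does the aggregation instead of your own in-memory code. It uses SQLite through Python’s built-in `sqlite3` module only because it needs no setup; it is a stand-in for whichever RDBMS you pick. The `orders` table and its columns are made up for illustration.

```python
import sqlite3

# ":memory:" keeps the demo self-contained; for data that outgrows RAM,
# pass a file path instead so the database lives on disk.
conn = sqlite3.connect(":memory:")

# One table, shaped like a spreadsheet: a row per record, a column per field.
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
rows = [("alice", 20.0), ("bob", 35.5), ("alice", 12.5)]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
conn.commit()

# Let the database group and sum rather than looping over rows yourself.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('alice', 32.5), ('bob', 35.5)]

conn.close()
```

The same SQL would run largely unchanged against MySQL or PostgreSQL; mostly the connection code differs.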

Of the four relational databases listed above, two (MySQL and PostgreSQL) are free, so the only costs you’d incur are the machines to host one and the time to set it up. The other main reason to start there is that someone on your team or in your company likely already knows how to use one.

All that said, there’s definitely a place for tools like Hadoop, but it’ll be very specific to your implementation and how your dataset is growing.
