If you’re like many IT professionals who’ve worked with large amounts of data, you’ve probably become immune to the phrase ‘big data’, mostly because its meaning can vary so wildly.
Processing ‘big data’ can seem out of reach for many organizations, either because of the infrastructure costs required to establish a foothold or because of a lack of organizational expertise. And since the meaning of ‘big data’ varies so much, you may find yourself doing ‘big data’ work and then asking, “Is this big data?” Or an observer may call something ‘big data’ when you know full well that it isn’t.
With my own background in data, I’m always curious about tools that make the threshold into ‘big data’ seem less insurmountable. I’m also interested in the security considerations around these solutions.
In the last week or so, I’ve gotten more familiar with Amazon S3 buckets and a query service called Amazon Athena. Here’s the truly amazing thing: you can simply drop files in an S3 bucket and query them straight from Athena. (There are a couple of steps to go through, such as defining a table that describes the layout of the files, but they are mostly trivial.) And for the most part, there’s not much of a limit on how much data you can query and analyze. Athena charges by the amount of data scanned, and you can scan 1 TB of data for $5. What? That’s right. And you didn’t have to set up servers, database platforms, or any of that. I’ll be exploring Amazon Athena more over the coming weeks. If you have an interest in this sort of thing, I suggest you do the same.
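Because the bill is driven by bytes scanned rather than by servers, it’s easy to estimate what a query might cost. Here’s a minimal Python sketch, assuming the flat $5-per-terabyte rate quoted above and treating 1 TB as 10^12 bytes. Both are assumptions for illustration; actual Athena pricing can vary by region and has its own rounding rules, so check the current pricing page before relying on numbers like these.

```python
# Rough Athena cost estimator (illustrative sketch).
# Assumptions: a flat $5 per terabyte scanned, and 1 TB = 10**12 bytes.
# Real bills may differ by region, rounding, and per-query minimums.
PRICE_PER_TB_USD = 5.00
BYTES_PER_TB = 10**12


def estimate_scan_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the estimated cost in US dollars for scanning `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    # Cost scales linearly with the fraction of a terabyte that was scanned.
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    examples = [("1 TB", 10**12), ("100 GB", 100 * 10**9), ("2.5 TB", 25 * 10**11)]
    for label, size in examples:
        print(f"{label}: ${estimate_scan_cost(size):.2f}")
```

Run as a script, this prints an estimate of $5.00 for 1 TB, $0.50 for 100 GB, and $12.50 for 2.5 TB. The practical takeaway is that anything reducing the bytes a query has to read (for example, querying only the files you actually need) lowers the cost proportionally.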
One note: Google has something similar called BigQuery, so that might be worth a look as well. I’ve explored BigQuery briefly, but I keep coming back to AWS services since AWS seems to be holding strong as a leader in emerging cloud technologies. But as we all know, the emerging technology landscape can change very quickly!