Welcome! If you are wondering where I am coming from, having just started up my new Big Data analytics-focused blog, have a look at my other blog here (MSSQLDUDE) and my background on LinkedIn (please connect in!).
I am going to keep this blog focused on musings, trends, technologies and techniques specific to business intelligence & analytics for Big Data. That is what I do at Razorfish and I want to use this blog as a way to provide more value back into the Big Data community.
Big Data has become an overloaded buzzword in recent years so I’ll define for you what you will find here in this blog:
- I generally categorize “Big Data” as data that is too large to manage with traditional RDBMS tools and techniques.
- This does not preclude traditional row-based database systems completely from the picture. However, Big Data would require a more of a complex database solution to effectively manage the massive amounts of data. This would typically require a data warehouse appliance scenario, including columnar data storage.
- While a database or data warehouse would be beneficial for analytics, a majority of the data in Big Data scenarios is unstructured in nature and not easily ETL’d into a relational model. Instead, MapReduce jobs can be run against data sitting in Hadoop, which is a much more effective way of managing and searching 100s of TBs and PBs of raw data.
- This is where NoSQL data stores are very useful in these scenarios as an alternative to RDBMS, and come in many different flavors. Some of the more popular are HBase, MongoDB, Cassandra and RavenDB. The output streams of MapReduce jobs can feed into those data stores, or into your data warehouse for analytics & BI.
Alright, that’s our starting point. We’ll see where things take us from here! Best, Mark