
The Role of “Big Databases” in Big Data

1 Oct

Big Data requires a Big Database, right?

First, let me explain what I mean by a “big database”. I’m referring to data warehouse appliances such as Oracle’s Exadata, Microsoft’s Parallel Data Warehouse or Teradata.

But then there are also “NoSQL” databases like MongoDB, Cassandra and DynamoDB that store key/value pairs or JSON document objects.

And then there are also column-oriented databases like Vertica and MPP-style databases like Aster Data and Netezza.

In the world of Big Data Analytics, you must serve your clients with extremely large, fine-grained data sets that can be quickly & easily traversed, queried, loaded and archived.

In practice, classic database configurations of shared SAN storage and SMP servers do not scale well to this degree. NoSQL databases are not always feasible either, because you may want to create, store and archive data at all grains and aggregations as well as run in-database analytics.

That leaves data warehouse appliances, column-oriented databases and MPP databases as the best targets for these data patterns. One more note: you could also perform aggregations and some analytics during data parsing and loading with tools like MapReduce. But I’ll go into that detail in another posting.

What I am finding is that many of the business leaders and decision-makers currently looking to Big Data solutions do not want to put a lot of resources and investment into traditional RDBMS configurations that require a large amount of care, feeding and maintenance. With Oracle & SQL Server, you will still have plenty of knobs to turn, indexes to tune and other settings to tweak.

In the big data analytics world, then, Massively Parallel Processing (MPP) databases are very popular. A database partitioned across worker nodes that can be load-balanced and extended by adding more capacity is an easy picture for a business decision-maker to hold in their head.
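To make that picture a little more concrete, here is a minimal Python sketch of the idea: rows are spread across worker nodes by hashing a distribution key, each node aggregates only its own slice, and the partial results are merged at the end. This is an illustration only, not how any particular MPP product works internally. The function names, the `customer` and `amount` columns, and the choice of CRC32 as the hash are all assumptions made up for the example.

```python
from collections import defaultdict
from zlib import crc32


def node_for(key: str, num_nodes: int) -> int:
    """Pick a worker node by hashing the distribution key (hypothetical scheme)."""
    return crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes):
    """Spread rows across worker nodes using the 'customer' column as the key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[node_for(row["customer"], num_nodes)].append(row)
    return nodes


def local_totals(node_rows):
    """The work one node does on its own slice: sum 'amount' per customer."""
    totals = defaultdict(float)
    for row in node_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


def parallel_totals(nodes):
    """Merge the per-node partial results (run sequentially here for simplicity)."""
    merged = defaultdict(float)
    for node_rows in nodes:
        for customer, total in local_totals(node_rows).items():
            merged[customer] += total
    return dict(merged)


rows = [
    {"customer": "acme", "amount": 10.0},
    {"customer": "globex", "amount": 5.0},
    {"customer": "acme", "amount": 20.0},
]
print(parallel_totals(distribute(rows, num_nodes=4)))
```

Adding capacity in this toy model just means calling `distribute` with a larger `num_nodes`. In a real system that step also means moving existing data between nodes, which is part of the management complexity to weigh.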

Whether that is the best fit for you or not takes a lot of analysis and examination of all of those data store options. I would even say to be leery of database vendors over-selling the MPP option unless you have fully accounted for the additional complexities involved in managing a fully distributed, sharded database.

Br, Mark

A Starting Point

3 Sep

Welcome! If you are wondering where I am coming from, having just started up my new Big Data analytics-focused blog, have a look at my other blog here (MSSQLDUDE) and my background on LinkedIn (please connect in!).

I am going to keep this blog focused on musings, trends, technologies and techniques specific to business intelligence & analytics for Big Data. That is what I do at Razorfish and I want to use this blog as a way to provide more value back into the Big Data community.

Big Data has become an overloaded buzzword in recent years so I’ll define for you what you will find here in this blog:

  1. I generally categorize “Big Data” as data that is too large to manage with traditional RDBMS tools and techniques.
  2. This does not completely exclude traditional row-based database systems from the picture. However, Big Data requires a more complex database solution to effectively manage the massive amounts of data. This would typically mean a data warehouse appliance scenario, including columnar data storage.
  3. While a database or data warehouse would be beneficial for analytics, a majority of the data in Big Data scenarios is unstructured in nature and not easily ETL’d into a relational model. Instead, MapReduce jobs can be run against data sitting in Hadoop, which is a much more effective way of managing and searching 100s of TBs and PBs of raw data.
  4. This is where NoSQL data stores are very useful as an alternative to an RDBMS, and they come in many different flavors. Some of the more popular are HBase, MongoDB, Cassandra and RavenDB. The output streams of MapReduce jobs can feed into those data stores, or into your data warehouse for analytics & BI.
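
As a rough illustration of points 3 and 4, here is a minimal in-memory Python sketch of the MapReduce pattern: a map step emits key/value pairs from raw text lines, a shuffle step groups them by key, and a reduce step aggregates each group. It does not use Hadoop or any real framework API. The function names are made up for the example, and the plain dict it returns only stands in for a key/value store or a warehouse load.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one raw input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# The resulting dict stands in for the downstream key/value store.
word_counts = run_job(["big data", "Big database"])
print(word_counts)
```

On a real cluster the map and reduce steps run in parallel on many machines close to where the data sits, which is what makes the approach practical at 100s of TBs.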

Alright, that’s our starting point. We’ll see where things take us from here! Best, Mark
