I’ve been closely watching the marketing evolution of the classic database and data warehouse products over the past two years with great interest. Now that Big Data is top of mind for most CIOs in corporations around the globe, traditional data vendors like IBM, Oracle, Teradata and Microsoft are referring to their platforms as “Big Data” or “Big Data Warehouses”.
I guess, in the final analysis, this is really an attempt by data vendors to shift perceptions and steer CIO thinking about Big Data away from Apache Hadoop, Cloudera and Hortonworks and toward their own platforms. Certainly, there are some changes taking place in those traditional data warehouse platforms (MPP, in-memory, columnstore) that are important for workloads that are classic “Big Data” use cases: clickstream analysis, big data analytics, log analytics, risk modeling … And most of those vendors will even tack on a version of Hadoop with their databases!
But this is not necessarily breaking new ground or an inflection point in terms of technologies. Teradata pioneered MPP decades ago; Oracle led the way with smart caching and proved (once again) that the infamous bottleneck in databases is I/O. Columnar databases like Vertica proved their worth in this space, which led Microsoft and Oracle to adopt those technologies, while Aster Data led with MapReduce-style distributed UDFs and analytics, which Teradata simply bought outright.
In other words, the titans in the data market finally felt enough pressure from their core target audiences, as Hadoop came out of the shadows and Silicon Valley to threaten their data warehouse market share, that you will now hear these sorts of slogans from traditional data warehouse vendors:
Oracle: http://www.oracle.com/us/technologies/big-data/index.html. Oracle lists different products for dealing with different “Big Data” problems: acquire, organize and analyze. The product page lists the Oracle Big Data Appliance, Exadata and Advanced Analytics as just a few products for those traditional data warehouse problems. Yikes.
Teradata: In the world of traditional DWs, Teradata is the Godfather and pioneered many of the concepts we talk about today for Big Data Analytics and Big Data DWs. But Aster Data is still a separate technology and technology group under Teradata, and they sometimes step on their own messaging by forcing their EDW database products into the same “Big Data” space as Aster Data: http://www.prnewswire.com/news-releases/latest-teradata-database-release-supports-big-data-and-the-convergence-of-advanced-analytics-105674593.html.
But the fact remains that “Hadoop” is still seen as synonymous with “Big Data”, even though the traditional DW platforms have been used in many of those same scenarios for decades. Hadoop has been seen as an alternative means of providing Big Data Analytics at a lower cost at scale. Just adding Hadoop to an Oracle Exadata installation, for example, doesn’t solve that problem for customers outside of the original NoSQL and Hadoop community: Yahoo, Google, Amazon, etc.
So what are your criteria for a database or data warehouse to qualify as a “Big Data Warehouse”? Here are a few that I use:
- MPP scale-out nodes
- Column-oriented compression and data stores
- Distributed programming framework (e.g., MapReduce)
- In-memory options
- Built-in analytics
- Parallel and fast-load data loading options
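The “distributed programming framework” criterion can be illustrated with a toy map/reduce-style aggregation. This is a hedged sketch using only the Python standard library and made-up clickstream partitions, not any vendor’s actual framework; a real MPP system would shard the data across nodes rather than across local processes:

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool

# Hypothetical clickstream data, pre-split into partitions.
PARTITIONS = [
    ["home", "search", "cart"],
    ["search", "search", "home"],
    ["cart", "checkout", "home"],
]

def map_partition(pages):
    """Map step: count page hits within one partition, in parallel."""
    return Counter(pages)

def merge_counts(a, b):
    """Reduce step: merge partial counts from two partitions."""
    return a + b

if __name__ == "__main__":
    with Pool(processes=3) as pool:
        partials = pool.map(map_partition, PARTITIONS)
    totals = reduce(merge_counts, partials)
    print(totals["home"])    # 3
    print(totals["search"])  # 3
```

The point of the criterion is that the programming model (map over partitions, then merge) stays the same whether the partitions live on one machine or a thousand.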
To me, the “pure-play” Big Data Analytics “warehouses” are Vertica (HP), Greenplum (EMC) and Aster (Teradata). But the next generation of platforms, with distributed access & programming improved beyond today’s MapReduce and Hive, will be Microsoft with PDW & PolyBase, Teradata’s appliance with Aster & SQL-H and, if you like Open Source Software, Cloudera’s Impala.
Mark, I like your list of ‘what makes big data = big data’ but perhaps, strictly speaking, if the column-oriented compression and data stores are simply a technical technique not visible to the user, does it matter how the vendor achieves the ability to handle large quantities? Or, as a practical matter, does every Big Data product use column-oriented compression, making this not a very interesting question?
Definitely this was a technically-oriented list, no doubt. However, without certain enabling technologies such as columnstore, column compression, distributed programming and in-memory databases, my experience has been that you will limit the analysis and responsiveness to the point where you are not experimenting with new data (i.e. data discovery) and data sets outside the range of the classic data warehouse.
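To see why column orientation enables compression even when it stays invisible to the user, consider a toy run-length encoding of a low-cardinality column. This sketch (illustrative only, with invented data, not any vendor’s implementation) shows that an aggregate can be answered directly from the compressed form without touching every row:

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]

def count_value(encoded, target):
    """Answer COUNT(*) WHERE col = target straight off the compressed column."""
    return sum(n for value, n in encoded if value == target)

region = ["US", "US", "US", "EU", "EU", "US", "US"]
encoded = rle_encode(region)
print(encoded)                      # [('US', 3), ('EU', 2), ('US', 2)]
print(count_value(encoded, "US"))   # 5
```

So the commenter is right that the technique is invisible, but it is exactly what keeps scans fast enough for the exploratory, data-discovery workloads described above.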
That’s why I like to see a more distinct understanding of what constitutes a “Big Data Warehouse” vs. a traditional “Data Warehouse”.
From a business perspective, a Big Data Analytics approach will give you better ROI and more value because you are working with new data sets and larger samples.