Archive | In-memory database

SQL Server Big Data Session Demo Files

12 May

Thanks to all who joined me at Penn State Abington on Saturday for the Philly Code Camp 2013.1! As promised, here are the supporting files that I used for the Big Data demos on Hadoop (Microsoft’s HDInsight). If you would like the slides, you can click over here on Slideshare for those. Best, Mark

This is the PowerPivot Excel file with the sample Power View reports that I built, using the Microsoft Hive ODBC driver to pull the data from Hadoop: icatab. BTW, ICA stands for “impressions, clicks, actions” and is based on a sample set of clickstream analytics that I generated with aggregated data for each month of the past 2 years. The idea is that you can use this data to simulate Big Data analytics with tools like PowerPivot, working from the kind of aggregated data that MapReduce and/or Hive jobs would generate:

[Images: ica, ica2]


This is the sample SSIS package that I created. It also uses the Hive table that I created in Hadoop (HDInsight) and again uses the ODBC driver as a source, with a simple transformation and a SQL Server destination. Use this technique as a better way of putting aggregated data from Hive queries into SQL Server for analysis instead of running a series of Hive commands directly or using Sqoop. I found that this ODBC / SSIS approach performs much better.
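For readers who want to see the shape of that source-to-destination flow outside of SSIS, here is a rough sketch in Python. It is not the package from the demo. It assumes two Python DB-API connections that accept `?` parameter markers (for example, connections opened with pyodbc against a Hive ODBC DSN and a SQL Server database), and the table and column names (`ica_monthly`, `month`, `impressions`, `clicks`, `actions`) are hypothetical. The demo function uses in-memory SQLite on both ends purely so the sketch runs anywhere.

```python
import sqlite3  # used only as a stand-in for the real ODBC connections


def copy_aggregates(src_conn, dst_conn, src_query, dst_table, columns, batch_size=1000):
    """Stream the rows of src_query into dst_table in batches; return rows copied."""
    placeholders = ", ".join("?" for _ in columns)
    insert_sql = (
        f"INSERT INTO {dst_table} ({', '.join(columns)}) VALUES ({placeholders})"
    )
    src_cur = src_conn.cursor()
    dst_cur = dst_conn.cursor()
    src_cur.execute(src_query)
    copied = 0
    while True:
        rows = src_cur.fetchmany(batch_size)
        if not rows:
            break
        # Convert driver-specific row objects to plain tuples before inserting.
        dst_cur.executemany(insert_sql, [tuple(row) for row in rows])
        copied += len(rows)
    dst_conn.commit()
    return copied


def demo():
    """Run the copy with SQLite standing in for Hive (source) and SQL Server (destination)."""
    ddl = (
        "CREATE TABLE ica_monthly "
        "(month TEXT, impressions INTEGER, clicks INTEGER, actions INTEGER)"
    )
    src = sqlite3.connect(":memory:")
    dst = sqlite3.connect(":memory:")
    src.execute(ddl)
    dst.execute(ddl)
    # Made-up aggregated rows, one per month.
    src.executemany(
        "INSERT INTO ica_monthly VALUES (?, ?, ?, ?)",
        [("2013-01", 1000, 50, 5), ("2013-02", 1200, 60, 6), ("2013-03", 900, 45, 4)],
    )
    src.commit()
    copied = copy_aggregates(
        src,
        dst,
        "SELECT month, impressions, clicks, actions FROM ica_monthly",
        "ica_monthly",
        ["month", "impressions", "clicks", "actions"],
        batch_size=2,
    )
    total_clicks = dst.execute("SELECT SUM(clicks) FROM ica_monthly").fetchone()[0]
    return copied, total_clicks


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the same as the SSIS package: let Hive do the heavy aggregation, then move only the small aggregated result set into SQL Server in batches.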

Did Big Data Kill OLAP Cubes?

19 Sep

Did Big Data Kill OLAP Cubes? Not yet, but very possibly soon.

Think about the traditional usage and purpose of OLAP cubes in terms of their predominant deployment today. In most cases, enterprises use cubes to aggregate and pre-process data from multiple data sources and/or a data warehouse to provide BI capabilities.

Many of these use cases are based upon data processing cycles that occur daily with large sets of data in bulk fashion. Well, that sounds quite a bit like Big Data requirements of processing large data sets in bulk fashion and then providing access to that post-processed data to analysts, scientists, etc.

So OLAP cubes are still clearly relevant and applicable in the Big Data world.

OLAP cubes provide value in a number of ways, including abstracting report queries away from the database and providing fast access to knowledge through techniques that include pre-aggregated, pre-built analytics in the cube. This is where OLAP cubes start to break down for Big Data use cases.

In Big Data use cases, we need to provide much more ad-hoc data exploration and self-service knowledge discovery. That makes it very difficult to build the analytics into the cube up front, based on requirements and assumptions. Even in the most “Agile” BI shops, this is a challenge.
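To make that trade-off concrete, here is a toy sketch in Python. It is not tied to any OLAP product, and the rows and dimension names are made up for illustration. The "cube" pre-aggregates clicks by the dimensions its designers anticipated, so those questions become a simple lookup. A question on a dimension that was left out of the cube has to fall back to scanning the detail rows.

```python
from collections import defaultdict

# Made-up detail rows standing in for clickstream data.
ROWS = [
    {"month": "2013-01", "region": "East", "device": "mobile", "clicks": 50},
    {"month": "2013-01", "region": "West", "device": "desktop", "clicks": 30},
    {"month": "2013-02", "region": "East", "device": "desktop", "clicks": 40},
    {"month": "2013-02", "region": "West", "device": "mobile", "clicks": 20},
]


def build_cube(rows, dims, measure):
    """Pre-aggregate a measure by a fixed set of dimensions chosen up front."""
    cube = defaultdict(int)
    for row in rows:
        cube[tuple(row[d] for d in dims)] += row[measure]
    return dict(cube)


def ad_hoc_total(rows, measure, **filters):
    """Answer an unanticipated question by scanning every detail row."""
    return sum(
        row[measure]
        for row in rows
        if all(row[key] == value for key, value in filters.items())
    )


# The cube designers anticipated questions by month and region only.
CUBE = build_cube(ROWS, dims=("month", "region"), measure="clicks")

# Anticipated question: a direct lookup in the pre-built aggregates.
anticipated = CUBE[("2013-01", "East")]

# Unanticipated question (by device): the cube has no such dimension,
# so the answer comes from scanning the detail rows instead.
unanticipated = ad_hoc_total(ROWS, "clicks", device="mobile")
```

With four rows the scan is trivial. At Big Data volumes, that scan is exactly the work that in-memory, MPP and columnar engines are built to make fast.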

This is where in-memory technologies, MPP and columnar databases become key enablers in the BI stack for Big Data. I’m writing a few new posts for SQL Server Pro mag and MSSQLDUDE that I’ll link to here to explain this in more technical terms over the next few days. Back here in Big Data Analytics, I’ll talk about generic MPP techniques.

For now, be prepared to hear the BI and database industry talk about maximizing in-memory cubes & databases for BI & reporting purposes, replacing OLAP cubes.

This does NOT preclude the need for semantic modeling and abstraction layers. And OLAP cubes still play a very important role in specific use cases that do not involve a large volume of ad-hoc queries.

However, Big Data architects do need to think about solving the traditional BI problems in a different way.

