Thanks to all who joined me at Penn State Abington on Saturday for the Philly Code Camp 2013.1! As promised, here are the supporting files that I used for the Big Data demos on Hadoop (Microsoft’s HDInsight). If you would like the slides, you can click over here on Slideshare for those. Best, Mark
This is the PowerPivot Excel file with the sample reports that I used to create the Power View reports, using the Microsoft Hive ODBC driver to pull the data from Hadoop: icatab. BTW, ICA stands for “impressions, clicks, actions” and is based on a sample set of clickstream analytics that I generated with aggregated data from each month of the past 2 years. The idea is that you can use this data to simulate Big Data Analytics with tools like PowerPivot, working from the kind of aggregated data that would be generated by MapReduce and/or Hive.
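If you want to build a similar data set yourself, here is a minimal Python sketch of how monthly ICA aggregates could be generated. This is not the script behind the icatab file. The function name `monthly_ica`, the value ranges, and the click and action rates are all made up for illustration.

```python
import random
from datetime import date


def monthly_ica(end, months=24, seed=0):
    """Return one (month, impressions, clicks, actions) row per month.

    All numbers are synthetic; the ranges below are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    rows = []
    year, month = end.year, end.month
    for _ in range(months):
        impressions = rng.randint(500_000, 2_000_000)
        # Clicks are a small fraction of impressions, actions a small fraction of clicks.
        clicks = int(impressions * rng.uniform(0.005, 0.03))
        actions = int(clicks * rng.uniform(0.02, 0.10))
        rows.append((f"{year:04d}-{month:02d}", impressions, clicks, actions))
        # Step back one calendar month.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    rows.reverse()  # oldest month first
    return rows


if __name__ == "__main__":
    for row in monthly_ica(date(2013, 4, 1)):
        print(",".join(str(value) for value in row))
```

The output is one comma-separated line per month, which is a convenient shape for loading into a Hive table or straight into PowerPivot.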
This is the sample SSIS package that I created. It also uses the Hive table that I created in Hadoop (HDInsight), again with the ODBC driver as a source, followed by a simple transformation and a SQL Server destination: http://sdrv.ms/15X1mky. Use this technique to put aggregated data from Hive queries into SQL Server for analysis, instead of running a series of Hive commands directly or using Sqoop. I found that this ODBC / SSIS approach performs much better.
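For anyone who wants to see the same source-to-destination flow outside of SSIS, here is a rough Python sketch. It is not the SSIS package, and I have not benchmarked it. It works with any pair of DB-API connections that use `?` parameter markers, so in practice the source could be a connection opened through the Hive ODBC driver with a library such as pyodbc, and the destination a SQL Server connection. The table name `ica_monthly` and its columns are assumptions for illustration, and the demo at the bottom uses in-memory SQLite databases as stand-ins so that it runs anywhere.

```python
import sqlite3


def copy_aggregates(src, dst, batch_size=1000):
    """Copy rows from src.ica_monthly to dst.ica_monthly in batches.

    src and dst are DB-API connections; ica_monthly is a hypothetical table.
    """
    read = src.cursor()
    write = dst.cursor()
    read.execute("SELECT month, impressions, clicks, actions FROM ica_monthly")
    copied = 0
    while True:
        batch = read.fetchmany(batch_size)
        if not batch:
            break
        # Convert driver-specific row objects to plain tuples before inserting.
        write.executemany(
            "INSERT INTO ica_monthly (month, impressions, clicks, actions) "
            "VALUES (?, ?, ?, ?)",
            [tuple(row) for row in batch],
        )
        copied += len(batch)
    dst.commit()
    return copied


if __name__ == "__main__":
    # Stand-ins for the Hive source and the SQL Server destination.
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    ddl = ("CREATE TABLE ica_monthly "
           "(month TEXT, impressions INTEGER, clicks INTEGER, actions INTEGER)")
    source.execute(ddl)
    target.execute(ddl)
    source.executemany(
        "INSERT INTO ica_monthly VALUES (?, ?, ?, ?)",
        [("2013-03", 1000000, 20000, 900), ("2013-04", 1200000, 25000, 1100)],
    )
    print(copy_aggregates(source, target), "rows copied")
```

Reading in batches with `fetchmany` keeps memory use flat even if the Hive query returns more rows than expected.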