Tag Archives: hadoop

Big Data Analytics Presentation for SQL Saturday Orlando

28 Sep

Thanks to all for joining my session on Big Data Analytics at Seminole State College in Sanford, FL for the SQL Saturday event. I’ve uploaded my slides to SlideShare here. Thanks again!  Best, Mark


Big Data – Lots of Data

5 Mar

I’ve spent most of my time blogging at my Big Data site here, focused on the business value-add aspect of Big Data: Big Data Analytics. Two points about Big Data come up consistently, and I want to emphasize them:

  1. Big Data as a practice does not just mean lots of data
  2. Eventually, all data can be seen as “Big Data”

That being said, we shouldn’t ignore the impact that large data sets and large data stores have on us as data professionals. This harkens back to the days of VLDBs, EDWs, Teradata, etc., where RDBMSs included techniques for dealing with the challenges of large databases: modifying schemas, backup/restore, read vs. write throughput and so on. I stick with my mantra that Big Data != NoSQL. That is, NoSQL has applications to Big Data problems, but NoSQL databases have varying origins and divergent purposes.

I have always seen the NoSQL movement as brought on by 3 primary drivers:

  1. Developers’ aversion to DBA work and the complexities of RDBMSs
  2. Internet social & search sites’ desire to not pay big $$ for large database systems
  3. The need for flexibility in schema and data even in very large data sets

#3 in my list above is addressed by the NoSQL databases that I’ve used: Cassandra, HBase & Dynamo. Big Data Analytics has additional requirements that go beyond these key/value & document stores, which are very good for inserting data but are not built for complex queries, aggregations, analytics, etc.

The major database vendors (MSFT, Oracle, Teradata, HP, EMC) are addressing these needs in their platforms in two ways. They are adding more & more in-memory & columnar capabilities to help eliminate IO bottlenecks, and they are including MapReduce functionality and other integrations with Hadoop tools so that analytics can be distributed across clusters, much as Cassandra distributes data stores.
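To make the columnar point concrete, here is a minimal sketch in Python, not any vendor’s actual implementation. It pivots a handful of made-up rows into columns, run-length encodes a repetitive column, and aggregates a single column without touching the others. The sample data and the helper names `to_columns` and `run_length_encode` are assumptions for illustration only.

```python
from itertools import groupby

# Illustrative rows only: (state, status, monthly_calls). Not real data.
rows = [
    ("FL", "churned", 10),
    ("FL", "churned", 12),
    ("FL", "active", 7),
    ("NY", "active", 7),
]

def to_columns(rows):
    """Pivot row-oriented tuples into one list per column."""
    return [list(col) for col in zip(*rows)]

def run_length_encode(column):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]

columns = to_columns(rows)
state_col, status_col, calls_col = columns

# Repetitive columns compress well once their values are stored together.
encoded_state = run_length_encode(state_col)

# An aggregate reads only the one column it needs, not every full row.
total_calls = sum(calls_col)

print(encoded_state)
print(total_calls)
```

Real columnar engines layer dictionary encoding, vectorized execution and much more on top of this, but the basic idea of reading and compressing one column at a time is the same.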

Bottom line: Your Big Data project will require a complete understanding of the NoSQL, Hadoop, MapReduce, DW and Analytics tool landscape. Many more tools are available to you as a data professional than the few I touched on briefly here. Each has its own strengths & weaknesses. The successful Big Data platforms that I’ve worked on to date have included some parts of all of those, so they are not mutually exclusive and they are not one-size-fits-all.

Big Data: Think in Terms of Business Problems

12 Feb

Big Data, or more specifically Big Data Analytics, helps solve business problems. These business problems include advanced customer analytics:

  • Customer segmentation for targeted marketing
  • Root-cause analysis of network problems
  • Data correlation for improved health care outcomes
  • Customer churn management
  • Advanced risk management

These are all problems that can be solved today with traditional data warehouse & business intelligence techniques. But advanced forms of these analyses, fed by additional complex & streaming data sources, provide additional business benefit on top of the already improved outcomes and marketing lift. This is the value that Big Data brings to your business.

And this is why I tend to focus on Big Data Analytics and why it is clearly an extension of business intelligence and data warehousing, not a replacement. Analytics provides root cause, correlation and data discovery that you cannot achieve with KPI-based balanced scorecards on a dashboard.

But you need to begin playing and experimenting with Big Data tools to break through the DW/BI barrier where you are currently boxed in with 10-20% organizational data asset reach and 8-hour ETL windows:

  • Hadoop for storing large & complex data files across distributed nodes
  • MapReduce to process those files on Hadoop with data locality and divide & conquer
  • NoSQL databases like Cassandra & HBase to write data into clusters quickly, beyond RDBMS boundaries
  • In-memory analytics for real-time drill-down and data discovery
  • Columnar data storage for max compression and analytical capabilities
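The divide & conquer idea behind MapReduce in the list above can be sketched in a few lines of plain Python. This is a single-process toy under stated assumptions, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample partitions are hypothetical, and each inner list stands in for a block of a file stored on a different node.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (key, value) pairs for one input record: here, (word, 1)."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def run_job(partitions):
    """Map each partition independently, then shuffle and reduce."""
    mapped = []
    for partition in partitions:      # on a real cluster, one mapper per block
        for record in partition:
            mapped.extend(map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input: two partitions standing in for blocks on two nodes.
partitions = [
    ["big data analytics", "big data"],
    ["data warehouse"],
]
counts = run_job(partitions)
print(counts)
```

On a real cluster the map step runs on the node that already holds each block, which is the data locality part; only the much smaller grouped output moves across the network to the reducers.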
