Agile Big Data Analytics: The Art of the Possible, part 1

19 Feb

Big Data is a misnomer. Too often, people immediately think of the deluge of data: the exabytes being created, data volumes doubling every year, and so on. The “volume” problem that Big Data presents is only a portion of the problem space. Focusing on the storage of that data pulls attention away from the business problems: marketing attribution, customer churn, improving outcomes, risk mitigation, etc.

Likewise, in the world of solution and product development, R&D teams should not get bogged down by data sizes. Keep your eyes on delivering incremental solutions to market in a way that adds value in iterations. In other words, Agile delivery is still possible, even in Big Data scenarios.

Ken Collier’s seminal book Agile Analytics did an outstanding job of translating the Agile Manifesto methods of software development to the traditional BI & DW project space, including coverage of ETL, data modeling, reports, testing, continuous integration, TDD, etc. Once you’ve read that book, you should feel confident that you can deliver products in an Agile way with business intelligence teams.

Now take those same concepts to the world of Big Data Analytics. We did this successfully last year (2012), taking a Big Data platform to market with on-shore & off-shore development teams in a mixed technology environment. We managed unstructured & structured data sets in which GBs of data changed daily and needed to be processed with analytical models and star schemas in the TBs.

The keys are not unlike those for other project types: buy-in from management, buy-in from the technical teams, strong leadership & Scrum Masters, and strong & engaged Product Owners were all critical to success from an organizational perspective.

From a technical perspective, things get a little bit different because many Big Data platform tools are monolithic in nature, not well integrated yet, and are very new to technical teams. But the same concepts can apply:

  1. Ensure that developers have clean, stripped-down environments for easy & quick development, i.e. don’t use complete copies of environments, which won’t work in Big Data scenarios.
  2. Practice CI of all code: ETL, MapReduce, analytical functions, scripts (Pig, Hive, etc.).
  3. Bring your data scientists into the Agile Scrum team environment and include their models as part of CI and Sprint testing tasks.
  4. Make sure data scientists and POs are in the Sprint reviews.
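
To make points 1 and 2 a bit more concrete, here is a minimal sketch in Python. It is not the code from our project. Every name in it (sample_records, map_events, reduce_counts, run_job) and the "customer_id,event" line format are assumptions made up for illustration. The idea is that a deterministic sample gives developers a small, repeatable data set, and the map and reduce logic is written as plain functions that a CI job can test without a cluster.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def sample_records(records: Iterable[str], percent: int) -> Iterator[str]:
    """Keep roughly `percent` percent of records, chosen by a stable hash.

    Hashing the record (instead of using random numbers) means every
    developer and every CI run gets the same stripped-down data set.
    """
    for record in records:
        digest = hashlib.md5(record.encode("utf-8")).hexdigest()
        if int(digest, 16) % 100 < percent:
            yield record


def map_events(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (customer_id, 1) for each well-formed line.

    Assumes a hypothetical 'customer_id,event' line format.
    Malformed lines are skipped.
    """
    parts = line.strip().split(",")
    if len(parts) == 2 and parts[0]:
        yield parts[0], 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reducer: sum the values emitted for each key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run the mapper and reducer in-process, as a CI test would."""
    pairs = (pair for line in lines for pair in map_events(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    tiny_input: List[str] = ["a,click", "b,view", "a,view", "malformed"]
    print(run_job(tiny_input))
```

The same functions could later be wrapped for whatever MapReduce runner the team uses. The point is only that the logic stays testable on a laptop-sized sample during the Sprint.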