Big Data is a misnomer. Too often, people think immediately of the deluge of data: the exabytes being created worldwide, data volumes doubling every year, and so on. But the “volume” problem that Big Data presents is only one part of the problem space, and focusing on how to store all that data pulls attention away from the business problems: marketing attribution, customer churn, improving outcomes, risk mitigation, etc.
Likewise, in solution and product development, R&D teams should not get bogged down by data sizes. They should keep their eyes on delivering incremental solutions to market in a way that adds value with each iteration. In other words, Agile delivery is still possible, even in Big Data scenarios.
Ken Collier’s seminal book Agile Analytics did an outstanding job of translating the Agile software development methods behind the Agile Manifesto to the traditional BI & DW project space, with coverage of ETL, data modeling, reports, testing, continuous integration, TDD, etc. Once you’ve read that book, you should feel confident that you can deliver products in an Agile way with business intelligence teams.
Those same concepts carry over to the world of Big Data Analytics. We did this successfully last year (2012) when taking a Big Data platform to market with on-shore & off-shore development teams, in a mixed technology environment, managing unstructured & structured data sets in which GBs of data changed daily and had to be processed by analytical models and loaded into star schemas in the TBs.
From an organizational perspective, the keys to success were the same as on other project types: buy-in from management, buy-in from the technical teams, strong leadership & Scrum Masters, and strong & engaged Product Owners (POs).
From a technical perspective, things are a little different, because many Big Data platform tools are monolithic in nature, not yet well integrated, and very new to technical teams. But the same concepts still apply:
- Ensure that developers have clean, stripped-down environments for quick & easy development. Don’t use complete copies of environments; that approach won’t work in Big Data scenarios.
- Practice CI of all code: ETL, MapReduce jobs, analytical functions, and scripts (Pig, Hive, etc.).
- Bring your data scientists into the Agile Scrum team environment and include their models as part of CI and Sprint testing tasks.
- Make sure data scientists and POs attend the Sprint reviews.
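The first two points, stripped-down developer environments and CI of all code, can be illustrated with a small Python sketch. It is only an assumption-laden example, not code from the project described here: `reservoir_sample` uses Algorithm R to draw a fixed-size, reproducible random sample from an input of any size, which is one way to give developers a small, representative data set instead of a full copy. `clean_event` is a hypothetical ETL transform, and its record fields (`customer_id`, `channel`, `amount`) are invented for illustration. The `test_` functions are the kind of fast checks a test runner such as pytest could execute on every commit.

```python
import random
from typing import Dict, Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: int = 42) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length
    (Algorithm R). A fixed seed keeps the dev data set reproducible."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)          # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                sample[j] = rec         # replace with probability k/(i+1)
    return sample


def clean_event(raw: Dict[str, str]) -> Dict[str, object]:
    """Hypothetical ETL step: normalize one raw clickstream record."""
    channel = raw.get("channel", "unknown").strip().lower() or "unknown"
    amount_text = raw.get("amount", "") or "0"   # treat missing/blank as 0
    return {
        "customer_id": raw["customer_id"].strip(),
        "channel": channel,
        "amount": round(float(amount_text), 2),
    }


# Tests a CI server could run on every commit (pytest discovers test_* names).
def test_clean_event_handles_missing_fields() -> None:
    out = clean_event({"customer_id": " c42 ", "amount": ""})
    assert out == {"customer_id": "c42", "channel": "unknown", "amount": 0.0}


def test_reservoir_sample_is_small_and_reproducible() -> None:
    stream = range(100_000)
    a = reservoir_sample(stream, 100, seed=7)
    b = reservoir_sample(stream, 100, seed=7)
    assert len(a) == 100 and a == b
    assert all(0 <= x < 100_000 for x in a)


if __name__ == "__main__":
    test_clean_event_handles_missing_fields()
    test_reservoir_sample_is_small_and_reproducible()
    print("all tests passed")
```

The same pattern extends to the data scientists’ models: if a model’s scoring function is packaged as a plain function, a test that scores a small, fixed sample can run in CI alongside the ETL tests.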