Archive | Agile Analytics

Tips for Editing Pentaho Auto-Generated OLAP Models

1 Feb

If you’ve followed some of my earlier tutorials here or here, where I described the process of auto-generating OLAP models with the Pentaho auto-modeler, you will have ended up with a basic multidimensional star schema that allows a basic level of customization, as shown here:


In most cases, that environment provides enough control to create a model that covers most of your analytical reporting needs. But if you want to build a more complex model, you can edit the underlying Mondrian schema XML directly in a file, or use the Pentaho Schema Workbench tool to build out snowflake schemas, custom calculations, Analyzer annotations, etc.


For direct XML editing of the multidimensional model, you can follow the Mondrian schema guide here.
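To give you a feel for what you’d be editing, here is a minimal, hypothetical Mondrian schema fragment in the style the auto-modeler generates. The cube, table and column names below are invented for the example; your generated schema will use the names from your own database:

```xml
<Schema name="SalesSchema">
  <Cube name="Sales">
    <!-- Fact table for the cube -->
    <Table name="fact_sales"/>
    <!-- A snowflaked dimension joined by foreign key -->
    <Dimension name="Customer" foreignKey="customer_key">
      <Hierarchy hasAll="true" primaryKey="customer_key">
        <Table name="dim_customer"/>
        <Level name="Customer Name" column="name" uniqueMembers="true"/>
      </Hierarchy>
    </Dimension>
    <!-- A simple summed measure with a display format -->
    <Measure name="Sales Amount" column="amount" aggregator="sum"
             formatString="#,###.00"/>
  </Cube>
</Schema>
```

Custom calculations, annotations and additional hierarchies are added as further elements inside the same structure, which is what makes direct XML editing (or PSW) more powerful than the thin modeler.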

To pull the Mondrian model out of these Data Source Wizard sources for editing, click the Export button on the Data Sources dialog box below:


If you use this method from the UI, you will download a ZIP file. Unzip it and save the “schema.xml” it contains to your local file system. You can then edit that file in Pentaho Schema Workbench (PSW) or in an XML editor, and import your changes back into the platform from that same Manage Data Sources dialog in the Web UI, or just publish it directly to your server from PSW:


Here’s another tip for pulling a Mondrian schema out of an auto-generated Data Source Wizard model, one I think is easier than exporting a ZIP: use the REST API call that extracts the XML schema directly. I downloaded curl on my Windows laptop to use as a command-line tool for calling Web Services APIs. Now I can make this REST call:

curl --user Admin:password http://localhost:8080/pentaho/plugin/data-access/api/datasource/analysis/foodmart/download > foodmart.xml

To make the above call work in your environment, change the “--user” credentials to your username:password, replace the hostname with your server, and replace “foodmart” with the name of the model you wish to modify. You can then edit the resulting file (foodmart.xml) in PSW or with an XML editor.
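If you run this download often, a small wrapper script keeps the moving parts in one place. This is just a sketch: only the URL pattern comes from the curl call above, and the variable values are placeholders you must change for your environment:

```shell
#!/bin/sh
# Download a Mondrian schema from the Pentaho BA server via the
# data-access REST API. Edit these placeholders for your environment.
USER="Admin"
PASS="password"
HOST="localhost:8080"
MODEL="foodmart"   # name of your Data Source Wizard model

URL="http://${HOST}/pentaho/plugin/data-access/api/datasource/analysis/${MODEL}/download"
curl --user "${USER}:${PASS}" "${URL}" > "${MODEL}.xml"
echo "Saved schema to ${MODEL}.xml"
```

The schema lands in foodmart.xml (or whatever your model is named), ready to open in PSW or an XML editor.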

Don’t forget to import the updated file back into the platform or Publish it from Schema Workbench so that users will then be able to build their reports from the new schema.

One last trick when re-importing or re-publishing an edited model that started life as a generated Data Source Wizard model: rename the model in PSW or in the XML file so that it appears as a new model in the Pentaho tools. This way, you avoid losing your updates if you later regenerate the model in the thin modeler from the Data Source Wizard.



Agile Analytics, Continued …

18 Sep

It’s been a while since I’ve posted here about the role of Agile Scrum in BI development and how you can successfully implement it in your BI solution development teams. For a review, check out my post from almost exactly one year ago here.

That was back when I was the BI Technology Director at Razorfish, where we took a Big Data Analytics platform to market for online advertising analytics. Since then, Razorfish & Atlas were sold to Facebook & Omnicom, and I moved back to the software vendor side of the business, with Pentaho.

In both cases, we built BI and Analytics solutions using Agile. At Pentaho it is a bit simpler to implement, because we are building Java applications with teams of developers who are already familiar with working in Agile teams and environments.

At Razorfish, we employed teams of BI specialists: ETL developers, data modelers, cube developers, report designers and data scientists. Not only are many of those roles not naturally comfortable in an Agile environment, many of these people had spent years of their careers working solo, not in teams at all.

You can read through my earlier post and Ken Collier’s excellent (and very short, quick-read) Agile Analytics, which I also referenced in that link. But today I wanted to highlight 2 important points I have run into in these environments when considering Agile Scrum for BI solutions, points which also came up many times during my time in Redmond with Microsoft:

  1. BI engineers are indeed developers. You do not have to write code in Java, .NET or C++ to be a “developer”. An engineer writing ETL and models is writing code, testing, working from requirements and troubleshooting. That means that Agile can work and it also means that “BI Developer” is a role that is very important, deserves the same level of respect and can be very difficult and tedious at times. Give some love back to your BI Developer!!
  2. Sprint Reviews CAN work in BI / Analytics solutions and are VITAL to your success. In the Java development world, this may seem like a no-brainer. But I run into a lot of resistance when building Analytics solutions using Scrum. Many folks believe that building a data model, ETL, analytics, reports and demoing all of that in a 3 / 4 week iteration is not possible. It is possible; I have worked in this environment for years, and it will take you out of that “go off and build it” approach to business intelligence that has killed many BI projects over the years. Sprint Reviews are your friend!


Why Agile Analytics?

19 Oct

Why Agile Analytics?

Instead of digging into the weeds of Scrum teams and Agile methodology, let me give you a quick, snappy use case from what we are doing @ Razorfish on the BI Platform team, one that shows true business value in developing data warehouses, analytics and business intelligence solutions in an Agile, iterative manner.

We have several large data warehouses in SQL Server and Teradata Aster along with large distributed file systems using Hadoop for data scientists. When the DW/BI development teams work on a new subject area, new features or other development projects on Big Data and VLDBs, it will take time, care, caution and incremental change.

Now, in our case, we already have an established revenue stream and cost justification for servers & infrastructure for Big Data. But suppose that you are starting a new project that will use data warehouses for BI and analytics and will demonstrate business value, such as strategic and marketing advantages that your business cannot achieve today because it cannot mine all that data.

If you put together a traditional project team and project plan, you may have milestones that drag out on the order of years, not weeks or months. Those milestones will include gates, and those events on the project plan will likely entail work like requirements analysis, risk assessments, developer checkpoints, analyst checkpoints, etc.

Now suppose instead, that you build the team as a self-evolving group that estimates work based on story points instead of hours and builds only enough backlog that a fully functioning feature can be demonstrated back to the business in 3-4 weeks.

After 2 months (assume one sprint for forming the team and establishing velocity), you will have BI reports and access to knowledge mined from Big Data that your business never had before. It will be demonstrable, probably with shims and some prototypes so that it can be demoed. But it will include data for the demo and will function end-to-end. And after 3 more weeks, you’ll demonstrate more reports, with more real, live data each iteration.

Compare that against long development cycles with planned demonstration checkpoints of code that is not fully tested or integrated; Agile demands that the code be unit, integration and regression tested each sprint.

My experience has shown that the Agile approach in DW/BI works and will make you much more successful in keeping the business happy and your Big Data investment safe.

Can an Agile Development Approach Work for a Data Warehouse?

16 Sep

By now, Agile development methodologies have more than proven themselves in the software engineering industry. Agile & Scrum have provided development teams with a process that is lightweight, flexible and effective at keeping software projects on time, on budget and closer to user expectations.

So why can’t we apply that same approach to data warehouse and BI projects? Well, we can. In this post, I’ll kick off my conversations with you about how you can apply Agile Scrum in your DW & BI development teams. Many lessons that I’ve learned @ both Oracle and Microsoft will be applied here, as well as the practices that we apply @ Razorfish, many of which originate from Ken W. Collier’s excellent book Agile Analytics, which I highly recommend.

One of the effective approaches that Ken takes in his book is the same teaching mechanism that I use: teach the basics of Agile Scrum development, then adapt them to DW & BI practices. I’ll dive into this throughout my posts to make that clearer. But for now, simply think about the Agile Manifesto and the principles applied therein, namely self-organizing teams, working software for each Sprint review in Scrum, and user stories for the voice of the customer. From there, we’ll work in the specifics of DW & BI development, which differs in many respects from developing software in Java or .NET. And since we’re using basic Agile Scrum practices, nearly any good book or article on forming Scrum teams will work.

I also like to emphasize TDD (test-driven development) as a highly effective approach to quality products in business intelligence and data warehouse solutions. To classic software development professionals, this may seem like a no-brainer. TDD has also proven very effective at producing quality iterations in development cycles. However, once again, in the DW & BI world, TDD is a bit more difficult. Sticking with data warehouse development for this initial post, it’s important to remember that DW professionals develop solutions using ETL 4GL tools, SQL code and workflow integration packages. Tool vendors like Microsoft (Visual Studio with Data-Tier Applications, SSIS 2012 and TFS include some support for Agile, versioning, releases and TDD) and open-source tools like Pentaho are moving more & more toward Agile BI.
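To make the TDD idea concrete in a DW context, here is a minimal sketch of a test-first data-quality check. It uses the sqlite3 command-line shell as a stand-in for your warehouse SQL, and the table and column names are invented for the example. The check is written before the ETL; the ETL is only “done” when the orphan count it reports is zero. Here, one orphan row is seeded deliberately to show the check catching a failure:

```shell
# A data-quality test written before the ETL exists: fact rows must never
# reference a customer key that is missing from the customer dimension.
# One orphan row (customer_key 2) is seeded to demonstrate a failing check.
sqlite3 :memory: <<'SQL'
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (sale_id INTEGER, customer_key INTEGER, amount REAL);
INSERT INTO dim_customer VALUES (1, 'Acme');
INSERT INTO fact_sales VALUES (100, 1, 19.99), (101, 2, 5.00);
SELECT COUNT(*)
FROM fact_sales f
LEFT JOIN dim_customer d ON f.customer_key = d.customer_key
WHERE d.customer_key IS NULL;
SQL
```

This prints 1, the number of orphaned fact rows. In a real pipeline, the same query runs against the warehouse after each ETL load, and a non-zero result fails the build for that sprint.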

But generally speaking, it can be challenging for DWBI developers to easily integrate these Agile practices into the development cycle.

My experience has been that it is possible to self-organize, test first, generate user stories and develop from a backlog in 30-day iterations with working solutions at the end of each iteration in the DWBI world. There is, however, a specific persona in these projects, one not found as much in other software engineering projects, whose job becomes more challenging rather than easier with Agile: the data modeler.

In large and highly-integrated data warehouse solutions, the data modeler is especially important, as you will likely benefit from a canonical model with Web Services integration and contract-first approaches. This can make the reality of a highly flexible 30-day iteration approach difficult for a role whose intent is to keep things locked down and whose scope is considerably larger, certainly much larger than what can be accomplished in 30 days.

Therefore, the approach that we took was to plan a separate set of data modeling sprints that were devised from a backlog coming from user stories that were developed before Sprint Zero even began on the DW project. Within 3 months, we had the model defined such that services, analytics, UX and other areas of the solution could work against a solid set of interfaces. Those Sprints also gave us the opportunity to test and prove-out the model before moving forward by using prototype mock-up reports that used completely auto-generated data in the databases.

So, that’s a starting point for your journey into Agile Analytics from my perspective. I’ve worked on a number of very large DWBI projects where we chose to organize in Agile Scrum teams and it has worked very effectively. There were a number of growing pains, as in each case, many of the team members were new to the approach. I’ll keep posting here and sharing my experiences and lessons learned to help you on your way to successful data warehouse business intelligence projects!

Br, Mark

Next up … Use Cases & Agile Analytics

4 Sep

Just wanted to put out a placeholder for the blog so that you will have a good understanding of the primary use cases in which I work with Big Data.

Also, building Big Data Analytics with an Agile development team will be a key focus.

Stay tuned …



Blog Home for MSSQLDUDE

The life of a data geek