OLAP Analytics on Cassandra Data

In my previous two posts on MDX & OLAP on NoSQL data stores, I showed you how easy it is to do complex slice & dice analytics on MongoDB and Cassandra. In those posts, we wired up the Pentaho suite to MongoDB & Cassandra for MDX queries and table output. That was nice, but now let’s use the visualization capabilities in Pentaho Business Analytics to build pivot tables and Big Data visualizations for some richer Big Data analytics.

Follow the links in the paragraph above to see how to get started building a Pentaho Mondrian model. Once the model exists, we can use a tool that sends MDX OLAP queries to it and renders the results. In this post, we are going to use Pentaho Analyzer.
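If you are new to MDX, it helps to see the rough shape of the query that a pivot tool sends to Mondrian. Below is a minimal Python sketch that only assembles the query text. It does not connect to anything. The cube name `Sales` and the hierarchy/level names are hypothetical placeholders, not the names Instaview actually generates for this model.

```python
def build_mdx(cube: str, measures: list[str], row_sets: list[str]) -> str:
    """Assemble a simple MDX SELECT: measures on COLUMNS, row sets crossjoined on ROWS."""
    # Measures become an explicit set on the COLUMNS axis.
    measure_set = ", ".join(f"[Measures].[{m}]" for m in measures)
    # Nest CrossJoin() calls when more than one row set is given.
    rows = row_sets[0]
    for extra in row_sets[1:]:
        rows = f"CrossJoin({rows}, {extra})"
    # NON EMPTY drops row tuples whose measure cells are all empty.
    return (
        f"SELECT {{{measure_set}}} ON COLUMNS,\n"
        f"NON EMPTY {rows} ON ROWS\n"
        f"FROM [{cube}]"
    )


# Hypothetical cube and hierarchy names; substitute the ones from your own model.
query = build_mdx(
    "Sales",
    ["Empsale"],
    ["[Emplast].[Emplast].Members", "[Salesdata].[Salesdata].Members"],
)
print(query)
```

With tools like Analyzer you never type this yourself. Dragging fields onto the design surface produces an equivalent query for you.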

My starting point will be the small Cassandra data set that I used in the previous “MDX on Cassandra” blog post:

Image

Image

Image

In the screenshots above, I started with the Pentaho Instaview tool using the MongoDB template, modified the in-memory models, and now I’m ready to build visualizations in Analyzer. My data set is a small number of rows from a CSV file from Excel. The file holds sales figures for our fictional business, listed by salesperson and date.

I am going to start by drawing a matrix: I drag the Empsale & Empid values into Measures on the design surface. I only need row slicers for this demo, so I’m putting Emplast (the sales employee’s last name) and Salesdata in Rows. If I had been a bit more diligent about building this out as a real BI solution, I would have given those entities friendlier business names. Do as I say, not as I do! You can make these changes in the Mondrian model (Edit Model in Instaview).
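It can help to see what that matrix computes. The minimal Python sketch below sums Empsale for each (Emplast, Salesdata) combination on the rows. It assumes the Empsale measure uses a SUM aggregator. The sample rows are invented stand-ins for the CSV data, not the data set shown in the screenshots. Empid is left out so that the sketch stays focused on a single measure.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; column names mirror the model's fields, values are invented.
SAMPLE_CSV = """Empid,Emplast,Salesdata,Empsale
1,Smith,2012-01-15,100
1,Smith,2012-01-15,50
2,Jones,2012-01-15,200
2,Jones,2012-02-01,75
"""


def pivot_sum(csv_text: str) -> dict[tuple[str, str], float]:
    """Sum Empsale for each (Emplast, Salesdata) row-slicer combination."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[(row["Emplast"], row["Salesdata"])] += float(row["Empsale"])
    return dict(totals)


# Print one line per matrix row: last name, date, summed sales.
for (last, date), total in sorted(pivot_sum(SAMPLE_CSV).items()):
    print(f"{last:<8}{date:<12}{total:>8.2f}")
```

Mondrian performs this grouping for you when it evaluates the MDX that Analyzer generates. The sketch only shows what each cell means.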

Image

Notice also, in the screenshot above, the custom measure I created in Analyzer: “% of Emp Sale”. You can create custom measures in Analyzer by right-clicking a column header. From there you can enter custom MDX, define a calculation on existing fields, or pick one of the preset user-defined measure options.
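The arithmetic behind a percentage measure like “% of Emp Sale” is easy to sketch. The minimal Python example below assumes the measure is a percent-of-total: each employee’s sales divided by the grand total of the column. The post does not spell out the exact definition, so treat that as an assumption. The employee totals are invented for illustration.

```python
def percent_of_total(totals: dict[str, float]) -> dict[str, float]:
    """Return each member's share of the grand total, as a percentage."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Avoid division by zero: with no sales at all, report every share as 0.
        return {k: 0.0 for k in totals}
    return {k: 100.0 * v / grand_total for k, v in totals.items()}


# Hypothetical per-employee Empsale totals (invented numbers).
sales_by_employee = {"Smith": 150.0, "Jones": 275.0, "Brown": 75.0}
for name, pct in percent_of_total(sales_by_employee).items():
    print(f"{name}: {pct:.1f}%")
```

With these numbers, the shares come out to 30%, 55%, and 15%.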

Looks good so far. Now, to turn this into visualizations that best convey the meaning of your data, use the “View As” selector at the top right of the Analyzer screen. Here I am showing my Cassandra data first as a scatter plot and then as a tree heat map:

Image

Image


4 comments

  1. What versions of PDI and Cassandra did you use? I tried it with Pentaho 5 and Cassandra 2.0.7, and I get a “cannot connect to destination” error when I try to open Instaview. As far as I can tell, the error means that Cassandra is not supported.

