I had a few queries following my posting on MDX for Cassandra using Pentaho’s Big Data Analytics, namely about if this same capability was available on other NoSQL platforms, not just Cassandra. Simply put, the answer is yes. And I’ll give you a really short simple demo of Pentaho’s MDX Olap for MongoDB.
First, let me just point out that MongoDB is a very important NoSQL / Big Data partner for Pentaho as you can read about here. So you’ll see a lot of tight integration and built-in capabilities with MongoDB in the Pentaho Suite.
As I did in the Cassandra demo earlier, I’ll use Pentaho’s Instaview to build out a quick & easy template, but selecting the MongoDB template this time, instead of Cassandra.
This will direct Pentaho to generate a bunch of automated routines in the background through the PDI / Kettle tranformation to the Mondrian OLAP engine. Your end result will be an OLAP model for sending MDX queries against your MongoDB data, using an embedded in-memory model. First, tell PDI (through Instaview’s steps) where your MongoDB data store is located and give it a JSON query to pull the data that you want to use for your OLAP BI application:
In my simple demo, I am going to use the PageSuccessions sample data set, which you can find more about here. I really like this data set because I’ve worked with clickstream analytics in the past and Web log data stored in a NoSQL data store is a key, common NoSQL and Big Data use case, so it’s a good representative data set.
What I really like about the way that Pentaho’s tools work with MongoDB is the way that it queries the data store’s metadata to fill in the database and collection information from the source. Â I wrote up a bit more about it here at this posting on KromerBigData about MongoDB with Pentaho tools.
When you are done with those steps, you will now have an auto-generated OLAP model for MDX querying. It is very simple in this case and you will want to modify the measure and dimensional properties to fit your reporting needs.
Time to start slicing & dicing … I can drag fields from the OLAP model onto the design surface to build a pivot table or use a data visualization option.
That data visualization tool is Pentaho’s Analyzer and is sending MDX queries to the in-memory data store from MongoDB’s data, which means that we can now use it to query MongoDB manually using MDX.
Go to Administration -> MDX in Analyzer and in the subsequent query box, you can use my demo MDX against PageSuccessions which will slice the counted hits per page and order it by total hits descending (DESC):
SELECT [Measures].[Count] ON 0, ORDER ([nextUrl].Members,[Measures].[Count], DESC) ON 1 FROM [Untitled 10]
The cube name “Untitled 10” is just a placeholder given to the auto-generated OLAP logical model that Instaview creates and is generated by Mondrian.
The results look like this:
And that’s it. Not too much different than my earlier Cassandra MDX OLAP demo using Pentaho. Just a slight difference in setting up the data source. The querying, reporting and model generation are all essentially the same otherwise.
Enjoy!! Best, Mark
[…] further to do a few more interesting things with MDX OLAP models that we’ve put on top of MongoDB and Cassandra data […]
[…] data stores, I showed you how easy it is to accomplish complex analytics with slice & dice on MongoDB and Cassandra. But in those posts, we wired up the Pentaho suite to Mongo & Cassandra for MDX […]
Hi i have a few questions. Please help me:
1) To Do OLAP on Mongo i am unable to find the administration button from where we select MDX.
2) Can we do MDX on Hadoop as well.
3) can we integrate this and call this job using java ?
4) And when i am transforming my data using kettle i am getting an error:
Unable to load data cache.
Confirm input settings in the configure panel.
1) The MDX selector is in the Pentaho EE product called Analyzer. Go to Settings | Administration | MDX from an Analyzer report toolbar, not the PUC toolbar.
2) Yes, using a SQL on Hadoop technology like Hive, Impala, Drill, etc.
3) Yes
4) Can you submit that question on our Kettle forum here? http://forums.pentaho.com/forumdisplay.php?135-Pentaho-Data-Integration-Kettle