I’ve spent the past several posts here at my Big Data Analytics blog introducing you to Big Data Analytics with Pentaho by leveraging OLAP models and MDX on NoSQL sources like Cassandra and MongoDB. I received a lot of positive responses from folks who had no idea that analytics tools like Pentaho could provide the same slice & dice and drill-detail on those sources. The problem is, I was still on the previous version of the Pentaho BI Suite, 4.8.2.
Well, I’ve finally upgraded to 5.0.2, which you can download here from Pentaho. So, in today’s post, I’m going to take you through a new demonstration of OLAP analytics on a Big Data source. But this time, I am going to use the new 5.0 Pentaho BI Suite and I will also use another Big Data source: memsql. Memsql is an in-memory distributed database engine that was built to solve large Big Data Analytics problems. It was extremely easy for me to set up and connect to Pentaho because it is compatible with the MySQL wire protocol, so I was able to use the MySQL JDBC driver to make things work in this demo.
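Since the connection goes through the plain MySQL JDBC driver, the settings look just like a MySQL connection. Here is a minimal sketch of the values involved. The host name, user and database name below are placeholders I made up for illustration, and I am assuming memsql is listening on the MySQL default port 3306 and that you are using the Connector/J 5.x driver class.

```python
def mysql_jdbc_url(host, database, port=3306):
    """Build a MySQL-style JDBC URL; memsql is reached with the same URL form."""
    return f"jdbc:mysql://{host}:{port}/{database}"


# Hypothetical values: replace host, database and user with your own.
connection = {
    "driver_class": "com.mysql.jdbc.Driver",  # MySQL Connector/J 5.x driver class
    "url": mysql_jdbc_url("memsql-vm.example.local", "foodmart"),
    "user": "root",  # assumed user, not taken from the demo
}

print(connection["url"])
```

Nothing here is specific to memsql, which is the point: any tool that can talk to MySQL over JDBC can be pointed at it the same way.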
1. I installed Pentaho 5.0.2 on my Windows 7 laptop, while I am running memsql on a single CentOS Linux VM which I downloaded from memsql.com.
2. I created a memsql database from our Pentaho Mondrian sample data set for “Foodmart” by running the create scripts from MySQL Workbench, which connected to my memsql instance and generated the schema and sample data.
3. Open the Pentaho User Console from your Web Browser … I’m starting from scratch here with the steps since this is my first post for Pentaho 5.0!
4. Create a new Analyzer Report and select a memsql source, which you can connect to via the MySQL JDBC driver. We’ll then use the auto-modeler built into the Pentaho Suite to build the ROLAP model on top of memsql for Analytics.
5. Create a new data source & Analyzer report model. Pentaho will connect to the tables via MySQL JDBC and will auto-generate a Mondrian ROLAP model for you.
6. You will then be prompted in the wizard to design a very simple star schema for Mondrian. Just tell the wizard which tables to use for OLAP and join the dimension tables to the fact table.
7. Now you can have fun with Analyzer: choose the new model as the source and pull in the fields that were created from the Foodmart database running in-memory on memsql for drill detail, slice, dice, etc. Very nice! It is also very similar to Pentaho 4.8, but with a much cleaner, clearer and crisper (the 3 c’s!) user experience now in Pentaho 5.0.
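If you are curious what the star schema from the wizard steps amounts to, here is a rough, hand-written sketch of a Mondrian-style schema with one fact table, one dimension and one measure. This is not the actual output of the Pentaho auto-modeler, just an approximation of the idea. The table and column names come from the standard Foodmart sample (sales_fact_1997, store, store_id, unit_sales), while the schema and cube names are ones I picked for illustration.

```python
import xml.etree.ElementTree as ET


def build_schema():
    """Build a minimal Mondrian-style schema: one fact table joined to one dimension."""
    schema = ET.Element("Schema", name="FoodmartMemSQL")  # hypothetical schema name
    cube = ET.SubElement(schema, "Cube", name="Sales")

    # Fact table of the star schema.
    ET.SubElement(cube, "Table", name="sales_fact_1997")

    # Store dimension: the fact table's store_id joins to store.store_id.
    dim = ET.SubElement(cube, "Dimension", name="Store", foreignKey="store_id")
    hier = ET.SubElement(dim, "Hierarchy", hasAll="true", primaryKey="store_id")
    ET.SubElement(hier, "Table", name="store")
    ET.SubElement(hier, "Level", name="Store Country",
                  column="store_country", uniqueMembers="true")
    ET.SubElement(hier, "Level", name="Store State",
                  column="store_state", uniqueMembers="true")

    # One measure, summed over the fact rows.
    ET.SubElement(cube, "Measure", name="Unit Sales",
                  column="unit_sales", aggregator="sum", formatString="#,###")
    return schema


xml_text = ET.tostring(build_schema(), encoding="unicode")
print(xml_text)
```

The foreignKey on the dimension and the primaryKey on the hierarchy are the join you draw in the wizard between the fact table and a dimension table. Each level becomes a field you can drill through in Analyzer.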