How I Use Pentaho to Easily Move Data Through MongoDB

20 Sep

I can still clearly remember when object-oriented programming started to become the standard model, moving away from structural procedural code. I used to think to myself that new programmers, just starting in C++ or Java had it easier than guys like me, who started in Pascal and Basic and needed to relearn how to do things. In fact, I often found myself doing things the wrong way in C++ because I had just learned C and so I tried to fit my structural procedural brain trained on C, into OO, not unlike a square peg into a round hole.

Same thing when I learned data warehousing, star schemas and multidimensional modeling. I came from the standard 3NF OLTP world into OLAP and found that the learning curve was such that I would find it easier to start in the OLAP world instead of relearning new ways of expressing data.

I find this all very similar to where we are now in the NoSQL world. There are obvious use cases for NoSQL data stores such as caching, storing unstructured data, log events, etc. And so it makes sense to move or create apps that may have been using MySQL or any other lock-based serializable ACID RDBMS.

But you also limit yourself to the tools available for input, update and retrieval. And since I’m not here to write Java or Javascript to get or put data into a data store, I greatly appreciate tools that make interacting with a NoSQL data store like MongoDB easy and familiar to my SQL-ized brain. Many BI, OLAP and ETL tools just don’t speak to NoSQL data stores.

Here is an example of using Pentaho Data Integration (PDI, aka Kettle) to connect to my MongoDB collection, insert data and then report on that data, all within the Pentaho Business Analytics suite. If you’d like to try this out, just download the Pentaho BA Suite evaluation from http://www.pentaho.com. All of these tools are available in the Pentaho Business Analytics Suite.

First, I created my database and collection (table) in MongoDB, calling the DB Demo and the collection or table “Sales” from the MongoDB command prompt:

MongoDB shell version: 2.4.5
connecting to: test
> use Demo;
switched to db Demo
> db.createCollection(“Sales”);
{ “ok” : 1 }
>

That’s all that I had to do to set up my data store in Mongo. The rest was done in Pentaho.  But for now, it’s time to get a spreadsheet of data from my sales.csv into Mongo using the PDI tool from Pentaho. I will first create a transformation (the T in ETL):

mongo1

My source is a simple CSV file of sales that looks like this:

EmpID,LastName,Sales,SalesDate
1,Kromer,12,1/1/2012
2,Smith,12.99,1/2/2012
3,Jones,1.99,1/3/2012
4,Kumar,120.5,1/4/2012
5,Yung,50,1/5/2012
6,Basero,5,1/1/2012
1,Kromer,25.25,1/2/2012
2,Smith,12,1/3/2012
3,Jones,12.99,1/4/2012
4,Kumar,1.99,1/5/2012
5,Yung,120.5,1/1/2012
6,Basero,50,1/2/2012
1,Kromer,5,1/3/2012
2,Smith,25.25,1/4/2012
3,Jones,11,1/5/2012
4,Kumar,334.3,1/6/2012
5,Yung,1000.2,1/7/2012
6,Basero,1,1/8/2012

Straight-forward comma-separated value strings with a header that includes Employee ID, last name, sales amount and sales data fields. I connect that CSV text file input step to the MongoDB output step to insert those as data documents into Mongo. What you’ll see in the next 2 configuration screens below are the different tab options in the MongoDB output step dialog from PDI Kettle. I am pointing to my local MongoDB instance database Demo and collection Sales. I define the fields that will map to the input fields coming from the CSV text file. Mongo will use these names to build the document in the data store for me. As a data engineer, not a Java programmer, I love this because this allows me to use an ETL data intergrator using the terminology that I understand and to easily move data into my NoSQL data stores:

mongo2

 

mongo3

 

Great, I have data in Mongo. Now let’s use Pentaho Instaview to quickly generate a report by taking that same data, staging it in an in-memory database and then put an analysis model around it for OLAP on Mongo that will look like this:

mongo4

 

Pentaho creates the PDI transformations for you to report on the Mongo data and also generates an automatic multidimensional model in Mondrian:

mongo5

 

You can edit and modify that model and create any visualization in the Analyzer tool that you like. It doesn’t matter that the data was sourced from MongoDB. This enables me to provide BI and analytics on MongoDB or other NoSQL sources, without needing to relearn how to engineer a BI solution. I’m just using my business analytics tools, regardless of the backend system types.

One more thing, when using Instaview to analyze data from MongoDB: make sure you’ve selected the MongoDB source and then point to your collection. Instaview generates the PDI transformation for you and use the power of the platform, which includes the ability of the Pentaho interfaces into Mongo to discover the schema metadata. Notice in this screen in Instaview, we see a Kettle transformation that includes a button to the get the fields from the MongoDB collection:

mongo6

 

You should be able to use this in your environment as a way to built data integration and reporting with MongoDB as a source, very similar to how you do things today with a traditional RDBMS backend, enabling NoSQL as a part of our BI solution.

 

About these ads

3 Responses to “How I Use Pentaho to Easily Move Data Through MongoDB”

Trackbacks/Pingbacks

  1. How I Use Pentaho to Easily Move Data Through M... - September 20, 2013

    […] There are obvious use cases for NoSQL data stores such as caching, storing unstructured data, log events, etc. And so it makes sense to move or create apps that may have been using MySQL or any other lock-based serializable ACID RDBMS.But you also limit yourself to the tools available for input, update and retrieval. And since I’m not here to write Java or Javascript to get or put data into a data store, I greatly appreciate tools that make interacting with a NoSQL data store like MongoDB easy and familiar to my SQL-ized brain. Many BI, OLAP and ETL tools just don’t speak to NoSQL data stores. That’s all that I had to do to set up my data store in Mongo. The rest was done in Pentaho. But for now, it’s time to get a spreadsheet of data from my sales.csv into Mongo using the PDI tool from Pentaho. I will first create a transformation (the T in ETL):  […]

  2. How I Use Pentaho to Easily Move Data Through M... - September 24, 2013

    […] I can still clearly remember when object-oriented programming started to become the standard model, moving away from structural procedural code. I used to think to myself that new programmers, just…  […]

  3. MDX on MongoDB | Big Data Analytics - September 30, 2013

    […] What I really like about the way that Pentaho’s tools work with MongoDB is the way that it queries the data store’s metadata to fill in the database and collection information from the source.  I wrote up a bit more about it here at this posting on KromerBigData about MongoDB with Pentaho tool…. […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Pentaho Business Analytics Blog

This blog provides views on the value of business analytics, and what companies are gaining with reporting, analysis, data integration, dashboards, and data mining.

Journey to SQL Authority with Pinal Dave

SQL, SQL Server, MySQL, Big Data and NoSQL

Paras Doshi - Blog

Notes of a Business Analytics Consultant

The SQL Herald

Databases et al...

Chris Webb's BI Blog

Analysis Services, MDX, PowerPivot, DAX and anything BI-related

Bill on BI

Info about Business Analytics and Pentaho

Big Data Analytics

Regular musings from a BI professional in the world of Big Data

Blog Home for MSSQLDUDE

The life of a data geek

Follow

Get every new post delivered to your Inbox.

Join 975 other followers

%d bloggers like this: