
Big Data Visualizations: A Good Use of the Chord

25 Nov

Over the years, I’ve included a number of visualizations in my BI solutions that were typically reserved for data mining applications, where they are used to visualize data clustering. One example is this “blog galaxy” from datamining.typepad.com:

Image

Data mining is a practice that has been around for decades, and its goal is very similar to what we attempt to achieve with business intelligence and Big Data Analytics: uncover meaning, knowledge and value from data. The biggest differentiator, in my mind, is the intersection of data mining / analytics and business intelligence: taking the output of data mining algorithms and visualizing it in a way that demonstrates business value and can be used to make better business decisions.

In today’s world, algorithms used for mining and analytics are being applied to Big Data sets, which implies a different approach to data management and processing. But it also means that ideas such as data exploration & data discovery are beginning to permeate modern every-day BI solutions.

I just want to focus on one common visualization that you will see a lot in Big Data Analytics: Chord graphs. Below is an example from Pentaho where you can see that a chord does a good job of demonstrating connections, paths, and relationships between attributes and dimensions.

Image

That example comes from bigdatagov.org. We also often build chord graphs for our “data scientists” in Web analytics who are looking for the paths that maximize conversions.

Colin Owens takes the chord idea to the next extreme in a project at http://www.owensdesign.co.uk/hitch.html, where he explores the pros & cons of different visualizations that demonstrate relationships. Here you can see some of the chord’s shortcomings in terms of showing influencers, a key aspect of marketing analytics:

Image

But here is a great example of where the chord shines by using a data set that makes sense to most of us, not just statisticians:

Image

Click here to interact with that chord graph. This should give you a good idea of the utility of a chord graph. In this case, Chris Walker used 2012 U.S. census data to show Americans moving between states. When you hover over and select areas of the radial chord, you can easily see paths (very important in Web analytics and marketing), with the size of each link related to the number of migrations.
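
To make the idea concrete, here is a rough Python sketch of the data shape that sits behind a chord graph like that one: a square matrix where each cell holds the flow from one state to another. The function name `build_flow_matrix` and the sample numbers are made up for illustration; they are not Chris Walker's code or real census figures.

```python
def build_flow_matrix(moves):
    """Turn (origin, destination, count) records into a square flow matrix.

    Row i, column j holds the total flow from labels[i] to labels[j],
    which is the input shape most chord-diagram libraries expect.
    """
    labels = sorted({state for origin, dest, _ in moves for state in (origin, dest)})
    index = {name: i for i, name in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for origin, dest, count in moves:
        matrix[index[origin]][index[dest]] += count
    return labels, matrix


# Illustrative numbers only, not real migration data.
sample_moves = [
    ("CA", "TX", 60),
    ("TX", "CA", 40),
    ("NY", "FL", 55),
    ("FL", "NY", 30),
    ("CA", "NY", 20),
]

labels, matrix = build_flow_matrix(sample_moves)
for name, row in zip(labels, matrix):
    print(name, row)
```

The width of each chord in the graph then corresponds to one cell of the matrix, which is why large flows stand out so quickly.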


Consider Tag Clouds for BI Analysis Solutions

19 Sep

As a business intelligence practitioner or developer, a critical success factor for your solutions is presenting the complex world of data engineering to your end users in a way that enables quick, easy and accurate analysis. In other words, compelling data visualizations can improve business decisions by getting decision-makers more involved in data-driven decision processes.

An emerging data visualization that is gaining traction in the BI world is the Tag Cloud. Here is an example of a Tag Cloud that I created in just a few minutes using the Pentaho 4.8.2 Analyzer tool. Notice that I included a text dimension attribute (Model Name from Adventure Works DW) that is appropriate for a Tag Cloud visualization, and I used the sales amount and unit quantity measures to drive the size and color of the visualization. I found this to be a very compelling and easy-to-understand way to quickly show the model names that had the biggest impact on Adventure Works sales revenue:

ct1
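
If you want to see how the size and color mapping works in principle, here is a minimal Python sketch. It is not how Pentaho implements its Tag Cloud; the function `scale_tags`, the font-size range and the sample figures are all assumptions for illustration. One measure is scaled linearly to a font size and a second measure to a 0-to-1 color intensity.

```python
def scale_tags(rows, min_size=10, max_size=48):
    """Map two measures onto tag size and color intensity.

    rows: list of (label, size_measure, color_measure) tuples.
    Returns one dict per label with a font size in points and a
    color intensity between 0.0 and 1.0.
    """
    size_values = [row[1] for row in rows]
    color_values = [row[2] for row in rows]

    def normalize(value, low, high):
        # Avoid dividing by zero when every value is the same.
        return 0.0 if high == low else (value - low) / (high - low)

    tags = []
    for label, size_value, color_value in rows:
        size_ratio = normalize(size_value, min(size_values), max(size_values))
        color_ratio = normalize(color_value, min(color_values), max(color_values))
        tags.append({
            "label": label,
            "font_size": round(min_size + size_ratio * (max_size - min_size)),
            "intensity": round(color_ratio, 2),
        })
    return tags


# Made-up sales amounts and unit quantities, for illustration only.
sample_rows = [
    ("Mountain-200", 1000, 50),
    ("Road-150", 500, 10),
    ("Touring-1000", 0, 30),
]

tags = scale_tags(sample_rows)
for tag in tags:
    print(tag)
```

Linear scaling is the simplest choice; with heavily skewed sales data, a log scale may keep the smaller tags readable.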

The data set that I used is the well-known Microsoft SQL Server Adventure Works Data Warehouse, which I downloaded for free from the Microsoft-sponsored CodePlex community site here. To build a Tag Cloud for your dashboard or BI solution in Pentaho, make sure that you download the SQL Server JDBC driver if you are going to use SQL Server as a data source: http://msdn.microsoft.com/en-us/sqlserver/aa937724.aspx. In this demo, I used SQL Server 2012 Developer Edition and the Adventure Works DW 2012 sample data warehouse data set.

You can get an evaluation copy of the Pentaho Business Analytics Suite from Pentaho.com, and you can run all of this from your laptop. The visualization tool that I used here runs entirely in your browser and will automatically generate a multidimensional model from your database sources, so that you can slice, dice and analyze your data without needing to get down into the cube development process.

To reproduce the Tag Cloud in Pentaho, start by creating a “New Analysis” report, which uses the Pentaho Analyzer pivot tool. I’ve shown that step below, along with the follow-up dialog where you point to the SQL Server database tables for Adventure Works:

ct2

ct3

After this point, Pentaho will generate a base model for MDX queries from Analyzer, so you can start building the report right away. Or, if you are like me, you will want to go into the model and modify the measures, dimensions and properties to customize data types, formatting, etc.:

ct4

Before you can choose the Tag Cloud chart type, you will need to download the plug-in and add it to your installation. This is all explained for you here; it is very simple and only takes a few minutes. After that is done, the Tag Cloud will appear in your report’s list of chart types. Now you can drag in the measures & dimensions that you need to color and size the values in your cloud. The Tag Cloud works very well for string attributes in your dimensions, and because it displays in the browser, it is easy to include this report in a Pentaho dashboard or other dashboards.

ct5

Big Data Visualizations

16 Dec

Big Data Visualization carries with it different requirements than the business data visualization that you may find in traditional business intelligence solutions.

With Big Data Analytics, you will likely need to provide visualization capabilities to more than the general knowledge worker community, whose requirements typically go no deeper than the aggregated business-level view. To support ad-hoc data discovery and the functionality needed by your data scientist community, you will need data visualizations that provide context and meaning behind very large data sets with possibly millions of individual data points.

When I am analyzing large clickstream or Web Analytics data sets, I like to present the data in a diagram like a Sankey (I got this one from OUseful.info):

Image

This is a very helpful way to demonstrate data relationships, paths and flows, and to show which path or input has the most impact on the output.
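
As a small illustration of the data preparation behind a diagram like that, here is a Python sketch that turns clickstream sessions into weighted source-to-target links, which is the input a Sankey diagram needs. The function name `sankey_links`, the page names and the sessions are hypothetical.

```python
from collections import Counter


def sankey_links(sessions):
    """Count page-to-page transitions across all sessions.

    sessions: list of page lists, each in the order the visitor saw them.
    Returns a Counter keyed by (source_page, target_page).
    """
    links = Counter()
    for pages in sessions:
        # Pair each page with the one that follows it in the session.
        for source, target in zip(pages, pages[1:]):
            links[(source, target)] += 1
    return links


# Hypothetical sessions, for illustration only.
sample_sessions = [
    ["home", "search", "product", "checkout"],
    ["home", "product", "checkout"],
    ["home", "search", "exit"],
]

links = sankey_links(sample_sessions)
for (source, target), weight in sorted(links.items()):
    print(f"{source} -> {target}: {weight}")
```

Each link weight becomes the thickness of a band in the diagram, so the heaviest paths to conversion are the ones that jump out.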

Common tools like SSAS in SQL Server can show the analyst diagrams that demonstrate classification, relationships, paths, etc. through the data mining tools in Visual Studio (or Excel, for smaller data sets). The diagram below is from a bidn.com tutorial:

Image

That Microsoft tool is only available in Visual Studio or Excel, and it is very useful. But it will not always scale to the Big Data requirements of larger projects that include sensor data, clickstream, etc.

Traditional BI tools do offer a variety of visualizations that work well for both dashboards and ad-hoc data discovery analysis, which is what the Big Data / Data Scientist audience needs. But if your data scientists are going to leverage Big Data tools to access deep granular data in Hive / Hadoop, then there will be too many data points to plot as a traditional time series or X-Y graph.

Tableau 8, for example, now includes heat maps, which are one of my personal favorite tools for taking high-volume data and aggregating it into an easy-to-read chart:

Image
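
The aggregation step behind a heat map is simple to sketch. Here is a minimal Python example, assuming event timestamps as input, that bins events into a day-of-week by hour-of-day grid. This is only an illustration of the idea, not what Tableau does internally, and the function name `bin_events` and the sample timestamps are made up.

```python
from datetime import datetime


def bin_events(timestamps):
    """Aggregate event timestamps into a 7 x 24 grid of counts.

    Rows are days of the week (0 = Monday), columns are hours of the day.
    However many events come in, the chart only has 168 cells to draw.
    """
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        grid[ts.weekday()][ts.hour] += 1
    return grid


# A few made-up events, for illustration only.
sample_events = [
    datetime(2013, 12, 16, 9, 5),
    datetime(2013, 12, 16, 9, 45),
    datetime(2013, 12, 17, 14, 0),
]

grid = bin_events(sample_events)
print(grid[0][9], grid[1][14])
```

Each cell count is then mapped to a color, which is what lets millions of granular rows collapse into one readable chart.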

We’re all unhappy about Microsoft’s removal of heat maps from the ProClarity tool set and its decision not to surface them in Power View. However, my former Microsoft colleague Jen Underwood has a post on her blog here demonstrating the use of JavaScript in an Excel Office App to emulate that same TreeMap or HeatMap functionality.
