The Association for Computing Machinery produces a regular journal called SIGKDD Explorations, where SIGKDD is an acronym for Special Interest Group on Knowledge Discovery and Data Mining. I would classify the journal as academic, even though private-sector consultants or companies may be coauthoring articles.
In a recent issue, there is an article titled “Visual Analytics: How much visualization and how much analytics?”. The article makes the following claims:
- “Visual Analytics is the science of analytical reasoning supported by interactive visual interfaces.” (page 5)
- “The term Visual Analytics has been around for about five years now.” (page 5)
- “The core of our view on Visual Analytics is the new enabling and accessible analytic reasoning interactions supported by the combination of automated and visual analytics.” (page 5)
Altogether, these statements mean that Visual Analytics is a relatively new academic buzzword to define a specific field of research, namely the combination of automated analysis and visual representation. Someone might ask, how much does that description look like what people do with Excel? I would at first pass answer that Excel 2010 has exceptional graphic and visualization capabilities, but it does not inherently provide automated data analysis. However, SQL Server Data Mining adds the automated portion of this equation.
I have presented on the topic of using SQL Server Data Mining with the Excel add-in, and have produced several videos. You can peruse http://marktab.net to see some of the videos and presentations I have made on this topic. I am writing this post so that search engines correctly connect Visual Analytics with what people can do with this combination of technologies.
Let’s return to a structural and philosophical truth about the SQL Server Data Mining technology. As implemented through BIDS (Business Intelligence Development Studio), the design encourages people to create multiple data mining structures, from which they can develop multiple data mining models. BIDS can be the interface for that development, though Excel can also be a way to access that technology. As is, the tools and viewers are there for anyone who wants to develop a .NET web application, and I believe many will leverage SharePoint in the coming years to bring customized data mining solutions to the web.
I will outline some steps people can use with SQL Server Data Mining and the Excel Add-In:
- Create data mining models using data either from Excel or other sources (PowerPivot for Excel 2010 can draw from a variety of sources)
- Interactively create temporary mining models, products intended for interactive, exploratory use
- Share permanent mining models, products intended for group modeling or application (even enterprise-level production)
- Use the automated visualization tools, some of which describe the mining models, while others (like Highlight Exceptions) visually change the source table data based on analyst choices
You can see examples on http://marktab.net/Resources/MicrosoftPresentations.aspx if you download my Excel and PowerPivot demo spreadsheets from May 2010.
The point of any of these interactive technologies (BIDS, Excel, SharePoint) is that people are an important part of the equation. I have stated before that data mining and statistical inference techniques do not inherently make value judgments (of what is right or wrong, better or worse, superior or inferior); people, and more likely groups, must collectively decide what value labels to attach to statistical results. The SQL Server Data Mining technology allows multiple algorithms to compete (using, for example, lift charts) and therefore allows each algorithm to perform exploratory data analysis against different attributes (statisticians prefer the term variables). One mining structure can house multiple mining models built with different algorithms, thus hopefully sidestepping the issue of achieving only local minima or maxima (which the authors mention on page 6 as a general problem with automated analysis, especially when it relies on a single model).
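To make the idea of algorithms competing on a lift chart more concrete, here is a minimal Python sketch. It is a generic illustration of cumulative lift, not the calculation SQL Server Data Mining performs internally, and the function name, the two sets of scores and the labels are all hypothetical values made up for the example.

```python
from typing import List, Sequence


def cumulative_lift(scores: Sequence[float], labels: Sequence[int],
                    buckets: int = 5) -> List[float]:
    """Return cumulative lift at `buckets` equally spaced cut points.

    Lift at a cut point is the positive rate among the top-scored cases
    divided by the positive rate in the whole population.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total = len(labels)
    total_pos = sum(labels)
    if total == 0 or total_pos == 0:
        raise ValueError("need at least one case and one positive label")

    # Rank the actual outcomes from highest to lowest predicted score.
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    base_rate = total_pos / total

    lifts = []
    for b in range(1, buckets + 1):
        cutoff = max(1, round(total * b / buckets))
        top_rate = sum(ranked[:cutoff]) / cutoff
        lifts.append(top_rate / base_rate)
    return lifts


# Hypothetical data: 10 cases, 4 of which are actual positives.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
# Hypothetical predicted probabilities from two competing models.
model_a = [0.9, 0.2, 0.8, 0.1, 0.3, 0.7, 0.15, 0.05, 0.6, 0.25]
model_b = [0.5, 0.6, 0.9, 0.1, 0.7, 0.3, 0.2, 0.4, 0.8, 0.05]

lift_a = cumulative_lift(model_a, labels)
lift_b = cumulative_lift(model_b, labels)

for pct, a, b in zip(range(20, 101, 20), lift_a, lift_b):
    print(f"top {pct:3d}%  model A lift {a:.2f}  model B lift {b:.2f}")
```

Both curves end at a lift of 1.0 once the whole population is included, so the comparison that matters is how far above 1.0 each model stays in the early buckets. Deciding which curve is "better" for a given business purpose remains the kind of value judgment that people, not the algorithm, have to make.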
The authors claim:
Visual Analytics combines the strengths of both worlds [of automatic analysis and visualization]: On the one hand they take advantage of intelligent algorithms and vast computational power of modern computers and on the other hand they integrate human background knowledge and intuition to find a good solution. (page 6)
I am a believer in learning organizations (Peter Senge), organizational feedback loops (the esteemed W. Edwards Deming) and moving toward better solutions by leveraging computational power for human decision making. The CRISP-DM model (in its first version) shows that people are an important part of the process. That systems understanding, in my view, underlies all the better management theories, and provides a context within which data mining should happen.
Modeling is not just about cleaning the data (much as we might clean our fresh vegetables to prepare for a meal), combining a good mix of ingredients and cooking up a single solution. Small segments resemble this process, but visually monitoring the entire process helps people continue to manage it (I like manage better than balance, since management is the name of an actual University degree).
As the authors conclude:
In this paper we defined Visual Analytics as new enabling and accessible analytic reasoning interactions supported by the combination of automated and visual analysis… we foresee its applicability to everyday processes due to both the efficiency and effectiveness of Visual Analytics applications. (page 7)
Sometimes when new buzzwords take hold, some might assume that they are not doing Visual Analytics unless they purchase a product with those words in the software title, or in the name of the company. Keywords are important in academic settings to help steer research into new combinations of research inquiry. We do not just want more research on automatic analysis (machine learning) or on visualization (which might be a form of applied psychology); at the nexus of these two areas lies a field that I believe should include applied data mining.
Keim, D. A., Mansmann, F., & Thomas, J. (2009). Visual Analytics: How much visualization and how much analytics? SIGKDD Explorations, 11(2), 5-8.
