Predixion PMML Connexion(TM) for R

Predixion Software announced today an extended PMML (Predictive Model Markup Language) interface for R, a popular open-source language and environment for statistical computing. The technology is an Excel add-in to the core Predixion Insight product, a cloud-based data mining service built on SQL Server Analysis Services.

http://www.businesswire.com/news/home/20101109005633/en/Predixion-Software-Introduces-Interface-Access-Run-Models

This capability extends previously announced PMML support for SAS and SPSS. PMML is an XML-based markup language developed specifically to standardize how data mining models are represented, and it is supported by key data mining vendors. Because so many vendors support the format, I expect many organizations to increase their use of PMML as they share data mining models across software solutions.
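To make the idea of a shared, XML-based model format a little more concrete, here is a minimal sketch in Python. It assumes a hand-written PMML 4.1 linear regression model with made-up coefficients (an intercept of 1.5 and a coefficient of 2.0 on a field named `x`) and scores it with the standard library's `xml.etree.ElementTree`. The helper `score_regression` is hypothetical: it handles only a single `RegressionTable` of `NumericPredictor` elements and is not Predixion's R interface or any vendor's scoring engine.

```python
import xml.etree.ElementTree as ET
from typing import Mapping

# Namespace used by PMML 4.1 documents.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}

# Hand-written, illustrative PMML: y = 1.5 + 2.0 * x (made-up coefficients).
PMML_TEXT = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative linear regression"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def score_regression(pmml_text: str, inputs: Mapping[str, float]) -> float:
    """Hypothetical helper: score a single-table PMML RegressionModel."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("no RegressionTable found")
    result = float(table.get("intercept", "0"))
    # Each NumericPredictor contributes coefficient * value ** exponent (exponent defaults to 1).
    for predictor in table.findall("pmml:NumericPredictor", NS):
        value = inputs[predictor.get("name")]
        exponent = int(predictor.get("exponent", "1"))
        result += float(predictor.get("coefficient")) * value ** exponent
    return result


print(score_regression(PMML_TEXT, {"x": 3.0}))  # 7.5
```

Because the model is plain XML, a tool other than the one that produced it can read the same coefficients; real PMML consumers support many more model types and elements than this sketch does.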

Data Mining for Business Intelligence Book Review

Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel® with XLMiner® Book Review

I recently met Dr. Galit Shmueli online, and she told me about this new book (now in its second edition). The book is priced and positioned as a graduate-level textbook for business majors. Wiley was kind enough to send me a desk copy, since I teach with the University of Phoenix. Having read this book, I believe the explanations and content provide a good foundation in data mining; although the chapters are organized like those of many similar books, the approach is what I would expect for business intelligence applications.

I believe this book is outstanding: well written and full of intuitive advice grounded in strong mathematics. I like many of the one-liners I highlighted throughout the book about how to think about data mining.


Data Mining Seminar Passed Over for SQL Rally 2011

SQL Rally is a new conference scheduled for May 11-13, 2011 at the Marriott World Center in Orlando, FL. The conference will have pre-conference seminars intended to appeal to a broad range of attendees and to be worth the additional expense (that is, a fee on top of the conference registration; see http://www.sqlandy.com/wp-content/uploads/2010/09/SQLRally-PreCon-Application-Final.pdf). In brief, pre-conference seminar topics should appeal to the typical core attendee who is willing to spend even more time and money on additional learning.

I decided to submit a data mining seminar idea for this conference. The conference team graciously declined my seminar idea, for good reasons. I have decided to blog about this topic because I believe I can provide insight into the Microsoft Data Mining community and discuss where this technology is perceived to fit in the Microsoft world. I believe my seminar has a home at some conference, and now that I have this written outline, I can seek more feedback.

My blog post outline:

  • My seminar proposal
  • Gracious response from SQL Rally
  • My response to SQL Rally, and commentary


Data Mining Separates News from Noise

I said this phrase aloud recently while explaining what data mining does.

The challenge many people face is information overload. Too much information. Too much data. Some people distinguish between data and information, but the challenge is the same either way: too much, and too much of too much.

I was talking with a new friend about the weather; not the actual weather, but weather predictions. I explained that in many weather monitoring applications, data are discarded, and only the fraction deemed “meaningful” is retained for prediction. Weather people discard data for the same reason nonprofit and for-profit organizations throw away or shred files: people keep only what they believe will be of value to them in the future.

Data or information (take your pick) is expensive to keep and archive. The per-byte storage cost may be approaching zero, but the ongoing cost of keeping data organized (including the metadata) is not. Someone still has to make that information available. Search engines can do a lot, but search engines do not categorize; people do. Computer algorithms can help automate some classification, but values start and end with people.

People determine what is news, and that decision starts the scientific process of finding patterns in data. I do not believe science creates itself or emerges on its own (as if Science were a personality with a decisive will). An anthropomorphized Science might be entertaining in science fiction, perhaps in another Star Trek adventure. Pragmatically, Science does not determine results; people, people groups, and communities do.

With weather, people have different goals, even with the same data. Some people hope it will rain, and others hope it will not. Their value systems differ, yet the same data mining models can help both groups answer the same question. Data mining does not adjudicate among people groups; it simply provides insight into vast amounts of data and helps human interpreters focus on particular points of information.

I am among those who believe that humans typically process information through patterns. We might deride discrimination as inherently wrong, but what most people actually oppose is discrimination that is closed to new information. I believe science is a centrally important portal for new information, but not the only one. Some ideas are beyond science, including logic and self-knowledge. Data mining can be a powerful tool for surfacing new patterns through empirically based investigation. I advocate using data mining within the scientific method, with logic helping to apply values to the results of science.

Data mining separates news from noise.

Predixion Software Beta

Earlier this year I became aware of Predixion Software and the excellent plans being made by its team, headed by CEO Simon Arkell and CTO Jamie MacLennan. I plan to talk more about this company in the near future, but today I decided to write a post about what the company made public today (I have more to say than fits in a tweet).

First:  The video posted to http://www.dailymotion.com/predixion
I like the video production quality: not just the lighting and the shot angles, but also the fact that they created a storyline. CTO Jamie MacLennan does most of the talking, but you also see messages from CEO Simon Arkell about the company in general. The video includes some screenshots and information about Predixion Insight. Some of the other shots show people now working at Predixion Software. One image that sticks in my mind is Jamie playing guitar in a group.

Second: Bogdan Crivat’s blog post today: http://www.bogdancrivat.net/dm/archives/62
In a coordinated effort, Bogdan also posted today about Predixion Software. He tells the story of how he worked on the cloud possibilities while he was at Microsoft. He also writes that Predixion Insight unlocks the potential of the cloud.

Third:  Jamie MacLennan’s post today:  http://jamiemaclennan.blogspot.com/2010/08/predixion-on-brink.html
Jamie talks about starting at Predixion Software this past January and how, along with a small development team, he aims to produce a disruptive predictive analytics strategy. I agree with him that the concept is exciting, since Predixion Insight sounds like it will be based on the familiar SQL Server Data Mining technology and yet offer analysts and organizations more functionality. He also mentions a VIP beta (who is lucky enough to be involved in that?) and a formal beta starting on August 16 (first come, first served).

For Bogdan and Jamie to blog on the same day, after only occasional posts over the past few months, means two things: 1) they are intensely engaged in something interesting, and 2) they have something important to say now.

I want to emphasize in this post that SQL Server Data Mining has been a development platform from the outset. Predixion Software’s product, Predixion Insight, aims to build on this platform and deliver a competitive offering for anyone doing predictive analytics. The August 16 beta participants will have the opportunity to see what the product is all about, and I repeat what the company is looking for:

  • SQL Server Data Mining Add-In Users
  • PowerPivot Users
  • Excel Users

Predixion Software Website:   http://www.predixionsoftware.com
Predixion Software on Twitter:  http://twitter.com/predixionsw

Visual Analytics and SQL Server Data Mining

The Association for Computing Machinery produces a regular journal called SIGKDD Explorations, where SIGKDD is an acronym for Special Interest Group on Knowledge Discovery and Data Mining. I would classify the journal as academic, even though private-sector consultants or companies may be coauthoring articles.

In a recent issue, there is an article titled “Visual Analytics: How much visualization and how much analytics?”. The article makes the following claims:

  • “Visual Analytics is the science of analytical reasoning supported by interactive visual interfaces.” (page 5)
  • “The term Visual Analytics has been around for about five years now.” (page 5)
  • “The core of our view on Visual Analytics is the new enabling and accessible analytic reasoning interactions supported by the combination of automated and visual analytics.” (page 5)

Altogether, these statements suggest that Visual Analytics is a relatively new academic buzzword for a specific field of research, namely the combination of automated analysis and visual representation. Someone might ask: how much does that description resemble what people do with Excel? My first-pass answer is that Excel 2010 has exceptional graphics and visualization capabilities but does not inherently provide automated data analysis. SQL Server Data Mining, however, adds the automated portion of this equation.


Writing Data into Analysis Services

Microsoft SQL Server 2008 Analysis Services Unleashed Book Review Chapter 16

This chapter discusses writing data back into Analysis Services. It describes a detailed and technically challenging sequence of actions that must happen to:

  1. allow temporary writebacks from any session
  2. allow writeback to be turned on per SSAS partition, and therefore accept changes into a writeback partition (separate from the original data)
  3. sequence a writeback through the current session’s and other sessions’ temporary writeback partitions
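The sequence above can be pictured with a toy model. The following Python sketch is a conceptual illustration, not the Analysis Services API: it assumes each edit is stored as a delta rather than overwriting base data, that committed deltas live in a shared store (standing in for the writeback partition), and that uncommitted deltas are private to the session that made them. The class `WritebackCube` and its methods are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical cell coordinate, e.g. ("2010", "Sales").
Cell = Tuple[str, ...]


class WritebackCube:
    """Toy model of writeback: base data is never modified; edits are stored as deltas."""

    def __init__(self, base: Dict[Cell, float]) -> None:
        self.base = dict(base)  # original fact data (treated as read-only)
        # Committed deltas, visible to every session (stands in for the writeback partition).
        self.writeback: Dict[Cell, float] = defaultdict(float)
        # Uncommitted deltas, visible only to the session that made them.
        self.pending: Dict[str, Dict[Cell, float]] = defaultdict(lambda: defaultdict(float))

    def read(self, session: str, cell: Cell) -> float:
        # A session sees base data + committed deltas + its own uncommitted deltas.
        return (
            self.base.get(cell, 0.0)
            + self.writeback.get(cell, 0.0)
            + self.pending[session].get(cell, 0.0)
        )

    def update(self, session: str, cell: Cell, new_value: float) -> None:
        # Record only the difference from what this session currently sees.
        delta = new_value - self.read(session, cell)
        self.pending[session][cell] += delta

    def commit(self, session: str) -> None:
        # Move this session's deltas into the shared writeback store.
        for cell, delta in self.pending.pop(session, {}).items():
            self.writeback[cell] += delta

    def rollback(self, session: str) -> None:
        # Discard this session's uncommitted deltas.
        self.pending.pop(session, None)


cube = WritebackCube({("2010", "Sales"): 100.0})
cube.update("alice", ("2010", "Sales"), 120.0)
print(cube.read("alice", ("2010", "Sales")))  # 120.0 (Alice sees her own edit)
print(cube.read("bob", ("2010", "Sales")))    # 100.0 (Bob does not, yet)
cube.commit("alice")
print(cube.read("bob", ("2010", "Sales")))    # 120.0 (now visible to every session)
print(cube.base[("2010", "Sales")])           # 100.0 (base data unchanged)
```

In this toy model, two sessions that edit the same cell before either commits each contribute a delta when they commit, which hints at why synchronizing all users’ writebacks is the hard part.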

What’s additionally exciting about this topic is that writeback is now available from Excel 2010, and I will provide screenshots that were not available when this book was written. Visually, this new feature seems to unlock the ultimate goal of writeback, namely giving a desktop user the ability to change specific values. However, the heavy lifting had already been done by the time this book was written, and it represents the core technology for committing changes that is required to synchronize not just one user’s writebacks with the cube, but those of all users together.
