Data Mining Answers for SQL Server Professionals

I received detailed feedback on my December 6, 2010 presentation, Data Mining for SQL Server Professionals. As I posted on this blog, the whole presentation has been archived as video. I intend to continue this type of presentation in 2011 for SQL Server professional audiences, so feel free to send in any feedback you have.

My goal in the presentation was to provide an overview of what is available in this complex service technology (SQL Server Data Mining is not an application but a service). Data mining should result in more questions than answers, but hopefully the questions people have after seeing a presentation will be of higher quality (as measured against what is currently available through past collective industry experience or academic research).

In this blog post, I have taken the recently received feedback (from an experienced presenter and SQL Server leader) and turned the ideas into a Q&A for what a SQL Server professional might want to know or ask. In total, I constructed eight questions and answers from the excellent feedback, and they also incorporate previous blog posts. In this post, the letter Q represents question, and the letter A represents answer.

Q: What is the logical/physical structure behind SQL Server Data Mining (SSDM)? For logical, I would want to see the logical relations, relationships, functions, and stored procedures, and how they relate to each other to facilitate DM. For physical, I would want to see the actual tables and columns, views, and keywords that give access to the physical objects, etc.

A: I will provide some links to the full documentation, because the scope of the entire architecture is beyond an hour-long presentation. Some of that structure is not documented or even accessible (meaning readable). SQL Server professionals should assume that Microsoft may change or improve internal mechanics between versions. In the past, I have had discussions with Microsoft about this topic, and they officially have two standards: what they publish in Books Online (the official word) and what they let through as undocumented features on the forums and perhaps in books (like the one authored by Jamie MacLennan and Bogdan Crivat for the 2008 version). Microsoft’s SSDM architecture documents supported functionality rather than comprehensive processing mechanics (which they may improve internally). Even at this level, I believe that Microsoft has been transparent relative to competing technologies, likely because the patents surrounding the algorithms provide a barrier, and because Microsoft wants developers to have good .NET access through ADOMD.NET and AMO. This is a good question, and one which no one has asked at an event.

Link:  Planning and Architecture (Analysis Services – Data Mining)  


Tampa Bay SQL BI User Group December 2010 — Post-Event Wrapup

I was the speaker this week for the Tampa Bay SQL BI User Group (a PASS organization). I presented virtually through LiveMeeting, sharing my desktop and webcam. I created a new presentation based on my experience this year at SQL Saturday events, where I have been emphasizing the conceptual foundations of data mining using Excel, PowerPoint, and BIDS as the interfaces.

This new presentation contrasts with past ones because I emphasized the physical architecture of SQL Server Data Mining. The link to the slides and demo is at the end of this blog post.

Data Mining for SQL Server Professionals

This presentation introduces SQL Server Data Mining (SSDM) for SQL Server professionals. Starting with SQL Server Management Studio (SSMS), the presentation covers the interfaces important for professional development, including Business Intelligence Development Studio (BIDS), with highlights on Integration Services (SSIS) and PowerShell. The interactive demos are based on Microsoft’s Contoso Retail sample data.

Session Level: Intermediate


SQL Server Data Mining and Apollo Columnstore Indexes

Note: This post was revised November 12, 2010 to clarify the brand names Apollo and VertiPaq (thanks to Denny Lee of SQLCAT). I also extended my comments on Amir Netz’s C++ versus C# analogy, which I believe clarifies the distinction between what I have termed managed and unmanaged aggregations.

This week’s PASS Summit conference included several demonstrations and announcements of the next version of SQL Server, version 11, codenamed Denali. In this blog post I have the following goals:

  • Outline Apollo columnstore indexes as a competitive Microsoft technology
  • Respond to Microsoft claims about the comparative performance advantages of columnstore indexes specifically for aggregations
  • Respond to Chris Webb’s multiple blog posts (posted from Seattle, WA) about the future of SQL Server Analysis Services

These topics seem like a lot to take on in one blog post, but for context, Microsoft found a way to introduce columnstore indexes in an eight-page whitepaper. As regular blog readers know, I put on my scientific hat first when trying to distinguish science from science fiction…
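As a rough illustration of why a column-oriented layout can help aggregations, here is a toy Python sketch. It is not how SQL Server implements Apollo columnstore indexes; the table, column names, and values are invented for illustration. The point is only that a column layout lets an aggregate read one contiguous column instead of visiting every field of every row.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# All names and values here are hypothetical, not Contoso or SQL Server data.

# Row layout: each record carries every column.
rows = [
    {"store": "Tampa", "product": "Camera", "sales": 120.0},
    {"store": "Tampa", "product": "Laptop", "sales": 900.0},
    {"store": "Orlando", "product": "Camera", "sales": 150.0},
]

# Column layout: one list per column, with values kept in row order.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_sales_row_layout(rows):
    """Sum sales by visiting every full record."""
    return sum(row["sales"] for row in rows)


def total_sales_column_layout(columns):
    """Sum sales by reading only the 'sales' column."""
    return sum(columns["sales"])


row_total = total_sales_row_layout(rows)
column_total = total_sales_column_layout(columns)
print(row_total, column_total)
```

Both functions return the same total; the difference is how much data each one has to touch, which is the property the aggregation performance claims rest on.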


PMML 2.1, XML Notepad 2007 and Contoso Retail 2.1

PMML (Predictive Model Markup Language) promises to provide a way to share data mining models in XML. The standard is published by the Data Mining Group, and the most recent PMML version is currently 4.0, released in June 2009. SQL Server Analysis Services (as of SQL Server 2008 R2) supports PMML only through version 2.1, which was released in March 2003. My opinion is that Microsoft needs to keep current with PMML to make this data mining technology a viable option.
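Because PMML is plain XML, any XML parser can inspect a model file. The following is a minimal Python sketch, assuming a heavily simplified, hypothetical PMML-like document (the model name and fields are invented, and the fragment is not schema-valid PMML). It only shows how the version attribute and the data dictionary can be read.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified PMML-like fragment for illustration only.
# A real export contains much more (mining schema, tree nodes, and so on).
SAMPLE_PMML = """
<PMML version="2.1">
  <Header copyright="example" description="Hypothetical decision tree export"/>
  <DataDictionary numberOfFields="2">
    <DataField name="Age" optype="continuous"/>
    <DataField name="Buyer" optype="categorical"/>
  </DataDictionary>
  <TreeModel modelName="HypotheticalTree" functionName="classification"/>
</PMML>
"""


def summarize_pmml(xml_text):
    """Return the PMML version, data field names, and model element names."""
    root = ET.fromstring(xml_text.strip())
    fields = [field.get("name") for field in root.iter("DataField")]
    # Model elements in PMML have tag names ending in "Model".
    models = [child.tag for child in root if child.tag.endswith("Model")]
    return {"version": root.get("version"), "fields": fields, "models": models}


summary = summarize_pmml(SAMPLE_PMML)
print(summary)
```

Checking the version attribute first is a cheap way to see whether a file falls within what a given tool claims to support before attempting an import.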

I decided that it was time to investigate this PMML topic, and this blog post shares my observations. As I stated, when the underlying technology is SQL Server 2008 R2 Analysis Services, even PMML 2.1 support is limited, and SQL Server Data Mining does not provide PMML model creation for most of its algorithms. The following table has clickable links to the MSDN Documentation. 


Predixion PMML Connexion(TM) for R

Predixion Software announced today an extended PMML (Predictive Model Markup Language) interface for R, a popular open-source software framework. The technology is an Excel add-in to the core Predixion Insight product, a cloud-based data mining service based on SQL Server Analysis Services.

http://www.businesswire.com/news/home/20101109005633/en/Predixion-Software-Introduces-Interface-Access-Run-Models

This capability extends previously announced PMML support for SAS and SPSS. PMML is an XML-based markup language developed specifically to standardize data mining models, and it is supported by key data mining vendors. Going forward, many organizations will increase their use of PMML when they choose to share data mining models across software solutions.

Solid Quality Journal November 2010

Timed for the USA PASS Summit in Seattle, the November 2010 edition of Solid Quality Journal is now available. You can read it online at:

http://www.solidq.com/sqj/Pages/Home.aspx

I have an article this month titled Applying the Scientific Method to Unsupervised Data Mining Models. As I continue to talk in presentations and with clients about the scientific method, I am increasingly convinced that I need to continue writing on this topic:

  • What are the limitations of science?
  • Why does logic transcend science?
  • Why does science need science fiction?

People are, and always will be, important to decision-making processes. People and communities transmit values, and while computers can be programmed to reflect those values (even automated to make some decisions), the values discussion belongs to communities. I have a similar article coming in December, and a planned completion to this series in January.

Other articles appear in this issue, by many of the speakers at the PASS Summit:

  • Dejan Sarka — Database Development: 5 Words of Wisdom
  • Davide Mauri — DLPC: Automating Dimension Loading
  • Eladio Rincon — Know where your Query Spends its Time
  • Herbert Albert & Gianluca Hotz — Restore with PowerShell Part 4
  • Gilberto Zampatti — SharePoint Installation Best Practices: Getting Started
  • Greg Low — Unique Constraints vs. Unique Indexes
  • Douglas McDowell — User Groups: The Core of the SQL Server Community

I’m glad Douglas has finally put an article into the Journal. I have been helping Javier Torrenteras with Business Intelligence submissions, and had been hoping Douglas would contribute material. Not everyone on the Solid Quality Board has submitted material, but Mai Low is continuing to encourage…

Data Mining Seminar Passed Over for SQL Rally 2011

SQL Rally is a new conference scheduled for May 11-13, 2011 at the Marriott World Center in Orlando, FL. The conference will have pre-conference seminars intended to appeal to a broad range of attendees and to be perceived as worth the additional expense (meaning above the conference fees, see http://www.sqlandy.com/wp-content/uploads/2010/09/SQLRally-PreCon-Application-Final.pdf). In brief, the pre-conference seminar topics should relate to the median core attendee who wants to spend even more time and money on additional learning.

I decided to submit a data mining seminar idea for this conference. The conference team graciously passed over my seminar idea, with good reasons. I have decided to blog about this topic because I believe I can provide insight into the Microsoft Data Mining community, and talk about where this technology is perceived to fit in the Microsoft world. I believe my seminar has a conference home, and now that I have this written outline I can seek more feedback.

My blog post outline:

  • My seminar proposal
  • Gracious Response from SQL Rally
  • My response to SQL Rally, and commentary
