SQL Saturday Nashville August 21

I will be presenting on Data Mining at an 11 AM Session in Nashville on August 21. This link will take you to the event website. A number of the presenters are experienced Microsoft experts, and I believe you will benefit from the range and depth of topics at this event. Like this blog, this event is FREE (which is my recommended price for events which build community).

Here are some details (quoted from the event website):

Are you ready for some FREE SQL Server training?

SQLSaturday #51 in Nashville, Tennessee, is the training event for SQL Server professionals and those wanting to learn about SQL Server. Admittance to this event is free! That’s right; free! All costs are covered by donations and sponsorships.

Please register soon since seating is limited, and let your friends and colleagues know about the event.

When and Where?

SQLSaturday #51 will be held on August 21st, 2010, at Nashville State Community College (120 White Bridge Road, Nashville, Tennessee, 37209). Event check-in will be at 7:30 with the sessions beginning at 8:00. Sessions will wrap up between 5:00 and 5:30.

Coffee and doughnuts for breakfast will be provided. There’s an optional catered lunch available, too. Plus there will be numerous opportunities to win swag such as shirts, software, posters, and the like!

Schedule

We have four tracks – Database Administration, Database Development, Business Intelligence, and Professional Development. Visit the Schedule page.

Who is presenting?

Many of the local, regional, and even national SQL Server experts (including Microsoft MVP’s) will be here sharing their knowledge and experiences.

Will there be an after party?

Follow Twitter #sqlsat51 for updates on after activities. People are already talking about a specific restaurant, and after that, #sqlkaraoke. MarkTab is a known #sqlkaraoke participant.

SolidQ Journal

Solid Quality Journal launched on July 19, 2010, with a cover story by Brian Moran (an interview with Mark Souza). I am an author in this first 56-page issue (with an article titled “Why Use Data Mining?”), and was privileged to be invited to the core journal development team last year (named to the Advisory Board inside the front cover). In the past I have written some academic articles, and contributed to a number of peer-reviewed publications. I have also been a reviewer for the esteemed Academy of Management Journal. You can check my author page, and download my article.

Managing Editor Kathy Bloomstrom heads the team that directed the first-rate production quality (outstanding layout). Kathy has been amassing ideas among the mentors at Solid Quality Mentors, and plans surprises in future issues. Our intention is to keep this journal free, even if we require registration for some future content or features. This first issue does NOT require registration, and here are some of the topics from the July 2010 issue:

  • Eating, Drinking, Sleeping SQL Server — Brian Moran interviews Mark Souza, the Partner Director for the Microsoft Business Platform Division and Customer Experience Team. This long article provides qualitative insight into SQL Server
  • The Blueprint for Proper SSAS Dimensions — Craig Utley describes best practices for dimension creation
  • Advertising for Consulting — I don’t know who the model is but page 29 has an ad — contact Kathy Bloomstrom if you are wanting to advertise in future issues
  • Requirements are Evil — Stephen Cohen, Senior Enterprise Architect with Microsoft’s Enterprise Services, concedes that we (“software developers”) are all evil and provides practical advice on how to succeed in software development
  • Should SQL Server Automatically Index Foreign Key Constraints? — Greg Low discusses this technical question
  • Why Use Data Mining? — My article provides the case for data mining, especially for enterprise systems using PowerPivot and SharePoint
  • N-Tier: No Separation Anxiety Here — Ken Spencer shows how to tame the development process for n-tier applications
  • Hammers, Nails and PowerShell — Herbert Albert and Gianluca Hotz discuss PowerShell for management tasks
  • What 3 Events Brought Me Here — Andrew Kelly shares his technical background
  • The back cover is an advertisement for our friends at SQLPASS.ORG — some of the mentors at Solid Quality Mentors provide leadership for PASS and are often speakers at PASS events (which now includes SQL Saturday — I will be speaking at #SQLSAT51 Nashville on August 21).

Predixion Software Beta

Earlier this year I became aware of Predixion Software, and the excellent plans being made by their team, headed by CEO Simon Arkell and CTO Jamie MacLennan. I plan to talk more about this company in the near future, but I decided to post now about what the company has made public today (I have more to say than fits in a tweet).

First:  The video posted to http://www.dailymotion.com/predixion
I like the video production quality, not just the lighting and the shot angles, but also that they created a storyline. CTO Jamie MacLennan does most of the talking, but you also see messages from CEO Simon Arkell about the company in general. The video includes some screenshots and information about Predixion Insight. Some of the other shots show people now working at Predixion Software. One image which sticks in my mind is Jamie playing guitar in a group.

Second: Bogdan Crivat’s blog post today: http://www.bogdancrivat.net/dm/archives/62
In a coordinated effort, Bogdan also posted today about Predixion Software.  He tells the story about how he had worked on the cloud possibilities while he was at Microsoft.  He also shares that Predixion Insight unlocks the potential for the cloud.

Third:  Jamie MacLennan’s post today:  http://jamiemaclennan.blogspot.com/2010/08/predixion-on-brink.html
Jamie talks about starting with Predixion Software this past January and, along with a small development team, aiming to produce a disruptive predictive analytics strategy. I agree with him that the concept is exciting, since Predixion Insight sounds like it will be based on the familiar SQL Server Data Mining technology and yet offer analysts and organizations more functionality. He also mentions a VIP beta (who is so lucky to be involved in that?), and a formal beta starting on August 16 (first come, first served).

For Bogdan and Jamie to blog on the same day after only periodic posts in the past few months means two things:   1) they are intensely engaged on something interesting, and 2) they have something important to say now.

I want to emphasize in this post that SQL Server Data Mining has been a development platform from the outset. Predixion Software’s product Predixion Insight aims to build on this platform and deliver a competitive offering for anyone doing predictive analytics. The August 16 beta participants have the opportunity to see what their product is all about, and I repeat what they are looking for:

  • SQL Server Data Mining Add-In Users
  • PowerPivot Users
  • Excel Users

Predixion Software Website:   http://www.predixionsoftware.com
Predixion Software on Twitter:  http://twitter.com/predixionsw

SQL Server Data Mining Capacities 2008 R2

I was wondering what the maximum capacities were for data mining, and could not find the answer in SQL Server Books Online. So, I asked the Microsoft Analysis Services product team for the answer. If you read this blog, sometimes you get insider information.

They provided me with the data mining capacities in the following table (the first four rows). This information is NOT yet in SQL Server Books Online, but Microsoft promised that it will be. I want to stress that these capacities are theoretical limits, and practically there are other limitations (such as human management skill or the NTFS file system) which prevent people from achieving these theoretical limits.

Visual Analytics and SQL Server Data Mining

The Association for Computing Machinery produces a regular journal called SIGKDD Explorations, where SIGKDD is an acronym for Special Interest Group on Knowledge Discovery and Data Mining. I would classify the journal as academic, even though private-sector consultants or companies may be coauthoring articles.

In a recent issue, there is an article titled “Visual Analytics: How much visualization and how much analytics?”. The article makes the following claims:

  • “Visual Analytics is the science of analytical reasoning supported by interactive visual interfaces.” (page 5)
  • “The term Visual Analytics has been around for about five years now.” (page 5)
  • “The core of our view on Visual Analytics is the new enabling and accessible analytic reasoning interactions supported by the combination of automated and visual analytics.”(page 5)

Altogether, these statements mean that Visual Analytics is a relatively new academic buzzword to define a specific field of research, namely the combination of automated analysis and visual representation. Someone might ask, how much does that description look like what people do with Excel? I would at first pass answer that Excel 2010 has exceptional graphic and visualization capabilities, but it does not inherently provide automated data analysis. However, SQL Server Data Mining adds the automated portion of this equation.

Windows Azure Marketplace DataMarket

People are using the phrase semantic web to be either part of Web 3.0 or the substantial component of it. Perhaps database vendors like Microsoft (with SQL Server) or other vendors (IBM’s DB2, Oracle, Teradata) might believe that because they can transmit data and allow management over TCP/IP, they already have a strong foray into Web 3.0. I believe the debate on terminology will continue as both academics and commercial vendors attempt to define these phrases.

On this blog, I have been presenting SQL Server Data Mining technology, which differs from competing technologies because it is a service and not an application. The service is tightly integrated with the production-savvy Analysis Services and SQL Server database engine, and therefore can provide high-volume analysis. In this post, I will talk about Microsoft’s cloud service to provide data. I did not spend any money on data for the single subscription I established, but the portal is intended for both free (some flavor of public domain or open source) and paid commercial datasets. A cloud is a logical way to serve large amounts of data since servers can be physically colocated around the world and provide essentially the same face despite geographic disparity.

The main website is at http://datamarket.azure.com:

I was intrigued by their promotion of the Excel add-in client so I installed it into Excel 2010. This client is still tagged as beta, reflecting the relative newness of this service.

Microsoft preceded this latest Excel client with the SQL Server Data Mining client and the PowerPivot for Excel client. Many people work in Excel, and the rationale for continuing to use this face for the semantic web is compelling.

The second graphic in the Silverlight presentation box promotes data from data.gov, one of many resources I have listed on my list of data providers. As we proceed, you will see what my screen looks like as I subscribe to a dataset from data.gov.

After choosing the dataset, the screen produces a “receipt”. What I had to do was:

  • create a Windows Live ID account (I used one already established)
  • choose the dataset of interest
  • agree to the legal terms and conditions
  • pay if necessary (this dataset is free, and from my brief survey, all of the datasets right now are either free or “coming soon”, which suggests to me that the payment system is not ready yet)
  • see the final receipt

The subscribed data is attached to my Windows Live ID, and I can type a unique account key into the Excel add-in to access the data there. The account key is unique to my ID, and in the future we could expect to use this type of key to access subscriptions through other interfaces like SharePoint.

The My Data interface shows my current subscriptions. In this case, the subscription implies that they are preparing to license some data. This specific data on crime statistics is already available at data.gov, though it is provided through Windows Azure as a convenience. The type field makes me believe that they could also handle outright payments, and the status field allows for subscriptions to be inactive (needing renewal).

From Excel, you can see the DataMarket icon which appears in the Data tab. Also showing on this screenshot is my Excel Data Mining add-in menu selection. I will next click the “Import Data…” link.

Every time I see the phrase “Query Builder” my mind thinks about Microsoft Access. As a test, I used the “Limit number of results” button to load the nominal 50 observations. Because this data comes over the web, it’s important to see whether and how quickly the data are being transmitted. The end product for me was an Excel table with 50 observations.

The team has provided videos on YouTube at http://www.youtube.com/user/azuredatamarket. Though they mention using PowerPivot from Excel, I did not see that icon on my installation (and in context, this client is tagged as beta). I would expect the production client to have connectivity with PowerPivot.

Philosophically, this type of data access could allow you to perform Bayesian analysis on any dataset. The DataMarket would provide the previous values (the prior distribution), which you could combine with current data to determine a posterior distribution. The PowerPivot video suggests starting with data you already have, and looking through the DataMarket to see if similar historical data is available.
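To make the prior and posterior idea concrete, here is a minimal sketch in Python. It is an illustration only and does not use the DataMarket or any Microsoft API. It assumes the quantity of interest is a simple event rate, models it with a Beta distribution, and uses invented counts; the names Beta, prior_from_history, and update are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) distribution over an unknown rate between 0 and 1."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def prior_from_history(events: int, trials: int) -> Beta:
    """Build a prior from historical counts, starting from a uniform Beta(1, 1)."""
    return Beta(1 + events, 1 + trials - events)


def update(prior: Beta, events: int, trials: int) -> Beta:
    """Conjugate update: add the new event and non-event counts to the prior."""
    return Beta(prior.alpha + events, prior.beta + trials - events)


# Invented numbers for illustration: 30 events in 1,000 historical records
# (the role a DataMarket dataset would play), then 8 events in 100 records
# of your own current data.
prior = prior_from_history(events=30, trials=1000)
posterior = update(prior, events=8, trials=100)

print(f"prior mean:     {prior.mean:.4f}")
print(f"posterior mean: {posterior.mean:.4f}")
```

Because the historical sample is ten times larger than the current one, the posterior mean moves only part of the way from the historical rate toward the rate seen in the new data.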

For SQL Server Data Mining, you could train a model on a DataMarket dataset, and use that model to calculate probabilities for new data. Alternatively, you might choose to append your own data to existing data in the DataMarket and use the combined data for data mining model training. A third approach is to train separate data mining models on DataMarket data and on your own data, and compare the results. Any of these approaches could be altered by applying a filter (choosing a sample) to any dataset before training a data mining model.
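The three approaches, plus the filtering variation, can be sketched in a few lines of Python. This is not SQL Server Data Mining code: the "model" here is only a per-category outcome rate standing in for a real mining model, and the rows, category names, and the train and predict functions are all hypothetical.

```python
from collections import Counter, defaultdict


def train(rows):
    """Return a toy 'model': the observed outcome rate for each category."""
    counts = defaultdict(Counter)
    for category, outcome in rows:
        counts[category][outcome] += 1
    return {c: k[True] / (k[True] + k[False]) for c, k in counts.items()}


def predict(model, category, default=0.5):
    """Look up the probability for a category, or a default if unseen."""
    return model.get(category, default)


# Invented rows of (category, outcome); market_rows plays the DataMarket role.
market_rows = [("urban", True), ("urban", True), ("urban", False), ("rural", False)]
own_rows = [("urban", False), ("rural", False), ("rural", True)]

# Approach 1: train on DataMarket data, then score your own new data.
market_model = train(market_rows)
scores = [predict(market_model, category) for category, _ in own_rows]

# Approach 2: append your own data and train on the combined set.
combined_model = train(market_rows + own_rows)

# Approach 3: train separate models and compare them side by side.
own_model = train(own_rows)
comparison = {
    c: (market_model.get(c), own_model.get(c))
    for c in sorted(set(market_model) | set(own_model))
}

# Variation: filter a dataset (here, urban rows only) before training.
urban_model = train([row for row in market_rows if row[0] == "urban"])

print(scores, combined_model, comparison, urban_model, sep="\n")
```

With a real mining model the comparison in the third approach would look at accuracy or lift rather than raw rates, but the workflow has the same shape.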

This service will mature, and Microsoft may release variants of this type of service. We can all expect competition to be fierce as data providers jockey for market share.