People use the phrase “semantic web” to mean either a part of Web 3.0 or its substantial component. Database vendors like Microsoft (with SQL Server), IBM (DB2), Oracle, and Teradata might believe that because they can transmit data and allow management over TCP/IP, they already have a strong foothold in Web 3.0. I believe the debate on terminology will continue as both academics and commercial vendors attempt to define these phrases.
On this blog, I have been presenting SQL Server Data Mining technology, which differs from competing technologies because it is a service and not an application. The service is tightly integrated with the production-ready Analysis Services and the SQL Server database engine, and can therefore provide high-volume analysis. In this post, I will talk about Microsoft’s cloud service for providing data. I did not spend any money on data for the single subscription I established, but the portal is intended for both free (some flavor of public domain or open source) and paid commercial datasets. A cloud is a logical way to serve large amounts of data, since servers can be physically located around the world and still present essentially the same face regardless of geography.
The main website is at http://datamarket.azure.com:

I was intrigued by their promotion of the Excel add-in client so I installed it into Excel 2010. This client is still tagged as beta, reflecting the relative newness of this service.

Microsoft preceded this latest Excel client with the SQL Server Data Mining client and the PowerPivot for Excel client. Many people work in Excel, and the rationale for continuing to use it as the interface to the semantic web is compelling.

The second graphic in the Silverlight presentation box promotes data from data.gov, one of many resources I have included on my list of data providers. As we proceed, you will see what my screen looks like as I subscribe to a dataset from data.gov.

After choosing the dataset, the screen produces a “receipt”. What I had to do was:
- create a Windows Live ID account (I used one already established)
- choose the dataset of interest
- agree to the legal terms and conditions
- pay if necessary (this dataset is free, and from my brief survey, all of the datasets right now are either free or “coming soon”, which suggests to me that the payment system is not ready yet)
- see the final receipt
The subscribed data is attached to my Windows Live ID, and I can type an account key into the Excel add-in to access the data there. The account key is unique to my ID, and in the future we could expect to use this type of key to access subscriptions through other interfaces like SharePoint.

The My Data interface shows my current subscriptions. In this case, the word “subscription” implies that they are preparing to license some data. This specific data on crime statistics is already available at data.gov, though it is provided through Windows Azure as a convenience. The type field makes me believe that they could also handle outright payments, and the status field allows for subscriptions to be inactive (needing renewal).

From Excel, you can see the DataMarket icon, which appears on the Data tab. Also showing in this screenshot is my Excel Data Mining add-in menu selection. I will next click the “Import Data…” link.

Every time I see the phrase “Query Builder” I think of Microsoft Access. As a test, I used the “Limit number of results” button to load a nominal 50 observations. Because this data comes over the web, it’s important to see whether and how quickly the data are transmitted. The end product for me was an Excel table with 50 observations.
The team has provided videos on YouTube at http://www.youtube.com/user/azuredatamarket. Though they mention using PowerPivot from Excel, I did not see that icon on my installation (keep in mind that this client is tagged as beta). I would expect the production client to have connectivity with PowerPivot.
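Outside of Excel, the same kind of test (cap the result size, then time the transfer) could be sketched in Python. This is only an illustration under assumptions: the base URL is a made-up placeholder, the feed is assumed to accept the OData-style `$top` query option, and `fake_fetch` stands in for a real HTTP call so the example runs without a network or an account key.

```python
import time
from urllib.parse import urlencode


def build_limited_query(base_url, top=50):
    """Append an OData-style $top option to cap the number of rows returned."""
    # safe="$" keeps the leading dollar sign readable instead of encoding it as %24.
    return f"{base_url}?{urlencode({'$top': top}, safe='$')}"


def timed_fetch(fetch, url):
    """Call fetch(url) and return (rows, elapsed_seconds)."""
    start = time.perf_counter()
    rows = fetch(url)
    elapsed = time.perf_counter() - start
    return rows, elapsed


def fake_fetch(url):
    """Hypothetical stand-in for an HTTP request; returns 50 placeholder rows."""
    return [{"row": i} for i in range(50)]


# Hypothetical feed address, not a real DataMarket endpoint.
url = build_limited_query("https://example.invalid/feed/CrimeStatistics", top=50)
rows, elapsed = timed_fetch(fake_fetch, url)
print(f"{len(rows)} rows in {elapsed:.4f} s from {url}")
```

Swapping `fake_fetch` for a function that performs a real request would give a rough measure of how quickly a small sample arrives before committing to a full download.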
Philosophically, this type of data access could allow you to perform Bayesian analysis on any dataset. The DataMarket would provide the historical values, from which you form a prior distribution; you would then combine that prior with your current data to obtain a posterior distribution. The PowerPivot video suggests a similar workflow: start with data you already have, then look through the DataMarket to see whether similar historical data is available.
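As a minimal sketch of that prior-to-posterior idea, here is a Beta-Binomial update in Python. The counts are invented for illustration and do not come from any DataMarket dataset; the choice of a Beta prior built from historical counts is my assumption, picked because the update reduces to simple addition.

```python
def prior_from_history(events, non_events):
    """Build Beta(a, b) parameters from historical counts.

    Starts from a uniform Beta(1, 1) and adds the historical counts,
    so the history plays the role of the prior distribution.
    """
    return 1 + events, 1 + non_events


def update(prior, new_events, new_non_events):
    """Conjugate update: add the newly observed counts to the prior parameters."""
    a, b = prior
    return a + new_events, b + new_non_events


def beta_mean(params):
    """Mean of a Beta(a, b) distribution, a / (a + b)."""
    a, b = params
    return a / (a + b)


# Hypothetical numbers: 30 events in 1,000 historical observations,
# then 5 events in 100 observations of my own current data.
prior = prior_from_history(30, 970)
posterior = update(prior, 5, 95)
print("prior mean:    ", round(beta_mean(prior), 4))
print("posterior mean:", round(beta_mean(posterior), 4))
```

With a large historical dataset the prior dominates, so a small amount of current data shifts the posterior only slightly; that is the expected behavior, not a flaw in the sketch.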
For SQL Server Data Mining, you could train a model on a DataMarket dataset and then use that model to calculate probabilities for new data. Alternatively, you might choose to append your own data to existing data from the DataMarket and use the combined data for model training. A third approach is to train separate data mining models side by side, one on DataMarket data and one on your own data, and compare the results. Any of these approaches could be altered by applying a filter (choosing a sample) to any dataset before training a data mining model.
This service will mature, and Microsoft may release variants of this type of service. We can all expect competition to be fierce as data providers jockey for market share.
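To make the three approaches concrete, here is a small Python sketch. It is not the SQL Server Data Mining API: a smoothed frequency table stands in for a real mining model, and the rows, feature names, and labels are all hypothetical.

```python
import random
from collections import Counter, defaultdict

LABELS = ("high", "low")


def train(rows):
    """Count labels per feature value; rows are (feature, label) pairs."""
    counts = defaultdict(Counter)
    for feature, label in rows:
        counts[feature][label] += 1
    return counts


def probability(model, feature, label):
    """Laplace-smoothed estimate of P(label | feature)."""
    seen = model.get(feature, Counter())
    return (seen[label] + 1) / (sum(seen.values()) + len(LABELS))


def compare_approaches(market_rows, own_rows, feature="urban", label="high"):
    """Return the estimate produced by each of the three training strategies."""
    return {
        "market_only": probability(train(market_rows), feature, label),
        "combined": probability(train(market_rows + own_rows), feature, label),
        "own_only": probability(train(own_rows), feature, label),
    }


# Hypothetical data: market_rows plays the DataMarket download, own_rows my data.
market_rows = ([("urban", "high")] * 6 + [("urban", "low")] * 4
               + [("rural", "high")] * 2 + [("rural", "low")] * 8)
own_rows = [("urban", "high")] * 3 + [("urban", "low")] * 1

for name, p in compare_approaches(market_rows, own_rows).items():
    print(f"{name}: {p:.3f}")

# Filtering: train on a random sample of the market data instead of all of it.
sample = random.Random(0).sample(market_rows, 10)
sampled_model = train(sample)
```

The `market_only` entry corresponds to training on DataMarket data and scoring new cases, `combined` to appending your own data first, and comparing `market_only` with `own_only` corresponds to the side-by-side approach.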