I’m happy to post this interview with Dr. Galit Shmueli, lead author of Data Mining for Business Intelligence, which was released in a new edition earlier this year (2010).
I previously reviewed this textbook; you can read the review at this link. The book costs more than a typical Microsoft technical book, but its price is comparable to that of many data mining books. Relatively few professionals perform data mining, and I believe this book is worth its full retail price.
Now, on to the interview.
Galit Shmueli is a Professor of Statistics at the Robert H. Smith School of Business, University of Maryland, College Park, MD. She is a researcher, author, teacher, advisor, and technology freak (among other things). Her current research focuses on statistical methods and data mining in information systems research (and particularly, eCommerce). Professor Shmueli has taught data analytics for over a decade to engineers and to business students and has won awards for teaching excellence. She has co-authored several books and has published her work in professional journals in statistics, marketing, information systems, and more. Professor Shmueli blogs about data analytics in business at blog.bzst.com. Check her website (galitshmueli.com) for papers, recorded talks, and more.
How and when did you start authoring books on data mining?
I started teaching a data mining course in our MBA program in 2004. I inherited a statistics-oriented course that used 3 different software packages. This clearly was not going to work from my point of view, and I went in search of a more holistic, data-mining-oriented solution for software and a textbook. I couldn’t find any adequate textbook: they were all either too technical (aimed at computer science students) or too “fluffy” (meaning, all talk and very little hands-on). I met Nitin Patel (MIT) and Peter Bruce (statistics.com) at the “Teaching Statistics in Business Schools” conference at Georgetown University. Nitin and Peter were promoting XLMiner, an Excel data mining add-on. I tried it out, and it was terrific. That solved my software problem, but not the textbook problem. When I complained to Nitin, he suggested that we should write a textbook. Very quickly Nitin, Peter and I teamed up and took on the job. Each of us brought a different point of view to the table, and we loved learning from each other. The result was a seamless, understandable textbook (at least that is the feedback that we received from readers). I will just note that at the time I was an Assistant Professor and was warned not to write a textbook, which would distract me from getting tenure. Needless to say, I ignored this advice and followed my heart.
What would you like people to know about the new edition of Data Mining for Business Intelligence?
This new edition is not “yet another edition”. It is truly new-and-improved. I have seen so many textbooks come out with who-knows-how-many-new-editions, just for the sake of “refreshing the market”. I am against that approach. We created the 2nd edition to address a few issues, based on feedback from our users (instructors and students) and from our own use of the book in data mining courses:
- More on data visualization – the new dedicated chapter on data visualization discusses state-of-the-art interactive visualization beyond the typical statistics textbook. We also reached an agreement with TIBCO to provide adopting classes with a free 1-year license for their excellent industry-standard interactive visualization tool, Spotfire Professional.
- A new part (3 chapters) on forecasting time series – statistics textbooks usually focus on a method called ARIMA, which is quite complicated and rarely used in practice unless there is a statistician on board. Instead, we’ve focused on popular, data-driven forecasting methods such as exponential smoothing and regression-based forecasting. We discuss how to evaluate predictive accuracy, how to visualize data, and other related issues in the time series context.
- Summaries at the beginning of each chapter – to help bring out the big picture and for high-level browsing.
- Design – we put special effort into the design of the new edition, to make it more appealing and user-friendly. Unfortunately, we could not convince the publisher to use color.
- Case and exercises – a new case and chapter exercises for the new chapters.
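To make the forecasting item in the list above concrete, here is a minimal sketch of simple exponential smoothing with a holdout check of predictive accuracy. It is an illustration only and is not taken from the book or from XLMiner: the function names, the smoothing constant `alpha = 0.2`, the choice to initialise the level at the first observation, and the made-up sales figures are all assumptions.

```python
def smoothed_level(series, alpha):
    """Return the exponentially smoothed level after seeing every value in `series`.

    Assumption: the level is initialised to the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for actual in series[1:]:
        # Move the level a fraction `alpha` of the way toward the new observation.
        level = level + alpha * (actual - level)
    return level


def mean_absolute_error(actuals, predictions):
    """Average absolute gap between actual values and predictions."""
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)


# Made-up monthly sales figures, for illustration only.
monthly_sales = [20, 22, 21, 25, 24, 27, 26, 30, 29, 31, 33, 32]

# Fit on the first nine months and hold out the last three for evaluation.
train, holdout = monthly_sales[:9], monthly_sales[9:]

# Simple exponential smoothing has no trend term, so its forecast for
# every future period is the final smoothed level.
level = smoothed_level(train, alpha=0.2)
holdout_forecasts = [level] * len(holdout)

mae = mean_absolute_error(holdout, holdout_forecasts)
print(f"Forecast for each holdout month: {level:.2f}")
print(f"Holdout mean absolute error: {mae:.2f}")
```

Measuring error on months the model never saw, rather than on the months used to fit it, is one simple way to evaluate predictive accuracy in a time series setting. Because this toy series trends upward, a flat forecast will tend to lag behind it, which is the kind of issue a holdout check can reveal.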
What types of roles have you played in the academic world?
My longest role has been as a student. I still consider myself a student and try to learn more and more from all possible resources. I started instructing statistics when I was an undergraduate at the University of Haifa. It was quite scary back then, but that’s how I received a double dose of many courses (once by taking them as a student, and again by teaching them as a Teaching Assistant). I continued teaching throughout my graduate studies at the Israel Institute of Technology (Technion), mostly to engineers, and towards the end of my PhD I became a full-fledged instructor. I remember being one of the first at our school to create course homepages (using raw HTML). After graduating from the Technion, I joined Carnegie Mellon University’s Statistics department as a Visiting Assistant Professor. That’s when I started advising grad students and becoming involved in cross-disciplinary and cross-institution collaborations. I also learned about grants and the world of funding. In 2002, I joined the University of Maryland’s Smith School of Business as a tenure-track Assistant Professor. I had this crazy idea of conducting statistical research on online auctions (eBay was not so famous back then). My colleague Wolfgang Jank and I took on this direction and spearheaded statistically oriented online auction research. Luck then struck and I met Ravi Bapna, a completely out-of-the-box Information Systems researcher. The three of us organized the first “Statistical Challenges in Ecommerce Research” (SCECR) symposium (now an annual event – see statschallenges.com). Wolfgang and I edited a book on the topic, with articles by the top researchers. We also published a bunch of papers on various approaches that we developed, and finally we put together the book “Modeling Online Auctions” (modelingonlineauctions.com). During this entire period I advised several students, some of whom were sufficiently brave to dive into this new field.
The last academic role that I’ll mention is presenter: I have been presenting my work at many conferences and seminars, to different audiences (statisticians and non-statisticians), around the world. This has always been a terrific way to receive feedback, initiate new collaborative work, and discover new ideas.
How do you believe your training and education helped you for the professional work you do today?
My undergraduate studies were in psychology and statistics. That’s when I got a glimpse of how the social sciences operate. My Masters and PhD were both in an engineering school, where I learned the engineering approach to the world and to research.
I was extremely lucky to have an advisor (Prof. Ayala Cohen) who supported my free spirit and gave me the freedom to explore and research an area that was not her main area of interest. This is when I learned to trust my self-learning capabilities and my writing skills. My advisor used to compliment my writing (and she was not very generous in complimenting the writing of other students!).
Although I grew up bilingual, my entire education was in Hebrew. Yet, my English knowledge gave me the advantage of being able to read lots of books in psychology and statistics at the university libraries. Most students shied away from such books, which made it much harder for them to expand their knowledge beyond the class materials. I wrote my Masters Thesis in Hebrew, but then switched to English for my PhD Thesis. Since then, nearly all of my writing has been in English.
Finally, I’d like to mention again that a pillar of my education has been the access to good libraries. I literally spent hours reading books that I just picked off the shelves (I still have that bad habit). While today one surfs the Web for information, deep knowledge in a variety of areas really still resides in libraries.
What positive responses have you heard from people using the book Data Mining for Business Intelligence?
We try to keep in touch with our adopting instructors to get their feedback and to answer their questions. Instructors have complimented us on the readability of the book (many instructors are not data mining experts, yet were able to pick up sufficient knowledge from the book to teach a new course). Students and instructors also like the manageable size of the book. The hands-on nature of the book has also been complimented – you can literally take any of the datasets used in the book and try to replicate our analysis or just go off with your own.
When we asked instructors how they use the book, we discovered that there are two types of use: some teach using the method chapters, while others teach via the cases in the last chapter. This depends on the audience (e.g., technically-savvy MBAs vs. executive education), on the background and style of the instructor, and on the program requirements. Hence, our book appears to be sufficiently versatile in nature to allow for different teaching styles and for various audiences.
What do you enjoy about the author team you have been working with on Data Mining for Business Intelligence?
Authoring with Nitin and Peter has been a continuous pleasure. We all share the same work ethic and enthusiasm, while specializing in different aspects of data mining. We work fast, give a lot of feedback to each other, and delegate work well. Aside from the book, we have also created instructor materials such as slides, chapter solutions, case solutions, etc. These have required us to continuously collaborate, even when we are geographically apart.
What encouragement do you have for students considering studying data mining?
Being a geek, let me first say that data mining is a lot of fun! It is much less mysterious than classic statistics, and can be applied to almost any data-rich environment.
The potential of data mining to transform life is far from being exhausted: while there are many data mining success stories in the sense of material prosperity, the application of data mining for improving social, cultural, environmental, and other noble causes is only in its infancy. I encourage students to think outside the box and apply data mining to problems that have positive impact.
And of course, there is the standard practical answer: because it is hot! (which increases your chances of landing a job).
What encouragement do you have for young women in technology?
Women who have already crossed the (psychological?) barrier and hopped on the science and technology wagon already know the answer: this is where it is all happening these days! Exciting, fast, and with broad impact on all aspects of life. Understanding technology and not being afraid to try out new technologies is the secret to enjoying the benefits and avoiding the evils that come with technology. Data mining is not only technology, as it requires a good understanding of the domain where it is employed. Unlike closed-form math, there are lots of issues in data mining applications where you must get creative. A strong tech background (and fearlessness) will take you a long way.
Beyond the technical, are there any personal passions or interests you want to share?
I have quite a bunch of old-fashioned low-tech passions: spending time with friends and family (not on Facebook), playing piano (not typing), reading books (not web surfing), and learning new skills such as new languages (not through RosettaStone.com). A recent project that I’ve co-initiated gives a glimpse into how I combine tech and non-tech skills: we developed the first typing tutor for Dzongkha, the official language of the Kingdom of Bhutan (www.rigsum-it.com/calt/dztype).
What would you like to share about your future plans with data mining?
My most important academic work has evolved from my venturing into data mining. I call it “To Explain or To Predict?” (galitshmueli.com/explain-predict). It deals with the unfortunate under-use of predictive modeling in the social sciences, in economics, and other fields. Research in those fields is focused on causal explanation, and assumes that causal models inherently possess predictive power. This misconception is exacting a high toll in the sense of halting science (when was the last ground-breaking psychological theory?). My current efforts are focused on clarifying the difference between causal explanation and prediction, and the introduction of predictive thinking into scientific research.
Thank you, Dr. Shmueli, for sharing your passion.
MarkTab Commentary:
- I believe this book provides a practical statistics framework for working with Microsoft’s SQL Server Data Mining. As Dr. Shmueli mentions in the interview, the book includes software from a few vendors, but more generally the techniques can apply to many other data mining technologies. You can read my previously posted review by clicking this link.
- I agree that data mining is in its infancy; as she aptly stated, its potential is “far from being exhausted”. While much of the mathematics is comparatively old, certainly predating both the web and the Internet, the application of such techniques continues to grow as technology itself advances. SQL Server 2008 R2 is now, for example, commonly used on databases of a terabyte or more. With current technology, consumers (and therefore entrepreneurial small business innovators) can make terabyte-sized databases at home.
- I enjoy the stories about collaboration to make a project happen. Collaboration is increasingly an important professional skill in data mining, not just for producing books and supporting websites, but also for accomplishing projects. I believe the classic model of one person analyzing a flattened table will continue, but that the most interesting data mining problems, those with high impact value, will require teamwork, and especially asynchronous workflows across distributed teams.
- I am glad the authoring team decided to include many graphics throughout the textbook, and included a visualization tool. “Visual Analytics” is the current term for this new field of information visualization. Data mining can provide some amount of data reduction from large datasets, and customers are now looking for even newer, yet-to-be-developed visualization tools and technologies. I believe this area will attract some Silverlight developers.
- I share Dr. Shmueli’s interest in introducing predictive analytics to researchers. I have encouraged my current doctoral learners at the University of Phoenix to consider data mining technologies, since even some commercial vendors will make their software available free or at a heavily reduced cost for students in accredited programs.
- I believe that text mining will increasingly, if not predominantly, become the main focus of data mining. The reason is that much information (just think of the Internet) is textual and not stored (for example) in a relational database. We can also expect people to want to mine spoken language, anything from conversations to conventions to singing. XLMiner (in its current version) has many data mining features but does not perform text mining. I can recommend another Excel-based add-in, Predixion Insight, which does perform text analytics (for English). As consumers, we can expect new text mining tools and technologies to become available, and current vendors like Predixion will continue to improve their technologies as their users ask for more features.