THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Dejan Sarka

Data Mining Resources

There are many different types of analyses, each one with its own pros and cons.

Relational reports have a predefined structure, and end users cannot change it. They are simple to use for end users. Reports can use real-time data and snapshots of data to show the state of a report at specific points in time. One of the drawbacks is that report authoring is limited to IT pros and advanced users. Any kind of dynamic restructuring is very limited. If real-time data is used for a report, the report has a negative impact on the performance of the source system. Processing of the reports might be slow because the data comes from relational database management systems, which are not optimized for reporting only.

If you create a semantic model of your data, your end users can create ad-hoc report structures. However, the development is more complex because a developer is needed to create these semantic models.

For OLAP, you typically use specialized database management systems. You get lightning speed of analyses. End users can use rich and thin clients to interactively change the structure of the report. Typically, they do it graphically. However, the development of an OLAP system is many times quite complex. It involves the preparation and maintenance of an enterprise data warehouse and OLAP cubes. In order to exploit the possibility of real-time restructuring of reports, the users must be both active and educated. The data is usually stale, as it is loaded into data warehouses and OLAP cubes with a scheduled process.

With data mining, a structure is not selected in advance; it searches for the structure. As a result, data mining can give you the most valuable results because you can discover patterns you did not expect. A data mining model structure is limited only by the attributes that you use to train the model. One of the drawbacks is that a lot of knowledge is needed for a successful data mining project. End users have to understand the results. Subject matter experts and IT professionals need to understand business problem thoroughly. The development might be sometimes even more complex than the development of OLAP cubes.

Each type of analysis has its own place in an enterprise system. SQL Server has tools for all kinds of analyses. However, data mining is the most advanced way of analyzing the data; this is the “I” in BI. In order to get the most out of it, you need to learn quite a lot. In this blog post, I am gathering together resources for learning, including forthcoming events.

  • Books
    • Multiple authors: SQL Server MVP Deep Dives – I wrote an introductory data mining chapter there.
    • Erik Veerman, Teo Lachev and Dejan Sarka: MCTS Self-Paced Training Kit (Exam 70-448): Microsoft SQL Server 2008 - Business Intelligence Development and Maintenance – you can find a good overview of a complete BI solution, including data mining, in this book.
    • Jamie MacLennan, ZhaoHui Tang, and Bogdan Crivat: Data Mining with Microsoft SQL Server 2008 – can’t miss this book if you want to mine your data with SQL Server tools.
    • Michael Berry, Gordon Linoff: Mastering Data Mining: The Art and Science of Customer Relationship Management – data mining from both, business and technical perspective.
    • Dorian Pyle: Data Preparation for Data Mining – an in-depth book about data preparation.
    • Thomas and Ronald Wonnacott: Introductory Statistics – if you thought that you could get away without statistics, then you are not serious about data mining.
    • Jiawei Han and Micheline Kamber: Data Mining Concepts and Techniques – in-depth explanation of the most popular data mining algorithms.
    • Michael Berry and Gordon Linoff: Data Mining Techniques – another book that explains data mining algorithms, more fro a business perspective.
    • Paolo Guidici: Applied Data Mining – very mathematical book, only if you enjoy statistics and mathematics in general.
  • Forthcoming presentations
    • I am presenting two data mining related sessions during the PASS Summit in Charlotte, NC:
      • Wednesday, October 16th, 2013 - Fraud Detection: Notes from the Field – I am showing how to use data mining for a specific business problem. The presentation is based on real-life projects.
      • Friday, October 18th: Excel 2013 Advanced Analytics – I am focusing on Excel Data Mining Add-ins, and how to use them together with Power Pivot and other add-ins. This is the most you can get out of Excel.
    • Sinergija 2013, Belgrade, Serbia
      • Tuesday, October 22nd: Excel 2013 Analytics to the Max – another presentation focusing on the most advanced analytics you can get in Excel.
    • SQL Rally Amsterdam, Netherlands
      • Thursday, November 7th: Advanced Analytics in Excel 2013 – and again I am presenting about data mining in Excel. Why three different titles for the same presentation? I don’t know, I guess I forgot the name I proposed every time right after I sent the proposal.
  • Courses
    • Data Mining with SQL Server 2012 – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses dsarka@solidq.com or dsarka@siol.net. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well.

OK, now you know: no more excuses, start learning data mining, get the most out of your dataSmile

Published Thursday, October 10, 2013 10:50 AM by Dejan Sarka

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Dejan Sarka said:

I wanted to add another resource, however, I am getting old, and forgot about it:-) You can also find a lot of interesting videos on data mining, nearly 8 hours of material, recorder by Rafal Lukawiecki, at https://projectbotticelli.com/datamining. Another very valuable resource indeed.

October 10, 2013 10:09 AM
 

Rafal Lukawiecki said:

Thank you, Dejan! If any readers of SQLBlog are interested in my online Data Mining with SSAS course, or the DAX training by Marco Russo and Alberto Ferrari, at https://projectbotticelli.com/dax, I'd be happy to give you all a 10% discount this month. Use code SQLBLOG2013OCT at checkout (before end of Oct 2013).

October 10, 2013 10:22 AM

Leave a Comment

(required) 
(required) 
Submit

About Dejan Sarka

Dejan Sarka, MCT and SQL Server MVP, is an independent consultant, trainer, and developer focusing on database & business intelligence applications. His specialties are advanced topics like data modeling, data mining, and data quality. On these toughest topics, he works and researches together with SolidQ and The Data Quality Institute. He is the founder of the Slovenian SQL Server and .NET Users Group. Dejan Sarka is the main author or coauthor of eleven books about databases and SQL Server, with more to come. Dejan Sarka also developed and is developing many courses and seminars for SolidQ, Microsoft and Pluralsight. He is a regular speaker at many conferences worldwide for more than 15 years, including conferences like Microsoft TechEd, PASS Summit and others.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement