


The schedule for the last SQL Saturday this year in Europe is live. Hurry up with registrations: two months before the event we are already 70% full! Also, don’t forget our great preconference seminars. See you in beautiful Ljubljana!


It’s been a while since I wrote the last blog post on the data mining / machine learning algorithms, in which I described the Neural Network algorithm. It is also a good time to write another post in order to remind the readers of the two upcoming seminars about the algorithms I have in Oslo on Friday, September 2nd, 2016, and in Cambridge on Thursday, September 8th. I hope to see you in one of the seminars. Finally, to conclude this marketing part: if you are interested in the R language, I am preparing another seminar, “EmbRace R”, which will cover R from basics to advanced analytics. Stay tuned.

Now for the algorithm. If you remember the previous post, a neural network has an input layer, an output layer, and one or more hidden layers. The Neural Network algorithm uses the hyperbolic tangent activation function in the hidden layer and the sigmoid function in the output layer. The sigmoid function is also called the logistic function. Therefore, describing the Logistic Regression algorithm is simple after I described the Neural Network. If a neural network has only input neurons that are directly connected to the output neurons, it is a Logistic Regression. Or, to repeat the same thing in a different way: Logistic Regression is a Neural Network with zero hidden layers.

This was quick:) To add more meat to the post, I am adding the formulas and the graphs for the hyperbolic tangent and sigmoid functions.
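For reference, the two functions and the zero-hidden-layer case can be written out as follows. The symbols for the input vector (x), the weight vector (w), and the bias (b) are notation assumed here for illustration; they do not come from any particular product's documentation.

```latex
\operatorname{sigmoid}(x) = \frac{1}{1 + e^{-x}}, \qquad
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\,\operatorname{sigmoid}(2x) - 1

P(y = 1 \mid \mathbf{x}) = \operatorname{sigmoid}(\mathbf{w} \cdot \mathbf{x} + b)
```

The second line is the whole model when there is no hidden layer: a weighted sum of the inputs passed through the sigmoid, which is exactly the logistic regression formula.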


I am closing my plan for the second semester of this year. Before listing the events I plan to attend, just a quick comment. I have been asked many times why I don’t visit some specific events, especially those in the vicinity. My answer is pretty simple. I try to plan my events six months in advance. My schedule for the year 2016 is full. I simply can’t visit events that are announced only a couple of months in advance. I prefer long-term planning. Anyway, here is my list, pretty long again.
 SQL Grill, Lingen, Germany, August 19th: one presentation – Statistics with T-SQL
 SQLSaturday #532 – Oslo 2016, September 2nd–3rd:
 SQLSaturday #520 – Cambridge 2016, September 8th–10th:
 SQLSaturday #555 – Munich 2016, October 8th: not confirmed yet.
 SQLSaturday #538 – Sofia 2016, October 15th: not confirmed yet.
 PASS Summit 2016, October 25th–28th, Seattle, WA:
 SQLSaturday #569 – Prague 2016, December 3rd: not confirmed yet.
 SQLSaturday #567 – Slovenia 2016, December 9th–10th: since I am one of the organizers, this one is confirmed:)
And this should be enough for this year:)


So we are back again! The leading event dedicated to Microsoft SQL Server in Slovenia will take place on Saturday, 10th December 2016, at the Faculty of Computer and Information Science of the University of Ljubljana, Večna pot 113, Ljubljana (http://www.fri.uni-lj.si/en/about/how_to_reach_us/).

As always, this is an English-only event. We don’t expect the speakers and the attendees to understand Slovenian. Thanks to this, our SQL Saturday has become quite well known, especially in the neighboring countries. Therefore, expect not only international speakers but international attendees as well. There will be 30 top sessions, two original and interesting preconference seminars, a small party after the conference, an organized dinner for the speakers and sponsors… But first of all, expect a lot of good vibrations, mingling with friends, smiling faces, and a great atmosphere. You might also consider visiting Ljubljana and Slovenia for a couple of additional days. Ljubljana is a very beautiful and lively city, especially in December.

In cooperation with Kompas Xnet d.o.o., we are once again organizing two preconference seminars by three distinguished Microsoft SQL Server experts. The seminars will take place the day before the main event, on Friday, 9th December 2016, at Kompas Xnet d.o.o., Stegne 7, Ljubljana. The attendance fee for each seminar is 149.00 € per person; until 31st October 2016 you can register for each seminar for 119.00 € per person.

Hope we meet at the event!


This is a tip that should help with installing SQL Server 2016 (tested on CTP 3.3, RC2, and RC3) Master Data Services. The documentation is pretty old and incomplete (I have already sent the feedback). The page “Web Application Requirements (Master Data Services)” (https://msdn.microsoft.com/en-us/library/ee633744.aspx) should be seriously updated. First of all, it should also document how to use the Windows Server 2012 R2 and Windows 10 operating systems. I managed to install MDS on Windows Server 2012 R2. However, there is a bullet missing in the Role and Role Services part. In the Performance section, only Static Content Compression is mentioned. However, Dynamic Content Compression is needed as well. I managed to get it up and running.


I got some questions about the virtual machine / notebook setup for my Business Intelligence in SQL Server 2016 DevWeek post-conference workshop. I am writing this post because I want to spread this information as quickly as possible. There will be no labs during the seminar; there is no time for them. However, I will make all of the code available. Therefore, if the attendees would like to test the code, they need to prepare their own setup. I will use the following software:
Windows Server 2012 R2
SQL Server 2016 components
 Database Engine
 SSIS
 SSRS
 SSAS
 DQS
 MDS
 R Services
Tools
 SQL Server Management Studio (this is not included in SQL Server setup anymore)
 SQL Server Data Tools
 R Tools for Visual Studio
 R Studio
Excel 2016 Professional Plus with add-ins
 MDS add-in
 Power Pivot
 Power Query
 Power Map
 Power View
 Azure ML add-in
Excel 2013 Professional Plus with add-ins
 Data Mining add-in (this add-in does not work with Excel 2016 yet; the Excel 2016 version is announced only for later this year, after the SQL Server 2016 release)
Power BI Apps and Services
 Power BI Desktop
 Power BI Service (they need to create a free account at PowerBI.com)
 Azure ML (they need to create a free account at AzureML.com)
Mobile Publisher
AdventureWorks demo databases version 2016, 2014 or 2012
I know the list is long:) However, nobody needs to test everything. Just pick the parts you need and want to learn about. See you soon!


Traditionally, I write down a list of presentations I am giving at different events every semester. This semester, I am already a bit late, and I am still missing some info. So here is the list of the events I am planning to attend. I will add events and correct the list as needed later. Here is the updated info. Of course, more updates will come when I get the relevant information.
 Bulgarian UG meeting, Sofia, January 14th: presentation Introducing R and Azure ML
 Slovenian UG meeting, Ljubljana, February 18th: presentation Introducing R and Using R in SQL Server 2016, Power BI, and Azure ML
 SQL Server Konferenz 2016, Darmstadt, February 23rd – 25th:
   preconference seminar Data Mining Algorithms in SSAS, Excel, R, and Azure ML
   presentation SQL Server & Power BI Geographic and Temporal Predictions
 PASS SQL Saturday #495, Pordenone, February 27th:
   presentation SQL Server 2012–2016 Columnar Storage
   presentation Enterprise Information Management with SQL Server 2016
 DevWeek 2016, London, April 22nd – 26th:
   postconference seminar Business Intelligence in SQL Server 2016
   presentation Using R in SQL Server 2016 Database Engine and Reporting Services
   presentation SQL Server Isolation Levels and Locking
 SQL Nexus, Copenhagen, May 2nd – 4th: presentation Identity Mapping and De-Duplicating
 SQL Bits 2016, Liverpool, May 4th – 7th: presentation Using R in SQL Server 2016 Database Engine and SSRS
 SQL Day, Wroclaw, May 16th – 18th:
   preconference seminar Data Mining Algorithms in SSAS, Excel, R, and Azure ML
   presentation Statistical Analysis with T-SQL
   presentation Anomaly Detection
 PASS SQL Saturday #508, Kyiv, May 21st: information to follow.
 PASS SQL Saturday #510, Paris, June 25th: information to follow.
 PASS SQL Saturday #520, Cambridge, September 10th: information to follow.
And yes, this is already quarter 3, but I am late with this list anyway.


A neural network is a powerful data modeling tool that is able to capture and represent complex input/output relationships. The motivation for the development of neural network technology stemmed from the desire to develop an artificial system that could perform "intelligent" tasks similar to those performed by the human brain. Neural networks resemble the human brain in the following two ways:
 A neural network acquires knowledge through learning
 A neural network's knowledge is stored within interneuron connection strengths known as synaptic weights
The Neural Network algorithm is an artificial intelligence technique that explores more possible data relationships than other algorithms. Because it is such a thorough technique, processing is usually slower than with other classification algorithms.

A neural network consists of basic units modeled after biological neurons. Each unit has many inputs that it combines into a single output value. The units are connected together, so the outputs of some units are used as inputs into other units. The network can have one or more middle layers called hidden layers. The simplest are feedforward networks (pictured), where there is only a one-way flow through the network from the inputs to the outputs. There are no cycles in feedforward networks.

As mentioned, units combine inputs into a single output value. This combination is called the unit’s activation function. Consider this example: the human ear can function near a working jet engine. Yet, if it were only 10 times more sensitive, you would be able to hear a single molecule hitting the membrane in your ears! What does that mean? When you go from 0.01 to 0.02, the difference should be comparable with going from 100 to 200. In biology, there are many types of nonlinear behavior.

Thus, an activation function has two parts. The first part is the combination function that merges all of the inputs into a single value (a weighted sum, for example). The second part is the transfer function, which transfers the value of the combination function to the output value of the unit. A linear transfer function would give you just a linear regression. The transfer functions used are typically S-shaped, like the sigmoid function: Sigmoid(x) = 1 / (1 + e^(-x)). A single hidden layer is usually sufficient, so the Neural Network algorithm always uses a maximum of one (or zero for Logistic Regression). The Neural Network algorithm uses the hyperbolic tangent activation function in the hidden layer and the sigmoid function in the output layer.
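The two parts of an activation function can be sketched in a few lines of Python. This is only an illustration: the function names, the input values, and the weights are all made up, and a weighted sum is assumed as the combination function.

```python
import math

def combination(inputs, weights, bias=0.0):
    """Combination function: merge all inputs into one value (a weighted sum here)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def sigmoid(z):
    """S-shaped transfer function with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def unit_output(inputs, weights, bias=0.0, transfer=sigmoid):
    """Activation of one unit: the transfer function applied to the combined inputs."""
    return transfer(combination(inputs, weights, bias))

# Hypothetical inputs and weights, chosen only for illustration.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.1]

hidden = unit_output(inputs, weights, transfer=math.tanh)    # hidden-layer style unit
output = unit_output(inputs, weights, transfer=sigmoid)      # output-layer style unit
linear = unit_output(inputs, weights, transfer=lambda z: z)  # linear transfer: plain weighted sum
```

With the linear transfer function the unit returns the weighted sum unchanged, which is why such a unit amounts to a linear regression; the tanh and sigmoid versions squeeze the same sum into the ranges (-1, 1) and (0, 1).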
You can see a Neural Network with a single hidden layer in the following picture. Training a neural network is the process of setting the best weights on the inputs of each of the units. This backpropagation process does the following:
 Gets a training example and calculates outputs
 Calculates the error – the difference between the calculated and the expected (known) result
 Adjusts the weights to minimize the error
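The three steps can be sketched for a tiny network with one hidden layer (tanh) and a single sigmoid output. This is a minimal sketch under stated assumptions, not the way any particular product implements training: the squared-error measure, the learning rate, the number of hidden units, and the toy OR data are all choices made here for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Step 1: calculate the hidden (tanh) values and the output (sigmoid) value."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x + [1.0]))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output

def train_step(x, target, w_hidden, w_out, rate=0.5):
    hidden, output = forward(x, w_hidden, w_out)
    # Step 2: the error is the difference between the calculated and the expected result.
    error = output - target
    # Gradients of 0.5 * error^2, propagated back through the sigmoid and tanh functions.
    delta_out = error * output * (1.0 - output)
    delta_hidden = [delta_out * w_out[j] * (1.0 - h * h) for j, h in enumerate(hidden)]
    # Step 3: adjust the weights in the direction that reduces the error.
    for j, h in enumerate(hidden + [1.0]):
        w_out[j] -= rate * delta_out * h
    for j, ws in enumerate(w_hidden):
        for i, xi in enumerate(x + [1.0]):
            ws[i] -= rate * delta_hidden[j] * xi
    return 0.5 * error * error

random.seed(0)
# Hypothetical training data: the logical OR of two inputs.
examples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
# Two hidden units; each weight list ends with a bias weight.
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(3)]

first_epoch_error = sum(train_step(x, t, w_hidden, w_out) for x, t in examples)
for _ in range(2000):
    last_epoch_error = sum(train_step(x, t, w_hidden, w_out) for x, t in examples)
```

Comparing `first_epoch_error` with `last_epoch_error` shows how repeating the three steps over the training examples gradually shrinks the error.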
Like the Decision Trees algorithm, you can use the Neural Network algorithm for classification and prediction. The interpretation of the Neural Network algorithm results is somewhat more complex than the interpretation of the Decision Trees algorithm results. Consequently, the Decision Trees algorithm is more popular.


Our third SQL Saturday in Ljubljana is over. Two weeks seems to be enough time to sleep on it and see things from a bit of a distance. Without any further delay, I can declare that the event was a pure success. Let me start with the numbers, comparing the total number of people, including speakers, sponsors, attendees, and organizers, with the previous two SQL Saturdays in Ljubljana:
 2013: 135 people from 12 countries
 2014: 215 people from 16 countries
 2015: 253 people from 20 countries
You can clearly see the growth. Even the keynote was full, as the following picture shows. We again experienced a very small drop rate; more than 90% of registered attendees showed up. That’s very nice, showing respect to the speakers and sponsors. So thank you, attendees, for being fair and respectful again!

We had more sponsors than in previous years. This was extremely important, because this time we did not get the venue for free, and therefore we needed more money than for the previous two events. Thank you, sponsors, for enabling the event!

Probably the most important part of these SQL Saturday events are the speakers. We got 125 sessions submitted by 51 speakers from 20 countries! We were really surprised. We take this as a sign of our good work in the past. 30 great sessions with two state-of-the-art precon seminars is more than we expected, yet still not enough to accommodate all the speakers who sent submissions. Thank you, all speakers, those who were selected and those who were not! I hope we see you again in Slovenia next year. You can see some of the most beautiful speakers and volunteers in the following picture (decide for yourself if there is somebody spoiling the picture).

The next positive surprise was the volunteers. With this number of speakers and attendees, we would not be able to handle the event without them. We realized that we have a great community, consisting of some really helpful people we can always count on. Thank you all!

I think I can say for all three organizers, Mladen Prajdić, Matija Lah, and me, that we were more tired than in any year before. However, hosting a satisfied crowd is the best payback you can imagine. And the satisfaction level was high even among the youngest visitors, as you can see from the following picture.

Of course, we also experienced some negative things. However, just a day before New Year’s Eve, I am not going to deal with them now. Let me finish this post in a positive way.


Decision Trees is a directed technique. Your target variable is the one that holds information about a particular decision, divided into a few discrete and broad categories (yes / no; liked / partially liked / disliked, etc.). You are trying to explain this decision using other gleaned information saved in other variables (demographic data, purchasing habits, etc.). With limited statistical significance, you are going to predict the target variable for a new case using its known values of the input variables, based on the results of your trained model.

Recursive partitioning is used to build the tree. The data is split into partitions using a certain value of one of the explaining variables. The partitions are then split again and again. Initially, the data is in one big box. The algorithm tries all possible breaks of each input (explaining) variable for the initial split. The goal is to get purer partitions considering the classes of the target variable. You know intuitively that purity is related to the percentage of the cases in each class of the target variable. There are many better, but more complicated, measures of purity, for example entropy or information gain. The tree continues to grow using the two new partitions as separate starting points and splitting them further.

You have to stop the process somewhere. Otherwise, you could get a completely fitted tree that has only one case in each leaf. Each leaf would be, of course, absolutely pure. This would not make any sense, because the results could not be used for any meaningful prediction. This phenomenon is called “overfitting”. There are two basic approaches to solving this problem: pre-pruning (bonsai) and post-pruning techniques. The pre-pruning (bonsai) methods prevent growth of the tree in advance by applying tests at each node to determine whether a further split is going to be useful; the tests can be simple (number of cases) or complicated (complexity penalty). The post-pruning methods allow the tree to grow and then prune off the useless branches. The post-pruning methods tend to give more accurate results, but they require more computation than pre-pruning methods.

Imagine the following example. You have the answers to a simple question: Did you like the famous Woodstock movie? You also have some demographic data: age (20 to 60) and education (ranked in 7 classes from the lowest to the highest). In all, 55% of the interviewees liked the movie and 45% did not like it. This is the starting point. Can you discover factors that have an influence on whether they liked the movie? After checking all possible splits, you find the best initial split made at the age of 35. With further splitting, you finish with a full-grown tree. Note that not all branches lead to purer classes. Some of them are not useful at all and should be pruned.

Decision trees are used for classification and prediction. Typical usage scenarios include:
 Predicting which customers will leave
 Targeting the audience for mailings and promotional campaigns
 Explaining reasons for a decision
 Answering questions such as “What movies do young female customers buy?”
Decision Trees is the most popular data mining algorithm. This is because the results are very understandable and simple to interpret, and the quality of the predictions is usually very high.
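The search for the best initial split can be sketched as follows, using entropy as the purity measure and information gain as the criterion. This is a minimal sketch, not the implementation of any specific product; the ages and answers are invented toy data, and only a single numeric input variable is handled.

```python
import math
from collections import Counter

def entropy(labels):
    """Impurity of a partition: 0 when all cases belong to a single class."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_split(values, labels):
    """Try every break point of one numeric variable; return (threshold, information gain)."""
    parent = entropy(labels)
    best = (None, 0.0)
    for threshold in sorted(set(values))[1:]:
        left = [label for v, label in zip(values, labels) if v < threshold]
        right = [label for v, label in zip(values, labels) if v >= threshold]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = parent - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Hypothetical survey data: age of the interviewee and whether they liked the movie.
ages = [22, 27, 31, 34, 36, 41, 48, 55]
liked = ["no", "no", "yes", "no", "yes", "yes", "yes", "yes"]

threshold, gain = best_split(ages, liked)
```

A full tree would repeat the same search inside each of the two resulting partitions; a simple pre-pruning test would be to stop whenever a partition holds fewer than a chosen number of cases or the gain falls below a chosen limit.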


It is alive! It was really hard to make the choices. Nevertheless, the schedule is public now. To everybody that submitted proposals – thank you very much again! We are sorry we cannot accommodate everybody. Please note that even if you were not selected, we would be happy to see you in Ljubljana.



I am continuing with my data mining and machine learning algorithms series. Naive Bayes is a nice algorithm for classification and prediction. It calculates probabilities for each possible state of the input attribute, given each state of the predictable attribute, which can later be used to predict an outcome of the predicted attribute based on the known input attributes. The algorithm supports only discrete or discretized attributes, and it considers all input attributes to be independent.

Starting with a single dependent and a single independent variable, the algorithm is not too complex to understand (I am using an example from my old book about statistics: Thomas H. Wonnacott, Ronald J. Wonnacott: Introductory Statistics, Wiley 1990). Let’s say I am buying a used car. In an auto magazine I find that 30% of secondhand cars are faulty. I take with me a mechanic who can make a shrewd guess on the basis of a quick drive. Of course, he isn’t always right. Of the faulty cars he examined in the past, he correctly pronounced 90% faulty and wrongly pronounced 10% OK. When judging good cars, he correctly pronounced 80% of them as good, and wrongly 20% as faulty.

In the graph, we can see that 27% (90% of 30%) of all cars are actually faulty and then correctly identified as such. 14% (20% of 70%) are judged faulty, although they are good. Altogether, 41% (27% + 14%) of cars are judged faulty. Of these cars, 66% (27% / 41%) are actually faulty. To sum up: once the car has been pronounced faulty by the mechanic, the chance that it is actually faulty rises from the original 30% up to 66%. The following figure shows this process.

The calculations in the previous figure can be summarized in another tree, the reverse tree. You can start branching with the opinion of the mechanic (59% OK and 41% faulty). Moving to the right, the second branching shows the actual conditions of the cars, and this is the valuable information for you. For example, the top branch says: once the car is judged faulty, the chance that it actually turns out to be faulty is 66%. The third branch from the top displays clearly: once the car is judged good, the chance that it is actually faulty is just 5%. You can see the reverse tree in the following figure.

As mentioned, Naive Bayes treats all of the input attributes as independent of each other with respect to the target variable. While this could be a wrong assumption, it allows multiplying the probabilities to determine the likelihood of each state of the target variable based on the states of the input variables. For example, let’s say that you need to analyze whether there is any association between NULLs in different columns of your Products table. You realize that if Color is missing, 80% of Weight values are missing as well; and if Class is missing, 60% of Weight values are missing as well. Assuming that both states of Weight are equally likely in advance, you can multiply these probabilities. For the state Weight is missing, you calculate the product:
0.8 (Weight missing when Color is missing) * 0.6 (Weight missing when Class is missing) = 0.48
You can also check what happens to the not missing state of the target variable, the Weight:
0.2 (Weight not missing when Color is missing) * 0.4 (Weight not missing when Class is missing) = 0.08
You can easily see that the likelihood that Weight is missing is much higher than the likelihood that it is not missing when Color and Class are unknown. You can convert the likelihoods to probabilities by normalizing their sum to 1:
P (Weight missing if Color and Class are missing) = 0.48 / (0.48 + 0.08) = 0.857
P (Weight not missing if Color and Class are missing) = 0.08 / (0.48 + 0.08) = 0.143
Now, when you know that the Color value is NULL and the Class value is NULL, there is a nearly 86% chance that you get NULL in the Weight attribute as well. This might lead you to some conclusions about where to start improving your data quality.

In general, you use the Naive Bayes algorithm for classification. You want to extract models describing important data classes and then assign new cases to predefined classes. Some typical usage scenarios include:
 Categorizing bank loan applications (safe or risky) based on previous experience
 Determining which home telephone lines are used for Internet access
 Assigning customers to predefined segments.
 Quickly obtaining a basic comprehension of the data by checking the correlation between input variables.
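Both calculations from this post can be reproduced with a few lines of Python. The numbers come from the used car and the Products table examples above; the variable names and the `normalize` helper are made up for illustration.

```python
# Used car example: Bayes' rule with one input (the mechanic's opinion).
prior_faulty = 0.30                  # share of faulty secondhand cars
p_judged_faulty_if_faulty = 0.90     # mechanic is right about a faulty car
p_judged_faulty_if_good = 0.20       # mechanic is wrong about a good car

faulty_and_judged_faulty = prior_faulty * p_judged_faulty_if_faulty    # 0.30 * 0.90
good_and_judged_faulty = (1 - prior_faulty) * p_judged_faulty_if_good  # 0.70 * 0.20
p_judged_faulty = faulty_and_judged_faulty + good_and_judged_faulty
p_faulty_if_judged_faulty = faulty_and_judged_faulty / p_judged_faulty

# Reverse tree, lower part: the car is judged good but is actually faulty.
faulty_and_judged_good = prior_faulty * (1 - p_judged_faulty_if_faulty)
p_faulty_if_judged_good = faulty_and_judged_good / (1 - p_judged_faulty)

def normalize(likelihoods):
    """Convert likelihoods to probabilities by scaling their sum to 1."""
    total = sum(likelihoods.values())
    return {state: value / total for state, value in likelihoods.items()}

# Products table example: multiply the per-column probabilities, then normalize.
weight_if_color_and_class_null = normalize({
    "missing": 0.8 * 0.6,
    "not missing": 0.2 * 0.4,
})
```

The multiplication in the second part is only valid under the independence assumption of Naive Bayes, and the sketch, like the text, treats both states of Weight as equally likely before the inputs are seen.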


I am finishing my list of conferences and seminars I am attending in the second half of the year 2015. Here is my list.
 Kulendayz 2015 – September 4th–5th. Although I will have huge problems getting there on time, I would hate to miss it. I have one talk there.
 SQL Saturday #413 Denmark – September 17th–19th. You can join me already on Friday for the Data Mining Algorithms seminar.
 SQL Saturday #434 Holland – September 25th–26th. If you miss the Denmark Data Mining Algorithms seminar, I am repeating it in Utrecht.
 SQL Server Days 2015 Belgium – September 28th–29th. This will be my first talk at SQL Server Days.
 SQL Saturday #454 Turin – October 10th. I have not been confirmed as a speaker yet, but I still plan to go there, to combine the SQL part with the Expo in Milan part.
 PASS Summit 2015 Seattle – October 27th–30th. I still continue to be present at every single summit:) This year I have one presentation.
 SharePoint Days 2015 Slovenia – November 17th–18th. No, I don’t like SPS. I will just have one small BI presentation there.
 SQL Saturday #475 Belgrade – November 28th. First SQL Saturday in Serbia. I simply must be there.
 SQL Saturday #460 Slovenia – December 11th–12th. I am co-organizing this event. Everybody is welcome; this will be a fully English-speaking event. Don’t miss beautiful, relaxed and friendly Ljubljana!
That’s it. For now:)


This post is a bit different from the others in the series about the data mining and machine learning algorithms. This time I am honored and humbled to announce that my fourth Pluralsight course is live. This is the Data Mining Algorithms in SSAS, Excel, and R course. Besides explaining the algorithms, I also show demos in different products. This gives you an even better understanding than just reading the blog posts. Of course, I will continue with describing the algorithms here as well.




