
SELECT Hints, Tips, Tricks FROM Hugo Kornelis WHERE RDBMS = 'SQL Server'

  • Execution plans, laundry, and a giveaway

    In just a week from now, SQL Saturday #337 will kick off in Portland, Oregon. And I will be there – the third time already for me to be speaking in Portland.

    For a European, Portland is not the most logical location. But the organization over there is a pretty smart bunch of people. They figure that being close to Seattle gives them a great opportunity – so whenever they get the chance, they will reserve a Saturday just before the PASS Summit in Seattle for their own event. And then they make sure to organize this event so extremely well that people, speakers as well as attendees, will want to travel a few days early and make Portland their first stop.

    SQL Mastery Sessions

    But this year, the crew of SQL Saturday Oregon tried something new. They have added a day of pre-cons (or, as they call it, SQL Mastery Sessions) to their program. On Friday, October 31, three speakers will deliver a full-day session, allowing them to dive much deeper into their favorite subjects.

    For the BI fans, Mark Tabladillo will present on “Self-Service BI with Power BI”. I do not know Power BI (or any BI at all), but I do know Mark and I am sure that the session will rock. Based on the abstract, I think it will be a “from zero to hero” type of day. If you want to be the one that brings self-service BI into your organization, go and sign up for this session!

    If you are more DBA oriented, then you should definitely consider going to another rock star in the SQL world: Argenis Fernandez. His session “SQL Server Internals and Data Recovery” promises to give you real deep-dive insight into how SQL Server stores data, how fate and bad karma can cause that data to become corrupted, and (in my opinion the most important skill a DBA should have, and one you hope never to have to apply to your production database) how to salvage your company’s valuable data if the unthinkable happens. Argenis will correct me if I’m wrong (see the comments below), but based on the abstract I suspect that this session is not for the faint of heart, nor for the beginning DBA. But if you are at least at an intermediate level and want to learn how to recover from almost every imaginable disaster, then this is the session where you should be!

    Both sessions sound great to me, but I am spared the challenge of choosing – because I will be presenting the third SQL Mastery Session, aimed at a target audience of both database developers and DBAs at intermediate level. In “Understanding Execution Plans”, I will explain how to read an execution plan, starting at the very beginning (like: what is an execution plan and where do I find it), and then exposing intimate details of what SQL Server is doing when your query is running. You will learn how a merge join operator processes a right anti semi join, how an exchange operator shoves data between threads, how aggregation relates to my laundry, and hundreds of other things – but most of all, you will gain knowledge that you can apply to slow queries at work, to get a better understanding of what makes them slow and how you can fix it.

    First the bad news

    Earlier this week I was told that my SQL Mastery Session is already sold out. That is great for me – nothing strokes the ego as much as seeing that there are many people who do not run away screaming at the idea of spending a day locked up with me in a classroom, listening to me talking geek, but are actually prepared to spend real dollars for the privilege. But for you, the reader, this may be bad news – especially if the paragraphs above have whetted your appetite and you now also want to be there. But don’t despair, you can sign up for the waiting list and hope a slot frees up. But there is another option as well.

    Then the good news

    The good news is that the kind people of the SQL Saturday Oregon organization have given me permission to give away one seat to my session. If you win, you not only get a $129 value admission ticket, you even get to bypass the waiting list and find yourself in the last available seat, reserved especially for you! Sounds good? Yeah, thought so. So, how do you apply?

    Easy. You just have to help the SQL Saturday Oregon team and me to get all three sessions sold out. Spread the word, do the marketing! Here are the simple steps:

    1. Write on Twitter, Facebook, your blog, a web forum, or any other public place why you think that everyone should sign up for the SQL Mastery sessions in Oregon. Be original, be creative, and most of all: be convincing!

    2. Send me an email, with a link to your post. Send it from an address where you will be able to receive replies – otherwise I will not be able to contact you.

    3. Wait, pray, and hope.

    On Monday, October 27, I will write all email addresses on paper slips, throw them in a bowl, and draw the winner. To reward effort, I do reserve the right to put in two, three or even four slips for entries that are especially creative, convincing, and/or original. This is determined on a purely subjective basis by an independent jury consisting of only me. But every entry will result in at least one slip in the bowl, so every participant has a chance to be the lucky winner!

    I will pass the email address of the winner to the SQL Saturday Oregon team, who will then contact him or her to work out the details.

    Everyone’s a winner

    Only one person can win the main prize. But everyone who enters the competition will receive a consolation prize. I will not disclose what that prize is at this time – just watch your email next week, to see the surprise.

  • Database Mail … and then the SMTP Server changed

    The database I inherited when I started my current job sends out lots of mails. It does so using Database Mail, and it has lots of mail accounts and mail profiles defined. I know that several of those profiles are used, I suspect that some are not, and I have no idea about the rest – one day I will find the time to clean up, but so far there have always been more pressing matters to attend to.

    But today the mail administrator told me that due to a change in architecture, SQL Server had to start using a different SMTP server for sending mails. Quite an easy task if you have just a single profile – just a few clicks in the Database Mail Configuration Wizard, and done. But repeating those same mouse-clicks for every profile in the list was not my idea of a morning well spent, so I decided that I’d have to script this. (This should be easy – we have just a single SMTP server, so I could hit every single mail account and did not have to bother with exceptions).

    Usually, scripts for such a task are very easy – just type a well-chosen search string in your favorite search engine, check the first two or three hits, and you’ll have a script. Usually even more than one. Carefully inspect the script (just because it’s on the internet does not mean it’s safe!), copy the script, paste into SSMS, make adjustments for your own situation, do one more inspection just to be sure – and then hit the F5 button and say “time to grab a coff … oh wait, it’s already done”.

    In this case I had no luck. Maybe I used the wrong search phrase, or maybe there is a way to accomplish this that is so easy that nobody ever bothered blogging about it and I am the only one who managed to overlook the option. Or maybe nobody has ever tried to automate the task of changing SMTP servers.

    Bottom line, I did not find a pre-made script for this task, so I put in the effort to write one, and then decided to share it with you.

    Note that the script below was tested on SQL Server 2008R2 only. Also note that it will update all mail accounts to use the new SMTP server. If you have a more complex setup with multiple servers and only some need to change, you will have to add the correct filtering criteria to the WHERE clause.

    DECLARE @NewServer sysname = 'NotTelling.mail', -- New SMTP server
            @OldServer sysname = 'MySecret.mail',   -- Old SMTP server
            @account_id int;

    DECLARE Cursor_MailAccounts CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY
    FOR SELECT account_id
        FROM   msdb.dbo.sysmail_server
        WHERE  servername = @OldServer;             -- Add extra logic here

    OPEN Cursor_MailAccounts;

    FETCH NEXT
    FROM  Cursor_MailAccounts
    INTO  @account_id;

    WHILE @@FETCH_STATUS = 0
    BEGIN;
        EXECUTE msdb.dbo.sysmail_update_account_sp
                    @account_id = @account_id,
                    @mailserver_name = @NewServer;
       
        FETCH NEXT
        FROM  Cursor_MailAccounts
        INTO  @account_id;
    END;

    CLOSE Cursor_MailAccounts;
    DEALLOCATE Cursor_MailAccounts;
    (And remember, just because you found it on the internet doesn’t mean it’s safe!)
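
    If you want a quick sanity check afterwards, you can read back the same msdb.dbo.sysmail_server view that the cursor above is based on and verify that every account now lists the new server. This is just a minimal sketch of such a check, not part of the original script:

    -- Verify: every account should now point to the new SMTP server
    SELECT account_id,
           servername
    FROM   msdb.dbo.sysmail_server;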

    With that done, my next task was to fix the mails being sent from SSIS packages. They use an SMTP connection that is defined in the package, not SQL Server’s own Database Mail, so I had to open and edit them by hand. Luckily, I have only four packages active, and only three of them have a Send Mail task, so I did not see any need to automate this.

  • PASS Summit Abstract Feedback

    Last week, Adam posted the feedback he received from PASS on the abstracts he had submitted for the PASS Summit 2014. I was very happy he did – not only because the post itself was a good read, but also because I had not seen the PASS announcement on making the feedback available (I did see it in a PASS community mail a few days later, though).

    I can imagine that not everyone will want to see this feedback, but I do. I love harsh criticism, as long as it’s fair – that’s how I can learn and improve. (The written comments are always my favorite part of the session feedback from conferences where I have spoken). So I have followed Adam’s lead and requested a copy of the feedback on my abstracts. And I will continue to follow Adam, by posting the feedback here – that way, all of the community can learn from my mistakes, instead of just me.

    Understanding Execution Plans [full day, level 400] (not accepted)

    For troubleshooting long running queries, looking at the execution plan is often a good starting point. Once you know how the query is executed, you know why it's slow and what you can do to speed it up.
    But what if the execution plan is just beyond your understanding? What if it uses operators you have seen before, but do not really understand? What if you look at the execution plan, but just don't see the problem?
    In this full-day workshop, you will learn everything you need to be able to read and understand any execution plan. We'll start with an overview of execution plans as a whole, and then dive in and look at all the components, and how they fit together. This will increase your understanding on why the optimizer picks a plan, and what you can do to make it pick a better plan.

    Whether you have read your share of execution plans or whether you wouldn't know where to find them, this workshop will teach you everything you need to know about execution plans. Attend this workshop if you want to hone your tuning skills!
    Throughout the day, we will have exercises to help you get an even better understanding of the theory. In order to get the most out of the day, attendees are encouraged to bring their own laptop, with SQL Server (any version), Management Studio, and the AdventureWorks sample database installed.

    • This sounds a great topic and a good advanced level. I notice it was presented at 300 level in Copenhagen and this time it's aimed at 400 level. Hopefully it will not start too advanced for the audience. - great abstract. maybe more demo time??
      One of the learning outcomes is "Understand how to read execution plans." - hopefully at level 400 people who know how to do this already
    • Good abstract, very good idea for a session.
    • The topic is very important for any developer and looks like the presenter will be very well trained on this session as it is being presented in three other major conferences.
    • well written abstratc with good level of details. interetsing topic - but i dont see ay references to updates in sql server 2014. goals are clear. demo % could be higher for this topic.
    • Yay for audience participation.
      In the future, when noting this session has been presented at SQL Saturdays, please note if it was presented as a full-day pre-conference or as a normal session as normal SQL Saturday sessions do not match the length of a full session.

    Setting the level in an abstract is a constant struggle for me. Is a session only 500 level if only MVPs have a chance of understanding it, and if people walk out with brain damage? If a presenter is able to take the most complex subject matter and explain it in a way that can be understood by everyone, is this a 200 or 300 level session because the intended audience can be relatively new, or is it still a 500 level session because the subject matter is deep and advanced? Because of this struggle, I often adjust the level when submitting the same session. In Copenhagen, I got feedback that it was more advanced than the audience expected, so I adjusted. The goal of the session is to ensure that even people who have little experience with execution plans can learn to read them (hence the learning outcome), and then continue to dig deeper where most sessions stop (hence the advanced level). This can be seen as sitting uncomfortably between two stools, or as killing two birds with one stone. I think that the session achieves the latter – but I apparently fail to explain that well enough in the abstract.
    Even for a full day session, time is a constraint. I had to choose between more demos and more room for new content, or more coverage of the basic content. I chose the latter (even the SQL 2012 content is just the last 30 minutes or so), and I stand by that choice, but I realize that everyone has their own preferences.
    When I put in where I presented the session before, I assumed that it would be obvious that these were pre-cons for SQL Saturdays – otherwise it would not be the same session. But it was apparently not clear at all. I should have been more specific.

    Lessons learned:
    Make sure the abstract reflects that people can come in with little previous knowledge, but will still leave with advanced understanding.
    Be careful with assumptions.

     

    A New Method for Effective Database Design [full day, level 200] (not accepted)

    Your databases may be normalized and may perform well … but is the data actually correct?
    Hidden pitfalls abound even in seemingly-mature database designs. Complexity, overabstraction, and miscommunication, can create situations where data simply can’t meet expectations.
    In this full-day seminar, you will learn how to use a requirements-based methodology to translate you users’ real business problems into a rock-solid database design. You will learn exactly what questions to ask, how to ask them, and how to transform the answers into a fully normalized database that accurately captures all the business requirements.

    If you have ever experienced problems due to bad database design, attend this seminar to make sure that never happens again.

    • Good outcomes - sounds appealing
    • One typo "to translate you" rather than "your".  I really like this session idea, though I can't really foresee what the "new" method might be.  Definitely sounds like the only method I know of. Expect it would be quite good.
    • This is a very important topic.
    • Seems the presenter can pull this off quite nicely.
    • Great topic
    • interesting topic. decent abstract. clear goals. low % of demos for a full day session
    • Abstract is for a full-day session and the submitter notes they have presented the session at SQL Saturdays and SQLBits.  One is not able to compare these due to the massive difference in the session length.

    Hmmm, this is a challenge. The abstract apparently fails to make it clear that the method I present here is indeed very different from traditional database design methods. I remember a previous version of the abstract that was rather boring and technical and got rejected all the time. I tried to spiffy it up and make it sound more appealing, but when I now reread it, it does sound a lot like the marketing speak I see and hear from advocates of other methods. I’ll have to redo this one for the next time, and find a middle ground between accurate but boring and spiffy marketing speak.
    I agree with the demo remark, and I would love to do more demos, and maybe some classroom exercises, but the subject is simply too big for that. Maybe if I can present both Monday and Tuesday? ;-)
    And I again caused confusion by including references to previous deliveries without explicitly mentioning that it was a precon at those events as well. Too bad: with the mostly positive other comments, I feel that this is the only reason that this session was not selected.

    Lessons learned:
    Abstract should contain some “near-marketing” speak to make it sound appealing, but I have gone overboard. I failed to really explain what is new and special about the presented method.
    And, again, make very clear that the other deliveries were precons as well.

     

    T-SQL User-Defined Functions, or: Bad Performance Made Easy [general session, level 300] (accepted)

    User-defined functions in SQL Server are very much like custom methods and properties in .Net languages. At first sight, they seem to be the perfect tool to introduce code encapsulation and reuse in T-SQL. So why is this feature mostly avoided by all T-SQL gurus?
    The reason is performance. In this session, you will learn how user-defined functions feed the optimizer with misleading and insufficient information, how the optimizer fails to use even what little information it has, and how this can lead to shocking query performance.
    However, you will also see that there is a way to avoid the problems. With just a little extra effort, you can reap the benefits of code encapsulation and reuse, and still get good performance.

    • Should be a good session - I would attend.
    • Very good abstract, and a decent idea for a session. I like that it includes positive benefits along with the bad effects as functions aren't all bad in all cases. I think you have a typo in the notes, as  you say it got a 4.73 on a 15 scale :)
    • This session looks very good. We always hear how UDFs are bad for performance but never what we can do about it. Having a session that shows that can be very good.
    • Well written with clear and concise goals.
    • good topic. great abstract. clear goals. exciting to see session with 100% demos

    Thanks for the kind words. And yeah, that 4.73 was on a 1-5 scale, missed the dash.
    I hope the last reviewer will not be disappointed when the actual demo percentage is about 95% (the PASS submission form has a dropdown for the demo percentage with 0%, 25%, 50%, 75%, or 100%, so I had to round).

    Lessons learned:
    Looks like this abstract, apart from the typo, can remain unchanged.

     

    Now Where Did THAT Estimate Come From? [general session, level 500] (not accepted)

    The SQL Server Query Optimizer makes its plan choices based on estimated rowcounts. If those estimates are wrong, the optimizer will very likely produce a poor plan. And there's nothing you can do about it. Or is there?
    In this session, you will learn exactly where these estimates come from. You will gain intimate knowledge of how statistics are built and maintained and how they are used to estimate row counts. But you will also learn how filters and joins influence those estimates.
    Though the focus of this session is on understanding the cause of bad estimates, you will also learn ways to fix the problems and get better estimates - and hence, better performing queries.

    • 500 level session and the pre-requisits seem to make it clear that prior knowledge is essential. I hope the presenter can do justice to this - that would be my main concern.
    • I wish the title make it more clear that we were talking about the optimizer, but for a 500 level session, perhaps that isn't needed. "Though the focus of this session is on understanding the cause of bad estimates" I guess I see that, but really the session is about where estimates come from based on the rest of the abstract, and where a bad one comes from is just an offshoot of that.
    • The abstract looks solid. The goals are aligned with the abstract and also look solid.
    • Abstract gives a clear picture of what to expect.  Goals could be better defined.  Is this truly a 500 level?
    • Type-o in Session Prerequisite, "sold" should be "solid".
    • Doubtful if this 500
    • interesting title. good topic - revevant for senior dbas or developers interesting in an indepth understanding of execution plans. abstract has good level of depth goals are terse but convey the necessary details. high % of demo is good.

    Again the level problem, and I must admit that I was, and still am, in doubt on whether to present this as level 400 or 500.
    The “can do justice” remark makes me wonder if this is from a reviewer who got only the abstract, or the abstract and my name. In other words, are these doubts based on the abstract, or based on a perception of my understanding of and ability to explain the subject? I would love to get into contact with this reviewer (mail me, please!). No hard feelings, I am just curious to understand where this opinion comes from and how I can improve.
    The remark on “talking about the optimizer” is interesting. That is NOT what I talk about in this session, but I see what part of the abstract results in this misconception. I need to rewrite that. And yes, you are completely right that the focus is on where estimates come from, not just bad estimates, so I will definitely steal those words for my v2 abstract.

    Lesson learned:
    Focus the abstract on the actual content, not on related content.

     

    Normalization Beyond Third Normal Form [general session, level 400] (not accepted)

    Do you believe the myths that “Third Normal Form is good enough”, or that “Higher Normal Forms are hard to understand”?
    Do you believe the people who claim that these statements are myths?
    Or do you prefer to form your own opinion?

    If you take database design seriously, you cannot afford to miss this session. You will get a clear and easy to understand overview of all the higher Normal Forms: what they are, how to check if they are met, and what consequences their violations can have. This will arm you with the knowledge to reject the myths about higher Normal Forms. But, more important: it will make you a better designer!

    • This is a good academic session - but not sure about it's real world application - a totlal theoretical discussion (based on the fact it has 0% demos) may not hold peoples interest for too long.
    • Database design is my favorite topic, and one I think db devs need to know more about. My only problem with this abstract is that it feels way overreaching to cover all of these normal forms at a 100 level where 1, 2 and 3 aren't even going to be understood all that well.
    • Looks very interesting. I suppose this is something we all should learn about. Looks the abstract covers a lot of material and while is 100% theory it is theory that must be learned anyway.
    • Love the topic. Abstractand Goals tend to "preach" one side of the story.Real-world examples would be helpful.
    • Not sure this is 100 level,
    • routine topic. abstract and topic dont show anything new or exciting for sql 2012 or 2014. The sessions seems all theory and has no demos. This may not be of much appeal and interest to audience
    • Zero demonstration time.
    • While the abstract is presented as a 100-level; the abstract description and goals are 200-level.

    I always prepare abstracts in a Word document, and then copy the content on the submission form. And in this case I must have made a mistake in the level selection. I intended to submit this as a level 400 session, not as level 100. Oops.
    What is a demo? Is it only a demo if the presenter sits behind the laptop, opens Management Studio, and executes T-SQL code? Or can you consider applying normalization rules to a concrete example of a database design, even if it’s on pre-made Powerpoint slides, as a demo too? The reviewers seem to say “yes” to the last option; I thought “no” when I submitted.
    The “abstract and topic dont show anything new or exciting for sql 2012 or 2014” comment is the only remark in all the feedback that I can’t use in any way to improve myself. The subject of database design is not related to a version of SQL Server, not even to SQL Server itself. I could present this session at a conference on any relational database and it would be equally useful.

    Lessons learned:
    When the session submission form is completed, “check, check and check again” is not sufficient; I probably should wait a day and then check three times more (or just learn not to make such sloppy mistakes).
    All real-world examples (or made up but realistic examples) are demos. Even if I am still in Powerpoint.

     

    Inside the Nonclustered Columnstore Index [general session, level 400] (not accepted)

    You have heard the marketing speak. You may have seen the demos. And maybe you have even played with the feature: Columnstore indexes, the amazing "go faster" switch in SQL Server 2012!
    But do you know how they work? Come to this session to find out. We will lift the hood to give you a fascinating in-depth view of how Columnstore indexes are built and read, how batch-mode execution operates, and how these features work together to achieve an amazing performance boost.

    • Topic should be well received
    • Thanks for the abstract, it's good to have someone speaking on columnstore Index, also it would be great if we can include some real examples and demo.since there is no any demo.
      Thanks

    I must have made another mistake – not visible in the feedback, but in the mail I got with the results this was listed as a half-day session. I entered this as a normal (75 minute) session in my Word document.
    No demo (or rather 10%, but I had to round to either 0 or 25) is a result of my choice to cover the internals in depth. Most internals can’t be shown in demos. I love seeing lots of demos in the sessions I attend, so I fully understand the reviewer. But I am also convinced that adding more demos would not improve the quality of this specific session. If that reduces the chance to be accepted, then so be it. I’d rather present the best possible session at only a few events than a watered down version at more places.
    That being said, I probably should not have submitted this session, and I will not submit it to other conferences anymore. The nonclustered columnstore index was introduced in SQL Server 2012, and is now sort of obsolete. SQL Server 2014 was not released yet when I submitted my abstracts, but I knew it was coming and I knew that it would bring the improved, clustered version of the columnstore index – this session is old news. I love this session and I regret having to decommission it. But this session is now about as relevant as a session on the internals of English Query.

    Lessons learned:
    Once more: check, check, double check – and then check again.
    No matter how much I love a session, when its time has come I must no longer submit it.


    Powerful T-SQL Improvements that Reduce Query Complexity [general session, level 300] (not accepted)

    We’ve all dealt with nightmare queries: huge, twisted monsters that somehow work, despite being ugly and unmanageable. The time has come to tame these beasts, and the solution is available now, in SQL Server 2012.
    New T-SQL functions offer out-of-the-box solutions for many problems that previously required complex workarounds. Paging, Running totals, Moving aggregates, YTD, and much more comes at the power of your fingertips in SQL Server 2012. The only thing you need to do is learn the syntax. And that is exactly what this session is all about: a thorough description and explanation of the syntax, and loads of demos to demonstrate how you can use all these new features.
    Attend this session to boldly take SQL Server where it has never gone before!

    • I like the abstract it informs attendees of some of the things that they can expect to learn.
    • The title of the session is kind of vague, since whereas the windowing functions are useful TSQL improvements, they are not the only ones that are available throughout time. Would have gotten a better review if the title was more specific to windowing functions and 2012.
    • Looks good. If selected I would encourage the presenter to include/replace some aspects of it with SQL 2014 features that also help to improve how queries are written.
    • interesting topic and catchy session name. abstract is very well written and points out sql server 2012 features - relevant and current. good level of details in the goals and good balance of demo %
    • The session name is somewhat misleading for what appears to be a "Windowing Functions" presentation.

    Thanks for calling the title catchy. I like it too. But the comments of three other reviewers make me painfully aware that this title fails to deliver on the most important function of a title: it does not give the conference visitor an “at a glance” idea of the broad subject matter. Yes, all abstracts are available. But how many conference attendees read them all? I don’t, that’s for sure! I use the titles to filter down to just two or three promising sessions, then read the abstracts (if I have time – otherwise I pick based on title alone!) There were indeed more improvements in SQL 2012, and then there are even more in SQL 2014. Based on the title, I myself would expect to see them covered.

    Lesson learned:
    I already knew this but failed to apply the lesson: session titles should be catchy, but should also set a correct expectation for the audience.

     

    And that concludes another long post. Thanks to all the volunteers who spent countless hours on the thankless task of reading through hundreds of abstracts, commenting on them, and selecting between sessions that are almost impossible to compare, with nothing more to go on than a few hundred words in the abstract. Your hard work is much appreciated; the PASS Summit would not be possible without people like you. So thanks!

    And I’d like to conclude this post with the same words that Adam has at the end of his: Have any additional feedback for me? Post it below!

  • SQLPass NomCom election: Why I voted twice

    Did you already cast your votes for the SQLPass NomCom election? If not, you really should! Your vote can make a difference, so don’t let it go to waste.

    The NomCom is the group of people that prepares the elections for the SQLPass Board of Directors. With the current election procedures, their opinion carries a lot of weight. They can reject applications, and the order in which they present candidates can be considered a voting advice. So use care when casting your votes – you are giving a lot of influence to the people you choose.

    Because there are three seats open for elections, every PASS member with a completed profile on the PASS site gets a maximum of three votes. I used only two, and I have a good reason for that. I believe that there is something fundamentally broken in the NomCom selection and election process.

    For a few years now, PASS has been actively trying to morph from an “international” organization, run only from the USA, into a truly international organization. As a result of changes to the bylaws and to the procedure for Board of Directors elections, the current Board of Directors now for the first time in history (as far as I know) has two European representatives – but sadly no one from any other non-North-American area. A lot of work still has to be done, and a lot of time still has to pass, before we can expect a truly international Board of Directors.

    The process for the NomCom election appears to be a step back in this regard. Three seats are to be elected. Two are truly open, the third is reserved … for a US/Canada representative. I do not understand this. Does PASS really need a guaranteed US/Canada seat on the NomCom to get a truly international representation? Wouldn’t it have made much more sense to reserve at least one seat for “non-US/Canada”?

    It is also relevant to consider that the actual NomCom consists of five persons. Three are elected by the members, one is the Immediate Past President of PASS, Bill Graziano, and I could not find anything about the fifth seat but my assumption is that this seat is reserved for founding partner and primary sponsor Microsoft. So that means that two seats are already filled with representatives from the US, and the reserved US/Canada seat in the elections means that North America is guaranteed a 3 out of 5 majority in the NomCom. And depending on the votes, US/Canada may also get the last two seats.

    With only two candidates for the NomCom who are not from the US or Canada, you might consider all of this a moot point. From a practical point of view, the reserved seat for US/Canada will never make a difference (though a reserved seat for non-US/Canada might). But purely out of principle, I want to voice my opinion that this process is broken. (And then we can also speculate on whether there would have been more international candidates if there would have been no reserved seats – if I know that the rules of the election favor my opponents and reduce my chances, I would think twice before even applying).

    I am convinced that all NomCom candidates will try to put the best candidates forward, and that none of them will favor candidates from their own region. But if PASS wants to become a truly international organization, it needs to have international representatives in all bodies, including the NomCom. Ideally, that is achieved by having completely open elections for all bodies – no seats should be reserved for regions. But I understand that the current membership and the past record of election participation from members from different regions would cause such a setup to end up with all North American representatives, so I do see the need for reserving seats for under-represented areas – as a temporary measure.

    What I do not understand is the reason for reserving a seat for a region that is already way over-represented. And that’s why I decided to go on strike against candidates from the US or from Canada, voting only for candidates from the rest of the world. Nothing personal, not for any reasons related to the actual candidates – just making a point, and maximizing the chance of both open seats going to non-North-American candidates. There were only two such candidates – and that is why I used only two of my three votes.

    There are now less than two days left before voting closes. If you have not already, I urge you to go out and vote now. Use your votes. Use them wisely.

  • SQL TuneIn Zagreb 2014 – Session material

    I spent the last few days in Zagreb, Croatia, at the third edition of the SQL TuneIn conference, and I had a very good time there. Nice company, good sessions, and awesome audiences.

    I presented my “Understanding Execution Plans” precon to a small but interested audience on Monday. Participants have received a download link for the slide deck.

    On Tuesday I had a larger crowd for my session on cardinality estimation. The slide deck and demo code used for that presentation will be available through the conference website, but for those who cannot wait, I have also attached them to this blog post.

    The organizers of the event have already announced their plans to host a fourth edition. And if I have any say in the matter, I will visit again.

  • TechDays 2014 – Session material

    Last week in the Hague, I had the honor to present two very different sessions at the Dutch TechDays conference. A deep-dive session on internals of the SQL Server 2012 nonclustered columnstore index, and a very developer-oriented session on the bare basics of performance tuning. To my delight, both times the room was filled with very interested people, asking great questions and, I guess, enjoying my presentations.

    All sessions were recorded, and I have been told that in due time, all will be available on Channel 9. But what if you can’t stand the wait? What if you just want to quickly browse through my slide deck without having to endure my crappy jokes? Or what if you want to play around with my demo code but are of the rightful opinion that manually copying my code while constantly pausing and unpausing the video would be an utter waste of time?

    The answer to each of those problems is: you simply download the slides and demo code from the attachment to this blog post. Simple, huh?

  • Fake statistics, and how to get rid of them

    There are two ways to test how your queries behave on huge amounts of data. The simple option is to actually use them on huge amounts of data – but where do you get that if you have no access to the production database, and how do you store it if you happen not to have a multi-terabyte storage array sitting in your basement? So here’s the second best option: you cheat.

    Luckily, SQL Server has a feature that allows you to do just that. I must warn you that this feature is undocumented and unsupported. You should never under any circumstances use this feature on a production server, unless explicitly directed to by qualified Microsoft staff. Using it, sparingly, on a test box is okay. But as an undocumented feature, there is no guarantee that it will always work, or that it will continue to work in future versions.

    With that disclaimer out of the way, let’s take a look at the “cheat” option. As you probably know, the Query Optimizer relies on statistics for its decisions. If those statistics are wrong, the decisions will probably be wrong. Here is where we find the leverage to cheat: if we can force SQL Server to use statistics that are representative of the production database rather than the test database, we will get the execution plan it would generate on production. The actual performance will probably still be lots faster, but at least we can look at the plan and use our understanding of execution plans to check that we get scans where we want scans, seeks where we want seeks, and all the right types of joins.

    Fake statistics

    The undocumented feature that we can use here actually has a very simple and easy-to-remember syntax. You just run an UPDATE STATISTICS command, adding the options “WITH ROWCOUNT=xxxx, PAGECOUNT=xxxx” to force SQL Server to store the numbers you mention as the rowcount and pagecount. Clear the plan cache, then test your queries and check how they behave with the simulated number of rows and pages. I could give a much longer description and examples of the usage, but others have already done so – see for instance this post from the horse’s mouth (aka the Query Optimization Team), this post from Benjamin Nevarez, or if you are able to read German this excellent post from Uwe Ricken.
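
    For reference, here is a minimal sketch of that syntax, using the Sales.SalesOrderDetail table that features later in this post. The numbers are made up purely for illustration:

    -- Pretend the table is much bigger than it really is (test systems only!)
    UPDATE STATISTICS Sales.SalesOrderDetail
    WITH ROWCOUNT = 5000000, PAGECOUNT = 100000;

    -- Clear the plan cache so the next compilation uses the faked numbers
    DBCC FREEPROCCACHE;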

    … and how to get rid of them

    But what all those blog posts fail to mention is what happens later. Of course, if you just create a small test table, slap in a few sample rows, fake big statistics, check the execution plan and then drop the table, that is not an issue. But what if you have a test database that contains a lot of carefully crafted test data, and that is used for other tests as well? You may not want to run all the tests with those artificially bumped rowcounts!

    In my case, I ran into this in the AdventureWorks sample database, my loyal companion for presentations and demos. I will present a pre-conference seminar on Execution Plans in Exeter (March 21), in Copenhagen (March 28), and in Zagreb (June 2). For my preparations, I wanted to force a hash spill warning, so I artificially lowered the rowcount for the SalesOrderDetail table to just 200. That worked just fine – this nifty feature can be used to mimic every rowcount, not just the big numbers.

    Because I use this sample database for other purposes as well, I wanted to immediately reset the statistics to their normal value. I figured rebuilding all the statistics on the table with the FULLSCAN option would do the trick. Right? Wrong! The UPDATE STATISTICS did run; it took a while to scan all rows in the table (as requested) – but then it still retained the rowcount and pagecount values that I had forced earlier! And there is no visible indication at all – executing the UPDATE STATISTICS … WITH FULLSCAN statement simply runs for a while, then reports success.

    It was actually mere coincidence that I later did some other tests on the same table, and just happened to notice that the estimated rowcount for this table was still 200. It then took me at least an hour of searching the internet and trying many different options (including using zero or a blank value for the rowcount and pagecount parameters, combining different other options of the UPDATE STATISTICS command, clearing or not clearing the plan cache, and a few other things I don’t even remember). None of them worked. I could of course count the actual number of rows and pages and use that, and that would have worked because my AdventureWorks database never changes – but for a test database where the data can actually change over time, this option would not work.

    In the end, I finally found a method that works. But it is really a sledgehammer approach, and I prefer not having to do this on large tables on a regular basis: ALTER TABLE Sales.SalesOrderDetail REBUILD. After running that statement, I found that the statistics on the SalesOrderDetail table had finally reverted to their normal behavior. Until the next time I need to fake a different number.
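
    To summarize the two attempts described above in code:

    -- Runs fine and scans the whole table, but silently keeps the faked counts
    UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;

    -- The sledgehammer that actually resets the row- and pagecount
    ALTER TABLE Sales.SalesOrderDetail REBUILD;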

    Conclusion

    On a development server, using the WITH ROWCOUNT and WITH PAGECOUNT options of the UPDATE STATISTICS command is a fine method to simulate large numbers of rows, or to simulate the effect of wrong statistics. But unfortunately, the numbers given stick for far longer than I like; this is not documented anywhere, and not easy to undo. Rebuilding the table and indexes appears to be the only solution.

    If you have found other ways to return from fake row- and pagecounts to the normal statistics behavior, post a comment and share your knowledge!

  • Database Design training – for free?

    When I started this blog, my plan was to focus on two main subject areas. One of them is SQL Server, T-SQL, and performance. The other is database design.

    Looking back over my post history, I honestly cannot say I delivered on the second area. Not because I have nothing to say about database design, but because I found it doesn’t lend itself to a blog. In the SQL Server / T-SQL area, there are subjects that can be isolated and described in a single post, or in a short series. In database design, I do not see such subjects. Everything is related, and the only good way to cover this is end to end – starting at requirements gathering, through modeling, normalization, more normalization, and ending at the final database design.

    That cannot be covered on a blog post. So I found a different place: Pluralsight. Pluralsight is a company that provides online training on IT-related subjects. Excellent training, delivered by the best specialists available, for a fraction of the price of classroom training. And with the additional benefit of being able to pause and rewind as often as you want.

    Relational Database Design

    I was delighted when Pluralsight offered me the opportunity to record a course on relational database design for them. I must admit that creating the course took more time than I had ever anticipated – but it is now finished and available in the Pluralsight course library. It is a very comprehensive course – to see the entire course, end to end, you will need to set aside almost a full work day: 7 hours and 33 minutes. But luckily, you can consume the course in small portions, watching a few clips when you have time and stopping to return later when you’ve had enough.

    The first module is the introduction, in which I give a historic overview of how storage hardware and methods have evolved to where they are now. If you are pressed for time, you might consider skipping this module. Knowing the past can help understand the present better, but this module is not required to understand the rest of the course.

    In module 2, I explain the IDEF1X notation for ER diagrams, which I use throughout the course. Module 3 presents you with techniques you can use to gather the information you need from the domain experts, and module 4 shows how to use that information to create a first draft of the data model. Module 5 then explains how to convert between the ER diagram and the relational database representation of the data model – I personally prefer to postpone this conversion until after the normalization, but not everyone likes to work that way; hence my decision to cover the conversion first.

    Normalization is a subject that is far more complex than many people realize – yet much easier to do than many people fear. I use two (long!) modules to cover basic normalization, up to third normal form; and then a third long module to go over all the so-called “higher” normal forms – often considered to be unnecessary, too hard to do, and too complex to understand. I hope you’ll disagree with those qualifications as much as I do after watching this module.

    IDEF1X may be popular, but it is far from the only ER diagramming method. In the last module, I cover several other methods, showing you how much they are alike and where you’ll find the important differences. After seeing it, you should be able to work with any ER diagramming method you may encounter.

    I personally think that this is a very good course on relational database design. But I might be just a bit biased. So I suggest that you go ahead, check it out, and be your own judge!

    Cheap …

    My relational database design course is not only (in my opinion) very good – it is also extremely affordable. Have you ever looked at the cost of a classroom course on database design, then decided that you simply cannot afford it? The good news is – Pluralsight is cheaper! Prices for individual subscriptions start at just $29 per month, or $299 per year for a basic subscription – which already gives unlimited access to the full course library. Or you can pay $49 per month / $499 per year for a “plus” subscription, that adds access to exercise files, assessments, certificates, and offline viewing.

    And for that low price, you do not get access to just my course on relational database design – the full course library of Pluralsight already contains over a thousand titles, with new titles being added almost every day. Readers of this blog should be excited to see over fifty courses in the SQL Server subject area, by renowned experts such as Paul Randal, Jonathan Kehayias, Leonard Lobel, Glenn Berry, and many others. And if you want to look beyond the borders of SQL Server, Pluralsight has courses on almost every other subject (as long as it is related to dev and IT).

    … or free?

    But the title of my blog promises “free”, not “cheap”. Thirty bucks per month may be a steal, but it is not free. Was I lying? Of course not! You can check my relational database design course (and more) for free. How? Simple!

    Method 1: available to all, but a bit limited. Go to the Pluralsight website, and sign up for a free trial. This costs you absolutely nothing, and it gives you access to all the Pluralsight courses for a whole week. There is a catch, though – your viewing time is limited to 200 minutes. So you can use the trial to watch one of the shorter courses completely, to get a very extensive taster of a longer course, or to browse around and watch snippets of many courses – that is up to you!

    Using this method is an easy way to watch almost half of the relational database design course for free. But what about the rest? Enter method 2!

    Method 2: sufficient to watch the entire course, but not available to everyone. Pluralsight has allowed me to give out a few one-week training passes. Like the free trial, these give access to all courses in the course library, for a full week. Unlike the free trial, they are not time limited. If you have the stamina, you can get a whole 168 hours of training out of them. (I do not advise this. I am not responsible for the effects on mental and physical health if anyone thinks they should even try this. Please don’t). But with a bit of planning, a week should be enough to consume the entire relational database design course, and still have time left to check out some of the other courses!

    So how do you get such a free pass? The easiest method is to post a comment to this blog, explaining why I should pick you. On February 10, I will choose the five best, most convincing, or maybe just the funniest comments, and reward them with a training pass!

    So go to the Pluralsight site, sign up for that free trial, check out my course, then return here and post a comment. Convince me!

  • Parameterization and filtered indexes (part 2)

    In my previous post, I demonstrated how the presence of a filtered index can get in the way of successful parameterization, possibly resulting in too many query compilations and procedure cache bloat. I suggest reading that first, because I will pick up right where I left off.

     

    Use the Force, Luke

     

    If you read the previous post very carefully, you will have noticed that I wrote that if a plan is unstable, it is “unsafe for simple parameterization” – and that suggests that SQL Server will not have such inhibitions when using forced parameterization. Let’s have a look. Assuming you still have the indexes I created in the previous blog post, just execute the code below: it changes the parameterization setting for AdventureWorks2012 to forced, clears out the plan cache (and remember not to do this on a production system!), executes the query that would not successfully parameterize when the database was set to simple parameterization, and inspects the plan cache. You will see that this time, the query was indeed parameterized. If you also add the code to watch the “Unsafe Auto-Params/sec” counter, you will see no changes to its value.
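
    If you do not have the counter-watching code from the previous post at hand, something along these lines will do (a minimal sketch that reads the documented sys.dm_os_performance_counters DMV; compare the value before and after running the query):

    SELECT [object_name],
           counter_name,
           cntr_value
    FROM   sys.dm_os_performance_counters
    WHERE  counter_name = 'Unsafe Auto-Params/sec';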

     

    USE AdventureWorks2012;
    go

    ALTER DATABASE AdventureWorks2012
    SET PARAMETERIZATION FORCED;
    go

    DBCC FREEPROCCACHE;
    go

    SELECT SUM(OrderQty)
    FROM   Sales.SalesOrderDetail
    WHERE  ProductID = 706;
    go

    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT      cp.objtype AS ObjectType,
                cp.usecounts AS UseCount,
                st.[text] AS QueryText,
                cp.plan_handle AS PlanHandle,
                stmt.value('(@ParameterizedPlanHandle)',
                           'varchar(64)') AS ParameterizedPlanHandle,
                qp.query_plan AS [Execution Plan]
    FROM        sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)
    WHERE       qp.dbid = DB_ID()
    AND         st.[text] NOT LIKE '%ExcludeMe%';
    go

     

    At first sight, this may appear to be a very simple and elegant solution for the problem. Just set all databases to forced parameterization and then sit back and relax, waiting for the inevitable flood of happy emails from end users who wish to thank you for the tremendous performance boost you just gave them.

     

    But wait. When has solving a problem in SQL Server ever been this simple? There must be a catch, right? But where?

     

    A new use case

     

    In order to see the problem, I will use a different database – so let’s first clean up the mess we made in AdventureWorks2012 and restore it to its original state, so that other bloggers can use it for their demos without my stuff getting in the way:

     

    ALTER DATABASE AdventureWorks2012
    SET PARAMETERIZATION SIMPLE;

    DROP INDEX Sales.SalesOrderDetail.ix_ProductID_Over_1000;
    DROP INDEX Sales.SalesOrderDetail.ix_ProductID_Incl_OrderQty;

     

    A very common use case of filtered indexes is in combination with “soft deletes” – data that is no longer valid is not physically removed from the table, but kept (often to maintain a history of events). A special column, usually called “is_deleted”, is used to track that this information is historic only. Since the majority of queries is only interested in current data, you will find the predicate “is_deleted = 0” in almost every query – so that makes this an ideal candidate for a filtered index. And since we are now under the impression that filtered indexes really require forced parameterization, we will change that setting right away. Here is the script to create this scenario in a sample database (it may take some time to run this!):

     

    USE tempdb;
    go
    IF EXISTS (SELECT * FROM sys.databases WHERE name = 'FilterDemo')
    BEGIN;
        ALTER DATABASE FilterDemo SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
        DROP DATABASE FilterDemo;
    END;
    CREATE DATABASE FilterDemo;
    ALTER DATABASE FilterDemo SET PARAMETERIZATION FORCED;
    go
    USE FilterDemo;
    go
    CREATE TABLE dbo.AllThings
       (ThingID int NOT NULL,
        ThingName varchar(20) NOT NULL,
        is_deleted bit NOT NULL DEFAULT (0),
        LotsOfStuff char(2000) DEFAULT ('Placeholder'),
        CONSTRAINT PK_AllThings
            PRIMARY KEY (ThingID)
       );
    DECLARE @i int = 0;
    WHILE @i < 5000
    BEGIN;
        SET @i += 1;
        INSERT dbo.AllThings
               (ThingID,
                ThingName,
                is_deleted)
        VALUES (@i,
                CHAR(65 + RAND() * 26) + CHAR(65 + RAND() * 26) + CHAR(65 + RAND() * 26), -- Random name
                CASE WHEN RAND() < 0.99 THEN 1 ELSE 0 END)  -- Most products are deleted
    END;
    CREATE INDEX ix_Name_NotDeleted
    ON dbo.AllThings (ThingName)
    INCLUDE (is_deleted)    -- Always include the columns you filter on!
    WHERE is_deleted = 0;
    go

     

    Once the script is done, let’s get a quick count of the number of things that have not been deleted. I used a formula that results in 99% of the things being deleted, but there is a random factor involved. The actual number of non-deleted things should be close to 50. It was 58 on my system, which I found by running this query:

     

    SET STATISTICS IO ON;
    go
    SELECT COUNT(*)
    FROM   dbo.AllThings
    WHERE  is_deleted = 0;

     

    But if you switch to the messages tab, you will see a very disturbing number. I expect SQL Server to execute this query by simply scanning the filtered index, as this index exactly contains all the rows it needs to count. But the output from STATISTICS IO shows that a total of 1256 logical reads have been made. Over twelve hundred reads to count just 58 rows? How is that possible? Let’s take a look at the execution plan to find out:

    (Screenshot: the execution plan, showing a Clustered Index Scan instead of the filtered index)

    As you see, the filtered index is not used at all; the optimizer chose to scan the clustered index instead, wading through all 5,000 “things” in my table to find just those 58 that were not deleted.

     

    The reason for this is simple. I have enabled forced parameterization. So I told SQL Server that, no matter the consequences, it should always replace constant values with parameters. So the plan that the optimizer was forced to compile was not for the query I typed, but for this query instead:

     

    DECLARE @1 int = 0;
    SELECT COUNT(*)
    FROM   dbo.AllThings
    WHERE  is_deleted = @1;

     

    And the optimizer has to produce a plan that will always return the correct results, for any possible value of the parameter. Of course, when executing this query with parameter value 1, scanning the clustered index is the only possible way to return correct results, so the plan choice that was made was indeed the only possible choice. By enabling forced parameterization, we have effectively prevented the optimizer from using any filtered index at all (except through views).
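
    By the way, that “except through views” escape hatch can be illustrated with a small sketch (the view name dbo.CurrentThings is made up for this example). Literals inside a view definition are not replaced by parameters, so the optimizer should still be able to match the filtered index when the query goes through the view:

    CREATE VIEW dbo.CurrentThings
    AS
    SELECT ThingID, ThingName
    FROM   dbo.AllThings
    WHERE  is_deleted = 0;   -- part of the view text, so not parameterized
    go
    -- This query should still be able to use ix_Name_NotDeleted, even under forced parameterization.
    SELECT COUNT(*) FROM dbo.CurrentThings;
    go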

     

    More force?

     

    In a case such as this, where we know that we will always use is_deleted = 0 in the query predicate, it can become very tempting to find ways to convince the optimizer to choose the query plan we want without giving up on the forced parameterization setting. Let’s first see what we can achieve by parameterizing this query ourselves and applying the OPTIMIZE FOR hint:

     

    DECLARE @1 int = 0;

    SELECT COUNT(*)

    FROM   dbo.AllThings

    WHERE  is_deleted = @1

    OPTION (OPTIMIZE FOR (@1 = 0));

     

    Okay, I admit, it was a rather desperate attempt and I didn’t really expect much of it. The OPTIMIZE FOR hint tells the optimizer that I want a plan that gives the best performance for that particular value, but the results still have to be correct for other values. So the optimizer will still consider what would happen if I supply other values, and will reject the filtered index because of that.

     

    So, back to the original query, and now use more force. I know that this query will always perform better when using the filtered index – so if the optimizer fails to see that, I will just force it. Applying an index hint does just that. It may be called a hint, but it is really a directive; if you hint an index, it WILL be used, no matter what. So this should help, right?

     

    SELECT COUNT(*)

    FROM   dbo.AllThings WITH (INDEX = ix_Name_NotDeleted)

    WHERE  is_deleted = 0;

     

    No, it does not help. The result is an error message. We have now given the optimizer a totally impossible task. We told it to parameterize the query, no matter what, so it did. It now has to produce an execution plan for the parameterized query, and that plan may later be reused for different values. You and I know that the value will really never be different, but SQL Server does not, and it still has to guarantee correct results. But then we also told SQL Server that it really has to use an index that does not include some of the rows that, for some values of the parameter, may have to be returned. Like I said – an impossible task, and SQL Server responds just like you and I would do, by throwing up its hands in despair and giving up.

     

    The simple solution

     

    Luckily, there is a solution to this problem. It’s simple – simple parameterization, to be precise. Just set the parameterization option back to its default setting of simple, and you will get a much better behavior.

     

    ALTER DATABASE FilterDemo

    SET PARAMETERIZATION SIMPLE;

    SET STATISTICS IO ON;

    go

    SELECT COUNT(*)

    FROM   dbo.AllThings

    WHERE  is_deleted = 0;

     

    Now the query takes just two logical reads. And the execution plan looks as expected: a scan of the filtered index, that’s all.

     

    Conclusion

     

    When you look only at the plan cache, forced parameterization may look like manna from heaven. But when you look further, you will see that setting the parameterization option to forced is probably not a good idea at all. It may appear to solve some issues, but you get bigger issues in return – filtered indexes that might boost performance tremendously are ignored, and hinting them can even cause runtime errors. Do you know all the code that is running on your system? Are you sure that none of your application developers has ever added an index hint? Do you want to find out the hard way?
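
    If you want a rough sanity check before flipping that switch, a sketch like the one below can help. It only covers code stored inside the database (views, procedures, functions), not ad hoc queries sent by applications, and the LIKE pattern is deliberately crude – treat any hit merely as a starting point for closer inspection:

    SELECT OBJECT_SCHEMA_NAME(object_id) AS SchemaName,
           OBJECT_NAME(object_id)        AS ModuleName
    FROM   sys.sql_modules
    WHERE  definition LIKE '%(INDEX%';   -- crude pattern for WITH (INDEX = ...) / WITH (INDEX(...)) hints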

     

    Simple parameterization in combination with filtered indexes may not always play well with the plan cache. But I would think very long and hard, and do a close inspection of all objects in the database and all queries used before even considering switching to forced parameterization.

  • Parameterization and filtered indexes (part 1)

    Parameterization is a process where SQL Server slightly modifies a query in the parse phase: it replaces constant values with parameters. So if you submit

    -- Query 1

    SELECT COUNT(*)

    FROM   Sales.SalesOrderDetail

    WHERE  ProductID = 706;

    the parser will replace this by (and fool the optimizer into thinking you submitted):

    DECLARE @1 smallint = 706;

    SELECT COUNT(*)

    FROM   Sales.SalesOrderDetail

    WHERE  ProductID = @1;

    You can verify this very easily. If you go to the actual execution plan and hover over the left-most icon (the SELECT), a popup window will open that shows (among other things) the query that was fed into the optimizer – but beware, this can be misleading, as will be demonstrated later in this post. I will show a more reliable method shortly. Don’t forget to disable the “Include Actual Execution Plan” option now, as it results in some extra work being done that will influence the output of the queries below.

     

    The benefit of this is that a second execution of the same query with a different value will reuse the same plan. This saves compilation time, and less procedure cache space is consumed for these queries. However, you do run the risk of getting bad plans because of parameter sniffing with a bad value. Because that risk increases with query complexity, SQL Server’s default behavior is to only parameterize very simple queries (“simple parameterization”), but you can opt to use “forced parameterization” instead, by setting the database option PARAMETERIZATION FORCED.
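
    If you are not sure which setting a database currently uses, a quick check (not part of the numbered demo queries) is to look at the is_parameterization_forced column in sys.databases – 0 means simple, 1 means forced parameterization:

    SELECT name, is_parameterization_forced
    FROM   sys.databases
    WHERE  database_id = DB_ID();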

     

    Parameterization in action

     

    Before adding filtered indexes to the mix, let’s start with some experiments. First, verify that you are on your development server and not accidentally logged in to a production box. Then run the query below. The first four lines disable the “Optimize for ad hoc workloads” option, a server-wide setting that should probably be enabled on every production box, but is not as relevant for development – and would complicate this blog post. (The same principles apply, it is just a bit harder to demonstrate.) The fifth line instantly clears out the plan cache, a very bad thing to do in production, but great for experiments such as this.

    -- Query 2

    EXEC sp_configure 'show advanced options', 1;

    RECONFIGURE;

    EXEC sp_configure 'optimize for ad hoc workloads', 0;

    RECONFIGURE;

    DBCC FREEPROCCACHE;

     

    Let’s take a look at the contents of the plan cache – it should be empty, so ideally this query will return an empty result set.

    -- Query 3

    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')

    SELECT      cp.objtype AS ObjectType,

                cp.usecounts AS UseCount,

                st.[text] AS QueryText,

                cp.plan_handle AS PlanHandle,

                stmt.value('(@ParameterizedPlanHandle)',

                           'varchar(64)') AS ParameterizedPlanHandle,

                qp.query_plan AS [Execution Plan]

    FROM        sys.dm_exec_cached_plans AS cp

    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp

    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st

    CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple')

                                                 AS batch(stmt)

    WHERE       qp.dbid = DB_ID()

    AND         st.[text] NOT LIKE '%ExcludeMe%';

    The filter on “ExcludeMe” is a trick to ensure that the entry for the query itself does not show up. Other unrelated queries can also pop up, even when you are on a development system that nobody else is using – internal management tasks and installed tools (such as IntelliSense or third-party database management tools) may fire queries that will all show up. Most of these should be excluded by the filter on DB_ID (but do note that, as a consequence, you MUST run this query in the same database where you are running your experiments – which for this blog post will be AdventureWorks). If the remaining unrelated queries bother you, find a part of their query text that never occurs in your own queries and add an extra filter on st.[text] to exclude those queries as well.

     

    Now rerun query 1 above, then inspect the plan cache again (using query 3), and you should see a result similar to this:

    [Screenshot: plan cache contents after running query 1 – an Adhoc entry linked to a Prepared plan]

    The “Adhoc” plan is not a real execution plan. It is a placeholder that links the original text of the query before parameterization to the actual plan, called a “Prepared” plan. Check the values in the PlanHandle and ParameterizedPlanHandle columns to see this. Or click the content of the Execution Plan column to see a graphical representation of each plan. If you submit Query 1 a few more times, the UseCount of the Adhoc plan goes up, because in case of an exact match between the text of a submitted query and the text of a query in the plan cache, the same plan will be reused. The UseCount of the Prepared plan does not go up, even though this is the plan that is actually executed – a weird way of counting, but documented (though not entirely correct).

     

    To see parameterization in action, let’s count orders for a different product:

    -- Query 4

    SELECT COUNT(*)

    FROM   Sales.SalesOrderDetail

    WHERE  ProductID = 710;

    Go back to query 3 and run it to see what this has done to the plan cache. Here are the results I got:

    [Screenshot: plan cache contents after running query 4 – a second Adhoc entry linked to the same Prepared plan]

    The raw text of the query is different, so a new “Adhoc” entry has been made. But after parameterizing the query text, SQL Server found that it already had a plan, so it could skip the compilation process and start executing right away. This saves lots of CPU cycles for the (relatively expensive) compilation process. It also reduces plan cache bloat, because the fake plan for the unparameterized query takes up a lot less space than a full execution plan (add the column cp.size_in_bytes to the SELECT list in query 3 if you want to see how much it saves). As you can see, the UseCount of the parameterized plan does go up this time, which conflicts with the remark in Books Online that it is “not incremented when parameterized queries find a plan in the cache”. I guess it actually represents the number of distinct unparameterized plans that have been linked to this parameterized plan.
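
    If you want to put a number on those savings for your whole plan cache, a small sketch such as the one below (again not one of the numbered demo queries) aggregates the cached plan sizes per plan type:

    SELECT   objtype,
             COUNT(*)                    AS PlanCount,
             SUM(usecounts)              AS TotalUseCount,
             SUM(size_in_bytes) / 1024.0 AS TotalKB
    FROM     sys.dm_exec_cached_plans
    GROUP BY objtype
    ORDER BY TotalKB DESC;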

     

    Assuming you use the default of simple parameterization, a more complicated query should not use this mechanism. We can verify this using the same method. After clearing the plan cache (using the last line of query 2 above), I added some needless complexity to the previous query:

    -- Query 5

    SELECT     COUNT(*)

    FROM       Sales.SalesOrderDetail AS sd

    INNER JOIN Sales.SalesOrderHeader AS sh

          ON   sh.SalesOrderID = sd.SalesOrderID

    WHERE      sd.ProductID = 710;

    The join to SalesOrderHeader does nothing, and will even be eliminated by the query optimizer; the actual execution plan is still the same as before. But if you now run query 3 to inspect the plan cache contents, you will get just a single row in the result:

    [Screenshot: plan cache contents after running query 5 – a single Adhoc plan]

    In this case, there is just an “Adhoc” plan that contains the original query text, including the hard-coded product ID value. It does not reference a different, parameterized plan; and when you click the Execution Plan column, you will indeed see a real execution plan. Change the product ID in query 5 back to 706 and execute it, and a new row will be added to the plan cache – again containing a full plan. So in this case, the optimizer had to do all the work to produce a plan. Imagine hundreds of queries like this on a busy production system, and you can see how all those compilations could impact overall performance and bloat the plan cache. Using forced parameterization changes this, but at the risk of potentially introducing parameter sniffing issues – so pick your poison.
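
    To get a feel for how big this problem is on your own system, a sketch like the one below counts the Adhoc plans that were compiled, used exactly once, and never touched again, plus the space they occupy:

    SELECT COUNT(*)                             AS SingleUseAdhocPlans,
           SUM(size_in_bytes) / 1024.0 / 1024.0 AS TotalMB
    FROM   sys.dm_exec_cached_plans
    WHERE  objtype   = 'Adhoc'
    AND    usecounts = 1;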

     

    As you can see, parameterization works as designed, and it can be quite useful. Depending on your workload, you might want to consider enabling forced parameterization. But that is not the subject of this blog post!

     

    Filtered indexes

     

    The feature “filtered indexes” was first introduced in SQL Server 2008. This feature can be very useful, despite its known limitations (hint: if the filter uses columns that are not index key columns, always INCLUDE them – that solves 95% of the problems). But it has an effect on parameterization and plan reuse that many people are not aware of. To illustrate this, let’s just create a filtered index that should not affect my demo queries at all – an index that includes only products with a much higher product number.

    -- Query 6

    CREATE INDEX ix_ProductID_Over_1000

    ON Sales.SalesOrderDetail (ProductID)

    WHERE ProductID > 1000;

    Run this query to create the index, then clear the plan cache, activate the option to include the actual execution plan and then run query 1 again. Looking at the plan, you might think that nothing changed – just as before, the value 706 has been replaced by the placeholder @1 in the “Statement” property of the “SELECT” iterator. But as I said, this is not always a reliable source of information – let’s get a second opinion from the plan cache by using query 3 again. Here is the result:

    [Screenshot: plan cache contents after creating the filtered index and rerunning query 1]

    The two STATISTICS XML queries are the result of including the actual execution plan (if you don’t believe me, clear the procedure cache, then rerun query 1 and you will see only that single line). Just a single Adhoc plan, no ParameterizedPlanHandle, no Prepared plan – the picture is clear, the plan was not parameterized. The information in the execution plan is misleading.

     

    So how can a filtered index that would never have been used for this query have this effect on parameterization? The explanation lies of course in the existence of a filtered index that has the ProductID column in its predicate – the same column that is also used in the predicate of the query. Because of the different values, the filtered index may look irrelevant to this query. But if the parameterization were successful, the same plan would ALSO be used for this query:

    -- Query 7

    SELECT COUNT(*)

    FROM   Sales.SalesOrderDetail

    WHERE  ProductID = 1706;

    After parameterization, this query looks the same as query 1. If you look at the actual execution plan of this query, you will see that this query not only CAN, but also WILL use the filtered index we created earlier. If SQL Server had allowed parameterization to take place, this option would not have been available. To understand the misleading information in the actual execution plan, we have to understand the actual order of events, and looking at the right performance counter can help here (thanks to Alejandro Mesa, who helped me understand this). Let’s clear out that procedure cache again, and then run the batch below – it is query 1 again, sandwiched between two copies of a query that reads the relevant performance counter.

    -- Query 8

    SELECT cntr_value

    FROM   sys.dm_os_performance_counters

    WHERE  [object_name] LIKE '%:SQL Statistics%'

    AND    counter_name = 'Unsafe Auto-Params/sec';

    go

    SELECT COUNT(*)

    FROM   Sales.SalesOrderDetail

    WHERE  ProductID = 706;

    go

    SELECT cntr_value

    FROM   sys.dm_os_performance_counters

    WHERE  [object_name] LIKE '%:SQL Statistics%'

    AND    counter_name = 'Unsafe Auto-Params/sec';

    go

    If you follow along, you will see that the value of this performance counter goes up by one. This is an artifact of a chain of events that starts with one component (I assume the parser – I guess someone like Paul White would probably know, or immediately attach a debugger to find out) deciding to parameterize the query – this is the version of the plan you see in the graphical execution plan. The next component (probably the optimizer) notices that the plan is not stable – meaning that different values of the constant can result in different plans being optimal. That makes it unsafe for simple parameterization, so the parameterized version is rejected, and the original, non-parameterized version is compiled, stored in the plan cache, and executed instead.
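
    For a wider view of this process, you can watch the related counters side by side. The sketch below lists the parameterization counters of the SQL Statistics object, so you can see attempted, safe, unsafe, failed and forced parameterizations in a single result set:

    SELECT counter_name, cntr_value
    FROM   sys.dm_os_performance_counters
    WHERE  [object_name] LIKE '%:SQL Statistics%'
    AND    counter_name IN ('Auto-Param Attempts/sec',
                            'Safe Auto-Params/sec',
                            'Unsafe Auto-Params/sec',
                            'Failed Auto-Params/sec',
                            'Forced Parameterizations/sec');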

     

    Unfortunately, this mechanism of rejecting unsafe parameterizations may get a bit too enthusiastic. Let’s first create one more index – this time not filtered, but with an included column:

    -- Query 9

    CREATE INDEX ix_ProductID_Incl_OrderQty

    ON Sales.SalesOrderDetail (ProductID)

    INCLUDE (OrderQty);

    If I now want to find the total ordered quantity instead of just the number of rows for a product, I would modify my original query 1 as follows:

    -- Query 10

    SELECT SUM(OrderQty)

    FROM   Sales.SalesOrderDetail

    WHERE  ProductID = 706;

    This query will never use the filtered index. Change the value 706 to something above 1000 and check the execution plan if you don’t believe me. Then use a hint to force the filtered index and compare the new execution plan with the old one to see the reason. The new index includes the OrderQty column, so it covers this query. The filtered index does not include this column, so a lookup into the clustered index has to be added to the plan, making it much more expensive (about twice the cost, according to the optimizer’s estimates). For this particular query, the filtered index is totally irrelevant – so now the plan is stable again, and there is no need to reject the parameterization attempt, right? Well, you can use any of the methods outlined above (clearing the cache, running the query, then looking at the cache contents; or watching the “Unsafe Auto-Params/sec” when submitting the query) and you will see that the parameterization is still rejected. The decision to reject or accept a parameterized query is not made after a full optimization, but after just a quick glance at the start of the optimization process. By the time the optimizer has explored enough options to realize that there is a single plan that is optimal for all values, even those included in the filtered index, the decision has already been made and it cannot be undone.
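
    For those who want to spell that comparison out, here is a sketch of the two queries side by side: the first lets the optimizer pick the covering index, the second forces the filtered index (for a ProductID value that falls within its filter) so you can compare the plans and their estimated costs:

    -- Sketch A: optimizer's choice, the covering index ix_ProductID_Incl_OrderQty
    SELECT SUM(OrderQty)
    FROM   Sales.SalesOrderDetail
    WHERE  ProductID = 1706;
    go
    -- Sketch B: filtered index forced, which adds a lookup into the clustered index
    SELECT SUM(OrderQty)
    FROM   Sales.SalesOrderDetail WITH (INDEX = ix_ProductID_Over_1000)
    WHERE  ProductID = 1706;
    go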

     

    Conclusion


    If you have issues with a bloated plan cache or too much CPU time going to compiling queries, check for filtered indexes. Their habit of getting in the way of successfully parameterizing queries may be a root cause of your problems.

     

    But that’s not the whole story! In the next part, I will show you how to convince SQL Server to change the default behavior of not parameterizing queries if a filtered index is present – but I will also demonstrate how this can result in even worse performance, or even runtime errors!
  • Understanding Execution Plans

    To me, it feels as if 2014 is a long time away. But it isn’t. Sinterklaas has already dropped off his presents and is probably already back in his castle in Spain. Christmas is almost upon us. And before we know it, we’ll be washing oliebollen down with Champagne.

    That also means that I need to get cracking on preparing my precon seminar on execution plans. What precon seminar you say? Oh right – I didn’t tell you yet. The good folks organizing SQL Saturday #269 in Exeter (UK) (on March 22) have decided to extend the event with a full training day on Friday March 21. For that purpose, they invited eight of the best SQL Server speakers in the world, and me, to spend a whole day pouring our knowledge into other people’s heads.

    The full program for this excellent day can be found here – and believe me, if I was not speaking that day, I would be trying to clone myself and visit at least four of them!

    My session on that day, as you can see, focuses on execution plans. So what can you expect if you sign up? Here is a short overview.

    I will start with the bare basics. What is an execution plan, what is its role in a SQL Server database, and most of all – how can you get to see it, how do you start reading it (Left to right? Right to left?), and how do you get to those extremely interesting details that Microsoft has carefully hidden from view? Why is an estimated plan not estimated, why is an actual plan actually mostly an estimated plan? What is an iterator, what does it do, and how does it interact with other operators?

    After that, we’ll dive into the details. What is a seek, what is a scan? How do you notice when a seek secretly scans, and why should you care? Why do people say scans are bad, and why are they wrong? (Or, maybe even more important, WHEN are they wrong, and when not?)

    Next up are the joins. Inner, outer, loop, merge, hash – but also semi, and anti semi. Yes, those are all join types, and we will explore them all. How do they work, what are the benefits and what are the costs of all of them? How can you influence the type of join chosen, and why would you want to?

    All that, and much more, will be included in my seminar. And that’s just what I have planned for before lunch. At the end of the day, you can expect to have a good working knowledge of almost every iterator that you can encounter in an execution plan. I will also explain what has changed in execution plans in SQL Server 2012 and SQL Server 2014. And to ensure that nobody falls asleep from my constant talking, I will give you all some exercises in between, challenging you to immediately apply what you just learned. These exercises may not be the kind of exercises you expect – but trust me, you will find them to be fun and refreshing, and they will also teach you how to apply your knowledge to all kinds of problems.

    Does this sound interesting or useful at all to you? I hope so – why else are you on a SQL Server blog site? So get cracking – click this link and sign up for my precon right now. (Or for one of the other precons – they are all great!) If you do so soon enough, you can still apply for the special “Early Bird” rate of only £150 – which is valid until December 15. But don’t despair if you see this post later – from December 16 until February 28, you still only pay £185 – still a steal for a whole day of training! (Even the £200 last minute rate that applies from March 1st is great value – but honestly, why wait?)

    I am looking forward to visiting Exeter this March. I hope to see all of you there. Either in my precon – or if you choose to attend one of the other precons, then maybe in one of the many sessions that will take place the next day.

  • SQLRally and SQLRally - Session material

    I had a great week last week. First at SQLRally Nordic, in Stockholm, where I presented a session on how improvements to the OVER clause can help you simplify queries in SQL Server 2012 enormously. And then I continued straight on into SQLRally Amsterdam, where I delivered a session on the performance implications of using user-defined functions in T-SQL.

    I understand that both events will make my slides and demo code downloadable from their website, but this may take a while. So those who do not want to wait can download the material from this blog post.

    Both SQLRally events have recorded all their sessions. It will obviously take a while to edit and publish all those recordings – but those who missed my session and want to check it out with my explanations know that if they wait a while, they can watch the recording online on the SQLRally websites.

    I once more would like to thank all volunteers who organized these events, all the sponsors who helped fund them … and most of all, all attendees who made my time in Stockholm and Amsterdam amazing. You were a great crowd, both during my session and in the many chats I had during the breaks.

  • Decks and demos – Session material for Silicon Valley Code Camp

    This weekend, I will be presenting two sessions at Silicon Valley Code Camp, in Los Altos Hills, CA. On Saturday, I will have an early start – the first time slot of the day, at 9:45 AM, I will present on how T-SQL user-defined functions can easily wreck your performance – and how you can prevent that.

    On Sunday afternoon (1:15 PM), I will then present a session on the OVER clause, focusing on how both the SQL Server 2005 version and the enhanced SQL Server 2012 syntax of this feature can help you solve common problems without having to resort to ill-performing and unmaintainable monster queries.

    Both sessions are quite demo-heavy, so I hope a lot of attendees will download the demo code and play with it for themselves. I have therefore attached the demo code for these sessions to this post. Oh, and the slide deck is included as well.

  • Book review: SQL Server Transaction Log Management

    It was an offer I could not resist. I was promised a free copy of one of the newest books from Red Gate Books, SQL Server Transaction Log Management (by Tony Davis and Gail Shaw), with the caveat that I should write a review after reading it. Mind you, not a commercial, “make sure we sell more copies” kind of review, but a review of my actual thoughts. Yes, I got explicit permission to be my usual brutally honest self.

    A total win/win for me! First, I get a free book – and free is always good, right? And second, I knew that I would get my pleasure out of it – either the pleasure of having a great read if it’s good, or the somewhat more perverse pleasure of bashing the book after struggling through it if it’s not!

    For all the tl;dr types out there: this review will be mostly positive. I found no reason to give the book a bashing. And for those with an attention span beyond three paragraphs, here’s why.

    Things I like about the book

    The authors explicitly state that they intended the book to be lightweight and easily accessible. They clearly intended the book to be a good primer for the starting DBA who wants to know a bit more about how to manage the transaction log, but is not interested in parsing the output of sys.fn_dblog.

    And they deliver as promised. The book is really an easy read. It starts with a good, basic explanation of the transaction log for DBAs who are completely new to this subject; more experienced readers can probably decide to leaf through this part without reading it all. After that, more information is stacked upon that foundation. The subjects are well organized in their own chapters. One nice touch is that the authors were not afraid to have extremely short chapters if there is not much to tell about a subject – it keeps the book well organized and information easy to find.

    Aiming to keep a book simple can be dangerous when the subject isn’t. My initial concern, that the discussion of LSNs and of how Virtual Log Files are (or are not) reused would suffer from this, luckily turned out to be unfounded – Tony and Gail apparently realized that many log management scenarios really depend on an understanding at this level, so they made the effort to explain this subject in an easy way, without simplifying the actual mechanisms. And they succeed very well at that!

    The book contains lots of code samples to illustrate the concepts. Those who like to read books while sitting at a computer can easily follow along, either by copying the queries, or by using the downloadable code samples. And for those who usually read in places where they have no computer available (I personally love reading in bed!), all relevant output of the code samples is included as well, so you don’t miss out by not running the code while reading.

    Finally, for those who, after reading the book, would like to have an even deeper understanding about some of the subjects, the book contains loads of references to websites where subjects are explained in more depth.

    Things I do not like about the book

    It’s not all perfect, though. I did run across a few things that bothered me. Not enough to change my overall opinion of the book, but still sufficient that I feel I need to include a few words about them in my review.

    First – the illustrations. The (otherwise really excellent) explanation of log internals, and of how LSNs and VLFs impact reuse of the log file, uses illustrations to make it easier to understand this complex subject matter. Unfortunately, these illustrations are very hard to read in the printed version of the book, probably because the illustrations were made in color while the book is printed in black and white / grayscale. Pages 30 and 31 are prime examples of how not to use shading – the black text is impossible to read in the dark shaded areas. Another illustration, in chapter 6, does not suffer from this “dark on dark” problem, but it is weird to read references to “green” and “yellow” bars in a book that uses no color. Both the sloppy references to non-existent colors and (especially!) the undecipherable text in the illustrations of chapter 2 should really have been found and corrected during the pre-production process!

    The desire to keep the book simple has some drawbacks as well. I am not sure if all readers would agree on this, but I think some more background would have been useful for a few specific subjects. Two questions that I am left with after this book are: “so I now know that the log is used when recovering a database – but how exactly does this undo and redo process work?”, and “okay, you made it clear that too many VLFs can slow down performance – but why, what is going on under the hood, where does this delay come from?” I do happen to know the answer to the first question myself, thanks to reading some of Kalen Delaney’s excellent books – but I feel that including these backgrounds would have helped to give the reader a better understanding of how the transaction log is used.

    I also found some bits to be a bit repetitive. Maybe this was a conscious decision, to really drive home a point? Or to facilitate the readers who only read specific bits of the book instead of reading it cover to cover as I did? Or maybe it was just an oversight, or an attempt to fill the book after running out of new stuff to discuss? Anyway, on me it had the effect of making me go “Yeah, no need to repeat that yet another time, I get it now” a few times while reading.

    I already said that I liked all the links to further reading. But URL shortening services like bit.ly or tinyurl exist for more reasons than just Twitter’s 140-character limit. I cannot click on a link in a hardcopy book, and I do not like having to manually copy a link that spans two lines in print. A missed opportunity! Also, while most links were completely optional to follow only if I wanted broader or deeper insights, there were (I think) one or two occasions where I felt that following the link was required for a better understanding of the book itself – which, given my habit of reading in bed, was a problem. In those cases, the information should really have been included in the book.

    Finally, the last chapter was a slight disappointment to me. A lot of text was spent on explaining how to use and set up the various monitoring tools, which I think is a bit too far off-topic for the book. And then, after showing the reader how to set up monitoring, the authors failed to answer the question that I am sure haunts many incidental DBAs: “In these monitoring results, what should I look for?” Experts have very good reasons for not wanting to set specific values and thresholds in their advice, but they forget that beginners have equally good reasons for needing those values. Some more specific guidelines, with the caveat that they are not universal and not guaranteed to still be valid in future versions of SQL Server, would have made this chapter much more useful.

    The verdict

    If you are already aware of how to allocate the transaction log, how recovery models, backups, restores, and log growth interact, and how to monitor the log; if you are looking for an in-depth discussion of transaction log internals – this is NOT the book for you. That is not a statement about the quality of the book; you are simply not the intended audience.

    But for the beginning or occasional DBAs, as well as the more experienced DBAs who feel a need to deepen their understanding of the transaction log beyond the knowledge that “the bloody thing needs to be backed up or it’ll eat up my entire drive, ‘coz those Redmond guys say so”, this is an excellent read. And even if you have passed that level a long time ago, if you feel comfortable scheduling log backups, switching between recovery models, and doing a point-in-time restore – you’ll probably still learn some new stuff from reading this book.

    All in all, I consider this book a recommended read. You can order it through amazon.com or through amazon.co.uk – or if you like free as much as I do, you can even download a free PDF copy (and optionally get a free evaluation copy of Red Gate’s SQL Backup Pro thrown in!).

  • Jetzt geht’s los - speaking in Germany!

    It feels just like yesterday that I went to Munich for the very first German edition of SQL Saturday – and it was a great event. An agenda that was packed with three tracks of great sessions, and lots of fun with the organization, attendees, and other speakers.

    That was such a great time that I didn’t have to hesitate long before deciding that I wanted to repeat this event this year. Especially when I heard that it will be held in Rheinland, on July 13 – that is a distance I can travel by car! The only potential problem was the timing (school holiday for my kids), but as soon as I had that sorted out, I submitted my abstracts.

    I am delighted to say that I have been selected to speak again. Do bring coffee if you plan to go to my session – it’s the first of the day, and I’ll dive right in with a deep, 500-level session. In this session, I will give my audience a peek under the hood of columnstore indexes and batch mode processing – two fantastic new features in SQL Server 2012 that can give you enormous performance benefits, but about which very little is documented. I did a lot of digging to find the bits that are documented, pieced everything together, and used common sense and logical thinking to fill in the blanks. The result is a session that discloses a lot of hard-to-find and undocumented internals of columnstore indexes and batch mode processing. If you’re a geek, and you think you can handle a 500-level session at 9:15 in the morning – come visit my session!

    But wait (to quote late night teleshopping TV), there is more! The volunteers who work incredibly hard to organize this event have decided that they will also host a precon this year. So if you can manage to be in Rheinland by Friday July 12, you can get in a full day of learning for the low fee of only € 179,= – a fraction of what you pay for similar training days at most other conferences! And you can even pick from a selection of three great choices.

    If you’re into BI, you can learn about Data Analytics and BigData from Ruben Pertusa Lopez and Paco Gonzalez. If you are more a DBA person, consider subscribing to the training day on Extended Events, presented by Andreas Wolter and my good friend Mladen Prajdic. But if you ever are in a situation where knowing a bit about database design would help (and let’s be honest – who in this profession is never in such a situation?), you should really consider subscribing to my seminar on effective database design.

    If you choose to attend my precon, you will learn a database design method that doesn’t focus on functional dependencies and normal forms. Instead, I will teach you how to discuss design questions with the people you interview, in a language they understand – even if you don’t! Sounds impossible? Trust me, it isn’t. I’ll show you the steps to follow to make sure you ask the right questions and draw the correct conclusions from the answers. In the end, you will have a database design that is completely correct for the business, and perfectly normalized to boot.

    Interested? Great! You can sign up here. I’m looking forward to meeting you in Rheinland!
