
Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

  • The Guru (T-SQL Tuesday #089)

    I became a consultant a bit earlier in my career than was probably wise. The going was rough at first, with periods of feast and periods of famine. Although I had several clients, I didn’t understand how to sell services or schedule my workload, and so I’d wind up either doing huge amounts of work for a few weeks, or absolutely nothing some other weeks. The topic of this month’s T-SQL Tuesday is “Database WTFs” and it was during one of my periods of relative inactivity that I met Jonathan at a local user group meeting, which resulted in one of the more memorable WTF situations of my career.

    Jonathan was a few years older than I was. Most of the user group attendees wore jeans and sneakers. Jonathan had on a nice looking suit and shiny shoes. He had a smooth demeanor and introduced himself as “a consultant and SQL guru.” Literally, with that exact phrasing. We struck up a conversation and he told me about all of the amazing work he was doing with various customers. I was extremely impressed. Jonathan represented the consultant I thought I wanted to be, a bit further down the road.

    I told Jonathan about my struggles with finding steady work, and he immediately offered to introduce me to a company he’d spoken with, CorpX, Inc. He explained that they were a growing company and needed lots of performance tuning help. He was just too busy on other things to give them a hand, but maybe I could. I eagerly agreed, and he did an e-mail introduction the next day.

    I don’t remember the initial conversation with CorpX, nor whether there was any kind of interview, but we signed a three-month agreement and I told my customers I wouldn’t be available for a while. Before I knew it I was nervously walking into a strange office for my first day on the job. Once I was in the company’s space, the receptionist told me she wasn’t expecting me, but that I could sit down and wait while she sorted things out. She offered me a magazine, but only after 20 minutes of waiting did I actually pick it up and start flipping through the pages. The receptionist told me that she thought I was supposed to meet with the CTO, but he was busy, so I should keep waiting. An hour later I’d read the magazine cover-to-cover, so I picked up a second one.

    At some point, a few magazines in, a guy stopped by to chat with the receptionist. He was some sort of IT person, and she asked him if he knew when the CTO would be available. “Oh, he went for coffee a while ago.” However, the guy had heard that I was supposed to be there that day, and was able to lead me to my space: A cubicle in a dark corner of the basement. Just like Milton’s. He left me there to stew, but it didn’t last long. I had no login credentials, nothing to do, and was tired of wasting my day. So I tracked him down again and asked him to introduce me to the DBA.

    The DBA was a red-faced guy named Dave, and he seemed to be in a state of panic. He spoke quickly and told me he didn’t have time to deal with me; he was in the middle of something way too important. His large office was perfectly clean and ordered, and I clearly recall the three huge monitors set up on his desk (well, huge for the time). A complex ER diagram covered all available screen real estate. I told Dave that I could possibly help out with his SQL Server problem, and he replied that there was nothing for me to do. He was deeply involved in designing the next version of the database. He couldn’t possibly think about anything else. He asked me to leave his office.

    It was around noon, so I left for a while to get lunch. When I returned to the office, this time I was marched directly to the CTO. “Where have you been? I’ve been waiting for you.” The CTO was a middle-aged guy with a beer gut and, apparently, an extreme baseball fetish. His office was filled with baseballs, helmets, jerseys, and bats. He absentmindedly played with a bat as he talked to me, lifting it and bringing it down on his desk when he wanted to emphasize a point. “We hired you as a pinch hitter. We retired our SQL Server team a few weeks ago, because I finally came to the realization that SQL Server can’t meet our performance demands. It’s like the database farm team, you know? We’re moving to Oracle in six months. That’s a real database! Until then, though, we need to play hard and score one for our customers, and that’s where you come in. You’ll turn our screwballs into fastballs. Well, as fast as that crappy thing can go, am I right?”

    I asked him if he was sure that SQL Server was at fault. Even though I was still fairly young, I had already worked on some pretty big projects. And I had a number of very experienced friends, including some well known MVPs who talked about their work on gigantic systems. “Yes, we’re sure. We had Jonathan in here, and he is a SQL guru. He told us that SQL Server simply does not scale.”

    Part of me wanted to get up and run, but I was intrigued. I decided to treat it as a challenge. I would prove him wrong and save the company millions on its Oracle transition. Maybe I’d score a steady long-term client in the process. I asked for access to a test system and to the source control repository, and the CTO gave me a blank look. “You’re here to performance tune our production database.” There was no need for a test environment, he told me, and he wasn’t aware of any database code in source control. He handed me a printout of a network login and password and told me that I had plenty of access to “work some magic.” I asked him which part of the application to focus on, and he told me to work on any piece I felt like; they were all equally slow. He concluded our discourse by saying that he was a very busy person and as a consultant I was expected to figure things out for myself and not bother him or his staff with details. I was to come back and talk to him when I had made some progress, and not before then.

    Back in my basement hideout, I spent the next couple of hours exploring the network and figuring out which server to connect to. The CTO was right; I did have enough access. I was sysadmin on the production SQL Server and had full admin access to the app server. I logged in to the app and with the help of a Profiler trace managed to figure out one of the main slow stored procedure calls that occurred any time someone saved a change via the user interface.

    Pasting the procedure call into SSMS, I turned on Actual Execution Plan, hit F5, and got ready to see indications of a few missing indexes. I was ready to walk back upstairs, gloat to the CTO, and ask for a better workspace so I could continue to help. What I didn’t expect was what actually came back: Not one execution plan, or two, or three, but hundreds and hundreds. The scroll bar became progressively smaller as time clicked by and the elapsed counter did not stop running. All I’d done in the application was change the name of a single field. What was going on?

    I opened the stored procedure and it was a simple update against a single table. But after poking around in Object Explorer I discovered that the table had four different insert and update triggers. Each of the triggers created a cursor over the inserted table, and for each row therein called one or two stored procedures. Each of these stored procedures was seemingly more complex than the last, with increasing amounts of dynamic SQL and nested cursors. And each did updates or inserts to at least one, and usually several tables. As I chased down those tables I saw the same pattern again and again: More triggers on more tables, with more cursors and more stored procedure calls, with more dynamic SQL and more cursors and more stored procedure calls. I began recursively searching for the bottom of it all, but gave up once I hit 9 or 10 levels. The tables were named with some sort of scheme involving various abbreviations, but with no documentation or support, I had no clue what was going on.

    Getting concerned about my ability to unwind this mess, I looked around the database and discovered that the same pattern was used on every single table. I ran some counts and found around 400 tables, 1200 triggers, and several thousand stored procedures and user-defined functions. Not to mention a massive number of views, all extraordinarily nested.

    The best part? Every single module in the database had a dated copyright notice, containing my friend Jonathan’s name. He had apparently built the entire thing himself over the prior four years, no doubt billing some truly obscene number of hours. And then he had gone ahead and signed his work. To this day I still wonder about the legal implications of that copyright notice. What does it mean to copyright a work for hire that contains logic specific to someone else’s business? And why did CorpX allow this in its code? Over all of that time did no one else ever look at the database?

    I wandered upstairs to ask the CTO what he expected me to do with such a massive mess, but the office was empty. Sitting in the basement, I’d lost track of time, and had spent four or five hours just researching that one initial bit of code path. It was late in the evening, I’d fixed nothing, and I knew that I wouldn’t be able to work any magic. I realized that the guru had already tricked us all, and I decided to cut my losses. I packed up my things, logged out, and left the building. I emailed the CTO that night and politely told him that the engagement wouldn’t work out. I never billed for my day, and he never replied to my note. I looked up Dave, the DBA, on LinkedIn a few months later. He was now “database architect” for a different company. Years later I met someone else who worked for CorpX and found out that they were still, miraculously, in business. Perhaps in the end they did find a magician; I was told that they were still running SQL Server.

    I ran into Jonathan one other time, and he asked me why I never thanked him for getting me into such a sweet gig. I merely shook my head.

    (Yes, this is a true story. Well, mostly. I changed some names and details to obscure things a bit. But this is pretty much how things happened. I learned a lot that day!)

  • Solving the Net Changes Problem with Temporal Tables (T-SQL Tuesday #087)

    SQL Server 2008 was perhaps not the meatiest of SQL Server releases, but it did have one especially promising feature: Data change detection. Or maybe two features, depending on how you look at things. I’m talking, of course, about the almost confusingly similarly named Change Data Capture and Change Tracking.

    Change Data Capture was the big gun that promised to solve everyone’s change tracking woes (well at least everyone running Enterprise Edition). In practice it was a heavy, overwrought hack of a feature. Built on the foundations of replication—a feature designed to move data, not record and store changes to it—the entire system was quirky and problematic. Users had to become familiar with odd APIs, low-level transaction identifiers, and the occasional call to sp_repldone for that wonderful moment when the whole thing crashed and burned. But enough about this feature. If you want to know more about it and its future in SQL Server, look it up in the Azure SQL Database documentation.

    Change Tracking, on the other hand, was designed to solve only one problem. It did so using a totally integrated approach, with changes recorded as part of the write transaction, first-class T-SQL syntax, and a very simple and well thought out API. The Change Tracking problem? Net changes. Simply put: First give me all of the rows. Then let me periodically ask what’s changed since last time I asked. When I ask, tell me how to synchronize my copy, cache, app, or whatever is on the other end, to get it into the same state as the host data set.
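
    Change Tracking's API is worth a quick glance here, since it defines the shape of the problem the rest of this post tries to reproduce. The sketch below assumes a hypothetical table dbo.Customers, keyed on CustomerID, with change tracking already enabled; the caller persists the version number between synchronizations.

    DECLARE @last_sync_version BIGINT = 0;  -- in practice, the version recorded at the end of the previous synchronization

    SELECT
        ct.SYS_CHANGE_OPERATION,            -- net result for the key: 'I', 'U', or 'D'
        ct.CustomerID,
        c.*                                 -- current column values; NULL for deleted keys
    FROM CHANGETABLE(CHANGES dbo.Customers, @last_sync_version) AS ct
    LEFT OUTER JOIN dbo.Customers AS c ON
        c.CustomerID = ct.CustomerID;

    -- Remember this value and pass it back as @last_sync_version next time
    SELECT CHANGE_TRACKING_CURRENT_VERSION() AS next_sync_version;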

    Fast-forward to SQL Server 2016, and now we have Temporal Tables. This feature is, in many ways, the best of both worlds when it comes to change detection. Temporal Tables is as well integrated into the query engine—and the transaction—as Change Tracking, but captures all changes, much like Change Data Capture. It has a clean and simple interface, and while it’s definitely a v1 technology it seems like a much more solid foundation than either of the SQL Server 2008 features ever did.

    This month’s T-SQL Tuesday, hosted by Matt Gordon, asks for solutions to old problems using “shiny new toys.” I wondered—could I solve the Change Tracking net changes problem using Temporal Tables? Unlike its predecessors, net changes is a major use case that Temporal Tables simply doesn’t do right out of the box. Turning on both Temporal Tables and Change Tracking side-by-side is an option, but that seems like overkill. One change detection technology should be enough…or is it?

    Thinking through the problem, it quickly became apparent to me that once you have all changes—as we have with Temporal Tables—getting to a net scenario is not especially difficult. Given a primary key and a time interval over which we’re concerned about changes, there are only four possible outcomes:

    • A new row for the key was inserted
    • The row was updated
    • The row was deleted
    • Nothing happened

    For now we can ignore the final outcome and focus on the first three, but of course any proper solution must ensure that the negative case is taken into account so as not to send false changes.

    Consider the following set of propositions, given that we’re asking at time Y for all changes since a prior time X:

    • INSERT: The key did not exist at time X but does exist at time Y.
    • DELETE: The key existed at time X but does not exist at time Y.
    • UPDATE: The key existed at both time X and at time Y, and at least one change occurred between time X and time Y.

    Given these assumptions, we can begin work on the temporal queries that return the necessary rows. Solving for these conditions will require all rows that were active as of time X, rows that were (or are) active at time Y, and for the final case, all rows that were active in-between times X and Y. Since this is a range-based scenario, our best temporal predicate option will be either FROM or BETWEEN. The difference between these two is subtle: FROM uses an open interval (non-inclusive at both endpoints), whereas BETWEEN uses a half-open interval, inclusive on the end date. Given the choice in the scenario, BETWEEN makes more sense, as we can take advantage of the inclusive endpoint to avoid dropping a badly-timed row. But more on that in a moment.
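
    If it helps to see the two side by side, here's a sketch of how their documented row filters compare, using the same placeholder table and column names as the queries below:

    -- FROM @start_time TO @end_time: open at both endpoints
    -- qualifies rows where valid_start_time < @end_time AND valid_end_time > @start_time
    SELECT t.pk
    FROM [table] FOR SYSTEM_TIME FROM @start_time TO @end_time AS t;

    -- BETWEEN @start_time AND @end_time: inclusive of the end point
    -- qualifies rows where valid_start_time <= @end_time AND valid_end_time > @start_time
    SELECT t.pk
    FROM [table] FOR SYSTEM_TIME BETWEEN @start_time AND @end_time AS t;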

    To begin solving for net changes, we must first get all changes over the time interval in question:

    SELECT
        t.pk,
        t.[other columns],
        t.valid_start_time,
        t.valid_end_time
    FROM [table] FOR SYSTEM_TIME BETWEEN @start_time AND @end_time AS t

    This query will return all rows with a valid_end_time greater than the passed-in @start_time and a valid_start_time less than or equal to the passed-in @end_time. With that in mind, we can start putting together some expressions for each of the operations in question.

    First step in tackling the insert: Did the key only come into existence after @start_time? The following expression finds out by testing the minimum valid_start_time per key; if it’s after the passed-in @start_time, we know it’s a new key.

    CASE
        WHEN 
            @start_time < 
                MIN(t.valid_start_time) OVER (PARTITION BY t.pk)
                    THEN 1
        ELSE 0
    END AS is_new_key

    A key could have been both inserted and deleted in the interval between @start_time and @end_time—or simply deleted altogether—so a second expression is necessary to determine whether the row still exists at @end_time. The expression below accomplishes this by checking the @end_time against the maximum valid_end_time per key. If the @end_time is greater than or equal to the maximum valid_end_time then the key must have been deleted. (This is where we’re taking advantage of the inclusive endpoint used by the BETWEEN predicate.)

    CASE
        WHEN
            @end_time >=
                MAX(t.valid_end_time) OVER (PARTITION BY t.pk)
                    THEN 1
        ELSE 0
    END AS is_deleted_key

    The final case is that of update operations. A ranged Temporal predicate will return at least one row per key that was active during the range. If the query returns more than one row for a given key, some change must have occurred. A simple COUNT will suffice in this case.

    CASE
        WHEN
            1 <
                COUNT(*) OVER (PARTITION BY t.pk) THEN 1
        ELSE 0
    END AS is_changed_key

    Putting this all together, we can come up with a general template query to answer net changes style questions. To make this work I put the above expressions into a derived table, to which I added a ROW_NUMBER. The numbering is used so that we get only a single version for each row – the most recent one. (Consider cases where a row was updated multiple times during the interval.) The query below uses the derived table [x], and applies a WHERE clause that filters the set based on the expressions, in order to output only rows we’re interested in: new, deleted, or changed rows, except in cases where a row was both inserted and deleted in the input range. The CASE expression in the outer SELECT list is used to determine what type of operation will have to be done on the synchronizing side in order to bring the two sets into parity.

    SELECT
        x.*,
        CASE
            WHEN x.is_new_key = 1 THEN 'INSERT'
            WHEN x.is_deleted_key = 1 THEN 'DELETE'
            ELSE 'UPDATE'
        END AS change_operation
    FROM
    (
        SELECT
            t.pk,
            t.[other columns],
            t.valid_start_time,
            t.valid_end_time,
            CASE
                WHEN 
                    @start_time < 
                        MIN(t.valid_start_time) OVER (PARTITION BY t.pk)
                            THEN 1
                ELSE 0
            END AS is_new_key,
            CASE
                WHEN
                    @end_time >=
                        MAX(t.valid_end_time) OVER (PARTITION BY t.pk)
                            THEN 1
                ELSE 0
            END AS is_deleted_key,
            CASE
                WHEN
                    1 <
                        COUNT(*) OVER (PARTITION BY t.pk) THEN 1
                ELSE 0
            END AS is_changed_key,
            ROW_NUMBER() OVER (PARTITION BY t.pk ORDER BY t.valid_end_time DESC) AS rn
        FROM [table] FOR SYSTEM_TIME BETWEEN @start_time AND @end_time AS t
    ) AS x
    WHERE
        x.rn = 1
        AND
        (
            x.is_new_key = 1
            OR x.is_changed_key = 1
            OR x.is_deleted_key = 1
        )
        AND NOT
        (
            x.is_new_key = 1
            AND x.is_deleted_key = 1
        )

    And in theory, that’s that. This query does in fact solve for net changes. But the devil’s in the details, and there is a subtle bug in the above query: What happens if the passed in @end_time happens to fall at exactly the same time that an update occurred on one of my rows? The answer is that the MAX(t.valid_end_time) for that key will be equal to the passed in @end_time, and so the logic above will consider the update to be a delete. Clearly not desirable behavior!

    The solution? In an update operation, Temporal Tables logs the new row with the same valid start time as the old row’s valid end time. Therefore, to figure out whether an update occurred, we must look forward exactly one row past our actual intended end time. To accomplish this, the following modified version of the query creates a new search end time by bumping the input end date up by 100ns (the finest resolution supported by DATETIME2).

    DECLARE @search_end_time DATETIME2 =
        DATEADD(NANOSECOND, 100, @end_time)

    SELECT
        x.*,
        CASE
            WHEN x.is_new_key = 1 THEN 'INSERT'
            WHEN x.is_deleted_key = 1 THEN 'DELETE'
            ELSE 'UPDATE'
        END AS change_operation
    FROM
    (
        SELECT
            y.*,
            CASE
                WHEN
                    1 <
                        COUNT(*) OVER (PARTITION BY y.pk) THEN 1
                ELSE 0
            END AS is_changed_key,
            ROW_NUMBER() OVER (PARTITION BY y.pk ORDER BY y.valid_end_time DESC) AS rn
        FROM
        (
            SELECT
                t.pk,
                t.[other columns],
                t.valid_start_time,
                t.valid_end_time,
                CASE
                    WHEN 
                        @start_time < 
                            MIN(t.valid_start_time) OVER (PARTITION BY t.pk)
                                THEN 1
                    ELSE 0
                END AS is_new_key,
                CASE
                    WHEN
                        @end_time >=
                            MAX(t.valid_end_time) OVER (PARTITION BY t.pk)
                                THEN 1
                    ELSE 0
                END AS is_deleted_key
            FROM [table] FOR SYSTEM_TIME BETWEEN @start_time AND @search_end_time AS t
        ) AS y
        WHERE
            y.valid_start_time <= @end_time
    ) AS x
    WHERE
        x.rn = 1
        AND
        (
            x.is_new_key = 1
            OR x.is_changed_key = 1
            OR x.is_deleted_key = 1
        )
        AND NOT
        (
            x.is_new_key = 1
            AND x.is_deleted_key = 1
        )

    This query uses @search_end_time in the Temporal predicate, but then does all of the comparison work using the original @end_time. For the is_new_key check, this doesn’t matter, as we’re only comparing start times. For the is_deleted_key check it makes all the difference in the world, as an update done at exactly @end_time will populate a new row which will increase the value of MAX(t.valid_end_time). Because the extended search predicate can return an extra row, the is_changed_key and rn expressions had to be moved to an outer table expression filtered by the original @end_time. If an update occurs exactly at @end_time, we don’t want it to trigger a delete operation now, but we also don’t want to actually see it until next time we synchronize.

    That bug solved, we now must consider a second, slightly more important unsolved problem: What do you use for the @start_time and @end_time?

    Naïvely speaking we should be able to ask for a base set of rows by using some long-past date as the @start_time—say, 1900-01-01—and the current SYSUTCDATETIME() as the @end_time. Then we should be able to pass back that same @end_time next time as the start time, and so on and so forth. But that approach will open you up to a major issue: Temporal Tables simply wasn't designed for this.

    Backing up a bit: Change Tracking, which was in fact designed for answering net changes questions, works by placing a surrogate internal transaction identifier on each row touched during a transaction, and then exposing an external transaction identifier that corresponds to the commit time of the transaction. It does this by using an external “transactional commit table” in conjunction with the main changes table. But Temporal Tables uses an entirely different system: there is no external table, and each row is stamped, as it is being written, with a single time value that serves as both its internal and external identifier. And every row touched by a given transaction gets the same exact time: the start time of the transaction.

    So what does this mean when we're talking about net changes? Pretend for a moment that you start a transaction, jump on a plane, go hang out on the beach in Jamaica for a week (highly recommended), and then return to your office (lightly sunburned), update a table of your choosing, and commit the transaction. The time stamped on the row will correspond to the start of the transaction—a week ago, before you ever boarded the plane. Meanwhile, your synchronization process has been running regularly, let's say once an hour, and it thinks it's lagged by only an hour. It's never going to ask for rows that changed a week ago. This is referred to in change detection parlance as a lost update.
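
    If you want to see this for yourself, a quick experiment along the following lines will do it. (dbo.TemporalDemo is a hypothetical system-versioned table with period columns named valid_start_time and valid_end_time, matching the naming used elsewhere in this post.)

    BEGIN TRANSACTION;

    -- Simulate a long-running transaction (the trip to Jamaica, compressed a bit)
    WAITFOR DELAY '00:00:10';

    UPDATE dbo.TemporalDemo
    SET some_column = 'new value'
    WHERE pk = 1;

    COMMIT TRANSACTION;

    -- The new row version's valid_start_time reflects the start of the
    -- transaction, roughly ten seconds before the update actually ran
    SELECT pk, some_column, valid_start_time, valid_end_time
    FROM dbo.TemporalDemo
    WHERE pk = 1;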

    Does this mean we’re completely out of luck? Of course not—but it does mean that a complete solution will have to consider the transactional state of the system. And you’ll have to monitor to make sure that no one starts a transaction and leaves it hanging around for a week. (You should probably be doing that anyway.)

    The key to my full solution is the sys.dm_tran_active_transactions DMV. This view contains one row per current transaction—either explicit or implicit—and a handy column called transaction_type which categorizes the transactions as read, read/write, system, or distributed. It also includes a transaction start time, which can be used to figure out which times we might not want to touch just yet, if some transaction is still outstanding from then.

    The following expression uses the DMV and is designed to output a safe @end_time that will avoid lost updates. It starts by asking for the minimum start time for any active read/write transactions (transaction_type 1). If there are none, it uses the current SYSUTCDATETIME instead. Just to be on the extra safe side—in case there was a delay in processing due to sitting on a long runnable queue, for example—the expression pulls five seconds off of the end time. Note that this code also needs to be able to convert the local time as reported by the DMV into UTC as required by our Temporal predicate. There is currently no reasonable way to get the local time zone in T-SQL (there is at least one hack involving a registry read, which I do not recommend), so you'll have to replace the hardcoded Eastern Standard Time with whatever time zone is appropriate for your server.

    CONVERT
    (
        DATETIME2(0),
        DATEADD
        (
            SECOND,
            -5,
            ISNULL
            (
                (
                    SELECT
                        MIN(transaction_begin_time)
                            AT TIME ZONE 'Eastern Standard Time' AT TIME ZONE 'UTC'
                    FROM sys.dm_tran_active_transactions
                    WHERE
                        transaction_type = 1
                ),
                SYSUTCDATETIME()
            )
        )
    )

    The final operation I’ve added to the end date expression is to truncate the fractional (sub-second) precision off of the DATETIME2. The reason for this is that it’s important that subsequent change requests use the exact same start time as the prior end time. If you’re storing the value from your last synchronization in a system that has slightly different date rules than SQL Server, you may lose some precision and wind up with a rounding error. So better, in my opinion, to delay changes by up to one additional second, by ensuring that any possible date/time data type will suffice.

    Once all of this is in place on your server, pointing at your table, usage is pretty simple:

    • Set a start date of 1900-01-01.
    • Set an end date using the expression above.
    • Ask for “changes.” That’s your input set. (Similar to Change Tracking’s VERSION.)
    • Record the end date, and use it as the start date next time. The same date functions as both end date one time and start time the next because of the BETWEEN predicate’s half-open interval; if a delete operation occurred at exactly the end date, it will be included in that synchronization due to the end date being inclusive, but it will not be included in the next synchronization because the start time comparison is non-inclusive.
    • Re-populate the end date each time using the expression. It may be worthwhile to log the start and end dates so that you can detect conditions where the new end date is not much greater than the start date. This would indicate either that you might want to increase your polling interval or figure out why user transactions are running for a long time.
    • Repeat as needed. Remember that every cache is effectively stale the moment you're done synchronizing it, so you must strike a balance between staying synchronized enough and not putting excessive stress on the host system trying to keep things up to date. And given the safety measures I've taken in the above expression, if you use this system and ask for changes more often than once every six seconds, you'll tend to not get very far.
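
    To tie those steps together, here's a rough sketch of what a periodic synchronization wrapper might look like. The names dbo.MyTable and dbo.SyncState are hypothetical stand-ins for your temporal table and for wherever you persist the last end time; the net changes query itself is the template shown earlier.

    -- Start time: the end time recorded by the previous synchronization,
    -- or 1900-01-01 for the initial pull (dbo.SyncState is hypothetical)
    DECLARE @start_time DATETIME2 =
        ISNULL
        (
            (
                SELECT s.last_end_time
                FROM dbo.SyncState AS s
                WHERE s.table_name = 'dbo.MyTable'
            ),
            '19000101'
        );

    -- Safe end time, using the expression described above
    DECLARE @end_time DATETIME2 =
        CONVERT
        (
            DATETIME2(0),
            DATEADD
            (
                SECOND,
                -5,
                ISNULL
                (
                    (
                        SELECT
                            MIN(transaction_begin_time)
                                AT TIME ZONE 'Eastern Standard Time' AT TIME ZONE 'UTC'
                        FROM sys.dm_tran_active_transactions
                        WHERE transaction_type = 1
                    ),
                    SYSUTCDATETIME()
                )
            )
        );

    DECLARE @search_end_time DATETIME2 =
        DATEADD(NANOSECOND, 100, @end_time);

    -- ...run the net changes template query here against dbo.MyTable,
    -- using @start_time, @end_time, and @search_end_time...

    -- Persist the end time; it becomes the start time for the next run
    UPDATE dbo.SyncState
    SET last_end_time = @end_time
    WHERE table_name = 'dbo.MyTable';

    In practice you'd probably wrap this, along with the template query, in a stored procedure per synchronized table.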

    That's it—net changes from Temporal Tables. Not the simplest exercise in the world, and maybe not quite as perfect as we'd like, but completely workable nonetheless. And of course, it's always fun to put shiny new toys completely through their paces.

    Enjoy, and happy synchronizations!

  • SQL Server 2016 "Hidden Gems" - Now on a YouTube Near You

    Sad that you missed last week's SQL Server 2016 "Hidden Gems" GroupBy session?

    Unlikely, perhaps...but no worries either way! It's now available for free, on YouTube:

    https://www.youtube.com/watch?v=P4608MNM-QU

    Enjoy!

     

  • More Query Tuning Training: Chicago!

    I'm very happy to announce a third 2017 date for "Tuning Your Biggest Queries!"

    On March 10 I'll deliver the seminar in one of my favorite cities, Chicago.

    This delivery is a pre-con for SQL Saturday Chicago, which will run on March 11. But here's the thing: The SQL Saturday is already at capacity and wait listed. The organizers have decided to allow pre-con attendees to skip the queue. So as a bonus for attending Friday's pre-con, you now get guaranteed admission to Saturday's event. Not a bad prize!

     

    And by the way, there are still seats available for Boston (January 27) and Cleveland (February 3).

     

    See you soon! More dates forthcoming. I'm especially looking for some hosts in Europe and the US (or Canadian) west coast. If either of these describes you, drop me a line!

  • SQL Server 2016 “Hidden Gems” Resources

    Today I was honored to present my SQL Server 2016 “Hidden Gems” talk for the GroupBy online conference.

     

    The slide deck and demos are attached to this post.

     

    Following are some links to help you in further exploration of the various features I discussed:

     

    STRING_SPLIT

    https://msdn.microsoft.com/en-us/library/mt684588.aspx

    Aaron Bertrand’s performance tests: https://sqlperformance.com/2016/03/sql-server-2016/string-split

     

    Session Context

    https://msdn.microsoft.com/en-us/library/mt605113.aspx

     

    Time Zone Support

    sys.time_zone_info: https://msdn.microsoft.com/en-us/library/mt612790.aspx

    AT TIME ZONE: https://msdn.microsoft.com/en-us/library/mt612795.aspx

     

    HASHBYTES

    https://msdn.microsoft.com/en-us/library/ms174415.aspx

     

    Session Wait Stats

    https://msdn.microsoft.com/en-us/library/mt282433.aspx

     

    Live Query Statistics

    https://msdn.microsoft.com/en-us/library/dn831878.aspx

    Ability to attach to other sessions' Live Queries!  https://blogs.msdn.microsoft.com/sql_server_team/query-progress-anytime-anywhere/

     

    Actual Query Plan Data Enhancements

    https://blogs.msdn.microsoft.com/sql_server_team/added-per-operator-level-performance-stats-for-query-processing/

     

    INPUTBUFFER DMV

    https://msdn.microsoft.com/en-us/library/mt652096.aspx

     

    Database Scoped Configuration

    https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/05/02/database-scoped-configuration/

     

    Increased Number of Index Keys

    https://msdn.microsoft.com/en-us/library/ms143432.aspx#Engine

     

    Increased Index Key Size

    https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/04/26/increased-nonclustered-index-key-size-with-sql-server-2016/

     

    COMPRESS

    https://msdn.microsoft.com/en-us/library/mt622775.aspx

     

    Columnstore

    Too much stuff to list! Go google it.

  • The SQL Genie (T-SQL Tuesday #086)

    At some point a long time ago, when I was first getting serious about SQL Server, I became aware of an open secret.

    You see, there was this special Microsoft alias called “sqlwish.” Those in the know could send an e-mail and pass along suggestions for improving the product. And they wouldn’t just vanish into thin air; some actual member of the product team on the other end would not only read the suggestions, but also add them to the internal Official List of Stuff to Get Done.

    How exciting! A chance to make my mark on the product I worked with day in and day out. I immediately put together a list of my wishes and sent an e-mail. And over time, I sent a few more. I never heard back from anyone, so I have no clue if my missives were actually read. And while I can’t remember all of the things I asked for, I’m pretty sure none of them were ever implemented. (I can clearly remember only one request from that very first e-mail. I’ve asked for it again numerous times over the years and still want it today. No luck.)

    Fast-forward a few years and Microsoft introduced the Product Feedback Lab, a web-based solution for user feedback collection. The sqlwish alias was retired and now we could submit feedback in an open, transparent way. We were told by Microsoft that not only would someone read these items, but now we would actually get status updates and replies! An amazing idea. And it went well for a while. Users loved the Lab, populated it with lots of items, and Microsoft soon promoted it to a permanent home in a place called Connect.

    Alas, the powers that be at Microsoft seem to have severely under-estimated the amount of work and attention this endeavor would require. Members of the product group got busy or moved on to other things and the replies stopped coming as often. Users lost enthusiasm. And time simply passed, as it does. After a while Connect fell into a state of abandonment. The SQL Server items have occasionally seen periods of promising forward movement, as well as at least one horrible stage during which the product group indiscriminately closed virtually every open item. But generally speaking the site is mostly considered to be a joke by the SQL Server professionals I’m friendly with. It’s insanely slow, has an unbelievably awful UI, and most of the submitted items are eventually “resolved”—if they ever are—with the infamous “Won’t Fix” status. (Said resolutions are almost never accompanied by a comment from Microsoft or anyone else.)

    All of which doesn’t mean that Connect is completely useless. As desperate as we often are to see change in certain areas, Connect has become an amazing catalog of user stories. It’s one of the only mechanisms for most people to voice concerns about the direction the product is taking, and it turns out that DBAs have quite loud voices. Some tickets actually do go through the system and end up in that exalted state of having been “Fixed.” One such case that I will never forget occurred early on, when Hugo Kornelis demanded the release of SQL Server 2005 SP3. Over 800 people joined him and the service pack was released. A true win for and by the SQL Server community.

    But today’s post isn’t about victory. Today’s post is about a dream.

    This month’s T-SQL Tuesday challenge, issued by Brent Ozar, is simple, vague, and frustrating: find an interesting item on Connect and write about it. The problem for me is that I can’t find just one interesting item; there are simply too many from which to choose. But sometimes when I’m both especially bored and feeling a bit depressed about SQL Server I’ll peruse Connect by using one of its only good features, the ability to sort the items by “most voted.” (Note that even this feature doesn’t work properly; the sort order is only kinda-sorta accurate.)

    What if, I wonder, Microsoft took a step back and actually listened to the voice of the masses? What if Microsoft actually committed to implementing the top voted items on the list?

    That’s exactly what would happen if Microsoft put me in charge. (Imagine that!) Looking over the most voted—but still open—items, and sorting the wheat from the chaff just a bit (I only want legit developer features), I could come up with a pretty amazing release.  Here’s the session I would present at PASS just before Microsoft fired me for not filling the bits with enough marketing fluff:

    SQL Server v.Next.Next++: Developer Enhancements That Will Completely Change Your Life

    Hyperbole? Maybe just a little bit. But check out some of these amazing features…

    First and foremost, I would eliminate the biggest debugging time waster in the history of the product. (Bad error messages!) That sounds nice. Nine years and 1300 votes, for those keeping score.

    I would so, so love to see this virtual table of errors that would completely revolutionize relational ETL. I’ve even written about it before. (Three and a half years ago. Sigh.) 600 votes over four years.

    Regular expressions. I have to admit that I absolutely hate them and even when I work in C# I often find some other way to solve my problems. But the built in LIKE/PATINDEX patterns just don’t cut it. And yeah, you can implement some SQLCLR solution for this, but it should just be built in by default. Only 475 votes in the past 10 years, but I guarantee this would be an amazingly popular feature among the hardcore developer set.

    Scalar functions are awful. It bears repeating. Scalar functions are awful. Just for good measure, maybe once more. Scalar functions are awful. But they’re so incredibly tempting! Encapsulation and re-use simply make development better, more scalable, more manageable. If only scalar functions weren’t so very awful. Fix them? Yes. Please. Now! 560 people over the past 10 years have agreed with this sentiment. I bet that number would be much higher if the item was not somewhat strangely worded.

    I use a lot of derived tables in my T-SQL. I bet you do, too. I spend so much time writing so very many derived tables. And they bloat my statements to ginormous proportions. But I have no choice! I need to be able to use expressions as filter qualifications. Oh wait, there’s an amazing solution for this problem, suggested by Itzik Ben-Gan. 340 people over the past seven years have ticked the upvote button. Maybe Microsoft would have implemented it were it suggested by someone who actually knows T-SQL pretty well. The world may never know.

    Table-valued parameters are one of those things that I was so excited about when they were released. And I told myself that typical Microsoft lie. “It’s a v1 thing. It will be improved over time.” Well, that hasn’t happened. And this is a feature so very ripe for improvement. Removing that READONLY restriction would mean the difference between today’s “meh” and an absolute developer productivity masterpiece. Nine years. 320 votes.

    How many times have you been asked by end users to change the order of columns in a table? “But tables are sets, and sets have no explicit order.” Guess what? Users don’t care. We waste so much time on this and SQL Server could so easily support changing the order. And this item asking for a feature to do it has been in the system so long (almost 12 years) that when you click on it you don’t even see the item, you get an error. (When Microsoft migrated the items from Product Feedback Lab to Connect, something got messed up and a large number of items went into this limbo state. Perhaps someone got a column out of order? It’s been 10 years since that transition and I’m not expecting a fix anytime soon. But you can still see the number of votes on the main list: 270. I’m sure making these items almost completely unavailable did them no favors in terms of votes.)

    NULLs. What’s the first thing you think of? 3VL? You’re a geek. Annoyance over subtle bugs? You’re quite correct. And while there are some pretty brilliant workarounds to the various NULL issues, the fact is that the ANSI standard includes a clause specifically dedicated to eliminating a huge subset of the problems. Why, after nine years and almost 300 votes, do we not have this?

    Magic numbers! That sounds cool. Except it’s not. It’s a horrible drain on readability and a pox on the manageability of your code. But we need them, because every database out there has “lookup tables” with specific values that we need to predicate on throughout the code base. SQL Server could solve this problem once and for all by allowing users to promote these values to named enumerators. How cool would that be?!? 220 votes, nine years.

    And finally, one more from Mr. Ben-Gan, a clever idea for improving TOP with the OVER clause. I love the OVER clause. I love TOP. Their offspring would be beautiful indeed… 180 voters over the past nine years have shared in this sentiment.

    Quite a list, isn’t it? I think it would in fact change the entire developer experience for the better. But these items are old, and these numbers aren’t too impressive. 180 votes, really? There are probably a hundred thousand SQL Server DBAs and developers out there. The community, it seems, has no faith in Connect. People have neither time nor inclination to navigate its painful interface just to click the vote button on an item that will sit untouched for a decade. It’s a sad state of affairs.

    Eternal optimist that I am, I still hit Connect from time to time. I still vote, and I still submit my own items. I remain ever hopeful. Microsoft, if you need a new product head and fall guy, just for one release, I’m waiting by the phone...

    And just for the record, the number one item in my very first sqlwish e-mail, and something I’d still love to see today? Implementation of DOMAINs, an ANSI-standard feature that would bring the type system out of the Dark Ages and into the 21st Century. Here it is on Connect. “Won’t Fix.” The genie clearly slept through that one.

  • T-SQL Tuesday Rules of Engagement

    Just over seven years ago (wow, does time fly!) I launched T-SQL Tuesday. The core idea is simple: Each month, one blogger acts as “host.” He or she is responsible for choosing an interesting topic in or related to the SQL Server realm. Then a bunch of other bloggers all write articles on that topic. Finally, the hosting blogger puts together a round up of all of the participating posts so that anyone who’s interested in reading can find all of the content in one convenient place.

    We’ve now gone through 85 months of this experiment (#86 was announced just today, by Brent Ozar). And I’m happy to report that at least from my perspective, it has mostly been a huge success. We’re nearing a big milestone, just a bit more than a year away, of hitting triple digits. And I don’t see this slowing down anytime in the foreseeable future. There is still plenty of sharing and learning to be done.

    In speaking with Brent, he mentioned that he wasn’t sure exactly what the correct rules are or where to find them, and he asked me to put things together in one clear place. It kind of makes sense that there is no central place: Although I control the hosts and help dictate the general rules, T-SQL Tuesday is an entirely community-based thing and has grown and spread quite organically. Each month, each host is free to bend the rules just a bit to fit his or her unique blog style. But there are some central guidelines and best practices, and I’ll document and describe each of them below.

    Before I present the rules, I’d like to take a moment to describe the purpose of all of this: T-SQL Tuesday is designed to strengthen the SQL Server blog community in two different ways. First and most obvious, it generates blog posts, and depending on the topic these tend to be educational in nature. More education means a stronger community, and that’s what we’re all about. Second, and a bit more subtly, T-SQL Tuesday was conceived to strengthen search engine reputation by creating lots of links between blogs. Search engines like pages that get linked to a lot. So we generate lots of links to lots of great content, and then that great content floats up to the top as people search for stuff. That’s a win-win in my book and something to keep in mind if you either host or participate in a T-SQL Tuesday.

    (Aside: Does driving up your blog traffic sound good to you? Write to me about hosting! See below for the rules. I am literally always looking for hosts, especially new blogger hosts.)

    Now that that’s out of the way, without further ado, I present…

    The Official T-SQL Tuesday Rules, as of January 2017

    1. Each month, a host blogger will be assigned to create a T-SQL Tuesday event on his or her blog.
    2. Creating an event entails several steps:
      1. First the host blogger must be assigned to a month.
        1. Getting a month assigned means writing to me (amachanic [at] gmail [dot] com) and asking to host.
        2. Prior to writing to me you must have participated in at least two prior T-SQL Tuesday events as a blogger.
        3. Prior to writing to me your blog must show decent on-going activity. This means, for established blogs, at least 6-8 technical posts a year. For newer blogs, I expect to see at least one technical post a month sustained for at least six months.
        4. Assuming that everything checks out, I will give you the next available month. Usually things are booked out four to six months in advance.
        5. I try not to repeat hosts more often than once every 24 months.
      2. The host blogger must first choose a topic.
        1. Despite the name of the event, this topic does not need to be about T-SQL, and in fact ideally should not be. Any topic in or related to the SQL Server realm can be covered.
        2. It is strongly suggested that topics be technical in nature, and also strongly suggested that topics be broad enough to cover a variety of different interests and job functions. A T-SQL specific topic would therefore not be recommended, as it would eliminate people who want to write about MDX, DAX, R, PowerShell, C#, or some other related toolset. Remember that not everyone is you, and that’s what makes this fun; we want to hear from all sorts of different perspectives.
        3. A list of all prior topics is available here: https://voiceofthedba.com/t-sql-tuesday-topic-list/. This list is maintained by Steve Jones, to whom I owe a huge debt of gratitude.
      3. The host blogger must create an invitation blog post. The invitation post is the means by which the host blogger will advertise that month’s event.
        1. The invitation post must include T-SQL Tuesday and the event number in its title.
        2. The invitation post must include the T-SQL Tuesday logo.
        3. The invitation post should usually go live either on the Tuesday one week prior to the second Tuesday of the event month, or on the Monday before that Tuesday. Note that some hosts have posted a bit earlier than that, in order to give participants more time. That’s allowed, but it usually backfires in the form of low participation as participants forget to write their posts. Posting later than a week out can backfire as well, due to people not having enough time.
          1. If your post does not go live by close of business on the US East Coast on the Tuesday one week prior to the second Tuesday of the event month, and you haven’t been in contact with me, you will lose your month and you may be banned from ever hosting again. I luckily have not had to exercise this clause very often! Thank you, hosts, for fulfilling your duty.
        4. The invitation post should describe the topic and outline what kinds of blog posts participants should write in order to participate in the event.
        5. The invitation post must mention that participant posts should go live on the second Tuesday of the event month. The recommendation is to limit this to 0:00 UTC to 23:59 UTC on that day, but hosts are free to change the rule to local time zones or however they see fit to enforce it.
        6. The invitation post should describe how participants should make the host aware of participating posts – via trackbacks, comments on the invitation blog post, etc.
        7. I appreciate it if invitation posts tell people to contact me so that they can host. But that’s not required.
      4. The host blogger should advertise his or her invitation through social media channels. Advertising can be done on the official T-SQL Tuesday hashtag, #tsql2sday.
      5. The host is not required to participate in the month’s event by creating his or her own on-topic post, but it is recommended.
      6. After the second Tuesday of the month, ideally by the end of the week, the host is expected to post a “round up.” The round up is a blog post that reiterates the month’s topic and links to all of the participant blog posts. Ideally this round up post should include some overall commentary on the content that was produced, and maybe some individual comments on each blog post.
    3. Participating in a T-SQL Tuesday simply means writing a blog post
      1. Any blogger can (and should!) participate. It does not matter how long your blog has been active or where you blog or even what language you post in. WE WANT YOU.
      2. The blog post should be on topic for the month as described in the invitation blog.
      3. The blog post should go live at the appropriate time as specified in the invitation post.
      4. Participant blog posts need not mention the text “T-SQL Tuesday” in either the subject or the body.
      5. Participant blog posts must include the T-SQL Tuesday logo. The logo must link back to the invitation post. This is the primary hard and fast rule that drives T-SQL Tuesday and the host reserves the right to not link back to you if you don’t follow it. If you don’t link, you’re not acting in the spirit of the event and you are in fact being very rude.
      6. That’s it! Have fun, write great content, and submit it.
    4. All other questions or concerns
      1. Figure it out for yourself and do what you think is best
      2. Failing number one, write to me and I’ll tell you what I think is best

    Amazing how 85 months turned my initial single paragraph description into this big set of rules. But it’s really just a simple exercise in spreading our collective knowledge. So go forth, write, learn, and enjoy. See you next Tuesday!

  • Query Tuning Training in Early 2017: Boston and Cleveland!

    A couple of days ago in Washington DC I presented the final 2016 delivery of my Tuning Your Biggest Queries seminar. The class went really well (at least from my perspective!) and it was a great end to a busy 2016 training schedule.

    The seminar has been an absolute joy to teach and I think it is some of the best work I've ever put together. Advanced query tuning is my favorite topic, by far, and there are so many interesting routes through the material that it never seems to get old or boring. At every class attendees have been asking unique and challenging questions that really make things feel dynamic. Plus, thanks to Microsoft's increased release cadence and focus on performance, there are a lot of new and cool things on the horizon that I plan to add to the content in the coming months. Exciting times to be a query tuner!

    So all that said, now that this year is closing down, I'm happy to report that I'm already working on more dates for next year! Here are the first two:

     

    Boston area, January 27, 2017 - Information Here

    This will be a standalone delivery, which I'm working on in conjunction with (and in support of) the local New England SQL Server users group.

     

    Cleveland, OH, February 3, 2017 - Information Here

    Just a week after the Boston class I'll do it again, this time as a pre-con for SQL Saturday 595. And hopefully I can squeeze in a visit to the Rock and Roll Hall of Fame!

     

    Even more dates forthcoming - stay tuned, and I hope to see you in 2017! 

  • Learn Advanced Query Tuning in Washington DC

    Just about every database professional on Earth has the same goal: Make things faster.

    There are numerous ways to achieve that goal, but for many people it comes down to a series of hunches, guesses, and trying whatever seemed to work last time.

    What if there was a more efficient way to get the job done? What if you could understand what's going on under the covers, know your options, and make the right moves for your actual situation?

    That's what my full-day Tuning Your Biggest Queries seminar is all about. In this unique class you'll learn about how the query processor works, how the query optimizer thinks, and how you can leverage an understanding of these internals to make your queries faster. What really sets this content apart is that there is almost no focus on indexing -- only a very brief discussion at the end of the day, and only after you've already learned how to improve query performance by orders of magnitude through rewrites.

    The idea is not to make you an index champion. The idea is to set you on the path to advanced query tuning mastery.

    I am pleased to announce the final 2016 public delivery of the seminar -- Friday, December 2, in Washington DC. And you can get early bird pricing through the end of this week!

    This delivery will act as a pre-conference seminar for SQL Saturday DC, but you are free to attend either or both.

    An updated course outline follows. Looking forward to seeing you there!


    Tuning Your Biggest Queries - December 2 - Washington DC 

    Module 1: Query Plans

    • How Query Plans Really Work
    • Data Acquisition Iterators
    • Join Iterators
    • Aggregation and Analysis Iterators
    • Parallel Processing 

    Module 2: Plan Shaping

    • T-SQL Rewrites That Waste Your Time
    • T-SQL Rewrites That Work
    • Query Hints
    • Leveraging Row Goals For Plan Control
    • Understanding Ideal Plan Shape
    • Performance Debugging Complex Plans 

    Module 3: Index Optimization 

    • The Levels of Index Understanding 
    • How B*Tree Indexes Really Work
    • How to Design Optimal B*Tree Indexes
    • How to Choose Among Indexing Options
    • Columnstores vs. B*Trees 

     

  • sp_whoisactive: The Big Fixes!

    Four and a half years have flown by since I released sp_whoisactive version 11.11.

    It's been a pretty solid and stable release, but a few bug reports and requests have trickled in. I've been thinking about sp_whoisactive v.Next -- a version that will take advantage of some newer SQL Server DMVs and maybe programmability features, but in the meantime I decided to clear out the backlog on the current version.

    And so, with that, I present sp_whoisactive version 11.16. Hosted on the brand new home for sp_whoisactive. (Yes, it's a bit minimalist for the moment.)

    Enjoy! 

  • Temporal Tables: Connect Item Round Up

    I've been thinking a lot about SQL Server 2016 temporal tables of late. I think it's possibly the most compelling feature in the release, with broad applications across a number of different use cases. However, just like any v.1 feature, it's not without its faults.

    I created a couple of new Connect items and decided to see what other things people had submitted. I combed the list and came up with a bunch of interesting items, all of which I think have great merit. Following is a summary of what I found. I hope you'll consider voting these items up and hopefully we can push Microsoft to improve the feature in forthcoming releases.

     

    Better, More Automatic Logging, Especially to Support Audit Scenarios

    A big theme I saw across many items, and something I've also heard from attendees of my sessions when I've presented on temporal tables, is the question of audit support. History is never just about the data itself. Metadata around who made the change, how the change was made, and sometimes even why the change was made, can be vital for a number of scenarios. Temporal tables do not improve this experience today, but I think they very easily could.

    Permit additional hidden columns in temporal tables - by DBAIntrepid - https://connect.microsoft.com/SQLServer/Feedback/Details/1707429

    Storing audit columns, but don't want your users to see them all the time? Wouldn't it be nice if they could be hidden by default? I certainly think so...


    Provide a mechanism for columns to automatically update themselves - by Adam Machanic - https://connect.microsoft.com/sql/Feedback/Details/3105516 

    UpdatedBy, UpdatedFrom, UpdatedSessionContext, and many other variations on the same theme. Every single database I see has these columns. Why can't SQL Server help us by automatically populating them on our behalf?
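
    To make the gap concrete, here is a minimal sketch of the workaround most of us use today (dbo.Orders, its columns, and the constraint names are all hypothetical): the period columns can already be declared HIDDEN, and DEFAULT constraints can stamp "who" and "where from" at INSERT time -- but nothing keeps those columns current on UPDATE, which is exactly what the items above are asking for.

    -- Hypothetical table: the DEFAULTs populate the audit columns on INSERT only.
    CREATE TABLE dbo.Orders
    (
        OrderID int NOT NULL PRIMARY KEY,
        Amount decimal(18, 2) NOT NULL,
        ModifiedBy sysname NOT NULL
            CONSTRAINT DF_Orders_ModifiedBy DEFAULT (SUSER_SNAME()),
        ModifiedFrom nvarchar(128) NOT NULL
            CONSTRAINT DF_Orders_ModifiedFrom DEFAULT (HOST_NAME()),
        ValidFrom datetime2 GENERATED ALWAYS AS ROW START HIDDEN NOT NULL,
        ValidTo datetime2 GENERATED ALWAYS AS ROW END HIDDEN NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.OrdersHistory));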

     

    Temporal Tables: Improve History Retention of Dropped Columns - by Adam Machanic - https://connect.microsoft.com/sql/Feedback/Details/3105517 

    One of the really annoying things about temporal tables is that the idea of what is and is not history can change. If I drop a column from my base table, the history of the column disappears. That, it seems to me, is unnecessary. Why not leave it (optionally?) and make it nullable?
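
    For concreteness (dbo.Orders is the hypothetical table from the sketch above): dropping a column removes it from the base table and from the history table, along with every value it ever held.

    -- The DEFAULT constraint has to go first; the column drop then propagates
    -- to dbo.OrdersHistory, taking the column's entire history with it.
    ALTER TABLE dbo.Orders DROP CONSTRAINT DF_Orders_ModifiedFrom;
    ALTER TABLE dbo.Orders DROP COLUMN ModifiedFrom;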

     

    Temporal Tables (system versioned) Enhancment - by Guy Twena - https://connect.microsoft.com/SQLServer/Feedback/Details/1691517

    Not the best-named item; this one requests a column update mask, similar to the one provided by CDC. I agree that this would be a very useful feature.

     

    Easier Querying 

    The first time you use SQL Server temporal tables, the query experience seems amazing. It's so much better and more integrated than any other form of "history" table we've ever had in SQL Server. But the experience very quickly breaks down as soon as you try to do something a bit more complex. The issue? The temporal predicates support only literals and variables. This means that, for example, you can't encapsulate a significantly complex temporal query in a view or inline UDF. I found lots of items around this theme, but I decided to include only the following one in this post, as I thought it had the best phrasing and customer impact story.

    Temporal for FOR SYSTEM_TIME AS OF cannot use datetime functions. - by Eric A. Peterson - https://connect.microsoft.com/SQLServer/Feedback/Details/2582201 
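
    To illustrate the limitation (dbo.Orders is the hypothetical table from the earlier sketch): a variable works, but an inline expression is rejected -- and a variable is exactly what you can't declare inside a view or inline UDF.

    -- This works: the AS OF value is computed into a variable first.
    DECLARE @as_of datetime2 = DATEADD(DAY, -7, SYSUTCDATETIME());

    SELECT OrderID, Amount
    FROM dbo.Orders
    FOR SYSTEM_TIME AS OF @as_of;

    -- This is rejected, because the temporal predicate won't accept an expression:
    -- SELECT OrderID, Amount
    -- FROM dbo.Orders
    -- FOR SYSTEM_TIME AS OF DATEADD(DAY, -7, SYSUTCDATETIME());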

     

    Better Administrative Support 

    Sometimes you need to change history. Maybe you need to trim some rows to keep table sizes in check. Maybe you need to backfill some history when merging data sets. Or maybe you need to fix an error. In any case, you can do this with temporal tables by briefly turning off the feature, making your changes, and then turning it back on. This isn't something end users should be doing, but is absolutely a valid administrative concern. Unfortunately, it's tougher than it should be to encapsulate in a stored procedure.

    Cannot catalog stored procedures for updating temporal columns in advance - by Greg Low - Australia - https://connect.microsoft.com/SQLServer/Feedback/Details/2500716
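
    Roughly, the on/off dance looks like this (dbo.Orders and dbo.OrdersHistory are the hypothetical tables from the earlier sketch; in real use you would wrap the whole thing in a transaction):

    -- Temporarily suspend system versioning so the history table can be modified.
    ALTER TABLE dbo.Orders SET (SYSTEM_VERSIONING = OFF);

    -- Trim, backfill, or correct history here; for example, purge rows older
    -- than seven years.
    DELETE FROM dbo.OrdersHistory
    WHERE ValidTo < DATEADD(YEAR, -7, SYSUTCDATETIME());

    -- Re-enable versioning against the same history table.
    ALTER TABLE dbo.Orders SET
    (
        SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.OrdersHistory, DATA_CONSISTENCY_CHECK = ON)
    );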

     

    That's it for this round up. Any other interesting items I missed? Please leave a note in the comments section if so! 

  • Upcoming Performance Seminars: Charlotte and Vegas!

    I am happy to report two more scheduled public deliveries of my new-for-2016 seminar, Tuning Your Biggest Queries! This seminar is all about how to approach difficult query tuning tasks by understanding exactly what the query optimizer is (and is not) doing with your data, how it internally works and thinks, and how you can take complete control. Thanks to great audience feedback I've made some tweaks and I think the content is getting better and better every time. 

    The upcoming dates are:

    Charlotte, NC - September 16 - Pre-conference seminar for SQL Saturday Charlotte. Note that this is almost sold out.

    Las Vegas, NV - October 10 - Pre-conference seminar for IT/Dev Connections. An early bird rate is still in effect for this one.

     

    Hope to see you there!

  • Performance Tuning Seminars in Philadelphia and Indianapolis

    Pretty much every environment I've ever seen follows the same patterns with regard to query size distribution: lots of typical run-of-the-mill queries (whatever is typical in that environment), followed by a diminishing number of larger queries (say, an order of magnitude bigger than typical), and a handful of gigantic queries that are several orders of magnitude beyond that.

    When tuning a slow SQL Server, what's the main thing I look at improving? I focus, first and foremost, on that final set, containing the biggest, most complex, most long-running things I can find. I believe this makes a lot of sense on several levels: Not only are the biggest queries the "lowest hanging fruit" -- i.e., things I tend to be able to markedly improve quickly -- but they're also usually the ones dragging down the rest of the system. Bring down the monsters and all of the little guys can breathe easier.

    Unfortunately -- or fortunately if you enjoy these kinds of things as much as I do -- tuning the biggest and most complex queries can be quite a challenge. I often need to figure out what the query optimizer is "thinking," determine which query hints I can or cannot leverage, and come up with interesting ways to rewrite things to improve speed while maintaining the same logic. Dealing with big queries tends to be equal parts art (creativity) and science (raw application of logic). And the really interesting thing I've noted after tuning dozens of systems is that it doesn't matter whether my "big" query is bigger than your "big" query: It's the relative size of these beasts, as compared to whatever is standard, that makes them equally challenging and rewarding to tackle.

    Recently I was looking for a new topic on which to write a seminar, and I decided that this topic would be perfect. Query tuning is one of my favorite pastimes in the SQL Server world, virtually every environment can use some tuning, and virtually everyone has to deal with these kinds of big queries.

    So all that said, I'm proud to announce the first two deliveries of Tuning Your Biggest Queries, a full-day seminar I've put together for SQL Saturdays and conferences.

    The two dates are:


    Click through for abstract, pricing, and registration details. 
     
    If you can't make those dates, don't worry; more are forthcoming. Stay tuned. 

    Questions? Comments? Leave me a message below if you need clarification regarding the content, deliveries, or future dates.
     
    Hope to see you there!
  • sp_whoisactive for Azure SQL Database - Attempt #2

    Over a year ago now I announced a rough port of sp_whoisactive to Azure SQL Database. This worked thanks to Microsoft finally aligning the Azure DMVs to the box product DMVs...until Microsoft decided to rip out one of the key DMVs, sys.dm_os_sys_info and its ms_ticks column. This column is the key to unlocking various metrics in some of the other SQLOS DMVs.
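
    For context -- and this is only an illustration of the general pattern on the box product, not necessarily what sp_whoisactive does internally -- ms_ticks is the instance's millisecond uptime counter, and it's the anchor needed to convert tick-based timestamps in other SQLOS DMVs into wall-clock times:

    -- Convert the millisecond-tick timestamps in sys.dm_os_ring_buffers to
    -- approximate wall-clock times. Seconds are used rather than milliseconds
    -- to keep the difference safely within int range for DATEADD.
    SELECT
        rb.ring_buffer_type,
        DATEADD
        (
            SECOND,
            CAST((rb.[timestamp] - osi.ms_ticks) / 1000 AS int),
            SYSDATETIME()
        ) AS approximate_event_time
    FROM sys.dm_os_ring_buffers AS rb
    CROSS JOIN sys.dm_os_sys_info AS osi;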

    I tried to get my contacts at Microsoft to listen to me about restoring it, but my efforts were met with nothing but silence and lots of wasted time.

    So I set about looking for a workaround. This week I think I've finally managed to get something working that approximates the number I need from that view, ms_ticks. 

    Attached is sp_whoisactive v11.112 -- Azure Special Edition v2. Please give it a shot, and I am especially interested in feedback if you use the @get_task_info = 2 option when running sp_whoisactive. That is the main use case that's impacted by the lack of ms_ticks information and my attempt at a workaround.

    Thanks for your patience as I worked this out! Enjoy. 

  • SQLQueryStress: The Source Code

    Back in 2006, I created a very simple tool called SQLQueryStress. The goal of this tool was to enable the user to test a single query under some form of load -- up to 200 simultaneous threads.

    I wrote this tool primarily for a book I was working on at the time, and used it for most of the examples therein. And after that, truth be told, I decided that the tool was not very useful. I use it for demos in almost every presentation I give -- it's great for showing simple contention scenarios -- but I can't say that I've used it even a single time in an actual work project.

    Interestingly, although I personally haven't found much use for it, many of you out there seem to have. The tool has become incredibly popular over time, and I've received numerous e-mails over the years requesting features, as well as many from people requesting and even offering to pay for the source code.

    I've never said yes to any of these source code requests. For a long time I said no because I thought I might add more features and turn SQLQueryStress into something I could sell. Later, as my C# skills improved, I realized that the project was not very well done, and I really didn't want to release garbage code with my name on it. Still later, I thought I'd lost the source code forever in a hard drive crash. Meanwhile, SQLQueryStress has limped along unchanged, even as the wish list has continued to grow.

    Things changed last week, when I allowed one of my domains to expire: the one hosting SQLQueryStress. The domain was hosted by a very shady company that I'd long since decided to stop doing business with, and I found myself faced with the task of transferring things to a new provider. That is of course not a big deal, but I decided that the world would be better off without the current version of SQLQueryStress being quite so available. Almost 10 years after its creation -- most of the work done over a long weekend -- the world needs a better tool. I think it's time for someone to step up and deliver. And that someone is not going to be me.

    After the domain expired I went searching through my archives and found the source code. I took a quick look and confirmed what I thought all along: It's pretty bad. But today I'm releasing it for free, and the old adage stands: You get what you pay for!

    So here's the official word: The attached source code is hereby released to the world, copyright and royalty free. You may use it, if you like, for whatever you want. Enjoy! If you use it for a public project, I would appreciate a mention in the acknowledgements section, but even that is not required. This source code is yours, warts and all. I was tempted to do some cleanup work, but at this point it's just not something I'm ever going to touch again. I upgraded the project from Visual Studio 2005 to Visual Studio 2013, confirmed that it builds and seems to work, and that's that.

    I would very much like for someone to release a quality and modern stress tool. I would personally trash this source code and start with something brand new, but if you do use my code or even just my ideas as the base, here are the current top to-do items:

    • Fix bug where SQLQueryStress tends to crash after the Stop button is hit. Some variable, I guess, is not being properly reset.
    • Implement a better query editor. The control I used is bare bones to the max and does no one any favors.
    • Apparently there are some bugs around the statistics collection options not actually being disabled when you turn them off.
    • Enable more than 200 simultaneous threads. This was just an arbitrary number and can be changed by simply modifying the appropriate control. But perhaps there is a smarter way to come up with a maximum?
    • Implement support for saving of statistics and collected information, and reporting on that data, especially when the same test is run more than once.
    • Implement support for multiple test queries.

    If you have any questions about the source code... you're out of luck. I haven't looked at it in at least 8 years. So fire up your debuggers!

    And finally, kindly let me know of any public projects you create, so that I can become an end user.
