THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Kalen Delaney

Geek City: Accessing Distribution Statistics

Distribution statistics are one of the most important sources of information that the Query Optimizer uses to determine a good query plan. In this post, I’m not going to tell you everything about distribution statistics. I’m just going to show you a few tricks for getting access to the statistics.

If you want a deeper understanding of what the statistics keep track of, and you don’t have any of my SQL Server Internals books handy, check out this whitepaper: Statistics Used by the Query Optimizer in Microsoft SQL Server 2008 

Microsoft does provide us a tool called DBCC SHOW_STATISTICS for examining the distribution statistics.

Microsoft has gradually been making more of the more of the old DBCC commands available as DMVs, even some undocumented ones. For example, one of my favorites, DBCC IND, has now been replaced in SQL Server 2012  by sys.dm_db_database_page_allocations.

I have been wishing for several versions that Microsoft would make the DBCC SHOW_STATISTICS information available as a DMV. But it hasn’t happened yet, and I’m tired of waiting, so I decided to do something about it.

My solution is not quite as easy to use as a DMV might be, but it allows you to get the information that DBCC SHOW_STATISTICS provides into a set of three tables that can then be saved into a more permanent location of your choice, and/or queried as desired.

DBCC SHOW_STATISTICS returns three sets of information, with different columns in the output, so three different tables are needed. The DBCC SHOW_STATISTICS command can be called with an argument that specifies that you just want one of the three sets returned. The options are

WITH STAT_HEADER – returns basic info such as last update date, and number of rows in the table/index. Also reports number of steps returned for HISTOGRAM section.

WITH DENSITY_VECTOR – returns density info for each left-based subset of columns in the index. For example, an index on (lastname, firstname, city) would have a density value for (lastname), for (lastname, firstname), and for (lastname, firstname, city). Each density value is a single number representing the average number of occurrences and depends on the number of distinct values. For example, if there are only 2 possible values in the column, the density would be 0.5. Multiplying density by the number of rows in the STAT_HEADER section would give the average expected rowcount if a query was executed looking for an equality on the specified column(s).

WITH HISTOGRAM – returns a set of ordered values from the first column of the index, creating a histogram. This histogram provides the optimizer with selectivity information for specific values or ranges of values in the first column of the index.

To collect this info, I will use one of my favorite tricks, which is to create a table in the master database with a name starting with sp_. (I’ve written about this trick several times, including in this earlier blog post.) Once I have the table(s) created, I can access them from any database. So here are the three tables:

USE Master;
GO

IF  (SELECT object_id('sp_stat_header')) IS NOT NULL
  DROP TABLE sp_statsheader;
GO
CREATE TABLE sp_stat_header
(   Name sysname,
    Updated datetime,
    Rows bigint,
    Rows_sampled bigint,
    Steps smallint,
    Density numeric (10,9),
    Average_key_length smallint,
    String_index char(3),
    Filter_expression nvarchar(1000),
    Unfiltered_rows bigint);
GO


IF  (SELECT object_id('sp_density_vector')) IS NOT NULL
  DROP TABLE sp_density_vector;
GO
CREATE TABLE sp_density_vector
(  all_density numeric(10,8),
   average_length smallint,
   columns nvarchar(2126) );
GO

IF  (SELECT object_id('sp_histogram')) IS NOT NULL
  DROP TABLE sp_histogram;
GO
CREATE TABLE sp_histogram
(   RANGE_HI_KEY sql_variant,
    RANGE_ROWS bigint,
    EQ_ROWS bigint,
    DISTINCT_RANGE_ROWS bigint,
    AVG_RANGE_ROWS bigint);
GO

The second trick is to use INSERT … EXEC to execute a DBCC statement and populate the tables. I will build the DBCC  command dynamically, after capturing the schema, table and index names in variables. You of course could take this code and turn it into a stored procedure, for which the schema, table and index names are passed as parameters. I’ll use as an example a table in the AdventureWorks2008 sample database, just so you can try running the code, and I can verify that it actually works!

I will use the table Sales.SalesOrderDetail and the index IX_SalesOrderDetail_ProductID. So the object name (@oname) is SalesOrderDetail, the schema name (@sname) is Sales, and the index name (@iname) is IX_SalesOrderDetail_ProductID.

SET NOCOUNT ON;
USE AdventureWorks2008;
GO
DECLARE @oname sysname,  @iname sysname, @sname sysname

SELECT @oname = 'SalesOrderDetail',  @sname = 'Sales', @iname = 'IX_SalesOrderDetail_ProductID';
 
-- Update the object name to include the schema name, because that is the format the DBCC command expects
SELECT @oname = @sname +'.' + @oname;

TRUNCATE TABLE sp_stat_header;
INSERT INTO sp_stat_header
    EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH STAT_HEADER');

TRUNCATE TABLE sp_density_vector;
INSERT INTO sp_density_vector
    EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH DENSITY_VECTOR');

TRUNCATE TABLE sp_histogram;
INSERT INTO sp_histogram
    EXEC ('DBCC SHOW_STATISTICS(['+ @oname + '],' + @iname +') WITH HISTOGRAM');

So now you can look at the values collected and filter or query in any way, or use SELECT INTO to save them into another table, so the sp_ tables can be used the next time you want to capture distribution statistics information.

SELECT * FROM sp_stat_header;

SELECT * FROM sp_density_vector;

SELECT * FROM sp_histogram;

  

Let me know if you find this useful, and especially if you embellish it to create a procedure or an automated process of your own!

Thanks!

~Kalen

Published Friday, January 18, 2013 1:08 PM by Kalen Delaney
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

sfibich said:

I’m guessing you already know but SQL 2008 R2 introduces a new DMV sys.dm_db_stats_properties that has almost all of the same properties as the DBCC show_statistics WITH STAT_HEADER returns.  The DMV is missing the density which I believe is no longer used by the optimizer, string index, average key length, and filtered expression.  The filtered expression column can be picked up from sys.stats filter definition column.  It’s not a perfect fit but if you can do with the average key length and the string index flag it’s a nice alternative.  You can also pull in the average record length for the 0 level index of a statistic if it has one from the sys.dm_db_index_physical_stats DMV using the DETAILED setting.  It’s not exactly the same thing but it may give some insight to the statistic.  This combination has been getting me through an high level look at statistics for a database when I’m having performance problems.

I like your solution as it solves the issue for getting some of the more detailed statistics information.

something like...

select

ddsp.*

,ss.filter_definition

,ss.no_recompute

from

sys.dm_db_stats_properties (object_id('FactInternetSales'), 2) as ddsp

inner join

sys.stats ss

on

ddsp.object_id =ss.object_id

and

ddsp.stats_id =ss.stats_id

January 22, 2013 10:14 AM
 

Kalen Delaney said:

Thanks sfibich... I actually just found out about the new DMV when this issue of programmatically accessing stats info was being discussed on a private forum. Someone asked why we couldn't use sys.dm_db_stats_properties, and that was the first time I saw it! But as you say, it doesn't have everything.

Thanks again

Kalen

January 22, 2013 1:08 PM
 

Kalen Delaney said:

  I just started using a new DMV (one that’s actually an ‘F’ not a ‘V’, as in Function) that gives

April 9, 2013 9:39 PM
 

Kalen Delaney said:

I just started using a new DMV (one that’s actually an ‘F’ not a ‘V’, as in Function) that gives us more

April 9, 2013 9:40 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Favorite Non-technical Sites or Blogs

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement