THE SQL Server Blog Spot on the Web

Welcome to - The SQL Server blog spot on the web Sign in | |
in Search

Erin Welker

SQL Server and Business Intelligence

  • SQL Saturday #35 - Dallas

    I had intended to post this weeks ago.  I guess I'm still postponing my New Year's resolution to stop procrastinating.

    We're organizing our first SQL Saturday event in Dallas on May 22nd, hoping to make it an annual event.  It will be held at the Region 10 Education Center in Richardson, and we currently have 4 tracks slated:  Development, Administration, Business Intelligence and Beginner.  I'm particularly excited about the Beginner track since we so rarely find those at the big conferences.  This could be a valuable set of sessions, especially for those that are moving from one area of focus to another.  If you live in or around the Dallas area and have been hoping to find one of these events near you, here's your chance!  The event, like all SQL Saturdays, is free.  We are collecting a fee of $10 for lunch onsite.  Click here to register.  Registration is limited, so don'

    Our call for speakers is open and will be open until mid-February.  We've already received some great abstracts from well-known speakers, and I hoping we'll see even more.  Even if you've never presented before and want to get your feet wet, this is a great opportunity.  Everyone has something interesting to share with their peers and this is a good opportunity to do so.

  • Automating UBO

    I was preparing for my presentation at PASS and wanted to show some code for automating usage-based optimization.  This is the process that allows for creating aggregation designs based on past query activity.  I was surprised to find that there were no code examples posted on the web on how to do this, at least none that I could find.  BIDSHelper and AggManager have similar functionality but, upon examining the code, they both use a more manual approach of examining each Dataset column in the query log and creating an aggregation when it encounters a "1" in the bit string.  I believe they did so due to the inferiority of the UBO algorithm in SQL Server 2005.  That being said, using the out-of-the-box UBO algorithm leveraged here is only recommended for use with SQL Server 2008.

    I felt sure that this functionality had to be available in AMO (Analysis Management Objects) because it existed in its predecessor, DSO.  It turns out that there is supporting documentation in Books Online, but it took some help from some of the folks at Microsoft to find it.  Note to self, when looking for detailed documentation of functionality in AMO, look under XMLA.  This kind of makes sense - I just would have expected it to be replicated in the AMO documentation.  Maybe it just takes a developer to know how to find it.  XMLA can also be used as a means of automating UBO but I find the AMO method to be a little more elegant.

    The methods used for leveraging query logs when designing aggregations are the same as those used in generic aggregation design.  The only difference is that a collection of queries is passed in addition to the other parameters.  There are two things to note, however: 

    First, the Dataset column that is read from the query log must be preceded by a weighting factor, usually the number of times the query appeared in the log. 

    Second, the queries string collection must only be passed the first time the method is executed.  If you pass this parameter in on subsequent calls you will get an error.  The DesignAggregations method is executed in a loop that is terminated when the size and/or optimization goals are met.  I'm not sure why it couldn't just ignore the queries parameter in subsequent calls, but it doesn't, so you need to code accordingly.

    Both of these nuances are documented in Books Online.

    Here's an AMO code snippet for adding an aggregation design based on query history.  Note that development is not among my primary skill sets, so you'll probably want to clean it up - but it works.  For the sake of the session demo, I hard-coded a measure group and created a brand new aggregation design.  Best practices indicate that you should add these new aggregations to the existing design so that you won't throw away aggregations that were leveraged by previously well-performing queries. In a real-world scenario, you'd also want to filter the query log so that you only design aggregations for queries that had a long Duration:

    Public Sub Main()
       ' Declarations
       ' Get Server and Database name from DTS connection object 
       Dim oSSASConnection As ConnectionManager = Dts.Connections("Analysis Services")
       Dim oSQLConnection As ConnectionManager = Dts.Connections("SQL Server")
       Dim sSSASServer As String = CStr(oSSASConnection.Properties("ServerName").GetValue(oSSASConnection))
       Dim oSSASServer As New Microsoft.AnalysisServices.Server
       ' SQL Connection
       Dim strSQLQuery As String
       Dim strConnection As String
       Dim sqlCn As SqlConnection
       Dim oServerProperty As ServerProperty
       Dim sqlDataAdapter1 As SqlDataAdapter
       Dim dsQueryLog As DataSet
       Dim dvQueryLog As DataView
       Dim dRow As DataRow
       ' Aggregation variables
       Dim Queries As New StringCollection
       Dim strAggPrefix As String
       Dim aggName As String
       Dim aggDesign As AggregationDesign
       Dim optimization As Double = 0
       Dim storage As Double = 0
       Dim aggCount As Long = 0
       Dim finished As Boolean = False
       Dim firsttime As Boolean = True
       ' Measure group variables
       Dim DatabaseName As String
       Dim CubeName As String
       Dim oMeasureGroup As MeasureGroup
       Dim MeasureGroupID As String
       Dim oPartition As Partition
       ' Initialize
       DatabaseName = "Adventure Works DW 2008"
       CubeName = "Adventure Works"
       MeasureGroupID = "Fact Sales Summary"
       strAggPrefix = "PASS2009_"
          ' Initialize connections
          oServerProperty = oSSASServer.ServerProperties("Log\QueryLog\QueryLogConnectionString")
          strConnection = oServerProperty.Value.Substring(oServerProperty.Value.IndexOf(";") + 1)
          sqlCn = New SqlConnection(strConnection)
          oMeasureGroup = oSSASServer.Databases(DatabaseName).Cubes(CubeName).MeasureGroups(MeasureGroupID)
          ' This would be a good place to update the EstimatedRows in the measure group and partitions
          ' Set oMeasureGroup.EstimatedRows = to the count of rows in the source fact table
          ' Get select queries from the query log
          strSQLQuery = "SELECT dataset, COUNT(*) FROM OLAPQueryLog WHERE MSOLAP_Database = '" & DatabaseName & "' " & _
              " AND MSOLAP_ObjectPath = 'ERIN-PC\SQL2008." & DatabaseName & "." & CubeName & "." & oMeasureGroup.ID & _
              "' GROUP BY dataset"
          sqlDataAdapter1 = New SqlDataAdapter(strSQLQuery, sqlCn)
          dsQueryLog = New DataSet
          sqlDataAdapter1.Fill(dsQueryLog, strSQLQuery)
          dvQueryLog = dsQueryLog.Tables(strSQLQuery).DefaultView
          ' Populate the Queries string collection with the distinct queries from the query log
          For Each dRow In dvQueryLog.Table.Rows
             Queries.Add(dRow(1).ToString & "," & dRow(0).ToString)
          ' Add a new design to the Fact Sales Summary measure group and design aggregations based on the passed list of queries
          aggName = strAggPrefix & "_" & oMeasureGroup.Name
          aggDesign = oMeasureGroup.AggregationDesigns.Add
          aggDesign.Name = aggName
          Do While ((Not finished) And (optimization < 100))
             If firsttime Then
                aggDesign.DesignAggregations(optimization, storage, aggCount, finished, Queries)
                firsttime = False
                aggDesign.DesignAggregations(optimization, storage, aggCount, finished)
             End If
          ' Assign the new aggregation design to all partitions in the measure group
          For Each oPartition In oMeasureGroup.Partitions
             oPartition.AggregationDesignID = aggDesign.ID
          ' Process the indexes to build the new aggregations
          ' oMeasureGroup.Process(Microsoft.AnalysisServices.ProcessType.ProcessIndexes)
          Dts.TaskResult = ScriptResults.Success
       Catch ex As Exception
          Dts.Events.FireError(0, "Design aggregations failed - ", ex.Message, "", 0)
          Dts.TaskResult = ScriptResults.Failure
       End Try
       If oSSASServer.Connected Then
       End If
    End Sub
    .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; }

  • Using SSAS 2008 tools with SSAS 2005 cubes

    It's been awhile since I've blogged.  Every time it seems I'll get some spare moments, something comes up to direct my attention elsewhere.  My current engagement keeps me busy and embedded in SQL Server 2005.  Though I've presented on some of the cool features of SQL Server 2008, I feel I have very little new insight to publish on that topic that hasn't already been covered on this site.  It occurs to me that I'm most readers are in the same position that I am.  Unless driven to SQL Server 2008 by particular features, I suspect many remain on 2005 (and, dare I say it, 2000).

    One of the many aspects of SQL Server 2008 that appeals to me is the cube designers in Business Intelligence Development Studio (BIDS).  For example, attribute relationships are a commonly misunderstood aspect of cube dimensions in SQL Server 2005/2008.  Later SQL Server 2005 service packs added some warnings when attribute relationships were not designed into dimension hierarchies, but these are easy to ignore and ARs are not well understood by many cube designers.  Why are attribute relationships so important?  That's something I may save for another blog, but I suspect it has already been blogged about elsewhere on this site.  My point is that BIDS in 2008 provides far more feedback on various cube/dimension design aspects, attribute relationship definitions being one.  I'm including a screen snapshot of the attribute relationship designer in 2008 BIDS to demonstrate my point.

    Attribute Relationship designer 2 

    As you can see, it's far more intuitive. 

    And for those courageous enough to take on the manual design of aggregations, the aggregation designer in 2008 is killer.  I'm not recommending this.  In fact, I use it more to view the aggregations that have been designed by the aggregation design wizard or usage-based optimizer.  There is a separate application that could be downloaded prior to 2008, but I found the interface to be difficult to work with.  Here's what the aggregation designer looks like in BIDS 2008:


    SQL Server 2008 Aggregation Designer


    What I learned from a Tech Ed presentation by Donald Farmer is that these tools can be used to manage a cube implemented in SQL Server 2005.  How cool is that?!  So, if your organization plans to stay on SQL Server 2005 on the server-side for a little while longer, it doesn't mean you can't take advantage of some of the great new client tools in SQL Server 2008 to make life a little easier in the meantime.

  • Analysis Services Performance Whitepaper - a little light reading

    The SQL Server 2008 version of the Analysis Services Performance Guide was published last month.  I've yet to read it - it's 80 pages (which is actually less than its 2005 predecessor).  I'm sure there's some overlap with that one but, since it has different authors, the information is presented differently.  And, of course, it has been updated for SQL Server 2008.  

  • Trials in reporting on cubes

    I'll admit that I haven't spent a lot of time in Reporting Services.  And it's been quite awhile since I've written a report against an Analysis Services cube - since SQL Server 2000, in fact.  I was kind of looking forward to using the MDX designer in Reporting Services 2005 but soon found it's limitations (which are well-documented).  I needed to implement drilldown on the Account hierarchy, which required pulling the preceding parent levels in a parent-child dimension.  This quickly moved me to the OLE DB data source type and out of the nifty query designer that is only available when using the Analysis Services type :-(  As such, I was back to implementing parameters by concatenating them in the query string - it was looking like SQL Server 2000 all over again.

    But that's not the jist of this blog entry.

    I got a nice looking report that drilled down a parent-child Account hierarchy on the rows and displayed various time periods on the columns.  Yes, it looked good, but it took, on average, 1 minute 37 seconds to refresh!  Every time the user would select a new job to display they'd have to wait over a minute and 1/2 for the report to come up.  Unacceptable!  What's worse is that the MDX was only taking 7 seconds to execute.  I can tune the MDX/cube but I had no idea what Reporting Services was doing that resulted in such a long render time.

    My first resort was to google on the problem.  I found very little information on the topic.  One useful blog by Chris Webb suggested that I remove the cell formatting options.  Alright, that was nice - it removed 15 seconds and my report now refreshed in 1 minute 23 seconds (average).  Still unacceptable.

    In thinking about the problem it seemed strange that RS required such a flat query that took an aggregated source, required you to unaggregate it, so that RS could re-aggregate it itself.  Yet all examples I could find put all measures (and only measures) on the columns axis, and all remaining dimension members on the remaining axes.  This returns a resultset that is long and narrow.  The report is short (relatively) and wide.  Why can't I make the query look just like the report?

    So that's what I tried.  I put the column header dimension members on the column axis.  Bingo - the report rendered in 15 seconds - now that's a significant improvement!

    Note that this solution will not fit the need of all reports.  In placing dimension members on the column axis I lose the ability to drill down or up that dimension - the column headers become static.  In my case, that's precisely what I wanted since that happened to be the design of the report.  It also is a little more tedious to manually enter the values associated with 10 static columns compared to the 1 column in the original report.  I felt this was a small price to pay for over 5x performance improvement.



    MDX Before (condensed)

    SELECT NON EMPTY {[Measures].[Value]} ON COLUMNS, 

      {([Time].[Fiscal Month].[Current Month], {[Scenario].[All Members].[Actual], [Scenario].[All Members].[Forecast]}),
        ([Time].[Fiscal Month].[ITD], {[Scenario].[All Members].[Actual], [Scenario].[All Members].[Forecast]}),
        ([Time].[Fiscal Month].[EAC], {[Scenario].[All Members].[EAC], [Scenario].[All Members].[Budget]})}
    ON ROWS,

    NON EMPTY  ([Version].[All Members].[Version].[0], [Job].[Job_Hierarchy].[Level 03].[1234],
       { [MeasureType].[All Members].[Units] * DESCENDANTS([Account].[Account Hierarchy].[Total Costs],, LEAVES),
         [MeasureType].[All Members].[Dollars] * {DESCENDANTS([Account].[Account Hierarchy].[Total Costs],, LEAVES), [Account].[Account Hierarchy].[Total Revenue]}}) ON PAGES

    FROM [Job Planning]

    MDX After

      {([Time].[Fiscal Month].[Current Month], {[Scenario].[All Members].[Actual], [Scenario].[All Members].[Forecast]}),
        ([Time].[Fiscal Month].[ITD], {[Scenario].[All Members].[Actual], [Scenario].[All Members].[Forecast]}),
        ([Time].[Fiscal Month].[EAC], {[Scenario].[All Members].[EAC], [Scenario].[All Members].[Budget]})}

      * {[Measures].[Value]} ON COLUMNS, 

    NON EMPTY  ([Version].[All Members].[Version].[0], [Job].[Job_Hierarchy].[Level 03].[1234],
       { [MeasureType].[All Members].[Units] * DESCENDANTS([Account].[Account Hierarchy].[Total Costs],, LEAVES),
         [MeasureType].[All Members].[Dollars] * {DESCENDANTS([Account].[Account Hierarchy].[Total Costs],, LEAVES), [Account].[Account Hierarchy].[Total Revenue]}}) ON ROWS

    FROM [Job Planning]

  • ETL World Record!

    I just heard about this from Len Wyatt, who's a Principle Program Manager on the SQL Server Performance team.  Microsoft and Unisys announced a record for loading into a relational database, using SSIS (SQL Server 2008) to load over 1 TB of data in under 30 minutes!  The previous known record-holder was Informatica, who reported loading 1 TB in 45 minutes.  Note that that this is not a TPC benchmark, since none exists for loading tools (yet).

    Len blogged about the details on the SQL Server Performance team blog, where he mentions that a whitepaper will follow with more information on how they did it.  I, for one, can't wait to read all about it!

  • Partitioning enhancements in SQL Server 2008

    Horizontal partitioning can be a great benefit to data warehouses, as it greatly facilitates the management of very large databases. Of course, querying on partitions can also have performance benefits, especially when only a small percentage of the database is queried and partition elimination can occur. SQL Server 2005, however, can have some unexpected behavior when querying few partitions.

    In SQL Server 2005, one thread is allocated per partition when multiple partitions are queried. These threads then work in parallel to retrieve the rows to satisfy the query. If only one partition is queried, however, SQL Server 2005 allocates as many threads as are available to the one partition. As you can quickly see, there is a vulnerability when it comes to queries that query very few, but more than 1 partition.

    The common scenario is when only 2 partitions are queried. For example, let’s say you have a data warehouse database that is partitioned on a sales date by month. You write a query to compare last month’s sales to this month’s sales. Now, let’s say this runs on a 32-processor server. How many threads get allocated to this query? The answer is 2, one per queried partition.

    This last scenario can become even more complicated when a window of time causes the query to fluctuate between 1 and multiple partitions. Let’s say we compare today’s sales to the same day of the week last week. If both weeks are in the same month, we get full thread allocation. If the weeks span two months (= 2 partitions), however, we only get 2 threads allocated and the query can appear to slow down considerably, for no apparent reason (at least, probably not to the issuer of the query).


    Note: All of these scenarios assume that the queries are able to eliminate partitions due to a direct search on the partitioning key. Be sure partition elimination is occurring in order to gain the best partitioned table query performance.

    SQL Server 2008 behavior changes the way in which threads are allocated. It allocates all available threads to each partition relevant to the query in a round robin fashion. This results in fast and consistent query execution, no matter how many partitions are queried.


    Note that this makes it all the more important to allocate partitions to filegroups that are spread across many disk spindles, allowing the query on a given partition to be as efficient as possible.

    But wait, there’s another improvement to partition query behavior in SQL Server 2008. SQL Server 2005 only allows lock escalation to the table level. Let’s say your query eliminates 75% of a table’s partitions, but scans all of the remaining partitions (1/4 of the table). If SQL Server decides to escalate the lock, all other queries will be locked out even if they are querying completely different partitions. SQL Server 2008 provides a table option to override this default behavior. Note that the default is still to escalate to table locks (at least for now) so this option will need to be changed to take advantage of partition-level lock escalation. Find out more about this, like I did, on über blogger Paul Randall’s post.

  • My Top 10 list for SQL Server 2008

    Long gone are the days when you can reasonably expect to know all aspects of SQL Server at a deep level.  When a new version starts to emerge, I try to put some scope around the features that I will dive more deeply into.  I'm publishing my list here so that like-minded SQL Server enthusiasts may become aware of a feature or two that was lost in that one-page Powerpoint slide you see in most of the presentations on SQL Server 2008.

    Note:  I put together this list during the summer as an roadmap for personal testing with the CTP releases.  In December 2007, a whitepaper, "An Introduction to New Data Warehouse Scalability Features in SQL Server 2008", was released that includes a short writeup on each of these features, as well as some additional improvements to SSAS, SSIS and SSRS.  If you are interested in the types of features I mention below, I highly recommend you check out this whitepaper. 

    My focus is on Business Intelligence and performance (particularly in terms of relational data warehouse performance).  With that said and in no particular order, here is my list of top 10 features  in SQL Server 2008, and why.  I plan to post more detail about several of these in the future:

    1. MERGE command - this is a new TSQL command that will allow you to combine an Insert with an Update command, sometimes referred to as an UPSERT.  This is particularly useful in loading a data warehouse.  You will no longer have to test for a row's existence in order to take one of two paths (INSERT or UPDATE, or even DELETE).
    2. Star Join - this could have a tremendous impact on queries in a relational data warehouse.  Data warehouse queries are characteristically performance hogs. Since a large percentage of data is usually selected, the query optimizer often cannot take advantage of indexes like it can with more selective OLTP queries.
    3. Change Data Capture (CDC) - this feature can be used in SQL Server 2008 data sources to automatically track changes in data that require a row to be re-sent to a data warehouse.  This makes ETL from a SQL Server 2008 data source far more efficient and straight-forward.
    4. INSERT INTO - no structural changes have been made to the statement, but minimal logging can be implement, much like BULK INSERT or SELECT..INTO, under the right conditions.
    5. Lookups in SSIS - though SSIS lookups were incredibly enhanced from DTS (where they were virtually unusable), the performance of this task has been tuned to improve performance and minimize resource utilization.  In lieu of blogging about this later, I'll refer to a blog post by Jamie Thomson who explains this feature in detail and far better than I could.
    6. Data compression - this is potentially huge!  First of all, compression is an option, so if if the CPU hit is an issue you can choose not to implement compression.  Data compression means more data on fewer pages, which has a domino effect on performance (more pages in memory, better memory utilization, and improved page life expectancy).
    7. Partitioning enhancements - there's an issue in SQL Server 2005 partitioning parallelism that affects queries on a few number of partitions but greater than 1.  If only one partition is queried, intra-partition parallelism is implemented effectively.  If greater than one, a single thread is used to process each partition, which results in under-parallelism with queries on just a few partitions.  This has been addressed in SQL Server 2008 (future post).
    8. Resource Governor - if you haven't seen the demo on this, you should.  You can implement resource governing rules that affect inflight queries.  One scenario I see for this is in environments where ETL occurs simultaneously with data warehouse queries, such as in a real-time environment.
    9. Backup compression - this will dramatically affect backup times for large data warehouses. This capability is currently available through third party vendors but some DBAs are currently unable to leverage these solutions due to company standards or budget constraints.
    10. Partition-aligned indexed views (IVs) - this allows for the use of IVs on fact tables.  Indexed views on partitioned tables is virtually unusable in SQL Server 2005 because of the requirement to drop and recreate dependent IVs whenever a fact table partition SWITCH is made.  I'll explain why I think this is such a big deal in a future post.
  • Installing SQL Server 2005 on Vista

    I sent this to our local user group when I installed Vista Ultimate on my new laptop a few months ago.  Yesterday, I did a quick search to see if this information was readily available and found that it still was not (at least not in one place), so I decided to post it here.  I hope it helps!

    ·       You must install SQL Server 2005 SP2.  I was able to successfully install RTM, then followed by SP2, but an attendee at our last user group meeting had no such luck.  One of our local Microsofties mentioned there is an SQL install that combines the core install of SQL with SP2.

    ·       OWC 11 component install fails.  This has a domino effect on the installation of most of the other SQL components.  You know, the important ones, like database services, SSIS, SSAS and SSRS.  The newsgroup solution was to uninstall OWC 11 through Add/Remove Programs.  This solution worked for me, as well.

    ·       IIS is required for Reporting Services.  Nothing new here, but the configuration of IIS was not as simple as I recall it on XP.  Be sure the following options are selected (Start --> Control Panel --> Programs and Features --> Turn Windows features on or off):

    o   Web management tools

    §  IIS 6 Management Compatibility

    ·         IIS 6 WMI Compatibility

    ·         IIS Metabase and IIS 6 configuration compatibility

    o   World Wide Web Services

    §  Application Development Features

    ·         ASP.NET

    ·         ISAPI Extensions

    ·         ISAPI Filters

    §  Common Http Features

    ·         Default Document

    ·         Directory Browsing

    ·         HTTP Redirection

    ·         Static Content

    §  Security

    ·         Windows Authentication

    ·       You must install Visual Studio SP1, then another patch for Vista.  I initially tried installing the patch for Vista first but it wanted SP1.  I was unable to run BIDS until I installed these.  Here are the download files (available for download at

    o   VS80sp1-KB926601-X86-ENU (VS SP1)

    o   VS80sp1-KB932232-X86-ENU (patch for Vista)



    There was mention of having to turn off UAC at one of our user group meetings.  I did not have this issue.  During the install, I was asked if I wanted to be added to the sysadmin role during the SQL SP2 setup, because Builtin\Administrators is not automatically added to sysadmin like it used to be.



  • Hi SQLBlog readers!

    I'm excited to be blogging with several of my SQL Server comrades.  My name is Erin Welker and SQL Server is my passion - I've been working with the product since version 1.11!  In the last few years I have been sharing my lessons learned through presenting and writing.  I've been honored to present at SQL PASS for the last 3 years and at Tech Ed 2007.  I was part of the Project REAL team and authored a whitepaper on partitioning as a result.  I also wrote 2 whitepapers on large relational data warehouses on SQL Server, one on managing and one on performance.  I'm currently working on my second article for SQL Server Magazine, which should appear in the March 2008 issue. 

    My focus is on Business Intelligence and performance, and that's what I intend to blog about most.  On my current engagement, I'm working with PerformancePoint Planning, so you may see that slip in here and there.

    I look forward to sharing information on this site and hope to learn from other bloggers and readers, as well.

    I hope you enjoy!


Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement