THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Surrogate key issues with Analysis Services

Usually I don't post something just to raise a question - but every rule has its exceptions.

As you know, using surrogate keys is a best practice for a lot of reasons. Everything works fine with SSAS when you use surrogate keys with a plain star schema. If you have an incremental process of the dimensions in the Data Warehouse, you are also granted that surrogate keys don't change their internal meaning during time. But, are you sure you want to use surrogate keys in end user reporting tool?

The problem arises when you have a daily reprocess of the whole OLAP database and of the whole relational data mart (or data warehouse, if you prefer). A complete reprocess is not so strange if you have less than 10 GB of data. I've seen the same technique applied to data mart with tens of millions of rows in the fact table and 20/30 GB of data (the process can be 3/4 hours each night).

Each dimension that makes the granularity attribute visible to end users, also exposes the surrogate key. Even if you change the Name property of the attribute, its Key still maintains the surrogate key. When the end user creates a pivot table with Excel 2007 or a report with ProClarity, he never see the surrogate key, but he save a document that contains that key and that will use this key to query the cube next time.

The user might open a query saved some days/months ago and... He doesn't see the same members, just because the same surrogate keys got a different meaning. If surrogate key are generated each night, this happens daily. Even with an incremental data mart you might have the need to reprocess (on the relational side) one or more dimensions in the future.

Now, the issue is both practical and philosophical. From the practical point of view, the user rants he lost his reports. From the philosophicalside, when you save a surrogate key in a report, you are giving a semantic value to something that probably shouldn't have one.

With SSAS and pivot tables this is a practical issue that may be approached in several ways. I'm studying what is the better balance between usability, maintainability and performance. But I'd like to get some comments about this issue that probably is very common. I did some search, but I haven't found a good discussion focused on the issues related to Analysis Services client tools.

Give me your feedback!

Published Tuesday, July 03, 2007 6:46 PM by Marco Russo (SQLBI)

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Sqlgoof said:

Well i normally choose to eliminate unwanted attributes and this includes the SK. So for the productname atttribute for exmaple i'll use the SK as key and productname as name, then delete the SK as attribute. So in Excel you'll still have the key but only see the productname, i don't see how one will get away from that....

July 12, 2007 4:03 AM
 

Marco Russo (SQLBI) said:

The idea is to put:

SK as Attribute Key (invisible)

ProductName as Attribute (with an application-key as Attribute Key and ProductName as Attribute Description)

Marco

July 14, 2007 8:29 AM
 

SQLBI - Marco Russo said:

BI Developer Studio does a lot of automatic changes when you change something. Each dimension is stored

July 16, 2007 8:58 AM
 

AlbertoFerrari said:

One more problem raises when you have a parent/child hierarchy. In this case hiding the SK and leaving only the natural key as visible is not a viable solution because you need to use the primary key of the dimension to let the dimension relate to itself with the parent/child relationship.

So, for P/C hierarchies I had to use the natural key to make the relation from both the fact table and the hierarchy in itself.

Up to now, I haven't found a better solution with parent/child hierarchies, if anyone has a better idea she's welcome.

September 10, 2007 1:56 PM
 

Tom said:

I just ran into the same issue. A customer opens a previously created report and expects to see the same filtered dimension members on the rows. Instead other dimension members are displayed. The dimension filtered is not slowly changing but still some incidental changes were done. SK's are created in SSIS using a script so the SK' are added based on the order the records go through the package. This orders must have been changed. I do not really see a technical solution since the SK is required.

July 23, 2009 8:34 AM
 

Marco Russo (SQLBI) said:

Tom,

look at my previous comment - we use SK in a hidden attribute and put the application key as the most granular visible attribute.

Marco

July 23, 2009 8:57 AM
 

Tom said:

Marco,

Hiding the SK and using another attribute for end users is a good idea which I always use. The problem however is that it's a parent -child dimension. As AlbertoFerrari mentions in a previous post parent -child dimensions require a single key. In my case this was the (automatically generated) SK. I now have to combe two natural keys into a new SK. Probably less efficient in joins (both varchars...) but I can not think of another way.

By the way i have considered using CHECKSUM om the new SK but this pootentially generates non unique values.

July 28, 2009 11:00 AM
 

Lena said:

Hi Marco,

I have a requirement to make Excel Pivot tables always remember the members saved. So I need to make sure that surrogate keys never get visible to users of the cube at any level, and never get saved inside Excel reports. How do I do that?

You are giving a solution here: "we use SK in a hidden attribute and put the application key as the most granular visible attribute."

Questions:

1)What attribute do you set with usage=key (the SK or another one)?

2)do we have to have SK as an attributes at all, even hidden? Can we just not use SK as Dimension attribute completely (that would make sure users of cube will never see it)

I am using SSAS 2008

June 10, 2010 4:05 PM
 

Lena said:

Sorry, I forgot to say Hi and Thanks in advance.

I can't seem to find an information about this issue anywhere.

Your post seems to be the only place addressing this issue.

June 10, 2010 4:08 PM
 

Marco Russo said:

Lena,

1) you have to use the SK with attribute usage = key

2) you have to keep SK because you need it to join the fact table in a fast way (just hidden it is enough to make sure end users will never see it)

Marco

June 10, 2010 4:31 PM
 

Lena said:

Thanks a lot, Marco!

June 11, 2010 12:03 PM
 

Stanislav said:

Marco, i don't get your answer

you suggesting exposing natural key. that does not help.

suppose that SCD2 change occur and you have now two records with two unique keys.

when user referencing natural key excel is going to randomly pick-up

of the two records with matching natural key. that a mistake. exp this problem shows up when user wants to see flat list of leaves over time using excel functions. excel pivot tables do work correctly and can be refreshed if SCD2 change occurs. the issue is that excel cube functions are not able to address members by natural key because that that member that changed naturally forms a set which should be plugged into query. that set probably should be session scoped by excel  cube functions.

March 4, 2011 4:58 PM
 

Marco Russo (SQLBI) said:

In case of an SCD2 the natural key should be a unique key composed by the application key plus another field (version number, date, ...), depending on the modeling of your dimension.

Am I missing something?

Marco

March 4, 2011 5:03 PM
 

Stanislav said:

Excel cube functions, that is

SCD2 change can occur in flat or parent-child dimension

March 4, 2011 5:11 PM
 

Marco Russo said:

Stanislav, sorry but I don't get your point.

Marco

March 4, 2011 5:22 PM
 

Stanislav said:

Marco, your answer is correct but sythnetic key is already playing this role of unique identifier. imagine the following situation

you have a parent-child hierarchy with the following three fields(key,parentkey,businesscode) and it changes over time as SCD2 structure.

now a user wants a flat list of both leaf-level and branch codes fields over time. let's say a node called liabilities moved in the midle of the year to some other parent. now due to scd2 logic liabilities will be present in the table twice. first structure before the change and then structure after the change. when we do flat list of branches and leafes we don't want liaibility to appear twice, instead liability should become a set comprising the node before change and after the change. this set is non-overlapping in time and sums up beutifully. this the functionality excel and msas are missing.

March 7, 2011 1:50 PM
 

Marco Russo (SQLBI) said:

Stanislav,

I get your point now.

You are assuming that, building an SCD2, the surrogate key is build just once and it is the only point where the change is detected and stored. This might be true in certain implementations, but it is not true if you reprocess (=rebuild) the dimension table from scratch, rebuilding all the changes occurred in a period. At that point, you rebuild surrogate keys and, depending on the way the ETL is implemented, you are not granted to rebuild the same surrogate key for the same element.

In certain models, like the SQLBI Methodology, you have the data warehouse that is not based on surrogate keys and the SCD2 in Kimball methodology can be rebuilt in any moment, but without the grant of having the same surrogate keys for the same entities in two different reprocess operations.

Marco

March 7, 2011 2:15 PM
 

Tiago BI said:

Hi,

In order to overcome the original problem described by Marco and to be consistent to the AS2000 way, we changed the "MemberUniqueNameStyle" property of the dimensions from "Native" to NamePath.

Regards,

Tiago.

April 18, 2011 6:00 AM
 

Marco Russo (SQLBI) said:

Thiag, thanks for your comment - please can you be specific about when you made this change? In Denali?

Is this going to change the default in existing projects?

April 18, 2011 5:41 PM
 

Sohrab said:

Marco,

I found your post while investigating the issue of dealing with SKs, reprocessing whole dimensions, having ETL assign the SK values non-deterministically, and not having client OLAP applications store key values so that they don't fail if/when dimensions are rebuilt and SK values change.

MemberUniqueNameStyle is available in at least MSAS 2005, though it took me a while to find it myself:

In BIDS, with the cube open, on the cube structure tab, left click on a dimension in the bottom-left window where the cube dimensions are listed, then view the properties of the selected dimension.  The properties window will now list, among other things, the MemberUniqueNameStyle property.

There is a screenshot of this accessible via a Google Images search for MemberUniqueNameStyle.

Best regards,

Sohrab

November 29, 2011 9:41 PM

Leave a Comment

(required) 
(required) 
Submit

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement