
Alberto Ferrari

  • New Whitepaper from SQLBI: Vertipaq vs ColumnStore

    At the end of June 2012, I was in Amsterdam to present some sessions at TechEd Europe 2012 and, while preparing the material for the demos (yes, the best demos are the ones I prepare at the last minute), I decided to compare the two xVelocity implementations in SQL Server 2012: the VertiPaq engine in SSAS Tabular and the ColumnStore index in SQL Server. After some trials, I concluded that ColumnStore was a clear loser, because I could not see any real performance improvement for my test queries, which involved complex calculations and many-to-many relationships.

    But, hey, I am not a SQL expert, I work mostly on the BI side: might it be that I simply did not know how to write a query that takes advantage of ColumnStore?

    Being in a place full of SQL gurus from Microsoft and MVPs, I decided to ask for help. Hugo Kornelis was there and I had a chance to talk with him about ColumnStore performance. It turned out that he knew exactly how to improve the performance of my queries and I was amazed to see that, after his intervention on the SQL code, the two engines ran at comparable speeds. Moreover, sometimes ColumnStore was faster than VertiPaq. I decided to investigate the topic further. I did not have time to prepare the material for the demos but, at that point, a whitepaper was born.

    In the end, I discovered (many thanks, Hugo!) that the first implementation of ColumnStore indexes in SQL 2012 is not perfect and hopefully will be improved in the future. There are several limitations that reduce the effectiveness of ColumnStore indexes, as described in the following post: http://msdn.microsoft.com/en-us/library/gg492088.aspx.

    If you want to join me on this amazing trip through the performance analysis of two incredible query engines, take a look at the “Vertipaq vs ColumnStore” whitepaper on sqlbi.com and send me all the comments you have; I always enjoy reading your thoughts and I learn a lot by comparing notes with you.

  • DAX in Action

    I do not normally blog about my activity at conferences, courses and workshops, not because I don’t like these “marketing” topics, but only because I forget to do it. That said, sometimes I do some really exciting activities and, well, I need to share them with my readers, just for the pleasure of doing it.

    You probably already know www.projectbotticelli.com: it is probably the best place to find informative videos about our beloved Business Intelligence topics. Starting from August 2012, SQLBI is working with Project Botticelli to produce videos about DAX and the new SQL Server Analysis Services Tabular model.

    The first video, a 20-minute teaser on how to compute new and returning customers in DAX, is out right now. You can find it here: DAX in Action. Preparing the video was a wonderful experience: I learned a lot and really enjoyed being in front of a camera. We are already working on other great videos on the DAX language, so stay tuned!
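    For the curious, here is a minimal sketch of one common way to count new customers (the video’s actual approach may differ, and the Sales and Date table names are assumptions): a customer is new in a period if there are no sales for him before the first date of that period.

    NewCustomers :=
    COUNTROWS (
        FILTER (
            VALUES (Sales[CustomerKey]),
            CALCULATE (
                COUNTROWS (Sales),    -- sales of the current customer...
                FILTER (ALL ('Date'), 'Date'[Date] < MIN ('Date'[Date]))    -- ...before the current period
            ) = 0
        )
    )

    Returning customers follow the same pattern, replacing the final = 0 with > 0.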

  • Clever Hierarchy Handling in DAX

    Hierarchy handling in DAX is not very easy, because hierarchies, unlike in MDX, are not first-class citizens in the DAX world. While hierarchies can easily be defined in the data model, there is no DAX function that lets you access, for example, the parent of the CurrentMember. Well, to tell the truth, there is no concept of CurrentMember in DAX either. That said, there are several scenarios where it is useful to handle hierarchies in DAX and, in this post, I am going to show some techniques to make them work in a clever way, starting from the basics and working up to a good level of hierarchy understanding in DAX.
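    For example, the new PATH functions of Tabular let you build a calculated column that retrieves the name of the parent of the current row, which is the closest DAX gets to asking for the parent of the CurrentMember. This is just a minimal sketch, assuming a parent/child table called Nodes with NodeId, ParentNodeId and Name columns (hypothetical names):

    -- PATH returns the chain of ancestors of the current row;
    -- the item before the last one is the parent (root rows get BLANK)
    ParentName =
    LOOKUPVALUE (
        Nodes[Name],
        Nodes[NodeId],
        PATHITEM (
            PATH (Nodes[NodeId], Nodes[ParentNodeId]),
            PATHLENGTH (PATH (Nodes[NodeId], Nodes[ParentNodeId])) - 1,
            INTEGER
        )
    )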

    Read more at http://www.sqlbi.com/articles/clever-hierarchy-handling-in-dax/.

  • Speaking at Lisbon NetPonto UG on 02/13/2012

    As you might (or might not) know, I travel a lot for work and, when possible, I always try to speak at local user groups. It is fun and I always learn a lot by speaking with other professionals like me.

    Being in Lisbon next week, I will deliver a speech at Microsoft Portugal with the NetPonto user group: a couple of hours of DAX, Vertipaq and many-to-many relationships. Here is the link to the event; free registration is required: http://netponto-lisboa-fevereiro-2012-powerpivot.eventbrite.com/

    If you happen to be there, I’ll be really happy to meet you.

  • OT: A Good Reason to Attend #SQLBits

    My 4-year-old son keeps asking me why I have to leave home so often to attend SQL conferences around the world and today, after a couple of days of snow, he discovered a good reason, along with the best place for the SQLBits scarf. :-)

    [Photo: the SQLBits scarf put to good use in the snow]

  • Circular Dependencies in DAX

    What is a circular dependency? If A depends on B and B depends on A, you have a circular dependency. As programmers, we know that circular dependencies must be avoided to obtain a working system. All this looks fine but, when authoring a simple DAX formula containing CALCULATE, SSDT (formerly known as BIDS, sometimes called Visual Studio… I love Microsoft’s attitude to finding new names) warned me about circular dependency errors and, believe me, there seemed to be no loops inside my formulas.

    After some fighting with the tool, it turned out that any time you use CALCULATE in the formula of a calculated column you are in danger of a circular dependency; the easiest way to solve the issue is to add a key column to the table. Understanding why the problem exists is not trivial and requires reading the article I have published on the SQLBI website here: http://www.sqlbi.com/articles/understanding-circular-dependencies/.
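    A minimal sketch of the scenario (table and column names are hypothetical): two calculated columns on the same table, each containing a CALCULATE, are enough to trigger the error.

    -- two calculated columns defined on the same Sales table
    Column1 = CALCULATE (SUM (Sales[Amount]))
    Column2 = CALCULATE (SUM (Sales[Quantity]))

    -- In a calculated column, CALCULATE performs a context transition that filters
    -- by all the columns of the current row, including the other calculated column,
    -- so each column ends up depending on the other one: a circular dependency.
    -- A key column gives the engine a way to identify the row by the key alone,
    -- breaking the loop.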

    Clearly the scenario is the same for PowerPivot 2012 and for Tabular in SSAS 2012.

    Happy reading! And, as always, comments are very welcome.

    BTW: Happy new year to everybody! In case you forgot, THIS is the year of SQL2012. :-)

  • SUM or SUMX? With Simple Intermediate Calculations SUMX Is The Way To Go

    Data modeling in Tabular is different from Multidimensional. I spend a lot of time teaching people that they need to change their mindset when approaching the Vertipaq engine, because it is a different tool with different capabilities. Nevertheless, I still have a lot to learn on this topic, as I am going to show in the next paragraphs.

    One customer of mine has a 4-billion-row table containing some numbers he wants to work with. Among those, two columns contain a value and a weight and, when aggregating the value, it should be weighted by the weight. This is a very common pattern: in banking, for example, you might have a rate and an amount and, to get the mean rate, you need to weight the detail rates by the amount. In my specific scenario, the value represents the age of a customer, ranging from 0 to 100, while the weight is an integer ranging from 1 to 60M. Take note of the ranges, they will come in handy later.

    In such a scenario, I am used to storing the result of Age * Weight in a fact table column called AgeMultipliedByWeight, so that the aggregate can be computed as SUM (AgeMultipliedByWeight) / SUM (Weight). Following this pattern, I developed the solution and performance was pretty good. Nothing exceptional, but it worked fine. The reason to store the intermediate calculation is to use SUM instead of SUMX, getting better performance, as I normally teach during my classes.
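    In DAX, the stored-column pattern looks like the following sketch (Facts is a hypothetical name for the 4-billion-row table):

    -- requires the AgeMultipliedByWeight column to be stored in the fact table
    WeightedAge := SUM (Facts[AgeMultipliedByWeight]) / SUM (Facts[Weight])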

    Then, during PASS 2011, I had the chance to discuss this model with the dev team and it was very funny to get their feedback, which sounded like: “why did you do something that stupid? You don’t need it, just compute the values at runtime using SUMX; it is much faster and consumes a lot less memory”. Learning why this pattern is wrong in Tabular is definitely worth spending some time on, because in Multidimensional it is the right pattern. So, let us think in DAX and try to figure out what’s wrong with storing intermediate calculations.

    First of all, we already know that the memory usage of Vertipaq depends on the number of distinct values of a column. It turns out that, while the age has only 100 distinct values, AgeMultipliedByWeight has 600 million distinct values. The difference is huge and, in fact, the AgeMultipliedByWeight column was using a lot of memory (9GB in my scenario with 4 billion rows). First lesson learned: increasing the number of distinct values of a column is as easy as performing some computation on it, and a lot of precious memory is wasted if you don’t pay attention to that.

    The second and most important consideration is that the DAX Formula Engine is capable of pushing simple calculations down to the Vertipaq Engine, which runs in a multithreaded environment (while the Formula Engine is still single-threaded). Thus, if a SUMX contains a simple calculation (i.e. simple math with no complex formulas), the calculation is pushed down to Vertipaq and executed in parallel, directly at the leaf level. There is still a lot of work to be done to understand exactly what “simple calculations” means, but I will write about that in future posts, as soon as I discover more about what can and cannot be pushed down to Vertipaq.

    Back home, I had the chance to try this on my server and, as expected, performance is now much better. By changing the formula from SUM to SUMX, avoiding the intermediate calculated column, I saved 9GB of RAM, and the test query, which was running in 13 seconds, now executes in 3 seconds, i.e. about 4 times faster, scaling on all the available cores.
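    For reference, this is the shape of the change (same hypothetical Facts table as above): the intermediate column disappears and the product is computed inside SUMX, where the simple multiplication can be pushed down to Vertipaq.

    -- the product is evaluated at the leaf level, inside the storage engine
    WeightedAge := SUMX (Facts, Facts[Age] * Facts[Weight]) / SUM (Facts[Weight])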

    Thus, the first lesson here is always the same: thinking in DAX is not easy; we need to let go of many of the concepts we learned with UDM/MDX and change the way we approach problems. The results, when following this practice, are awesome. There is another lesson, probably the most important one: speaking with people at conferences lets you learn a lot, much more than you will ever learn alone in the office or at home.

    For those of you coming to SQL PASS Nordic… I am eager to speak with you too. See you there on 8-9 November.

  • PASS Summit 2011: good time for a chat on DAX?

    This year Marco Russo and I will host a Birds of a Feather lunch about “DAX, Vertipaq and BISM Tabular” during the PASS Summit in Seattle.

    If you want to spend some time speaking about DAX and the new Vertipaq engine, or just have a good chat in front of some food to give a face to a name, just come over and sit down. The BOF lunch will be on Friday and there is a bunch of tables to choose from, as you can see in the complete list. Social networking is one of the main reasons to go to a conference, so don’t be shy, come and meet us in person: we will be really happy to share some time with you and provide a demonstration of true “Spaghetti English”.

  • Creating a copy of a BISM Tabular project

    When using Visual Studio to author SSAS Multidimensional projects, I used to copy the project, deploy it with another name and then proceed with updates until I was satisfied with the final result. Then, working with the two projects side by side, I could check performance and numbers.

    I tried the same technique with Visual Studio 2010 and a Tabular model in Denali CTP3. I created a copy of an existing project but, as soon as I opened it, I got the error “This model is already open in another location by the same user, please close the other project before working on this one”:

    [Screenshot: the error message]

    The reason for the error is that, by copying the project folder, I copied the user settings file too, which contains the workspace database definition. The workspace is still in use by the previous instance of the project (which is still open, due to the way I work), and a workspace cannot be used by more than one active project. For more information about the user settings file and the behavior when opening a BIM project in Denali, refer to this very comprehensive post by Cathy Dumas.

    The solution to this problem is straightforward (once you learn it; thanks Cathy for explaining it!): just locate and delete the user settings file (which, by the way, is hidden, so you will need to show hidden files before searching for it). As soon as Visual Studio opens a project with no user settings file, it creates a new workspace, detaching the copied project from the original one, and I can work with the two projects side by side.

    Tabular projects work in a different way from Multidimensional ones; we just need to learn new tricks, as always. :-)

  • KEEPFILTERS: a new DAX feature to correctly compute over arbitrary shaped sets

    Having read this question on the MSDN blogs, I investigated the KEEPFILTERS function and, after having learned how it works, it is now time to write about it. Before starting, I need to thank the SSAS dev team and Marco Russo, who helped me understand this complex topic.

    This blog post is not an easy one, so let me start with some conclusions, so that you understand why you need to read the post to the end and digest its content.

    KEEPFILTERS:

    • Is a new feature in the Denali version of DAX
    • Is very useful, I would say necessary
    • Solves a problem that is very common and very difficult to address
    • Is complex. No, it is very complex. I think Rob will need to update his spicy scale of functions to make room for KEEPFILTERS
    • If you don’t use and understand it, you will run into major problems with your formulas, and debugging the wrong results will turn into a nightmare

    These are the final considerations. Now, if you want to discover why these facts hold, roll up your sleeves and join me on a journey into the land of filter contexts.

    We all know that filter contexts are the foundation of any DAX calculation. The CALCULATE function is used to create, alter and update filter contexts, and to build any complex formula in DAX. We also know that iterators like FILTER, SUMX and AVERAGEX create a row context, which gets translated into a filter context when a measure is used as the expression to iterate.

    For example, to compute the average of yearly sales, we can define a couple of measures:

    TotalSales := SUM (FactInternetSales[SalesAmount])
    AvgYear := AVERAGEX (VALUES (DimTime[CalendarYear]), [TotalSales])

    The measure [TotalSales] inside AVERAGEX is called in a filter context that filters a single year. The resulting values are then averaged to return the correct result. All this is well known.

    For educational purposes, we are not going to use AVERAGEX but a modified version of the formula which, although useless by itself, makes the concepts clearer. We simply substitute AVERAGEX with SUMX:

    TotalSales := SUM (FactInternetSales[SalesAmount])
    SumYear := SUMX (VALUES (DimTime[CalendarYear]), [TotalSales])

    Thus, SumYear is equivalent to SUM (FactInternetSales[SalesAmount]). Or… it should be… as we are going to see in a few minutes, something weird happens when we evaluate this formula over a user-defined hierarchy.

    In the following figure, I have put the calendar hierarchy on the rows and TotalSales and SumYear on the columns; the result is straightforward: the two measures are identical.

    [Figure: PivotTable where TotalSales and SumYear return identical values]

    Now, I can filter the time and decide that I want to see only July and August for 2001 and September and October for 2002. This can easily be accomplished by filtering the hierarchy but, this time, the result is much more interesting:

    [Figure: PivotTable over the filtered hierarchy, with wrong SumYear values at the year level and grand total]

    If you look at the highlighted cells, you should have the strong feeling that something is going wrong. Where do these strange numbers come from? At the month level, everything is fine. At the year level and at the grand total, the values shown make no sense at all. To understand what is going wrong here we need, as always, to dive into the different filter contexts that exist during the steps of the computation.

    First of all, we have filtered some months for 2001 and some other months for 2002, creating a complex filter that looks like this:

       (DimTime[CalendarYear] = 2001 && (DimTime[MonthName] = "July" || DimTime[MonthName] = "August"))
    || (DimTime[CalendarYear] = 2002 && (DimTime[MonthName] = "September" || DimTime[MonthName] = "October"))

    This filter contains two columns, MonthName and CalendarYear, and the resulting condition mixes both of them, establishing a relationship between the two columns. Please read this sentence twice and keep it in mind: this filter contains two columns.

    Now, what happens when SumYear gets evaluated? This is the formula:

    SumYear := SUMX (VALUES (DimTime[CalendarYear]), [TotalSales])

    SUMX iterates over the values of CalendarYear and, for each value, computes TotalSales after having transformed the row context over CalendarYear into a filter context. Thus, if we unroll the iteration, the formula is equivalent to:

    CALCULATE ([TotalSales], DimTime[CalendarYear] = 2001) +
    CALCULATE ([TotalSales], DimTime[CalendarYear] = 2002)

    We have two CALCULATEs that set a filter on the year. Now it is useful to remember that, when a column gets filtered inside CALCULATE, the new filter overrides any existing filter on the same column. Do we already have a filter on CalendarYear? Yes, we do, because the filter context imposed by the hierarchy was filtering both CalendarYear and MonthName. Thus, the engine removes the filter on CalendarYear from that condition and then applies the new filter.

    What happens to our original filter if we remove all the references to CalendarYear? It becomes:

       ((DimTime[MonthName] = "July" || DimTime[MonthName] = "August"))
    || ((DimTime[MonthName] = "September" || DimTime[MonthName] = "October"))

    You can see that, after removing CalendarYear, it has become a completely different filter, which says: “any month from July to October is fine”. We then add the new filter on CalendarYear, and the resulting filter, under which [TotalSales] gets computed, is:

       DimTime[CalendarYear] = 2001
    && (DimTime[MonthName] = "July" || DimTime[MonthName] = "August" || DimTime[MonthName] = "September" || DimTime[MonthName] = "October")

    In other words, the cell for the year 2001 takes into account all the months from July to October and, clearly, the same happens for 2002. The final result is completely wrong because the original filter is lost. This is the reason the numbers are wrong.

    Luckily, the SSAS dev team addressed this problem in advance and gave us the magic KEEPFILTERS function. What KEEPFILTERS does is modify the semantics of CALCULATE so that the new filter does not replace any existing filter on the same column, but is merged with the previous filters using AND.
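    Unrolling the iteration as we did before, the KEEPFILTERS version behaves like the following sketch, where each new filter on the year is merged with the outer filter instead of replacing it:

    CALCULATE ([TotalSales], KEEPFILTERS (DimTime[CalendarYear] = 2001)) +
    CALCULATE ([TotalSales], KEEPFILTERS (DimTime[CalendarYear] = 2002))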

    If we rewrite our SumYear definition in this way:

    SumYearWithKeepFilters := SUMX (KEEPFILTERS (VALUES (DimTime[CalendarYear])), [TotalSales])

    We are asking DAX to take the values of DimTime[CalendarYear] but to keep any existing filter on that column in place, without removing it. With this definition, the filter context under which [TotalSales] gets evaluated is:

       DimTime[CalendarYear] = 2001
    && (   (DimTime[CalendarYear] = 2001 && (DimTime[MonthName] = "July" || DimTime[MonthName] = "August"))
        || (DimTime[CalendarYear] = 2002 && (DimTime[MonthName] = "September" || DimTime[MonthName] = "October")))

    And, this time, the formula returns the correct result, because the original filter context is preserved and, with it, the relationship between months and years. You can see this in the following figure:

    [Figure: PivotTable where SumYearWithKeepFilters returns the correct values]

    Now that we have understood the issue with SUMX, it is easy to see that the same problem happens with AVERAGEX, where it is even harder to detect because the numbers are not so easy to check.

    Before coming to a conclusion, I would like to spend a few more words on the topic, using the DAX EVALUATE statement to show how the same scenario can easily happen (and be verified) in a DAX query. The same model, deployed on an SSAS server in Vertipaq mode, can be queried with DAX and, to simulate the complex condition, we use CALCULATETABLE and put a filter on a CROSSJOIN. Take a look at this DAX query:

    DEFINE
        MEASURE FactInternetSales[SumYearNoKeep] = SUMX (VALUES (DimTime[CalendarYear]), [TotalSales])
        MEASURE FactInternetSales[SumYearKeep] = SUMX (KEEPFILTERS (VALUES (DimTime[CalendarYear])), [TotalSales])
    EVALUATE
        CALCULATETABLE (
            SUMMARIZE (
                CROSSJOIN (
                    VALUES (DimTime[CalendarYear]),
                    VALUES (DimTime[MonthName])
                ),
                ROLLUP (
                    DimTime[CalendarYear],
                    DimTime[MonthName]
                ),
                "SumYeaNoKeep", [SumYearNoKeep],
                "SumYeaKeep", [SumYearKeep]
            ),
            FILTER (
                CROSSJOIN (
                    VALUES (DimTime[CalendarYear]),
                    VALUES (DimTime[MonthName])
                ),
                (DimTime[CalendarYear] = 2001 && (DimTime[MonthName] = "July" || DimTime[MonthName] = "August"))
             || (DimTime[CalendarYear] = 2002 && (DimTime[MonthName] = "September" || DimTime[MonthName] = "October"))
            )
        )

    This query returns both [SumYearNoKeep] and [SumYearKeep] and, strangely, both return the same wrong value. This is because the damage of destroying the original filter context has already been done by the CROSSJOIN inside SUMMARIZE, which does not take the previous filters into account.

    If we add KEEPFILTERS to the CROSSJOIN inside SUMMARIZE, the formula will be different:

    DEFINE
        MEASURE FactInternetSales[SumYearNoKeep] = SUMX (VALUES (DimTime[CalendarYear]), [TotalSales])
        MEASURE FactInternetSales[SumYearKeep] = SUMX (KEEPFILTERS (VALUES (DimTime[CalendarYear])), [TotalSales])
    EVALUATE
        CALCULATETABLE (
            SUMMARIZE (
                KEEPFILTERS (
                    CROSSJOIN (
                        VALUES (DimTime[CalendarYear]),
                        VALUES (DimTime[MonthName])
                    )
                ),
                ROLLUP (
                    DimTime[CalendarYear],
                    DimTime[MonthName]
                ),
                "SumYeaNoKeep", [SumYearNoKeep],
                "SumYeaKeep", [SumYearKeep]
            ),
            FILTER (
                CROSSJOIN (
                    VALUES (DimTime[CalendarYear]),
                    VALUES (DimTime[MonthName])
                ),
                (DimTime[CalendarYear] = 2001 && (DimTime[MonthName] = "July" || DimTime[MonthName] = "August"))
             || (DimTime[CalendarYear] = 2002 && (DimTime[MonthName] = "September" || DimTime[MonthName] = "October"))
            )
        )

    And, this time, the result is the correct one. Take your time to understand these two queries well; they are not easy but, hopefully, they will let you understand how KEEPFILTERS works and why it is needed.

    Moreover, for the brave reader who still wants to go deeper into the topic, it is worth studying this variation of the same query, where I have removed the FILTER on the CROSSJOIN and replaced it with a classical FILTER on the full table.

    DEFINE
        MEASURE FactInternetSales[SumYearNoKeep] = SUMX (VALUES (DimTime[CalendarYear]), [TotalSales])
        MEASURE FactInternetSales[SumYearKeep] = SUMX (KEEPFILTERS (VALUES (DimTime[CalendarYear])), [TotalSales])
    EVALUATE
        CALCULATETABLE (
            SUMMARIZE (
                CROSSJOIN (
                    VALUES (DimTime[CalendarYear]),
                    VALUES (DimTime[MonthName])
                ),
                ROLLUP (
                    DimTime[CalendarYear],
                    DimTime[MonthName]
                ),
                "SumYeaNoKeep", [SumYearNoKeep],
                "SumYeaKeep", [SumYearKeep]
            ),
            FILTER (
                DimTime,
                (DimTime[CalendarYear] = 2001 && (DimTime[MonthName] = "July" || DimTime[MonthName] = "August"))
             || (DimTime[CalendarYear] = 2002 && (DimTime[MonthName] = "September" || DimTime[MonthName] = "October"))
            )
        )

    This time the filter is not on the CROSSJOIN of the two columns but on the full table, resulting in a table filter context. This query returns the correct value for both [SumYearNoKeep] and [SumYearKeep], because the table filter context imposes its filter on the individual rows of the table, and the new filter on CalendarYear does not touch the filter on the table. As always in DAX, there is a strong difference between a table filter context and a column one. In Denali we can create a new kind of filter context that contains many columns from the same table and, for that kind of filter context, KEEPFILTERS might be necessary to avoid inconsistent results in your formulas.

    Moreover, even if it seems that KEEPFILTERS can be avoided by using table filters, keep in mind that this example uses only one table. If you want to filter a CROSSJOIN that spans more than one table, it will not be easy to create table filter contexts on that structure. Thus, KEEPFILTERS is much more convenient, because it works in any scenario you will encounter.

  • Deep dive day on #DAX at #SQLBits

    On September 29th, Marco Russo and I will deliver a full day of deep dive into the DAX programming language, during the training days of SQLBits 9. Later on, during the conference, I will deliver a session about many-to-many relationships in DAX, exploiting the power of the Tabular data model to solve complex problems with many-to-many relationships. Mine is just one among several great sessions delivered by many of the most talented SQL professionals from all around the world.

    There are still a few seats available for the training day on DAX. Thus, if you are interested, hurry up and register; I am eager to meet you in Liverpool.

    You can find full information here: http://www.sqlbits.com/information/Event9/DAX_Deep_Dive/TrainingDetails.aspx.

    See you there

  • #PowerPivot Webinar, August 17th at 11 AM CET

    Tomorrow, at 11 AM CET, I will deliver a webinar on how to produce interesting reports with PowerPivot, a webcast that is a first taste of my upcoming session for the European SharePoint Conference.

    You can see all the details here: http://www.sharepointeurope.com/upcoming-webinars/self-service-bi-at-work,-an-example-of-a-powerful-user-created-report-in-powerpivot.aspx

    See you tomorrow. :-)

  • PowerPivot Workshop in Portugal #PowerPivot, #ppws

    A new date is available for the PowerPivot workshop that Marco and I are delivering all over Europe.

    On September 19-20, 2011 we will be in Lisbon (Portugal) to present the workshop to Excel users and BI professionals.

    Moreover, as the Denali CTP3 of PowerPivot is now public, we will present, for the first time, an enhanced version of the workshop, with a new set of slides and demos aimed at understanding what is new in Denali for all the PowerPivot users.

    In the meantime, we are organizing a community event on the evening of the 19th, where we are going to speak about the roadmap from UDM to BISM and how to shape your data and skills for the incoming version of Analysis Services. We will publish all the details of the event as soon as we know them.

    You can find more information and the registration page at www.powerpivotworkshop.com.

    I know from personal experience that Lisbon is a wonderful city and the food is simply great. Moreover, I have been assured that the weather will be nice in that period too. Thus, I look forward to seeing many of you at the workshop and at the community event, even from outside Portugal!

  • OT: New transaction in the World Database

    Wow! Today I hit F5 on this interesting transaction:

    BEGIN TRANSACTION
        INSERT INTO World.Babies (Name, Surname, DateOfBirth, Gender)
        VALUES ('Arianna', 'Ferrari', '20110720', 'Female');

        UPDATE World.Parents SET Status = 'Happy'
        WHERE Name = 'Alberto' AND Surname = 'Ferrari';
    COMMIT

    Thank you, Caterina: I am now the proud father of two babies!

    And… yes, I am going to lose some sleep, this time without thinking about Denali. :-)

  • Tabular Parent/Child Dimensions: SumOfLeaves and the CALCULATE wall

    In my last post about Parent/Child hierarchies there is a question, in the comments, that I found interesting. Nevertheless, the answer requires a formula too complex to be written in a simple comment; thus, I am writing a follow-up to that post. I am not repeating all the content of the previous post so, please, read it first, to have the necessary background.

    The question looks simple: “If you needed a measure (SumOfLeafAmount) that only shows children’s values, what would it be like?”

    I was about to answer that “I have a formula that is too large to fit in the margin” but… Fermat did that some years ago and caused a real mess, and I don’t want to be responsible for such a complication. Thus, I am providing the answer in this new blog post. :-) And… hey, no, I don’t consider myself in any way similar to that genius, I’m just joking!

    There are two interesting points in this simple question:

    1. There are two interpretations of the formula: one is “sum only the leaves”, the other is “sum only the children”. Both are valid and interesting formulas, but they lead to completely different implementations, and I am going to show both.
    2. During the development of the “sum of the children” we are going to hit the CALCULATE wall with our head and… yes, it hurts. :-)

    Let us start with the set of data we are going to work on. The first figure is the hierarchy:

    [Figure: the Parent/Child hierarchy]

    And, in the second one, I have put together a PivotTable containing the P/C structure with the SumOfAmount measure and the content of the Invoices table, which is the source of the Amount column.

    [Figure: PivotTable with the P/C structure and SumOfAmount, alongside the Invoices table]

    Now, what are the desired results?

    1. SumOfChildren, for Bill, is 1600, because his amount comes from his descendants only (Brad, Chris, Vincent and Julie); Bill himself is not producing any sale. SumOfChildren for Brad, on the other hand, is 900, because he sold 400 by himself with two invoices.
    2. SumOfLeaves, for Bill, is 1200, because the value produced by Brad himself is not on a leaf: Brad has children, thus he is not a leaf and his own value should not be aggregated. Among his children, only Brad has a SumOfLeaves greater than zero because he has children, while all the others (Chris, Vincent and Julie) are leaves.

    In the case of Annabel, both formulas return the same value, because all of Annabel’s children are leaves.

    Let us start with SumOfLeaves, which is pretty easy. Any row in the hierarchy either is a leaf or is not. It is a leaf if it has no children; otherwise it is the parent of somebody and thus not a leaf. In other words, a row is a leaf if there is no row in the hierarchy whose ParentNodeId equals its NodeId.

    We can define a new calculated column, called IsLeaf, with this formula:

    =COUNTROWS (
        FILTER ('Hierarchy', 'Hierarchy'[ParentNodeID] = EARLIER ([NodeId]))
    ) = 0

    Or, if you are a real fan of Denali and of the new Parent/Child functions, this one works too (and, these days, it makes you look cooler):

    =COUNTROWS (
        FILTER (
            'Hierarchy', 
            PATHCONTAINS ('Hierarchy'[HierarchyPath], EARLIER ([NodeId]))
        )
    ) = 1

    Now, the hierarchy has a new column:

    [Figure: the hierarchy table with the new IsLeaf column]

    And the formula for the measure SumOfLeaves becomes straightforward, just a simple usage of CALCULATE:

    SumOfLeaves=IF (
        [BrowseDepth] > [MinNodeDepth], 
        BLANK(),
        CALCULATE (
            SUM (Invoices[Amount]), 
            'Hierarchy'[IsLeaf] = TRUE
        )
    )

    The result, compared with SumOfAmount, is very clear:

    [Figure: SumOfLeaves compared with SumOfAmount]

    The reason SumOfLeaves is easy is that each row in the hierarchy either is a leaf or is not, so its behavior is easy to define. In the case of SumOfChildren, this is no longer true: each row may or may not be aggregated into the total, depending on the level we are browsing.

    Take, for example, the value of Brad. When we are summarizing Brad’s amount, his own value should not be computed, because he is not a child of himself. Thus, SumOfChildren(Brad) = 900. But, when we summarize the value of Bill, Brad is a child of Bill and his complete amount of 1300 should be computed in Bill’s total. Same row, different behaviors… things are becoming interesting.

    If we want to compute the sum of all the children of a node in a P/C hierarchy, it is enough to create a filter context that isolates the current root of the tree. Once we have isolated that row, we can create a FILTER expression that computes all the nodes but that one. The problem is that a row can be the root at some point and a regular node at other points, the difference being the value of BrowseDepth.

    In order to understand the upcoming formula, we need to take some time to study our data set. I have created a new measure to see the value of NodeDepth inside the PivotTable; it basically shows the value only when a single row of the hierarchy is selected:

    [Node Depth] = IF (
        COUNTROWS ('Hierarchy') = 1,
        VALUES ('Hierarchy'[NodeDepth]),
        BLANK ()
    )

    With this new measure, and after having slightly modified the definition of the Level1..Level3 calculated columns to always show the value of a node (it was previously blank), I got this very interesting figure:

    [Figure: PivotTable showing the Node Depth measure for each row of the hierarchy]

    Up to now, we have used the difference between BrowseDepth and NodeDepth to hide rows. Now we will use the same values to compute SumOfChildren. By looking carefully at the PivotTable, it is easy to verify that the SumOfChildren of Brad needs to avoid computing the highlighted row, where NodeDepth is not greater than BrowseDepth. Clearly, when computing the value for Brad, we are interested only in the rows in the red box; all the others are of no interest.

    When the same row (the highlighted one) is computed for Bill’s total, the value of BrowseDepth will be 1, while NodeDepth will still be 2. In that case, the value of Brad is used to compute the total for Bill.

    Thus, by using this simple test, SumOfChildren computes the sum of the values of all the children of a node, excluding the node itself, which is the only one where the condition is false. With all this in mind, the formula seems straightforward:

    [SumOfChildrenWrong]=
    IF (
        [BrowseDepth] > [MinNodeDepth], 
        BLANK(),
        CALCULATE (
           SUM (Invoices[Amount]),
           FILTER (
                'Hierarchy',
                'Hierarchy'[NodeDepth] > [BrowseDepth]
           )
        )
    )

    We use FILTER to remove the nodes whose NodeDepth is not greater than BrowseDepth. Add this measure to the PivotTable and this is the result:

    [Figure: SumOfChildrenWrong returns BLANK everywhere]

    Et voilà: BLANK everywhere, not a single number shown. Something is wrong but… what? I strongly suggest you check the formula by yourself at least twice before continuing to read, because this is the CALCULATE wall and hitting it is always a pain.

    Now, the solution. I already gave you the first hint: the problem has something to do with CALCULATE. OK, how many CALCULATEs do you see in the formula?

    If you answered one, which is the naïve answer, you are able to read a formula, but you hit the CALCULATE wall. If you answered zero, two or three, it means you need better glasses. But if you gave the only right answer, which is FOUR, then you have already touched the DAX karma: you already know what I am going to write.

    Where are those four CALCULATEs? We need to remember that, whenever a measure is referenced inside a formula, it is automatically surrounded by a hidden CALCULATE. And, in this formula, we have three measure references:

    • [BrowseDepth] in the first parameter of IF
    • [MinNodeDepth] in the first parameter of IF
    • [BrowseDepth] in the inner test of the FILTER

    These three measures, plus the evident CALCULATE, make a total of four CALCULATEs inside the formula.

    Now, what happens to the [BrowseDepth] measure computed inside FILTER? We have a CALCULATE inside a loop, which means that the row context introduced by FILTER (yes, FILTER is an iterator, never forget it!) is converted into a filter context before the measure is evaluated. Thus, [BrowseDepth] is computed in a filter context where the only visible row of the Hierarchy table is the currently iterated one. Check it by yourself: the value of that [BrowseDepth] is always three, because only a single row of the hierarchy is visible. The original filter context is lost when that measure is computed.

    The value of [BrowseDepth] inside the FILTER is NOT the value of [BrowseDepth] in the original filter context; it is the [BrowseDepth] in a new context introduced by the automatic CALCULATE added by the measure reference.
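    To make the hidden CALCULATE visible, here is the inner FILTER of the wrong formula with the measure reference expanded explicitly (just a sketch, using the [BrowseDepth] definition from the previous post):

    FILTER (
        'Hierarchy',
        'Hierarchy'[NodeDepth] >
            CALCULATE (                -- the hidden CALCULATE added by the measure reference
                IF (ISFILTERED ('Hierarchy'[Level3]), 3,
                IF (ISFILTERED ('Hierarchy'[Level2]), 2,
                IF (ISFILTERED ('Hierarchy'[Level1]), 1
            )))
            )                          -- evaluated with only the current row visible: always 3
    )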

    If we want to compute our measure, we need to avoid that nasty CALCULATE added by the engine. The only way to do that is to remove the measure reference and expand it into its definition (which you can find in my previous post):

    [SumOfChildren]=
    IF (
        [BrowseDepth] > [MinNodeDepth], 
        BLANK(),
        CALCULATE (
           SUM (Invoices[Amount]),
           FILTER (
               'Hierarchy',
               'Hierarchy'[NodeDepth] > 
                   IF (ISFILTERED ('Hierarchy'[Level3]), 3,
                   IF (ISFILTERED ('Hierarchy'[Level2]), 2,
                   IF (ISFILTERED ('Hierarchy'[Level1]), 1
               )))
           )
        )
    )

    The formula is identical to the previous one but, this time, we are not referencing a measure, we are computing an inline expression. Thus, no automatic CALCULATE is added to it. Guess what? It now works as expected:

    [Figure: SumOfChildren now returns the expected values]

    Now, you can see that all the leaf-level nodes have no value and this is expected, since their value is provided by themselves only, not by their non-existent children.

    I personally don’t believe this formula is really useful, but it might mean something to a customer who gives me some money to compute it; thus, it would be useful at least for me. :-) The interesting point of this post is that DAX is simple, but not easy. This is the best definition of DAX I have ever heard. Until you fully understand CALCULATE and all its implications, DAX will remain a black art. Take your time to study it, and DAX will become your best friend.
