THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Calculate New, Returning, Lost, and Recovered Customers in #dax

Calculating the number of new and returning customers is a recurring question. I would say this is a “classical” Business Intelligence problem, very common in marketing department. I worked on these problems with many customers, with small and large datasets, and I wrote a DAX Pattern “New and Returning Customers” showing how to calculate:

  • New Customers: customers who never made any purchase
  • Returning Customers: customers who bought something in the past
  • Lost Customers: customers who bought something but did not buy in the last N days days
  • Recovered Customers: previously “lost customers” who made a new purchase

This is not a brand new topic, you can find many other blog posts on this topic (Chris Webb, Javier Guillén, Gerhard Brueckl, David Hager, Rob Collie), so my goal was to show very generic formulas that were generally the best solution in term of performance. This make the formula less readable, such as the following:

[Returning Customers] :=
COUNTROWS (
    CALCULATETABLE (
        VALUES ( <customer_key_column> ),
        VALUES ( <customer_key_column> ),
        FILTER (
            ALL ( <date_column> ),
            <date_column> < MIN ( <date_column> )
        )
    )
)

As you see, using CALCULATETABLE ( VALUES ( table[column] ), VALUES ( table[column] ), … ) seems a useless thing. Why counting the rows returned by VALUES and passing it also as a filter argument? This is a not so intuitive behavior of CALCULATE. The first argument is an expression that will be evaluated in a modified filter context. The third argument is a FILTER on the date column, which extends the range of dates considered, including all the past sales transactions. At this point, the first VALUES would return any customers who made a purchase in the past, but the second argument will only considered those that made a purchase in the current selection of time. The final result is an AND condition between two sets of customers (the intersection of the two sets), which is faster than trying to calculate the number of past transactions of each customer who made a purchase in the current selection of time, filtering only those that results in zero transactions.

In general, I prefer using more readable DAX formulas, also in DAX patterns, optimizing them only when necessary. But in this case the performance might be important (visible to the user) also with a few thousands of customers. As usual, any feedback on the New and Returning Customers pattern will be very welcome!

Published Thursday, June 26, 2014 10:13 AM by Marco Russo (SQLBI)

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Ronald said:

Nice one. I wrote some t-sql the do the same, maybe also interesting to take a look at:

http://www.sqlblog.nl/2014/08/calculate-lost-customers-using-t-sql/

Can do the same trick for new/recurring customers.

grts. Ronald

August 20, 2014 5:32 AM
 

Marco Russo (SQLBI) said:

Hi Ronald, thanks for the link!

August 20, 2014 7:14 AM
 

Oxenskiold said:

That's a rather elegant way to show how tables can filter tables and the intersection is the result. In general dissecting the DAX patterns from your web site is a formidable way to figure out how DAX works in detail.

I think it's important to realize though that the above  measure only works if the date table you are using  is marked as a Date table in the data model and you use the column chosen as the unique identifier for the data table as the <date_column> in the ALL(<date_column>) in the FILTER(). The reason is that when you use this particular marked column from the marked data table as a filter parameter in CALCULATE it overwrites all columns from the date table that existed in the filter context when entering the CALCULATE (it seems). In this case [year] and [Monthname]. This is contrary to the usual rule where a single column used as a filter parameter only overwrites the same column not all columns from the same table.

So if you were to remove the marking of the date table, the first VALUES() would be evaluated with the [year] and [monthname] still active in the filter context. (They would not be overwritten) and then there would be no intersection between the dates coming from the FILTER and the dates from the first VALUES(). (the dates coming from the FILTER() would be all the dates before the dates from the first VALUES()). So the result would be empty, that is no rows are returned.

Of course if you wanted the measure to work whether the date table is marked or not you could just use the full date table as a parameter in the FILTER or as a direct parameter in the CALCULATE. Like this

FILTER (

   ALL ( <date_table> ),

   <date_column> < MIN ( <date_column> )

)

OR

[Returning Customers] :=

COUNTROWS (

   CALCULATETABLE (

       VALUES ( <customer_key_column> ),

       VALUES ( <customer_key_column> ),

       ALL ( <date_table> ),

       FILTER (

           ALL ( <date_column> ),

           <date_column> < MIN ( <date_column> )

       )

   )

)

That way the [year] and [month] columns from the date table are overwritten no matter what.

September 8, 2014 10:21 AM
 

Marco Russo (SQLBI) said:

Thanks for this comment. I know this difference, but having a Date table in a data model that is not marked as a Date Table is not a good idea - it works if you only use these formulas, but you might want to add other measures later, and/or someone else might change the data model. There are no side effects in adopting the ALL ( <date_table> ) filter argument to make the formula working also in case "Mark as Date Table" is not set.

I just don't completely understand your comment "So if you were to remove the marking of the date table, the first VALUES() would be evaluated with the [year] and [monthname] still active in the filter context."

This is the case if you are using another column to join fact and date table, but if the date column is used for the relationship, the formula works even if the Date table is not marked as Date Table.

September 8, 2014 10:35 AM
 

Oxenskiold said:

Oh I see your point thanks for clearing that up.

September 9, 2014 3:36 AM
 

Dan said:

Hello Marco,

please help me solve this:

  I want to find the NEW customers LAST YEAR, I mean I want to find the number of new customers that bought in the same period last year. I have a DimCalendar, SalesFact and DimCustormer. Thank you!

February 4, 2015 4:50 PM
 

Marco Russo (SQLBI) said:

Hi Dan, I think I already answered to your question in the DAX Patterns comments / the point is that I think the pattern already contains what you need, right?

February 8, 2015 4:38 PM
 

Eva said:

Hi Marco,

I was hoping you'd be able to help me with a variation on this. In order to track customer base, I want to be able to track lost/new/returning customers by quarter to see the evolution. The problem is that when I insert quarters as a column variable, the formulas are obviously recalculated only for that customer base. For lost customers, I was able to fix this by using =calculate(DISTINCTCOUNT(IMS_Data[Salon Number]),PARALLELPERIOD(QuarterRef[Date],-12,Month),IMS_Data[IMS N12M]<1, IMS_Data[InMS]>0)

(i.e. going 'back in time' to take the customer base a year ago and seeing how many of those have NOT repurchased, and counting those as lost). Otherwise, you get 0 lost customers because obviously they're not counted in this quarter's customer base.

Unfortunately, my problem is now with our client base- i.e., those customers who have bought at some point in the last 12M. If I use the simple formula, with the quarters as columns, it only counts customers who bought this quarter and at some point in the past (which obviously underestimates the number). If I go back in time and do it forwards as with lost customers, then it only takes those who bought 4Q ago.

I need to be able to tell SQL to take the past 4Q as a base- not just one static quarter. Any ideas on how to do this?

Thanks

April 22, 2015 5:46 AM

Leave a Comment

(required) 
(required) 
Submit

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement