THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Improve NTILE performance

Today I implemented a classification through a NTILE equivalent function. The T-SQL NTILE implementation has very bad perfomance when used against large datasets of rows, so I and Davide Mauri realized a better implementation that Davide posted some weeks ago on his blog. No reaction from Microsoft guys and no comments on Davide blog, so I retry here to catch more attention on this topic.

Instead to use NTILE function you can use this faster query:

SELECT 
   CustomerKey,
   CEILING(ROW_NUMBER() OVER (ORDER BY YearlyIncome ASC) / ((SELECT CAST(COUNT(*) + 1 AS FLOAT) FROM DimCustomer) / @n) ) AS NTileFast
FROM 
   
DimCustomer

Note that I prefer to use the RANK function instead than the ROW_NUMBER: even if I could get non-uniform distribution of data among clusters, I get all elements with the same value in the same cluster and in many scenario you can prefer this technique to avoid cases with random cluster assignment for items with value equal to a cluster limit. This is different from NTILE result, but I prefer it for my cluster assignment.

SELECT 
   CustomerKey,
   CEILING(RANK() OVER (ORDER BY YearlyIncome ASC) / ((SELECT CAST(COUNT(*) + 1 AS FLOAT) FROM DimCustomer) / @n) ) AS NTileFast
FROM 
   
DimCustomer

Please read this post of Davide Mauri for a complete explanation and let us know what do you think about it

 

Published Wednesday, April 19, 2006 7:27 PM by Marco Russo (SQLBI)

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

David said:

Thanks this will come in very handy.  I was mainly looking to avoid rows with the same value showing up in different NTiles, but the added performance boost is a nice plus too.

November 3, 2011 12:21 PM

Leave a Comment

(required) 
(required) 
Submit

About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specialized in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distribute resources useful for BI developers, like Integration Services components, Analysis Services models, tools, technical information and so on. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.

This Blog

Syndication

Archives

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement