THE SQL Server Blog Spot on the Web


SQLBI - Marco Russo

SQLBI is a blog dedicated to building Business Intelligence solutions with SQL Server.
You can follow me on Twitter: @marcorus

Cost of Process Defrag in Analysis Services Tabular #ssas #tabular

I recently received a question about the memory required to run a Process Defrag on a Tabular model in Analysis Services. Process Defrag is useful when you frequently run incremental processing of a table, or when some of the values in a column dictionary are no longer used in the table, for example because you process the same partition multiple times and/or remove partitions from the table. Cathy Dumas wrote an interesting blog post about the savings you can obtain by running Process Defrag.
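As a reminder, you can run this operation by sending an XMLA command with the ProcessDefrag type; a minimal sketch follows, where the database and table identifiers are placeholders you should replace with your own (in a Tabular model, a table is addressed through its DimensionID):

```xml
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Type>ProcessDefrag</Type>
  <Object>
    <DatabaseID>MyTabularDB</DatabaseID>
    <DimensionID>MyTable</DimensionID>
  </Object>
</Process>
```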


I did some investigation and I have been assured that data is not completely uncompressed in this process, even if some decoding/re-encoding happens. As a rough estimate, you need free space equivalent to the size of the table you are going to defrag (which is already in memory), plus a buffer for transient data structures; data is handled in compressed form, without requiring larger memory buffers for uncompressed raw data. So, if all the partitions of your table and its column dictionaries require 100 MB of RAM, then you need another 100 MB of free RAM in order to execute a Process Defrag.
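To get a rough idea of the memory used by a table before applying this estimate, you can query the storage DMVs. For example, the following query returns the dictionary size of each column of a table (the table name filter is a placeholder); you would add the segment sizes from DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS to complete the picture:

```sql
SELECT DIMENSION_NAME, ATTRIBUTE_NAME, DICTIONARY_SIZE
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
WHERE DIMENSION_NAME = 'MyTable'
```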

Here is a more detailed description I received from Akshai Mirchandani:

In addition to the master copies, it requires enough memory for a new dictionary, and the final compressed data + some small temporary buffer space for the transient data structures (no big buffers like normal processing).
 
It is essentially going to read each column value, insert it into a new dictionary, get a new DataID back from the dictionary insert, and append that DataID to the current segment. It doesn’t need to do VertiPaq (compression) again, and it doesn’t keep the uncompressed data in buffers like the data processing algorithm does.

The important point here is that the analysis of the segment data required to come up with the best compression strategy no longer needs to be performed, and that is typically the most expensive step of the compression (the VERTIPAQ_STATE column in DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS shows whether this step was done).
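You can inspect that state with a DMV query such as the following (again, the table name filter is a placeholder):

```sql
SELECT DIMENSION_NAME, COLUMN_ID, SEGMENT_NUMBER, VERTIPAQ_STATE
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
WHERE DIMENSION_NAME = 'MyTable'
```

The VERTIPAQ_STATE column reports whether the compression analysis was fully performed (COMPLETED), interrupted by the timebox (TIMEBOXED), or not performed at all (SKIPPED) for each segment.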

This is good news if you are concerned about the memory required to perform this operation.

Published Wednesday, March 13, 2013 3:17 PM by Marco Russo (SQLBI)


About Marco Russo (SQLBI)

Marco Russo is a consultant, writer and trainer specializing in Business Intelligence with Microsoft technologies. He runs the SQLBI.COM website, which is dedicated to distributing resources for BI developers, such as Integration Services components, Analysis Services models, tools, and technical information. Marco is certified as MCT, MCDBA, MCSD.NET, MCSA, MCSE+I.
