cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted

Understanding the Vertipaq Engine

Hello

 

I am doing a deep dive into the Vertipaq engine and I have a few questions regarding a concrete use case.

 

I have a .pbix file with two Fact Tables:

- Raw (300k rows and 17 columns) with the following structure:

LocationABCDOP
London644 544417 834961 806367 504 83 035893 560
Vancouver717 302153 883616 675997 484 779 110672 422
Copenhagen643 32917 978215 068935 608 270 478827 192
London279 920224 229774 350764 692 770 781929 427
Vancouvert158 089104 762209 37822 990 938 29593 513
Vancouver436 20670 139223 948482 096 136 600930 017
Copenhagen290 725986 045260 513714 680 491 096293 309
       

 

- Unpiv which is the unpivoted version of the 'Raw' table above (4.8 Million rows and 3 columns). So it has the following structure:

LocationTypeSales
LondonA644 544
LondonB417 834
LondonC961 806
LondonD367 504
LondonO83 035
LondonP893 560
VancouverA717 302
VancouverB153 883
VancouverC616 675
  
CopenhagenA643 329

 

Obviously, I used the super Vertipaq analyzer from @marcorusso & @AlbertoFerrari to understand data compression.

Not surprinsingly, because Vertipaq uses a columnar storage, my 'Unpiv' Table is ~45% smaller than 'Raw' table:

Result from the Vertipaq analyserResult from the Vertipaq analyser

We clearly see (i) the impact of the cardinality of the columns on their total size (ii) the relevance of unpivoting the table. Fair.

 

Here are my questions :

1. As we can see above, the size difference between 'Raw' and 'Unpiv' is mainly due to the "Column Hierarchies". What is it about ? I found no real explanation on the web or on Marco Russo's book. I read somewhere that it has to do with MDX structure...?!

 

2.  How is the size of Column Hierarchy computed? How can it be estimated or perhaps turned off? I understand the logic of value encoding , dictionnary encoding and RLE encoding. Is it something similar ?

 

3. What is the unit of the measures displayed by the Vertipaq analyser in the pivot table above ? When I save my .pbix file on my disk, its total size is: 74MB. I have no relationships between 'Raw' & 'Unpiv' and 0 graph on my report. so the data model itself is ~ 73MB (extracted by zipping the .pbix file).

--> As a result, I would expect the Total Table size from the Vertipaq analyzer of my two tables above (49 391 934) to be similar to the 73 MB of the data model. Where does this difference come from ?

 

Thanks in advance for your answers !

Tristan

@Datatouille

 

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: Understanding the Vertipaq Engine

  1. Columns Hierarchies are internal structures storing a sorted textual version of the content of each column. It is required for MDX, and it is now used also internally for optimizing certain queries (in other words - we cannot get rid of it, even for columns we never use un MDX)
  2. See above - even numbers are converted into strings
  3. I don't understand the question - but VertiPaq Analyzer reports the memory consumption of the data model, whereas the size of the files could be different (serialization and additional structures).

 

I hope it helps!

 

Marco Russo - SQLBI

View solution in original post

2 REPLIES 2
Highlighted

Re: Understanding the Vertipaq Engine

  1. Columns Hierarchies are internal structures storing a sorted textual version of the content of each column. It is required for MDX, and it is now used also internally for optimizing certain queries (in other words - we cannot get rid of it, even for columns we never use un MDX)
  2. See above - even numbers are converted into strings
  3. I don't understand the question - but VertiPaq Analyzer reports the memory consumption of the data model, whereas the size of the files could be different (serialization and additional structures).

 

I hope it helps!

 

Marco Russo - SQLBI

View solution in original post

Highlighted

Re: Understanding the Vertipaq Engine

For the folks on this thread, there is a new 'Available in MDX' feature in Azure Analysis Services, SSAS 2017 (and soon Power BI Premium) which answers my questions n°1 and n°2 above: https://blog.crossjoin.co.uk/2018/07/02/isavailableinmdx-ssas-tabular/

Helpful resources

Announcements
Community Blog

Community Blog

Visit our Community Blog for articles, guides, and information created by fellow community members.

Using the Community

Using the Community

Need help with the Power BI Community? Our 'Using the Community' support articles are a great place to start.

Community Summit North America

Community Summit North America

Innovate, Collaborate, Grow. The top training and networking event across the globe for Microsoft Business Applications

Power Platform 2020 release wave 2 plan

Power Platform 2020 release wave 2 plan

Features releasing from October 2020 through March 2021

Top Solution Authors
Top Kudoed Authors