DouweMeer
Post Prodigy

Optimize Service memory usage

So I have a report with an incremental refresh on a Salesforce source. This Desktop report is published to the Service and, fully loaded, is approximately 9 GB there. The report is (currently) refreshed every 4 hours. I'm working on another report to put on there as well, in excess of 7 GB, with a planned 4-hour refresh and a SQL database as the source.

 

Now, the discussion I'm having with our Power BI Governance team is that, at corporate level, we are hitting the 50 GB memory capacity of our node. We're discussing slimming down reports, but we struggle to determine the memory usage in the first place. My argument is that if I changed my report from Import to DirectQuery, it would in fact use much more memory, as the table would be stored in cache (primary storage, i.e. memory), while with Import it would be stored in storage (i.e. secondary storage). In other words, the current limitation is on memory and not on (secondary) storage, so publishing the report with Import is better than using DirectQuery.

 

Everyone, including myself, is making loads of assumptions about memory usage and how storage works on the Service. Also, the monitoring dashboard created by Microsoft that our admins use is limited in that memory usage can't be filtered down to the report/workspace.

 

One side says DirectQuery would be the memory saver, while the other (myself) says Import is the memory saver. Which is true?


8 REPLIES
JirkaZ
Solution Specialist

@DouweMeer So when you are using Import mode for your tables, the dataset is loaded into the capacity's memory whenever the report is used. When a refresh happens, roughly twice the dataset size is reserved in memory.

On the other hand, when using DirectQuery, only the materialized query results are cached in memory.
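As a rough back-of-the-envelope based on the numbers in this thread (an estimate, not measured values):

9 GB model: ~9 GB resident + ~9 GB refresh copy ≈ 18 GB at peak
7 GB model: ~7 GB resident + ~7 GB refresh copy ≈ 14 GB at peak

If the two 4-hour refresh schedules ever overlap, that alone approaches ~32 GB of the 50 GB node, before any other datasets or query materializations are counted.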

@JirkaZ 

We did some tests by uploading a second, non-'used' version of the same report. During report usage and refresh we didn't see the behaviour you're suggesting. Is there any documentation underpinning what you're referring to?

A note on DirectQuery: it would put stress on the source server's capacity by sending queries each time the report is used. We don't know how the two would size up when compared to each other.

@JirkaZ 

Thank you for your links. However, reading through them, I don't see anything stating that when you import data, "whenever the report is used, the dataset is loaded into the capacity's memory".

They only mention that when you publish a report, the compression ratios vary once it is available in the Service, and that during the publish/refresh stage the VertiPaq storage engine deals with increased memory usage, but nothing is said about what happens when an end user requests a dashboard built on the data model.

JirkaZ
Solution Specialist

@DouweMeer It's by definition. Power BI is SSAS Tabular, an in-memory engine. Whenever you retrieve data, it comes from memory. So when you want to view a report, the model is loaded from storage into memory, and only then can you view it.

@JirkaZ 

That sounds worrying. So if I have the statement:

CALCULATE ( COUNTROWS ( SELECTCOLUMNS ( A, "test", A[A] ) ), USERELATIONSHIP ( A[B], B[A] ) )

You're suggesting that the whole table resulting from the inner join of A and B is held in RAM for as long as the user is accessing the dashboard?

JirkaZ
Solution Specialist

@DouweMeer Exactly. And by the way, the book "The Definitive Guide to DAX" explains the engine behind Power BI really well.

@JirkaZ 

I thought the process would be that A and B are pulled into memory to evaluate the SELECTCOLUMNS statement, and that once that is done, the SELECTCOLUMNS result would be kept in cache for further calculation instead of the whole of A and B.

What I noticed in the past is that when you do not use SELECTCOLUMNS as an intermediate, the calculation takes an enormous amount of time if the table you're calculating on has a relationship with another large table. But maybe that's because I didn't use CALCULATE around SELECTCOLUMNS... (see the sketch below).
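To make concrete what I mean by using SELECTCOLUMNS as an intermediate, here's a rough sketch over the same hypothetical tables A and B (an illustration, not my actual model):

-- without the intermediate: count directly over A with the inactive relationship activated
CALCULATE ( COUNTROWS ( A ), USERELATIONSHIP ( A[B], B[A] ) )

-- with the intermediate: narrow A down to the single column first, then count that
CALCULATE ( COUNTROWS ( SELECTCOLUMNS ( A, "test", A[A] ) ), USERELATIONSHIP ( A[B], B[A] ) )

My assumption was that the second version only needs the one projected column in memory rather than the full width of A.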

Sounds like I have to dive deep into the VertiPaq storage engine :).
