davidbrown100
Helper II

Power BI gateway - sizing

We currently have a Power BI on-premises gateway hosted on an AWS EC2 t2.large, which has 2 vCPUs and 8 GB of RAM.

However, when refreshing one report we are having memory issues.

One table has 57 million records in it.

Does anyone have any experience of sizing the gateway on EC2, before I decide to go to a t2.xlarge (4 vCPUs, 16 GB of RAM)?
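As a rough sanity check on whether 8 GB can hold a refresh of that table, here is a back-of-envelope estimate. The average row width is an assumption (the thread does not state one), so treat the numbers as illustrative:

```python
# Back-of-envelope memory estimate for the refresh. The bytes-per-row
# figure is an assumed average uncompressed row width, not a measured one.
rows = 57_000_000
bytes_per_row = 160  # assumption: ~160 bytes per decoded row

raw_gb = rows * bytes_per_row / 1024**3
# The mashup engine typically needs working space beyond the raw data,
# so budget extra headroom on top of OS and gateway overhead.
with_headroom_gb = raw_gb * 1.5

print(round(raw_gb, 1), round(with_headroom_gb, 1))  # 8.5 12.7
```

Under those assumptions the raw table alone is close to the full 8 GB of a t2.large, which would be consistent with the refresh failing there and fitting comfortably in 16 GB.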


8 REPLIES
davidbrown100
Helper II

I had a support ticket in with Microsoft about this, but their solutions didn't work.

In the end I increased the memory on the server to 16 GB and it works fine now - peaks at around 8.3 GB.

We are now using a t2.xlarge.

 

Some interesting observations about the gateway whilst debugging the problem.

I would say the process is split into three steps:

a) get the data via the ODBC connector

b) mash up the data using the mashup container

c) send the data to the Power BI service

 

In our case step b) is the memory-intensive part, and it doesn't handle running out of memory very elegantly.

Also, I would recommend putting the gateway as near to the data source as possible, to avoid network issues, as the volume of data received (step a) is far greater than the volume of data sent (step c).

davidbrown100
Helper II

Hi everyone, I've had time to sleep on this, and have reached a different solution.

Our data is in Redshift, and we started out using the Redshift connector (no Power BI gateway required). However, there are limitations with how you can use this connector, i.e. it only connects to a whole table.

You can apply basic filters afterwards, and it seems to take these into account on the data refresh.

Anyway, we need more control than this, i.e. the ability to join tables in our data request.

So we moved to ODBC and reluctantly set up a Power BI gateway.

This all works, except for the performance on large datasets.

So I have now tried a hybrid approach. Those tables where we are taking all or almost all the data, we pull using the Redshift connector, and those tables where we need more control go through the ODBC connector. This takes a lot of the strain off the gateway.
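The trade-off behind that hybrid approach can be sketched with plain SQL. This uses Python's built-in sqlite3 as a stand-in for Redshift, and all table and column names are made up for illustration:

```python
import sqlite3

# In-memory stand-in for the warehouse (hypothetical schema).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales(id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers(id INTEGER, region TEXT);
    INSERT INTO customers VALUES (1, 'EU'), (2, 'US');
    INSERT INTO sales VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 5.0);
""")

# Native-connector style: pull each table whole, join later in the model.
whole_sales = con.execute("SELECT * FROM sales").fetchall()
whole_customers = con.execute("SELECT * FROM customers").fetchall()

# ODBC-with-custom-SQL style: push the join and filter down to the
# source, so only the reduced result crosses the wire.
joined = con.execute("""
    SELECT s.id, c.region, s.amount
    FROM sales s JOIN customers c ON s.customer_id = c.id
    WHERE c.region = 'EU'
""").fetchall()

print(len(whole_sales) + len(whole_customers), len(joined))  # 5 2
```

The second pattern moves less data through the gateway, which is why pushing joins to the source takes strain off it.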

We will probably still resize the gateway to 16 GB, given the comments above, and I have a support ticket in to consider whether there are some bugs. I believe there are, because, for instance, it is passing the same queries to Redshift twice.

davidbrown100
Helper II

I am now more inclined to say that this is a bug in the on-premises gateway. I have noticed that there are duplicate queries being sent to the Redshift database.

I have updated to the latest version of the gateway (14.16.6745.2), but to no avail.

I have tried setting MashupDefaultPoolContainerMaxCount = 1, but no change.
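For reference, gateway settings like this one are usually edited in the gateway's Microsoft.PowerBI.DataMovement.Pipeline.GatewayCore.dll.config file, followed by a gateway restart. The snippet below shows the general shape of such a setting; check Microsoft's current gateway documentation before editing, as the file location and format can change between versions:

```xml
<setting name="MashupDefaultPoolContainerMaxCount" serializeAs="String">
    <value>1</value>
</setting>
```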

The error I am getting is
Exception of type 'System.OutOfMemoryException' was thrown.
when trying to import a largish table.

@davidbrown100,

 

I think the issue is with the system specs. 2 vCPUs and 8 GB of RAM is very, very low when dealing with large datasets. Rather than the table's row count, how much total data are you dealing with, in GB?

 

We have clients with 50 GB+ datasets, and we're using Azure Databricks with 15 vCPUs and 56 GB of RAM.

This one particular report is only 175 MB. I haven't heard of Azure Databricks before - is it possible to install a Power BI on-premises gateway on there?

I wasn't referring to the PBIX file - because of compression, there's not a direct ratio between the amount of data and the size of the file. I was referring to the amount of data you're pulling in from your data source, in GB.

 

Databricks is not a Power BI service, it's a data processing platform. I was just using it as an example of the specs we deal with.

Apologies.

The main table that we are processing is 9 GB in Redshift, and most of its columns are encoded, so I guess, depending on how these things work, it could be larger by the time it gets to the Power BI gateway.

To make the import more manageable I load it in 10 separate imports and then append these inside Power BI - not sure if this is a good idea or not.
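A minimal sketch of that split-and-append pattern, assuming the 10 imports are partitioned on a numeric key with a modulo filter (the thread doesn't say how the split was actually done):

```python
# Split one big table into 10 disjoint chunks on a numeric key, then
# append the chunks back together - mirroring "10 separate imports"
# followed by an append inside Power BI. The modulo split is an assumption.
key_values = list(range(1, 101))  # stand-in for the real table's key column

chunks = [[k for k in key_values if k % 10 == r] for r in range(10)]
combined = sorted(k for chunk in chunks for k in chunk)

assert combined == key_values  # no rows lost or duplicated by the split
print(len(chunks), len(combined))  # 10 100
```

Whether this actually lowers peak memory depends on whether the gateway runs the chunk queries concurrently; with several mashup containers active, the chunks can still end up in memory at the same time.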

 

@davidbrown100 I agree with @RMDNA: any heavy lifting at all just crushes the gateway with the import methods being used. We pretty much always recommend at least 16 GB as a starting point, but I can easily see scaling higher, as said.


Looking for more Power BI tips, tricks & tools? Check out PowerBI.tips, the site I co-own with Mike Carlo. Also, if you are near SE WI, join our Milwaukee Brew City PUG.
