Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

powerbi gateway - sizing

we currently have a powerbi on premise gateway hosted on aws ec2 t2.large which has 2 vcpu and 8gb ram.

however when refreshing one report we are having memory issues.

one table has 57Million records in it.

does anyone have any experience of sizing the gateway on ec2, before i decide to go to t2.xlarge 4 vcpu and 16gb ram.

2 ACCEPTED SOLUTIONS

@Anonymous,

 

I think the issue is with system specs. 2 vCPUs and 8GB of RAM is very, very low when dealing with large datasets. Instead of table row size, how much total data are you dealing with in terms of GBs?

 

We have clients with 50GB+ datasets, and we're using Azure Databricks with 15 vCPUs and 56GB of RAM.

View solution in original post

Anonymous
Not applicable

i had a support ticket in with microsoft about this, but there solutions didnt work.

in the end i increased the memory on the server to 16gb and it works fine now - peaks at around 8.3gb

we are now using t2.xlarge.

 

Interesting observations about the gateway whilst debugging the problem.

i would say the process is split into 3

a) get the data via the odbc connector

b) mashup the data using the mashup container

c) send the data to powerbi service

 

In our case step b) is the memory intensive part and doesnt handle running out of memory very elegantly

 

 

Also I would recommend putting the gateway as near to the data source as possible, to avoid network issues as the volume of data received (step a) is far greater than the volume of data sent (step c)

View solution in original post

8 REPLIES 8
Anonymous
Not applicable

i had a support ticket in with microsoft about this, but there solutions didnt work.

in the end i increased the memory on the server to 16gb and it works fine now - peaks at around 8.3gb

we are now using t2.xlarge.

 

Interesting observations about the gateway whilst debugging the problem.

i would say the process is split into 3

a) get the data via the odbc connector

b) mashup the data using the mashup container

c) send the data to powerbi service

 

In our case step b) is the memory intensive part and doesnt handle running out of memory very elegantly

 

 

Also I would recommend putting the gateway as near to the data source as possible, to avoid network issues as the volume of data received (step a) is far greater than the volume of data sent (step c)

Anonymous
Not applicable

HI everyone, had time to sleep on this, and have reached a different solution.

Our data is in redshift, and we started out using the redshift connector (no PBI gateway required).  However there are limitations with how you can use this connector, i.e. it only connects to a whole table.

Yes you can apply basic filters afterwards and it seems to take these into account on the data refresh.

Anyway we need more control than this, i.e. the ability to join tables in our data request.

So we moved to ODBC and reluctantly set up a powerbi gateway.

Great this all works, except for the performance on large datasets.

So I have now tried using a hybrid approach.  Those tables where we are taking all or almost all the data, we can take using the redshift connector, and those tables where we need more control - use the odbc connector.  This takes a lot of strain off the gateway.

We will probably still resize the gateway to 16gb, given others comments above, and I have a support ticket in to consider if there are some bugs.  I do believe there are because for instance it is passing the same queries to redshift twice.

Anonymous
Not applicable

i am now more inclined to say that this is a bug in the on premise gateway.
i have noticed that there are duplicate queries being requested to the redshift database

i have updated to the latest version of the gateway 14.16.6745.2 but no avail

i have tried setting MashupDefaultPoolContainerMaxCount = 1 but no change

The error i am getting is
Exception of type 'System.OutOfMemoryException' was thrown.
when trying to import a largish table.

@Anonymous,

 

I think the issue is with system specs. 2 vCPUs and 8GB of RAM is very, very low when dealing with large datasets. Instead of table row size, how much total data are you dealing with in terms of GBs?

 

We have clients with 50GB+ datasets, and we're using Azure Databricks with 15 vCPUs and 56GB of RAM.

Anonymous
Not applicable

This one particular report is only 175Mb.  I havent heard of azure databricks before, is it possible to install a powerbi on premise gateway on there?

I wasn't referring to the PBIX file - because of compression, there's not a direct ratio between the amount of data and the size of the file. I was referring to the amount of data you're pulling in from your datasource, in GB.

 

Databricks is not a Power BI service, it's a data processing platform. I was just using it as an example of the specs we deal with.

Anonymous
Not applicable

apologies.

The main table that we are processing is 9gb in redshift, and most of its columns are encoded, so i guess depending on how these things work, it could be higher by the time it gets to powerbi gateway.

To make the import more manageable i load it in 10 separate imports and then append these inside powerbi - not sure if this is a good idea or not.

 

@Anonymous I agree with @RMDNA any heavy lifting at all just crushes the gateway with import methods being used. We pretty much always recommend at least 16GB as a starting point, but I can easily see scaling higher as said.


Looking for more Power BI tips, tricks & tools? Check out PowerBI.tips the site I co-own with Mike Carlo. Also, if you are near SE WI? Join our PUG Milwaukee Brew City PUG

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors