BBIUser
Helper IV

Are there any maximum file size or data storage limits when using Power BI Report Server?

These are the limits for Power BI Pro (the Service):

  • Maximum data storage quota: 10GB per user.
  • Maximum size of an imported dataset: 1GB per file (* = future size increases over time).

 

Is there any size limit when Power BI Report Server and its optimized Desktop are used? (Please do not consider Power BI Premium licensing; this question applies only to the Report Server included with Office 365 Pro licensing.)

  • Maximum size of an imported dataset: ?
  • Maximum data storage quota in the Report Server web portal: ?

13 REPLIES
panda2004
Helper I

Thanks for the explanation!

 

My case is as follows:

  • I have an ElasticSearch cluster with weekly indices.
  • Each index holds about 200GB of data.
  • Power BI obviously can't query ElasticSearch directly, but I managed to query the data from Power BI Desktop in a few different ways:
    1. A custom Power Query (M) query using Web.Contents against the ElasticSearch REST API (a rough sketch of this appears at the end of this post).
    2. The CData ElasticSearch ODBC driver.
       We also have the CData Power BI Connector, which is a custom connector. Unfortunately, DirectQuery with custom connectors is still supported only in Power BI Desktop (neither in the Power BI Service nor in Power BI Report Server, as far as I can see). Link here.
    3. An ASP.NET service (queried as a Web source).
  • I'm trying to query 4 indices, which is basically 800GB. Of course, my initial query will filter down to just the documents needed (not all the documents) and just the fields needed (not every field in every matching document), so not all 800GB will be imported. Querying many documents is very slow, as each request only returns 10,000 documents (paging/scrolling is applied to get all matching documents). However, caching the result in a Power BI dataset that refreshes every day would be fine - if it works (I haven't tried it yet). I'm afraid this dataset would be too big, so maybe I'll store a separate dataset for each weekly index and merge them later.
  • How complex would it be to import the data from ElasticSearch into SQL Server / Analysis Services? Could SQL Server / Analysis Services do the DirectQuery for me (using the ElasticSearch ODBC driver)? Remember these are JSON documents, and some of them are very complex. Here is an example document (a sketch of flattening it follows below):
{
	"ProductId":"88432",
	"Comments":[
		{
			"Username":"stpnet",
			"Comment":"Cool Product!",
			"Stars":5
		},
		{
			"Username":"rblatchford",
			"Comment":"Excellent!",
			"Stars":4
		}
	],
	"Tags":[
		{
			"Key":"Category",
			"Value":"Electricity"
		},
		{
			"Key":"IsNew",
			"Value":true
		},
		{
			"Key":"ShippingType",
			"Value":"Worldwide"
		}	
	]
}
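
For illustration, here is roughly what flattening that example document looks like in Power Query (M). This is only a sketch: the inlined JSON and the column names come straight from the example above, and the same kind of shredding would be needed whichever engine the data ends up in (Power BI, SQL Server or Analysis Services).

let
    // The example document above, inlined as text purely for illustration.
    RawJson = "{""ProductId"":""88432"",""Comments"":[{""Username"":""stpnet"",""Comment"":""Cool Product!"",""Stars"":5},{""Username"":""rblatchford"",""Comment"":""Excellent!"",""Stars"":4}],""Tags"":[{""Key"":""Category"",""Value"":""Electricity""},{""Key"":""IsNew"",""Value"":true},{""Key"":""ShippingType"",""Value"":""Worldwide""}]}",
    Doc = Json.Document(RawJson),

    // One row per comment, keyed by ProductId - the flat shape a relational table or a tabular model needs.
    Comments = Table.FromRecords(
        List.Transform(
            Doc[Comments],
            (c) => [ProductId = Doc[ProductId], Username = c[Username], Comment = c[Comment], Stars = c[Stars]]
        )
    )
in
    Comments

The Tags array would get the same treatment: one row per key/value pair, or pivoted into columns if the set of keys is fixed.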

 

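And here is the kind of Web.Contents query I mean in point 1 above, using the ElasticSearch scroll API so that each request returns at most 10,000 documents and the pages are stitched together afterwards. Again, only a sketch under assumptions: the cluster URL, index name, field list and the two-minute scroll window are placeholders, and authentication and error handling are left out.

let
    // Illustrative values - replace with the real cluster URL, index name and fields.
    BaseUrl   = "http://my-elastic-cluster:9200",
    IndexName = "products-2018.40",
    PageSize  = 10000,

    // The first request opens a scroll context and asks only for the fields we need.
    FirstPage = Json.Document(
        Web.Contents(
            BaseUrl,
            [
                RelativePath = IndexName & "/_search",
                Query = [scroll = "2m"],
                Headers = [#"Content-Type" = "application/json"],
                Content = Json.FromValue([
                    size = PageSize,
                    #"_source" = {"ProductId", "Comments", "Tags"},
                    query = [match_all = []]
                ])
            ]
        )
    ),

    // Follow the scroll cursor until a page comes back empty.
    GetNextPage = (previous) =>
        let
            Response = Json.Document(
                Web.Contents(
                    BaseUrl,
                    [
                        RelativePath = "_search/scroll",
                        Headers = [#"Content-Type" = "application/json"],
                        Content = Json.FromValue([scroll = "2m", scroll_id = previous[#"_scroll_id"]])
                    ]
                )
            )
        in
            if List.IsEmpty(Response[hits][hits]) then null else Response,

    // Collect every page, pull out the matching documents and load them as a table.
    AllPages  = List.Generate(() => FirstPage, each _ <> null, GetNextPage),
    AllHits   = List.Combine(List.Transform(AllPages, each _[hits][hits])),
    Documents = List.Transform(AllHits, each _[#"_source"]),
    Result    = Table.FromRecords(Documents)
in
    Result

A separate query like this per weekly index could then be appended with Table.Combine, which is one way to do the "separate dataset per weekly index and merge them later" idea.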
Anonymous
Not applicable

So this is quite a complex question to answer (most of this is my opinion rather than anything I'd bet the ranch on).

There is a limit to the size of files you can upload to the on-prem server. It's controlled via a config setting and seems to be set to 1000MB. I assume it responds if you change it; we haven't had to change this, so I can't comment. The last set of release notes suggested this had been raised to 2GB. (Remember this is the VertiPaq-compressed size, so the uncompressed data can be several times larger than this.)

Be aware that PowerBI-SSRS is not just a web app. In the background it spins up captive SSAS Tabular instances for PBIX files that have imported data. These captive instances can consume quite large quantities of memory if you get a number of them running concurrently (500-1000MB per captive instance is not uncommon on our stuff, one instance per active report), so you may need to give these boxes some more memory so they can breathe if you have moderate to high concurrency. A slightly longer explanation of what I understand happens is included here: https://community.powerbi.com/t5/Report-Server/Needed-instances-for-PowerBI-report-server/m-p/445068...

The web services that are part of PBI-SSRS, which provide the data to your rendered client apps, are also quite chatty. Every time a user slices/highlights/drills on a visual or page, a set of queries is sent to the backend data source - either a captive SSAS instance, or the actual backend if the PBIX is using DirectQuery or a Live connection. So be prepared for a fair amount of network chatter between PBI-SSRS and your data sources if you are using DirectQuery or Live connections. This is one reason we keep them on the same server/servers.

I would suggest that if your imported-data PBIX is 5GB, it would probably be way more efficient to translate this into an SSAS Tabular instance and host it there, then connect your PBIX to that SSAS Tabular instance using a Live connection.

If you compare the two from a memory consumption point of view (all numbers are taken from looking at our internal servers here):

Using an SSAS Tabular backend with a Live connection:
  • SSAS Tabular memory usage: 700MB
  • SSAS Tabular data usage: 5GB
  • Power BI Report Server (underlying services): 1.8GB
  • Each user viewing the report: about 50MB of SSRS services/visualisation (this is a guess; it could well be half this or double this, it's hard to see without a lot of work looking at memory allocations inside the processes)
  • Assume 5 concurrent users, giving you 250MB

That's a grand total of about 8GB plus room for your OS; scale this to 25 concurrent users and it's only 1-2GB more. The report will load fast (it's just the visuals, and then you are querying the SSAS Tabular model, so how responsive it is will be down to your model/data design/DAX).

Now let's think about hosting the 5GB PBIX (assuming you can, and the server doesn't have some fixed limit):
  • SSAS Tabular memory usage: 0
  • SSAS Tabular data usage: 0
  • Power BI Report Server (underlying services): 1.8GB
  • Each user viewing the PBIX: about 50MB (this is a guess, see notes above) of SSRS services/visualisation
  • Each user viewing the PBIX: 300MB of captive SSAS instance code
  • Each user viewing the PBIX: 5GB of data loaded into the captive SSAS instance
  • Assume 5 concurrent users and you get 27.5GB

Giving you a grand total of about 30GB plus room for your OS; scale this to 25 concurrent users and you need another 110GB of RAM!

Now let's think about someone accessing one of those reports. Every report has to spin up a captive SSAS instance and then load it with 5GB of data before it can even start to render any data. That's several tens of seconds at best, so it's going to feel pretty unresponsive initially, but much the same after that as users slice and dice the reports etc.

So I would probably suggest that for anything bigger than 1GB that's going to have fairly widespread usage (potentially several concurrent users), it's more scalable and practical to import the PBIX into SSAS Tabular and host it there, with a Live connection to the SSAS model, giving you a much smaller PBIX file - ours measure just tens of KB, as they are just visuals and a cached data model. These reports load faster for end users; all the server has to do is ship the visuals to them and spin up some web services. The data is already in memory in the SSAS Tabular instance. After that, the traffic flows from the web browser (as the user queries/slices/dices) to the web services and in turn to the data source.

Apologies if that doesn't make any sense or isn't what you were asking.

S

Anonymous
Not applicable

Hi @Anonymous,

 

Wow, some good detail there. This is an interesting factor to plan for in a deployment, and I've not seen a lot of detailed discussion about it, so what you have added is helpful.

 

We have gone down the SSAS Tabular route; it just seems to make sense for an enterprise-scale deployment. We can use the same data models in various PBIX solutions and, as you say, SSAS Tabular just seems to scale better, which is what you would expect, I suppose. We also use the RLS security that it offers, and the AD role-based access controls.

 

One thing we would like to do is use multiple SSAS Tabular models in the same PBIX, but that is not possible at the moment, I believe; maybe with the new composite data models that will become a reality?

 

Interesting what you say about the impact of visual-to-browser interactions and the part PBI-RS plays in this. We have experienced significant levels of performance degradation when running the browser (IE, Edge and FF) via Citrix as opposed to outside Citrix. Looking at the browser traffic does indeed show a very chatty picture, and as we use maps quite a lot it also seems to show a lot of external chat to Bing resources.

Anonymous
Not applicable

Hey @Anonymous

 

Yes, we use SSAS Tabular for those very reasons: scale, RLS and a managed data model approach.

 

The data query chat thing is a killer. We have a distributed SSAS solution: 4 servers across 2 data centres behind an L5 alias, then a pile of PBI Report Servers up front, again under an L5 alias. It's way, way less responsive than the much smaller dev environment, which has PBI-SSRS and SSAS Tabular on the same box. The L5 is getting absolutely battered when we spin up any significant PBI load.

 

Maps will just add to the problem; we're lucky as we do very little meaningful geographic stuff.

 

We are hoping for multiple SSAS Tabular models too! We also have scenarios involving imported Excel linking to live SQL, or SSAS Tabular linking to live SQL, that are going to blow away a ton of nasty hacks/workarounds we currently have.

 

Be warned: my understanding is based on observation and some informed guesswork rather than anything concrete, so don't take it as gospel.

Anonymous
Not applicable

This time with some formatting! I have had to edit this as I discovered I was talking total rubbish; explanation inline.

 

So this is quite a complex question to answer (most of this is my opinion rather than anything I'd bet the ranch on).

 

There is a limit to the size of files you can upload to the on-prem server. It's controlled via a config setting and seems to be set to 1000MB. I assume it responds if you change it; we haven't had to change this, so I can't comment. The last set of release notes suggested this had been raised to 2GB. (Remember this is the VertiPaq-compressed size, so the uncompressed data can be several times larger than this.)

 

Be aware that Power BI Report Server is not just a web app. In the background it spins up a captive SSAS Tabular instance for PBIX files that have imported data. This captive instance can consume quite large quantities of memory if you get a number of datasets loaded into it concurrently, so you may need to give your report server some more memory so it can breathe if you have moderate to high concurrency.

 

A slightly longer explanation of what I understand happens is included here https://community.powerbi.com/t5/Report-Server/Needed-instances-for-PowerBI-report-server/m-p/445068...

 

The web services that are part of PBI-SSRS, which provide the data to your rendered client apps, are also quite chatty. Every time a user slices/highlights/drills on a visual or page, a set of queries is sent to the backend data source - either the captive SSAS instance, or the actual backend if the PBIX is using DirectQuery or a Live connection. So be prepared for a fair amount of network chatter between PBI-SSRS and your data sources if you are using DirectQuery or Live connections. This is one reason we keep them on the same server/servers.

 

I would suggest that if your imported-data PBIX is 5GB, it would probably be way more efficient to translate this into an SSAS Tabular instance and host it there, then connect your PBIX to that SSAS Tabular instance using a Live connection.

 

If you compare the two from a memory consumption point of view (all numbers are taken from looking at our internal servers here):

 

Using an SSAS Tabular backend with a Live connection:

  • SSAS Tabular memory usage: 700MB
  • SSAS Tabular data usage: 5GB
  • Power BI Report Server (underlying services): 1.8GB
  • Each user viewing the report: about 50MB of SSRS services/visualisation (this is a guess; it could well be half this or double this, it's hard to see without a lot of work looking at memory allocations inside the processes)
  • Assume 5 concurrent users, giving you 250MB

That's a grand total of about 8GB plus room for your OS; scale this to 25 concurrent users and it's only 1-2GB more.

 

The report will load fast (it's just the visuals, and then you are querying the SSAS Tabular model, so how responsive it is will be down to your model/data design/DAX).


Now let's think about hosting the 5GB PBIX (assuming you can, and the server doesn't have some fixed limit).

 

My original assumption here was that each report viewer got their own SSAS Tabular instance. This isn't the case: the server fires up one SSAS Tabular instance and shares it across the users, pushing data into it as required (as reports are requested). It looks like it keeps that data resident, so only the first person who uses a dataset takes the hit, though it may well unload it after it has not been used for a while. You certainly get multiple connections to each model in the captive instance. My misreading of what was going on on the server was because we had a user remoting on to our dev box and firing up Power BI Desktop, hence I was seeing different numbers of msmdsrv.exe instances; I had incorrectly assumed these belonged to the PBI-SSRS server. So my initial estimate was pretty spectacularly wrong.

 

  • SSAS Tabular memory usage: 0
  • SSAS Tabular data usage: 0
  • Power BI Report Server (underlying services): 1.8GB
  • The first user causes a 5GB data load into the captive SSAS instance
  • Each user viewing the PBIX: about 50MB (this is a guess, see notes above) of SSRS services/visualisation
  • Assume 5 concurrent users and you get 250MB

Giving you a grand total of about 8GB again, plus room for your OS; scale this to 25 concurrent users and it's pretty similar.

 

Now let's think about someone accessing one of those reports. The first report has to spin up the 5GB of data in the captive SSAS instance before it can even start to render any data. That's several seconds at best, so it's going to feel pretty unresponsive initially, but much the same after that as users slice and dice the reports etc. Also, the second user onwards won't pay this hit; they'll get the same kind of response as you would using a separate SSAS instance. So it looks like we can do some good old-fashioned cache warming by spinning the PBIX reports up in the mornings before the users get in, or after data refreshes happen.

 

I do wonder what happens if the data gets refreshed in the PBIX: does this trigger the data in the captive SSAS instance to be reloaded? And does this flow out to end users already connected to that dataset? I might need to see if I can figure this out. An initial scan of the data models suggests this happens transparently.

 

I would still probably suggest that for anything bigger than 1GB that's going to have fairly widespread usage (potentially several concurrent users), it's more scalable and practical to import the PBIX into SSAS Tabular and host it there, with a Live connection to the SSAS model. You get some control over that SSAS instance that just doesn't seem to be as easily available with the captive SSAS instance hosting the embedded data model.

 

This splitting of the report into visuals and data means that when someone calls up a report, all the server has to do is ship the visuals to them and spin up some web services; in Live connection mode the data is already in memory in the SSAS Tabular instance. After that, the traffic flows from the web browser (as the user queries/slices/dices) to the web services and in turn to the data source. The same could be said for data loaded into the captive instance, of course, once the first user has spun that data up.

 

Apologies if that doesn't make any sense or isn't what you were asking. And further apologies for talking rubbish the first time around.

 

S

 

Anonymous
Not applicable


Hi @BBIUser,

 

I'm not aware of a published maximum size for a data model in a PBIX using the on-prem report server. However, remember it's a memory-based technology, using the same underlying engine as SSAS Tabular: VertiPaq. So the limit is what will fit in memory. For example, if you have 32GB of RAM, then you will be lucky to get away with a 25GB PBIX data model - this is not an exact example, just an illustration. But VertiPaq does a great job of compressing the data, so it's not the same size in the PBIX as it might be in other formats.

 

If you want bigger data models you need lots of RAM. I've just specced a new Power BI Report Server and it has over 700GB of RAM. In the future we plan to double that to 1.4TB of RAM.

 

Hope this helps

Rob

 

  • Well, that's really cool and quite smart.
  • So it means that I can import a huge dataset of 5GB, for example?
  • When the user views the report in the web site, all the transformation occurs on the Report Server itself and is not brought to the user's machine/browser?
  • Does it mean that for each user who opens a report, the dataset will be copied?
  • According to this [1] and this [2], the size IS limited - so I'm confused. Maybe the size of ALL the datasets is unlimited, but a single dataset is limited to 2GB?
Anonymous
Not applicable

Hi @panda2004,

 

The size is limited, but there is no exact answer to the limit; it depends on your set-up: how much RAM you have, how many PBIX solutions are in simultaneous use, other software on the server, etc. You will need to experiment with the PBIX solutions you have built and see how they perform under use.

 

Good questions about multiple users of the PBIX and the implications for memory use; I'm not sure about that - I wonder if anybody else is? We mainly use SSAS Tabular models, which by their nature are single models supporting multiple users; our PBIX file sizes are very small, and our SSAS Tabular data models are very large.

 

The Power BI server does all of the hard work; the user viewing through a browser only receives the presentation layer back in their browser. That does include a lot of data if you are displaying a lot of data, but not the whole data model - this is a client/server style implementation.

 

If your PBIX is 5GB then that's not very big, so it is likely to be fine. We have much larger datasets and they work well. You manage the size by being careful about what you include in the model, and what DAX you use to define additional elements. There are some tools for monitoring this stuff; take a look at the https://www.sqlbi.com/tools/ site - they are experts on this sort of thing.

 

Where are you getting the 2GB size limit info from? I'm not aware that is the case. It may be that the basic Azure cloud-based deployment has a limited data size, but not the on-prem version... I don't think so?

 

Rob

 

 

Hey @Anonymous,

 

"your PBIX is 5GB then that's not very big, so is likely to be fine. We have much larger datasets and they work well".

That's good to hear!

You mean that the dataset is stored and refreshed on the Report Server, right?
