Brown
Frequent Visitor

Data Types uncompressed storage size

Hello,

I have been doing some analysis on uncompressed storage sizes for the data types on my main fact table. The table currently has ~37M rows, and I am estimating the point at which I will need to roll up historical data to save space. The answer seems to be around 50M rows... maybe.

I am also interested in using dataflows to store the data, and wanted to know how efficiently it would be stored compared to SQL Server.

Based on my analysis I have some data type questions I need clarification on:
1) The smallest integer type in Dataflows and Power Query is Int64. Does this mean that regardless of the integer size exported from SQL (e.g. tinyint, smallint), the uncompressed storage in the Power BI table will be 8 bytes? I have tested integer values of 1, 1000 and 100000: 1000 and 100000 seem to require 8 bytes, whereas 1 requires only 2 bytes. If there are rules for how integer values that fit within Int64 are sized, it would be good to know what they are.
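For what it's worth, the 2-byte result for a column of 1s looks like the engine picking the smallest fixed width whose range fits the column's values. The sketch below encodes that guessed rule; note that it predicts 2 bytes for 1000 as well, so the observed 8 bytes suggests the real rule is more complicated. This is an assumption for reasoning about the numbers, not documented engine behaviour:

```python
def estimated_bytes_per_value(value: int) -> int:
    """Hypothetical rule: a column is stored at the smallest of 2, 4,
    or 8 bytes whose signed range holds every value in it. This is an
    assumption, not documented storage-engine behaviour."""
    for width in (2, 4, 8):
        # signed range for the given byte width
        if -(1 << (8 * width - 1)) <= value < (1 << (8 * width - 1)):
            return width
    return 8

print(estimated_bytes_per_value(1))       # -> 2
print(estimated_bytes_per_value(1000))    # -> 2 (observed: 8)
print(estimated_bytes_per_value(100000))  # -> 4 (observed: 8)
```

Since the observed sizes for 1000 and 100000 disagree with a pure range-based rule, the actual sizing presumably also depends on the encoding the engine chooses, which is exactly what needs documenting.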

2) Regarding the Boolean data type: I would expect this to be a 2-byte type, and that seems to be how it behaves in the table (I can import 5 billion of them without hitting the uncompressed data limit). However, when downloading from Dataflows into Power BI Desktop, this data type averages 12 bytes. Leaving aside 2 bytes for delimiters/overhead/metadata, it appears Dataflows exports them as 10 bytes. Is this the case? It seems inefficient.
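One possible (and unverified) explanation for the ~10 bytes: if the dataflow output is stored as CSV text, each Boolean travels as the literal True/False plus a delimiter rather than as a binary flag. A purely illustrative measurement of that per-value text cost:

```python
import csv
import io

# Illustrative only: measure what a Boolean costs when round-tripped
# through CSV text, one plausible (unconfirmed) on-disk format for
# dataflow output.
values = [True, False] * 500  # a 50/50 mix of 1,000 Booleans

buf = io.StringIO()
writer = csv.writer(buf)
for v in values:
    writer.writerow([v])  # each row becomes "True\r\n" or "False\r\n"

total_bytes = len(buf.getvalue().encode("utf-8"))
print(total_bytes / len(values))  # -> 6.5 bytes per value on average
```

Plain text encoding alone does not reach 10 bytes, so if this explanation is on the right track, the remainder would have to be quoting, wider literals, or other per-field overhead in whatever format Dataflows actually uses.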

3) As mentioned in point 2 above, when Power BI Desktop downloads from Dataflows there is an additional ~2 bytes per record. Presumably this doesn't count towards the 10 GB uncompressed data limit? Also, is there an exact amount per record which we can use for planning purposes?

4) Finally, as a general comment: as a Power BI Pro user I find I spend too much of my time on issues like this, worrying about data budgets instead of report development and data analysis. I understand the need for limits like the 10 GB uncompressed data limit, but I don't understand having these rules and then providing very little functionality or documentation to help people stay within them. It's completely confusing to export 3 GB of data from SQL Server, download the same data from a dataflow at 11 GB (which is only 5 GB over the network), and then have it use 8 GB in the table (maybe? Who can tell).

 

Any assistance someone could provide would be greatly appreciated.

 

Regards,

 

3 REPLIES
ctmullins
Frequent Visitor

Two and a half years later, and still no updated information on this topic?

 

I am struggling with exactly the same thing - we are limited to Pro licenses at our (non-profit) organization due to cost, which means I have to stay under the 10 GB limit. And I have ZERO visibility into how close I am to that limit, or how much storage space is actually consumed by the various data objects in my datasets.

 

Come on Microsoft!  I know you guys can address this!

v-xuding-msft
Community Support

Hi @Brown ,

This similar thread may help you a bit:

how to determine uncompressed table size in Desktop

 

Best Regards,

Xue Ding

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 


Hi @v-xuding-msft ,  

Thanks for your reply. I had read that post (incidentally, that poster didn't receive an answer to his question) as well as other posts related to uncompressed data issues. I have also done my own testing/research before coming to the forums. The standard answer is to normalise data, remove columns, etc., which isn't the point.

 

The issue is that we have a limit applied, but no tools to monitor whether we are approaching it. We can only do theoretical calculations, or upload data until we hit an uncompressed-data-related error message.

 

Is there not some documentation Microsoft could provide on how table size is calculated on load? Given that most data is Int64 or fixed decimal, is it 8 bytes × the number of records? Or does it also depend on the integer value?
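Until such documentation exists, a back-of-envelope calculator is about the best available. The byte widths below are assumptions inferred from the observations in this thread (8 bytes for Int64 and fixed decimal, 2 for Boolean), not published figures, and text columns are left out because their width varies per value:

```python
# Assumed uncompressed widths in bytes; these are guesses based on
# the observations in this thread, not documented figures.
TYPE_BYTES = {
    "int64": 8,
    "fixed_decimal": 8,
    "datetime": 8,
    "boolean": 2,
}

def estimate_uncompressed_gb(row_count: int, columns: dict) -> float:
    """columns maps column name -> assumed type key in TYPE_BYTES."""
    bytes_per_row = sum(TYPE_BYTES[t] for t in columns.values())
    return row_count * bytes_per_row / 1024**3

# Hypothetical example: a 37M-row fact table with four Int64 keys
# and two numeric measures.
fact = {"date_key": "int64", "product_key": "int64",
        "store_key": "int64", "customer_key": "int64",
        "qty": "int64", "amount": "fixed_decimal"}
print(round(estimate_uncompressed_gb(37_000_000, fact), 2))  # -> 1.65
```

Even if the per-type widths turn out to be wrong, having Microsoft publish the correct constants would make exactly this kind of capacity planning possible.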

 

Regards,