Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
Jhatch
New Member

Single Column nearly doubling the file size (500MB increase from only 1 column!)

Hi all,

Background:
I have a power Bi file that contains the following tables:

1 Promotion Detail table - 30 Million Rows
1 Sales detail table - 29 Million Rows

1 Transaction table - Summary of Sales table

And then a few date, location and product dimension tables.

 

Sales and Promo tables do not by default have a unique key, so I created the transaction summary to be able to create a relationship using Transaction ID.

 

However, this means I cannot create a relationship directly between sales and promo, without first going through the transaction table.

 

To fix this I created a unique ID by combining location, Product ID, and Transaction ID. This key is unique for every row in the sales table, with promo table having duplicates. Now i can create the relationship I was looking for.


Problem:
The file without the unique ID is about 800MB. Once I add the unique ID to both tables, the file becomes 1.9GB!!! I am on Pro so this is preventing me from being able to publish.


Note: Each column takes up just over 500MB (just over 1GB from both), and then once I add the relationship between the two, it adds about 100mb.

I cannot figure out why simply adding two columns is over doubling the file size. 

 

I have tried the following:

creating the columns in Dax
creating the columns in power query

Having columns created by data team in sql before they are imported into PBI

 

All 3 cases casue the exact same file size issue.

 

The unique IDs are about 25 characters each, incase that is related to the issue.

 

Does anyone have any idea why this is happening or what I can do about it? Seems so odd.

 

Thank you in advance for any advice!

 

2 REPLIES 2
Brunner_BI
Super User
Super User

How did you create the unique ID? Maybe some kind of index would be better than using a text/number that already starts at XXXXXX and thus consumes more storage?

HotChilli
Super User
Super User

I don't think it's odd.  It's all about what can be compressed, and in powerbi, it's all about columns. Unique id's don't compress therefore you get a large file size increase for one or two columns with lots of rows.

Are you storing this as a text field or a number? Number would be better but maybe there are alternative options to this strategy.

However, if you are using import mode for quite large datasets, powerbi can struggle.  You might want to think about different granularity in power query (i.e. aggregating early) instead of details lines.

 

Helpful resources

Announcements
LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.