swethabonthu
Frequent Visitor

Increased size of PBIX file with dataflow

Hi,
Initially I imported 3.8 GB of data from BigQuery. After building the report, the data was compressed down to 1.2 GB, which is great! But to make the refresh faster on the service, I switched to a dataflow. For the same report with the same data, I now see the PBIX file has grown to 2.8 GB when using the dataflow. The issue is that my dashboard is updated every day, and it will soon exceed the 3 GB dashboard size limit of my P1 Premium capacity.

I understand that this is because the VertiPaq engine in Desktop is much better at data compression than the CSV format a dataflow stores. I removed all columns that are not required and made sure the precision and data types are in a format that compresses well. Do you think sorting a few columns would help compression? Could you please suggest better compression techniques when using a dataflow?
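Sorting can help, because run-length-style compression benefits from long runs of repeated values. As a rough analogy only (using Python's zlib as a stand-in for VertiPaq's run-length encoding, with made-up sample data), you can see how much better the same low-cardinality column compresses once it is sorted:

```python
import random
import zlib

random.seed(0)

# Simulate a low-cardinality column of 100,000 rows (e.g. a region code).
values = [random.choice(["NORTH", "SOUTH", "EAST", "WEST"]) for _ in range(100_000)]

unsorted_bytes = ",".join(values).encode()
sorted_bytes = ",".join(sorted(values)).encode()

# Sorting groups identical values into long runs, which compress far better.
unsorted_size = len(zlib.compress(unsorted_bytes))
sorted_size = len(zlib.compress(sorted_bytes))

print(f"unsorted: {unsorted_size} bytes, sorted: {sorted_size} bytes")
```

The sorted version compresses to a small fraction of the unsorted one. VertiPaq's actual encoding is different and more sophisticated, but the principle (order the data to create runs) is the same.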

1 ACCEPTED SOLUTION
V-lianl-msft
Community Support

Hi @swethabonthu ,

 

Dataflows are Power Query queries designed online with the output stored in Azure Data Lake Storage Gen2 as CSV files. 

Behind the scenes, the Power BI engine uses a columnar in-memory database called VertiPaq; CSV is compressed very differently.
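One reason a columnar engine compresses so well is per-column dictionary (hash) encoding: each distinct value is stored once, and the column itself becomes a list of small integer indexes. A toy sketch of the idea (illustrative data, not VertiPaq's actual implementation):

```python
# Toy illustration of dictionary encoding, one of the techniques a
# columnar engine like VertiPaq applies per column.
column = ["Active", "Closed", "Active", "Active", "Closed", "Pending"]

# Build a dictionary of distinct values; replace each value by its index.
dictionary = {}
encoded = []
for v in column:
    if v not in dictionary:
        dictionary[v] = len(dictionary)
    encoded.append(dictionary[v])

print(dictionary)  # {'Active': 0, 'Closed': 1, 'Pending': 2}
print(encoded)     # [0, 1, 0, 0, 1, 2]
```

A CSV file, by contrast, stores every value as full text on every row, which is why the same data takes much more space in a dataflow's storage than in a PBIX model.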

I think you could consider optimizing the dataflow at design time.

Refer to:

https://data-marc.com/2020/06/09/how-to-keep-your-power-bi-dataflows-organized-and-optimized/ 

https://docs.microsoft.com/en-us/power-bi/transform-model/service-dataflows-best-practices 

 

Best Regards,
Liang
If this post helps, please consider accepting it as the solution to help other members find it more quickly.

View solution in original post

3 REPLIES
parry2k
Super User

@swethabonthu can you turn off Auto date/time, if you haven't already? Do you have many columns with mostly unique values?

 

You can connect to your model from DAX Studio and analyze your tables and columns with VertiPaq Analyzer to see which ones are not compressing well (perhaps because of many unique values), and then go from there to see what you can tweak. It is very hard to say without analyzing your model.
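The key metric VertiPaq Analyzer surfaces is cardinality: columns where most values are distinct barely compress at all. As a hypothetical back-of-the-envelope check (illustrative column data, not a real model), the idea is simply to compare distinct count to row count per column:

```python
# Hypothetical quick check for high-cardinality columns, the usual
# suspects for poor VertiPaq compression (column data is made up).
columns = {
    "OrderId": list(range(10_000)),          # unique on every row
    "Status":  ["Open", "Closed"] * 5_000,   # only 2 distinct values
}

for name, values in columns.items():
    ratio = len(set(values)) / len(values)
    print(f"{name}: {ratio:.0%} distinct")
```

A column like `OrderId` (100% distinct) gains almost nothing from dictionary or run-length encoding, while `Status` compresses to almost nothing. Dropping or splitting high-cardinality columns is usually the biggest single win.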

 

Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂

@parry2k, I did turn off Auto date/time. I have tried everything to compress my data and, as mentioned, I'm quite happy with the VertiPaq compression when importing data from Google BigQuery.

Also, I've already analysed the model in DAX Studio. My issue is with compression when using a dataflow. I came across many sources that mention a dataflow does not compress as well as VertiPaq. When importing data directly from BigQuery or any other source, VertiPaq automatically sorts the columns for better compression, but this does not happen when I use a dataflow. I'm interested specifically in data compression when using a dataflow.
