dpws88
Frequent Visitor

Azure Blob Storage Gzip files result in double sized download?

I have some data stored in Azure Blob Storage as gzipped CSV files.

I'm pulling the data into Power BI Desktop and using a function with Binary.Decompress to decompress the files.

When I refresh the data, it reports downloading far more than is actually in storage: a blob container that should hold around 500-600 MB of files results in a reported download of well over 1 GB. The queries are as follows:


Unzip function (invoked as fnDecompress in the query below):


(gZipFile) => 
let 
    #"Unzip" = Binary.Decompress(gZipFile, Compression.GZip),
    #"CSV" = Csv.Document(#"Unzip"),
    #"Headers" = Table.PromoteHeaders(#"CSV", [PromoteAllScalars=true])
in
    #"Headers"

Blob Retrieval: 

 

let
    Source = AzureStorage.Blobs("apdigitalproducts"),
    #"blobcontainer" = Source{[Name="googleanalyticsdata"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(#"blobcontainer",{"Content", "Name"}),
    #"Invoked Custom Function" = Table.AddColumn(#"Removed Other Columns", "Data", each fnDecompress([Content])),
    #"Removed Columns1" = Table.RemoveColumns(#"Invoked Custom Function",{"Content"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Columns1", "Data", {"ga:visitorType", "ga:sourceMedium", "ga:country", "ga:landingPagePath", "ga:date", "ga:deviceCategory", "ga:fullReferrer", "ga:newUsers", "ga:sessions", "ga:pageviews", "ga:avgSessionDuration", "ga:avgTimeOnpage", "ga:users", "ga:pageviewsPerSession", "ga:sessionDuration", "ga:timeOnPage"}, {"ga:visitorType", "ga:sourceMedium", "ga:country", "ga:landingPagePath", "ga:date", "ga:deviceCategory", "ga:fullReferrer", "ga:newUsers", "ga:sessions", "ga:pageviews", "ga:avgSessionDuration", "ga:avgTimeOnpage", "ga:users", "ga:pageviewsPerSession", "ga:sessionDuration", "ga:timeOnPage"})
in
    #"Expanded Data"

Any ideas? Are the blobs being decompressed at the server side or something? I cannot work out at all what is going on here.
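
One way to narrow this down is to compare the stored (compressed) size of each blob with its decompressed size, then check which total the reported download is closer to. The following is only a minimal sketch, assuming the same storage account and container as in the query above. The step names and the CompressedBytes and DecompressedBytes column names are made up for illustration. Note that this query reads every blob's content itself, so it adds its own download traffic.

```
let
    // Same account and container as the blob retrieval query above
    Source = AzureStorage.Blobs("apdigitalproducts"),
    Container = Source{[Name="googleanalyticsdata"]}[Data],
    Kept = Table.SelectColumns(Container, {"Content", "Name"}),
    // Size in bytes of each blob as stored (still gzipped)
    WithCompressed = Table.AddColumn(Kept, "CompressedBytes",
        each Binary.Length([Content]), Int64.Type),
    // Size in bytes of the same blob after gzip decompression
    WithDecompressed = Table.AddColumn(WithCompressed, "DecompressedBytes",
        each Binary.Length(Binary.Decompress([Content], Compression.GZip)), Int64.Type),
    // Drop the binary column so only names and sizes remain
    Sizes = Table.RemoveColumns(WithDecompressed, {"Content"})
in
    Sizes
```

Summing the two columns gives the total stored size and the total decompressed size, which can then be compared against the figure shown during refresh.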

5 REPLIES

@dpws88 Random thought: in Power BI Desktop, date hierarchy tables are auto-generated for every date column you have. Chris Webb wrote a blog post about this that describes the behaviour and how to disable it in the options.

It might be the issue.


Looking for more Power BI tips, tricks & tools? Check out PowerBI.tips, the site I co-own with Mike Carlo. Also, if you are near SE WI, join our PUG: Milwaukee Brew City PUG.

Unfortunately, I don't think it is this. The issue is not the size of the pbix file but that the reported download from Azure is inflated: when I click refresh, it reports downloading double the amount of data that exists, before it does any of the modelling, date table creation, etc.
dpws88
Frequent Visitor

Double post

@dpws88 Same answer as in the post linked below; try it out and reply in one of the threads. Please avoid double posting the same question.

http://community.powerbi.com/t5/Integrations-with-Files-and/Azure-Blob-Storage-Gzip-files-result-in-...



Apologies, I assumed that it was lost because I hadn't registered, and then it appeared a few hours later!
