dpws88
Frequent Visitor

Azure Blob Storage Gzip files result in double sized download?

I have some data stored in Azure blob storage, as gzip'd CSV files. 

 

I'm then pulling the data into Power BI desktop and using a function with Binary.Decompress to decompress the files. 

 

When I refresh the data, it reports downloading far more than is actually in storage: a blob container that should hold around 500-600 MB of files results in a reported download of well over 1 GB. The queries are as follows:

 

Unzip function (called as fnDecompress in the retrieval query below):

 

(gZipFile) => 
let 
    #"Unzip" = Binary.Decompress(gZipFile, Compression.GZip),
    #"CSV" = Csv.Document(#"Unzip"),
    #"Headers" = Table.PromoteHeaders(#"CSV", [PromoteAllScalars=true])
in
    #"Headers"

Blob Retrieval: 

 

let
    Source = AzureStorage.Blobs("apdigitalproducts"),
    #"blobcontainer" = Source{[Name="googleanalyticsdata"]}[Data],
    #"Removed Other Columns" = Table.SelectColumns(#"blobcontainer",{"Content", "Name"}),
    #"Invoked Custom Function" = Table.AddColumn(#"Removed Other Columns", "Data", each fnDecompress([Content])),
    #"Removed Columns1" = Table.RemoveColumns(#"Invoked Custom Function",{"Content"}),
    #"Expanded Data" = Table.ExpandTableColumn(
        #"Removed Columns1",
        "Data",
        {"ga:visitorType", "ga:sourceMedium", "ga:country", "ga:landingPagePath", "ga:date", "ga:deviceCategory", "ga:fullReferrer", "ga:newUsers", "ga:sessions", "ga:pageviews", "ga:avgSessionDuration", "ga:avgTimeOnpage", "ga:users", "ga:pageviewsPerSession", "ga:sessionDuration", "ga:timeOnPage"},
        {"ga:visitorType", "ga:sourceMedium", "ga:country", "ga:landingPagePath", "ga:date", "ga:deviceCategory", "ga:fullReferrer", "ga:newUsers", "ga:sessions", "ga:pageviews", "ga:avgSessionDuration", "ga:avgTimeOnpage", "ga:users", "ga:pageviewsPerSession", "ga:sessionDuration", "ga:timeOnPage"}
    )
in
    #"Expanded Data"

Any ideas? Are the blobs being decompressed at the server side or something? I cannot work out at all what is going on here.
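
One way to narrow this down is to measure the sizes directly in Power Query. The following is only a diagnostic sketch, not a fix: it reuses the storage account and container names from the queries above, while the step names (Container, WithSizes, Totals) and field names are made up for illustration. It totals the compressed and decompressed byte counts so they can be compared with the figure shown during refresh.

```
let
    Source = AzureStorage.Blobs("apdigitalproducts"),
    Container = Source{[Name="googleanalyticsdata"]}[Data],
    // For each blob, record its size as stored (gzip) and its size after decompression.
    WithSizes = Table.AddColumn(Container, "Sizes", each
        let
            // Binary.Buffer holds the blob in memory; the intent is to measure and
            // decompress the same bytes without going back to the source.
            Compressed = Binary.Buffer([Content]),
            Decompressed = Binary.Decompress(Compressed, Compression.GZip)
        in
            [
                CompressedBytes = Binary.Length(Compressed),
                DecompressedBytes = Binary.Length(Decompressed)
            ]),
    // Sum both measurements across all blobs and convert bytes to megabytes.
    Totals = [
        CompressedMB = List.Sum(List.Transform(WithSizes[Sizes], each [CompressedBytes])) / 1048576,
        DecompressedMB = List.Sum(List.Transform(WithSizes[Sizes], each [DecompressedBytes])) / 1048576
    ]
in
    Totals
```

If the reported download is close to DecompressedMB, the refresh dialog may be counting bytes after decompression; if it is closer to twice CompressedMB, the blobs may be being read more than once. Both readings are hypotheses to test, not confirmed behaviour.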

5 REPLIES

@dpws88 Random thought: Power BI Desktop automatically generates a hidden date hierarchy table for every date column you have. Chris Webb wrote a blog post that describes this behaviour and how to disable it in the options.

It might be the issue.


Looking for more Power BI tips, tricks & tools? Check out PowerBI.tips the site I co-own with Mike Carlo. Also, if you are near SE WI? Join our PUG Milwaukee Brew City PUG

Unfortunately I don't think that is it. The issue is not the size of the pbix file, but that the downloads reported from Azure are inflated: when I click Refresh, it reports downloading double the amount of data that exists, before it does any of the modelling, date table creation, etc.
dpws88
Frequent Visitor

Double post

@dpws88 Same answer as this post, try it out and let one of them know. It is appreciated if you don't double post the same thing.

http://community.powerbi.com/t5/Integrations-with-Files-and/Azure-Blob-Storage-Gzip-files-result-in-...



Apologies, I assumed the first post was lost because I hadn't registered, and then it appeared a few hours later!
