Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
anjbauer1993
Frequent Visitor

Data loaded in from Azure Blob Storage double the size of the actual files

I'm pulling in a large set of PDF files from a blob container in Azure, 4,735 files totalling 476 MB. The main issue I'm having is that when refreshing, or even just adding a step in Power Query, over 900 MB of data is loaded. It takes a very long time to refresh and apply any new steps in Power Query. I've attached the relavent Power Query code below. This report is just intended to identify any corrupted PDFs or PDFs that are in the wrong format so that they can be fixed for use in another report so this query is relatively simple.

// simpleLoad
let
    Source = (#"Sample File Parameter1" as binary) => let
        Source = Pdf.Tables(#"Sample File Parameter1"),
        Table002 = Source{[Id="Table002"]}[Data]

    in
        Table002
in
    Source
// timesheets (2)
let
    Source = AzureStorage.Blobs(acct),
    timesheets1 = Source{[Name="timesheets"]}[Data],
    #"buffer" = Table.Buffer(timesheets1),
    #"Invoke Custom Function1" = Table.AddColumn(#"buffer", "simpleLoad", each #"simpleLoad"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "simpleLoad"}),
    #"Kept Errors" = Table.SelectRowsWithErrors(#"Removed Other Columns1", {"simpleLoad"}),
    #"Added Custom" = Table.AddColumn(#"Kept Errors", "Error", each try [simpleLoad]),
    #"Expanded Error" = Table.ExpandRecordColumn(#"Added Custom", "Error", {"Error"}, {"Error.1"}),
    #"Expanded Error.1" = Table.ExpandRecordColumn(#"Expanded Error", "Error.1", {"Reason", "Message"}, {"Reason", "Message"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Error.1",{"simpleLoad"})
in
    #"Removed Columns"

 

 

I found another post that deals with the same issue but with gzip files, it was recommended there that you disable auto generated tables for date hierarchies, which did not change anything in my report. I added the buffer step in to hopefully make adding steps go quicker in Power Query but that also didn't seem to help.

3 REPLIES 3
nowreena21
Frequent Visitor

Hello @anjbauer1993 ,
Were you able to figure out why your data load was doubled in PBI, I am facing similar issue in my load from Azure blob storage. I am reading a csv file which is around 4 GB but PBI load more than 8 GB for it. 
Can anyone help on it?

v-yuta-msft
Community Support
Community Support

@anjbauer1993 ,

 

From the power query you have posted, it seems like you have expend the Error column in several steps. I'm not sure but this could be the cause of this issue, so for troubleshooting you may remove these steps and check if the data size is normal.

let
    Source = AzureStorage.Blobs(acct),
    timesheets1 = Source{[Name="timesheets"]}[Data],
    #"buffer" = Table.Buffer(timesheets1),
    #"Invoke Custom Function1" = Table.AddColumn(#"buffer", "simpleLoad", each #"simpleLoad"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "simpleLoad"}),
    #"Kept Errors" = Table.SelectRowsWithErrors(#"Removed Other Columns1", {"simpleLoad"}),
    #"Added Custom" = Table.AddColumn(#"Kept Errors", "Error", each try [simpleLoad]),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom", {"simpleLoad"})
in
    #"Removed Columns"

 

Community Support Team _ Jimmy Tao

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

@v-yuta-msft

Unfortunately that did not do the trick. I've created a sample dataset in order to move through things quicker, now only 69 MB. I also created a new query that only pulls the files in and does no transformation, but it's still downloading 132 MB when refreshing. The Power Query code is below

 

// timesheets-test
let
    Source = AzureStorage.Blobs(acct),
    #"timesheets-test1" = Source{[Name="timesheets-test"]}[Data]
in
    #"timesheets-test1"

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors