Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

How can I effeciently process and combine multiple Excel (.xlsb)-files together?

Hello there,

After 11 hours of devistating trial-and-error, Google searching and an increasing area of patterend baldness, I hope that you guys can help point me in a right direction. My attempts have been of varying nature, so I will try and explain my attempts as thoroughly as possible. What I consequently are looking for, is an effecient way of combining multiple Excel documents with the same format into one.

Summary of task

Our purchasing department summarises their contracts with suppliers in Excel files (.xlsb), where I am tasked with combining those indiviudals files into one with Power Query, in order to easily browse the seperate financing agreements.

Structure

There is one folder containing 55 subfolders; one for each supplier. Every subfolder contains an Excel file, which shares the exact same structure as the others.

In Excel, the file.xlsb contains 4 sheets, where only one of them is relevant:

Sheet structure:

  • Row 1: 28 Columns, merged into 3 subheaders.

    • (In PQ it will look like (HEADER, null, null .... HEADER2, null, etc...)

  • Row 2: 28 Columns, each representing an unique header.

  • Row 3 & 4: Mostly nulls, with summed totals in chosen columns

    • These 2 rows complicates matters, as it makes me inable to filter out null values.

  • Row 5+: Raw data and entries, between 5-20 rows for each supplier.

    • Each PQ will have ~200 rows

When I use Power Query to access the worksheet, I find 18 tables, due to filters and hidden sheets. The time to access a given file is incredibly slow and takes approximately one minute per worksheet. Each subsequent action has an approximate loading time of 30+ seconds. My first go-to was to manually clean the 55 worksheets in the background, and eventually append them all into a single table which I would load into an Excel file.

Using advanced editor, I would copy paste this:

  • Navigate to selected table

  • Change type

  • Remove top row

  • Promote first row to headers

  • Create Custom column, with header "Supplier" and = "Supplier name" for every row

  • Reorder Custom column to first row

Then I would manually add a new PQ and copy paste this line of code in the advanced editor, and manually edit the supplier name in the code for the custom column. This worked fine for the first queries, but eventually ended up being progressively slower as I added more queries. Each click would easily take 2-4 minutes, and would crash from time to time, so I am afraid that I won't even be able to append them all in the end, due to memory limitations.

 

I then stumbled upon this guide How to process multiple folders with a single PowerQuery script - Exceed , which seemed to solve the approximate same issue. My plan here would be to have just one table with each row containing the supplier name in one column, and the table of the given sheet in the other, which had been cleaned by a function. My plan from there would be to find a way to either appended or merge the table, or some other way to load that into an Excel document which would allow for easy browsing. I never reached that far unfortunately.

 

By using the following code, I ended up with the following:

let 

    Source = Folder.Files("Q:\INDKØB\HELSE & VELVÆRE\AVISER\2022\01 Leverandørønsker\"), 

    #"Removed Other Columns" = Table.SelectColumns(Source,{"Content", "Name"}), 

    #"Removed Top Rows" = Table.Skip(#"Removed Other Columns",2), 

    #"Extracted Text Before Delimiter" = Table.TransformColumns(#"Removed Top Rows", {{"Name", each Text.BeforeDelimiter(_, "."), type text}}), 

    #"Added Custom" = Table.AddColumn(#"Extracted Text Before Delimiter", "ExcelObjects", each Excel.Workbook([Content],false)), 

    #"Expanded ExcelObjects" = Table.ExpandTableColumn(#"Added Custom", "ExcelObjects", {"Name", "Data"}, {"Name.1", "Data"}), 

    #"Invoked Custom Function" = Table.AddColumn(#"Expanded ExcelObjects", "FXCleans", each FXCleans([Data])) 

in 

    #"Invoked Custom Function" 

 

See the table here:

When I invoke a function on "Data", which contains some basic cleaning, only the "Medie booking 2022H1" has a table. I need to filter out all the others, so I only have the rows with tables. I can't seem to get that to work, where I have tried to:

  • Remove rows with errors, which has no effect

  • Applying Filter to Name.1 comes with following error message:

 

 Unexpected error: External component has thrown an exception.  

 

I am currently stuck, and I hope that you can help me get over this. I have once had luck with the filter by using a stronger RDS, but I cannot seem to transfer that to my normal work computer. I havent had luck with it subsequently.

Ultimately, I am trying to create a table with a revoked column for "Medie booking 2022H1" for each supplier. From there I would like to create a another column that dynamically sets the name for the supplier, either based on the file name or something similar. Lastly, I would like to load that into an appended table, or be able to generate a similar table structure in Power Pivot.

 

I'm sorry if this post got to long. I am an inexperienced Power Query user and troubleshooter. I have however spent 1,5 hour writing this post, so I hope that you will take kindly of it and help point me in the right direction.

5 REPLIES 5
KNP
Super User
Super User

Hi @Anonymous,

 

It sounds like there are a few issues to solve here. Lets deal with one at a time.

Is there anyway you could post a sample file (removing/changing sensitive data obviously)?

It would be much easier if I had the structure to work with.

 

I'm curious, in the code you posted above, what is the purpose of this line?

#"Removed Top Rows" = Table.Skip(#"Removed Other Columns",2),

 

 

Is the below, the main problem you need to solve right now...

KNP_0-1633637699268.png

 

 

Have I solved your problem?
Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;).
chrome-9xf-Zagzel-B

If you found this post helpful, please give Kudos.
It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen.
If you find my signature vaguely amusing, please give Kudos.
KIfp67uy-Sr
Proud to be a Super User!PBI-Super-User-Rank-30x30-1x
Anonymous
Not applicable

Hello and thank you so much for taking the time to read and reply to my post.

 

You should be able to view an anonymizes version of one of the documents here:

https://onedrive.live.com/edit.aspx?resid=714231C5C2874041!51320&ithint=file%2cxlsx&authkey=!AKY1SuP...

 

#"Removed Top Rows" = Table.Skip(#"Removed Other Columns",2),

The purpose of this code, was to remove two Excel files that were lying in the headfolder.

 

The last part you shared is my present obstacle, which I need to get over in order to progress. My ultimate goal is to combine the files into a singular entity.

 

I've had to convert to XLSX to get this to work on my machine otherwise I'd have to install other connectors which I don't want to do but I think the steps will be the same either way.

 

This is not a solution but I'm curious if the below filtering steps are correct to reduce to just the tables you're interested in? If they are, we can look to convert it into a function to make the process dynamic.

 

let
  Source = Folder.Files("H:\My Drive\Power BI\Community Solutions\Files"),
  #"H:\My Drive\Power BI\Community Solutions\Files\_Copy of Example copy xlsx" = Source
    {
      [
        #"Folder Path" = "H:\My Drive\Power BI\Community Solutions\Files\",
        Name           = "Copy of Example copy.xlsx"
      ]
    }
    [Content],
  #"Imported Excel Workbook" = Excel.Workbook(
    #"H:\My Drive\Power BI\Community Solutions\Files\_Copy of Example copy xlsx"
  ),
  #"Filtered Rows" = Table.SelectRows(
    #"Imported Excel Workbook",
    each ([Hidden] = false) and ([Kind] = "Sheet")
  ),
  #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.StartsWith([Name], "Medie")),
  #"Medie booking - 2022H1_Sheet" = #"Filtered Rows1"
    {[Item = "Medie booking - 2022H1", Kind = "Sheet"]}
    [Data]
in
  #"Medie booking - 2022H1_Sheet"
Have I solved your problem?
Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;).
chrome-9xf-Zagzel-B

If you found this post helpful, please give Kudos.
It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen.
If you find my signature vaguely amusing, please give Kudos.
KIfp67uy-Sr
Proud to be a Super User!PBI-Super-User-Rank-30x30-1x

That method of skipping rows may cause you issues if you haven't explicitly set the order first and no it to be consistent.

It is safer to filter on something like file 'extension =...' or something consistent in the file names.

 

Is XLSX an option for you?

Have I solved your problem?
Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;).
chrome-9xf-Zagzel-B

If you found this post helpful, please give Kudos.
It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen.
If you find my signature vaguely amusing, please give Kudos.
KIfp67uy-Sr
Proud to be a Super User!PBI-Super-User-Rank-30x30-1x

Also worth noting, the XLSB format will possibly cause you significant performance issues.

https://exceleratorbi.com.au/importing-xlsb-into-power-bi/ 

 

It would be worth converting to XLSX if at all possible.

Have I solved your problem?
Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;).
chrome-9xf-Zagzel-B

If you found this post helpful, please give Kudos.
It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen.
If you find my signature vaguely amusing, please give Kudos.
KIfp67uy-Sr
Proud to be a Super User!PBI-Super-User-Rank-30x30-1x

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors