cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
Helper II
Helper II

Reading data from CSV files with different columns

I have a CSV file, when stripped down to the barest form, that looks like this:

 

WW33

 

Job,WW31,WW32,WW33
JobA,4,4,4
JobB,2,6,3
JobC,2,4,7

 

 

 

As you can probably guess, this is weekly data, with each Work Week (WW) as column heading. The last column is the most recent WW, and although data goes back only 3 weeks here, in my real data it goes back 12 weeks. So far so good. The next week (WW34 in this case), I will again get a similar file, but the oldest week column (WW31 in this case) eliminated, and the new week added. So, my WW34 and WW35 files would looks like this:

 

WW34

 

Job,WW32,WW33,WW34
JobA,4,4,3
JobB,6,3,4
JobC,4,7,8

 

 

 

WW35

 

Job,WW33,WW34,WW35
JobA,4,3,2
JobB,3,4,1
JobC,7,8,5

 

 

 

Do note that I have no control over the format of data and cannot be changed.

 

My goal is to set up a PowerBI project where it reads data from a certain folder, and whenever someone drops the new weeks' file, it reads data and combines them all into a single data table that should look like this:

 

Expected

 

Job,WW31,WW32,WW33,WW34,WW35
JobA,4,4,4,3,2
JobB,2,6,3,4,1
JobC,2,4,7,8,5

 

 

 

Simply put, it should have as many columns as there are weeks in all files combined, and it shouldn't have duplicate columns. I'm not quite sure how I can achieve this. If I simply set PowerBI to read data from a folder and do 'Combine and Transform', this is what I get, which obviously isn't what I want.

 

Capture.PNG

 

Is there a way to get and format data the way I want?

2 ACCEPTED SOLUTIONS

Accepted Solutions
Highlighted
Super User II
Super User II

Re: Reading data from CSV files with different columns

Yes, and it is actually very simple.  Table.Combine will do exactly what you need without even complaining that you feed it such crappy (sorry, diverse) data.

 

lbendlin_0-1598658514914.png

 

And then you can decide how to handle the nulls, and how to add the source file name (do you actually need that?).  if you want to get fancy you can build a list of files first, then import the first file, then fetch the columns of all subsequent files and only import the "new" (not already present) columns.

 

Otherwise just do a few simple transforms.

 

let
    Source = Table.Combine({WW33,WW34,WW35}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Job"}, "Attribute", "Value"),
    #"Removed Duplicates" = Table.Distinct(#"Unpivoted Other Columns"),
    #"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Attribute]), "Attribute", "Value")
in
    #"Pivoted Column"

 

to arrive at this.

 

lbendlin_0-1598659250777.png

 

View solution in original post

Highlighted
Super User III
Super User III

Re: Reading data from CSV files with different columns

Hi @Sach ,

 

I copied your 3 examples to 3 text files, W33.txt, W34.txt, and W35.txt and I get this output:

edhans_1-1598658634214.png

This should work to infiity if you continue to have that same rolling 3 week pattern. This will blow up though if you have multiple files with the same week numbers but different values. Week 33 in 2020 vs Week 33 in 2021 for example. The column names would need to be different, or a lot more logic would need to go into this.

 

That said, here is the M code:

 

let
    Source = Folder.Files("C:\Users\Ed Hansberry\OneDrive\Work Stuff\Power BI Forum Examples\Test Files\Weekly Data"),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Transform File"}),
    AllColumnsNames = 
        Table.ColumnNames(
            Table.Combine(#"Removed Other Columns1"[Transform File])
        ),
    #"Expanded Transform File" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", AllColumnsNames, AllColumnsNames),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded Transform File", {"Job"}, "Column", "Amount"),
    #"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"Amount", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Job", "Column"}, {{"Amount", each List.Average([Amount]), type nullable number}}),
    #"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Column]), "Column", "Amount")
in
    #"Pivoted Column"

 

This also assumes when you do the combine operation, the "Transform Sample File" step only has the promote headers step, not the changed type step. That will wreck it too.

 

I am going to share my PBIX file here because you need to see the full thing including the combine files magic Power Query does. I edited the Transform Sample File query to suit my needs.

 

The key to this is the AllColumnNames step, which just keeps getting a larger and larger list of columns as the weeks roll on. I guess the Table.ColumnNames function does a unique union for me. I was expecting to have to wrap it in List.Distinct but it wasn't necessary. Those column names are what are used later when the table expands, vs the normal hardcoded columns you get when you manually expand a table column.

 

Also, my first table has a lot of duplicates. I get rid of those in the Grouped Rows. That step assumes the values are the same. So the Week 33 file for Job b in week 33 has 3. As long as the week 34 and 35 files also have 4 for job B in week 33, you are good. If that changes, then this will not work. I am using MIN() to get that number. MAX() and AVERAGE() will all do the same thing. If you are expecting that to change, MIN/MAX/AVERAGE woulda all return different results obviously, and none of them "4". That would seem to be to be a source data issue, and we'd have to discuss how to handle that scenario.

 

Here is my PBIX file and the 3 text files, in a single zip.

 

You'll need to unzip, then change the source in Power Query to point to where those TXT files are.

 

How to use M code provided in a blank query:
1) In Power Query, select New Source, then Blank Query
2) On the Home ribbon, select "Advanced Editor" button
3) Remove everything you see, then paste the M code I've given you in that box.
4) Press Done
5) See this article if you need help using this M code in your model.

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

15 REPLIES 15
Highlighted
Super User II
Super User II

Re: Reading data from CSV files with different columns

Yes, and it is actually very simple.  Table.Combine will do exactly what you need without even complaining that you feed it such crappy (sorry, diverse) data.

 

lbendlin_0-1598658514914.png

 

And then you can decide how to handle the nulls, and how to add the source file name (do you actually need that?).  if you want to get fancy you can build a list of files first, then import the first file, then fetch the columns of all subsequent files and only import the "new" (not already present) columns.

 

Otherwise just do a few simple transforms.

 

let
    Source = Table.Combine({WW33,WW34,WW35}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Job"}, "Attribute", "Value"),
    #"Removed Duplicates" = Table.Distinct(#"Unpivoted Other Columns"),
    #"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Attribute]), "Attribute", "Value")
in
    #"Pivoted Column"

 

to arrive at this.

 

lbendlin_0-1598659250777.png

 

View solution in original post

Highlighted
Super User III
Super User III

Re: Reading data from CSV files with different columns

Hi @Sach ,

 

I copied your 3 examples to 3 text files, W33.txt, W34.txt, and W35.txt and I get this output:

edhans_1-1598658634214.png

This should work to infiity if you continue to have that same rolling 3 week pattern. This will blow up though if you have multiple files with the same week numbers but different values. Week 33 in 2020 vs Week 33 in 2021 for example. The column names would need to be different, or a lot more logic would need to go into this.

 

That said, here is the M code:

 

let
    Source = Folder.Files("C:\Users\Ed Hansberry\OneDrive\Work Stuff\Power BI Forum Examples\Test Files\Weekly Data"),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Transform File"}),
    AllColumnsNames = 
        Table.ColumnNames(
            Table.Combine(#"Removed Other Columns1"[Transform File])
        ),
    #"Expanded Transform File" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", AllColumnsNames, AllColumnsNames),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded Transform File", {"Job"}, "Column", "Amount"),
    #"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"Amount", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Job", "Column"}, {{"Amount", each List.Average([Amount]), type nullable number}}),
    #"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Column]), "Column", "Amount")
in
    #"Pivoted Column"

 

This also assumes when you do the combine operation, the "Transform Sample File" step only has the promote headers step, not the changed type step. That will wreck it too.

 

I am going to share my PBIX file here because you need to see the full thing including the combine files magic Power Query does. I edited the Transform Sample File query to suit my needs.

 

The key to this is the AllColumnNames step, which just keeps getting a larger and larger list of columns as the weeks roll on. I guess the Table.ColumnNames function does a unique union for me. I was expecting to have to wrap it in List.Distinct but it wasn't necessary. Those column names are what are used later when the table expands, vs the normal hardcoded columns you get when you manually expand a table column.

 

Also, my first table has a lot of duplicates. I get rid of those in the Grouped Rows. That step assumes the values are the same. So the Week 33 file for Job b in week 33 has 3. As long as the week 34 and 35 files also have 4 for job B in week 33, you are good. If that changes, then this will not work. I am using MIN() to get that number. MAX() and AVERAGE() will all do the same thing. If you are expecting that to change, MIN/MAX/AVERAGE woulda all return different results obviously, and none of them "4". That would seem to be to be a source data issue, and we'd have to discuss how to handle that scenario.

 

Here is my PBIX file and the 3 text files, in a single zip.

 

You'll need to unzip, then change the source in Power Query to point to where those TXT files are.

 

How to use M code provided in a blank query:
1) In Power Query, select New Source, then Blank Query
2) On the Home ribbon, select "Advanced Editor" button
3) Remove everything you see, then paste the M code I've given you in that box.
4) Press Done
5) See this article if you need help using this M code in your model.

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

Highlighted
Memorable Member
Memorable Member

Re: Reading data from CSV files with different columns

if w33, w34 and w45 are your tables, this makes the Job (A, B and so on ...) 😁

 

= let
    Source = w35,
    Cw35=Table.TransformRows(w35, each w33{[Job=_[Job]]}&w34{[Job=_[Job]]}&_)
in
   Table.FromRecords(Cw35)

 

this is only a sketch of an idea tath coul be modified to manage more general similar situations

Highlighted
Community Support
Community Support

Re: Reading data from CSV files with different columns

Hi @Sach ,

 

Is this problem solved?

 

If it is solved, please always accept the replies making sense as solution to your question so that people who may have the same question can get the solution directly.

 

If not, please let me know.

 

 

Best Regards,

Icey

Highlighted
Solution Supplier
Solution Supplier

Re: Reading data from CSV files with different columns

@Icey 

He has made two posts about this problem, and another post has been marked as solved. Here is the link:

Getting data from CSV files that have slightly different columns each week 

Highlighted
Super User III
Super User III

Re: Reading data from CSV files with different columns

Ugh. That is super frustrating @Sach 

 

Please don't create duplicate posts. A lot of time was spent in this thread solving the issue. Please be more careful in the future.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
Highlighted
Helper II
Helper II

Re: Reading data from CSV files with different columns

Sorry about the confusing guys.

When I created this post, for some reason the community marked it as spam, and wasn't allowed to post. So I created the other post, and this didn't even show up on my profile for a while.

 

Thanks for the detailed answer, I'll go thorugh it! Really appreciate you guys help.

Highlighted
Super User III
Super User III

Re: Reading data from CSV files with different columns

Thanks for that info @Sach - I will alert the mods on this. The spam detector may be a bit aggressive.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
Highlighted
Helper II
Helper II

Re: Reading data from CSV files with different columns

@edhans thank you! This is pretty detailed and the attached PowerBI file helped me a lot.

Helpful resources

Announcements
Get Ready for Power BI Dev Camp

Get Ready for Power BI Dev Camp

Mark your calendars and join us for our next Power BI Dev Camp!.

Community Conference

Power Platform Community Conference

Check out the on demand sessions that are available now!

secondImage

Power Platform 2020 release wave 2 plan

Features releasing from October 2020 through March 2021

Tech Marathon

Maratón de Soluciones de Negocio Microsoft

Una semana de contenido con +100 sesiones educativas, consultorios, +10 workshops Premium, Hackaton, EXPO, Networking Hall y mucho más!

Top Solution Authors
Top Kudoed Authors