cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
Regular Visitor

Power Query Getting Header Names and File Names from Different CSV files

Hello, 

 

I have several different CSV files in a folder. I want to extract Header names and File Names from these files using Power Query.

 

End result should give me a table with 2 columns: FileName, HeaderName.

 

Also,

  • source files have different names
  • some header names in source files are different, some are identical
  • source files have different number of columns
  • source files are in tabular form (first row contains header names)

 

I thought I had it figured out using the code below. But it turns out some files have more than 21 columns in them and if I use the method below it cuts of at 21 and inludes only maximum of 21 headers per file. Also I am pretty sure there should be a better way to do this then DemotingHeaders to grab FirstRow.  Unfortunately I don't have access to SQL at the moment which would have simplified this task. Thank you! 

 

 

let 
 Source = Folder.Files("C:\Users\????\????\Desktop\?????\Data"), 
 #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
 #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from Crossphase_data", each #"Transform File from Crossphase_data"([Content])), 
 #"Removed Columns" = Table.RemoveColumns(#"Invoke Custom Function1",{"Extension", "Date accessed", "Date modified", "Date created", "Attributes", "Folder Path"}), #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Transform File from Crossphase_data", "Custom"}}),
 #"Demote Headers" = Table.TransformColumns(#"Removed Columns", {{"Transform File from Crossphase_data", each Table.DemoteHeaders(_)}}), 
 #"First Row" = Table.TransformColumns(#"Demote Headers", {{"Transform File from Crossphase_data", each Table.First(_)}}), 
 #"Expanded Transform File from Crossphase_data" = Table.ExpandRecordColumn(#"First Row", "Transform File from Crossphase_data", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21"}, {"Transform File from Crossphase_data.Column1", "Transform File from Crossphase_data.Column2", "Transform File from Crossphase_data.Column3", "Transform File from Crossphase_data.Column4", "Transform File from Crossphase_data.Column5", "Transform File from Crossphase_data.Column6", "Transform File from Crossphase_data.Column7", "Transform File from Crossphase_data.Column8", "Transform File from Crossphase_data.Column9", "Transform File from Crossphase_data.Column10", "Transform File from Crossphase_data.Column11", "Transform File from Crossphase_data.Column12", "Transform File from Crossphase_data.Column13", "Transform File from Crossphase_data.Column14", "Transform File from Crossphase_data.Column15", "Transform File from Crossphase_data.Column16", "Transform File from Crossphase_data.Column17", "Transform File from Crossphase_data.Column18", "Transform File from Crossphase_data.Column19", "Transform File from Crossphase_data.Column20", "Transform File from Crossphase_data.Column21"}) 
in 
#"Expanded Transform File from Crossphase_data"

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Hello @v3ntrue 

 

i saw i just yet (and i was also advised by @jborro - thanks for this) that I didn't realize that the column numbers were hardcoded in the function.

So you should use this code instead 

 

let
    Source = Folder.Files("YourPathGoesHere"),
    GetHeaderOfCSV = (data)=>
    let
        Source = Csv.Document(data,[Delimiter=";" Encoding=65001, QuoteStyle=QuoteStyle.None]),
        Promote = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        Header = Table.ColumnNames(Promote)
    in
        Header,
    AddHeader = Table.AddColumn
    (
        Source,
        "HeaderNames",
        each GetHeaderOfCSV ([Content])
    ),
    RemoveOther = Table.SelectColumns(AddHeader,{"HeaderNames", "Name"}),
    ExpandList = Table.ExpandListColumn(RemoveOther, "HeaderNames")
in
    ExpandList

 

and in case change the CSV-function to your needs (different delimiter, encoding etc.)

image.png

Now it should work

Give it a try and let me know

 

All the best

 

Jimmy

View solution in original post

9 REPLIES 9
Highlighted
Super User I
Super User I

Hi @v3ntrue 

Check if below code works for you. You may want to change Delimiter="," as per your csv ones.

 

let
    Source = Folder.Files("C:\Users\ankit_jain09\Desktop\Power BI\Power BI Community\New folder"),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".csv")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content", "Name"}),
    GetColumnNames = Table.AddColumn(#"Removed Other Columns", "ColumnNames", each Table.ColumnNames(Csv.Document([Content],[Delimiter=",",Encoding=1252, QuoteStyle=QuoteStyle.None]))),
    #"Expanded ColumnNames" = Table.ExpandListColumn(GetColumnNames, "ColumnNames"),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded ColumnNames",{"Content"})
in
    #"Removed Columns"

 

Thanks
Ankit Jain
Do Mark it as solution if the response resolved your problem. Do Kudo the response if it seems good and helpful.

Highlighted
Super User II
Super User II

Hello @v3ntrue  and @AnkitBI

 

did the development at the same time, so to codes to choose from 🙂

let
    Source = Folder.Files("yourfoldergoeshere"),
    GetHeaderOfCSV = (data)=>
    let
        Source = Csv.Document(data,[Delimiter=";", Columns=4, Encoding=65001, QuoteStyle=QuoteStyle.None]),
        Promote = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        Header = Table.ColumnNames(Promote)
    in
        Header,
    AddHeader = Table.AddColumn
    (
        Source,
        "HeaderNames",
        each GetHeaderOfCSV ([Content])
    ),
    #"Removed Other Columns" = Table.SelectColumns(AddHeader,{"HeaderNames", "Name"}),
    #"Extracted Values" = Table.TransformColumns(#"Removed Other Columns", {"HeaderNames", each Text.Combine(List.Transform(_, Text.From), ", "), type text})
in
    #"Extracted Values"

 

result looks like this

image.png

 

Copy paste this code to the advanced editor in a new blank query to see how the solution works.

If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

 

Highlighted

Hi @AnkitBI ,

 

Thank you for your prompt reply.

 

Unfortunately your code does not seem to work as expected. In "ColumnName" column it gives values such as "Column1", "Column2", "Column3" etc. instead of actual column names from csv files. Basically it looks like this,

------------------------------------------

|     File Name     |     ColumnName     |

------------------------------------------

|          A.csv        |     Column1             | 

|          A.csv        |     Column2             |       

|          A.csv        |     Column3             |       

------------------------------------------

Highlighted

Hi @Jimmy801 ,

 

Thank you very much for your reply. Your code works just like you said it would.

 

I think I was not very clear in my initial ask. I wanted to have only one column name per row. Your solution places all column names in a single row. Basically I was looking for something like this,

-------------------------------------

|    File Name   |   ColumnName   |

-------------------------------------

|    A.csv           |   abc_column      |

|    A.csv           |   bac_column      |

|    A.csv           |   cfe_column       |

|    B.csv           |   eda_column      |

|    B.csv           |   edb_column      |

-------------------------------------

 

While your solution is not exactly what I was looking for, it is the closest one to my desired result and it gives me enough to move forward. I can simply use Text-To-Columns Excel functionality and then UnPivot columns with PQ to achieve my desired result.

 

Thank you 🙂

 

Highlighted

Hello @v3ntrue 

 

then just use this code. Instead of combine the result into one row, the list is expaanded to new rows.

let
    Source = Folder.Files("YourPathGoesHere"),
    GetHeaderOfCSV = (data)=>
    let
        Source = Csv.Document(data,[Delimiter=";", Columns=4, Encoding=65001, QuoteStyle=QuoteStyle.None]),
        Promote = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        Header = Table.ColumnNames(Promote)
    in
        Header,
    AddHeader = Table.AddColumn
    (
        Source,
        "HeaderNames",
        each GetHeaderOfCSV ([Content])
    ),
    RemoveOther = Table.SelectColumns(AddHeader,{"HeaderNames", "Name"}),
    ExpandList = Table.ExpandListColumn(RemoveOther, "HeaderNames")
in
    ExpandList

 

Copy paste this code to the advanced editor in a new blank query to see how the solution works. If this solution fits your need, copy and past a part of it and implement it in your query.

If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

Highlighted

@Jimmy801 

 

I have tried updated solution and unfortunatelly the result looks different from what I expected.

 

This is what the result looks like with your new code,

 

------------------------------------------------------

|     Header Names               |     Name                  |

------------------------------------------------------

|          "A","B","C"...                |    A.csv                   |

|            (blank)                      |    A.csv                   | 

|          _1                               |    A.csv                    |    

|          _2                               |    A.csv                    | 

|          "E","F","G"...                |    B.csv                    |

|            (blank)                      |    B.csv                   | 

|          _1                               |    B.csv                    |    

|          _2                               |    B.csv                    | 

|          "H","J","K"...                |    C.csv                    |

|            (blank)                      |    C.csv                   | 

|          _1                               |    C.csv                    |    

|          _2                               |    C.csv                    | 

------------------------------------------------------

Highlighted

Hello @v3ntrue 

 

i saw i just yet (and i was also advised by @jborro - thanks for this) that I didn't realize that the column numbers were hardcoded in the function.

So you should use this code instead 

 

let
    Source = Folder.Files("YourPathGoesHere"),
    GetHeaderOfCSV = (data)=>
    let
        Source = Csv.Document(data,[Delimiter=";" Encoding=65001, QuoteStyle=QuoteStyle.None]),
        Promote = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        Header = Table.ColumnNames(Promote)
    in
        Header,
    AddHeader = Table.AddColumn
    (
        Source,
        "HeaderNames",
        each GetHeaderOfCSV ([Content])
    ),
    RemoveOther = Table.SelectColumns(AddHeader,{"HeaderNames", "Name"}),
    ExpandList = Table.ExpandListColumn(RemoveOther, "HeaderNames")
in
    ExpandList

 

and in case change the CSV-function to your needs (different delimiter, encoding etc.)

image.png

Now it should work

Give it a try and let me know

 

All the best

 

Jimmy

View solution in original post

Highlighted

Thanks @Jimmy801 ! This time your solution worked exactly as expected. Perfect!

Highlighted

Hello @v3ntrue 

 

I'm glad to hear that. Your feedback is highly appreciated.

 

BR

 

Jimmy

Helpful resources

Announcements
Community Conference

Power Platform Community Conference

Check out the on demand sessions that are available now!

Community Conference

Microsoft Power Platform Communities

Check out the Winners!

secondImage

Power Platform 2020 release wave 2 plan

Features releasing from October 2020 through March 2021

Top Solution Authors
Top Kudoed Authors