Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hello,
I have several different CSV files in a folder. I want to extract Header names and File Names from these files using Power Query.
End result should give me a table with 2 columns: FileName, HeaderName.
Also,
I thought I had it figured out using the code below. But it turns out some files have more than 21 columns in them and if I use the method below it cuts of at 21 and inludes only maximum of 21 headers per file. Also I am pretty sure there should be a better way to do this then DemotingHeaders to grab FirstRow. Unfortunately I don't have access to SQL at the moment which would have simplified this task. Thank you!
let
Source = Folder.Files("C:\Users\????\????\Desktop\?????\Data"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from Crossphase_data", each #"Transform File from Crossphase_data"([Content])),
#"Removed Columns" = Table.RemoveColumns(#"Invoke Custom Function1",{"Extension", "Date accessed", "Date modified", "Date created", "Attributes", "Folder Path"}), #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Transform File from Crossphase_data", "Custom"}}),
#"Demote Headers" = Table.TransformColumns(#"Removed Columns", {{"Transform File from Crossphase_data", each Table.DemoteHeaders(_)}}),
#"First Row" = Table.TransformColumns(#"Demote Headers", {{"Transform File from Crossphase_data", each Table.First(_)}}),
#"Expanded Transform File from Crossphase_data" = Table.ExpandRecordColumn(#"First Row", "Transform File from Crossphase_data", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21"}, {"Transform File from Crossphase_data.Column1", "Transform File from Crossphase_data.Column2", "Transform File from Crossphase_data.Column3", "Transform File from Crossphase_data.Column4", "Transform File from Crossphase_data.Column5", "Transform File from Crossphase_data.Column6", "Transform File from Crossphase_data.Column7", "Transform File from Crossphase_data.Column8", "Transform File from Crossphase_data.Column9", "Transform File from Crossphase_data.Column10", "Transform File from Crossphase_data.Column11", "Transform File from Crossphase_data.Column12", "Transform File from Crossphase_data.Column13", "Transform File from Crossphase_data.Column14", "Transform File from Crossphase_data.Column15", "Transform File from Crossphase_data.Column16", "Transform File from Crossphase_data.Column17", "Transform File from Crossphase_data.Column18", "Transform File from Crossphase_data.Column19", "Transform File from Crossphase_data.Column20", "Transform File from Crossphase_data.Column21"})
in
#"Expanded Transform File from Crossphase_data"
Solved! Go to Solution.
Hello @v3ntrue
i saw i just yet (and i was also advised by @Anonymous - thanks for this) that I didn't realize that the column numbers were hardcoded in the function.
So you should use this code instead
let
Source = Folder.Files("YourPathGoesHere"),
GetHeaderOfCSV = (data)=>
let
Source = Csv.Document(data,[Delimiter=";" Encoding=65001, QuoteStyle=QuoteStyle.None]),
Promote = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
Header = Table.ColumnNames(Promote)
in
Header,
AddHeader = Table.AddColumn
(
Source,
"HeaderNames",
each GetHeaderOfCSV ([Content])
),
RemoveOther = Table.SelectColumns(AddHeader,{"HeaderNames", "Name"}),
ExpandList = Table.ExpandListColumn(RemoveOther, "HeaderNames")
in
ExpandList
and in case change the CSV-function to your needs (different delimiter, encoding etc.)
Now it should work
Give it a try and let me know
All the best
Jimmy
did the development at the same time, so to codes to choose from 🙂
let
Source = Folder.Files("yourfoldergoeshere"),
GetHeaderOfCSV = (data)=>
let
Source = Csv.Document(data,[Delimiter=";", Columns=4, Encoding=65001, QuoteStyle=QuoteStyle.None]),
Promote = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
Header = Table.ColumnNames(Promote)
in
Header,
AddHeader = Table.AddColumn
(
Source,
"HeaderNames",
each GetHeaderOfCSV ([Content])
),
#"Removed Other Columns" = Table.SelectColumns(AddHeader,{"HeaderNames", "Name"}),
#"Extracted Values" = Table.TransformColumns(#"Removed Other Columns", {"HeaderNames", each Text.Combine(List.Transform(_, Text.From), ", "), type text})
in
#"Extracted Values"
result looks like this
Copy paste this code to the advanced editor in a new blank query to see how the solution works.
If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too
Have fun
Jimmy
Hi @Jimmy801 ,
Thank you very much for your reply. Your code works just like you said it would.
I think I was not very clear in my initial ask. I wanted to have only one column name per row. Your solution places all column names in a single row. Basically I was looking for something like this,
-------------------------------------
| File Name | ColumnName |
-------------------------------------
| A.csv | abc_column |
| A.csv | bac_column |
| A.csv | cfe_column |
| B.csv | eda_column |
| B.csv | edb_column |
-------------------------------------
While your solution is not exactly what I was looking for, it is the closest one to my desired result and it gives me enough to move forward. I can simply use Text-To-Columns Excel functionality and then UnPivot columns with PQ to achieve my desired result.
Thank you 🙂
Hello @v3ntrue
then just use this code. Instead of combine the result into one row, the list is expaanded to new rows.
let
Source = Folder.Files("YourPathGoesHere"),
GetHeaderOfCSV = (data)=>
let
Source = Csv.Document(data,[Delimiter=";", Columns=4, Encoding=65001, QuoteStyle=QuoteStyle.None]),
Promote = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
Header = Table.ColumnNames(Promote)
in
Header,
AddHeader = Table.AddColumn
(
Source,
"HeaderNames",
each GetHeaderOfCSV ([Content])
),
RemoveOther = Table.SelectColumns(AddHeader,{"HeaderNames", "Name"}),
ExpandList = Table.ExpandListColumn(RemoveOther, "HeaderNames")
in
ExpandList
Copy paste this code to the advanced editor in a new blank query to see how the solution works. If this solution fits your need, copy and past a part of it and implement it in your query.
If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too
Have fun
Jimmy
I have tried updated solution and unfortunatelly the result looks different from what I expected.
This is what the result looks like with your new code,
------------------------------------------------------
| Header Names | Name |
------------------------------------------------------
| "A","B","C"... | A.csv |
| (blank) | A.csv |
| _1 | A.csv |
| _2 | A.csv |
| "E","F","G"... | B.csv |
| (blank) | B.csv |
| _1 | B.csv |
| _2 | B.csv |
| "H","J","K"... | C.csv |
| (blank) | C.csv |
| _1 | C.csv |
| _2 | C.csv |
------------------------------------------------------
Hello @v3ntrue
i saw i just yet (and i was also advised by @Anonymous - thanks for this) that I didn't realize that the column numbers were hardcoded in the function.
So you should use this code instead
let
Source = Folder.Files("YourPathGoesHere"),
GetHeaderOfCSV = (data)=>
let
Source = Csv.Document(data,[Delimiter=";" Encoding=65001, QuoteStyle=QuoteStyle.None]),
Promote = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
Header = Table.ColumnNames(Promote)
in
Header,
AddHeader = Table.AddColumn
(
Source,
"HeaderNames",
each GetHeaderOfCSV ([Content])
),
RemoveOther = Table.SelectColumns(AddHeader,{"HeaderNames", "Name"}),
ExpandList = Table.ExpandListColumn(RemoveOther, "HeaderNames")
in
ExpandList
and in case change the CSV-function to your needs (different delimiter, encoding etc.)
Now it should work
Give it a try and let me know
All the best
Jimmy
Thanks @Jimmy801 ! This time your solution worked exactly as expected. Perfect!
Hi @v3ntrue
Check if below code works for you. You may want to change Delimiter="," as per your csv ones.
let
Source = Folder.Files("C:\Users\ankit_jain09\Desktop\Power BI\Power BI Community\New folder"),
#"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".csv")),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content", "Name"}),
GetColumnNames = Table.AddColumn(#"Removed Other Columns", "ColumnNames", each Table.ColumnNames(Csv.Document([Content],[Delimiter=",",Encoding=1252, QuoteStyle=QuoteStyle.None]))),
#"Expanded ColumnNames" = Table.ExpandListColumn(GetColumnNames, "ColumnNames"),
#"Removed Columns" = Table.RemoveColumns(#"Expanded ColumnNames",{"Content"})
in
#"Removed Columns"
Thanks
Ankit Jain
Do Mark it as solution if the response resolved your problem. Do Kudo the response if it seems good and helpful.
Hi @AnkitBI ,
Thank you for your prompt reply.
Unfortunately your code does not seem to work as expected. In "ColumnName" column it gives values such as "Column1", "Column2", "Column3" etc. instead of actual column names from csv files. Basically it looks like this,
------------------------------------------
| File Name | ColumnName |
------------------------------------------
| A.csv | Column1 |
| A.csv | Column2 |
| A.csv | Column3 |
------------------------------------------