I have many text files that contain data I want to extract using Power Query.
Each text file contains multiple separate blocks of csv data. Generally the format is something like the following:
title string row
subtitle string row
blank row
csv header row
289 csv rows
blank row
csv header row
289 csv rows
blank row
title string row
subtitle string row
blank row
csv header row
289 csv rows
blank row
csv header row
289 csv rows
blank row
title string row
subtitle string row
blank row
csv header row
289 csv rows
blank row
csv header row
289 csv rows
blank row
etc
The title and subtitle rows contain structured information that needs to be captured, as it identifies the source of the block of csv content following them. This information should be added to the csv block of data as additional new columns.
Any strategies, approaches, or methodologies you can recommend would be much appreciated!
Hi @009co
Starting with one csv file, import it using the Csv.Document function, but ignore the Split By Delimiter option — just import all rows into one column. This temporary table then needs to be split by rows, which is where you need to be creative depending on the number of embedded parts. A simple approach is to use Keep Top Rows and Keep Bottom Rows if each part is a set number of rows.
An advanced option would be to assign a key to each part: 1 for the title string row, 2 for the subtitle string row, 3 for the 1st csv header row and its 289 csv rows, 4 for the 2nd csv header row and its 289 csv rows, etc. Once the key is assigned, you can use a Group By to "Count" rows, and then replace "List.Count([Data])" with just "[Data]" (i.e. remove the Count function). You should now have keys 1, 2, 3, 4, etc., each with a nested table containing the corresponding rows.
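As a rough sketch of the key-and-group idea (the column name "Column1" and the "Title" prefix test are placeholder assumptions, not from the original post), the M might look something like:

```
// Assumes a one-column table named SingleColumn with text column "Column1".
// Tag each row with a running key that increments at every title row,
// then group by that key to get one nested table per block.
let
    Source = SingleColumn,
    Indexed = Table.AddIndexColumn(Source, "Index", 0, 1),
    // Placeholder test: a title row is assumed to start with "Title"
    Keyed = Table.AddColumn(Indexed, "Key", each
        List.Count(
            List.Select(
                List.FirstN(Indexed[Column1], [Index] + 1),
                (t) => Text.StartsWith(t, "Title")))),
    // "All Rows" aggregation: each _ keeps the nested table per key,
    // equivalent to replacing List.Count([Data]) with [Data] in the UI step
    Grouped = Table.Group(Keyed, {"Key"}, {{"Data", each _, type table}})
in
    Grouped
```

The running-count trick is quadratic over the row count, so for very large files a List.Generate-based running key would be more efficient; for a one-off import this form is easier to read.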
Now you can start splitting the single column in each nested table and promoting its respective headers.
The final trick is figuring out when to convert the query into a function that reads all the CSV files, so you end up with a table containing:
Filename, Key, SplitDataTable
It is a bit hard to help because I am uncertain what your data model needs to look like. You may also want to consider using a Dataflow to break up the steps, or possibly an incremental load that appends recent csv files to previously processed ones.
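A hedged sketch of that final function-conversion step (the folder path, function name, and the elided middle steps are all placeholders) could look like:

```
// Hypothetical sketch: wrap the single-file steps in a function and
// invoke it for every file in a folder, keeping the filename alongside.
let
    ProcessOneFile = (content as binary, name as text) as table =>
        let
            Raw = Csv.Document(content,
                [Columns = 1, Encoding = 1252, QuoteStyle = QuoteStyle.None]),
            // ... key assignment, Group By, split, and header promotion
            // steps for one file go here ...
            WithName = Table.AddColumn(Raw, "Filename", each name)
        in
            WithName,
    Files = Folder.Files("C:\Data\CsvBlocks"),   // placeholder path
    Combined = Table.Combine(
        List.Transform(Table.ToRecords(Files),
            each ProcessOneFile([Content], [Name])))
in
    Combined
```

Table.Combine requires each per-file table to end up with the same columns, which is why promoting headers consistently inside the function matters.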
I ended up creating a few custom columns that look for specific strings in the data rows to label each row as a data row or not, in a manner similar to that described by @Daryl-Lynch-Bzy.
Then I used "Fill down" feature to populate subsequent rows with that label.
Then I could do some selective filtering to remove unneeded rows.
This was quite effective!
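A minimal sketch of that label / fill-down / filter sequence (the "Title" and header prefix tests are placeholder assumptions standing in for the actual marker strings):

```
// Hedged sketch of the labelling approach: flag rows whose text marks
// the start of a block, fill the flag down, then filter to data rows.
let
    Source = SingleColumn,   // one-column table with text column "Column1"
    Labelled = Table.AddColumn(Source, "RowType", each
        if Text.StartsWith([Column1], "Title") then "title"
        else if Text.StartsWith([Column1], "Date,") then "data"  // placeholder header test
        else null),
    Filled = Table.FillDown(Labelled, {"RowType"}),
    DataOnly = Table.SelectRows(Filled, each [RowType] = "data")
in
    DataOnly
```

Leaving unmatched rows null is what lets Fill Down carry the most recent label forward onto the csv rows beneath each marker.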
Great, thanks @Daryl-Lynch-Bzy for the detailed and helpful reply!
Reading in the data with Csv.Document as a single column as you describe is a good start. I was already bringing csv file contents into a single column by splitting on a character that I knew didn't exist in the content, e.g. the "|" pipe character.
= Table.AddColumn(Source, "File Content", each Csv.Document([Content],[Delimiter="|", Columns=1, Encoding=1252, QuoteStyle=QuoteStyle.None]))
Hadn't thought to simply ignore the option. Seems obvious in hindsight, thanks!
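For comparison, one way to avoid the dummy-delimiter workaround entirely is to read the file as raw lines rather than as csv (a sketch assuming the same Source step and record fields as above; note that leaving Csv.Document's Delimiter at its comma default would split at the first comma, so Lines.FromBinary is used instead):

```
// Read each file as whole text lines, one row per line, no splitting.
= Table.AddColumn(Source, "File Content",
    each Table.FromColumns(
        {Lines.FromBinary([Content], QuoteStyle.None, null, 1252)},
        {"Column1"}))
```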
The next part is where creativity is required.
The title, subtitle, and csv header rows all start with consistent strings, which I can rely on to assign a unique key in a custom column.
Will try both the simple and advanced suggestions.