009co
Helper IV

Parsing text file with multiple separated blocks of csv data

I have many text files that contain data I want to extract using Power Query.

 

The text files each contain multiple separated blocks of csv data. Generally, the format looks something like this:

 

title string row
subtitle string row
blank row
csv header row
289 csv rows
blank row
csv header row
289 csv rows
blank row

title string row
subtitle string row
blank row
csv header row
289 csv rows
blank row
csv header row
289 csv rows
blank row

title string row
subtitle string row
blank row
csv header row
289 csv rows
blank row
csv header row
289 csv rows
blank row

etc.

 

The title and subtitle rows contain structured information that identifies the source of the csv blocks that follow them. This information needs to be captured and added to those csv blocks as additional new columns.

 

Any strategies, approaches, or methodologies you can recommend would be much appreciated!

 

1 ACCEPTED SOLUTION
Daryl-Lynch-Bzy
Resident Rockstar

Hi @009co 
Starting with one csv file, you want to import it using the Csv.Document function, but you need to ignore the Split By Delimiter option. Just import all rows into one column. This temporary data table then needs to be split by rows. This is where you need to be creative, depending on the number of embedded parts. A simple approach is to use the Top Rows and Bottom Rows options if each part is a set number of rows.
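For example, something along these lines (the file path is just a placeholder, and this assumes each part really is a fixed number of rows):

let
    // Read the whole file into a single column. The delimiter is a character
    // that does not appear in the data, and Columns=1 keeps everything in one column.
    Source = Csv.Document(
        File.Contents("C:\Data\sample.txt"),   // placeholder path
        [Delimiter = "|", Columns = 1, Encoding = 1252, QuoteStyle = QuoteStyle.None]
    ),
    // Simple slicing when the part sizes are fixed: skip the title, subtitle and blank rows...
    SkipTitleRows = Table.Skip(Source, 3),
    // ...then keep the first csv header row plus its 289 data rows.
    FirstBlock = Table.FirstN(SkipTitleRows, 290)
in
    FirstBlock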

An advanced option would be to assign a key to each part: 1 for the title string row, 2 for the subtitle string row, 3 for the 1st csv header row and its 289 csv rows, 4 for the 2nd csv header row and its 289 csv rows, etc. Once the key is assigned, you can use the Group By function to "Count" rows, and then replace "List.Count([Data])" with just "[Data]" (i.e. remove the Count function). You should now have keys 1, 2, 3, 4, etc., each with a nested table containing its specific rows.
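Roughly like this (only a sketch, with made-up sample rows and the Key column already in place; the "All Rows" style aggregation gives the same nested tables as editing the generated count step):

let
    // Hypothetical single-column data with the Key column already assigned.
    KeyedRows = #table(
        {"Column1", "Key"},
        {
            {"Title: Station A", 1},
            {"Subtitle: 2024", 2},
            {"Date,Value", 3},
            {"2024-01-01,10", 3},
            {"2024-01-02,12", 3}
        }
    ),
    // Group By "Key"; "each _" keeps the whole group as a nested table,
    // which is the same result as replacing List.Count([Data]) with [Data].
    Grouped = Table.Group(KeyedRows, {"Key"}, {{"Data", each _, type table}})
in
    Grouped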

 

Now you can start splitting the single column in each nested table and promoting its respective headers.
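For one nested block, that step might look something like this (the sample values are made up):

let
    // A made-up nested block: one column of comma-separated text with its header row on top.
    BlockTable = #table(
        {"Column1"},
        {{"Date,Value"}, {"2024-01-01,10"}, {"2024-01-02,12"}}
    ),
    // Split the single column on commas into real columns...
    Split = Table.SplitColumn(
        BlockTable,
        "Column1",
        Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),
        {"Column1.1", "Column1.2"}
    ),
    // ...then promote the embedded csv header row to column names.
    Promoted = Table.PromoteHeaders(Split, [PromoteAllScalars = true])
in
    Promoted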

 

The final trick is to figure out when to convert the query to a function so it can read all the CSV files, giving you a table containing:

 

Filename, Key, SplitDataTable
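Something along these lines, where fnParseFile stands in for the single-file query above (its real body would contain the key, Group By, split and promote steps) and the folder path is made up:

let
    // Placeholder single-file function; the real body would hold the steps described above.
    fnParseFile = (fileContent as binary) as table =>
        Csv.Document(
            fileContent,
            [Delimiter = "|", Columns = 1, Encoding = 1252, QuoteStyle = QuoteStyle.None]
        ),
    // Made-up folder path; Folder.Files returns one row per file with its binary Content.
    Source = Folder.Files("C:\Data\CsvBlocks"),
    // Apply the function to every file, keeping the file name beside the parsed result.
    Parsed = Table.AddColumn(Source, "SplitDataTable", each fnParseFile([Content])),
    Result = Table.SelectColumns(Parsed, {"Name", "SplitDataTable"})
in
    Result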

 

It is a bit hard to help because I am uncertain what your data model needs to look like. Also, you may want to consider using a Dataflow to break up the steps, or possibly an incremental load that adds recent csv files on top of the previously processed ones.


3 REPLIES
009co
Helper IV

I ended up creating a few custom columns that look for specific strings in the data rows to label each row as a data row or not, in a manner similar to that described by @Daryl-Lynch-Bzy.

 

Then I used the "Fill down" feature to populate subsequent rows with that label.

 

Then I could do some selective filtering to remove unneeded rows.

 

This was quite effective!
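For reference, the steps looked roughly like this (the marker strings here are made up, since the real ones are specific to my files):

let
    // Made-up sample rows in a single column.
    Rows = #table(
        {"Column1"},
        {{"Title: Station A"}, {"Subtitle: 2024"}, {""}, {"Date,Value"}, {"2024-01-01,10"}, {"2024-01-02,12"}}
    ),
    // Custom column: capture the title text on title rows, otherwise null.
    Labelled = Table.AddColumn(
        Rows, "BlockSource",
        each if Text.StartsWith([Column1], "Title:") then [Column1] else null
    ),
    // Fill down so every row inherits the label of the block it belongs to.
    Filled = Table.FillDown(Labelled, {"BlockSource"}),
    // Selective filtering: keep only rows that look like data rows.
    DataRows = Table.SelectRows(
        Filled,
        each Text.Contains([Column1], ",") and not Text.StartsWith([Column1], "Date,")
    )
in
    DataRows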

009co
Helper IV

Great, thanks Daryl-Lynch-Bzy for the detailed and helpful reply!

 

Reading the data in with Csv.Document as a single column as you describe is a good start. I was already bringing csv file contents into a single column by splitting on a character that I knew didn't exist in the content, e.g. the "|" pipe character.

 

= Table.AddColumn(Source, "File Content", each Csv.Document([Content],[Delimiter="|", Columns=1, Encoding=1252, QuoteStyle=QuoteStyle.None]))

 

Hadn't thought to simply ignore the option. Seems obvious in hindsight, thanks!

 

The next parts are where creativity is required.

 

The title, subtitle and csv header rows all start with consistent strings, which I can rely on to assign a unique key in a custom column.

 

Will try both the simple and advanced suggestions. 

 

 

 

