009co
Helper IV

Parsing text file with multiple separated blocks of csv data

I have many text files that contain data I want to extract using Power Query.

 

The text files each contain multiple separated blocks of csv data. Generally, the format looks something like this:

 

title string row
subtitle string row
blank row
csv header row
289 csv rows
blank row
csv header row
289 csv rows
blank row

title string row
subtitle string row
blank row
csv header row
289 csv rows
blank row
csv header row
289 csv rows
blank row

title string row
subtitle string row
blank row
csv header row
289 csv rows
blank row
csv header row
289 csv rows
blank row

etc.

 

The title and subtitle rows contain structured information that identifies the source of the csv blocks that follow them. This information needs to be captured and added to those csv blocks as additional new columns.

 

Any strategies, approaches, or methodologies you can recommend would be much appreciated!

 

1 ACCEPTED SOLUTION
Daryl-Lynch-Bzy
Resident Rockstar

Hi @009co 
Starting with one csv file, you want to import it using the Csv.Document function, but you need to ignore the Split By Delimiter option. Just import all rows into one column. This temporary data table then needs to be split by rows. This is where you need to be creative, depending on the number of embedded parts. A simple approach is to use the Top Rows and Bottom Rows options if each part is a set number of rows.
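For example, something along these lines (the file path is just a placeholder, and this assumes each part really is a fixed number of rows):

let
    // Read the whole file into a single column. The delimiter is a character
    // that does not appear in the data, and Columns=1 keeps everything in one column.
    Source = Csv.Document(
        File.Contents("C:\Data\sample.txt"),   // placeholder path
        [Delimiter = "|", Columns = 1, Encoding = 1252, QuoteStyle = QuoteStyle.None]
    ),
    // Simple slicing when the part sizes are fixed: skip the title, subtitle and blank rows...
    SkipTitleRows = Table.Skip(Source, 3),
    // ...then keep the first csv header row plus its 289 data rows.
    FirstBlock = Table.FirstN(SkipTitleRows, 290)
in
    FirstBlock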

An advanced option would be to assign a key to each part: 1 for the title string row, 2 for the subtitle string row, 3 for the 1st csv header row and its 289 csv rows, 4 for the 2nd csv header row and its 289 csv rows, etc. Once the key is assigned, you can use the Group By function to "Count" rows, and then replace "List.Count([Data])" with just "[Data]" (i.e. remove the Count function). You should now have keys 1, 2, 3, 4, etc., each with a nested table containing its specific rows.
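Roughly like this (only a sketch, with made-up sample rows and the Key column already in place; the "All Rows" style aggregation gives the same nested tables as editing the generated count step):

let
    // Hypothetical single-column data with the Key column already assigned.
    KeyedRows = #table(
        {"Column1", "Key"},
        {
            {"Title: Station A", 1},
            {"Subtitle: 2024", 2},
            {"Date,Value", 3},
            {"2024-01-01,10", 3},
            {"2024-01-02,12", 3}
        }
    ),
    // Group By "Key"; "each _" keeps the whole group as a nested table,
    // which is the same result as replacing List.Count([Data]) with [Data].
    Grouped = Table.Group(KeyedRows, {"Key"}, {{"Data", each _, type table}})
in
    Grouped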

 

Now you can start splitting the single column in each nested table and promoting its respective headers.
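For one nested block, that step might look something like this (the sample values are made up):

let
    // A made-up nested block: one column of comma-separated text with its header row on top.
    BlockTable = #table(
        {"Column1"},
        {{"Date,Value"}, {"2024-01-01,10"}, {"2024-01-02,12"}}
    ),
    // Split the single column on commas into real columns...
    Split = Table.SplitColumn(
        BlockTable,
        "Column1",
        Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),
        {"Column1.1", "Column1.2"}
    ),
    // ...then promote the embedded csv header row to column names.
    Promoted = Table.PromoteHeaders(Split, [PromoteAllScalars = true])
in
    Promoted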

 

The final trick is to figure out when to convert the query to a function so it can read all the CSV files, giving you a table containing:

 

Filename, Key, SplitDataTable
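Something along these lines, where fnParseFile stands in for the single-file query above (its real body would contain the key, Group By, split and promote steps) and the folder path is made up:

let
    // Placeholder single-file function; the real body would hold the steps described above.
    fnParseFile = (fileContent as binary) as table =>
        Csv.Document(
            fileContent,
            [Delimiter = "|", Columns = 1, Encoding = 1252, QuoteStyle = QuoteStyle.None]
        ),
    // Made-up folder path; Folder.Files returns one row per file with its binary Content.
    Source = Folder.Files("C:\Data\CsvBlocks"),
    // Apply the function to every file, keeping the file name beside the parsed result.
    Parsed = Table.AddColumn(Source, "SplitDataTable", each fnParseFile([Content])),
    Result = Table.SelectColumns(Parsed, {"Name", "SplitDataTable"})
in
    Result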

 

It is a bit hard to help because I am uncertain what your data model needs to look like. Also, you may want to consider using a Dataflow to break up the steps, or possibly an incremental load that adds recent csv files on top of the previously processed ones.


3 REPLIES
009co
Helper IV

I ended up creating a few custom columns that look for specific strings in the data rows to label each row as a data row or not, in a manner similar to that described by @Daryl-Lynch-Bzy.

 

Then I used the "Fill down" feature to populate subsequent rows with that label.

 

Then I could do some selective filtering to remove unneeded rows.

 

This was quite effective!
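For reference, the steps looked roughly like this (the marker strings here are made up, since the real ones are specific to my files):

let
    // Made-up sample rows in a single column.
    Rows = #table(
        {"Column1"},
        {{"Title: Station A"}, {"Subtitle: 2024"}, {""}, {"Date,Value"}, {"2024-01-01,10"}, {"2024-01-02,12"}}
    ),
    // Custom column: capture the title text on title rows, otherwise null.
    Labelled = Table.AddColumn(
        Rows, "BlockSource",
        each if Text.StartsWith([Column1], "Title:") then [Column1] else null
    ),
    // Fill down so every row inherits the label of the block it belongs to.
    Filled = Table.FillDown(Labelled, {"BlockSource"}),
    // Selective filtering: keep only rows that look like data rows.
    DataRows = Table.SelectRows(
        Filled,
        each Text.Contains([Column1], ",") and not Text.StartsWith([Column1], "Date,")
    )
in
    DataRows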

009co
Helper IV

Great, thanks Daryl-Lynch-Bzy for the detailed and helpful reply!

 

Reading the data in with Csv.Document as a single column as you describe is a good start. I was already bringing csv file contents into a single column by splitting on a character that I knew didn't exist in the content, e.g. the "|" pipe character.

 

= Table.AddColumn(Source, "File Content", each Csv.Document([Content],[Delimiter="|", Columns=1, Encoding=1252, QuoteStyle=QuoteStyle.None]))

 

Hadn't thought to simply ignore the option. Seems obvious in hindsight, thanks!

 

The next parts are where creativity is required.

 

The title, subtitle and csv header rows all start with consistent strings, which I can rely on to assign a unique key in a custom column.

 

Will try both the simple and advanced suggestions. 

 

 

 

