Solved: How to load this file format from a folder

zap2zero · ‎03-18-2020

Hello,

I have a somewhat special CSV file which works fine to process as long as I only have 1 such file. Unfortunately I have to import several of such files since the exporting system can't handle all data at once.

The format is like this:

TYPE1
Header1.1,Header1.2,Header1.3
1,2,3
4,5,6

TYPE2
Header2.1,Header2.2,Header2.3,Header2.4
1,2,3,4
5,6,7,8

TYPE3
Header3.1,Header3.2
1,2
3,4

To get this split into 3 different tables, one for each data type, I simply add a conditional column like

= Table.AddColumn(Source, "Datatype", each if [Column1] = "TYPE1" then "Type 1" else if [Column1] = "TYPE2" then "Type 2" else if [Column1] = "TYPE3" then "Type 3" else if [Column1] = "" then "" else null

and then run fill down on that new column.

After that I create new tables with a filter like

= Table.SelectRows(Source, each ([Datatype] = "Type 1"))

This all works perfectly fine as long as I have one such file. As soon as I try this with a folder data source, I struggle with the headers in each of the files since in the one file version its always the second line where I can remove the first line and promote headers.

Here is an example how my data looks with the folder import:

TYPE2,,,,Type 2
Header2.1,Header2.2,Header2.3,Header2.4,Type 2
1,2,3,4,Type 2
5,6,7,8,Type 2
TYPE2,,,,Type 2
Header2.1,Header2.2,Header2.3,Header2.4,Type 2
1,2,3,4,Type 2
5,6,7,8,Type 2
TYPE2,,,,Type 2
Header2.1,Header2.2,Header2.3,Header2.4,Type 2
1,2,3,4,Type 2
5,6,7,8,Type 2

Any idea how I can get rid of these headers "in the middle" of my data?

ImkeF · ‎03-23-2020

Hi Zap,

if you import your files from the folder, you can apply this function to the Content-column in there:

// Transform File
(Parameter1 as binary) => let
        Source = Csv.Document(Parameter1,[Delimiter=",", Columns = 20, Encoding=65001, QuoteStyle=QuoteStyle.None]),
        #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        #"Added Custom" = Table.AddColumn(#"Promoted Headers", "Custom", each if List.Contains({"GEWICHT", "TEMPERATUR", "Benutzerangaben", "Zeitraum", "Kategorie", "BLUTZUCKER", "BLUTDRUCK", "PULSOXY"}, [Datenexportdetails]) then [Datenexportdetails] else null),
        #"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"})
    in
        #"Filled Down"

Key trick here is to add a column to all of your data that indicates which table/category the current row belongs to.

You can work from there to filter and split it out to your desired tables.

BTW: I've set the Colums-parameter in the Csv.Document-function to 20 to cater for more columns in the future. You can reduce it if that's not necessary.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

ImkeF · ‎03-19-2020

Hi @zap2zero ,

Not sure I fully understand what you're doing here. But if you're using the combine binaries action, then a section with functions and sample quieres will be created. In there is a query called "Transform Sample Binary from Combine Binaries" which contains a step that promotes the headers.

Delete that step (so that the headers are in the first row) and then delete this first row of data.

After that, the column names will all be the same and you can append the data.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

zap2zero · ‎03-19-2020

Hello Imke,

I am not sure how that would work. The problem is that my data source puts several CSV tables into one file, so my example above is in one file. Promoting headers is not possible since each section has different headers and I only can promote headers after I have the separate tables filtered by my custom column Datatype.

Zap.

ImkeF · ‎03-19-2020

Hi @zap2zero

starting to make sense to me know. You have to add a column that checks for the existence of a special keyword that is only contained in the headers. Return null if it is not found and then filter out those rows.

To check the entries from one row, you use the function "Record.FieldValues". It returns a list of all the values in the row.

So for example:

List.Contains( Record.FieldValues(_), "YourHeaderString")

would return true if a match of "YourHeaderString" is found.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

zap2zero · ‎03-20-2020

Hi @ImkeF,

thank you for your help!

As I understand your suggestion, this would filter out all my headers and I would have to name all columns manually later on.

Is there some trick to keep only the first header (when I filter by my Datatype column) and remove all others.

Maybe something like adding a conditional column for each header, setting it to 1 and another column which would sum this column left of it upwards? so I could keep all headers where the sum-column is 1 and remove all others?

Header1,1,1
data...
data...
Header2,1,2
data...
data...
Header3,1,3
data...
data...

I hope I could explain my idea well enough.

Zap.

ImkeF · ‎03-20-2020

Hi @zap2zero

can you promote the header that you want to keep first and then apply the further steps?

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

zap2zero · ‎03-20-2020

Hello @ImkeF,

unfortunately not since I have different headers for each datatype and I have to

1. Load all files

2. Filter the lines per datatype

3. promote first line of headers

4. then I could remove all remaining headers

Zap.

ImkeF · ‎03-22-2020

Sorry @zap2zero

but I cannot follow.

Unless you're creating some sample data (incl. files) I have no idea how to further help here.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

zap2zero · ‎03-23-2020

Hello @ImkeF,

here is an example file with data as I tried to sketch in my first post. Due to the sensitivity of the data I replaced all values with dummy values but I hope it gets clear how its structured and where my problem comes from.

This kind of file I get for every month and I'd like to process all these files in a folder:

Datenexportdetails
Benutzerangaben
Vorname,My Firstname
Nachname,My Lastname
Geburtsdatum,01.01.1990
Geschlecht,male
Körpergröße,189cm

Zeitraum
Start,01.03.2020
Ende,31.03.2020

Kategorie
Blutzucker,18
Pulsoxy,17
Blutdruck,18
Gewicht,19
Temperatur,20

TEMPERATUR
Datum,Uhrzeit,Temperatur,Kommentar,Medikament
18.03.2020,06:00,37.1 °C,,
17.03.2020,06:00,37.2 °C,,
16.03.2020,06:00,37.3 °C,,
15.03.2020,06:00,37.4 °C,,
14.03.2020,06:00,37.5 °C,,
13.03.2020,06:00,37.6 °C,,
12.03.2020,06:00,37.7 °C,,
11.03.2020,06:00,37.8 °C,,
10.03.2020,06:00,37.9 °C,,
09.03.2020,06:00,37.1 °C,,
08.03.2020,06:00,37.2 °C,,
07.03.2020,06:00,37.3 °C,,
06.03.2020,06:00,37.4 °C,,
05.03.2020,06:00,37.5 °C,,
05.03.2020,06:00,37.6 °C,,
04.03.2020,06:00,37.7 °C,,
04.03.2020,06:00,37.8 °C,,
03.03.2020,06:00,37.9 °C,,
02.03.2020,06:00,37.1 °C,,
01.03.2020,06:00,37.2 °C,,

BLUTZUCKER
Datum,Uhrzeit,Wert (mmol/L),HbA1c,Markierung,Kommentar,Medikament
18.03.2020,06:00,91.0,0.0,vor Mahlzeit,,
17.03.2020,06:00,92.0,0.0,vor Mahlzeit,,
16.03.2020,06:00,93.0,0.0,vor Mahlzeit,,
15.03.2020,06:00,94.0,0.0,vor Mahlzeit,,
14.03.2020,06:00,95.0,0.0,vor Mahlzeit,,
13.03.2020,06:00,96.0,0.0,vor Mahlzeit,,
12.03.2020,06:00,97.0,0.0,vor Mahlzeit,,
11.03.2020,06:00,98.0,0.0,vor Mahlzeit,,
10.03.2020,06:00,99.0,0.0,vor Mahlzeit,,
09.03.2020,06:00,91.0,0.0,vor Mahlzeit,,
08.03.2020,06:00,92.0,0.0,vor Mahlzeit,,
07.03.2020,06:00,93.0,0.0,vor Mahlzeit,,
06.03.2020,06:00,94.0,0.0,vor Mahlzeit,,
05.03.2020,06:00,95.0,0.0,vor Mahlzeit,,
04.03.2020,06:00,96.0,0.0,vor Mahlzeit,,
03.03.2020,06:00,97.0,0.0,vor Mahlzeit,,
02.03.2020,06:00,98.0,0.0,vor Mahlzeit,,
01.03.2020,06:00,99.0,0.0,vor Mahlzeit,,

PULSOXY
Datum,Startzeit,Endzeit,SpO2 (min),SpO2 (max),SpO2 Durchschnitt,Puls,Kommentar,Medikament
18.03.2020,06:00,06:05,95,97,96,60,,
17.03.2020,06:00,06:05,95,97,96,60,,
16.03.2020,06:00,06:05,95,97,96,60,,
15.03.2020,06:00,06:05,95,97,96,60,,
14.03.2020,06:00,06:05,95,97,96,60,,
13.03.2020,06:00,06:05,95,97,96,60,,
12.03.2020,06:00,06:05,95,97,96,60,,
11.03.2020,06:00,06:05,95,97,96,60,,
10.03.2020,06:00,06:05,95,97,96,60,,
09.03.2020,06:00,06:05,95,97,96,60,,
08.03.2020,06:00,06:05,95,97,96,60,,
07.03.2020,06:00,06:05,95,97,96,60,,
06.03.2020,06:00,06:05,95,97,96,60,,
05.03.2020,06:00,06:05,95,97,96,60,,
03.03.2020,06:00,06:05,95,97,96,60,,
02.03.2020,06:00,06:05,95,97,96,60,,
01.03.2020,06:00,06:05,95,97,96,60,,

BLUTDRUCK
Datum,Uhrzeit,Systole (mmHg),Diastole (mmHg),Puls (bpm),MAD,Herzrhythmusstörungen (IHB),Kommentar,Medikament
18.03.2020,06:00,130,65,60,80,false,,
17.03.2020,06:00,130,65,60,80,false,,
16.03.2020,06:00,130,65,60,80,false,,
15.03.2020,06:00,130,65,60,80,false,,
14.03.2020,06:00,130,65,60,80,false,,
13.03.2020,06:00,130,65,60,80,false,,
12.03.2020,06:00,130,65,60,80,false,,
11.03.2020,06:00,130,65,60,80,false,,
10.03.2020,06:00,130,65,60,80,false,,
09.03.2020,06:00,130,65,60,80,false,,
08.03.2020,06:00,130,65,60,80,false,,
07.03.2020,06:00,130,65,60,80,false,,
06.03.2020,06:00,130,65,60,80,false,,
05.03.2020,06:00,130,65,60,80,false,,
04.03.2020,06:00,130,65,60,80,false,,
03.03.2020,06:00,130,65,60,80,false,,
02.03.2020,06:00,130,65,60,80,false,,
01.03.2020,06:00,130,65,60,80,false,,

GEWICHT
Datum,Uhrzeit,Gewicht,Körperfett,Wasser,Muskeln,Knochen,Grundumsatz,Leistungsumsatz,Kommentar,Medikament
18.03.2020,06:00,"69,1","16,1","52,0","40,0","3,0",1500,3200,,
17.03.2020,06:00,"69,2","16,2","52,0","40,0","3,0",1500,3200,,
16.03.2020,06:00,"69,3","16,3","52,0","40,0","3,0",1500,3200,,
16.03.2020,06:00,"69,4","16,4","52,0","40,0","3,0",1500,3200,,
15.03.2020,06:00,"69,5","16,5","52,0","40,0","3,0",1500,3200,,
14.03.2020,06:00,"69,6","16,6","52,0","40,0","3,0",1500,3200,,
13.03.2020,06:00,"69,7","16,7","52,0","40,0","3,0",1500,3200,,
12.03.2020,06:00,"69,8","16,8","52,0","40,0","3,0",1500,3200,,
11.03.2020,06:00,"69,9","16,9","52,0","40,0","3,0",1500,3200,,
10.03.2020,06:00,"69,8","16,8","52,0","40,0","3,0",1500,3200,,
09.03.2020,06:00,"69,7","16,7!,"52,0","40,0","3,0",1500,3200,,
08.03.2020,06:00,"69,6","16,6","52,0","40,0","3,0",1500,3200,,
07.03.2020,06:00,"69,5","16,5","52,0","40,0","3,0",1500,3200,,
06.03.2020,06:00,"69,4","16,4","52,0","40,0","3,0",1500,3200,,
05.03.2020,06:00,"69,3","16,3","52,0","40,0","3,0",1500,3200,,
04.03.2020,06:00,"69,2","16,2","52,0","40,0","3,0",1500,3200,,
03.03.2020,06:00,"69,1","16,1","52,0","40,0","3,0",1500,3200,,
02.03.2020,06:00,"69,2","16,2","52,0","40,0","3,0",1500,3200,,
01.03.2020,06:00,"69,3","16,3","52,0","40,0","3,0",1500,3200,,

Zap.

ImkeF · ‎03-23-2020

Hi @zap2zero

thanks.

How do you expect the results to look like for multiple files?

Please create xlsx or csv that shows how the results for at least 2 different input files should look like.

Upload that file to a cloud storage (i.e. Onedrive, Dropbox...) and share the link here please.

Alternatively you can share your data as a HTML-table like I've described here: https://community.powerbi.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-Forum/ba-...

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

zap2zero · ‎03-23-2020

Hello @ImkeF.

thank you so much for your help on this issue!

Here is the link to a zip-file containing data for February and March and my expected result table for one of the sections (Gewicht). Of course, I'd like to have all sections in their respective table in the end but I guess thats not the difficult part here.

Zap.

ImkeF · ‎03-23-2020

Hi Zap,

if you import your files from the folder, you can apply this function to the Content-column in there:

// Transform File
(Parameter1 as binary) => let
        Source = Csv.Document(Parameter1,[Delimiter=",", Columns = 20, Encoding=65001, QuoteStyle=QuoteStyle.None]),
        #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        #"Added Custom" = Table.AddColumn(#"Promoted Headers", "Custom", each if List.Contains({"GEWICHT", "TEMPERATUR", "Benutzerangaben", "Zeitraum", "Kategorie", "BLUTZUCKER", "BLUTDRUCK", "PULSOXY"}, [Datenexportdetails]) then [Datenexportdetails] else null),
        #"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"})
    in
        #"Filled Down"

Key trick here is to add a column to all of your data that indicates which table/category the current row belongs to.

You can work from there to filter and split it out to your desired tables.

BTW: I've set the Colums-parameter in the Csv.Document-function to 20 to cater for more columns in the future. You can reduce it if that's not necessary.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

zap2zero · ‎03-23-2020

Hello@ImkeF,

one more question.

I dont have time to try this right away but from the code I don't see how the different headers of each section get handled. I assume they are lost and I'll have to set them manually later on?

Zap.

ImkeF · ‎03-24-2020

Hi @zap2zero

there are many ways to solve the headers, one would be this:

let
    Source = PBI_TestHoToLoadFromFolder,
    #"Filtered Rows" = Table.SelectRows(Source, each ([Custom] = "TEMPERATUR") and ([_1] <> "")),
    #"Promoted Headers" = Table.PromoteHeaders(#"Filtered Rows", [PromoteAllScalars=true]),
    #"Filtered Rows1" = Table.SelectRows(#"Promoted Headers", each ([Temperatur] <> "Temperatur")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows1",{"Datum", "Uhrzeit", "Temperatur", "Kommentar", "Medikament"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Removed Other Columns",{{"Datum", type date}})
in
    #"Changed Type1"

This selects the section you want (i.e. "TEMPERATUR") and deletes all empty rows from the column with "Temperatur" (here [_1], because the headers aren't promoted yet). Then promote the first row to headers.
Last step is to filter out rows that have the header value (here "Temperatur") in them and you're left with the "good" values only.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

zap2zero · ‎03-25-2020

Good morning @ImkeF,

with your help I got it to work, thank you so much!

Zap.

amitchandak · ‎03-19-2020

@ImkeF , Can you help?

!! Power BI 101 Interview questions !! !! Master Microsoft Fabric- 36 Videos !!
Microsoft Power BI Learning Resources, 2023 !!
Learn Power BI - Full Course with Dec-2022, with Window, Index, Offset, 100+ Topics !!
Did I answer your question? Mark my post as a solution! Appreciate your Kudos !! Proud to be a Super User! !!

How to load this file format from a folder

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly