
tonygd
Regular Visitor

Skip rows in CSV before it is parsed by Power BI

Hi, I have a number of reasonably large .csv files. Power BI is great at analysing them, but only if I first remove a number of rows from the top of each file (using Excel) before loading it into Power BI. These rows are effectively header information, but they are in several different formats, and when Power BI parses the file it uses the first row to determine the structure it expects for the rest of the data. As a result, the actual data is split at the "wrong" commas and ends up useless, unless, as I say, I "pre-process" the file in Excel.

 

Is there any way (in Power BI) to strip out these header rows and only then parse the rows which comprise the rest of the data?

 

Thanks,

Tony

1 ACCEPTED SOLUTION

Hi Tom, Lydia,

 

Unfortunately, neither of your suggestions fixed the problem I was experiencing, but I have found a solution.

 

The .csv files I had been working with are generated as output from a portal and I had been trying to load them directly into PBI.

 

It seems that if I open the .csv in Excel and save it (without making any changes) I can then load it into PBI with no problem...

 

Not sure why this works, but it does...

 

Thanks for your suggestions,

Regards,

Tony
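For files where the Excel round-trip isn't practical, one alternative (not part of the accepted solution itself) is to strip the junk rows with a small script before loading the file into Power BI. A minimal Python sketch, assuming the number of rows to skip is known in advance; the function name and paths are illustrative:

```python
# Pre-processing sketch: copy a CSV while dropping its first n_skip
# lines, so downstream tools only ever see the real data rows.

def strip_header_rows(src_path, dst_path, n_skip):
    """Copy src_path to dst_path, dropping the first n_skip lines."""
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for i, line in enumerate(src):
            if i >= n_skip:
                dst.write(line)

# Example call (paths and skip count are placeholders):
# strip_header_rows("export.csv", "export_clean.csv", n_skip=6)
```

Run once per export, the cleaned copy can then be loaded into Power BI without any manual editing.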


8 REPLIES
neilajr
Regular Visitor

This is an old question but I never found a satisfactory answer elsewhere, so I'm sharing my solution to hopefully ease the pain for someone else.

 

The way I found to skip the first n rows is to use the Advanced Editor. This avoids having to mess about with any manual steps when you want to update the source csv:

let    
    Source = Lines.FromBinary(File.Contents("C:/path_to_file.csv")),
    SkipN = List.Skip(Source,1), // Replace 1 with the number of rows you want to skip
    ConvertToTable = Table.FromList(SkipN, Splitter.SplitTextByDelimiter(","), null, null, ExtraValues.Error), //Assumes the delimiter is a comma
    #"Promoted Headers" = Table.PromoteHeaders(ConvertToTable, [PromoteAllScalars=true])
in
    #"Promoted Headers"

Then select all your columns and run "Detect Data Type" to simulate how Power BI would detect column types during the usual data connection process.
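The same skip-then-parse pattern can be sketched outside of M, here in Python with the standard `csv` module (the function name and sample text are mine, purely illustrative of the technique):

```python
import csv

def parse_skipping(text, n_skip, delimiter=","):
    """Drop the first n_skip lines, then parse the rest as CSV and
    promote the first remaining row to the header (like List.Skip
    followed by Table.PromoteHeaders)."""
    lines = text.splitlines()[n_skip:]
    rows = list(csv.reader(lines, delimiter=delimiter))
    return rows[0], rows[1:]          # header, data rows

sample = "report title\ngenerated by portal\na,b,c\n1,2,3\n"
header, data = parse_skipping(sample, n_skip=2)
# header -> ['a', 'b', 'c'], data -> [['1', '2', '3']]
```

The key point, as in the M version, is that the delimiter splitting only ever sees the rows after the skip, so the narrow preamble can't influence the detected column structure.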

TomMartens
Super User

Hey Tony,

 

There is a way in Power Query:

  1. Import the file
  2. Mark the complete table and choose "Remove Top Rows" from the context menu

[Image: How to skip Top Rows.png]

 

I guess this is what you are looking for.

 

Regards



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

Hi Tom,

 

Thanks for the suggestion. Unfortunately, that doesn't do the trick, because the rest of the rows have already been parsed by Power BI based on what it detected in the first row.

 

The challenge is that the "real data" has about 40 columns whereas the "header" has only about 6, so Power BI parses based on 6 columns, meaning that the real data is wrapped and displaced...

 

I really need some way of jumping over the first n rows before the data is parsed.

 

Cheers,

Tony
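Because the preamble here is narrower (about 6 fields) than the real data (about 40 fields), the start of the data can even be detected automatically by field count rather than hard-coding n. A hedged Python sketch of that heuristic (the helper name and sample lines are illustrative, not from the thread):

```python
import csv

def find_data_start(lines, delimiter=","):
    """Heuristic: the real data starts at the first line whose field
    count equals the widest field count seen in the file."""
    counts = [len(row) for row in csv.reader(lines, delimiter=delimiter)]
    return counts.index(max(counts))

preamble_and_data = ["meta,info", "created,today", "a,b,c,d,e", "1,2,3,4,5"]
start = find_data_start(preamble_and_data)   # 2: the first 5-field line
```

The returned index can then feed whatever skip mechanism is in use, so the skip count doesn't have to be maintained by hand when the portal changes its header.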

@tonygd,

After you import data from the CSV into Power BI Desktop, in Query Editor, remove all steps except the "Source" step in "APPLIED STEPS", then use the "Remove Top Rows" feature to remove the rows.
[Image: 1.JPG]

If the above step doesn't help, please share a sample CSV file with us and post the expected result.

Regards,
Lydia

Community Support Team _ Lydia Zhang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.


Jimmy801
Community Champion

Hello

Do you still have the issue? I would like to help you.

Br

@tonygd,

Glad to hear the issue is solved. You can mark your reply as the answer; that way, other community members can easily find it when they run into the same issue.

Regards,
Lydia

Community Support Team _ Lydia Zhang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hey,

 

Not sure what you mean by displaced. I created a sample file with:

  1. 20 rows of garbage at the beginning (3 columns)
  2. 10,000 empty rows
  3. 10 valuable rows with 40 columns

 

The table contains 40 columns. Of course, the 10,021st row has to be promoted as the header, and you also have to change the data types of the columns manually, but this only has to be done once, and it's always a good idea to check the data types.

 

Maybe you have to check the "Data Type Detection" option (this was not necessary during my testing).

[Image: Skip Rows - Configure Import - 02.png]

 

Cheers



