Hi, I have a number of reasonably large .csv files. Power BI is great at analysing them, but only if I first remove a number of rows from the top of each file (using Excel) before loading it into Power BI. These rows are effectively header info, but they use several different formats along each row, and when Power BI parses the file it uses the first row to determine what to expect from the rest of the data. As a result the actual data gets split at the "wrong" commas and ends up useless, unless, as I say, I "pre-process" the file in Excel.
Is there any way (in Power BI) to strip out these header rows and only then parse the rows which comprise the rest of the data?
Thanks,
Tony
This is an old question but I never found a satisfactory answer elsewhere, so I'm sharing my solution to hopefully ease the pain for someone else.
The way I found to skip the first n rows is to use the Advanced Editor. This avoids having to mess about with any manual steps when you want to refresh the source csv:
let
    Source = Lines.FromBinary(File.Contents("C:/path_to_file.csv")),
    SkipN = List.Skip(Source, 1), // Replace 1 with the number of rows you want to skip
    ConvertToTable = Table.FromList(SkipN, Splitter.SplitTextByDelimiter(","), null, null, ExtraValues.Error), // Assumes the delimiter is a comma
    #"Promoted Headers" = Table.PromoteHeaders(ConvertToTable, [PromoteAllScalars=true])
in
    #"Promoted Headers"
Then select all your columns and run "Detect Data Type" to simulate how Power BI would detect column types during the usual data connection process.
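If you'd rather keep everything inside the script instead of clicking "Detect Data Type", the UI step roughly corresponds to an explicit Table.TransformColumnTypes call. A sketch only; the column names and types below are placeholders you'd replace with your own:

```
    // Placeholder step: set types explicitly instead of relying on UI detection
    ChangedTypes = Table.TransformColumnTypes(
        #"Promoted Headers",
        {
            {"Date", type date},         // placeholder column name
            {"Amount", type number},     // placeholder column name
            {"Description", type text}   // placeholder column name
        }
    )
```

The advantage of spelling the types out is that a refresh can't silently re-detect them differently when the source file changes.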
Hey Tony,
there is a way in Power Query.
I guess this is what you are looking for.
Regards
Hi Tom,
Thanks for the suggestion. Unfortunately that doesn't do the trick, because the rest of the rows have already been parsed by Power BI based on what it detected in the first row...
The challenge is that the "real data" is about 40 columns whereas the "header" is only about 6 columns, so Power BI parses based on 6 columns, meaning that the real data is wrapped and displaced...
I really need some way of jumping over the 1st "n" rows before the data is parsed.
Cheers,
Tony
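For anyone hitting the same 6-columns-vs-40-columns problem: Csv.Document accepts a Columns option that forces the column count regardless of what the first row looks like, so the skip can happen after a correct parse. A sketch only; the path, delimiter, and row/column counts are placeholders:

```
let
    // Columns = 40 forces the parser to use 40 columns even though
    // the junk header rows only contain 6 values each
    Source = Csv.Document(
        File.Contents("C:/path_to_file.csv"),
        [Delimiter = ",", Columns = 40, QuoteStyle = QuoteStyle.Csv]
    ),
    // Drop the first n "header info" rows (placeholder: 6)
    SkippedHeader = Table.Skip(Source, 6),
    Promoted = Table.PromoteHeaders(SkippedHeader, [PromoteAllScalars = true])
in
    Promoted
```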
@tonygd,
After you import data from CSV to Power BI Desktop, in Query Editor, remove all steps except the "Source" step in "APPLIED STEPS", then use the "Remove Top Rows" feature to remove the rows.
If the above step doesn't help, please share a sample CSV file with us and post the expected result.
Regards,
Lydia
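For reference, the "Remove Top Rows" feature just generates a Table.Skip step in M, something along these lines (step name and row count are placeholders):

```
    // Generated by "Remove Top Rows"; 6 = number of rows to drop
    #"Removed Top Rows" = Table.Skip(Source, 6)
```

Note that this runs after the CSV has already been parsed into columns, which is why it doesn't help when the first row has fooled the column detection.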
Hi Tom, Lydia,
Unfortunately neither of your suggestions fixed the problem I was experiencing but I have found a solution.
The .csv files I had been working with are generated as output from a portal and I had been trying to load them directly into PBI.
It seems that if I open the .csv in Excel and save it (without making any changes) I can then load it into PBI with no problem...
Not sure why this works, but it does...
Thanks for your suggestions,
Regards,
Tony
@tonygd,
Glad to hear the issue is solved. You can mark your reply as the answer; that way, other community members can easily find the solution when they hit the same issue.
Regards,
Lydia
Hey,
not sure what you mean by "displaced". I created a sample file with:
- 20 rows of garbage at the beginning (3 columns)
- 10,000 empty rows
- 10 valuable rows with 40 columns
The resulting table contains 40 columns. Of course the 10,021st row has to be promoted as the header, and you also have to change the data types of the columns manually, but this only has to be done once, and it's always a good idea to check the data types anyway.
Maybe you have to check the option "Data Type Detection" (this was not necessary during my testing).
Cheers
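If your file also contains blank filler rows like the 10,000 empty rows above, the query editor's "Remove Blank Rows" step can drop them. A sketch of the M it generates (the input step name here is assumed):

```
    // Generated by "Remove Blank Rows": keep only rows where at least
    // one field is neither empty text nor null
    #"Removed Blank Rows" = Table.SelectRows(
        PreviousStep,
        each not List.IsEmpty(
            List.RemoveMatchingItems(Record.FieldValues(_), {"", null})
        )
    )
```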