Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
mgartell
Regular Visitor

Power Query - Csv.Document - Select specific columns from csv file

Hi!

I need help with the Csv.Document function from Power Query. I would like to import only specific columns from a CSV file, using the Column argument. However, when using this argument, I always get the first N columns, regardless of the column names I write in this argument (a list of columns, to be more precise). I want to use this in order to reduce the size of the data loaded, as I import dozens of files with the same column structure and some of these columns have no relevant information.

 

Any help would be appreciated!

1 ACCEPTED SOLUTION

Typically, when loading a CSV, it looks like this before promoting headers:

AlexisOlson_0-1656005628174.png

Or like this after promoting headers:

AlexisOlson_1-1656005664863.png

 

If you use the Columns argument in Csv.Document, it interprets a list of column names not as which columns to select but what the column names are (useful if you don't have headers in the file) and assumes you are listing them in order from left to right.

 

Columns: Can be null, the number of columns, a list of column names, or a table type. If the number of columns is lower than the number found in the input, the additional columns will be ignored. If the number of columns is higher than the number found in the input, the additional columns will be null. When not specified, the number of columns will be determined by what is found in the input.

I'd recommend not using this argument but rather selecting the columns you want after promoting headers. You can use the Choose Columns button for this and the query should end up looking like this:

let
    Source = Csv.Document(File.Contents("C:\Users\aolson\Downloads\Sample.csv")),
    #"Promoted Headers" = Table.PromoteHeaders(Source),
    #"Removed Other Columns" = Table.SelectColumns(#"Promoted Headers", {"CO_NCM", "CO_UNID", "CO_PPI"})
in
    #"Removed Other Columns"

Use whatever list of column names you prefer instead of the three I have in my example above.

View solution in original post

5 REPLIES 5
rohit_singh
Solution Sage
Solution Sage

Hi @mgartell ,

Could you please share a screenshot of how you're using the CSV.Document function? I am using the column parameter for my sample file that has 4 columns, and passing only two columns as a list of values. It seems to give me the correct result.

rohit_singh_0-1655818170573.png

 

Kind regards,

Rohit


Please mark this answer as the solution if it resolves your issue.
Appreciate your kudos! 🙂

I am sorry @rohit_singh, I wasn't clear in my question. I have a dataset with 20+ columns and I want to import (let's say) 12 columns total . These specific columns that I want to import aren't the first 12 columns (e.g., I want to retrieve the first 10 columns and the last two columns). However, whenever I use the Columns argument in the Csv.Document function, I always get the first 12 columns, regardless if I use the specific names from the columns in the original csv.

Having said that, I believe I have misinterpreted the Columns' argument logic. I thought it could select the columns to import, but I think it only renames the columns based on the number of itens in the list of names (as in my example).

Below, there are the screenshots (I edited to avoid from exporting personal information). In the power query, I wanted to get the "CO_NCM" and "CO_SH6" columns (which are the first and third columns in the original csv file). However, the function exports the first two columns.

Excel_Example.PNGPower_Query_Example.PNG




Hi @mgartell ,

I think something is not right here. You have set the column parameters as "CO_NCM" and "CO_SH6". 
Power Query correctly pulls in the columns with exactly those names.

rohit_singh_0-1655821483227.png


Looking at I believe that there is a mismatch between the header names in excel and what's being pulled into power query. Could you check the delimiter you're using to import the file?

Kind regards,

Rohit

Hi, @rohit_singh. I thank you for the fast and helpful answers. I believe I am using the correct delimiter (which is semicolon), as when I try another one (such as comma) it entirely messes up the data. I have tried this on another dataset and the result is the same: it always imports the first N columns, regardless of the columns names I've listed. You can see in the Excel Screenshot that the values of the column "CO_SH6" are not the ones that are imported in the Power Query. I don't know how to proceed with this.

Typically, when loading a CSV, it looks like this before promoting headers:

AlexisOlson_0-1656005628174.png

Or like this after promoting headers:

AlexisOlson_1-1656005664863.png

 

If you use the Columns argument in Csv.Document, it interprets a list of column names not as which columns to select but what the column names are (useful if you don't have headers in the file) and assumes you are listing them in order from left to right.

 

Columns: Can be null, the number of columns, a list of column names, or a table type. If the number of columns is lower than the number found in the input, the additional columns will be ignored. If the number of columns is higher than the number found in the input, the additional columns will be null. When not specified, the number of columns will be determined by what is found in the input.

I'd recommend not using this argument but rather selecting the columns you want after promoting headers. You can use the Choose Columns button for this and the query should end up looking like this:

let
    Source = Csv.Document(File.Contents("C:\Users\aolson\Downloads\Sample.csv")),
    #"Promoted Headers" = Table.PromoteHeaders(Source),
    #"Removed Other Columns" = Table.SelectColumns(#"Promoted Headers", {"CO_NCM", "CO_UNID", "CO_PPI"})
in
    #"Removed Other Columns"

Use whatever list of column names you prefer instead of the three I have in my example above.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors