cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
maasool
Helper I
Helper I

Filter csv file before importing data

Hello

I have a very large csv file, which I like to import with Power Query.


However, I'd like to import only selected data from the file (not import everything and filter it afterward).

For example, I'd like to import only data, where column YEAR in the csv file has a value of 2020. 

Is it possible?

4 REPLIES 4
AlexisOlson
Super User
Super User

A CSV is a flat file format where there isn't a good way to filter it without scanning it, so you have to at least scan the whole file even if you don't actually load the whole file. If the file is too big for Power Query to scan in a reasonable amount of time, you'll need to pre-process it or filter it with another tool (like Python as @Waldo35 mentioned) that can scan more efficiently.

Is it still a csv file, when it is accessed through a https://... link? Or is it possible to pre-filter such connections?

https://data.gov.lv/dati/dataset/8d31b878-536a-44aa-a013-8bc6b669d477/resource/50ef4f26-f410-4007-b2...

It depends on the link. If it's just pointing to a file, then you've got to download it before filtering. You can only filter at the source if the source has some sort of data query engine that can do filtering.

 

Often data sources will provide more granular access via an API so that you can pass filters via URL. I think this might work for the resource you linked to. Clicking on the green Data API button on this page gives these suggestions:

AlexisOlson_0-1666795873774.png

Waldo35
Helper I
Helper I

Power Query usually is that prestep where u filter and transform data beforehand so that the dashboard itself is quick even tho it takes a while to load the data.

 

You could try loading it with Python code and using pandas to filter out the csv.

Would look something like this:

 

import pandas as pd
df = pd.read_csv ('file_name.csv')
df= df[df['YEAR'] == 2020]

 

For more info

pandas df docs 

Helpful resources

Announcements
Carousel_PBI_Wave1

2023 Release Wave 1 Plans

Power BI release plans for 2023 release wave 1 describes all new features releasing from April 2023 through September 2023.

Power BI Summit Carousel 2

Global Power BI Training

Make sure you register today for the Power BI Summit 2023. Don't miss all of the great sessions and speakers!

BizApps LATAM 2023

Business Application LATAM Summit 2023

Join the biggest FREE Business Applications Event in LATAM this February.

Power Platform Bootcamp

Global Power Platform Bootcamp

In this bootcamp we will deep-dive into Microsoft’s Power Platform stack with hands-on sessions and labs, delivered to you by experts and community leaders.

Top Solution Authors
Top Kudoed Authors