Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Ismail94
Frequent Visitor

Group By in power query taking too long time.

I have about 10 million rows and trying to groupby my data in Power BI by several columns. I've tried to do this with Power Query editor with "Group By" button but the performance is taking about 15-20 minuts. Then to boost the performance I've tried Python Pandas groupby function but the result is showing only VALUES. 

 So my question is, is it possible to boost the performance of power query? Or how can i workout with python? Thanks in advance.
My script is: data = data.groupby(['date', 'client ', 'product'] ).sum()

Input Data

 

date       client   product qty   total amount  
07-05-2022 Client-1 Phone   5     5000          
07-05-2022 Client-2 PC      8     8500          
07-05-2022 Client-2 PC      8     8500

 

 

Output Data

 

qty    total amount 
5      5000         
16     17000

 

 

 

 

 

 

 

3 REPLIES 3
Ismail94
Frequent Visitor

Thanks for your response @Vijay_A_Verma  @v-jingzhang . I appreciate it.  The source was from pc memory, csv.file. As you said i've tried to pre-summarize(Group.By) my data with Python script before loading it to Power BI. It is working fine and it reduced time from 25 minuts to 3-4. 

v-jingzhang
Community Support
Community Support

Hi @Ismail94 

 

What is the data source? If the data source is a relational database or other type of database, doing the pre-summarizing operation at the data source side will be more effective. This will greatly reduce the data size that is loaded into the report and reduce the performance load on your computer. 

 

When you load all detailed data into Power BI and do the group-by with Power Query, all detailed data needs to be loaded into your computer's memory. The group-by process is executed by the Power Query M engine on your computer and will use the CPU and memory resources on your computer. 

 

If you pre-summarize the data in the database, it will use the database engine to do this group-by job. It will use the resources on the computer or server where the database is installed. When you query the summarized data with Power Query, the data size is smaller and the query speed is faster. You can use the pre-summarized data directly in your report or do further transformations based on that. 

 

Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.

Vijay_A_Verma
Super User
Super User

Best best would be load the data in SQL Server (if you don't have access to SQL Server instance - SQL Server Express is free and can be installed on Desktop). Then fetch the data from SQL Server into PQ and while fetching it from SQL Server, issue Group by SQL statement.

Group by in SQL Server is quite faster than native PQ group by. 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors