Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
Pragati11
Super User
Super User

Performance Issues with Group By Clause

Hello Everyone,

 

I am currently experiencing a performance issue on using a GROUP BY clause while transforming data in Power Query Editor in Power BI.

 

Scenario:

I have a very large table worth 50 million rows. (can't share data due to sensitivity)

I am working on creating a Dimension table for my data model by merging the source table say Project Table with a 2nd table say Team table; to extract team members information.

Now a single Project Id can result into multiple team members information - therefore I end up with multiple rows for a Project Id in my source table after the merge operation.

Now what I really need is - combine these multiple rows into a single row and get team members information in a single row that are separated by a delimiter; against a project Id.

I know I can do this by using Group BY with a combination of Text.Combine() function.

 

Challenge:

Now the challenge here is - Project table has nearly 50 million rows. After merge with Teams table, I end up with somewhere around 60 million rows. Applying a Group BY transformation on top of this is really creating a performance challenge for me and I am looking for an alternative way to do this transformation.

 

Questions:

  1. Is there an alternative way to carry out this GROUP BY transformation in an optimised way?
  2. Is creating this summarisation/grouping using DAX more optimised? (In my view it is not but looking to hear others's views)
  3. I am using a Dataflow to consume my data In Power BI and I am trying to do most of the transformations on the dataflow side in online Power Query Editor. But have limitations as the client doesn't have Premium licensing at their end. What is the best approach here then as I end up creating few computed entities which are only supported in power BI Premium?

I am looking forward to some suggestion on best approaches here and to get some tips around the solution implementation.

@parry2k @edhans @Greg_Deckler @MFelix @mwegener @nickyvv @amitchandak 

 

Thanks In advance! Let me know if more details are needed here.

 

Best Regards,

Pragati Jain


MVP logo


LinkedIn | Twitter | Blog YouTube 

Did I answer your question? Mark my post as a solution! This will help others on the forum!

Appreciate your Kudos!!

Proud to be a Super User!!

1 ACCEPTED SOLUTION
parry2k
Super User
Super User

@Pragati11 groupby will be slow, I believe DAX will be much faster, Go ahead and try it. 



Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!

Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo

If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤


Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.

View solution in original post

10 REPLIES 10
parry2k
Super User
Super User

@Pragati11 groupby will be slow, I believe DAX will be much faster, Go ahead and try it. 



Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!

Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo

If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤


Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.

HI @parry2k ,

 

Sure let me try this and get back to you.

 

Best Regards,

Pragati Jain


MVP logo


LinkedIn | Twitter | Blog YouTube 

Did I answer your question? Mark my post as a solution! This will help others on the forum!

Appreciate your Kudos!!

Proud to be a Super User!!

@Pragati11 What type is your original data source?
Can you already do the grouping on the team table before the join?

What are the cases in which you cannot avoid the computed entites?

 

Did I answer your question?
Please mark my post as solution, this will also help others.
Please give Kudos for support.

Marcus Wegener works as Full Stack Power BI Engineer at BI or DIE.
His mission is clear: "Get the most out of data, with Power BI."
twitter - LinkedIn - YouTube - website - podcast


Hi @mwegener ,

 

Yes I do need computed entities here as I have just included 2 tables in my question, but I do have other tables as well.

I have actually cracked the solution to avoid computed entities in the Dataflow. This can be easily done by making sure which ever tables you are using to get this computed entity, they need to be disabled on load. This has actually solved this issue for me.

Thanks so much for your response 🙂

 

Best Regards,

Pragati Jain


MVP logo


LinkedIn | Twitter | Blog YouTube 

Did I answer your question? Mark my post as a solution! This will help others on the forum!

Appreciate your Kudos!!

Proud to be a Super User!!

edhans
Super User
Super User

@Pragati11 I would still lean towards keeping this in Power Query. You can do the concatenation (TEXT.COMBINE) in the Group By function. I just did this yesterday for a client.

 

But - performance wasn't great. Added 30min to the transformation. So I moved this to a Dataflow, and let it to the heavy lifting. Still takes about 30min, but it is offloaded from Power BI and doesn't impact your development. If you have other heavy transformations, if you move them to dataflows as well, they can run in parallel, and when you load Power BI data, it is just a straight load of materalized tables. 


 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlTSUQpSitWJVkoCsgLALJCYF1zMAy6GkPWHsxzh6lzhrFA4KwSuLhwu5oxmXiqQ5QSX9QazkuEmpwBZfnB1wXBZT7gs0H2xAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Group = _t, Code = _t]),
    #"Grouped Rows" = 
        Table.Group(
            Source, 
            {"Group"}, 
            {
                {
                    "All Rows", 
                    each Text.Combine(_[Code], ", "), 
                    type table [Group=nullable text, Code=nullable text]
                }
            }
        )
in
    #"Grouped Rows"

 

 

That code converts this:

edhans_0-1655223804499.png

 

to this in one step.

edhans_1-1655223837470.png

Given the amount of data you are talking about, pushing more back to the source can make the model perform better since the vertipaq engine can optimize the compression if the data needs to be in a column. If it just needs to be in a measure, then yeah, CONCATENATEX() will work - but you will have a lot more records as you didn't group before loading.

 

So.... it depends. 😁

 

How to use M code provided in a blank query:
1) In Power Query, select New Source, then Blank Query
2) On the Home ribbon, select "Advanced Editor" button
3) Remove everything you see, then paste the M code I've given you in that box.
4) Press Done
5) See this article if you need help using this M code in your model.

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Hi @edhans ,

 

thank you so much for getting back with this detailed approach.

I did actually try Text.Combine() m-code function but it was in a different way. Let me try using the m-code that you have shared above. I will let you know whether it works for me or not.

 

Best Regards,

Pragati Jain


MVP logo


LinkedIn | Twitter | Blog YouTube 

Did I answer your question? Mark my post as a solution! This will help others on the forum!

Appreciate your Kudos!!

Proud to be a Super User!!

parry2k
Super User
Super User

@Pragati11 If that is the case just use CONCATENATEX DAX function instead of doing group by in PQ



Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!

Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo

If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤


Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.

HI @parry2k ,

 

Does using DAX is good from performance point of view when compared to creating summarisation in Power Query editor?

 

Best Regards,

Pragati Jain


MVP logo


LinkedIn | Twitter | Blog YouTube 

Did I answer your question? Mark my post as a solution! This will help others on the forum!

Appreciate your Kudos!!

Proud to be a Super User!!

parry2k
Super User
Super User

@Pragati11 Thanks for tagging. Just to clarify, you raw data contains two tables:

 

1 - Project Table with unique projects

2 - Team table with multiple team members for each projects 

 

and above two tables can be joined on ProjectId which will be 1 to many relationship, correct?

 

And you want the output to show the Team members of each project separated by a comma, correct?

 

Please confirm when you get a chance. 



Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!

Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo

If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤


Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.

Hi @parry2k ,

 

Thanks for responding so quickly.

Yes you are right in understanding my scenario.

 

Best Regards,

Pragati Jain


MVP logo


LinkedIn | Twitter | Blog YouTube 

Did I answer your question? Mark my post as a solution! This will help others on the forum!

Appreciate your Kudos!!

Proud to be a Super User!!

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.