Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

How to Group By the distinct elements of two columns ?

I'm working on an ATP matches dataset, and I wanna create a Players table out of a Group By on the Matches table, but the players are split between a Winner and a Loser column, and I want a way to create that Players table by taking the list of distinct players in the two columns, and then adding calculated columns such as the number of matches they had, the win percentage, etc. 

 

Thanks in advance.

2 ACCEPTED SOLUTIONS
Johanno
Responsive Resident
Responsive Resident

Hi, I think your solution sounds good. You goal is to have a new PlayersTable and that's what you get? If I create a new table with:

PlayersTable = UNION(DISTINCT('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1]);DISTINCT('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player2]))

I get a new table with all players, 991 at the moment. 

View solution in original post

Johanno
Responsive Resident
Responsive Resident

Another thought:

- On each row in your table you will be able to find a winner and a looser - we already have the winner so the looser is the other guy (a calculated column):

 

Looser = IF('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1]='ATP Dataset_2012-01 to 2017-07_Int_V4'[Winner];'ATP Dataset_2012-01 to 2017-07_Int_V4'[Player2];'ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1])

 

- We still need a complete table with all the players, as before create a new table:

 

PlayersTable = SUMMARIZECOLUMNS('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1];"Int";MIN('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1_Int]);"Rank average";AVERAGE('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1_Rank]))

 

- We need a relationship between the two tables, so set an active relationship from PlayersTable to Winner and a passive to Looser

- Then in our fact table we can create a measure with win rate (for instance):

 

Win rate = COUNT('ATP Dataset_2012-01 to 2017-07_Int_V4'[Winner])/(COUNT('ATP Dataset_2012-01 to 2017-07_Int_V4'[Winner])+CALCULATE(COUNT('ATP Dataset_2012-01 to 2017-07_Int_V4'[Looser]);USERELATIONSHIP('ATP Dataset_2012-01 to 2017-07_Int_V4'[Looser];PlayersTable[Player1])))

 

- Create a visual table with player from PlayersTable and our new measure where you can add columns and slicers:

Skärmklipp.JPG

 

I havn't looked at rank, I guess you could create something similar based on the rank from the last match? Any thoughts? 🤨

 

View solution in original post

6 REPLIES 6
Johanno
Responsive Resident
Responsive Resident

Hi, do you have some sample data and can show the disered result?

Anonymous
Not applicable

@Johannohere's the dataset; so far I ended up going around the problem by creating a new table as the union of the distinct elements of Player1 and Player2 and adding calculated columns, but I'm wondering if there's any way to group a table by a union of two columns without taking the shortcut I took.

Johanno
Responsive Resident
Responsive Resident

Hi, I think your solution sounds good. You goal is to have a new PlayersTable and that's what you get? If I create a new table with:

PlayersTable = UNION(DISTINCT('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1]);DISTINCT('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player2]))

I get a new table with all players, 991 at the moment. 

Anonymous
Not applicable

@Johanno  Thing is, when I use this solution, filters do not apply, for example even if I have a "players" and a "win percentage" column, I can't visualize who has the highest win ratio on clay for example because I don't see any way to link the two tables in a way that'd make the filter recognize the Players table as being related to Matches. Do you know how to resolve this issue by any chance ?

Johanno
Responsive Resident
Responsive Resident

You're right, I didn't think all the way. Guess the challenge is that data is grouped per match and we don't have one row per player and match. Tried to use Unpivot in Power Query to get around this but didn't succeed.

 

Another way to create a table with desired columns is:

PlayersTable = SUMMARIZECOLUMNS('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1];"Int";MIN('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1_Int]);"Rank average";AVERAGE('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1_Rank]))

but that only uses player1..

Johanno
Responsive Resident
Responsive Resident

Another thought:

- On each row in your table you will be able to find a winner and a looser - we already have the winner so the looser is the other guy (a calculated column):

 

Looser = IF('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1]='ATP Dataset_2012-01 to 2017-07_Int_V4'[Winner];'ATP Dataset_2012-01 to 2017-07_Int_V4'[Player2];'ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1])

 

- We still need a complete table with all the players, as before create a new table:

 

PlayersTable = SUMMARIZECOLUMNS('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1];"Int";MIN('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1_Int]);"Rank average";AVERAGE('ATP Dataset_2012-01 to 2017-07_Int_V4'[Player1_Rank]))

 

- We need a relationship between the two tables, so set an active relationship from PlayersTable to Winner and a passive to Looser

- Then in our fact table we can create a measure with win rate (for instance):

 

Win rate = COUNT('ATP Dataset_2012-01 to 2017-07_Int_V4'[Winner])/(COUNT('ATP Dataset_2012-01 to 2017-07_Int_V4'[Winner])+CALCULATE(COUNT('ATP Dataset_2012-01 to 2017-07_Int_V4'[Looser]);USERELATIONSHIP('ATP Dataset_2012-01 to 2017-07_Int_V4'[Looser];PlayersTable[Player1])))

 

- Create a visual table with player from PlayersTable and our new measure where you can add columns and slicers:

Skärmklipp.JPG

 

I havn't looked at rank, I guess you could create something similar based on the rank from the last match? Any thoughts? 🤨

 

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.