Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
neilh
Regular Visitor

Remove Duplicates Based on String

Hello Everyone,

 

I have a large excel sheet of users who were assigned training and I want to generate completion statistics for this in Power BI. Two courses were assigned, with a row for each course showing whether it was completed or not. For example:

BobCourse AComplete
BobCourse BIncomplete
TomCourse AComplete
TomCourse BComplete

How can I remove duplicates in Power Query so that if a user only completed one course, it keeps the one that wasn't completed? Currently when I remove duplicates based on the username, it keeps the rows marked Complete no matter how it's sorted. When doing this in Excel, I would put all the incomplete rows at the top, and removing duplicates would always keep the first entry. This doesn't seem to be the case in Power BI. I am new to this so forgive me if this is trivial. Any help would be great, thank you!

2 ACCEPTED SOLUTIONS

Hi @neilh 

 

You can get the expected result with two steps easily. First group by person name column and perform All Rows operation on the new column named as "All Data". 

vjingzhang_0-1653382139838.png

 

Then add fx to create a custom step with the following code. 

= Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})

vjingzhang_1-1653382663044.png

 

My sample data is as below. "Status" is the column which marks "Complete" or "Incomplete" for every row. 

vjingzhang_2-1653382822882.png

 

Here is the full code. You can paste it into a blank query's Advanced Editor to check every step. 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJRUggpSszMy8xLV3AE8ZzzcwtyUktSlWJ1MFU4gXieecnIakLycwmYgq7CCUOFc0ZRZjGmKWg2YarCcE8sAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, #"Course Name" = _t, Status = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}, {"Course Name", type text}, {"Status", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Name"}, {{"All Data", each _, type table [Name=nullable text, Course Name=nullable text, Status=nullable text]}}),
    #"Aggregated All Data" = Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})
in
    #"Aggregated All Data"

 

Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.

View solution in original post

Hi @neilh 

 

Just need to modify the first step, switch to Advanced option and add a second new column Department with Max operation on Department. You can add multiple new columns with appropriate aggregations. 

vjingzhang_0-1653443248476.png

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJRUggpSszMy8xLV3AE8ZzzcwtyUktSgWxHpVgdTFVOIJ5nXjK6upD8XLymOWFV5YRVlXNGUWYxpmkotjrjUInpPqDKWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, #"Course Name" = _t, Status = _t, Department = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}, {"Course Name", type text}, {"Status", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Name"}, {{"All Data", each _, type table [Name=nullable text, Course Name=nullable text, Status=nullable text, Department=nullable text]}, {"Department", each List.Max([Department]), type nullable text}}),
    #"Aggregated All Data" = Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})
in
    #"Aggregated All Data"

 

Jing

View solution in original post

8 REPLIES 8
Vijay_A_Verma
Super User
Super User

First apply filter on last colum to keep only records which were not completed.

I think just applying the filter will solve your issue as there are only 2 records per user. If needed, select first column and remove duplicates. If duplicates are in 2nd column also for a user, then you will need to select first 2 columns and then remove duplictes.

Thank you for your reply! That definitely works for seeing only the users who didn't complete it. The thing is, I want to see users who did complete the training as well. If I apply a filter showing only records which were completed, it will include users who did not complete one of the two trainings.

I need to see the expected output for the given table before I can suggest the solution. 

I basically want to take a bunch of data that looks like this:

Bob Training A

 Complete

Bob Training B Incomplete
Tom Training A Complete
Tom Training B Complete
Chris Training A Incomplete
Chris Training B Incomplete

And turn it into this:

Bob Incomplete
Tom Complete
Chris Incomplete

Thank you!

Hi @neilh 

 

You can get the expected result with two steps easily. First group by person name column and perform All Rows operation on the new column named as "All Data". 

vjingzhang_0-1653382139838.png

 

Then add fx to create a custom step with the following code. 

= Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})

vjingzhang_1-1653382663044.png

 

My sample data is as below. "Status" is the column which marks "Complete" or "Incomplete" for every row. 

vjingzhang_2-1653382822882.png

 

Here is the full code. You can paste it into a blank query's Advanced Editor to check every step. 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJRUggpSszMy8xLV3AE8ZzzcwtyUktSlWJ1MFU4gXieecnIakLycwmYgq7CCUOFc0ZRZjGmKWg2YarCcE8sAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, #"Course Name" = _t, Status = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}, {"Course Name", type text}, {"Status", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Name"}, {{"All Data", each _, type table [Name=nullable text, Course Name=nullable text, Status=nullable text]}}),
    #"Aggregated All Data" = Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})
in
    #"Aggregated All Data"

 

Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.

Awesome that worked, thank you! Something else I should have mentioned, what if I had additional columns I wanted to include, for example the user's department? 

Hi @neilh 

 

Just need to modify the first step, switch to Advanced option and add a second new column Department with Max operation on Department. You can add multiple new columns with appropriate aggregations. 

vjingzhang_0-1653443248476.png

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcspPUtJRUggpSszMy8xLV3AE8ZzzcwtyUktSgWxHpVgdTFVOIJ5nXjK6upD8XLymOWFV5YRVlXNGUWYxpmkotjrjUInpPqDKWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, #"Course Name" = _t, Status = _t, Department = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}, {"Course Name", type text}, {"Status", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Name"}, {{"All Data", each _, type table [Name=nullable text, Course Name=nullable text, Status=nullable text, Department=nullable text]}, {"Department", each List.Max([Department]), type nullable text}}),
    #"Aggregated All Data" = Table.AggregateTableColumn(#"Grouped Rows", "All Data", {{"Status", List.Max, "Overall Status"}})
in
    #"Aggregated All Data"

 

Jing

This is perfect, thank you so much!

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors