Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
woofwoof123
Frequent Visitor

Deleting Duplicate Rows

Hi, I want to get rid of duplicate rows from a table that meets the below condition.

 

Given the below table where ticker is the stock symbol, fund is the name of the pooled money funding, bought is how many shares of ticker, and person is the buyer:

woofwoof123_2-1715194993375.png

 


Requirement: I want to get rid of duplicate rows if the Ticker and Person column is the same other rows. Of the duplicate rows, I want to keep the row that has the highest qty/number in the Bought column. Example of duplicate rows are highlighted, where yellow row is to be deleted and green is to be kept as that row has the highest qty/number in the Bought column.

woofwoof123_5-1715195666385.png

 

 

Expected outcome is the below after removing the highlighted duplicates:

woofwoof123_6-1715195683535.png

 

 

3 ACCEPTED SOLUTIONS
v-jiewu-msft
Community Support
Community Support

Hi @woofwoof123 ,

First of all, many thanks to @ryan_mayu  for your very quick and effective replies.

Based on my testing, please try the following methods as workaround:

1.Create the simple table.

vjiewumsft_0-1715219079763.png

2.Create the index column in power query editor.

vjiewumsft_1-1715219125404.png

3.Create the calculated column to rank.

 

Rank = 
VAR _ticket = 'Table'[Ticket]
VAR _person = 'Table'[Person]
RETURN
RANKX(FILTER(ALL('Table'),'Table'[Ticket] = _ticket && 'Table'[Person] = _person),[Index],,ASC)

 

vjiewumsft_2-1715219158207.png

4.Create the calculated column to flag the column.

 

Flag = 
VAR _ticket = 'Table'[Ticket]
VAR _person = 'Table'[Person]
var _maxbought = MAXX(FILTER(ALL('Table'), 'Table'[Ticket] = _ticket && 'Table'[Person] = _person),[Bought])
var _count = COUNTX(FILTER(ALL('Table'),'Table'[Ticket] = _ticket && 'Table'[Person] = _person),[Ticket])
return
IF(
    _count = 1 && [Rank]=1,
    1,
    IF(
        _count > 1 && 'Table'[Bought] = _maxbought,1,0)
)

 

vjiewumsft_3-1715219189990.png

5.Create the new calculated table.

 

Table 2 = 
var _table1 = FILTER('Table',[Flag]=1)
return
SUMMARIZE(
    _table1,[Ticket],[Fund],[Bought],[Person])

 

6.The result is shown below.

vjiewumsft_4-1715219269791.png

 

Best Regards,

Wisdom Wu

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

you can also do that in PQ

 

let
Source = Table,
Custom1 = Table.AddColumn(Source,"Rank",each Table.RowCount(Table.SelectRows(Source,(x)=>x[Bought]>[Bought] and x[Ticket]=[Ticket] and x[Person]=[Person]))+1),
#"Filtered Rows" = Table.SelectRows(Custom1, each ([Rank] = 1)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Rank"})
in
#"Removed Columns"

 

pls see the attachment below





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




View solution in original post

Hi,

This M code works

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fY8/D4IwFMS/S2eW/qHOlTJoaGKEjTDU2MiA1Fii4dv7WkKpMXG5N/xy9+7aFglxqlCGpL2BUgIirvqOuiyiQk+g2KO9vaSk7o15wOUsT6GS3mXfoISCHLWz4w/i3iNcP5g5sMK6afNhyuAo/VygLJN3O+5TpX6ZwA7n2B/TtL8qGxH7c+YD5TzoMYWLjzD+B+b8a0RTV1tsGChW34rChpxF1H0A", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Ticket = _t, Fund = _t, Bought = _t, Person = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Ticket", type text}, {"Fund", type text}, {"Bought", Int64.Type}, {"Person", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Ticket", "Person"}, {{"All", each Table.Max(_,"Bought")}}),
    #"Expanded All" = Table.ExpandRecordColumn(#"Grouped Rows", "All", {"Fund", "Bought"}, {"Fund", "Bought"})
in
    #"Expanded All"

Hope this helps.

Ashish_Mathur_0-1715224832945.png

 


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

View solution in original post

6 REPLIES 6
v-jiewu-msft
Community Support
Community Support

Hi @woofwoof123 ,

First of all, many thanks to @ryan_mayu  for your very quick and effective replies.

Based on my testing, please try the following methods as workaround:

1.Create the simple table.

vjiewumsft_0-1715219079763.png

2.Create the index column in power query editor.

vjiewumsft_1-1715219125404.png

3.Create the calculated column to rank.

 

Rank = 
VAR _ticket = 'Table'[Ticket]
VAR _person = 'Table'[Person]
RETURN
RANKX(FILTER(ALL('Table'),'Table'[Ticket] = _ticket && 'Table'[Person] = _person),[Index],,ASC)

 

vjiewumsft_2-1715219158207.png

4.Create the calculated column to flag the column.

 

Flag = 
VAR _ticket = 'Table'[Ticket]
VAR _person = 'Table'[Person]
var _maxbought = MAXX(FILTER(ALL('Table'), 'Table'[Ticket] = _ticket && 'Table'[Person] = _person),[Bought])
var _count = COUNTX(FILTER(ALL('Table'),'Table'[Ticket] = _ticket && 'Table'[Person] = _person),[Ticket])
return
IF(
    _count = 1 && [Rank]=1,
    1,
    IF(
        _count > 1 && 'Table'[Bought] = _maxbought,1,0)
)

 

vjiewumsft_3-1715219189990.png

5.Create the new calculated table.

 

Table 2 = 
var _table1 = FILTER('Table',[Flag]=1)
return
SUMMARIZE(
    _table1,[Ticket],[Fund],[Bought],[Person])

 

6.The result is shown below.

vjiewumsft_4-1715219269791.png

 

Best Regards,

Wisdom Wu

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Thanks Wisdom I will try this out

Hi,

This M code works

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fY8/D4IwFMS/S2eW/qHOlTJoaGKEjTDU2MiA1Fii4dv7WkKpMXG5N/xy9+7aFglxqlCGpL2BUgIirvqOuiyiQk+g2KO9vaSk7o15wOUsT6GS3mXfoISCHLWz4w/i3iNcP5g5sMK6afNhyuAo/VygLJN3O+5TpX6ZwA7n2B/TtL8qGxH7c+YD5TzoMYWLjzD+B+b8a0RTV1tsGChW34rChpxF1H0A", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Ticket = _t, Fund = _t, Bought = _t, Person = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Ticket", type text}, {"Fund", type text}, {"Bought", Int64.Type}, {"Person", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Ticket", "Person"}, {{"All", each Table.Max(_,"Bought")}}),
    #"Expanded All" = Table.ExpandRecordColumn(#"Grouped Rows", "All", {"Fund", "Bought"}, {"Fund", "Bought"})
in
    #"Expanded All"

Hope this helps.

Ashish_Mathur_0-1715224832945.png

 


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
ryan_mayu
Super User
Super User

@woofwoof123 

why we don't delete the first row? the ticket and person are also the same as the second and third rows.





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




ah I missed that.. thanks for catching that

you can also do that in PQ

 

let
Source = Table,
Custom1 = Table.AddColumn(Source,"Rank",each Table.RowCount(Table.SelectRows(Source,(x)=>x[Bought]>[Bought] and x[Ticket]=[Ticket] and x[Person]=[Person]))+1),
#"Filtered Rows" = Table.SelectRows(Custom1, each ([Rank] = 1)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Rank"})
in
#"Removed Columns"

 

pls see the attachment below





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




Helpful resources

Announcements
LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.