cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Frequent Visitor

Adding New Column - Identification of Distinct or Duplicate records

Hello,

Dear All, I am kindly asking you for help
with getting “NEW Column” which identifies Distinct and duplicates records based on ID.

 

Please see more details below.

IDCountryValue"NEW Column"
CDGermany50Distinct
FWFinland100Distinct
FWFinland200Duplicates
OPUSA2Distinct
OPUSA5Duplicates
OPUSA9Duplicates
FUUK2000Distinct
FUUK2000Duplicates

 

 

Best regards,

1 ACCEPTED SOLUTION
Super User IV
Super User IV

The FU , UK record is same, so break the tie a create a new index column and create a new Column like given below

New column cal = if(COUNTX(FILTER(Sheet1,Sheet1[ID]=EARLIER(Sheet1[ID]) && Sheet1[Country]=EARLIER(Sheet1[Country]) && Sheet1[Value]<=EARLIER(Sheet1[Value]) && Sheet1[Index]<=EARLIER(Sheet1[Index])),Sheet1[ID])=1,"Distinct","Duplicate")

 

Find the solution at: https://www.dropbox.com/s/04bm2kh5qv0quhh/DistDup.pbix?dl=0

 

Appreciate your Kudos. In case, this is the solution you are looking for, mark it as the Solution.
In case it does not help, please provide additional information and mark me with @

Thanks. My Recent Blog -Decoding Direct Query - Time Intelligence, Winner Coloring on MAP, HR Analytics, Power BI Working with Non-Standard TimeAnd Comparing Data Across Date Ranges
Connect on Linkedin



Did I answer your question? Mark my post as a solution! Appreciate your Kudos!!
Dashboard of My Blogs !! YouTube Channel !! Connect on Linkedin

Proud to be a Super User!

View solution in original post

4 REPLIES 4
Super User IV
Super User IV

The FU , UK record is same, so break the tie a create a new index column and create a new Column like given below

New column cal = if(COUNTX(FILTER(Sheet1,Sheet1[ID]=EARLIER(Sheet1[ID]) && Sheet1[Country]=EARLIER(Sheet1[Country]) && Sheet1[Value]<=EARLIER(Sheet1[Value]) && Sheet1[Index]<=EARLIER(Sheet1[Index])),Sheet1[ID])=1,"Distinct","Duplicate")

 

Find the solution at: https://www.dropbox.com/s/04bm2kh5qv0quhh/DistDup.pbix?dl=0

 

Appreciate your Kudos. In case, this is the solution you are looking for, mark it as the Solution.
In case it does not help, please provide additional information and mark me with @

Thanks. My Recent Blog -Decoding Direct Query - Time Intelligence, Winner Coloring on MAP, HR Analytics, Power BI Working with Non-Standard TimeAnd Comparing Data Across Date Ranges
Connect on Linkedin



Did I answer your question? Mark my post as a solution! Appreciate your Kudos!!
Dashboard of My Blogs !! YouTube Channel !! Connect on Linkedin

Proud to be a Super User!

View solution in original post

Memorable Member
Memorable Member

@Robert01  do you need a M or DAX solution

 

I have done it through M

 

 

 

let
    Source = Web.BrowserContents("https://community.powerbi.com/t5/Desktop/Adding-New-Column-Identification-of-Distinct-or-Duplicate/m-p/926039#M443799"),
    #"Extracted Table From Html" = Html.Table(Source, {{"Column1", "DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR > :nth-child(1), DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR > :nth-child(1)"}, {"Column2", "DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR > :nth-child(2), DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR > :nth-child(2)"}, {"Column3", "DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR > :nth-child(3), DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR > :nth-child(3)"}, {"Column4", "DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR > :nth-child(4), DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR > :nth-child(4)"}}, [RowSelector="DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR, DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR"]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Extracted Table From Html", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"ID", type text}, {"Country", type text}, {"Value", Int64.Type}, {"""NEW Column""", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"""NEW Column"""}),
    #"Added Index" = Table.AddIndexColumn(#"Removed Columns", "Index", 1, 1),
    #"Grouped Rows" = Table.Group(#"Added Index", {"ID", "Country"}, {{"ad", each _, type table [ID=text, Country=text, Value=number, Index=number]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each let
X=[ad],
#"Added Index1" = Table.AddIndexColumn(X, "Index.1", 1, 1),
    #"Added Custom" = Table.AddColumn(#"Added Index1", "Custom", each if[Index.1]=1 then "Unique" else "Duplicate"),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"Index", "Index.1"})
in
    #"Removed Columns1"),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"ID", "Country", "Value", "Custom"}, {"ID", "Country", "Value", "Custom.1"})
in
    #"Expanded Custom"

 

 

 

A DAX solution is following - Calculated column

dews.PNG

 

Rand = RAND()
Rand+Value = [Value]+[Rand]
CAT = 
VAR _1 = RANKX(FILTER('DAX',EARLIER('DAX'[ID])='DAX'[ID]&&EARLIER('DAX'[Country])='DAX'[Country]),[Rand+Value],,ASC,Skip)
VAR _2 = IF(_1=1, "Unique","Duplicate")
RETURN _2
 

New Animated Dashboard: Sales Calendar


I tried to create only one column (CAT) in that way.
Only ID records idicate if the atribute is Unique or Duplicate. 

 

CAT =

VAR _1 = RANKX(FILTER('DAX',EARLIER('DAX'[ID])),ASC,Skip)

VAR _2 = IF(_1=1, "Unique","Duplicate")

RETURN _2

 

Unfortunately it does not work.
Could you please help?

@Robert01

if you want DAX to catch the duplicate IDs,

you need to tell DAX

 

to rank a numerical column (Values in this case) within each ID/ID_Country group (whatever you prefer)

 and

use a RAND helper column and associate that with value to break ties inRANKX since there are duplicate values


New Animated Dashboard: Sales Calendar


Helpful resources

Announcements
secondImage

Congratulations!

We are excited to announce the Power BI Super Users!

Microsoft Ignite

Microsoft Ignite with Arun Ulag

Featured Session Drive Data Culture with Power BI: Vision, Strategy & Roadmap. Register here https://myignite.microsoft.com #MSIgnite #PowerPlatform #Power BI ​

Microsoft Ignite

Microsoft Ignite

Join digitally, March 2–4, 2021 to explore new tech that's ready to implement. Experience the keynote in mixed reality through AltspaceVR!

secondImage

The largest Power BI virtual conference

100+ sessions, 100+ speakers, Product managers, MVPs, and experts. All about Power BI. Attend online or watch the recordings.

Top Solution Authors
Top Kudoed Authors