Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Adding New Column - Identification of Distinct or Duplicate records

Hello,

Dear All, I am kindly asking you for help
with getting “NEW Column” which identifies Distinct and duplicates records based on ID.

 

Please see more details below.

IDCountryValue"NEW Column"
CDGermany50Distinct
FWFinland100Distinct
FWFinland200Duplicates
OPUSA2Distinct
OPUSA5Duplicates
OPUSA9Duplicates
FUUK2000Distinct
FUUK2000Duplicates

 

 

Best regards,

1 ACCEPTED SOLUTION
amitchandak
Super User
Super User

The FU , UK record is same, so break the tie a create a new index column and create a new Column like given below

New column cal = if(COUNTX(FILTER(Sheet1,Sheet1[ID]=EARLIER(Sheet1[ID]) && Sheet1[Country]=EARLIER(Sheet1[Country]) && Sheet1[Value]<=EARLIER(Sheet1[Value]) && Sheet1[Index]<=EARLIER(Sheet1[Index])),Sheet1[ID])=1,"Distinct","Duplicate")

 

Find the solution at: https://www.dropbox.com/s/04bm2kh5qv0quhh/DistDup.pbix?dl=0

 

Appreciate your Kudos. In case, this is the solution you are looking for, mark it as the Solution.
In case it does not help, please provide additional information and mark me with @

Thanks. My Recent Blog -Decoding Direct Query - Time Intelligence, Winner Coloring on MAP, HR Analytics, Power BI Working with Non-Standard TimeAnd Comparing Data Across Date Ranges
Connect on Linkedin

View solution in original post

4 REPLIES 4
amitchandak
Super User
Super User

The FU , UK record is same, so break the tie a create a new index column and create a new Column like given below

New column cal = if(COUNTX(FILTER(Sheet1,Sheet1[ID]=EARLIER(Sheet1[ID]) && Sheet1[Country]=EARLIER(Sheet1[Country]) && Sheet1[Value]<=EARLIER(Sheet1[Value]) && Sheet1[Index]<=EARLIER(Sheet1[Index])),Sheet1[ID])=1,"Distinct","Duplicate")

 

Find the solution at: https://www.dropbox.com/s/04bm2kh5qv0quhh/DistDup.pbix?dl=0

 

Appreciate your Kudos. In case, this is the solution you are looking for, mark it as the Solution.
In case it does not help, please provide additional information and mark me with @

Thanks. My Recent Blog -Decoding Direct Query - Time Intelligence, Winner Coloring on MAP, HR Analytics, Power BI Working with Non-Standard TimeAnd Comparing Data Across Date Ranges
Connect on Linkedin

smpa01
Super User
Super User

@Anonymous  do you need a M or DAX solution

 

I have done it through M

 

 

 

let
    Source = Web.BrowserContents("https://community.powerbi.com/t5/Desktop/Adding-New-Column-Identification-of-Distinct-or-Duplicate/m-p/926039#M443799"),
    #"Extracted Table From Html" = Html.Table(Source, {{"Column1", "DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR > :nth-child(1), DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR > :nth-child(1)"}, {"Column2", "DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR > :nth-child(2), DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR > :nth-child(2)"}, {"Column3", "DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR > :nth-child(3), DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR > :nth-child(3)"}, {"Column4", "DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR > :nth-child(4), DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR > :nth-child(4)"}}, [RowSelector="DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR, DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR"]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Extracted Table From Html", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"ID", type text}, {"Country", type text}, {"Value", Int64.Type}, {"""NEW Column""", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"""NEW Column"""}),
    #"Added Index" = Table.AddIndexColumn(#"Removed Columns", "Index", 1, 1),
    #"Grouped Rows" = Table.Group(#"Added Index", {"ID", "Country"}, {{"ad", each _, type table [ID=text, Country=text, Value=number, Index=number]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each let
X=[ad],
#"Added Index1" = Table.AddIndexColumn(X, "Index.1", 1, 1),
    #"Added Custom" = Table.AddColumn(#"Added Index1", "Custom", each if[Index.1]=1 then "Unique" else "Duplicate"),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"Index", "Index.1"})
in
    #"Removed Columns1"),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"ID", "Country", "Value", "Custom"}, {"ID", "Country", "Value", "Custom.1"})
in
    #"Expanded Custom"

 

 

 

A DAX solution is following - Calculated column

dews.PNG

 

Rand = RAND()
Rand+Value = [Value]+[Rand]
CAT = 
VAR _1 = RANKX(FILTER('DAX',EARLIER('DAX'[ID])='DAX'[ID]&&EARLIER('DAX'[Country])='DAX'[Country]),[Rand+Value],,ASC,Skip)
VAR _2 = IF(_1=1, "Unique","Duplicate")
RETURN _2
 
Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs
Anonymous
Not applicable

I tried to create only one column (CAT) in that way.
Only ID records idicate if the atribute is Unique or Duplicate. 

 

CAT =

VAR _1 = RANKX(FILTER('DAX',EARLIER('DAX'[ID])),ASC,Skip)

VAR _2 = IF(_1=1, "Unique","Duplicate")

RETURN _2

 

Unfortunately it does not work.
Could you please help?

@Anonymous

if you want DAX to catch the duplicate IDs,

you need to tell DAX

 

to rank a numerical column (Values in this case) within each ID/ID_Country group (whatever you prefer)

 and

use a RAND helper column and associate that with value to break ties inRANKX since there are duplicate values

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.