Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hello,
Dear All, I am kindly asking you for help
with getting “NEW Column” which identifies Distinct and duplicates records based on ID.
Please see more details below.
ID | Country | Value | "NEW Column" |
CD | Germany | 50 | Distinct |
FW | Finland | 100 | Distinct |
FW | Finland | 200 | Duplicates |
OP | USA | 2 | Distinct |
OP | USA | 5 | Duplicates |
OP | USA | 9 | Duplicates |
FU | UK | 2000 | Distinct |
FU | UK | 2000 | Duplicates |
Best regards,
Solved! Go to Solution.
The FU , UK record is same, so break the tie a create a new index column and create a new Column like given below
New column cal = if(COUNTX(FILTER(Sheet1,Sheet1[ID]=EARLIER(Sheet1[ID]) && Sheet1[Country]=EARLIER(Sheet1[Country]) && Sheet1[Value]<=EARLIER(Sheet1[Value]) && Sheet1[Index]<=EARLIER(Sheet1[Index])),Sheet1[ID])=1,"Distinct","Duplicate")
Find the solution at: https://www.dropbox.com/s/04bm2kh5qv0quhh/DistDup.pbix?dl=0
Appreciate your Kudos. In case, this is the solution you are looking for, mark it as the Solution.
In case it does not help, please provide additional information and mark me with @
Thanks. My Recent Blog -Decoding Direct Query - Time Intelligence, Winner Coloring on MAP, HR Analytics, Power BI Working with Non-Standard TimeAnd Comparing Data Across Date Ranges
Connect on Linkedin
The FU , UK record is same, so break the tie a create a new index column and create a new Column like given below
New column cal = if(COUNTX(FILTER(Sheet1,Sheet1[ID]=EARLIER(Sheet1[ID]) && Sheet1[Country]=EARLIER(Sheet1[Country]) && Sheet1[Value]<=EARLIER(Sheet1[Value]) && Sheet1[Index]<=EARLIER(Sheet1[Index])),Sheet1[ID])=1,"Distinct","Duplicate")
Find the solution at: https://www.dropbox.com/s/04bm2kh5qv0quhh/DistDup.pbix?dl=0
Appreciate your Kudos. In case, this is the solution you are looking for, mark it as the Solution.
In case it does not help, please provide additional information and mark me with @
Thanks. My Recent Blog -Decoding Direct Query - Time Intelligence, Winner Coloring on MAP, HR Analytics, Power BI Working with Non-Standard TimeAnd Comparing Data Across Date Ranges
Connect on Linkedin
@Anonymous do you need a M or DAX solution
I have done it through M
let
Source = Web.BrowserContents("https://community.powerbi.com/t5/Desktop/Adding-New-Column-Identification-of-Distinct-or-Duplicate/m-p/926039#M443799"),
#"Extracted Table From Html" = Html.Table(Source, {{"Column1", "DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR > :nth-child(1), DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR > :nth-child(1)"}, {"Column2", "DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR > :nth-child(2), DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR > :nth-child(2)"}, {"Column3", "DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR > :nth-child(3), DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR > :nth-child(3)"}, {"Column4", "DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR > :nth-child(4), DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR > :nth-child(4)"}}, [RowSelector="DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > TR, DIV[id='bodyDisplay'] > DIV.lia-message-body-content:nth-child(1) > TABLE:nth-child(5) > * > TR"]),
#"Promoted Headers" = Table.PromoteHeaders(#"Extracted Table From Html", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"ID", type text}, {"Country", type text}, {"Value", Int64.Type}, {"""NEW Column""", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"""NEW Column"""}),
#"Added Index" = Table.AddIndexColumn(#"Removed Columns", "Index", 1, 1),
#"Grouped Rows" = Table.Group(#"Added Index", {"ID", "Country"}, {{"ad", each _, type table [ID=text, Country=text, Value=number, Index=number]}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each let
X=[ad],
#"Added Index1" = Table.AddIndexColumn(X, "Index.1", 1, 1),
#"Added Custom" = Table.AddColumn(#"Added Index1", "Custom", each if[Index.1]=1 then "Unique" else "Duplicate"),
#"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"Index", "Index.1"})
in
#"Removed Columns1"),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
#"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"ID", "Country", "Value", "Custom"}, {"ID", "Country", "Value", "Custom.1"})
in
#"Expanded Custom"
A DAX solution is following - Calculated column
Rand = RAND()
Rand+Value = [Value]+[Rand]
CAT =
VAR _1 = RANKX(FILTER('DAX',EARLIER('DAX'[ID])='DAX'[ID]&&EARLIER('DAX'[Country])='DAX'[Country]),[Rand+Value],,ASC,Skip)
VAR _2 = IF(_1=1, "Unique","Duplicate")
RETURN _2
I tried to create only one column (CAT) in that way.
Only ID records idicate if the atribute is Unique or Duplicate.
CAT =
VAR _1 = RANKX(FILTER('DAX',EARLIER('DAX'[ID])),ASC,Skip)
VAR _2 = IF(_1=1, "Unique","Duplicate")
RETURN _2
Unfortunately it does not work.
Could you please help?
@Anonymous
if you want DAX to catch the duplicate IDs,
you need to tell DAX
to rank a numerical column (Values in this case) within each ID/ID_Country group (whatever you prefer)
and
use a RAND helper column and associate that with value to break ties inRANKX since there are duplicate values
User | Count |
---|---|
140 | |
113 | |
104 | |
77 | |
64 |
User | Count |
---|---|
135 | |
120 | |
101 | |
71 | |
61 |