cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
ivyhai
Post Patron
Post Patron

Identify duplicates and show up records

I've got contact table with columns: names, emails. 

 

I want to identify duplicates in emails and show up their names. 

for example, show me the table like this.   any ideas, thanks. CRM.PNG

 

1 ACCEPTED SOLUTION
Anonymous
Not applicable

You could create a calculated column with the following formula:

Count Emails = 
Var Emails = [Emails]
RETURN

CALCULATE(
    COUNTROWS('Your Table'),
    all('Your Table'),
    'Your Table'[Emails] = Emails
)

From here, you could simply look for any rows where the Count Emails is greater than 1.

View solution in original post

27 REPLIES 27
Olanna
Regular Visitor

Works perfectly

Anonymous
Not applicable

You could create a calculated column with the following formula:

Count Emails = 
Var Emails = [Emails]
RETURN

CALCULATE(
    COUNTROWS('Your Table'),
    all('Your Table'),
    'Your Table'[Emails] = Emails
)

From here, you could simply look for any rows where the Count Emails is greater than 1.

View solution in original post

Hi @Anonymous  

 

The function only works by Import Storage Mode, Do you have any idea how this could work in Direct Query?

 

Thanks for the help

Im not sure if this is better, wors or the same but in later versions of Power Bi you can use "Earlier" to refere to "in same line" values... hence i used it like this:

 

count entry = CALCULATE(COUNT(yourtable[email]);
                          FILTER(yourtable;yourtable[email] = EARLIER(yourtable[email])))
 
 
count entry = CALCULATE(COUNT(FinansRecords[F.Contry+Entry]);FILTER(FinansRecords;FinansRecords[F.Contry+Entry] = EARLIER(FinansRecords[F.Contry+Entry])))

Well look at that, this thread has been open for 2 years and the day I come to post my solution someone has beaten me by 4 days! The EARLIER() solution is cleaner but should be the same complexity as declaring a variable. Essentially, creating a variable outside of the calculate function is one way to pass row context into the calculate, and using the EARLIER function in the filter is the cleaner way of doing the same thing.

Anonymous
Not applicable

Declaring variables has become a habit as it works in all cases.  Yes Earlier works in this instance but i've run into a number of instances where i need to pass information into a calculate because the corresponding function for whatever i'm doing isn't compatable with calculate.  For ease of development i just use variables 100% of the time now.

Amazing! Thank you. This saves me so much time.

This works great for me.  How would i amend this to only perform the duplicate check for the year a record is related to?

Anonymous
Not applicable

I'll have to make some assumptions about your data but it could be something like:

 

Count Emails = 
Var Emails = [Emails]
Var YearVal = YEAR([EmailDate])
RETURN

CALCULATE(
    COUNTROWS('Your Table'),
    all('Your Table'),
    'Your Table'[Emails] = Emails,
    'Your Table'[EmailDate] => DATE(YearVal, 1, 1),
    'Your Table'[EmailDate] <= DATE(YearVal, 12, 31)
)

Hello all,

 

The suggested solution works perfectly, However, in my case I need to count how many times a certain value has appear so if it appears 3 time I need to get an output for each of those 3 lines like 1 or 2 or 3 by descending order.... Anyone can suggest??

 

Thanks in advance 

Anonymous
Not applicable

@Anonymous  hi there, I just came across this post and it works exactly as suggested. thanks for that. My question is how I could use this or an equivalent in power query editor. I would prefer to add the calculated column in this environment because of subsequent logic I want to build into my table. 

 

thanks

Anonymous
Not applicable

A find and replace of null might do what you are chasing.

Go to your "Replace Values" and write 'null'  (without quotes) into the top box and then put 0 in the other box.

 

Is that the part of the post you wanted a Power Query version for?

Anonymous
Not applicable

@Anonymous 

hmm, no I don't think so. I'm looking for a query editor version to count the number of occurrences of a value of all rows in the table.  It's this part of the post I am looking for an alternative for in query editor. 

 

Count Emails = 
Var Emails = [Emails]
RETURN

CALCULATE(
    COUNTROWS('Your Table'),
    all('Your Table'),
    'Your Table'[Emails] = Emails

 

Anonymous
Not applicable

If you are wanting to do a count, you could use the Grouping function.  That will produce a count for you, but will summarize the table you use it in.  If you don't want to upset your existing data, just create a new query to reference the old query and do the grouping in the new table.  This table could be joined in the model.

 

The grouping function is found on the ribbon as "Group By".  Tell it to group by the email address and use the "Count" function to create the summarised field.

@Anonymous not toally understand this DAX, but it's toally work!!!.  many thanks!!!

Anonymous
Not applicable

Here is a quick explaination for your learning:


This formula is run on a row by row basis, so keepthat in mind.

Var Emails = [Emails]
RETURN

This part takes the answer on the current row in the field [Emails] and places it in memory to be used later on (called a variable).

CALCULATE() is used to run a statement, but with extra conditions

COUNTROWS('Your Table') is going to get us the count we care about, but since we are doing this calculation row by row, there is only 1 row that could be counted

 

all('Your Table') is used to tell the CALCULATE to look at every row in the table, instead of just the current row.  Of course we don't actually want every row, or else we'd get the same number

'Your Table'[Emails] = Emails  is a filter that is going to be applied to the all('Your Table').  This will filter the ALL down until it only has in it rows where the [Email] field contains the value we put into that variable in the first step.

Wiz
Regular Visitor

@Anonymous - I'm new to PowerBI. I'm curious to know if the order of the other functions matter when using the CALCULATE function.

Anonymous
Not applicable

Good question! The documentation doesn't seem to specify, however I always assume that it do the worst case scenario of whatever i'm attempting.  So i write statements as best as possible, then test if I get the desired result.



@Anonymous

 

Hi Ross. This code works for me as well but I am a bit confused:

 

Are you able to just use:

 

Count Emails = 

CALCULATE(
    COUNTROWS('Your Table'),
    all('Your Table'),
    'Your Table'[Emails]
)

 (ie. exclude the VAR function)? Correct me if I am wrong, but doesn't that just define the variable 'Invoice'? Shouldn't the code work the same without it (albeit less efficiently). I tried excluding that part of the coding and it didn't work (comes up with an error saying "Cannot convert value ............. of type Text to type True/False".

 

Thanks so much in advance!

 

edday.

 

 

Better late than never, but the 'cannot convert value...' is because the argument for calculate() must evaluate to true or false. It could work on a boolean column, but not on a text column.

Helpful resources

Announcements
UG GA Amplification 768x460.png

Launching new user group features

Learn how to create your own user groups today!

November Power BI Update 768x460.png

Check it Out!

Click here to read more about the November 2021 Updates!

M365 768x460.jpg

Microsoft 365 Collaboration Conference | December 7–9, 2021

Join us, in-person, December 7–9 in Las Vegas, for the largest gathering of the Microsoft community in the world.