cancel
Showing results for
Did you mean:
Post Patron

Identify duplicates and show up records

I've got contact table with columns: names, emails.

I want to identify duplicates in emails and show up their names.

for example, show me the table like this.   any ideas, thanks.

1 ACCEPTED SOLUTION
Anonymous
Not applicable

You could create a calculated column with the following formula:

```Count Emails =
Var Emails = [Emails]
RETURN

CALCULATE(
)```

From here, you could simply look for any rows where the Count Emails is greater than 1.

27 REPLIES 27
Regular Visitor

Works perfectly

Anonymous
Not applicable

You could create a calculated column with the following formula:

```Count Emails =
Var Emails = [Emails]
RETURN

CALCULATE(
)```

From here, you could simply look for any rows where the Count Emails is greater than 1.

Helper III

Hi @Anonymous

The function only works by Import Storage Mode, Do you have any idea how this could work in Direct Query?

Thanks for the help

Resolver I

Im not sure if this is better, wors or the same but in later versions of Power Bi you can use "Earlier" to refere to "in same line" values... hence i used it like this:

count entry = CALCULATE(COUNT(yourtable[email]);
FILTER(yourtable;yourtable[email] = EARLIER(yourtable[email])))

count entry = CALCULATE(COUNT(FinansRecords[F.Contry+Entry]);FILTER(FinansRecords;FinansRecords[F.Contry+Entry] = EARLIER(FinansRecords[F.Contry+Entry])))
Regular Visitor

Well look at that, this thread has been open for 2 years and the day I come to post my solution someone has beaten me by 4 days! The EARLIER() solution is cleaner but should be the same complexity as declaring a variable. Essentially, creating a variable outside of the calculate function is one way to pass row context into the calculate, and using the EARLIER function in the filter is the cleaner way of doing the same thing.

Anonymous
Not applicable

Declaring variables has become a habit as it works in all cases.  Yes Earlier works in this instance but i've run into a number of instances where i need to pass information into a calculate because the corresponding function for whatever i'm doing isn't compatable with calculate.  For ease of development i just use variables 100% of the time now.

Helper I

Amazing! Thank you. This saves me so much time.

Helper III

This works great for me.  How would i amend this to only perform the duplicate check for the year a record is related to?

Anonymous
Not applicable

I'll have to make some assumptions about your data but it could be something like:

```Count Emails =
Var Emails = [Emails]
Var YearVal = YEAR([EmailDate])
RETURN

CALCULATE(
'Your Table'[EmailDate] => DATE(YearVal, 1, 1),
'Your Table'[EmailDate] <= DATE(YearVal, 12, 31)
)```
Frequent Visitor

Hello all,

The suggested solution works perfectly, However, in my case I need to count how many times a certain value has appear so if it appears 3 time I need to get an output for each of those 3 lines like 1 or 2 or 3 by descending order.... Anyone can suggest??

Anonymous
Not applicable

@Anonymous  hi there, I just came across this post and it works exactly as suggested. thanks for that. My question is how I could use this or an equivalent in power query editor. I would prefer to add the calculated column in this environment because of subsequent logic I want to build into my table.

thanks

Anonymous
Not applicable

A find and replace of null might do what you are chasing.

Go to your "Replace Values" and write 'null'  (without quotes) into the top box and then put 0 in the other box.

Is that the part of the post you wanted a Power Query version for?

Anonymous
Not applicable

@Anonymous

hmm, no I don't think so. I'm looking for a query editor version to count the number of occurrences of a value of all rows in the table.  It's this part of the post I am looking for an alternative for in query editor.

```Count Emails =
Var Emails = [Emails]
RETURN

CALCULATE(

Anonymous
Not applicable

If you are wanting to do a count, you could use the Grouping function.  That will produce a count for you, but will summarize the table you use it in.  If you don't want to upset your existing data, just create a new query to reference the old query and do the grouping in the new table.  This table could be joined in the model.

The grouping function is found on the ribbon as "Group By".  Tell it to group by the email address and use the "Count" function to create the summarised field.

Post Patron

@Anonymous not toally understand this DAX, but it's toally work!!!.  many thanks!!!

Anonymous
Not applicable

Here is a quick explaination for your learning:

This formula is run on a row by row basis, so keepthat in mind.

```Var Emails = [Emails]
RETURN```

This part takes the answer on the current row in the field [Emails] and places it in memory to be used later on (called a variable).

CALCULATE() is used to run a statement, but with extra conditions

COUNTROWS('Your Table') is going to get us the count we care about, but since we are doing this calculation row by row, there is only 1 row that could be counted

all('Your Table') is used to tell the CALCULATE to look at every row in the table, instead of just the current row.  Of course we don't actually want every row, or else we'd get the same number

'Your Table'[Emails] = Emails  is a filter that is going to be applied to the all('Your Table').  This will filter the ALL down until it only has in it rows where the [Email] field contains the value we put into that variable in the first step.

Regular Visitor

@Anonymous - I'm new to PowerBI. I'm curious to know if the order of the other functions matter when using the CALCULATE function.

Anonymous
Not applicable

Good question! The documentation doesn't seem to specify, however I always assume that it do the worst case scenario of whatever i'm attempting.  So i write statements as best as possible, then test if I get the desired result.

Regular Visitor

@Anonymous

Hi Ross. This code works for me as well but I am a bit confused:

Are you able to just use:

```Count Emails =

CALCULATE(
)```

(ie. exclude the VAR function)? Correct me if I am wrong, but doesn't that just define the variable 'Invoice'? Shouldn't the code work the same without it (albeit less efficiently). I tried excluding that part of the coding and it didn't work (comes up with an error saying "Cannot convert value ............. of type Text to type True/False".

edday.

Resolver II

Better late than never, but the 'cannot convert value...' is because the argument for calculate() must evaluate to true or false. It could work on a boolean column, but not on a text column.

Announcements

2022 Release Wave 1 Plan

Power Platform release plan for the 2022 release wave 1 describes all new features releasing from April 2022 through September 2022.