Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
ssbagley
Helper III
Helper III

How to identify duplicates - same amount and date

I am working with employee expense submissions and want to identify lines for the same date and amount and count the number of "duplicates" identified, by employee.  I'm relatively new to PowerBi and none of my searches are giving me what I need... Help!

 

My data is quite large - 50k-100k rows.  Below is a sample. 

Employee IDEmployee NameExpense DateAmount
1Rick10/04/20196.57
1Rick11/21/20194.40
2Derek09/08/201980.00
2Derek08/22/2019200.00
2Derek08/22/2019200.00
2Derek08/22/2019-200.00
3Steve09/03/20195.46
4Jim12/19/201916.45
5Donna08/12/20190.00
5Donna10/11/20192.50
5Donna09/13/201912.00
5Donna08/09/20198.50
5Donna08/09/20192.00
6Arnold08/16/201955.00
6Arnold09/13/201918.00
6Arnold11/25/20190.00
6Arnold11/25/20190.00
6Arnold12/06/2019135.93
7Sally10/27/20190.00
7Sally10/16/201931.05
7Sally10/20/2019101.91
7Sally12/10/201967.25
8Thomas09/25/2019945.00
9Juan10/24/20193.60
1 ACCEPTED SOLUTION

@ssbagley looked into it and the measure is working as desired.

 

You need to put all the axis on Don't Summarize

 

smpa01_0-1639602058527.png

 

Also, don't use date hierarchy and do the following

smpa01_1-1639602100470.png

 

 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

View solution in original post

16 REPLIES 16
ssbagley
Helper III
Helper III

One new question - assuming the code discussed, how can I EXCLUDE a field?  So, I still want all lines where Employee number, Name, Date and Amount match, but where the field containing credit card number (for example) is not the same.  

ssbagley
Helper III
Helper III

I added the index column and we're getting closer; however, the "duplicate exists" flag is appearing any time there are two amounts on the same day for the same person.  The amounts need to be the same.  

So, person 1 submitted $5 twice on 09/19/2019 = duplicate.  I would expect to see

Person 1 $5 09/19/2019

Person 1 $5 09/19/2021

Here is a screenshot from my data with the index column option, but the lines highlighted in yellow are not true duplicates:

ssbagley_0-1639515649165.png

 

@ssbagley put the index column in the viz

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

It didn't change the results.  😞

ssbagley_0-1639579960469.png

 

@ssbagley can you please send the representative dataset /pbix?

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

I can't - this is confidential data so I cannot share.

This is my code:

CountOfSubmission3 =
VAR _COUNT =
CALCULATE (
COUNT ( 'Misc Sundry'[Local Amount] ),
ALLEXCEPT (
'Misc Sundry',
'Misc Sundry'[Employee],
'Misc Sundry'[Expense Date],
'Misc Sundry'[Local Amount]
),
FILTER ( VALUES ( 'Misc Sundry'[Local Amount] ), 'Misc Sundry'[Local Amount] <> 0 )
)
RETURN
IF (
_COUNT > 1,
"DUPLICATE EXISTS",
"NO DUPLICATE EXISTS"
)

@ssbagley  this is the same code as mine and you can see that it perfectly works on my dataset. For me to be able to understand, why it does not do the trick in your dataset, I need to take a look at the reprentative datset.

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

@ssbagley looked into it and the measure is working as desired.

 

You need to put all the axis on Don't Summarize

 

smpa01_0-1639602058527.png

 

Also, don't use date hierarchy and do the following

smpa01_1-1639602100470.png

 

 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

Incredible!  I cannot thank you enough for your help!  👏

How do I upload/attach a file?

@ssbagley  gdrive/1drive/any other file uploading service

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs
smpa01
Super User
Super User

@ssbagley  you can utilize a measure like this

coutOfSubmission =
CALCULATE (
    COUNT ( 'Table'[Amount] ),
    ALLEXCEPT (
        'Table',
        'Table'[Employee ID],
        'Table'[Expense Date],
        'Table'[Amount]
    ),
    FILTER ( VALUES ( 'Table'[Amount] ), 'Table'[Amount] <> 0 )
)

 

It counts the amount by ID,Date,Amount Partition for any non-zero Amount

 

smpa01_0-1639441111274.png

 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

But this only "counts", right? I need to identify lines where the same date and amount appear more than once (duplicate submissions).  For example, if Employee 1 submitted $5 three times on 10/04/2019 - that would flag.  But if Employee 2 only submitted $5 one time on 10/5/2019, that would not flag.  

@ssbagley  if you want to do it through measure, you need an index column

coutOfSubmission = 
VAR _count =
    CALCULATE (
        COUNT ( 'Table'[Amount] ),
        ALLEXCEPT (
            'Table',
            'Table'[Employee ID],
            'Table'[Expense Date],
            'Table'[Amount]
        ),
        FILTER ( VALUES ( 'Table'[Amount] ), 'Table'[Amount] <> 0 )
    )
RETURN
    IF (
        _count > 1,
        "Duplicate Submission Exists",
        "No Duplicate Submission Exists"
    )

 

The same can be used for calcukated column without requiring an index column

Please find attached

 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.