Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
KanisleS
Helper I
Helper I

[DAX] Count duplicates within a period of time

Hello all;

 

I was wondering if there was a way to count duplicates but only within a period of time ( 2 days ).

I have a dataset that looks similar to this : 

ID Date
130-05-2022
128-05-2022
230-05-2022
330-05-2022
430-05-2022
201-05-2022

If an ID is repeated twice ( or more ) within 48 hours then I want it to be labeled as a duplicate, in the table above for example the ID 1 is a duplicate, the ID 2 isn't.

this calculated column allows me to count the number of duplicates : 

CALCULATE ( COUNT ( 'Table'[ID] ), ALLEXCEPT ( 'Table', 'Table'[ID]) )

 

how can i improve it in a way it counts only duplicates that happen within a 48hours period?

 

Thanks for your help!

1 ACCEPTED SOLUTION
DataInsights
Super User
Super User

@KanisleS,

 

Try this calculated column:

 

Duplicate = 
VAR vDate = 'Table'[Date]
VAR vDate2 =
    CALCULATE (
        MAX ( 'Table'[Date] ),
        ALLEXCEPT ( 'Table', 'Table'[ID] ),
        'Table'[Date] <> vDate
    )
VAR vResult =
    SWITCH (
        TRUE,
        // one row exists for an ID
        ISBLANK ( vDate2 ), "N",
        // get number of days between dates for an ID
        IF ( ABS ( DATEDIFF ( vDate, vDate2, DAY ) ) <= 2, "Y", "N" )
    )
RETURN
    vResult

 

DataInsights_0-1654004191541.png

 





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




View solution in original post

16 REPLIES 16
JH25
Frequent Visitor

@DataInsights 

Thanks for your help so far!

I'm getting this error:

DAX comparison operations do not support comparing values of type Text with values of type Integer. Consider using the VALUE or FORMAT function to convert one of the values.

 

I've removed the

 // exclude a particular call
        'Ad Spend - Sponsored'[Call_CLI] = 448716630, "N/A",

part and it works, so it must be because the CLI is set as text?

 

I'm not sure what else i can set the data type to?

@JH25,

 

If Call_CLI is a text data type, embed the value in double quotes:

 

'Ad Spend - Sponsored'[Call_CLI] = "448716630", "N/A",




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




JH25
Frequent Visitor

@DataInsights 

Hi i want to do the same thing but i only want it to label the second entry as a duplicate, is this possible?

@JH25,

 

How do you define "second entry"? Does it rely on the sort order of the table, or have a later date than the first entry? Please provide sample data and the expected result.





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




Hi it would be the later entries that would be duplicates based on time and not sort order.

I've changed your code to the following which makes it do it based on the last 30 days in minutes.

Duplicate = 
VAR vDate = 'Ad Spend - Sponsored'[Header_Created]
VAR vDate2 =
    CALCULATE (
        MAX ( 'Ad Spend - Sponsored'[Header_Created] ),
        ALLEXCEPT ( 'Ad Spend - Sponsored','Ad Spend - Sponsored'[Call_CLI]),
        'Ad Spend - Sponsored'[Header_Created] <> vDate
    )
VAR vResult =
    SWITCH (
        TRUE,
        // one row exists for an ID
        ISBLANK ( vDate2 ), "N",
        // get number of days between dates for an ID
        IF ( ABS ( DATEDIFF ( vDate, vDate2, MINUTE ) ) <= 43200, "Y", "N" )
    )
RETURN
    vResult

That is working but it shows yes for all entries rather than just the 2nd 3rd etc.

Here's an example.

JH25_1-1671108153665.png

I also want to exclude one particular Call_CLI from the calcualtion too, is that possible?

 

Hope that all makes sense.

 

 

@JH25,

 

Try this calculated column:

 

Duplicate = 
VAR vDate =
    'Ad Spend - Sponsored'[Header_Created]
VAR vDate2 =
    CALCULATE (
        MAX ( 'Ad Spend - Sponsored'[Header_Created] ),
        ALLEXCEPT ( 'Ad Spend - Sponsored','Ad Spend - Sponsored'[Call_CLI]),
        'Ad Spend - Sponsored'[Header_Created] <> vDate
    )
VAR vMinDate =
    CALCULATE (
        MIN ( 'Ad Spend - Sponsored'[Header_Created] ),
        ALLEXCEPT ( 'Ad Spend - Sponsored','Ad Spend - Sponsored'[Call_CLI])
    )
VAR vResult =
    SWITCH (
        TRUE,
        // exclude a particular call
        'Ad Spend - Sponsored'[Call_CLI] = 448716630, "N/A",
        // one row exists for an ID
        ISBLANK ( vDate2 ), "N",
        // current row is earliest date for the Call_CLI
        vDate = vMinDate, "N",
        // number of minutes is within threshold
        ABS ( DATEDIFF ( vDate, vDate2, MINUTE ) ) <= 43200, "Y"
    )
RETURN
    vResult

 

DataInsights_0-1671407556714.png

 





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




@DataInsights 

So in checking the data 95%+ looks right but on the odd occasion this happens:

JH25_0-1671453873184.png

I can't work out why, can't see a correlation?

@JH25,

 

To troubleshoot this, I would start by forcing each result expression in the SWITCH function to return a unique value. Currently, the second and third result expressions return "N". Once you know which expression is being evaluated for a row, you can further analyze what might be causing it. Instead of returning vResult, return an expression like the one below. This enables you to see the underlying variables.

 

vDate & " | " & vDate2 & " | " & vMinDate

 





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




@DataInsights Thanks, that makes sense, so i've done the following:

VAR vResult =
    SWITCH (
        TRUE,
       'Ad Spend - Sponsored'[Call_CLI] = "+266696687", "withheld",
        // one row exists for an ID
        ISBLANK ( vDate2 ), "one row",
        // current row is earliest date for the Call_CLI
        vDate = vMinDate, "earliest",
        // number of minutes is within threshold
    ABS ( DATEDIFF ( vDate, vDate2, MINUTE ) ) <=43200, "Y"

 

I did change your code and added a "N" at the end of : 

ABS ( DATEDIFF ( vDate, vDate2, MINUTE ) ) <=43200, "Y"
as it was on your last one, thought it may have been an oversight and it appars to be those ones as they are now coming out blank.

JH25_1-1671464619801.png

So I guess some more code needs to go in?

 

JH25
Frequent Visitor

Also have this issue :

JH25_2-1671464796289.png

There are 8 months between them so unsure why they would be labelled 'Y'

@JH25,

 

Try this approach:

 

Duplicate = 
VAR vDate =
    'Ad Spend - Sponsored'[Header_Created]
VAR vMinDate =
    CALCULATE (
        MIN ( 'Ad Spend - Sponsored'[Header_Created] ),
        ALLEXCEPT ( 'Ad Spend - Sponsored','Ad Spend - Sponsored'[Call_CLI])
    )
VAR vResult =
    SWITCH (
        TRUE,
       'Ad Spend - Sponsored'[Call_CLI] = "+266696687", "withheld",
        // earliest row for a Call_CLI
        vDate = vMinDate, "N",
        // subsequent row for a Call_CLI and number of minutes is within threshold
        vDate <> vMinDate && DATEDIFF ( vMinDate, vDate, MINUTE ) <= 43200, "Y",
        // subsequent row for a Call_CLI and number of minutes is not within threshold
        "N"
    )
RETURN
    vResult

 

DataInsights_0-1671467553026.png

 





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




@DataInsights Thanks for your help, really appreciate it!!

That seems to be working.

This could be too complicated but is there a way of treating each call so they have their own duplicate time window?

In your example ideally the last call should be a duplicate as it came in within 30 days of the previous call.

JH25_0-1671470205590.png

 

@JH25,

 

Try this:

 

Duplicate = 
VAR vDate =
    'Ad Spend - Sponsored'[Header_Created]
VAR vLatestPreviousDate =
    CALCULATE (
        MAX ( 'Ad Spend - Sponsored'[Header_Created] ),
        ALLEXCEPT ( 'Ad Spend - Sponsored','Ad Spend - Sponsored'[Call_CLI]),
        'Ad Spend - Sponsored'[Header_Created] < vDate
    )
VAR vResult =
    SWITCH (
        TRUE,
       'Ad Spend - Sponsored'[Call_CLI] = "+266696687", "withheld",
        // earliest row for a Call_CLI
        ISBLANK ( vLatestPreviousDate ), "N",
        // subsequent row for a Call_CLI and number of minutes is within threshold
        DATEDIFF ( vLatestPreviousDate, vDate, MINUTE ) <= 43200, "Y",
        // subsequent row for a Call_CLI and number of minutes is not within threshold
        "N"
    )
RETURN
    vResult

 

DataInsights_0-1671484370765.png

 





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




@DataInsights  Amazing, thanks so much!!! It's spot on!!

DataInsights
Super User
Super User

@KanisleS,

 

Try this calculated column:

 

Duplicate = 
VAR vDate = 'Table'[Date]
VAR vDate2 =
    CALCULATE (
        MAX ( 'Table'[Date] ),
        ALLEXCEPT ( 'Table', 'Table'[ID] ),
        'Table'[Date] <> vDate
    )
VAR vResult =
    SWITCH (
        TRUE,
        // one row exists for an ID
        ISBLANK ( vDate2 ), "N",
        // get number of days between dates for an ID
        IF ( ABS ( DATEDIFF ( vDate, vDate2, DAY ) ) <= 2, "Y", "N" )
    )
RETURN
    vResult

 

DataInsights_0-1654004191541.png

 





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




Thank you legend!

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.