Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hello all;
I was wondering if there was a way to count duplicates but only within a period of time ( 2 days ).
I have a dataset that looks similar to this :
ID | Date |
1 | 30-05-2022 |
1 | 28-05-2022 |
2 | 30-05-2022 |
3 | 30-05-2022 |
4 | 30-05-2022 |
2 | 01-05-2022 |
If an ID is repeated twice ( or more ) within 48 hours then I want it to be labeled as a duplicate, in the table above for example the ID 1 is a duplicate, the ID 2 isn't.
this calculated column allows me to count the number of duplicates :
CALCULATE ( COUNT ( 'Table'[ID] ), ALLEXCEPT ( 'Table', 'Table'[ID]) )
how can i improve it in a way it counts only duplicates that happen within a 48hours period?
Thanks for your help!
Solved! Go to Solution.
Try this calculated column:
Duplicate =
VAR vDate = 'Table'[Date]
VAR vDate2 =
CALCULATE (
MAX ( 'Table'[Date] ),
ALLEXCEPT ( 'Table', 'Table'[ID] ),
'Table'[Date] <> vDate
)
VAR vResult =
SWITCH (
TRUE,
// one row exists for an ID
ISBLANK ( vDate2 ), "N",
// get number of days between dates for an ID
IF ( ABS ( DATEDIFF ( vDate, vDate2, DAY ) ) <= 2, "Y", "N" )
)
RETURN
vResult
Proud to be a Super User!
Thanks for your help so far!
I'm getting this error:
DAX comparison operations do not support comparing values of type Text with values of type Integer. Consider using the VALUE or FORMAT function to convert one of the values.
I've removed the
// exclude a particular call 'Ad Spend - Sponsored'[Call_CLI] = 448716630, "N/A",
part and it works, so it must be because the CLI is set as text?
I'm not sure what else i can set the data type to?
If Call_CLI is a text data type, embed the value in double quotes:
'Ad Spend - Sponsored'[Call_CLI] = "448716630", "N/A",
Proud to be a Super User!
Hi i want to do the same thing but i only want it to label the second entry as a duplicate, is this possible?
How do you define "second entry"? Does it rely on the sort order of the table, or have a later date than the first entry? Please provide sample data and the expected result.
Proud to be a Super User!
Hi it would be the later entries that would be duplicates based on time and not sort order.
I've changed your code to the following which makes it do it based on the last 30 days in minutes.
Duplicate =
VAR vDate = 'Ad Spend - Sponsored'[Header_Created]
VAR vDate2 =
CALCULATE (
MAX ( 'Ad Spend - Sponsored'[Header_Created] ),
ALLEXCEPT ( 'Ad Spend - Sponsored','Ad Spend - Sponsored'[Call_CLI]),
'Ad Spend - Sponsored'[Header_Created] <> vDate
)
VAR vResult =
SWITCH (
TRUE,
// one row exists for an ID
ISBLANK ( vDate2 ), "N",
// get number of days between dates for an ID
IF ( ABS ( DATEDIFF ( vDate, vDate2, MINUTE ) ) <= 43200, "Y", "N" )
)
RETURN
vResult
That is working but it shows yes for all entries rather than just the 2nd 3rd etc.
Here's an example.
I also want to exclude one particular Call_CLI from the calcualtion too, is that possible?
Hope that all makes sense.
Try this calculated column:
Duplicate =
VAR vDate =
'Ad Spend - Sponsored'[Header_Created]
VAR vDate2 =
CALCULATE (
MAX ( 'Ad Spend - Sponsored'[Header_Created] ),
ALLEXCEPT ( 'Ad Spend - Sponsored','Ad Spend - Sponsored'[Call_CLI]),
'Ad Spend - Sponsored'[Header_Created] <> vDate
)
VAR vMinDate =
CALCULATE (
MIN ( 'Ad Spend - Sponsored'[Header_Created] ),
ALLEXCEPT ( 'Ad Spend - Sponsored','Ad Spend - Sponsored'[Call_CLI])
)
VAR vResult =
SWITCH (
TRUE,
// exclude a particular call
'Ad Spend - Sponsored'[Call_CLI] = 448716630, "N/A",
// one row exists for an ID
ISBLANK ( vDate2 ), "N",
// current row is earliest date for the Call_CLI
vDate = vMinDate, "N",
// number of minutes is within threshold
ABS ( DATEDIFF ( vDate, vDate2, MINUTE ) ) <= 43200, "Y"
)
RETURN
vResult
Proud to be a Super User!
So in checking the data 95%+ looks right but on the odd occasion this happens:
I can't work out why, can't see a correlation?
To troubleshoot this, I would start by forcing each result expression in the SWITCH function to return a unique value. Currently, the second and third result expressions return "N". Once you know which expression is being evaluated for a row, you can further analyze what might be causing it. Instead of returning vResult, return an expression like the one below. This enables you to see the underlying variables.
vDate & " | " & vDate2 & " | " & vMinDate
Proud to be a Super User!
@DataInsights Thanks, that makes sense, so i've done the following:
VAR vResult =
SWITCH (
TRUE,
'Ad Spend - Sponsored'[Call_CLI] = "+266696687", "withheld",
// one row exists for an ID
ISBLANK ( vDate2 ), "one row",
// current row is earliest date for the Call_CLI
vDate = vMinDate, "earliest",
// number of minutes is within threshold
ABS ( DATEDIFF ( vDate, vDate2, MINUTE ) ) <=43200, "Y"
I did change your code and added a "N" at the end of :
ABS ( DATEDIFF ( vDate, vDate2, MINUTE ) ) <=43200, "Y"
So I guess some more code needs to go in?
Also have this issue :
There are 8 months between them so unsure why they would be labelled 'Y'
Try this approach:
Duplicate =
VAR vDate =
'Ad Spend - Sponsored'[Header_Created]
VAR vMinDate =
CALCULATE (
MIN ( 'Ad Spend - Sponsored'[Header_Created] ),
ALLEXCEPT ( 'Ad Spend - Sponsored','Ad Spend - Sponsored'[Call_CLI])
)
VAR vResult =
SWITCH (
TRUE,
'Ad Spend - Sponsored'[Call_CLI] = "+266696687", "withheld",
// earliest row for a Call_CLI
vDate = vMinDate, "N",
// subsequent row for a Call_CLI and number of minutes is within threshold
vDate <> vMinDate && DATEDIFF ( vMinDate, vDate, MINUTE ) <= 43200, "Y",
// subsequent row for a Call_CLI and number of minutes is not within threshold
"N"
)
RETURN
vResult
Proud to be a Super User!
@DataInsights Thanks for your help, really appreciate it!!
That seems to be working.
This could be too complicated but is there a way of treating each call so they have their own duplicate time window?
In your example ideally the last call should be a duplicate as it came in within 30 days of the previous call.
Try this:
Duplicate =
VAR vDate =
'Ad Spend - Sponsored'[Header_Created]
VAR vLatestPreviousDate =
CALCULATE (
MAX ( 'Ad Spend - Sponsored'[Header_Created] ),
ALLEXCEPT ( 'Ad Spend - Sponsored','Ad Spend - Sponsored'[Call_CLI]),
'Ad Spend - Sponsored'[Header_Created] < vDate
)
VAR vResult =
SWITCH (
TRUE,
'Ad Spend - Sponsored'[Call_CLI] = "+266696687", "withheld",
// earliest row for a Call_CLI
ISBLANK ( vLatestPreviousDate ), "N",
// subsequent row for a Call_CLI and number of minutes is within threshold
DATEDIFF ( vLatestPreviousDate, vDate, MINUTE ) <= 43200, "Y",
// subsequent row for a Call_CLI and number of minutes is not within threshold
"N"
)
RETURN
vResult
Proud to be a Super User!
Try this calculated column:
Duplicate =
VAR vDate = 'Table'[Date]
VAR vDate2 =
CALCULATE (
MAX ( 'Table'[Date] ),
ALLEXCEPT ( 'Table', 'Table'[ID] ),
'Table'[Date] <> vDate
)
VAR vResult =
SWITCH (
TRUE,
// one row exists for an ID
ISBLANK ( vDate2 ), "N",
// get number of days between dates for an ID
IF ( ABS ( DATEDIFF ( vDate, vDate2, DAY ) ) <= 2, "Y", "N" )
)
RETURN
vResult
Proud to be a Super User!
Thank you legend!
User | Count |
---|---|
94 | |
83 | |
78 | |
75 | |
66 |
User | Count |
---|---|
115 | |
105 | |
93 | |
65 | |
62 |