Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I am cleaning up some data. I am trying to find start/end dates that overlap. Here is a table showing customer ID, start date, and end date.
We should only have one job open per customer at a time. As you can see, the first end date is after the second start date. One of these was inputted wrong or we had two jobs open at the same time (which is not best practice). I'm trying to produce a list of all these overlaps to investigate them further.
I've ventured down the path of ranking the start dates and end dates per customer and then seeing if rank 2 start is before rank 1 end, etc., but I get stuck. I'm not convinced that's the best way either.
Solved! Go to Solution.
Hi @Anonymous ,
If I understand your requirement correctly that you want to find the date which is inputted wrong.
If it is, you could create an index column in Query Editor and then create the calcualted columns below.
Column = CALCULATE ( FIRSTNONBLANK ( 'Table1'[Start Date], 1 ), FILTER ( 'Table1', 'Table1'[Index] = EARLIER ( 'Table1'[Index] ) + 1 ) )
Column 2 =
IF ( 'Table1'[Column] < 'Table1'[End Date], FALSE (), TRUE () )
Here is the output.
If you still need help, please share your desired output so that we could help further on it.
Best Regards,
Cherry
Hi @Anonymous ,
If I understand your requirement correctly that you want to find the date which is inputted wrong.
If it is, you could create an index column in Query Editor and then create the calcualted columns below.
Column = CALCULATE ( FIRSTNONBLANK ( 'Table1'[Start Date], 1 ), FILTER ( 'Table1', 'Table1'[Index] = EARLIER ( 'Table1'[Index] ) + 1 ) )
Column 2 =
IF ( 'Table1'[Column] < 'Table1'[End Date], FALSE (), TRUE () )
Here is the output.
If you still need help, please share your desired output so that we could help further on it.
Best Regards,
Cherry
This definitely gets me closer. My issue comes when it applies to my database of 100,000 customers.
If we use a straight index for all customers, I start getting a lot of falses because the first start date of a customer gets compared to the last end date of the previous customer.
I keep trying to figure out how to have an index per customer id. So customer ID 100046 would have index 1-4 then customer ID would have index 1-4. Then we would just have to work in a condition to return an error or something else when index + 1 does not exist.
Does this make sense?
Hi @Anonymous ,
You could try to create a group index with Dax formula like below.
index = RANKX ( FILTER ( 'Table1', 'Table1'[ID] = EARLIER ( Table1[ID] ) ), 'Table1'[Start Date], , ASC )
Then create the calcualted column which mentioned in my first reply.
Here is the output.
Hope this can help you!
Best Regards,
Cherry
I really appreciate you working through this with me. We are so close.
Below is a screenshot of what I'm getting when I apply to my whole dataset.
For some reaseon, it is populating Column with dates from the 90s. Does this issue come from the data not being sorted by ID then by Start Date? I have done this sort inside of edit queries, but when I close and apply it does not apply that sort. The sort you see here is done by ID after the query is applied.
As a reference, here's 2 screenshots.
First is after I close and apply. It defaults to be sorted by end date.
Second is inside of edit queries. I sort it using SQL before I even see it by ID and start date.
I got a solution for what I needed. It's a little messy, but got what I wanted.
I used this other solution to get the index based on idea column created...
https://community.powerbi.com/t5/Desktop/Custom-column-Index-or-Ranking-by-other-column/td-p/33864
From here, I created a new custom column that was Index + 1.
I then duplicated the query. In one query, I made a merged column that was "ID,Index." In the other query, I made a merged column column that was "ID, Index+1." I then merged the queries with a left outer join where "ID,Index" = "ID, Index+1" and expanded to include end date. This then gave me the start date with the previous end date in the same row. Conditional column to test if it came before.
Sorry. Hit post by accident when I tried to post picture.
Hope my explanation makes sense of my process. Again, it's not ideal with duplicated queries, but gets the job done for now.
Hi @Anonymous ,
Sorry for the delay.
It seems that you have solved your problem.
Please always accept the replies making sense as solution to your question so that people who may have the same question can get the solution directly.
Best Regards,
Cherry
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
114 | |
97 | |
85 | |
70 | |
61 |
User | Count |
---|---|
151 | |
121 | |
104 | |
87 | |
67 |