Hi,
I have a feeling that this has a simple solution that I just can't seem to figure out.
I'm trying to dedupe one column (SaleID), choosing which row to keep based on the oldest date, which is in another column.
Input:
SaleID | Date |
1 | 2021 |
1 | 2023 |
1 | 2020 |
2 | 2021 |
2 | 2023 |
2 | 2022 |
3 | 2023 |
3 | 2021 |
3 | 2022 |
Output:
SaleID | Date |
1 | 2020 |
2 | 2021 |
3 | 2021 |
Hi, thanks for the suggestion.
Unfortunately this didn't work, as it removed some of the SaleIDs altogether. Do you have any idea why that would happen?
That doesn't make sense to me. Any chance you can share a reproducible example?
Hi, I got this to work in the end by sorting by date inside Table.Buffer() and then removing duplicates on SaleID afterwards.
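For anyone landing here later, here is a minimal sketch of that approach in M, assuming the table looks like the sample input in the question. The step names (Source, SortedAndBuffered, Deduped) and the inline sample table are made up for illustration; Table.Sort, Table.Buffer and Table.Distinct are standard functions. As far as I understand, the buffer matters because without it the engine is not guaranteed to keep the sort order when it removes duplicates, so which row survives can be unpredictable.

```powerquery
let
    // Sample rows copied from the question; in a real query this would be the previous step
    Source = Table.FromRecords({
        [SaleID = 1, Date = 2021],
        [SaleID = 1, Date = 2023],
        [SaleID = 1, Date = 2020],
        [SaleID = 2, Date = 2021],
        [SaleID = 2, Date = 2023],
        [SaleID = 2, Date = 2022],
        [SaleID = 3, Date = 2023],
        [SaleID = 3, Date = 2021],
        [SaleID = 3, Date = 2022]
    }),
    // Sort so the oldest Date comes first, then buffer so that order is fixed in memory
    SortedAndBuffered = Table.Buffer(Table.Sort(Source, {{"Date", Order.Ascending}})),
    // Table.Distinct on SaleID keeps the first row it encounters for each SaleID
    Deduped = Table.Distinct(SortedAndBuffered, {"SaleID"})
in
    Deduped
```

With the sample data this should leave one row per SaleID with the dates 2020, 2021 and 2021, matching the expected output (the row order may differ). If only the oldest date per SaleID is needed and no other columns have to be carried along, grouping is probably a simpler option that doesn't depend on sort order at all, e.g. Table.Group(Source, {"SaleID"}, {{"Date", each List.Min([Date])}}).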