Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and a 50 percent discount on exams.
Get startedEarn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hi dear community,
I have an interesting case with removal of duplicates in time series. I can do it in Pandas, but can not find a way to do it in M.
Below is a typical case number 1 - order id was updated. This case is easy - sort by timecode and delete duplicates (first one will be deleted)
Id before after timecode
1 a b 00:01
1 b с 00:02
Second case is much harder to crack. In this case after value is equal to initial value - I need to delete (cancel) all such orders, not only first rows.
Id before after timecode
1 a b 00:01
1 b a 00:02
sometimes it could be a combo of above.
Id before after timecode
1 a b 00:01
1 b c 00:02
1 с a 00:03
Any ideas would be appreciated, thanks!
It looks like the case for Pandas and Python, but maybe it can be solved by M and/or DAX.
I think you will struggle with M to do the first item. In my experience it is not possible to guarantee the removal of the correct records (e.g. Remove earliest based on date). I normally sort on date and then remove duplicates, but I discovered recently that the removal of the correct item is not guaranteed. I confirmed this with Microsoft.
Regarding the second scenario, take a look at my technique here - this may work for you.
https://m.youtube.com/watch?v=xN2IRXQ2CvI
Third one looks tough again.
Yes I think this is a job for Python or some per suitable tool.
Hi @MattAllington,
have you tried using Table.Buffer after the sort? That has worked for me so far but of course I would be interested to know if there is a know issue with it. So did the confirmation you got from Microsoft include the buffered version as well?
Thanks.
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
@ImkeF wrote:So did the confirmation you got from Microsoft include the buffered version as well?
Thanks.
You are right Imke. I went back and checked and the concensus is that Table.Buffer should fix it (my bad). Can you advise how you would implement this? Do you first sort the table, and then manually wrap Table.Buffer around the source? or something else?
Thanks Matt,
relieved to hear 🙂
Theoretically, it should be fine if you simply add another step after your sorting like this:
Sort = let Source = Table1, #"Sorted Rows" = Table.Sort(Source,{"timecode", Order.Ascending}), #"Buffered Table" = Table.Buffer(#"Sorted Rows"). ...
But to be on the safe side, I always proceed like Marcel and "wrap" it directly around the syntax of the sorting-step:
Sort = let
Source = Table1,
#"Sorted Rows" = Table.Buffer(Table.Sort(Source,{"timecode", Order.Ascending})),
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
So the buffer has to go after the sort and before the remove duplicates. Is that correct?
Yes, the buffer shall prefent the previous query to be re-evaluated. So it needs to come directly after the step to maintain.
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
I would expect you'd require the first "before" value and the last "after" value, and remove any resulting records with the same "before" and "after" value.
In the first example that would be from a to c, not just: keep the last record.
let Source = Table1, #"Sorted Rows" = Table.Buffer(Table.Sort(Source,{"timecode", Order.Ascending})), #"Grouped Rows" = Table.Group(#"Sorted Rows", {"Id"}, {{"before", each List.First([before]), type text}, {"after", each List.Last([after]), type text}, {"timecode", each List.Max([timecode]), type time}}), #"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each [before] <> [after]) in #"Filtered Rows"
User | Count |
---|---|
88 | |
73 | |
69 | |
64 | |
56 |
User | Count |
---|---|
98 | |
92 | |
84 | |
74 | |
66 |