Re: Reverse duplicates in time series

vyacheslavg · ‎10-28-2017

Hi dear community,

I have an interesting case with removal of duplicates in time series. I can do it in Pandas, but can not find a way to do it in M.

Below is a typical case number 1 - order id was updated. This case is easy - sort by timecode and delete duplicates (first one will be deleted)

Id before after timecode

1 a b 00:01

1 b с 00:02

Second case is much harder to crack. In this case after value is equal to initial value - I need to delete (cancel) all such orders, not only first rows.

Id before after timecode

1 a b 00:01

1 b a 00:02

sometimes it could be a combo of above.

Id before after timecode

1 a b 00:01

1 b c 00:02

1 с a 00:03

Any ideas would be appreciated, thanks!

It looks like the case for Pandas and Python, but maybe it can be solved by M and/or DAX.

MattAllington · ‎10-28-2017

I think you will struggle with M to do the first item. In my experience it is not possible to guarantee the removal of the correct records (e.g. Remove earliest based on date). I normally sort on date and then remove duplicates, but I discovered recently that the removal of the correct item is not guaranteed. I confirmed this with Microsoft.

Regarding the second scenario, take a look at my technique here - this may work for you.

https://m.youtube.com/watch?v=xN2IRXQ2CvI

Third one looks tough again.

Yes I think this is a job for Python or some per suitable tool.

* Matt is an 8 times Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.

ImkeF · ‎10-28-2017

Hi @MattAllington,

have you tried using Table.Buffer after the sort? That has worked for me so far but of course I would be interested to know if there is a know issue with it. So did the confirmation you got from Microsoft include the buffered version as well?

Thanks.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

MattAllington · ‎10-29-2017

@ImkeF wrote:
So did the confirmation you got from Microsoft include the buffered version as well?
Thanks.

You are right Imke. I went back and checked and the concensus is that Table.Buffer should fix it (my bad). Can you advise how you would implement this? Do you first sort the table, and then manually wrap Table.Buffer around the source? or something else?

* Matt is an 8 times Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.

ImkeF · ‎10-29-2017

Thanks Matt,

relieved to hear 🙂

Theoretically, it should be fine if you simply add another step after your sorting like this:

Sort = let
    Source = Table1,
    #"Sorted Rows" = Table.Sort(Source,{"timecode", Order.Ascending}),
    #"Buffered Table" = Table.Buffer(#"Sorted Rows").
...

But to be on the safe side, I always proceed like Marcel and "wrap" it directly around the syntax of the sorting-step:

Sort = let    
Source = Table1,    
#"Sorted Rows" = Table.Buffer(Table.Sort(Source,{"timecode", Order.Ascending})),

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

MattAllington · ‎10-30-2017

So the buffer has to go after the sort and before the remove duplicates. Is that correct?

* Matt is an 8 times Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.

ImkeF · ‎11-02-2017

Yes, the buffer shall prefent the previous query to be re-evaluated. So it needs to come directly after the step to maintain.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

MarcelBeug · ‎10-29-2017

I would expect you'd require the first "before" value and the last "after" value, and remove any resulting records with the same "before" and "after" value.

In the first example that would be from a to c, not just: keep the last record.

let
    Source = Table1,
    #"Sorted Rows" = Table.Buffer(Table.Sort(Source,{"timecode", Order.Ascending})),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"Id"}, {{"before", each List.First([before]), type text}, {"after", each List.Last([after]), type text}, {"timecode", each List.Max([timecode]), type time}}),
    #"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each [before] <> [after])
in
    #"Filtered Rows"

Specializing in Power Query Formula Language (M)

Reverse duplicates in time series

Helpful resources

New forum boards available in Real-Time Intelligence.

Power BI Monthly Update - May 2024

How to Get Your Question Answered Quickly

Jumpstart your career with the Fabric Career Hub

Reverse duplicates in time series

Helpful resources

New forum boards available in Real-Time Intelligence.

Power BI Monthly Update - May 2024

How to Get Your Question Answered Quickly