cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
vyacheslavg Regular Visitor
Regular Visitor

Reverse duplicates in time series

Hi dear community, 

 

I have an interesting case with removal of duplicates in time series. I can do it in Pandas, but can not find a way to do it in M.

 

Below is a typical case number 1 - order id was updated. This case is easy - sort by timecode and delete duplicates (first one will be deleted)

 

Id before  after  timecode

1   a          b        00:01

1   b          с        00:02

 

Second case is much harder to crack. In this case after value is equal to initial value - I need to delete (cancel) all such orders, not only first rows.

 

Id before  after  timecode

1   a          b       00:01

1   b          a       00:02

 

 

sometimes it could be a combo of above. 

 

Id before  after  timecode

1   a          b       00:01

1   b          c       00:02

1   с           a      00:03

 

Any ideas would be appreciated, thanks! 

It looks like the case for Pandas and Python, but maybe it can be solved by M and/or DAX.

7 REPLIES 7
Super User
Super User

Re: Reverse duplicates in time series

I think you will struggle with M to do the first item. In my experience it is not possible to guarantee the removal of the correct records (e.g. Remove earliest based on date).  I normally sort on date and then remove duplicates, but I discovered recently that the removal of the correct item is not guaranteed. I confirmed this with Microsoft. 

 

Regarding the second scenario, take a look at my technique here - this may work for you. 

https://m.youtube.com/watch?v=xN2IRXQ2CvI

 

Third one looks tough again.

 

Yes I think this is a job for Python or some per suitable tool. 



* Matt is a Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.
Super User
Super User

Re: Reverse duplicates in time series

Hi @MattAllington,

have you tried using Table.Buffer after the sort? That has worked for me so far but of course I would be interested to know if there is a know issue with it. So did the confirmation you got from Microsoft include the buffered version as well?

Thanks.

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries




MarcelBeug Super Contributor
Super Contributor

Re: Reverse duplicates in time series

I would expect you'd require the first "before" value and the last "after" value, and remove any resulting records with the same "before" and "after" value.

In the first example that would be from a to c, not just: keep the last record.

 

let
    Source = Table1,
    #"Sorted Rows" = Table.Buffer(Table.Sort(Source,{"timecode", Order.Ascending})),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"Id"}, {{"before", each List.First([before]), type text}, {"after", each List.Last([after]), type text}, {"timecode", each List.Max([timecode]), type time}}),
    #"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each [before] <> [after])
in
    #"Filtered Rows"
Specializing in Power Query Formula Language (M)
Highlighted
Super User
Super User

Re: Reverse duplicates in time series


@ImkeF wrote:

So did the confirmation you got from Microsoft include the buffered version as well?

Thanks.


You are right Imke.  I went back and checked and the concensus is that Table.Buffer should fix it (my bad).  Can you advise how you would implement this?  Do you first sort the table, and then manually wrap Table.Buffer around the source?  or something else?

 

 



* Matt is a Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.
Super User
Super User

Re: Reverse duplicates in time series

Thanks Matt,

relieved to hear 🙂

 

Theoretically, it should be fine if you simply add another step after your sorting like this:

 

Sort = let
    Source = Table1,
    #"Sorted Rows" = Table.Sort(Source,{"timecode", Order.Ascending}),
    #"Buffered Table" = Table.Buffer(#"Sorted Rows").
...

But to be on the safe side, I always proceed like Marcel and "wrap" it directly around the syntax of the sorting-step:

 

Sort = let    
Source = Table1,
#"Sorted Rows" = Table.Buffer(Table.Sort(Source,{"timecode", Order.Ascending})),

 

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries




Super User
Super User

Re: Reverse duplicates in time series

So the buffer has to go after the sort and before the remove duplicates.  Is that correct?



* Matt is a Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.
Super User
Super User

Re: Reverse duplicates in time series

Yes, the buffer shall prefent the previous query to be re-evaluated. So it needs to come directly after the step to maintain.

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries




Helpful resources

Announcements
Ask Amir Anything

Exclusive LIVE Community Event No. 2 – Ask Amir Anything

Next in our Triple A series: Ask Amir Netz questions about the latest updates, features and future.

Ask Amanda Anything Q&A

Ask Amanda Anything Q&A

Learn the answers to some of the questions asked during the Amanda Triple A event.

October 2019 Community Highlights

October 2019 Community Highlights

October was a busy month in the community. Read the recap article to learn about some of the events and content.

New Solution Badges

New Solution Badges

Two waves of brand new solution badges are coming! Read the article for more information on our new community badges.

Users Online
Currently online: 176 members 1,751 guests
Please welcome our newest community members: