Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
vyacheslavg
Helper II
Helper II

Reverse duplicates in time series

Hi dear community, 

 

I have an interesting case with removal of duplicates in time series. I can do it in Pandas, but can not find a way to do it in M.

 

Below is a typical case number 1 - order id was updated. This case is easy - sort by timecode and delete duplicates (first one will be deleted)

 

Id before  after  timecode

1   a          b        00:01

1   b          с        00:02

 

Second case is much harder to crack. In this case after value is equal to initial value - I need to delete (cancel) all such orders, not only first rows.

 

Id before  after  timecode

1   a          b       00:01

1   b          a       00:02

 

 

sometimes it could be a combo of above. 

 

Id before  after  timecode

1   a          b       00:01

1   b          c       00:02

1   с           a      00:03

 

Any ideas would be appreciated, thanks! 

It looks like the case for Pandas and Python, but maybe it can be solved by M and/or DAX.

7 REPLIES 7

I think you will struggle with M to do the first item. In my experience it is not possible to guarantee the removal of the correct records (e.g. Remove earliest based on date).  I normally sort on date and then remove duplicates, but I discovered recently that the removal of the correct item is not guaranteed. I confirmed this with Microsoft. 

 

Regarding the second scenario, take a look at my technique here - this may work for you. 

https://m.youtube.com/watch?v=xN2IRXQ2CvI

 

Third one looks tough again.

 

Yes I think this is a job for Python or some per suitable tool. 



* Matt is an 8 times Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.

Hi @MattAllington,

have you tried using Table.Buffer after the sort? That has worked for me so far but of course I would be interested to know if there is a know issue with it. So did the confirmation you got from Microsoft include the buffered version as well?

Thanks.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries


@ImkeF wrote:

So did the confirmation you got from Microsoft include the buffered version as well?

Thanks.


You are right Imke.  I went back and checked and the concensus is that Table.Buffer should fix it (my bad).  Can you advise how you would implement this?  Do you first sort the table, and then manually wrap Table.Buffer around the source?  or something else?

 

 



* Matt is an 8 times Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.

Thanks Matt,

relieved to hear 🙂

 

Theoretically, it should be fine if you simply add another step after your sorting like this:

 

Sort = let
    Source = Table1,
    #"Sorted Rows" = Table.Sort(Source,{"timecode", Order.Ascending}),
    #"Buffered Table" = Table.Buffer(#"Sorted Rows").
...

But to be on the safe side, I always proceed like Marcel and "wrap" it directly around the syntax of the sorting-step:

 

Sort = let    
Source = Table1,
#"Sorted Rows" = Table.Buffer(Table.Sort(Source,{"timecode", Order.Ascending})),

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

So the buffer has to go after the sort and before the remove duplicates.  Is that correct?



* Matt is an 8 times Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.

Yes, the buffer shall prefent the previous query to be re-evaluated. So it needs to come directly after the step to maintain.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

MarcelBeug
Community Champion
Community Champion

I would expect you'd require the first "before" value and the last "after" value, and remove any resulting records with the same "before" and "after" value.

In the first example that would be from a to c, not just: keep the last record.

 

let
    Source = Table1,
    #"Sorted Rows" = Table.Buffer(Table.Sort(Source,{"timecode", Order.Ascending})),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"Id"}, {{"before", each List.First([before]), type text}, {"after", each List.Last([after]), type text}, {"timecode", each List.Max([timecode]), type time}}),
    #"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each [before] <> [after])
in
    #"Filtered Rows"
Specializing in Power Query Formula Language (M)

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.