cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
fiveone
Helper II
Helper II

How to remove duplicates based on sort order

When loading the data I need to keep only the latest record.

I don't have a date to rely on to determine the latest record but I have a status, based on which I can sort in a somewhat chronological order.

 I thought I could

  • Associate a statusID to each status, making sure that the lowest the ID the latest the event
  • Sort by statusID
  • Remove duplicates

Unfortunately it seems the sort order is not always respected,

I read that we should use the Table.Buffer() function before removing duplicates but that this could create performance issue.

It does indeed create performance issue. It works on a small scale but when I use this on the entire dataset the load process does not complete.

 

Now I’m looking into using Table.SelectRows with a condition that would only return the row with the smallest statusID for each RequestID.

Something along the line of

  

 #"Latest Request" = Table.SelectRows(#"Previous Step", each (List.Max([RequestID]," statusID "))),

 

But that does return a type incompatibility error

5 REPLIES 5
RachelL
Advocate I
Advocate I

We also had a similar problem. We found this solved our problem:

 

1. Sort data as required

2. Add index column starting at 1

3. Remove Duplicates

 

The index column helps Power BI realise while removing duplicates that there is a specific order to the sorted data.

acstaffa
Frequent Visitor

I have already uploaded a SQL data, how can I create a function to sort the column order and force de removing to follow the date sort? I dont know how to create the function...

 

It is sad PBI dont "follow" the sort order to eliminate duplicates...

MarcelBeug
Community Champion
Community Champion

My suggestion would be the following, illustrated by this video.


I created some test data in Access just to verify if the code would allow for Query Folding (by rightclicking each step in the queries and check for the existence of Native Query). They all did, so performance should be OK..

 

I created a query InputData that just reads the data from Access. For this query, load is disabled.

 

let
    Source = Access.Database(File.Contents("C:\Users\Marcel\Documents\Forum bijdragen\Power BI Community\How to remove duplicates based on sort order.accdb"), [CreateNavigationProperties=true]),
    _Requests = Source{[Schema="",Item="Requests"]}[Data]
in
    _Requests

 

 

I created a query MinStatusByRequestID with input from query InputData that determines the minimum statusID for each RequestID using Group By on the Transform tab. Also for this query, load is disabled.

 

let
    Source = InputData,
    #"Grouped Rows" = Table.Group(Source, {"RequestID"}, {{"minStatus", each List.Min([statusID]), type number}})
in
    #"Grouped Rows"

 

 

I created a query FilteredData that merges the previous 2 queries with an inner join. For this query, load is enabled.

 

let
    Source = InputData,
    #"Merged Queries" = Table.Join(Source,{"RequestID", "statusID"},MinStatusByRequestID,{"RequestID", "minStatus"},JoinKind.Inner),
    #"Removed Columns" = Table.RemoveColumns(#"Merged Queries",{"minStatus"})
in
    #"Removed Columns"

 

Specializing in Power Query Formula Language (M)

 Thank you so much Marcel! I have been struggling for days and googling crazy for a solution and you just solved my problem!!!

Wow, thank you Marcel for taking the time to provide that detailed explanation and even making a video!

Leveraging grouping and merging is brilliant.

 

Helpful resources

Announcements
August 2022 update 768x460.jpg

Check it Out!

Click here to learn more about the August 2022 updates!

August 1 episode 9_no_dates 768x460.jpg

The Power BI Community Show

Watch the playback when Priya Sathy and Charles Webb discuss Datamarts! Kelly also shares Power BI Community updates.

Power Platform Conf 2022 768x460.jpg

Join us for Microsoft Power Platform Conference

The first Microsoft-sponsored Power Platform Conference is coming in September. 100+ speakers, 150+ sessions, and what's new and next for Power Platform.

Top Solution Authors
Top Kudoed Authors