Solved: Re: Remove duplicate and keep row based on a column with the lowest value

dyabes · 12-08-2018

I'm trying to remove duplicates on the ID column, but I would like to keep the row with the lowest value in the date column. Basically, keep the earliest instance of each ID. I tried the Sort -> Table.Buffer -> Remove Duplicates approach, but that process is incredibly slow, most likely because the dataset is an append of multiple large CSV files.

Are there other approaches that are more efficient?

Thanks in advance!

David

AlexisOlson · 12-08-2018

I'd suggest doing a Group By in the query editor. Group by ID and use Min as the aggregation type for the Date column.
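
To make the logic of that Group By step concrete, here is a minimal sketch written in Python rather than Power Query M. The `earliest_date_per_id` helper and the sample rows are hypothetical and only for illustration. It mirrors grouping by ID with Min on Date, and like that step it returns only the ID and its minimum date, not the other columns.

```python
from datetime import date

def earliest_date_per_id(rows):
    """Group rows by "ID" and keep the minimum "Date" for each ID."""
    earliest = {}
    for row in rows:
        key = row["ID"]
        # Keep the smaller of the stored date and the current row's date.
        if key not in earliest or row["Date"] < earliest[key]:
            earliest[key] = row["Date"]
    return earliest

# Hypothetical sample data, not taken from the original thread.
sample_rows = [
    {"ID": 1, "Date": date(2018, 3, 5)},
    {"ID": 2, "Date": date(2018, 1, 9)},
    {"ID": 1, "Date": date(2018, 2, 1)},
]
result = earliest_date_per_id(sample_rows)
```

This is a single pass over the data with no sorting, which is the general reason a group-and-aggregate tends to scale better than sorting the whole table first.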

v-cherch-msft · 12-09-2018

Hi @dyabes

I've attached a sample file for your reference.

Regards,

Cherie

Community Support Team _ Cherie Chen

Anonymous · 07-14-2021

Hi Alexis

I am facing a similar issue. I have three columns: Supplier Name, Status, and Points. The Supplier Name column contains duplicate values, while the Status and Points values are unique. I need to display each supplier with its lowest points and the corresponding status. Basically, I need to remove duplicate suppliers, keeping the record with the lowest points.

Regards

Arun
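
The supplier case described above can be sketched as follows. This is an illustration in Python, not Power Query M, and it assumes the three column names from the post; the `lowest_points_rows` helper and the sample values are hypothetical. It keeps the whole row, so the Status that belongs to the lowest Points value comes along with it.

```python
def lowest_points_rows(rows):
    """Keep, for each "Supplier Name", the full row with the lowest "Points"."""
    best = {}
    for row in rows:
        name = row["Supplier Name"]
        # Strict "<" means the first row seen wins when points are tied.
        if name not in best or row["Points"] < best[name]["Points"]:
            best[name] = row
    # Dicts preserve insertion order, so suppliers appear in first-seen order.
    return list(best.values())

# Hypothetical sample data, not taken from the original thread.
suppliers = [
    {"Supplier Name": "Acme", "Status": "Active", "Points": 40},
    {"Supplier Name": "Birch", "Status": "On hold", "Points": 15},
    {"Supplier Name": "Acme", "Status": "Review", "Points": 25},
]
kept = lowest_points_rows(suppliers)
```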

dyabes · 12-08-2018

Thank you. I think I should have provided the complete dataset I'm working on. I also need to keep the corresponding row values from the other columns.

Anonymous · 12-09-2018

Hi Dear

You need to use Group By.

Select Group By, choose the ID column as the column to group by, then set Operation to Min and Column to Date.
That will give you the earliest date for each ID.

AlexisOlson · 12-09-2018

You can do what I suggested and then merge the extra column(s) back in afterwards.
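
The group-then-merge idea can be sketched in two steps. This is a Python illustration of the logic, not Power Query M; the `keep_earliest_rows` helper, the `Amount` column, and the sample rows are hypothetical. Step 1 is the Group By with Min on Date, and step 2 joins that result back to the original rows on both ID and Date to recover the other columns.

```python
from datetime import date

def keep_earliest_rows(rows):
    """Group by "ID" for the minimum "Date", then join back to recover other columns."""
    # Step 1: the Group By result, one minimum date per ID.
    min_date = {}
    for row in rows:
        key = row["ID"]
        if key not in min_date or row["Date"] < min_date[key]:
            min_date[key] = row["Date"]
    # Step 2: inner join on (ID, Date); rows tied on the minimum date are all kept.
    return [row for row in rows if row["Date"] == min_date[row["ID"]]]

# Hypothetical sample data, not taken from the original thread.
orders = [
    {"ID": "A", "Date": date(2018, 5, 2), "Amount": 10},
    {"ID": "B", "Date": date(2018, 4, 1), "Amount": 7},
    {"ID": "A", "Date": date(2018, 1, 15), "Amount": 3},
]
merged = keep_earliest_rows(orders)
```

One thing to watch with this approach: if an ID has several rows sharing the same earliest date, the join returns all of them, so a final remove-duplicates on ID may still be needed.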
