cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
pavel_severov
Frequent Visitor

Which row will keep Table.Distinct(Table, {"Column"})

Hi colleagues,

Which row will keep Table.Distinct(Table, {"Column"}) for the rows with the same values in the "Column":

  1. First row?
  2. Last row?
  3. Random row?

MS documentation has no explanation about it (https://docs.microsoft.com/en-us/powerquery-m/table-distinct)

10 REPLIES 10
watkinnc
Post Patron
Post Patron

You are forgetting that if you want to remove distinct letters that are different cases (A/a) then you need to add the optional parameter Comparer.OrdinalIgnoreCase

Table.Distinct(Table, {{“b”,Comparer.OrdinalIgnoreCase}})

This will treat A and a as duplicates. Otherwise, A/a are not the same in this function. 
--Nate

tmijail
Regular Visitor

Ok, this is pretty weird.


I ran the tests again, but instead of reordering the table using Table.Sort I just entered the data in a different order from the start.

Here are the results:

(For some reason I can't embed images even though I had no such problem yesterday. Sorry if this is harder to follow or the images eventually go down)

 

Source

Distinct

Reordered Source

Reordered Distinct

 

Code:

 

 

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclLSUUpUitWJVnJEYSUpxcYCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [a = _t, b = _t]),
    Example = Table.TransformColumnTypes(Source,{{"a", type text}, {"b", type text}}),
    ExampleResult = Table.Distinct(Example, {"b"}),
    ReorderedExample = Table.Sort(Example,{{"a", Order.Ascending}}),
    ReorderedExampleResult = Table.Distinct(ReorderedExample, {"b"})
in
    ReorderedExampleResult

 

 

 

 

 

We can easily see that order is indeed important, but that for some reason reordering the table using Table.Sort doesn't make a difference. I found that if we need to, we can use Table.Buffer to make Table.Distinct take into account the changes made by Table.Sort. Here's my third and final test:

 

Source

Distinct

Reordered Source

Reordered Distinct

 

Code:

 

 

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUUpUitWJVnKCs0BiSUqxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [a = _t, b = _t]),
    Example = Table.TransformColumnTypes(Source,{{"a", type text}, {"b", type text}}),
    ExampleResult = Table.Distinct(Example, {"b"}),
    ReorderedExample = Table.Buffer( Table.Sort(Example,{{"a", Order.Descending}}) ),
    ReorderedExampleResult = Table.Distinct(ReorderedExample, {"b"})
in
    ReorderedExampleResult

 

 

 

 

Thank you for doing the work of looking into this and identifying a solution to this frustrating issue.

it seems as if the sort operation doesn't change the table from the point of view of the table.distinct function and uses the same result as before.(*)
The operations that "change" the table (manual reordering, table.buffer, adding an index column, replacing any value in the table - even if the new value is the same as the old one -, changing the type of a column) do " start " the procedure of the table.distinct function all over again

 

 

 

 


(*) if between the call to the table.distinct function and the definition of the table to which it is applied, only table.sort functions act, the table distinct function is applied to the original table "saving" the sorting operation which is generally heavy.
Maybe ‌ 😁

table.group also appears to have similar behavior to table.distinct.
After all, a possible algorithm to eliminate duplicates in pseudocode could be like this:

Table.Group (tab, "col", each Table.First (_) [all columns but "col"])

 

 

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUpUitWJVjJCYSWBWcZwMUOIWCwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [a = _t, b = _t]),
    Example = Table.TransformColumnTypes(Source,{{"a", type text}, {"b", type text}}),
    #"Raggruppate righe" = Table.Group(Example, {"b"}, {{"all", each Table.First(_)[a]}}),
    Example1 = Table.TransformColumnTypes(Source,{{"a", type text}, {"b", type text}}),
    #"Ordinate righe" = Table.Sort(Example1,{{"a", Order.Descending}}),
    #"Raggruppate righe1" = Table.Group(#"Ordinate righe", {"b"}, {{"all", each List.First(_[a]) }})
in
    #"Raggruppate righe1"

 

 

in the following form the "problem" doesn't arise:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUpUitWJVjJCYSWBWcZwMUOIWCwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [a = _t, b = _t]),
    Example = Table.TransformColumnTypes(Source,{{"a", type text}, {"b", type text}}),
    #"Raggruppate righe" = Table.Group(Example, {"b"}, {{"all", each _}}),
    Example1 = Table.TransformColumnTypes(Source,{{"a", type text}, {"b", type text}}),
    #"Ordinate righe" = Table.Sort(Example1,{{"a", Order.Descending}}),
    #"Raggruppate righe1" = Table.Group(#"Ordinate righe", {"b"}, {{"all", each _ }})
in
    #"Raggruppate righe1"

 

 

 

and not even like this

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUpUitWJVjJCYSWBWcZwMUOIWCwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [a = _t, b = _t]),
    Example = Table.TransformColumnTypes(Source,{{"a", type text}, {"b", type text}}),
    #"Raggruppate righe" = Table.Group(Example, {"b"}, {{"all", each _[a]{0}}}),
    Example1 = Table.TransformColumnTypes(Source,{{"a", type text}, {"b", type text}}),
    #"Ordinate righe" = Table.Sort(Example1,{{"a", Order.Descending}}),
    #"Raggruppate righe1" = Table.Group(#"Ordinate righe", {"b"}, {{"all", each _[a]{0} }})
in
    #"Raggruppate righe1"

 

About table.sort ...

Suppose you have this table:

image.png

 

 

if you sort by column a, you get this:

 

image.png

 

but even this different situation would have been valid (third and fourth row are reversed):

 

 

image.png

 

This, I believe, depends on the internal algorithm used to do the sorting.

 

If, as plausible as it may be, the table.distinct algorithm first does a sort of rows to "group" the rows that have duplicate values in the control columns, the ordering of the remaining columns, as we have seen, is not determined.

So if the algorithm also takes the first row of the group, it is not necessarily the "first" row you encounter  the source table.

 

 

mahoneypat
Super User IV
Super User IV

I believe it keeps the first one.  You can do quick try to confirm.  You can add a sort step just before that step to keep the one you want.

 

If this works for you, please mark it as the solution.  Kudos are appreciated too.  Please let me know if not.

Regards,

Pat





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


Hi @mahoneypatI, I also "believe" in "First row", but it is good to have official confirmation from MS. Isn't it? 🙂

Hi @pavel_severov ,

 

Sorry for that, but we did not find any logic definition of this function in official document. But you can confirm it by the Example 2 of the function document, it will keep the first row.

 

1.jpg


Best regards,

Community Support Team _ Dong Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hey @v-lid-msft , I did a quick test and that does not seem to be the case.

 

First, I entered the data from that example

 

5.png

 

and used 'Remove duplicates' on b

 

6.png

 

This is what we expected from the documentation and from the assumption that it keeps the first row.

Now to the real test. I altered the order of the first table so that it looks like this:

 

7.png

 

Under the assumption that it keeps the first row, we expect to get the following table:

 

ab
Ba
Ab

 

but instead we get this:

 

8.png

 

Here's the code for the tests in case anyone wants to check

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUUpUitWJVnKCs0BiSUqxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [a = _t, b = _t]),
    Example = Table.TransformColumnTypes(Source,{{"a", type text}, {"b", type text}}),
    ExampleResult = Table.Distinct(Example, {"b"}),
    ReorderedExample = Table.Sort(Example,{{"a", Order.Descending}}),
    ReorderedExampleResult = Table.Distinct(ReorderedExample, {"b"})
in
    ReorderedExampleResult

 

Helpful resources

Announcements
PBI User Groups

Welcome to the User Group Public Preview

Check out new user group experience and if you are a leader please create your group

April Update

Check it Out!

Click here to read more about the April 2021 Updates!

secondImage

The largest Power BI virtual conference

100+ sessions, 100+ speakers, Product managers, MVPs, and experts. All about Power BI. Attend online or watch the recordings.

secondImage

Experience what’s next for Power BI

See the latest Power BI innovations, updates, and demos from the Microsoft Business Applications Launch Event.

Top Solution Authors