I have a slightly odd join scenario in which Power Query's Table.Distinct is silently dropping non-duplicate rows. Here's a bit of background: A list of "things" are being read from the database that is the system of record for "things". (The type of data source does not seem to be relevant here since I can repro with csv, Excel, or even a table embedded in the pbix.) A Thing has a Type attribute, and the system of record allows only one Type value to be assigned to a Thing. However, we do need to treat a subset of Things as multi-Type for reporting purposes, which means we need to massage the data structure to change the Thing:Type relationship from 1:1 to 1:* and append the additional Type values from another source. As a bit of added complexity, we don't actually care about all Types of Things, and excluding Things of non-focal Types from our report dataset is desirable due to the volume of data. However, we do need to assess both the orignal Types and added Types when filtering these out since a focal Type could be picked up from either data source. None of this is terribly complicated to implement. Here are the queries involved: RawThings (not loaded to the report) Reads from the Things system of reference. FocalTypes (not loaded to the report) Embedded table with a single Type column, including the list of types we want to keep in the report. TypeOverrides Loads the list of additional Type values for the subset of Things that have more than one Type. (Again, the data source seems to be irrelevant since I can repro with multiple source types, including an embedded table.) ThingTypes Reads from RawThings. Removes all columns except the ID and Type. Appends the ID and Type values from the "override" list. Performs an inner join (Table.NestedJoin of JoinKind.Inner) to FocalTypes to exclude non-focal types. Things Reads from RawThings. Removes the Type column. Performs an inner join to distinct IDs from ThingTypes using the following syntax: Table.NestedJoin(#"Previous Step", {"ID"}, Table.Distinct(ThingTypes, "ID"),{"ID"}, "NewColumn", JoinKind.Inner). (This is the step at which rows get inappropriately dropped.) There are three additional pieces to this behaviour that might be of help in identifying the problem: When this report was first created, there were only two focal Types, and the exclusion of non-focal Types (4.4 in the list above) was accomplished via simple text filtering using Table.SelectRows. Our list of focal Types recently grew, which led to the introduction of the join-based approach. We never noticed the problem under the old filtering approach, but it appeared immediately after the change in approach. Buffering ThingTypes before apply Thing.Distinct does seem to prevent the problem: Table.NestedJoin(#"Previous Step",{"ID"},Table.Distinct(Table.Buffer(ThingTypes), "ID"),{"ID"},"NewColumn",JoinKind.Inner). In both my real dataset and repro attempts, the criteria for a row getting dropped from Things without buffering seem to be that the only focal Type(s) for a Thing comes from the TypeOverrides query, not the RawThings query. If there are other conditions that would trigger the problem, I haven't yet encountered (or at least noticed) them. (Incidentally, I would tend to see this as a fairly serious problem despite the availability of a trivial workaround since it results in data being silently dropped out of a report. In many reporting scenarios, the absence of a small subset of data might easily go unnoticed, which could eventually have some quite unfortunate consequences when report consumers make decisions based on the output. If fixing the problem is difficult, might at least throwing an exception be feasible so that the data loss problem isn't silent?) Here are two pbix files that demonstrate the problem using embedded tables only: Unbuffered.pbix Buffered.pbix There only difference between the two files is the use of Table.Buffer at step 4.4, and Thing 6 is the only one that meets the row dropping criteria mentioned above.
... View more