Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi folks,
I have 2 tables "Original" and "Updated" that match on a primary key. I want to compare which (if any) records in the tables have changed and more specifically, what the change is.
So for example...
let
Source = Original,
#"Merged queries" = Table.NestedJoin(Source, {"Key"}, Updated, {"Key"}, "JoinedTable", JoinKind.Inner),
#"Expanded Target" = Table.ExpandTableColumn(#"Merged queries", "JoinedTable", {"ColumnA", "ColumnB", "ColumnC"}, {"JoinedTable.ColumnA", "JoinedTable.ColumnB", "JoinedTable.ColumnC"}),
#"Added custom" = Table.TransformColumnTypes(Table.AddColumn(#"Expanded Target", "Changes", each Text.Combine({
if [ColumnA] <> [JoinedTable.ColumnA] then "[ColumnA] " & [ColumnA] else null,
if [ColumnB] <> [JoinedTable.ColumnB] then "[ColumnB] " & [ColumnB] else null,
if [ColumnC] <> [JoinedTable.ColumnC] then "[ColumnC] " & [ColumnC] else null
}, ", ")), {{"Changes", type text}}),
#"Filtered rows" = Table.SelectRows(#"Added custom", each [Changes] <> null and [Changes] <> ""),
#"Removed other columns" = Table.SelectColumns(#"Filtered rows", {"Key", "Changes"})
in
#"Removed other columns"
While the above works, this is just for three columns and it's hardcoded for tables with 3 columns specifically named "ColumnA", "ColumnB" and "ColumnC". I want to recycle the query to compare various tables of different widths and structures.
Can I get the "Expand Target" step to be dynamic and extract using something like Table.ColumnNames() ?
Can I get the custom column to be dynamically written too?
Any pointers in the right direction would be greatly appreciated.
Solved! Go to Solution.
try this
let
OriginalB = Table.Buffer(Original),
UpdatedB = Table.Buffer(Updated),
Source = Table.NestedJoin(OriginalB, {"Key"}, UpdatedB, {"Key"}, "Updated", JoinKind.Inner),
#"Added Custom" = Table.AddColumn(
Source,
"Changes",
(k) =>
List.Transform(
List.Skip(Table.ColumnNames(OriginalB), 1),
each
let
orig = Record.Field(Table.SelectRows(OriginalB, each [Key] = k[Key]){0}, _),
chg = Record.Field(Table.SelectRows(UpdatedB, each [Key] = k[Key]){0}, _)
in
if orig = chg then null else _ & " : " & orig & " -> " & chg
)
),
#"Expanded Changes" = Table.ExpandListColumn(#"Added Custom", "Changes"),
#"Filtered Rows" = Table.SelectRows(#"Expanded Changes", each ([Changes] <> null))
in
#"Filtered Rows"
and maybe also consider a Table.AddKey on the [Key] column.
It will also be worth leafing through this series of articles : Chris Webb's BI Blog: Optimising The Performance Of Power Query Merges In Power BI, Part 1: Removing...
There are better tools for this - for example Alteryx or RapidMiner or Knime.
When you use an inner join you are turning a blind eye on all additions and deletions. Is that intended?
Please provide sample data that covers your issue or question completely, in a usable format (not as a screenshot).
Do not include sensitive information or anything not related to the issue or question.
If you are unsure how to upload data please refer to https://community.fabric.microsoft.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-...
Hi @lbendlin
Inner join was specifically chosen because Left and Right Anti's are used for inserts and deletes in other queries (see the example).
I have included the example file here...
https://drive.google.com/file/d/1-zJLgcyuyUB5Ct1l50yOt7kGXTfZDWSN/view?usp=sharing
You want to compare Original against Updated across all columns except for Key?
There is a simple way of doing this that is rather cute
Table.Distinct(Original & Updated)
it does need some work to identify what is original and what is update but that can be done with another trick
Table.Distinct(Table.AddColumn(Original,"Source", each "Original") & Table.AddColumn(Updated,"Source",each "Updated"),Table.ColumnNames(Original))
It won't tell you which column changed but it will highlight the rows with changes.
I see it's cuteness 🙂 but it's knowing what has changed that's the crucial bit
it was worth a try. I'll work on the full formed approach.
This seems to work but I think it can be simplified.
let
Source = Table.NestedJoin(Original, {"Key"}, Updated, {"Key"}, "Updated", JoinKind.Inner),
#"Added Custom" = Table.AddColumn(
Source,
"Changes",
(k) =>
List.Transform(
List.Skip(Table.ColumnNames(Original), 1),
each
let
orig = Record.Field(Table.SelectRows(Original, each [Key] = k[Key]){0}, _),
chg = Record.Field(Table.SelectRows(Updated, each [Key] = k[Key]){0}, _)
in
if orig = chg then null else _ & " : " & orig & " -> " & chg
)
),
#"Expanded Changes" = Table.ExpandListColumn(#"Added Custom", "Changes"),
#"Filtered Rows" = Table.SelectRows(#"Expanded Changes", each ([Changes] <> null))
in
#"Filtered Rows"
Awesome - I've added the query but at the moment I'm getting
"Evaluation resulted in a stack overflow and cannot continue"
My dataset is 32 columns by 675,000 rows so I'm not sure if that's just too big or there's some other nuance I'm missing. I'll investigate deeper.
22 million operations? What could possibly go wrong? 🙂
It's almost as if I'm using Power BI for something it's not intended for becasue IT haven't given me the tools I've been asking for for the last 3 years
try this
let
OriginalB = Table.Buffer(Original),
UpdatedB = Table.Buffer(Updated),
Source = Table.NestedJoin(OriginalB, {"Key"}, UpdatedB, {"Key"}, "Updated", JoinKind.Inner),
#"Added Custom" = Table.AddColumn(
Source,
"Changes",
(k) =>
List.Transform(
List.Skip(Table.ColumnNames(OriginalB), 1),
each
let
orig = Record.Field(Table.SelectRows(OriginalB, each [Key] = k[Key]){0}, _),
chg = Record.Field(Table.SelectRows(UpdatedB, each [Key] = k[Key]){0}, _)
in
if orig = chg then null else _ & " : " & orig & " -> " & chg
)
),
#"Expanded Changes" = Table.ExpandListColumn(#"Added Custom", "Changes"),
#"Filtered Rows" = Table.SelectRows(#"Expanded Changes", each ([Changes] <> null))
in
#"Filtered Rows"
and maybe also consider a Table.AddKey on the [Key] column.
It will also be worth leafing through this series of articles : Chris Webb's BI Blog: Optimising The Performance Of Power Query Merges In Power BI, Part 1: Removing...
There are better tools for this - for example Alteryx or RapidMiner or Knime.
Many thanks for your contribution. I will take a look through what you have given me.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.