Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
KF-Hornsby
Frequent Visitor

Dynamic comparison of 2 tables

Hi folks,

 

I have 2 tables "Original" and "Updated" that match on a primary key.  I want to compare which (if any) records in the tables have changed and more specifically, what the change is.

 

So for example...

 

 

 

let
  Source = Original,

  #"Merged queries" = Table.NestedJoin(Source, {"Key"}, Updated, {"Key"}, "JoinedTable", JoinKind.Inner),
  
  #"Expanded Target" = Table.ExpandTableColumn(#"Merged queries", "JoinedTable", {"ColumnA", "ColumnB", "ColumnC"}, {"JoinedTable.ColumnA", "JoinedTable.ColumnB", "JoinedTable.ColumnC"}),
 
  #"Added custom" = Table.TransformColumnTypes(Table.AddColumn(#"Expanded Target", "Changes", each Text.Combine({
    if [ColumnA] <> [JoinedTable.ColumnA] then "[ColumnA] " & [ColumnA] else null,
    if [ColumnB] <> [JoinedTable.ColumnB] then "[ColumnB] " & [ColumnB] else null,
    if [ColumnC] <> [JoinedTable.ColumnC] then "[ColumnC] " & [ColumnC] else null
}, ", ")), {{"Changes", type text}}),
  
#"Filtered rows" = Table.SelectRows(#"Added custom", each [Changes] <> null and [Changes] <> ""),
  #"Removed other columns" = Table.SelectColumns(#"Filtered rows", {"Key",  "Changes"})
in
  #"Removed other columns"

 

 

 

While the above works, this is just for three columns and it's hardcoded for tables with 3 columns specifically named "ColumnA", "ColumnB" and "ColumnC".  I want to recycle the query to compare various tables of different widths and structures.

 

Can I get the "Expand Target" step to be dynamic and extract using something like Table.ColumnNames() ?

Can I get the custom column to be dynamically written too?

Any pointers in the right direction would be greatly appreciated.

1 ACCEPTED SOLUTION

try this

 

 

 

 

let
  OriginalB = Table.Buffer(Original),
  UpdatedB = Table.Buffer(Updated),
  Source = Table.NestedJoin(OriginalB, {"Key"}, UpdatedB, {"Key"}, "Updated", JoinKind.Inner), 
  #"Added Custom" = Table.AddColumn(
    Source, 
    "Changes", 
    (k) =>
      List.Transform(
        List.Skip(Table.ColumnNames(OriginalB), 1), 
        each 
          let
            orig = Record.Field(Table.SelectRows(OriginalB, each [Key] = k[Key]){0}, _), 
            chg  = Record.Field(Table.SelectRows(UpdatedB, each [Key] = k[Key]){0}, _)
          in
            if orig = chg then null else _ & " : " & orig & " -> " & chg
      )
  ), 
  #"Expanded Changes" = Table.ExpandListColumn(#"Added Custom", "Changes"), 
  #"Filtered Rows" = Table.SelectRows(#"Expanded Changes", each ([Changes] <> null))
in
  #"Filtered Rows"

 

 

and maybe also consider a Table.AddKey on the [Key] column.

It will also be worth leafing through this series of articles : Chris Webb's BI Blog: Optimising The Performance Of Power Query Merges In Power BI, Part 1: Removing...

 

There are better tools for this - for example Alteryx or RapidMiner or Knime.

View solution in original post

11 REPLIES 11
lbendlin
Super User
Super User

When you use an inner join you are turning a blind eye on all additions and deletions.  Is that intended?

 

Please provide sample data that covers your issue or question completely, in a usable format (not as a screenshot).

Do not include sensitive information or anything not related to the issue or question.

If you are unsure how to upload data please refer to https://community.fabric.microsoft.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-...


Hi @lbendlin 

Inner join was specifically chosen because Left and Right Anti's are used for inserts and deletes in other queries (see the example).

I have included the example file here...

 

https://drive.google.com/file/d/1-zJLgcyuyUB5Ct1l50yOt7kGXTfZDWSN/view?usp=sharing

 

 

You want to compare Original against Updated across all columns except for Key?

 

There is a simple way of doing this that is rather cute

 

Table.Distinct(Original & Updated)

 

lbendlin_0-1709662977059.png

it does need some work to identify what is original and what is update but that can be done with another trick

 

Table.Distinct(Table.AddColumn(Original,"Source", each "Original") & Table.AddColumn(Updated,"Source",each "Updated"),Table.ColumnNames(Original))

 

lbendlin_1-1709663170484.png

It won't tell you which column changed but it will highlight the rows with changes.

 

I see it's cuteness 🙂 but it's knowing what has changed that's the crucial bit

it was worth a try.  I'll work on the full formed approach.

@KF-Hornsby 

 

This seems to work but I think it can be simplified.

 

 

let
  Source = Table.NestedJoin(Original, {"Key"}, Updated, {"Key"}, "Updated", JoinKind.Inner), 
  #"Added Custom" = Table.AddColumn(
    Source, 
    "Changes", 
    (k) =>
      List.Transform(
        List.Skip(Table.ColumnNames(Original), 1), 
        each 
          let
            orig = Record.Field(Table.SelectRows(Original, each [Key] = k[Key]){0}, _), 
            chg  = Record.Field(Table.SelectRows(Updated, each [Key] = k[Key]){0}, _)
          in
            if orig = chg then null else _ & " : " & orig & " -> " & chg
      )
  ), 
  #"Expanded Changes" = Table.ExpandListColumn(#"Added Custom", "Changes"), 
  #"Filtered Rows" = Table.SelectRows(#"Expanded Changes", each ([Changes] <> null))
in
  #"Filtered Rows"

 

 

lbendlin_0-1709679330425.png

 

Awesome - I've added the query but at the moment I'm getting 

"Evaluation resulted in a stack overflow and cannot continue"

My dataset is 32 columns by 675,000 rows so I'm not sure if that's just too big or there's some other nuance I'm missing.  I'll investigate deeper.

22 million operations? What could possibly go wrong?  🙂

It's almost as if I'm using Power BI for something it's not intended for becasue IT haven't given me the tools I've been asking for for the last 3 years

try this

 

 

 

 

let
  OriginalB = Table.Buffer(Original),
  UpdatedB = Table.Buffer(Updated),
  Source = Table.NestedJoin(OriginalB, {"Key"}, UpdatedB, {"Key"}, "Updated", JoinKind.Inner), 
  #"Added Custom" = Table.AddColumn(
    Source, 
    "Changes", 
    (k) =>
      List.Transform(
        List.Skip(Table.ColumnNames(OriginalB), 1), 
        each 
          let
            orig = Record.Field(Table.SelectRows(OriginalB, each [Key] = k[Key]){0}, _), 
            chg  = Record.Field(Table.SelectRows(UpdatedB, each [Key] = k[Key]){0}, _)
          in
            if orig = chg then null else _ & " : " & orig & " -> " & chg
      )
  ), 
  #"Expanded Changes" = Table.ExpandListColumn(#"Added Custom", "Changes"), 
  #"Filtered Rows" = Table.SelectRows(#"Expanded Changes", each ([Changes] <> null))
in
  #"Filtered Rows"

 

 

and maybe also consider a Table.AddKey on the [Key] column.

It will also be worth leafing through this series of articles : Chris Webb's BI Blog: Optimising The Performance Of Power Query Merges In Power BI, Part 1: Removing...

 

There are better tools for this - for example Alteryx or RapidMiner or Knime.

Many thanks for your contribution.  I will take a look through what you have given me.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors