Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Merge and update History table from Update table having new or updated values

I have 2 tables viz., a History table that should get updated from an Update table whenever there are new or updated row values coming in the Update table.

History table:

image.png

Please note: the History table has extra Manual1 & Manual2 columns that may contain user updated values.

 

Update table:

sifar786_2-1638643192103.png

The Update table may not have those columns, but only the standard columns from `Product Key` till `Out of Stock` column.

 

- ORANGE rows of History table are present in Update table, but have updated values. 

- YELLOW row of History table has same values in Update table.

- GREEN rows of History table are not present in Update table.

- BLUE row of Update table is not present in History table, i.e. it is a New row coming in data.

 

What i need is,

- ORANGE row values from Update table, get updated into History table, but their Manual entry column values remain unchanged.

- YELLOW values remain unchanged. Also their Manual entry columns remain unchanged.

- GREEN values remain unchanged, as they are not present in Update table.

-  History table gets new BLUE row values added to it from Update table, with blanks for Manual entry columns.

sifar786_3-1638643886504.png

I have tried various solutions, but no joy for some of the reasons listed below:

- History table has more columns than Update table, so exact matching of row values (using concatenation) is not possible.

- Merging with Left.AntiJoin gives only the new or updated values from Update table and removes other rows.

- How to update rows with new or updated data without affecting the existing values in the Manual entry columns of History table.

SAMPLE Workbook

Can anyone help urgently with this frustrating problem of updating a History table from another table with new or updated data, where their number of columns are different?

@AlexisOlson @ImkeF @edhans @KNP @BA_Pete @Anonymous @thowa 

1 ACCEPTED SOLUTION
AlexisOlson
Super User
Super User

I'd recommend checking out these relevant resources for the self-referencing logic:

Self Referencing Tables in Power Query

How can I add a new column to the workbook so data can be entered in addition to the results loaded ...

Inserting text manually in a custom column and should be visible on refresh of the report

 

To do the update, I appended the tables together, grouped by ProductKey taking the latest version of the data, expanded the non-manual columns, and then merged back in the manual columns from the beginning of the query.

 

Please see the attached.

View solution in original post

10 REPLIES 10
Anonymous
Not applicable

I would probably go with a full join. Then add new columns for each column in the History table: if a value in the column of the Update table is null take History, otherwise Update. Then remove old columns and rename the new ones to match the original History table columns.

 

Absolutely inelegant and inflexible, but reasonably straightforward and quick to implement and understand.

 

Kind regards,

JB

AlexisOlson
Super User
Super User

I'd recommend checking out these relevant resources for the self-referencing logic:

Self Referencing Tables in Power Query

How can I add a new column to the workbook so data can be entered in addition to the results loaded ...

Inserting text manually in a custom column and should be visible on refresh of the report

 

To do the update, I appended the tables together, grouped by ProductKey taking the latest version of the data, expanded the non-manual columns, and then merged back in the manual columns from the beginning of the query.

 

Please see the attached.

Anonymous
Not applicable

Hi @AlexisOlson ,

This is so consice, but brilliant! 🙂

Can you help me with my below queries?

On opening Query editor, Updateable Table gives the following error on this line:

 

Formula.Firewall: Query 'UpdateableTable' (step 'AppendedQuery') references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.
AppendedQuery = Table.Combine({AddedVersion, NewData})

 

 

- I see you have joined ProductKey column with Product column in step "Grouped Rows".

 

#"Grouped Rows" = Table.Group(#"Appended Query", {"ProductKey", "Product"}, {{"All Rows", each Table.Max(_, "Version"), type record}})

 

 

In real data, the Product column may not be unique i.e. ProductKeys may have same Product names. How do we handle that? Is it possible to only use the ProductKey which is unique?

- Is there a way to make the column names selection dynamic, rather than hard-coded? There could be many standard columns in UpdateableTable.

 

#"Expanded All Rows" = Table.ExpandRecordColumn(#"Grouped Rows", "All Rows", {"Status", "Date", "Forecast", "Out of stock"}, {"Status", "Date", "Forecast", "Out of stock"})

 

 

- Can Manual entry column names in last step be made dynamic?

 

Expanded Expanded All Rows = Table.ExpandTableColumn(#"Merged Queries", "Expanded All Rows", {"Manual1", "Manual2"}, {"Manual1", "Manual2"})

 

 

 Best,

Sifar

Read this for info on Formula.Firewall:

https://www.thepoweruser.com/2019/03/12/data-privacy-and-the-formula-firewall/

(The simplest solution is the 'Always ignore Privacy Level settings' under Query Options > Privacy.)

 

You can use whatever column or combination of columns you'd like to define what you consider unique. You can certainly drop the Product column. and only group over ProductKey.

 

Here's a bit more dynamic version of the query where I've added steps to generate lists of column names.

let
    Source = Excel.CurrentWorkbook(){[Name="UpdateableTable"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ProductKey", Int64.Type}, {"Product", type text}, {"Status", type text}, {"Date", type date}, {"Forecast", Int64.Type}, {"Out of stock", type text}, {"Manual1", type text}, {"Manual2", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Version", each 0, Int64.Type),
    #"Appended Query" = Table.Combine({#"Added Custom", NewData}),
    #"Grouped Rows" = Table.Group(#"Appended Query", {"ProductKey"}, {{"All Rows", each Table.Max(_, "Version"), type record}}),
    ColumnNames = List.Select(Table.ColumnNames(NewData), each _ <> "ProductKey"),
    #"Expanded All Rows" = Table.ExpandRecordColumn(#"Grouped Rows", "All Rows", ColumnNames, ColumnNames),
    #"Merged Queries" = Table.NestedJoin(#"Expanded All Rows", {"ProductKey"}, #"Changed Type", {"ProductKey"}, "Expanded All Rows", JoinKind.LeftOuter),
    ManualColumns = List.Difference(Table.ColumnNames(#"Changed Type"), Table.ColumnNames(NewData)),
    #"Expanded Expanded All Rows" = Table.ExpandTableColumn(#"Merged Queries", "Expanded All Rows", ManualColumns, ManualColumns)
in
    #"Expanded Expanded All Rows"
Anonymous
Not applicable

Thanks @AlexisOlson 🙂

- I used a similar method to create a list query that generates a list of column names from the Updateable table and then use List.Skip() to skip the ProductKey column and List.Difference().

- One question: Is it possible to make the column datatype transforms more dynamic for this line?

#"Changed Type" = Table.TransformColumnTypes(Source,{{"ProductKey", Int64.Type}, {"Product", type text}, {"Status", type text}, {"Date", type date}, {"Forecast", Int64.Type}, {"Out of stock", type text}, {"Manual1", type text}, {"Manual2", type text}}),

I tried Table.Schema, and then used the "Name" & "TypeName" but couldnt wrap my head around how to get it in the same format.

SelectColumns = Table.SelectColumns( Table.Schema( ColTypes ),{"Name","TypeName"},MissingField.UseNull),

where "ColTypes" is the Table.TransformColumnTypes().

 

P.S: Strange enough, when the query picks up the data types from the Excel Updateable table, most of the datatypes show up as "ABC 123" - though i have formatted the each of the table columns properly. Why is the query not able to recognize the data types from the table?

I don't think you can preserve types when it's loading a table from a spreadsheet, so if you want types, you'll have to have a step for that. However, you could just delete that step entirely and the query should still work.

 

P.S. I think I should have excluded "Version" from ColumnNames in my query.

Anonymous
Not applicable

I think with the merge the versions column wasn't present.

Anonymous
Not applicable

@AlexisOlson ,

 

One question:

Incase there are already duplicate rows in the Updateable table, this does not seem to work!

for e.g in my real data, there are duplicate ProductKey rows where one row may have the manual columns filled in, and the other row may have nulls (blanks).

 

How to handle this situation in the `Updateable` table in the initial stage itself, before we combine it with `NewData` table?

Yeah, get rid of rows with blanks before combining. Maybe by doing a group by and taking the max over the manual columns?

Anonymous
Not applicable

@AlexisOlson just too many Manual columns. 😞
But i cleaned the data of duplicates and in m-code also used `RemovedDuplicates` over ProductKey column. 
Not sure if i understood `Groupby and taking a Max over the Manual columns`...

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors