Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
This is just an idea out there for discussions...I imagine this kind of function would be incredibly useful for many people out there.
Imagine I have table A, of 10 columns and 100,000 rows, we call it DictionaryTbl. I also have table B, of the same 10 columns but only 10 rows, we call it UpdateTbl. UpdateTbl contains some new values and some existing values when compared to DictionaryTbl. The ID column is the unique key for both tables.
This magical function would check each record in DuplicateTbl against its counterpart in DictionaryTbl. If any column value has changed, update that value in DictionaryTbl.
In my mind, the "normal" (boring) approach would be to:
1. Merge DuplicateTbl to DictionaryTbl, keeping on records of DictionaryTbl and only those matching from DuplicateTbl
2. Check each col from DuplicateTbl against its counterpart in DictionaryTbl. If the value is different, replace DictionaryTbl with the value from DuplicateTbl
3. Remove all DuplicateTbl columns
Of course like most elegant functions, this needs to work with any table names.
For simplicty, let's assume both tables must have exactly the same number of columns, and the columns are in the exact same order. If you have a solution that works with fewer columns in DuplicateTbl, and/or irrespective of column orders, you are a genius.
Anybody even think of attempting something like this? @Mariusz @ImkeF @MarcelBeug
Yes, I have thought about this and would be surprised if there is no elegant way of doing this very common task! Found your post when looking for a solution. See my more detailed question post here:
Overlaying/merging two tables to add missing data in empty cells and new rows/columns
Hope there is this functionality; looking thorugh huge tables for compare would be a bummer...
Hi @freelensia & @thowa ,
not sure about the performance on 100s of 1000s rows - Table.Group may be a bit heavy on large datasets, - but this is a function to do what you want (using thowa's data as an example):
(Dictionary as table, Update as table, Attributes as list)=>
let
AttributeFields = Attributes,
ValueFields = List.RemoveItems(Table.ColumnNames(Source), AttributeFields),
ValueFieldFunction = List.Accumulate(ValueFields, {}, (a, n) => a & {{n, (x)=>List.First(List.RemoveNulls(Table.Column(x, n)))}}),
Source = Table.Combine({Update, Dictionary}),
#"Grouped Rows" = Table.Group(Source, AttributeFields, ValueFieldFunction)
in
#"Grouped Rows"
Update parameter is the one that takes precedent on changed data (i.e. assumed table 1 in thowa's example).
This relies on the fact that you can direct PQ to put one table "on top" of another when appending.
Attributes parameter takes the "group by" list - a list of fields that you want to match by.
The output needs to the post-processed as the function removes types and shuffle the columns.
Kind regards,
JB
Works fine!
An other alternative approach was posted here:
Overlaying/merging two tables to add missing data in empty cells and new rows/columns
By @camargos88
Thanks to both oof you!
(Deleted prvious response as I noticed that I made an error when trying to apply)
Thanks for the suggestion! Lean piece of code, but I have to admit that I have not yet fully understood it. Will try when I find the time.
Really surprised that there is no stock-function for this tasks - it seems quite common.
When traqying to apply your solution, I don't seem to get the planned result though - same as a Table.Combine except that the column sequence has changed.
I also notice that it only works with fables fully converted to text, else I get "cannot convert to text" errors.
Attributed input must be a list of column names, right (so a list from 1-5 in my eample case)?
I may be doing something wrong, though...
Thomas
Hi Thomas,
Sorry, was not clear in my post - thee attributes list assumes "group by" columns, in your case [Date]. 1-5 are "value" columns.
If you do it this way it should work Ok, in my test 1-5 has "number-type" values, everything works fine. In fact, it doe not matter what type of data is in these columns, it will be removed in any case (a code for adding the formatting back is quite simple anyway). You can even mix formats in the "old" and "new" data, in this case a number can be replaced by a text and vice versa.
This is my code for using the function (assuming it is called fUpdate😞
let
Source = fUpdateTable(DictionaryTbl, UpdateTbl, {"Date"})
in
Source
Kind regards,
John
Sorry to jump on a 2yo thread, but for noobs hoping to reuse this code, can you explain where to add this function? Which string refers to the tables? Which string refers to the unique column? Also: will this work as-is for merging a single table with pre-existing dupe records against itself?
@Anonymous
Important to note: it ONLY works with this trigger code.
I used the interface and Power Query will make me chose a column from one of the Tables.
As a reasult the list of dates is handed over as Attributes parameter instead of the column name ("Date").
With that in mind: works great!
Have posted a summary of the two solutions proposed in form of functions here:
Overlaying/merging two tables to add missing data in empty cells and new rows/columns
@Anonymous
THANKS!
One table had 2020 and the other 2000 as the years...so obviously they wouldn't combine!
...I noticed that after banging my head against the wal all day!
Thanks so much for following up. Great Code piece.
Thomas
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.