Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Mathieu_W
New Member

Merge with Fuzzy algo with transformation table and exact match

Dear all, 

 

I currently have a problem with merging database with Fuzzy and Transformation table. I have 2 databases : one is from internal, ans is a database on transport rides to stores. The second one is from an external app, with exact data on arriving time at the store. I rounded the arrival time in a value format, so that it can be used as

Internal : 

StoreTrailer-numberCalculated arrival time
F1234C1221/10/2022 13:51:00
F1235C1222/10/2022 14:12:00
F1234C2123/10/2022 14:22:00

 

External :

StoreTrailer-numberArrival time
F1234C1221/10/2022 14:18:22
F1235C1222/10/2022 14:21:58
F1234C2123/10/2022 15:12:31

 

I used the Fuzy Merge so that it can linkt the 2 tables, when calculated arrival time and arrival time is not the same. But with the Fussy Merge, it wil soms also link lines that cannot be linked, for example line 1 from table 1, and line 3 from table 2 because the store is the same, trailer-number is almost the same, en time is almost the same. 

 

For now, I made a Transformation table like that, and I used a similarity threshold of 0,98. And I still get this problem, where is sometimes link a ride of yesterday with another 3 weeks ago because it is the same store and the same trailer-number.

ToFrom
4520945208,9
4520945209,1
45209,145209
45209,145209,2

 

Is there a way to say that Store and trailer-number has to be axactly the same, and arrival time can only differ for max x hours (2h of 3h, something like that) ? 

 

Already thank you for reading me.

Regards, 
Mathieu

1 ACCEPTED SOLUTION

let
    Source = Table.NestedJoin(Internal, {"Store", "Trailer-nr"}, External, {"Store-nr", "Trailer-nr"}, "External", JoinKind.LeftOuter),
    #"Expanded External" = Table.ExpandTableColumn(Source, "External", {"Arrival date","Arrival time"}),
    #"Added Custom" = Table.AddColumn(#"Expanded External", "Delta", each Number.Abs(Number.From([Calculated arrival])-Number.From([Arrival date] & [Arrival time]))),
    #"Changed Type" = Table.TransformColumnTypes(#"Added Custom",{{"Delta", type duration}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Store", "Trailer-nr", "Calculated arrival"}, {{"Min Delta", each List.Min([Delta]), type nullable duration}, {"Rows", each _, type table [KEY=nullable text, Store=nullable number, #"Trailer-nr"=nullable text, Calculated arrival=nullable datetime, Arrival date=nullable date, Arrival time=nullable time, Delta=nullable duration]}}),
    #"Expanded Rows" = Table.ExpandTableColumn(#"Grouped Rows", "Rows", {"Arrival date", "Arrival time", "Delta", "KEY"}, {"Arrival date", "Arrival time", "Delta", "KEY"}),
    #"Added Custom1" = Table.AddColumn(#"Expanded Rows", "Best Match", each [Delta]=[Min Delta]),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Best Match] = true))
in
    #"Filtered Rows"

see attached. "Best Match" :  arrival timestamp that is closest to the calculated arrival.

View solution in original post

7 REPLIES 7
lbendlin
Super User
Super User

Yes, and you don't need a fuzzy merge for that.  You can use a custom column generator, or more simply an additional column.

lbendlin_0-1697065322567.png

let
    Source = Table.NestedJoin(Internal, {"Store", "Trailer-number"}, External, {"Store", "Trailer-number"}, "External", JoinKind.LeftOuter),
    #"Expanded External" = Table.ExpandTableColumn(Source, "External", {"Arrival time"}, {"Arrival time"}),
    #"Added Custom" = Table.AddColumn(#"Expanded External", "Delta", each Number.Abs(Number.From([Calculated arrival time])-Number.From([Arrival time]))),
    #"Changed Type" = Table.TransformColumnTypes(#"Added Custom",{{"Delta", type duration}})
in
    #"Changed Type"

 

 

Hi, Thanks for the reply. Maybe it wasn't clear, but the a trailer can go to multiple stores, and multiples trailes can go in store. And a trailer can go multiple times in a store, on difeerent days, or self on the same day. The store-trailer combination is not unique. This is where it is complicated, I have to use Store, trailer number, but also the time to make my merge. But the time is not exactly the same. 

 

Even working with days does nog work, calculated time can be monday 23h30, but arrival is tuesday at 00h15. 

 

This is the reason I used Fuzzy Merge, so that it could link those 2 moments in my table.

 

Regards,

Mathieu

Hi, @Mathieu_W just an idea:

 - add some column (say, "Source") to your tables with "Internal" and "External" values in each

 - rename time in both tables to, say, "Arrival"

 - combine tables in PQ so that all data are in the same table like this: 

st.jpg

 

 

 

 

 

then sorting and Table.Group instead of merge using this code: 

let
    sorting = Table.Sort(combined_table,{{"Trailer-number", Order.Ascending}, {"Store", Order.Ascending}, {"Arrival", Order.Ascending}}),
    fx_pivot = (tbl) => Table.Pivot(tbl, {"Internal", "External"}, "Source", "Arrival"),
    group = 
        Table.Group(
            sorting, Table.ColumnNames(sorting), {"all", fx_pivot}, GroupKind.Local,
            (s, c) => 
                Number.From(
                    List.AnyTrue(
                        {s[Source] = c[Source],
                        s[Store] <> c[Store],
                        s[#"Trailer-number"] <> s[#"Trailer-number"], 
                        Duration.From(
                            Number.Abs(Number.From(s[Arrival]) - 
                            Number.From(c[Arrival]))
                        ) > #duration(0, 3, 0, 0)}
                    )
                )

        ),
    z = Table.Combine(group[all])
in
    z

I applied this to your test data and ended up with the following: 

fnl.jpg

 

 

 

Please provide sample data that fully covers your issue.
Please show the expected outcome based on the sample data you provided.

Hi Ibendlin,

 

I made a data sample available. You have 2 sheets : internal and external with the 2 databases that I would like to link, where I highlighted 2 lines that should be linked to each other. And on sheet 'output', you have an example of the result I would like to have with the 2 lines highlighted from the other sheets. 

 

https://docs.google.com/spreadsheets/d/1o6Atn5Yt6LV_kF9E5MNNg_SprHI-Dfzv/edit?usp=drive_link&ouid=11...

 

Many thanks in advance,
Regards, 

Mathieu

let
    Source = Table.NestedJoin(Internal, {"Store", "Trailer-nr"}, External, {"Store-nr", "Trailer-nr"}, "External", JoinKind.LeftOuter),
    #"Expanded External" = Table.ExpandTableColumn(Source, "External", {"Arrival date","Arrival time"}),
    #"Added Custom" = Table.AddColumn(#"Expanded External", "Delta", each Number.Abs(Number.From([Calculated arrival])-Number.From([Arrival date] & [Arrival time]))),
    #"Changed Type" = Table.TransformColumnTypes(#"Added Custom",{{"Delta", type duration}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Store", "Trailer-nr", "Calculated arrival"}, {{"Min Delta", each List.Min([Delta]), type nullable duration}, {"Rows", each _, type table [KEY=nullable text, Store=nullable number, #"Trailer-nr"=nullable text, Calculated arrival=nullable datetime, Arrival date=nullable date, Arrival time=nullable time, Delta=nullable duration]}}),
    #"Expanded Rows" = Table.ExpandTableColumn(#"Grouped Rows", "Rows", {"Arrival date", "Arrival time", "Delta", "KEY"}, {"Arrival date", "Arrival time", "Delta", "KEY"}),
    #"Added Custom1" = Table.AddColumn(#"Expanded Rows", "Best Match", each [Delta]=[Min Delta]),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Best Match] = true))
in
    #"Filtered Rows"

see attached. "Best Match" :  arrival timestamp that is closest to the calculated arrival.

Ibendlin, 

 

It worked perfectly, thank you. It helped me so much. I think that I wen too far with my fuzzy match. 

 

For those reading, correct my if I am wrong, but what you did was : 

1) Link the 2 tables together, taking all occurences from 'External' linked to an occurence in 'Internal'

2) Calculate the difference between calculated arrival and actual arrival time

3) Take as correct match the minimum between the calculated and actual arrivals times

 

Regards, 

Mathieu

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors