cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
JeffWeir Regular Visitor
Regular Visitor

Multilple recursive Inner and Anti Joins for robust fuzzy date matching - Can List.Accumulate help?

I’m trying to reconcile data from two different entities regarding movement of shipping containers, by doing a bulletproof fuzzy match on dates.  One is a shipping port, the other is a railway.

 

Here's a simplified sample file illustrating what I'm trying to do:

Match to closest day_20190722 buffer Revised.xlsx

 

I work at the port, and I want to ensure that the railway are charging us for the correct amout of containers they are moving to/from the port. But the dates in the two systems don’t always match for the reasons outlined below:

  • Containers have a unique ID and a direction of travel common to both systems…either RAIL IN or RAIL OUT.
  • The Port system records when containers enter or leave the port, on their way to/from the Rail depot.
  • The Rail system records when the containers are actually put on a train and railed somewhere. (Either inwards, or outwards as containers are sent to customers who load or unload them as the case may be, and then they go to the port so they can be loaded onto ships (if full) or stored at the port (if empty).
  • The containers might sit at the rail depot for minutes, days, or weeks. And they might make multiple outward or inward trips in a matter of days. (e.g. out, then in, then out, then in again).

I need to do a "Fuzzy Date match" in order to reconcile one system to the other, that doesn’t inadvertently double count when containers are moved rapidly back and forth between the port and the customer multiple times. So I need to progressively increase the 'mismatch' tolerence on those dates, and remove matches at each pass leaving just the unmatched rows to do increasingly desperate matches on. (If I don't remove the matches at each subsequent pass, then inevitably some container movements get incorrectly matched to their next similar move).

 

I have a hard-coded query that does this works fine, but it is cumbersome. I wonder if it is possible to use something like List.Accumulate to make the query dynamic. But I have 2 seed tables, and need three output tables, so it might be a pipe dream.

 

Here's a screenshot of sample input and output. I've color coded the matching records from each table to the 'Matched Records' table below. (Edit: I've revised this as per reply to @ImkeF below)

 

Revised output.PNG

 

My manual query performs the following steps:

  1. Loads data from Rail and from Port, and does an inner join.
  2. Effectively removes these matches from the original Rail and Port tables via an anti-join against each table.
  3. Does an inner join on the remaining records, but this time offsets the dates in the Port table by +/- 1 day. This finds some more matches.
  4. Performs steps 2 and 3 over and over, increasing the date offset by 1 each time until I have captured all date mismatches up to a week.

Here’s how that looks in the query dependency view:

Dependencies.PNG

 

And here's how I do the 'Fuzzy Date Match": At each 'pass', my Day X Matched step takes the unmatched records from the two tables at the end of the previous Day X Matched step, and creates a 'Date Offset' column using  this general approach:

= Table.AddColumn(Source, "Table2.Date", each List.Dates(Date.AddDays([Date],-X),2,#duration(2X,0,0,0)))

 

That X bit successively pads out the date in one table so that it will match dates 1, 2, ..., X days before or after.

 

All this requires a lot of different Queries. Here's the amount of queries required just to handle 5 days' worth of date offsets:

Queries.PNG

 

What I would like to do is use List.Accumulate or similar to make this dynamic. Here’s how that would look:

Recursion.PNG

 

Uwe from the data-insights blog has a fantastic 3 part series on recursion in PQ, including great references to the usual suspects (Chris Webb, @ImkeF , and others). This shows some great uses of List.Accumulate and List.Generate. But working out whether these functions are flexible enough to do the above is beyond me.

 

Anybody feeling brave enough to take a stab, or even give me their thoughts as to whether this is possible?

Match to closest day_20190722 buffer Revised.xlsx

 

9 REPLIES 9
ImkeF
MVP

Re: Multilple recursive Inner and Anti Joins for robust fuzzy date matching - Can List.Accumulate he

I think a "simpler" group and sort would do the job here as well: 

1) Group by "ID" and "Direction"

2) Sort by "Date" and add Index column

3) Merge on ID, Direction and new Index column in FullOuter-mode: Matches and non-matches will be shown

 

See attached file

 

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries




JeffWeir Regular Visitor
Regular Visitor

Re: Multilple recursive Inner and Anti Joins for robust fuzzy date matching - Can List.Accumulate he

Hi @ImkeF 

 

Unfortunately that approach isn’t robust under all circumstances, as it will always match an entry in one table with a previous entry in the other table even if a closer match is available. And this is a very real possibility in this dataset.

 

Revised sample file and screenshot: Match to closest day_20190722 buffer Imke.xlsx

 

Capture.PNG

ImkeF
MVP

Re: Multilple recursive Inner and Anti Joins for robust fuzzy date matching - Can List.Accumulate he

Hi @JeffWeir ,

sorry, I lost track of this thread a bit. 

Is this still an issue?

 

Cheers, Imke

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries




JeffWeir Regular Visitor
Regular Visitor

Re: Multilple recursive Inner and Anti Joins for robust fuzzy date matching - Can List.Accumulate he

My hard-coded query works fine, but I'm trying to find the time to see if I can learn how to write a dynamic function where I can specify the number of days to check either side.  At the moment I progressively  match the tables on larger values of a 'tolerance' factor X using this condition:

Table1.Date = Table2.Date +/- X . 

 

And I'm effectiviely running this in a loop for values of X between 1 to 5, and removing successful matches on each pass leaving just unsuccessful ones. This lets me progressively increase the 'mismatch' tolerence on those dates, and remove matches at each pass leaving just the unmatched rows to do increasingly desperate matches on.

 

My  query performs the following steps:

  1. Loads data from Table1 and Table2, and does an inner join on unique ID and Date
  2. Effectively removes these matches from Table1 and Table2 via an anti-join against each referenced table.
  3. Does an inner join on the remaining records, but this time offsets the dates in the Port table by +/- 1 day. This finds some more matches.
  4. Performs steps 2 and 3 over and over, increasing the date offset by 1 each time until I have captured all date mismatches up to a week.

 

The query dependency tree that results looks like so:

 

Dependencies.PNG

 

 

All that takes a lot of setup. I really need to work out if this can be turned into a function that is a) dynamic and b) efficient.

 

I still mean to take a crack at it, but it might be beyond my beginning/intermediate M.

 

For what it's worth, here's my current manual approach.

 

Match to closest day_20190722 buffer3 Revised.xlsx

ImkeF
MVP

Re: Multilple recursive Inner and Anti Joins for robust fuzzy date matching - Can List.Accumulate he

Hi @JeffWeir

although a dynamic recursive approach (using List.Accumulate or List.Generate) would work here, that would still be a considerable amount of code (and work for me) and I'm not sure if it would perform faster than the following approach:

 

let
    Source = Table1,
    AllowedRange = Table.AddColumn(Source, "Days", each {-5..5}),
    ExpandAllowedRange = Table.ExpandListColumn(AllowedRange, "Days"),
    DaysAbsoluteFigures = Table.AddColumn(ExpandAllowedRange, "AbsDays", each Number.Abs([Days])),
    AllowedDates = Table.AddColumn(DaysAbsoluteFigures, "Dates", each Date.AddDays([Date], [Days])),
    #"Changed Type" = Table.TransformColumnTypes(AllowedDates,{{"Dates", type date}}),
    MergeTable2 = Table.NestedJoin(#"Changed Type", {"Dates", "ID", "Direction"}, Table2, {"Date", "ID", "Direction"}, "Table2", JoinKind.Inner),
    Cleanup = Table.RemoveColumns(MergeTable2,{"Table2", "Dates"}),
    #"Sorted Rows" = Table.Buffer(Table.Sort(Cleanup,{{"AbsDays", Order.Ascending}})),
    #"Removed Duplicates" = Table.Distinct(#"Sorted Rows", {"Date", "ID", "Direction"})
in
    #"Removed Duplicates"

This produces the "Matched Records"-table. Just do some Anti-Joins to retrieve the unmatched tables from there.

If performance is too bad, please come back.

 

 

 

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries




JeffWeir Regular Visitor
Regular Visitor

Re: Multilple recursive Inner and Anti Joins for robust fuzzy date matching - Can List.Accumulate he

@ImkeFThat Sorted Rows > Remove Duplicates step is clever.

 

I'll tweak my code to use this approach and see if there is a performance improvement. I still need to do lots of different passes, because i want this to be a 'progressive' match, so it may not simplify the query chain overly. But it will be interesting to see if there's any performance difference. Thanks 🙂

ImkeF
MVP

Re: Multilple recursive Inner and Anti Joins for robust fuzzy date matching - Can List.Accumulate he

Hi @JeffWeir 

this apprach was supposed to replace your whole chain. I've tweaked to code a little bit to come up with the correct Table2, but now it delivers all you need. No need at all to go through all these iterations. (results are shown in row 42 -57 of  Sheet1)

 

Please check the results in the attached file. 

 

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries




ImkeF
MVP

Re: Multilple recursive Inner and Anti Joins for robust fuzzy date matching - Can List.Accumulate he

Hi @JeffWeir 

have you had a chance to test this out on your real data?

I'd be interested to hear how this scales on a large dataset.

 

Please mark my answer as solution if it solved your case - thanks.

 

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries




JeffWeir Regular Visitor
Regular Visitor

Re: Multilple recursive Inner and Anti Joins for robust fuzzy date matching - Can List.Accumulate he

Sorry @ImkeF I've been away. I'll take a look later this week and post back.

Helpful resources

Announcements
New Kudos Received Badges Coming

New Kudos Received Badges Coming

Kudos to you if you earned one of these! Check your inbox for a notification.

Microsoft Implementation for Communities Wins Award

Microsoft Implementation for Communities Wins Award

Learn about the award-winning innovation that was implemented across Microsoft’s Business Applications Communities.

Power Platform World Tour

Power Platform World Tour

Find out where you can attend!

Top Kudoed Authors (Last 30 Days)
Users online (2,966)