Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Dan80
Helper II
Helper II

PQ Matching data that doesn’t exactly match!

Hi all,
I have 2 data sets and I need to merge them together based on date and time. However the date and time quite often don’t match. What I need to do is take first table date and time and find a match in second table within a four hour window (2 hours before and 2 hours after). For example 2 April 2019 10:00 needs to look up any matches between 2 April 2019 08:00 and 12:00.

Appreciate any help! Thanks. Dan
1 ACCEPTED SOLUTION

In that case, go through the following example based on your sample data set.

 

Modify it as necessary. But open a "BlankQuery" in power bi desktop, paste the following codes and run it to get some idea.

 

let

//your sample data set of "Seats" table
// Replace the following codes with your actual power query code that fetches the Seats table.

    Seats = #table(
        {"Train","Origin","Date","Time","Seats"},
        {
        {"A", "X",  #date(2018,4,1),#time(10,5,0), 186},
        {"B", "Y",  #date(2018,4,1),#time(11,40,0), 147},
        {"C", "Z",  #date(2018,4,1),#time(13,48,0), 167},
        {"A", "X",  #date(2018,4,1),#time(22,10,0), 191},
        {"B", "Y",  #date(2018,4,1),#time(23,55,0), 146},
        {"C", "Z",  #date(2018,4,1),#time(1,30,0), 170}
        }

    ),

//your sample data set of "Passengers" table.
// Replace the following codes with your actual power query code that fetches the Passengers table.

    Passengers = #table(
        {"Train","Origin","Date","Time","Passengers"},
        {
            {"A", "X",  #date(2018,4,1),#time(10,9,0), 149},
            {"B", "Y",  #date(2018,4,1),#time(11,51,0), 118},
            {"C", "Z",  #date(2018,4,1),#time(13,59,0), 134},
            {"A", "X",  #date(2018,4,1),#time(22,2,0), 153},
            {"B", "Y",  #date(2018,4,2),#time(0,2,0), 117},
            {"C", "Z",  #date(2018,4,1),#time(1,29,0), 136}
        }

    ),

// Add the date field and time field together to form a datetime column for operations

AddDateTimeColumnS = Table.AddColumn(Seats,"DateTimeColumn",each [Date] & [Time],type datetime),

// Reduce 2 hours from the DateTimeColumn added in the previous step as a new column "TimeWindowFrom"

AddTimeWindowFrom = Table.AddColumn(AddDateTimeColumnS,"TimeWindowFrom",each [DateTimeColumn] - #duration(0,2,0,0),type datetime),

// Add 2 hours from the DateTimeColumn added in the previous step as a new column "TimeWindowTo"

AddTimeWindowTo = Table.AddColumn(AddTimeWindowFrom,"TimeWindowTo",each [DateTimeColumn] + #duration(0,2,0,0),type datetime),

// Add the date field and time field together to form a datetime column for operations - To Passengers Table.

AddDateTimeColumnP = Table.AddColumn(Passengers,"DateTimeColumn",each [Date] & [Time],type datetime),

// Custom Function
// This function takes and Train Number, Origin, TimeWindowFrom, TimeWindowTo from Seats table and the entire Passengers table as input.
// It matches the details and returns the number of passengers as the list. 
// IF there are two or more trains matching all the criteria (a rare scenario), it returns the first match from the list

MatchTrain = (p as table,t as text,o as text,dtf as datetime,dtt as datetime) as number => 
    List.First(
        Table.Column(
            Table.SelectRows(p, each [Train]=t and [Origin]=o and dtf <= [DateTimeColumn] and dtt >=[DateTimeColumn]),
            "Passengers")
    ),

// Adds a column "Passengers" to the "Seats Table" after matching the details using the custom function "MatchTrain"

MatchedTable = Table.AddColumn(AddTimeWindowTo,
    "Passengers",
    each MatchTrain(AddDateTimeColumnP,[Train],[Origin],[TimeWindowFrom],[TimeWindowTo])
    ),

// Removes the unnecessary columns.

RemoveUnnecessaryColumns = Table.RemoveColumns(MatchedTable,{"DateTimeColumn","TimeWindowFrom","TimeWindowTo"}),

Output=RemoveUnnecessaryColumns

in
    Output

If you want to see the step by step operations, change the "Output=RemoveUnnecessaryColumns" line to each of the following and execute it. You will be able to see step by step output.

 

Output = Seats

Output = Passengers

Output = AddDateTimeColumnS

Output = AddTimeWindowFrom

Output = AddTimeWindowTo

Output = AddDateTimeColumnP

Output = MatchedTable

Output = RemoveUnnecessaryColumns

View solution in original post

8 REPLIES 8
sreenathv
Solution Sage
Solution Sage

Without looking at a sample of your actual dataset, it's difficult to give a ready to use solution. But I will approach this in the following way.

 

Internally, DAX represents datetime datatype as a floating point number where the integer part corresponds to a day and the decimal part corresponds to the hours, minutes, and seconds. i.e.

 

1 day = 1

1 hour = 1/24

1 minute = 1 / (24*60)

1 second = 1 / (24*60*60)

 

If you have to match your datetime value with a plus or minus 2-hour window, I think you should add two columns as follows..

 

StartTime = <datetime1> - (2/24)

EndTime = <datetime1> + (2/24)

 

But set the data type of StartTime and EndTime to datetime although it does not really matter.

 

Now from the second table, you can match your datetime2 value with the following expression.

 

 

datetime2 >= StartTime AND datetime2 <= EndTime

 

 

If if the result of this expression is true, you will have a match.

 

For every second, within a span of 4 hours window, there are 14400 distinct datetime values possible (4 * 60 * 60). 

In a day there are 86400 distinct datetime values (24*60*60). I think instead of maintaining a table of (14400 * 86400) = 1244160000 records per day for matching, you will be better off with adding the start range and end range columns to your table1 and use the boolean condition to match.

 

Moreover, this method will also handle the datetime window that spans across days. for example 1:00 AM will have the StartTime on the previous day and EndTime on the same day. 

 

Thanks for your responses so far, however, the scenario is a little more complicated and the solution needs to be in PQ and not using DAX.  

 

Example below, there are 2 tables regarding train operations.  One table has the number of seats that each train had and the other the number of passengers on board each train.  Both tables have the following details:

 

1  Train company

2  Origin of train

3  Date of arrival

4  Time of arrival

 

However, the data is captured by 2 different systems and so the time of arrival doesn't match but to get a match we need to use the time (give or take ~ an hour or so) to join the two tables together.  Any suggestions would be really apprecaited!  Thanks,

 

Dan


Capture.PNG

CheenuSing
Community Champion
Community Champion

Hi @Dan80 ,

 

Here is the way I went about

1. The time column being  totally differen in two tables, it is not possible to do merge of the two tables.

2. So based on the value of time in the tables create a column called  Shift.

                06:00 AM to 18:59 PM let us assign  shift  with value 1.

                  And time from 19:00 (07:00 PM) to 05:59 AM let us assign shift with value 2.

3. This shift column is computed for both SeatData and PassengerData table.

4. Now you can merge the SeatDate with PassengerData on Train+Origin+Date+Shift

 

5.  The M-Script for PassengerSeatData

       

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUYoA4pCM0qLilMRKHQUTBa/EvNLEokoFIwNDC6CUoYGVgSWINrFUitWJVnICsiMJaDG0MjUE0xZgLc5AdhQBLcZWpmBbjE3AWohwmJGRlZEBSIupMbEOA3rFCOwuc2LdZWBoZQxxl5lSbCwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Train = _t, Origin = _t, Date = _t, Time = _t, Seats = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Train", type text}, {"Origin", type text}, {"Date", type date}, {"Time", type time}, {"Seats", Int64.Type}}),
    #"Range" = Table.AddColumn(#"Changed Type", "Shift", each if
                               Time.Hour([Time]) >= 06 and Time.Hour([Time]) < 19   then 1     else 2)

   
in
    Range

 

 

6. The M-Script for SeatData

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dY5LDoAgDETv0jUJLZSP3amXUAn3v4alLpFFM9PkZWZagx0cXHpIHtkHpKoPoWAaWjN01+BQf08QCeNQLgad6p8JisKm+YN+60IQsqSN1nUhSrJNnNd1SBItqSD0/gI=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Train = _t, Origin = _t, Date = _t, Time = _t, Seats = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Train", type text}, {"Origin", type text}, {"Date", type date}, {"Time", type time},
                                                   {"Seats", Int64.Type}}),
  #"Range" = Table.AddColumn(#"Changed Type", "Shift", each if

                                                            Time.Hour([Time]) >= 06 and  Time.Hour([Time]) < 19

                                                                     then 1      else 2),
    #"Merged Queries" = Table.NestedJoin(Range, {"Train", "Origin", "Date", "Shift"}, PassengerData, {"Train", "Origin", "Date", "Shift"}, "PassengerData", JoinKind.LeftOuter),
    #"Expanded PassengerData" = Table.ExpandTableColumn(#"Merged Queries", "PassengerData", {"Seats"}, {"PassengerData.Seats"}),
    #"Renamed Columns" = Table.RenameColumns(#"Expanded PassengerData",{{"PassengerData.Seats", "PassengerSeats"}})
in
    #"Renamed Columns"

 

7. After the merging of SeatData with PassengerData you can find the number of empty seats by Train/Origin/Date/TIme(seattable).

 

The only caveat what if a train from an origin operates on a 2 hour shuttle. How to identify such train services and the logic for computed Column -  Shift.

 

Cheers

 

CheenuSing

 

CheenuSing

Did I answer your question? Mark my post as a solution and also give KUDOS !

Proud to be a Datanaut!

In that case, go through the following example based on your sample data set.

 

Modify it as necessary. But open a "BlankQuery" in power bi desktop, paste the following codes and run it to get some idea.

 

let

//your sample data set of "Seats" table
// Replace the following codes with your actual power query code that fetches the Seats table.

    Seats = #table(
        {"Train","Origin","Date","Time","Seats"},
        {
        {"A", "X",  #date(2018,4,1),#time(10,5,0), 186},
        {"B", "Y",  #date(2018,4,1),#time(11,40,0), 147},
        {"C", "Z",  #date(2018,4,1),#time(13,48,0), 167},
        {"A", "X",  #date(2018,4,1),#time(22,10,0), 191},
        {"B", "Y",  #date(2018,4,1),#time(23,55,0), 146},
        {"C", "Z",  #date(2018,4,1),#time(1,30,0), 170}
        }

    ),

//your sample data set of "Passengers" table.
// Replace the following codes with your actual power query code that fetches the Passengers table.

    Passengers = #table(
        {"Train","Origin","Date","Time","Passengers"},
        {
            {"A", "X",  #date(2018,4,1),#time(10,9,0), 149},
            {"B", "Y",  #date(2018,4,1),#time(11,51,0), 118},
            {"C", "Z",  #date(2018,4,1),#time(13,59,0), 134},
            {"A", "X",  #date(2018,4,1),#time(22,2,0), 153},
            {"B", "Y",  #date(2018,4,2),#time(0,2,0), 117},
            {"C", "Z",  #date(2018,4,1),#time(1,29,0), 136}
        }

    ),

// Add the date field and time field together to form a datetime column for operations

AddDateTimeColumnS = Table.AddColumn(Seats,"DateTimeColumn",each [Date] & [Time],type datetime),

// Reduce 2 hours from the DateTimeColumn added in the previous step as a new column "TimeWindowFrom"

AddTimeWindowFrom = Table.AddColumn(AddDateTimeColumnS,"TimeWindowFrom",each [DateTimeColumn] - #duration(0,2,0,0),type datetime),

// Add 2 hours from the DateTimeColumn added in the previous step as a new column "TimeWindowTo"

AddTimeWindowTo = Table.AddColumn(AddTimeWindowFrom,"TimeWindowTo",each [DateTimeColumn] + #duration(0,2,0,0),type datetime),

// Add the date field and time field together to form a datetime column for operations - To Passengers Table.

AddDateTimeColumnP = Table.AddColumn(Passengers,"DateTimeColumn",each [Date] & [Time],type datetime),

// Custom Function
// This function takes and Train Number, Origin, TimeWindowFrom, TimeWindowTo from Seats table and the entire Passengers table as input.
// It matches the details and returns the number of passengers as the list. 
// IF there are two or more trains matching all the criteria (a rare scenario), it returns the first match from the list

MatchTrain = (p as table,t as text,o as text,dtf as datetime,dtt as datetime) as number => 
    List.First(
        Table.Column(
            Table.SelectRows(p, each [Train]=t and [Origin]=o and dtf <= [DateTimeColumn] and dtt >=[DateTimeColumn]),
            "Passengers")
    ),

// Adds a column "Passengers" to the "Seats Table" after matching the details using the custom function "MatchTrain"

MatchedTable = Table.AddColumn(AddTimeWindowTo,
    "Passengers",
    each MatchTrain(AddDateTimeColumnP,[Train],[Origin],[TimeWindowFrom],[TimeWindowTo])
    ),

// Removes the unnecessary columns.

RemoveUnnecessaryColumns = Table.RemoveColumns(MatchedTable,{"DateTimeColumn","TimeWindowFrom","TimeWindowTo"}),

Output=RemoveUnnecessaryColumns

in
    Output

If you want to see the step by step operations, change the "Output=RemoveUnnecessaryColumns" line to each of the following and execute it. You will be able to see step by step output.

 

Output = Seats

Output = Passengers

Output = AddDateTimeColumnS

Output = AddTimeWindowFrom

Output = AddTimeWindowTo

Output = AddDateTimeColumnP

Output = MatchedTable

Output = RemoveUnnecessaryColumns

@sreenathv  you are a genius!  Managed to apply your solution to 'real life' scenario and works a charm (with the help of a colleague who is much smarter than I am!).  Really appreciate your time in solving this for me.

@sreenathv don't follow what you have done but it looks like you have cracked it!  You are far too clever for me, this looks bl00dy impressive.  I will play around with it and see if it works in my 'real life' example but it is looking very promising.  Thanks heaps for your help.

 

BTW, how did you get to know all this stuff?  Is it your job, courses, own time???  Thanks,

 

dan

 

v-juanli-msft
Community Support
Community Support

Hi @Dan80 

I create a test file according to my assumption.

If my understanding is correct or there is any confustion of my solution, please let me know.

7.png

Best Regards
Maggie

 

Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

v-juanli-msft
Community Support
Community Support

Hi @Dan80 

If Table 1 is 

datetime1
2/2/2019 10:00
2/2/2019 13:00

Table 2 is

datetime2
2/2/2019 7:00
2/2/2019 8:00
2/2/2019 9:00
2/2/2019 10:00
2/2/2019 11:00
2/2/2019 0:00
2/2/2019 13:00

Do you want to crossjoin two tables as below

datetime1 datetime2
2/2/2019 10:00 2/2/2019 8:00
2/2/2019 10:00 2/2/2019 9:00
2/2/2019 10:00 2/2/2019 10:00
2/2/2019 10:00 2/2/2019 11:00
2/2/2019 13:00 2/2/2019 11:00
2/2/2019 13:00 2/2/2019 13:00

 

Best Regards
Maggie

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.