Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Measuring Distance to find closest point between points from a single table

Hi All, 

 

Hoping someone may know how to include a FILTER to remove itself as a possible answer. 

 

For the example i am trying to find the closest neighbour to a house in a single table, i have the full addresses including Longitude and Lattitude. 

 

I am trying to find the next closest point to itself using the Haversine formula but excluding itself ultimely, but really struggling to remove its own coordinates as a possible answer and returning the closest point as 0 miles. 

 

So at the moment i have one working formula that shows the closest cinema but this is using two tables so i have the House points table and the Cinema Address table and it finds the closest Cinema for examples sake. Using the Formula below; 

 

Distance to Closest Cinema (Miles) =
VAR HouseLatitude = 'Route Information'[Lattitude]
VAR HouseLongitude = 'Route Information'[Longitude]
VAR EarthCircumference = 3959
VAR P =
DIVIDE ( PI (), 180 )
Return
MINX('Cinema Locations',

VAR CinemaLatitude = 'Cinema Locations'[Lattitude]
VAR CinemaLongitude = 'Cinema Locations'[Longitude]

return

ACOS(
SIN(HouseLatitude * P )*SIN(CinemaLatitude * P )
+COS(HouseLatitude * P )*COS(CinemaLatitude * P )
*COS((CinemaLongitude * P)-(HouseLongitude * P )))
 
* EarthCircumference
)

 

I now need to calculate the closest neighbour to the house addresses. 

 

So i created a duplicate of the Route Information table so i have two tables containing the same information, then run the exact formula above but removing the Cinema Locations table for the Route Information duplicate table. 

 

This then returns 0 miles as the closest distance as itself is the closest distance. 

 

So how do i filter out itself as a possible answer? 

 

Something like IF(AND(lattitude = lattitude, longitude = longitude), move to next closest would be the best way just no idea how to achieve that so hoping someone far more versed with DAX can help me out. 

 

Thanks, 

 

Josh.  

 

 

1 ACCEPTED SOLUTION

OK, I took @AlBs code and @Anonymouss last post and turned it into a measure.  First, I recreated 'Route Information' to have 5 dates' worth of data, one date of which is missing 3 of the locations (Dt is literally just a list of 5 dates 1/14 - 1/18):

 

Route Information =
UNION (
    CROSSJOIN (
        SUMMARIZE (
            'Table2 (2)',
            'Table2 (2)'[MEMNO],
            'Table2 (2)'[Post Code],
            'Table2 (2)'[Lattitude],
            'Table2 (2)'[Longitude]
        ),
        FILTER ( Dt, Dt[Date] < "1/18/18" )
    ),
    CROSSJOIN (
        SUMMARIZE (
            FILTER (
                'Table2 (2)',
                NOT ( 'Table2 (2)'[MEMNO] IN { "A21200", "A22701", "B10500" } )
            ),
            'Table2 (2)'[MEMNO],
            'Table2 (2)'[Post Code],
            'Table2 (2)'[Lattitude],
            'Table2 (2)'[Longitude]
        ),
        FILTER ( Dt, Dt[Date] = "1/18/18" )
    )
)

Then taking the working code, transformed it into a measure, keeping the Date filter (I'm probably not doing it the most efficient way using variables - I still have trouble getting ALLEXCEPT and KEEPFILTER working without trial and error):

 

DCH (Miles) =
VAR HouseLatitude = MIN ( 'Route Information'[Lattitude] )
VAR HouseLongitude = MIN ( 'Route Information'[Longitude] )
VAR EarthCircumference = 3959
VAR P = DIVIDE ( PI (), 180 )
VAR House = SELECTEDVALUE ( 'Route Information'[MEMNO] )
VAR __Dt = SELECTEDVALUE ( 'Route Information'[Date] )
RETURN
    MINX (
        FILTER (
            ALL ( 'Route Information' ),
            'Route Information'[MEMNO] <> House
                && 'Route Information'[Date] = __Dt
        ),
        VAR CinemaLatitude = 'Route Information'[Lattitude]
        VAR CinemaLongitude = 'Route Information'[Longitude]
        VAR _DistanceFromCurrentRough =
            80 * SQRT ( POWER ( ( HouseLatitude - CinemaLatitude ), 2 ) +  POWER ( ( HouseLongitude - CinemaLongitude ), 2 )
                )
        RETURN
            IF ( _DistanceFromCurrentRough <> 0, _DistanceFromCurrentRough )
    )

As you can see, on 1/18/18, the distance is different for M49418

 

miles.png

 

 

View solution in original post

37 REPLIES 37
Anonymous
Not applicable

Give me 20 minutes and ill have the smaller file size. 

 

Thanks, 

 

Josh

Anonymous
Not applicable

Hi @AlB @dedelman_clng

 

This link should work now think its 60kb now. 

 

Available here.

 

The v2 file.  

 

Thanks, 

 

Josh

 

or here;

http://www.filedropper.com/closestdestinationsamplev2 

Anonymous
Not applicable

@AlB @dedelman_clng

 

Thinking onwards i may need to filter these results at a later point aswell based on a DATE field so ALL() might not be the best and//or using a measure might be better. 

 

So with the date column we could say on 22/01 if there anyone traveling with a neighbor greater than 20 miles away. 

With the ALL() in the formual would that egnore the visual filter to only show house refferences travelling today as an exmaple? 

 

Or would the visual slicer just limited the results and base the calculation on the data thats relevant? 

 

Thanks, 

 

Josh

If the nearest point is to be dynamic, it will have to be a measure. You would still want to use ALL, but then would have to put filters back onto the data. I'll take that into consideration when I start looking at your sample file.

@Anonymous, @dedelman_clng

 

I'm having a look at the file and with my approach I do get that error that @Anonymous was mentioning:

"An argument of function 'ACOS' has the wrong data type or the result is too large or too small."  

 

That's a bit weird. I guess the type is not the problem here and ACOS should throw that error if the argument is outside [-1,1] which is the valid input range for the function. I checked the values that are passed to ACOS and all fall within that range. There are some 1s so I thought it might be that due to rounding those ones might be actually 1+1e-N with N=10 or greater. I've checked up to the 15th decimal and they are all zeros. Plus from a couple of tests I've run ACOS ignores the decimals further than the 15th.

So I'm a bit puzzled as to why this error is generated. I think the approach should work otherwise

     

Anonymous
Not applicable

@AlB @dedelman_clng

 

If we had a duplicate Route Information table, or a distinct values only table taking it back slightly and to emulate the working scenario with the Cinema, is there a way to filter the duplicate table to remove the same house coordinates before the formula is ran based on the MEMNO? 

 

Would that work, then the answer cant be 0 as the address would be removed as an option per MEMNO? 

 

Then just execute the same formula as the cinema one? 

 

Thanks, 

 

Josh

@Anonymous

 

I think my approach solves the issue of eliminating the zero distance to the the base house itself. The problem lies with the distance calculation formula. I did a test. Instead of using the haversine formula, which is raising the error, I just used a simple approximation based on the Euclidean distance between two points and adjusting for the approximate size in km of a degree of Lat/Lon. A rough approximation probably but that is beside the point. I just wanted to see if the approach works. It seems to. Here's the calculated column:

 

 

Distance to Closest House (Miles)3 = 
VAR HouseLatitude = 'Route Information'[Lattitude]
VAR HouseLongitude = 'Route Information'[Longitude]
VAR EarthCircumference = 3959
VAR P =  DIVIDE (PI (); 180 )
RETURN
    MINX (
        'Route Information';
        VAR CinemaLatitude = 'Route Information'[Lattitude]
        VAR CinemaLongitude = 'Route Information'[Longitude]
        VAR _DistanceFromCurrentRough = 80*SQRT(POWER((HouseLatitude-CinemaLatitude);2) + POWER((HouseLongitude-CinemaLongitude);2))
        RETURN
            IF (
                _DistanceFromCurrentRough <> 0;
                _DistanceFromCurrentRough
            )
    )

This provides results that look meaningful.

 

On top of that, I tried a variation that does not exclude the zero distances obtained:

 

Distance to Closest House (Miles)3 = 
VAR HouseLatitude = 'Route Information'[Lattitude]
VAR HouseLongitude = 'Route Information'[Longitude]
VAR EarthCircumference = 3959
VAR P =  DIVIDE (PI (); 180 )
RETURN
    MINX (
        'Route Information';
        VAR CinemaLatitude = 'Route Information'[Lattitude]
        VAR CinemaLongitude = 'Route Information'[Longitude]
        VAR _DistanceFromCurrentRough = 80*SQRT(POWER((HouseLatitude-CinemaLatitude);2) + POWER((HouseLongitude-CinemaLongitude);2))
        RETURN
                _DistanceFromCurrentRough
            
    )

and this, as expected, yields zero in all rows due to the comparison with the base house itself which was the initial problem/question.

So that can be solved with this approach.

Now the next issue regards something in the haversine formula or in the data or in some weird behavior that raises the error of ACOS mentioned earlier. But that's a whole different story.

Agree?

 

 

 

 

 

   

Anonymous
Not applicable

Hi @AlB

 

Yeah i have tried the Calculated column on the full data set and prior to any slicers been applied have managed to verify the distances produced are very close to accurate, 100% accurate enough for what i intend. 

 

Slight issue when we filter the data though, so there is a Date column which will date when these points are relevant or occured over the previous 7 days. 

 

If i filter to only show houses on the 16th January as an example the closest distance remains the same but its closest may be from a house on the 18th January so in this scanario wouldn't be a possible closest. 

 

Based on the conversation i did try creating a caluclated measure with the same formula without much sucsess it doesn't relate to the MEMNO so wont display the values for the closest distance in a table view using the MEMNO. 

 

So it is working from a total dataset view but the information wont update as we filter down for a specific date or only houses with a closest cinema of Doncaster Vue as an example. 

 

How would be make the same working formula work with filtered data, as a measure? 

 

Thanks though from a dataset point of view its great and from what points ive validated is fairly accurate. 

 

Thanks, 

 

Josh

@Anonymous

 

Hmmm... I'm a bit surprised it's accurate as I just put it together quite quickly and only with rough numbers. I mean I wouldn't trust my formula if I were you Smiley Very Happy Anyway we can keep using it until all other issues are solved and then we could try to change it if necessary as that wouldn't affect the rest.

 

I was going to post the measure but I see @dedelman_clng has already done it. 

 

@Anonymous

 

What are you going to need regarding the dates? Filter only a single day? A period?

 

@dedelman_clng

Out of curiosity, what problems did you have with ALLEXCEPT and KEEPFILTERS ?

 

 

@AlB- mainly wrapping my head around the advanced context filter concepts (extended tables, etc). 20+ years of SQL development has me more comfortable with assigning and using variables. I've taken the Mastering DAX class but some of the concepts are having trouble sticking in my brain.  When I get advanced concepts to work it feels more like luck than skill because I'd have a hard time explaining it to someone else.

Anonymous
Not applicable

@AlB

 

Yeah i mean i've only checked 10 points so far, but initially it looks accurate for a total dataset. 

 

House RefReturned Closest House Returned Distance Closest House RefActual Distance
A05402 1.53D152012.6
A05703 0.83M946550.9
A23400 1.51S735002
A27202 1.41M947742.2
B01701 3.49C013074.2
B14800 2.36W146002.8

 

There out to different degrees of variation but aslong as there within a +/- 5 mile radius i should be able to achieve the desired result i am aiming for. 

 

Plus i am verifying as a car would drive between points on Google Maps and not as the bird would fly so i would expect the actual to be a degree more. 

 

@AlB

 

I want to inititally see the data for a full week with closest house to see anyone who is not within 30 miles of another house point to be classed as a "remote location". 

 

Then Filter down to a single date and see again if there is anyone with a greater distance than 30 miles, that isnt categorised as a "remote location" so in theory if that house point was on a different date it could be within 30 miles of the next house if that makes sense just to visulise a theory and analyse by specific dates and by weeks. 

 

I would like to filter and distances be recalculated on a range on variables aswell like Town/City would be one to remove certain points from the analysis overall as we have some points in Belfast it is throwing some issues as it doesn't account for the water crossing so i need to split those by Island and Mainland to avoid any issues later. 

 

@dedelman_clng 

I have just tried out the measure from above and its calculating distances but sporadically across the range so some rows in the table visual have enteries and some dont?

 

@dedelman_clng Ammendment. 

 

The measure is working, must of been my impatience while it populated the data returned. 

 

Yet to verify the entries it has produced though against actual distances aswell but will take a look to see if its a similar finding to the calculated column. 

 

Thanks again, 

 

Josh

@Anonymous

Cool to hear it works. I think it will need some minor modifications to respond to time periods (other than a single day) and other slicers.

By the way I had a quick look at the data you posted earlier comparing the actual distance and the distance with my rough calculation. It seems that 96 would be the best factor to apply, instead of the 80 that we have now. This is purely empirical though and strictly based in the limited sample you provided.

Anonymous
Not applicable

@AlB

 

Yeah was just the start, I wanted to verify around 100 points and then go with the average factor from that sample should get us in the ballpark of accuracy I will need.

 

I have created a v3 of the sample file here.

 

To include a date column.

 

There is also the Closest House New column, which is a combination of the original closest house formula and the Distance to closest house formula you wrote to try and retrieve the associated nearest MEMNO that is that distance away. That would also be ideal as a measure to adjust with any on page slicers. 

 

Think I've learnt more depicting your guy's formulas this week than I have all year. 

 

Thanks again for the help!! 

 

Josh

Anonymous
Not applicable

Just thought... 

 

The closest House is a TOPN function... 

TOPN(
            1,
            'Route Information',
            VAR Lat2 = 'Route Information'[Lattitude]
            VAR Lng2 = 'Route Information'[Longitude]
            //---- Algorithm here -----
                           
            VAR final =
                80*SQRT(POWER((Lat1-Lat2),2) + POWER((Lng1-Lng2),2))
            RETURN
                IF (
                final <> 0,
                final
            ),
            ASC
        )

So currently it is returning the TOP 1st Value in ascending order. I need the second TOPN. 

 

Going with the same theory as the; 

IF (
                final <> 0,
                final
            )

Piece the in distance to closest house working formula. 

 

 

 

OK, I took @AlBs code and @Anonymouss last post and turned it into a measure.  First, I recreated 'Route Information' to have 5 dates' worth of data, one date of which is missing 3 of the locations (Dt is literally just a list of 5 dates 1/14 - 1/18):

 

Route Information =
UNION (
    CROSSJOIN (
        SUMMARIZE (
            'Table2 (2)',
            'Table2 (2)'[MEMNO],
            'Table2 (2)'[Post Code],
            'Table2 (2)'[Lattitude],
            'Table2 (2)'[Longitude]
        ),
        FILTER ( Dt, Dt[Date] < "1/18/18" )
    ),
    CROSSJOIN (
        SUMMARIZE (
            FILTER (
                'Table2 (2)',
                NOT ( 'Table2 (2)'[MEMNO] IN { "A21200", "A22701", "B10500" } )
            ),
            'Table2 (2)'[MEMNO],
            'Table2 (2)'[Post Code],
            'Table2 (2)'[Lattitude],
            'Table2 (2)'[Longitude]
        ),
        FILTER ( Dt, Dt[Date] = "1/18/18" )
    )
)

Then taking the working code, transformed it into a measure, keeping the Date filter (I'm probably not doing it the most efficient way using variables - I still have trouble getting ALLEXCEPT and KEEPFILTER working without trial and error):

 

DCH (Miles) =
VAR HouseLatitude = MIN ( 'Route Information'[Lattitude] )
VAR HouseLongitude = MIN ( 'Route Information'[Longitude] )
VAR EarthCircumference = 3959
VAR P = DIVIDE ( PI (), 180 )
VAR House = SELECTEDVALUE ( 'Route Information'[MEMNO] )
VAR __Dt = SELECTEDVALUE ( 'Route Information'[Date] )
RETURN
    MINX (
        FILTER (
            ALL ( 'Route Information' ),
            'Route Information'[MEMNO] <> House
                && 'Route Information'[Date] = __Dt
        ),
        VAR CinemaLatitude = 'Route Information'[Lattitude]
        VAR CinemaLongitude = 'Route Information'[Longitude]
        VAR _DistanceFromCurrentRough =
            80 * SQRT ( POWER ( ( HouseLatitude - CinemaLatitude ), 2 ) +  POWER ( ( HouseLongitude - CinemaLongitude ), 2 )
                )
        RETURN
            IF ( _DistanceFromCurrentRough <> 0, _DistanceFromCurrentRough )
    )

As you can see, on 1/18/18, the distance is different for M49418

 

miles.png

 

 

@dedelman_clng, @Anonymous

agree. A short data sample would be more convenient for quick testing

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.