Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Measuring Distance to find closest point between points from a single table

Hi All, 

 

Hoping someone may know how to include a FILTER to remove itself as a possible answer. 

 

For the example i am trying to find the closest neighbour to a house in a single table, i have the full addresses including Longitude and Lattitude. 

 

I am trying to find the next closest point to itself using the Haversine formula but excluding itself ultimely, but really struggling to remove its own coordinates as a possible answer and returning the closest point as 0 miles. 

 

So at the moment i have one working formula that shows the closest cinema but this is using two tables so i have the House points table and the Cinema Address table and it finds the closest Cinema for examples sake. Using the Formula below; 

 

Distance to Closest Cinema (Miles) =
VAR HouseLatitude = 'Route Information'[Lattitude]
VAR HouseLongitude = 'Route Information'[Longitude]
VAR EarthCircumference = 3959
VAR P =
DIVIDE ( PI (), 180 )
Return
MINX('Cinema Locations',

VAR CinemaLatitude = 'Cinema Locations'[Lattitude]
VAR CinemaLongitude = 'Cinema Locations'[Longitude]

return

ACOS(
SIN(HouseLatitude * P )*SIN(CinemaLatitude * P )
+COS(HouseLatitude * P )*COS(CinemaLatitude * P )
*COS((CinemaLongitude * P)-(HouseLongitude * P )))
 
* EarthCircumference
)

 

I now need to calculate the closest neighbour to the house addresses. 

 

So i created a duplicate of the Route Information table so i have two tables containing the same information, then run the exact formula above but removing the Cinema Locations table for the Route Information duplicate table. 

 

This then returns 0 miles as the closest distance as itself is the closest distance. 

 

So how do i filter out itself as a possible answer? 

 

Something like IF(AND(lattitude = lattitude, longitude = longitude), move to next closest would be the best way just no idea how to achieve that so hoping someone far more versed with DAX can help me out. 

 

Thanks, 

 

Josh.  

 

 

1 ACCEPTED SOLUTION

OK, I took @AlBs code and @Anonymouss last post and turned it into a measure.  First, I recreated 'Route Information' to have 5 dates' worth of data, one date of which is missing 3 of the locations (Dt is literally just a list of 5 dates 1/14 - 1/18):

 

Route Information =
UNION (
    CROSSJOIN (
        SUMMARIZE (
            'Table2 (2)',
            'Table2 (2)'[MEMNO],
            'Table2 (2)'[Post Code],
            'Table2 (2)'[Lattitude],
            'Table2 (2)'[Longitude]
        ),
        FILTER ( Dt, Dt[Date] < "1/18/18" )
    ),
    CROSSJOIN (
        SUMMARIZE (
            FILTER (
                'Table2 (2)',
                NOT ( 'Table2 (2)'[MEMNO] IN { "A21200", "A22701", "B10500" } )
            ),
            'Table2 (2)'[MEMNO],
            'Table2 (2)'[Post Code],
            'Table2 (2)'[Lattitude],
            'Table2 (2)'[Longitude]
        ),
        FILTER ( Dt, Dt[Date] = "1/18/18" )
    )
)

Then taking the working code, transformed it into a measure, keeping the Date filter (I'm probably not doing it the most efficient way using variables - I still have trouble getting ALLEXCEPT and KEEPFILTER working without trial and error):

 

DCH (Miles) =
VAR HouseLatitude = MIN ( 'Route Information'[Lattitude] )
VAR HouseLongitude = MIN ( 'Route Information'[Longitude] )
VAR EarthCircumference = 3959
VAR P = DIVIDE ( PI (), 180 )
VAR House = SELECTEDVALUE ( 'Route Information'[MEMNO] )
VAR __Dt = SELECTEDVALUE ( 'Route Information'[Date] )
RETURN
    MINX (
        FILTER (
            ALL ( 'Route Information' ),
            'Route Information'[MEMNO] <> House
                && 'Route Information'[Date] = __Dt
        ),
        VAR CinemaLatitude = 'Route Information'[Lattitude]
        VAR CinemaLongitude = 'Route Information'[Longitude]
        VAR _DistanceFromCurrentRough =
            80 * SQRT ( POWER ( ( HouseLatitude - CinemaLatitude ), 2 ) +  POWER ( ( HouseLongitude - CinemaLongitude ), 2 )
                )
        RETURN
            IF ( _DistanceFromCurrentRough <> 0, _DistanceFromCurrentRough )
    )

As you can see, on 1/18/18, the distance is different for M49418

 

miles.png

 

 

View solution in original post

37 REPLIES 37
dedelman_clng
Community Champion
Community Champion

Instead of having two tables, can you add a column to the single table indicating if a location record is a house or a theater? Then you can build a filter on that column within your calculations.

 

Hope this helps

David

Anonymous
Not applicable

Hi Dedelman, 

 

The calculation for closest theater is working correctly at the moment, but i need another column to find the closest house. 

So there is the primary Route Information table with a column for Names which is a unique identifier, along with Postal address and longitude, lattitude points. 

 

I want to find the closest house to each house excluding its own postal information so the distance is never 0 miles. 

I only made the duplicate table which is hidden in report view to duplicate the working formula for closest theater, just modifying the refferenced columns to use the 'duplicate table. 

 

There is a unique identifier column those for house refferences so we could filter on that to exclude itself as the destination but how would i build that filter into the formula i am using? 

 

Thanks, 

 

Josh

Try something like this

 

Closest House =
  var __ThisHouse = SELECTEDVALUE(House[Name])

//Then for the other variables
CALCULATE([your code here], House[Name] <> __ThisHouse)
or
CALCULATE([your code here], FILTER(House, House[Name] <> __ThisHouse)

Because SELECTEDVALUE will return (BLANK) if more than one house is selected, this will not work at any aggregate level, but it sounds like you're pretty much wanting to do this on a single house anyway.

 

Hope this helps

David

 

 

Anonymous
Not applicable

Hi, 

 

Tried the below as a caluclated column; 

 

Closest HOUSE (Miles) test =
VAR HOUSELatitude = 'Route Information'[Lattitude]
VAR HOUSELongitude = 'Route Information'[Longitude]
VAR EarthCircumference = 3959
VAR P =
DIVIDE ( PI (), 180 )
Var P2 =
DIVIDE(PI(), 360 )
VAR _ThisHOUSE = SELECTEDVALUE('Route Information'[MEMNO])


return


CALCULATE(
MINX('Route Information (Dup'')',

VAR ClostestHOUSELatitude = 'Route Information (Dup'')'[Lattitude]
VAR ClostestHOUSELongitude = 'Route Information (Dup'')'[Longitude]

return
MINX('Route Information (Dup'')'
, ROUND(2 * EarthCircumference *
ASIN(SQRT(
SIN((ClostestHOUSELatitude - HOUSELatitude) * P2)^2 +
COS(HOUSELatitude * P) * COS(ClostestHOUSELatitude* P) *
SIN((ClostestHOUSELongitude - HOUSELongitude) * P2)^2)), 1)))

 

, 'Route Information (Dup'')'[MEMNO] <> _ThisHOUSE)

 

This still returns a 0 miles value, any ideas? 

 

This is still based on the two table model, does it need to be one table? 

 

Thanks, 

 

Josh

You should be able to do the calculation against a single instance of 'Route Information'.  Try this:

 

Closest HOUSE (Miles) test =
VAR HOUSELatitude = MAX('Route Information'[Lattitude])
VAR HOUSELongitude = MAX('Route Information'[Longitude])
VAR EarthCircumference = 3959
VAR P =
    DIVIDE ( PI (), 180 )
VAR P2 =
    DIVIDE ( PI (), 360 )
VAR _ThisHOUSE =
    SELECTEDVALUE ( 'Route Information'[MEMNO] )
RETURN
    CALCULATE (
        MINX (
            FILTER(ALL('Route Information'), 'Route Information'[MEMNO] <> _ThisHOUSE),  //Note the use of FILTER
            VAR ClostestHOUSELatitude = 'Route Information'[Lattitude]
            VAR ClostestHOUSELongitude = 'Route Information'[Longitude]
            RETURN
                MINX (
                    FILTER(ALL('Route Information'), 'Route Information'[MEMNO] <> _ThisHOUSE), //Note the use of FILTER 
ROUND ( 2 * EarthCircumference * ASIN ( SQRT ( SIN ( ( ClostestHOUSELatitude - HOUSELatitude ) * P2 ) ^ 2 + COS ( HOUSELatitude * P ) * COS ( ClostestHOUSELatitude * P ) * SIN ( ( ClostestHOUSELongitude - HOUSELongitude ) * P2 ) ^ 2 ) ), 1 ) ) ) // 'Route Information'[MEMNO] <> _ThisHOUSE //moving this up to the table on MINX and adding ALL )

I tested on some Lat-Long data that I had and it definitely changes values (that are not 0) when I click on a spot on a map (I can't vouch for the formula's accuracy, but at least it changes).  Basically I added the MAX at the top, removed the duplicate table, and moved the filter into each call to 'Route Information', and wrapped ALL() around the table so it will iterate over the entire table regardless of which point is selected.

 

 

Anonymous
Not applicable

Okdokie we are making some progress no 0 values like before, thanks Dedelman!

 

Not sure on whether its accurate though in its calculation; 

 

So along with the Closest House (Miles) columns now using the formula you provided below i have this Closest House column which is returning the refference for this closest point so there is some usable context; 

 

Formula; 

Closest House = CALCULATE(MAX('Route Information (Dup'')'[MEMNO]),
FILTER('Route Information (Dup'')',
ROUND(2 * 3959 *
ASIN(SQRT(
SIN(('Route Information (Dup'')'[Lattitude] - 'Route Information'[Lattitude]) * PI()/360)^2 +
COS('Route Information'[Lattitude] * PI()/180) * COS('Route Information (Dup'')'[Lattitude] * PI()/180) *
SIN(('Route Information (Dup'')'[Longitude] - 'Route Information'[Longitude]) * PI()/360)^2)), 1)
= 'Route Information'[Closest House (Miles)]))

 

Now this formula before just returned every MEMNO the same as the MEMNO we were looking up the closest value for, which was expected as before the distance was Zero. 

 

Now its returning the odd few, stradorically but not sure on the validity of the closest house as i can see with some filters its a bit random. 

Example.PNG

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

So in this exmaple we can see it saying closest store to A05400 is 145.6 miles and that house ref is R15700. 

 

So on the bottom map we are just seeing A05400 and R15700 then the top map is all the houses available in that area, highlighting in red what it classed as closest? 

 

Also in the table we can see it thinks R15700 closest house is S55000 which just adds more mystery but is slightly plausable if S55000 was further north but still S55000 isn't the closest to R15700 anyway. 

 

Any ideas? 

 

Been stairing at this for days now and you've already got better results than me just need to be able to rely on it 🙂 

 

Thanks again in advance! 

 

Cheers

Since you're looking at a small scale, is it possible that the Lat/Long for all of those nearby house (per the map) are the same (if you don't have very precise lat/long measurements), and the formula is excluding them (thinking they're the same house) ? I know it shouldn't likely matter, but just a thought. 

Hey guys. Hope you don't mind my partaking in what sounds like an interesting conversation

 

@Anonymous

Can you share the pbix  or at least a fragment of it with a data sample that can be used for testing?

Just to double check: Are you sure the formula to calculate the distance between to points (Harversine) is working correctly and not the issue?

 

@dedelman_clng

I'm curious about your approach and have a couple of questions:

 

a) Why are you using two nested MINXs? Wouldn't only one suffice, like in the initial example? I'm probably missing something.

b) I understand @Anonymous needs a calculated column. Were you thinking of a measure? I'm asking because  you are using SELECTEDVALUE( ) in the definition of _ThisHOUSE. If you were thinking of a calc column, why SELECTEDVALUE ( 'Route Information'[MEMNO] ) instead of simply 'Route Information'[MEMNO]? 

c) Is the CALCULATE after the return necessary? I guess it's just the remnants of the first approach whose very last line  you commented out? If it's so, I would take it out. Although I think in this case it would be fine since you are filtering over the whole 'Route Information' table, it might in general bring undesired effects through context transition.   

 

 

Hi @AlB - welcome to the conversation. I was about to light up the bat signal as I had reached the end of my skills.

 

My code was actually @Anonymous's code, just cleaned up with DAX formatter, and putting FILTER in the proper places.  I've been trying to break the code down myself to help with the second issue but having no luck.  I assumed it was for a measure, but yes I suppose he could be looking for a calculated column (no offense to @Anonymous, but many times on these forums people are looking for "columns" when they really mean a measure to be placed in a table/matrix so I default to measure instinctively).

 

David

@dedelman_clng

I see,  but the initial code had only one MINX( ). That's why I was asking, in case I was missing something

 

the bat signal? Smiley Very Happy

 

@AlB- sorry, by "original" I meant in the 5th post Smiley Happy. It was at that point I started playing with the code.

 

Bat signal - I was going to @ some of the heavy hitters on these boards to come to the rescue Smiley Very Happy

@dedelman_clng

Ah ok, Then the question is for @Anonymous: why the two MINXs( ) as opposed to only one MINX() you used initially?

If you have just a sample of the table with the important columns and a  few rows including Latitude and Longitude I can probably give it a go with that.

 

Anonymous
Not applicable

@dedelman_clng

No offence taken, I am trying to create the solution as a calculated column as i wanted to later filter the results for houses with the closest neighbour more than 20 miles away and recently had issues filtering visuals based on a measure in the desktop application since the last update. (like IF(FY = This FY, 1, 0 as an example, this FY been the measure to display the fiscal year based on Today() in ruined the visuals so was opting for a calculated column to avoid this issue). 

 

I tried what @AlB mentioned removing the SELECTEDVALUE () and am now back to displaying 0 mile values. 

 

Closest House (Miles) =
VAR HouseLatitude = MIN('Route Information'[Lattitude])
VAR HouseLongitude = MIN('Route Information'[Longitude])
VAR EarthCircumference = 3959
VAR P =
DIVIDE ( PI (), 180 )
VAR P2 =
DIVIDE ( PI (), 360 )
VAR _ThisHouse = MIN('Route Information'[MEMNO])

VAR ClostestHouseLatitude = MIN('Route Information'[Lattitude])
VAR ClostestHouseLongitude = MIN('Route Information'[Longitude])

RETURN
MINX (
FILTER(ALL('Route Information'), 'Route Information'[MEMNO] <> _ThisHouse), //Note the use of FILTER
ROUND (
2 * EarthCircumference
* ASIN (
SQRT (
SIN ( ( ClostestHouseLatitude - HouseLatitude ) * P2 ) ^ 2
+ COS ( HouseLatitude * P )
* COS ( ClostestHouseLatitude * P )
* SIN ( ( ClostestHouseLongitude - HouseLongitude ) * P2 ) ^ 2
)
),
1
)
)

 

On the variables if I switch MIN for MAX all the distances are 3000 miles + 

If I keep MAX on the HouseLatitude variable and none on the ClosestHouseLatitude the results vary but are fictional 

&& if I drop the min / max on both we're back to 0 miles again. 

 

@dedelman_clng

 

In my example, I was zoomed into that area to display the area but overall there are around 13000 points across the UK. 

 

@AlB

 

I could share a PBIX file I would just need a little time to create an anonymised data set as the true project isn't homes addresses and cinemas. 

 

Any suggestions in the meantime though I'm happy to try out will help me no end if we can get this working. 

 

Thanks, 

 

Josh

@AnonymousI think you undid more than you mean to.  The top of your code now has this 

 

VAR HouseLatitude = MIN('Route Information'[Lattitude])
VAR HouseLongitude = MIN('Route Information'[Longitude])
VAR EarthCircumference = 3959
VAR P =
DIVIDE ( PI (), 180 )
VAR P2 =
DIVIDE ( PI (), 360 )
VAR _ThisHouse = MIN('Route Information'[MEMNO])

VAR ClostestHouseLatitude = MIN('Route Information'[Lattitude])
VAR ClostestHouseLongitude = MIN('Route Information'[Longitude])

 

Notice the bolded sections - you are storing the same thing in ClosestHouse and House. That is why you are now getting zeroes.

 

Getting back to column vs measure, a calculated column allows you more freedom in some ways than a measure, because you do not have to put in "meaningless" aggregators (when you intend the measure to be used on a simgle record).  Now, a calculated column will also make data refreshes longer, because that is when the calculation happens (it killed my my computer trying to add the code as a column to the existing dataset, which may be from the nested MINX calls as @AlB mentioned).  I can definitely see why you would want the nearest neighbor as a column, since it shouldn't really be a dynamic value. 

@dedelman_clng

Yeah the nested MINXs could certainly increase execution time, potentially by a lot.

@Anonymous

Since you said the very first version (with he cinemas) worked, I've modified that to take a first shot:

 

Distance to Closest House (Miles) =
VAR HouseLatitude = 'Route Information'[Lattitude]
VAR HouseLongitude = 'Route Information'[Longitude]
VAR EarthCircumference = 3959
VAR P =  DIVIDE (PI (), 180 )
RETURN
    MINX (
        'Route Information',
        VAR CinemaLatitude = 'Route Information'[Lattitude]
        VAR CinemaLongitude = 'Route Information'[Longitude]
        VAR _DistanceFromCurrent =
            ACOS (
                SIN ( HouseLatitude * P ) * SIN ( CinemaLatitude * P ) + COS ( HouseLatitude * P ) * COS ( CinemaLatitude * P ) * COS ( ( CinemaLongitude * P ) - ( HouseLongitude * P ) )
            ) * EarthCircumference
        RETURN
            IF (
                _DistanceFromCurrent <> 0,
                _DistanceFromCurrent
            )
    )

The only thing I've done is check whether the distance is zero (distance to the house itself) and if so return a blank(). The MINX( ) will ignore blanks so with that we'd obtain the minimum non-zero distance. This assumes the Latitude/Longitude pairs are unique in your data (and thus the distance to a house is only zero from the house itself). 

I haven't tested. It's just based on the info on your first posting. Try it out and let me know.

 

Cheers 

 

 

Anonymous
Not applicable

Morning @AlB & @dedelman_clng

 

Tried the formula you mentioned @AlB and currently its returning the below error; 

"An argument of function 'ACOS' has the wrong data type or the result is too large or too small."

 

The data types haven't changed though and it was returning in other formulas for the same sample size. 

 

I wrapped a MIN() round the first variable like @dedelman_clng did in his example and we are seeing some answers again but they aren't accuarte from a quick validation check. 

 

Distance to Closest House (Miles) =
VAR HouseLatitude = MIN('Route Information'[Lattitude])
VAR HouseLongitude = MIN('Route Information'[Longitude])
VAR EarthCircumference = 3959
VAR P =  DIVIDE (PI (), 180 )
RETURN
    MINX (
        'Route Information',
        VAR CinemaLatitude = 'Route Information'[Lattitude]
        VAR CinemaLongitude = 'Route Information'[Longitude]

I then wrapped MIN round both and even with the IF statement for 0 values; 

 

IF (
                _DistanceFromCurrent <> 0,
                _DistanceFromCurrent
            )

The entire column just returns a blank? 

 

Is there a way to validate the longitude/latitude coordinates aren't the same before the calculation is preformed? 

 

I will start preparing some data in a pbix now and attached asap. 

 

Thanks, 

 

Josh

 

 

Anonymous
Not applicable

Hi @AlB & @dedelman_clng

Please find attached link to the sample pbix.

 

I have included the calc for the closest cinema too so its as is and also the current/ latest formula for closest neighbour. 

 

The file is large due to the latitude / longitude lookup and supporting csv but if that causes an issue let me know and i can chop it up and remove that csv datasource.

 

Thanks, 

 

Josh

 

 

 

@Anonymous

 

I can't download the file, don't know what is going on.

How big is it? Could you upload it to for instance tinyupload.com and I'll try from there? Or www.filedropper.com if it's larger than 50 MB

I can't get to a file that large either - a dataset with 100 points in it should be sufficient if you can make that happen.

Anonymous
Not applicable

http://www.filedropper.com/closestdestinationsample 

 

Its 97mb because of the latitude and longitude table to pair with the postcodes. 

I never started with the long/lat coordinates i had to get them from another source and then tie them together based on postcode. 

 

If thats still an issue, i can take a static copy of the tables with all the values and pass it over that way, should make it alot smaller without the long/lat csv included. 

 

Thanks, 

 

Josh

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.