cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
Helper II
Helper II

Need Measure to count nearby points on a Scatter

Quite simply I want a measure that counts the number of nearby points (relative to a given point) on a Scatter Plot visual. It could use an arbitrary radius (like 5 units). I would use this to assign color or size to points that are close together. 

 

Points in the red circle below have many nearby neighbors, and so the measure should return a higher number (so I can associate it with a "hot" color).  

bvy_1-1598645905851.png

 

bvy_2-1598645933479.png

 

Is this doable? 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
Super User IV
Super User IV

@bvy - Yes, definitely. However, I thought there was a built-in K-means cluster finder

https://docs.microsoft.com/en-us/power-bi/visuals/power-bi-visualization-scatter

 

Anyway, you could probably always do this:

Measure =
  VAR __radius = 5
  VAR __x1 = MAX('Data'[X])
  VAR __y1 = MAX('Data'[Y])
  VAR __id = MAX('Data'[ID])
  VAR __Table = ADDCOLUMNS(FILTER(ALL('Data'),[ID]<>__id),"Distance",SQRT( (__x1 - [X])^2 + (__y1 - [Y])^2) )
RETURN
  COUNTROWS(FILTER(__Table,[Distance]<=__radius))

Warning, this could eat up some processing cycles!!


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




View solution in original post

17 REPLIES 17
Highlighted
Super User IV
Super User IV

@bvy - Yes, definitely. However, I thought there was a built-in K-means cluster finder

https://docs.microsoft.com/en-us/power-bi/visuals/power-bi-visualization-scatter

 

Anyway, you could probably always do this:

Measure =
  VAR __radius = 5
  VAR __x1 = MAX('Data'[X])
  VAR __y1 = MAX('Data'[Y])
  VAR __id = MAX('Data'[ID])
  VAR __Table = ADDCOLUMNS(FILTER(ALL('Data'),[ID]<>__id),"Distance",SQRT( (__x1 - [X])^2 + (__y1 - [Y])^2) )
RETURN
  COUNTROWS(FILTER(__Table,[Distance]<=__radius))

Warning, this could eat up some processing cycles!!


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




View solution in original post

Highlighted

@Greg_Deckler  This works pretty well. Thanks so much for taking the time. A few observations/questions: 

 

1. What is the purpose of this filter? 

[ID]<>__id

 When I plug that measure in for Size, the Scatter drops points that have 0 neighbors. 

 

2. Also, if I want to have the measure also respond to the report filters, is that just a matter of changing FILTER(ALL to FILTER(ALLSELECTED? (I'm new to DAX.)

 

3. And if so (#2), will that potentially help performance since the measure is operating on fewer records? (To be honest, performance wasn't bad in my sample app but we'll see how it scales.) 

 

Thanks again. 

Highlighted

@bvy - Well, I didn't actually test the code...

 

The intention of the [ID]<>__id was that it would exclude the current point (itself) from the calculation of distance. Eliminating a single point is not a huge performance boost, but, whatever. Not sure why that wouldn't work though, but whatever.

 

Yes, if you want to preserve slicer selections, use ALLSELECTED versus ALL and yes that should improve performance as well in situations where you have slicer selections.

 

Finally, I believe if you put something like your Category into your Legend that if you then clicked the ellipses on your scatter chart that you would have to option to automatically find clusters and that, I believe, uses K-means.


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




Highlighted
Resolver V
Resolver V

Hi there.

How many points in total do you have? Asking because you've got 2 choices: either you calculate this stuff on the fly (could take quite a time and eat up resources) or you pre-calculate the distances and then just retrieve them.
Highlighted

Hi @daxer-almighty . Can't really pre-calculate because the count is very subject to selected filters. Open to ideas though... 

Highlighted

If, in your dataset, you have N points, then you need a table with 3 columns: StartPoint, EndPoint, Distance. So, you need - depending on how easy the DAX must later be - either N*(N-1) or N*(N-1)/2 rows in the table. You can then connect it to other table(s) and it'll be filtered. You can also leave it disconnected and use DAX to retrieve the points and distances you need. I can't see filtering as being any obstacle to implement this...
Highlighted

@daxer-almighty Okay, well this is interesting and worth a little exploration.

 

The raw, unfiltered dataset of points (coming from SSAS) could potentially have a million rows. In the PBI Scatter visual, though, with filteres and slicers applied, I wouldn't expect more than a few thousand points to be displayed at one time (at most). Since the dataset of distances you're proposing is on the order of N^2, I don't see it as practical to do this in SSAS on the full dataset. So are you thinking of implementing this in memory somewhere using DAX, on the filtered dataset? 

 

Definitely interested in hearing more. 

Highlighted

You can easily make the calculations speedier by just storing the distances of points to their nearest neighbours. You could, for instance, take the 10^6 points and either:

1) pre-calculate the distances to points within a certain radius, or
2) pre-calculate the distances to the 10 or 100 or 1000... nearest points.

How to use this? Easy. Just create DAX that will, for any point, look for an entry in the calculated distances table, and if it doesn't find it there, just calculate it as you normally do. This way you can certainly speed up the calculations. Looking things up - maybe through a relationship - may be faster than calculating. You just have to experiment.
Highlighted

@daxer-almighty 

 

In that case I really don't even need the distance. It's only used to determine if the closeness requirement is met. So two points get a record in the table only if they're sufficiently close. If each point has, on average, 10 neighbors, then that might be manageable to pre-process. Then it would just be a matter of doing a count where the point appears in either column (assuming we're not storing (A, B) and (B, A) in the table).

 

I'm hazy on how to implement this. A separate table in SSAS? Process via an SP in the back end database? DAX? 

 

Thoughts @Greg_Deckler ? 

Helpful resources

Announcements
Community Conference

Power Platform Community Conference

Check out the on demand sessions that are available now!

Community Conference

Microsoft Power Platform Communities

Check out the Winners!

secondImage

Create an end-to-end data and analytics solution

Learn how Power BI works with the latest Azure data and analytics innovations at the digital event with Microsoft CEO Satya Nadella.

Top Solution Authors
Top Kudoed Authors