ruthpozuelo

SandDance, my first impressions and test

During the Easter holidays I spent a few hours playing with Microsoft’s new visualization, SandDance.

 

The first thing I did was try the web version with the Titanic data, but it didn’t take long before I downloaded it as a custom visual to play with my own data: Google Analytics data, the best and most structured dataset I personally own.

 

As Google Analytics has geo data with latitude and longitude, I could not resist testing that first. I have a website that gets around 600 visitors/day, so I used that site, which produced the following results:

 

You are seeing: latitude, longitude and sessions.

 

I guess there are too many data points and, as you can see below, the SandDance visualization removed data points without any warning (?).

 

image001.png

  

 

Improvement idea nr1: Warn the user when it is not possible to visualize all the data

...if that is indeed the reason why I had missing data.
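For readers who want to check this before loading data into the visual, here is a minimal sketch in Python. It assumes the limit of 10,000 points per custom visual that the SandDance team mentions in the comments below; the function name and the sample rows are made up for illustration, and how the visual actually chooses which rows to drop is not known.

```python
# Assumed limit, taken from the SandDance team's comment (10,000 points
# sent to a custom visual at the time of writing).
POINT_LIMIT = 10_000


def rows_over_limit(row_count: int, limit: int = POINT_LIMIT) -> int:
    """Return how many rows do not fit under the limit (0 if all fit)."""
    return max(0, row_count - limit)


# Made-up dataset: 12,500 identical rows standing in for a GA export.
rows = [{"lat": 59.33, "lon": 18.07, "sessions": 1}] * 12_500

dropped = rows_over_limit(len(rows))
if dropped:
    print(f"Warning: {dropped} of {len(rows)} rows exceed the limit of {POINT_LIMIT}")
```

A check like this is the kind of warning the visual itself could surface.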

 

Improvement idea nr2: Use Power BI filters to filter data

It would have been great to be able to use Power BI filters to display only recent years, but unfortunately it is not possible. I didn't want to filter the data in the Query Editor (Power Query) so I switched to another website with less data.

 

image002.png

 

Now everything was displaying as it should. Great! I can see Spain, US, India, Europe, UK….

 

I changed to the column visualization and was of course delighted with the animation of the sand grains moving. I could see Europe dissolve into a column, as well as other parts of the world:

 

image003.png

When the animation was completed, this was the result:

 

image004.png

 

What was I looking at?

  • Longitude on the X-axis
  • Number of records on the Y-axis
  • And I could adjust the number of bins

So I changed the number of bins to 25 to divide Europe into more sections and get more detail, and highlighted the tallest bar by clicking on the area marked by a red rectangle.
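What the column view does here is essentially histogram binning. A minimal Python sketch of equal-width binning follows. It assumes the bins span the full longitude range from -180 to 180 (SandDance may well derive the range from the data instead), and the function name and sample longitudes are invented for illustration.

```python
from collections import Counter


def bin_index(value: float, lo: float, hi: float, bins: int) -> int:
    """Map a value in [lo, hi] to an equal-width bin index in 0..bins-1."""
    if not lo <= value <= hi:
        raise ValueError(f"{value} is outside [{lo}, {hi}]")
    width = (hi - lo) / bins
    # The upper edge (value == hi) is folded into the last bin.
    return min(int((value - lo) / width), bins - 1)


# Made-up longitudes, one per session record.
longitudes = [-74.0, -0.1, 2.35, 13.4, 18.07, 37.6, 77.2, 104.9]

# Number of records per bin, which is what the Y-axis of the column view shows.
counts = Counter(bin_index(lon, -180.0, 180.0, 25) for lon in longitudes)
for index in sorted(counts):
    print(f"bin {index}: {counts[index]} record(s)")
```

More bins mean narrower longitude slices, which is why Europe splits into more columns at 25 bins.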

 

image005.png

 

With that area highlighted, if I changed back to the map (scatter plot) I could see which region it represented:

 

image006.png

 

Great, it was Germany, Italy, parts of Scandinavia and some other countries.

 

Ok, what else can I do?

  1. I switched to the Grid visualization
  2. Sorted by sessions
  3. Selected a data point and
  4. Clicked on the “I” to get the details on that data point

image007.png 

 

The grid visualization shows you all data points used by SandDance and the “I” shows you what each data point represents.

 

Improvement idea nr3: Allow users to resize the details “window” so it shows all the info without hovering or scrolling

 

image008.png

 

It was at this point that I started wondering what each data point meant. That should have been the first thing I wondered, but I was too busy playing with the visualizations :S

 

Better late than never, I started opening the details of some data points to understand what they represented. To get a better understanding, I added extra fields, which led to improvement idea nr4:

 

Improvement idea nr4: Keep the current settings and visualization when new fields are added

I don’t know why, but every time I added fields to the visual, the visualization reset to the scatter visual and I had to recreate everything all over again. Quite annoying after playing around with it for a while.

 

Ok, back to our grid visual. I added city and country to the values and started exploring each data point:

Data point 1: Phnom Penh, Cambodia, 11 sessions

 

image009.png

 

Data point 2: Moscow, Russia, 9 sessions

Data point 3: No Geo data, 9 sessions

Data point 4: Baghdad, Iraq, 7 sessions

 

Something is wrong… Moscow with 9 sessions? Not possible. I created a simple table filtered on Moscow, with this result:

 

image010.png

Obviously I am doing something wrong or misunderstanding how SandDance works altogether.

 

I created a filter, hoping to see whether Moscow has multiple data points and I am just looking at the one with the most sessions. Here is what I see:

 

Moscow: 7 records.

 

1 record: 1 session

1 record: 2 sessions

1 record: 3 sessions

1 record: 4 sessions

1 record: 5 sessions

1 record: 6 sessions

1 record: 9 sessions

 

image011.png

 

???????? (You should have seen my face) :S

 

I know Moscow has a lot of records, so what is going on?

 

image012.png

 

Is it doing a distinct count?

 

image013.png

 

No, it is not. Nine sessions is not in the raw data. What is SandDance doing, then? I have no idea…

 

I looked at the example from Microsoft on the elections and I saw that the data was aggregated.

 

I did that with my data in PowerPivot and ended up with one row where Moscow had 297 sessions, and suddenly I could see that in SandDance: 1 record = 297 sessions.
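Outside Power BI, the same one-row-per-city aggregation can be sketched in a few lines of Python. The rows below are made up, and the field names (city, country, sessions) are assumptions about how the data is laid out, not the actual Google Analytics schema.

```python
from collections import defaultdict


def sessions_per_city(rows):
    """Sum sessions so that each (city, country) pair ends up as one row."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["city"], row["country"])] += row["sessions"]
    return dict(totals)


# Made-up, non-aggregated rows: several records for the same city.
rows = [
    {"city": "Moscow", "country": "Russia", "sessions": 1},
    {"city": "Moscow", "country": "Russia", "sessions": 9},
    {"city": "Moscow", "country": "Russia", "sessions": 2},
    {"city": "Baghdad", "country": "Iraq", "sessions": 7},
]

totals = sessions_per_city(rows)
for (city, country), sessions in totals.items():
    print(f"{city}, {country}: {sessions} sessions")
```

With these sample rows, the three Moscow records collapse into a single row with 12 sessions.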

 

image014.png

 

I filtered by Moscow to make sure I only have one record:

 

image015.png

 

Is that it? Do I need to aggregate the data? Does anybody know?

 

I would love to hear what SandDance is doing with non-aggregated data.

 

OK, let’s continue, but with my aggregated data, that is, one row of data per city.

 

I now went back to the scatter visualization and looked at the “world map” but this time I colored by sessions:

  1. Select scatter plot
  2. Select color by Sessions
  3. SandDance will decide the size of the buckets.

 image016.png

 

In this visualization, four cities stand out based on the bucket size (3) chosen by SandDance: Stockholm, London, New York and Moscow.

 

I wanted more granularity in the bucket size, so I changed that as follows:

  1. Select color by
  2. Click on Palette
  3. Select the color theme you want to use: Custom, Power BI, Excel, etc.
  4. Select the number of buckets
  5. Select the color palette

 image017.png

 

This leads me to the next improvement:

 

Improvement idea nr5: Allow the user to select the number of buckets to color the data by.

I can only choose a maximum of 12 buckets, so I do that and the data looks like this:

 

image018.png

 

A few new colors appear, but the data is hard to read, as most of it is in the red section.

This leads me to the next improvement:

 

Improvement idea nr6: Allow users to select the colors on the palette.

I think this is possible in the web version, but I have not figured out how to do it in Power BI.

 

Now, you can click on the session buckets to filter the data on the map. For example, if I click on the number zero, it will highlight the data points with zero sessions. In Google Analytics this normally means referral spam, which is fake data, so we can remove those data points by clicking on the “square with a white dot” button:

 

image019.png

 

The data was still quite unreadable, so I changed the visualization to a column, and when I clicked on bucket 42 I could see that 99% of the data is in that bucket:

 

image020.png

 

Again, not a very useful bucket distribution. If I remove the “outliers”, I would want new “buckets” computed for the data that is left but, unfortunately, the original buckets remain. The same occurs if I look only at the “outliers”:

 

image021.png

Improvement idea nr7: Automatically resize buckets according to the data in the visualization. Allow the user to choose the bucket size.

It would be great to be able to dig into the data like that... Hopefully soon.
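To illustrate why the bucket distribution ends up so lopsided, here is a rough Python sketch. It compares equal-width buckets (a guess at what the color legend does, not confirmed) with equal-population buckets, and then recomputes the buckets after the outlier is removed. All values and function names are invented for the example.

```python
from bisect import bisect_right
from collections import Counter


def equal_width_edges(values, buckets):
    """Inner bucket edges that split the value range into equal-width slices."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / buckets
    return [lo + i * step for i in range(1, buckets)]


def equal_population_edges(values, buckets):
    """Inner bucket edges chosen so each bucket holds roughly the same count."""
    ordered = sorted(values)
    return [ordered[(i * len(ordered)) // buckets] for i in range(1, buckets)]


def bucket_counts(values, edges):
    """Count how many values land in each bucket index."""
    return dict(Counter(bisect_right(edges, v) for v in values))


# Made-up, heavily skewed sessions per city with one large outlier.
sessions = [1, 1, 1, 1, 2, 2, 3, 3, 4, 5, 8, 297]

by_width = bucket_counts(sessions, equal_width_edges(sessions, 3))
by_population = bucket_counts(sessions, equal_population_edges(sessions, 3))

# Recompute the buckets on what is left after removing the outlier.
without_outlier = [s for s in sessions if s < 100]
rebucketed = bucket_counts(without_outlier, equal_width_edges(without_outlier, 3))

print("equal width:     ", by_width)
print("equal population:", by_population)
print("after re-bucket: ", rebucketed)
```

With these made-up numbers, equal-width buckets put 11 of the 12 values in the first bucket, while equal-population buckets hold 4 values each. Recomputing the edges once the outlier is gone spreads the remaining values across all three buckets.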

 

So, let’s look at the only thing I can look at without modifying the data: the countries with the most sessions. As the detail window does not show all the data and I have to scroll to find the sessions, I created a table with cities and sessions and placed it in the background:

 

image022.png

This leads me to the next improvement suggestion:

Improvement idea nr8: Let the order of the fields determine the order of the fields in the detail window.

 

Ok, let’s reset the filters and look at other visualizations:

 

 image023.png

The density visualization is telling me that most of my data points are in the northern half of the world: Europe and the US.

 

image024.png

 

Let’s look at the stacked visualization:

 

image025.png

 

Let’s see what this is plotting:

  1. Y-axis is the latitude with a bucket size of 9
  2. X-axis is the longitude with a bucket size of 9 and
  3. What do the bins in the corner do? (3)

To understand that, I had to zoom in on the data. You can do that with the Navigation Panel:

 

 image026.png

I changed the Bin size from 3 to 1 and the visualization looked like this:

 

image027.png

I then changed it to 2 and this is what happened:

 

image028.png

OK, so it widens the data... How that is useful, I have yet to figure out.

 

By widening the latitude and longitude bins, the world map starts to appear again:

 

image029.png

 

Here I need to zoom in to see any detail on the data:

 

image030.png

It is not really showing me anything new, or am I missing something? Perhaps the reason this is not so useful at the moment is that all my data is hidden in one bucket?

 

Let’s look at other data to see if it is more useful. This particular site has a membership program. I have aggregated the data to see where the memberships occur.

If I remove all the zero memberships, I am left with:

 

image031.png

 

I need to zoom in on the data again, but what is the best way to do it?

I changed the bin size in the bottom-left corner to 1 and suddenly I could see the data better:

 

image032.png

 

Most of my sessions are from Europe and the US, but a lot of people are converting in Asia. Is that a missed opportunity?

To see it in more detail, I converted the visualization to a column visualization:

 

image033.png

 

Let’s look at my engagement goal, that is, users that spent more than 1 minute on a page.

 

image034.png

 

It would be great to be able to plot memberships and engagement goals in the same graph. We’ll do it manually. How are the Asian readers converting?

 

image035.png

 

At this point I am giving up and the snow is tempting me to go out.

 

In my opinion, my Power BI dashboards show all these metrics in a better way. Am I perhaps misunderstanding the use cases for the SandDance visualization?

 

Have you tried it? How are you using it? Any tips? Does the data need to be aggregated?

 

I am looking forward to seeing posts from other people and their use cases, to see if I can make better use of this and hopefully get answers to my questions 🙂

 

Want to see it live? Watch it on YouTube.

 

Comments

Awesome review Ruth! Love it. I've been playing around with it myself and it's awesome!

Very thorough review - great feedback

I spent a lot of time this week playing with Sanddance and had a lot of the same reactions as you...

Great review!

Thanks for the time and the insight!

Hi Robin,

This is Steven Drucker from the SandDance team.

Great review and we appreciate all the feedback. Would love to start a dialog to address some of the issues that you've brought up. It seems like several of the issues might be related to the PowerBI custom visual version and may not apply as much to the standalone version and you might be able to give that a try: http://www.sanddance.ms, but then, of course, it won't have the other benefits that you get from PowerBI.

 

To try to address some of your points: The current version of PowerBI limits the number of points sent down to a custom visual to 10000 but that's changing to 30000 in the next release so that should help. Also, as you noticed, we only want unaggregated data in SandDance since we do the aggregation visually. I suspect that some of the confusion came by sending some sort of aggregation of the data where only some of the data made it to the visual. Definitely not the right behavior and we'll work on fixing that!

 

Also, in the current version, the filters from PowerBI do not impact SandDance, which is slated to be fixed as well.

Another small missing feature in the PowerBI version is the ability to resize the details box which is available in the web version.

 

Right now, adding another column of data from PowerBI resets the SandDance view since it acts as if an entirely new dataset has been loaded. This is also something that will be fixed.

 

For features: great ideas on selecting colors for data from a palette, setting bucket sizes, and allowing equal population buckets as opposed to equal value buckets. We're adding them to the 'features to implement' list.

 

We should also make sure that we have options for original column order as well as a sorted order for column pickers.

 

Finally, there's an option for setting the size of items in scatter plot (Shape Size in the options panel) which can make it so you don't have to zoom which can be more work. Another item on our list is to improve panning and zooming to work more intuitively.

 

As for scenarios that we're hoping people can use SandDance for, we've found it especially useful for showing stories about how people came to insights about their data by taking the audience through a logical sequence of steps (which may or may not be the steps they used to find the insights themselves). We've also found it best for identifying outliers and clusters that you would not see when you aggregate the data. 

 

Again, thanks for your comments and videos and we hope you continue to find ways to use SandDance!

Cheers!
--Steven Drucker

Principal Researcher

Microsoft Research

 

 

Hi Steven,

 

Many thanks for taking the time to answer my questions and suggestions.

 

Some comments:

You mention that you want unaggregated data in SandDance, but mine behaved as I explained in the post. Only when I aggregated it in Power BI could I make sense of it. I can attach my Power BI file if it helps with troubleshooting.

 

I did play with the web version of SandDance, but just for a short while. Moving the data to the web so I can use SandDance is not a likely scenario for me: it is time-consuming to move and update the data, it spreads analytics insights across different tools, etc. Perhaps if Power BI connectors and Power Query were available in SandDance web... but then again, that's what Power BI does. I think it was a great decision to offer SandDance as a custom visual.

 

I am looking forward to testing the custom visual again when the new version is released, and I am really happy that my test contributed ideas for the development of the tool. What I am looking forward to the most in future development is being able to set my own bucket size and colors.

 

I did test SandDance with Google Analytics user demographic data, and with that a story could actually be told using SandDance, so I understand what you mean when you say it could be used as a storytelling tool.

 

Thanks for the great feedback from everybody, it inspires me to keep on sharing!  🙂

 

/Ruth

Very impressive post.

Thanks @CABIRDUK!