During the Easter holidays I spent a few hours playing with Microsoft’s new visualization SandDance.
The first thing I did was test their web version and test the titanic data, but it didn’t take long until I downloaded it as a custom visual to play with it using my own data: Google Analytics data, the best and most structured dataset I personally own.
As Google Analytics has Geo data with latitude and longitude I could not resist to test that first. I have a website that has around 600 visitors/day, so I used that site producing the following results:
You are seeing : Latitude, longitude and sessions.
I guess there are too many data points and as you see below, SandDance visualization removed data points without any warning (?).
Improvement idea n1: Warning when it is not possible to visualize all data
....If that is the reason why I had missing data.
Improvement Idea nr2: Use Power BI filters to filter data
It would have been great to be able to use Power BI filters to display only recent years, but unfortunately it is not possible. I didn't want to filter the data in the Query Editor (Power Query) so I switched to another website with less data.
Now everything was displaying as it should. Great! I can see Spain, US, India, Europe, UK….
I changed to the column visualization and was of course delighted with the animation of the Sand grains moving. I could see Europe dissolve into a column as well as other parts of the world:
When the animation was completed, this was the result:
What was I looking at?
Longitude on the X-axis
No of records on the y-axis
And I could adjust the number of bins
So I changed the number of bins to 25 to divide Europe in more sections and get more detail, and highlighted the tallest bar by clicking on the area marked by a red rectangle.
With that area highlighted, if I changed back to the map (scatter plot) I could see which region it represented:
Great, it was Germany, Italy, parts of Scandinavia and some other countries.
Ok, what else can I do?
I switched to the Grid visualization
Sorted by sessions
Selected a data point and
Clicked on the “I” to get the details on that data point
The grid visualization shows you all data points used by SandDance and the “I” shows you what each data point represents.
Improvement Idea nr3: Allow users to resize the details “window” so it shows all the info without hovering or scrolling
It was here when I started wondering what each data point meant. That should have been the first thing I wondered, but I was too busy playing with the visualizations
Better late than never, I started getting the details of some data points to understand what they represented. To get a better understanding of what each data point represented, I added extra fields which led to Improvement Idea nr4:
Improvement Idea nr 4: Keep the actual settings and visualization when new fields are added
I don’t know why, but every time I added fields to the visuals, the visualization reseted back to the scatter visual and I had to recreate it all over again. Quite annoying after playing around with it for a while.
Ok, back to our grid visual. I added city and country to the values and started exploring each data point:
Data point 1: Phnom Penh, Cambodia, 11 sessions
Data point 2: Moscow, Russia, 9 sessions
Data point 3: No Geo data, 9 sessions
Data point 4: Baghdad, Iraq, 7 sessions
Something is wrong… Moscow 9 sessions? Not possible. I created a simple table with Moscow as filter and the results:
Obviously I am doing something wrong or misunderstanding how SandDance works altogether.
I created a filter hoping I can see if Moscow have multiple data points and I am just visualizing the one with the higher sessions and here is what I see:
Moscow: 7 records.
1 record: 1 session
1 record: 2 sessions
1 record: 3 sessions
1 record: 4 sessions
1 record: 5 sessions
1 record: 6 sessions
1 record: 9 sessions
???????? (You should have seen my face)
I know Moscow has a lot of records, so what is going on?
Is it doing a distinct count?
No, it is not. Nine sessions is not in the raw data. What is then SandDance doing? I have no idea….
I looked at the example from Microsoft on the elections and I saw that the data was aggregated.
I did that with my data in PowerPivot and I ended up with one row where Moscow had 297 sessions, and suddenly I could see that in SandDance. 1 record= 297 sessions.
I filter by Moscow to make sure I only have one record:
Is that it? I need to aggregate the data? Anybody knows?
I would love to hear what SandDance is doing with non-aggregated data.
Ok, lets continue, but with my aggregated data, that is, one city one row of data.
I now went back to the scatter visualization and looked at the “world map” but this time I colored by sessions:
Select scatter plot
Select color by Sessions
SandDance will decide the size of the buckets.
In this visualizations 4 cities stand out based on the bucket size (3) chosen by SandDance: Stockholm, London, New York and Moscow.
I want more granularity in the bucket size, so I can changed that by:
Select color by
Click on Palette
Select the color theme you want to use: Custom, Power BI, Excel, etc.
Select the number of buckets
Select the color palette
This leads me to the next improvement:
Improvement idea nr5: Allow the user to select the number of buckets to color the data by.
I can only choose max 12 buckets, so I do that and the data looks like this:
A few new colors appear, but the data is hard to read, as most of the data is on the red section.
This leads me to the next improvement,
Improvement idea nr6: Allow users to select the colors on the palette.
I think this is possible in the web version but I have not figured out how to do it on power bi.
Now, you can click on the session’s buckets to filter the data on the map. For example, if I click on the number zero, it will highlight the number of sessions with zero value. In Google Analytics this normally means referral spam and that is fake data, so we can remove those data points by clicking on the “square with a white dot” button:
The data is still quite unreadable, so I changed the visualization to a column and when I clicked on the bucket 42, I can see that 99% of the data is in that bucket:
Again, not very useful bucket distribution. If I remove the “outliers” I would want to get new “buckets” on the data that is left, but unfortunately, the buckets still remain. The same occurs if I look only at the “outliers”:
Improvement idea nr7: Automatically resize buckets acc. to data in the visualization. Allow user to choose bucket size.
That would be great, to be able to dig in the data like that….. Hopefully soon.
So, let’s look at the only thing I can look at without modifying the data, the countries with highest sessions. As the detail window does not show all the data, I have to scroll to find the sessions, I created a table with cities and sessions and place it in the background:
This leads me to the next improvement suggestion:
Improvement nr8: The order of the fields determines the order of the fields in the detail window.
Ok, let’s reset the filters and look at other visualizations:
The density visualization is telling me that most of my data points are in the northern half part of the world: Europe and US.
Let’s look at the stacked visualization:
Let’s see what this is plotting:
Y-axis is the latitude with a bucket size of 9
X-axis is the longitude with a bucket size of 9 and
What does the bins in the corner do? (3)
To understand that, I had to zoom in the data. You can do that with the Navigation Panel:
I changed the Bin size from 3 to 1 and the visualization looked like this:
I then changed it by 2 and this is what happened:
Ok, so it widens the data….how is that useful I have yet to figure out.
By widening the latitude and longitude bins the word map starts to appear again:
Here I need to zoom in to see any detail on the data:
It is not really showing me anything new, or I am missing something? Perhaps, as all my data is hidden in one bucket, that is the reason why this is not so useful at the moment?
Let’s look at other data to see if it is more useful. This particular site has a membership program. I have aggregated the data to see where the memberships occur.
If I remove all the zero memberships, I am left with:
I need to zoom in the data again, but how to do it the best way?
I changed the left-down corner bin size to 1 and suddenly I could see the data better:
Most of my sessions are from Europe and US, but a lot of people are converting in Asia. Is that a missing opportunity?
To see it in more detail I convert the visualization to a column visualization:
Let’s look at my engagement goal, that is, users that spent more than 1 minute on a page.
It would be great to be able to plot memberships and engagement goals in the same graph. We’ll do it manually. How are the Asian readers converting?
At this point I am giving up and the snow is tempting me to go out.
I have Power Bi dashboards to see all these metrics in a better way in my opinion. I am probably misunderstanding the use cases for the SandDance visualization?
Have you tried it? How are you using it? Any tips? Does the data need to be aggregated?
I am looking forward to see posts from other people and their use cases to see if I can better utilize this and hopefully get answers to my questions