08-14-2016 09:28 AM - last edited 11-16-2016 01:28 AM
Clustering is used to find groups of similar items in your data. For example, the clustering algorithm and visual can automatically find customer segments, which you can then target in your marketing campaigns.
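To make the idea concrete, here is a minimal sketch of k-means (Lloyd's algorithm), one common clustering technique. The visual itself is implemented in R and this is not its code; the Python function `kmeans` and the tiny customer table below are hypothetical and only show how points are grouped around cluster centres.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on tuples of numbers; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels


# Hypothetical customers: (spend, visits) in arbitrary units.
customers = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8),   # low-spend group
             (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]   # high-spend group
centroids, labels = kmeans(customers, k=2)
print(labels)
print(centroids)
```

The three low-spend customers end up sharing one label and the three high-spend customers the other; which group is numbered 0 depends on the random start. Because Lloyd's algorithm only reaches a local optimum, practical implementations usually rerun it from several starts and keep the best result.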
Prerequisites (the sample .pbix files will not work until these prerequisites are completed)
1. Install R Engine
Power BI Desktop does not include, deploy, or install the R engine. To run R scripts in Power BI Desktop, you must install R separately on your local computer. You can download and install R for free from many locations, including the Revolution Open download page and the CRAN Repository.
2. Install the required R packages.
Download the R script attached to this message and run it to install all required packages on your local machine.
Required R packages:
cluster, car, scales, fpc, mclust, apcluster, vegan
Tested with: CRAN R 3.3.1, MRO 3.3.0, powerbi.com
As a Power BI author with only the most basic knowledge of R, I think the R Library is a great concept. However, actually making use of what's been posted on the site is extremely hard: the examples contain so much code that I have no idea which lines are actually driving the visual. When I pasted the code into a Word document, it ran to 18 pages and 1,851 words.
Is there any way to simplify what's in the library so that it's more comprehensible and usable?
R is its own language, and it takes most of us some time to learn the basics, not to mention the more advanced transformations and graphics. This code is very well commented (the lines starting with #), which should give you a sense of what each step is doing.
You might want to load the script into R and watch the graphics pane as you run the code line by line. The plot(...) function does some of the work, and the subsequent statements add to that plot according to the parameters defined at the beginning of the script.
The clustering code is too complicated, mostly because it contains the implementation of the automatic mode and many parameters that make the visual flexible. In addition, we do a lot of data correctness testing...
The simplest code to start with is "corrplot". We recommend looking at the R code in a dedicated R IDE, such as RStudio.
Try installing it from the RStudio console with
You may comment out
Just don't use the "long" method for the number-of-clusters search.
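The thread does not say what the "long" method does. As a hedged illustration of why an automatic search for the number of clusters can be slow, here is a Python sketch of one common approach, not necessarily the one in the visual's R script: cluster the data once per candidate k and keep the k with the highest mean silhouette width. The function names and the deterministic farthest-point seeding are assumptions made to keep the example reproducible. Each candidate costs a full clustering run plus an O(n²) silhouette pass, so widening the range of k or growing the data quickly adds up.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=50):
    """Compact Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid in place
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width over all points (closer to 1 = better separated)."""
    scores = []
    for i, p in enumerate(points):
        # Group p's distances to every other point by that point's cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster (or only one cluster): s = 0
            continue
        a = sum(own) / len(own)  # mean distance within p's own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, k_values):
    """Return the candidate k (each >= 2) whose clustering has the best silhouette."""
    best_k, best_score = None, -math.inf
    for k in k_values:
        score = silhouette(points, kmeans(points, k))  # one full run per candidate
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Hypothetical data: one feature per customer, three obvious groups.
points = [(x,) for x in (0, 1, 2, 20, 21, 22, 40, 41, 42)]
print(choose_k(points, range(2, 6)))  # prints 3
```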