R script and R visuals in Power BI

ankitpatira · ‎04-03-2016

Prerequisites -

R environment setup with 'maps', 'geosphere', 'data.table' packages installed.

Power BI desktop is configured for R installation.

I have two sample csv datasets, one containing migration statistics data from different countries to Australia over the number of years and other having list of countries with latitude and longitude co-ordinates for each.

To import dataset into Power BI using R, go to Get Data -> Other -> R script (Beta) and paste below code to generate data frame from above mentioned two datasets. Inside read.csv method replace ~ with appropriate folder location. Script will also prepare data frame by merging data from both csv, using match to lookup values and finally removing unwanted columns.

#Import both csv datasets
dataset1 <- read.csv("~/countries.csv", header = TRUE)
dataset2 <- read.csv("~/migrations.csv", header = TRUE)

#merge both datasets to generate single data frame by joining with country name
df <- merge(x=dataset1, y=dataset2, by.x = "country", by.y = "From")

#rename columns
library(data.table)
setnames(df, old=c("latitude","longitude"), new=c("latitude_from","longitude_from"))

#find co-ordinates of Australia using match function from the countries dataframe
df$latitude_to <- dataset1[,"latitude"][match(df$To, dataset1[,"country"])]
df$longitude_to <- dataset1[,"longitude"][match(df$To, dataset1[,"country"])]

#remove unwanted columns 
df$country <- NULL
df$To <- NULL

Result willl be three datasets as shown below. Click on df and Load.

Dataset will have 6 columns ready to be plotted onto map.

Select R script visuals from visualization pane and drag all the fields into visualization pane to start scripting. Power BI at this stage will automatically create another data from the selected fields and remove duplicates for you.

Paste and run below code into R script editor to generate world map using R visual as below.

#load maps and geosphere packages 
library(maps)
library(geosphere)

#order rows in data frame by values of their count this is to ensure when lines are drawn into map
# higher count lines gets drawn on top of smaller count lines 
dataset <- dataset[order(dataset$count),]

# create basic map 
map("world", fill=TRUE, bg="black", lwd=0.05)

To plot connections onto map, I will use for each loop in R to iterate over each row of the data frame and create a list of co-ordinates from and to. Then using gcIntermediate function of geosphere package which will return points on the map and using lines() function join two points returned by gcIntermediate.

#load maps and geosphere packages 
library(maps)
library(geosphere)

#order rows in data frame by values of their count this is to ensure when lines are drawn into map
# higher count lines gets drawn on top of smaller count lines 
dataset <- dataset[order(dataset$Count),]

# create basic map 
map("world", col="#f2f2f2", fill=TRUE,  lwd=0.05)

#iterate over data frame, plot points on map using from and to coordinates and draw lines connecting two points
for (j in 1:length((dataset$Year)))
{
a <- c(dataset[j,]$longitude_from, dataset[j,]$latitude_from)
b <- c(dataset[j,]$longitude_to, dataset[j,]$latitude_to)
inter <- gcIntermediate(a,b,n=100,addStartEnd=TRUE)
lines(inter, col="black", lwd=0.5)
}

Let’s change some property to enhance this map. Change map colour, background and lines colour to red.

#load maps and geosphere packages 
library(maps)
library(geosphere)

#order rows in data frame by values of their count this is to ensure when lines are drawn into map
# higher count lines gets drawn on top of smaller count lines 
dataset <- dataset[order(dataset$Count),]

# create basic map 
map("world", col="grey50", fill=TRUE, bg="black", lwd=0.05)

#iterate over data frame, plot points on map using from and to coordinates and draw lines connecting two points
for (j in 1:length((dataset$Year)))
{
a <- c(dataset[j,]$longitude_from, dataset[j,]$latitude_from)
b <- c(dataset[j,]$longitude_to, dataset[j,]$latitude_to)
points <- gcIntermediate(a,b,n=100,addStartEnd=TRUE)
lines(points, col="red", lwd=0.5)
}

It is better but still not able to distinct the proportion of population from each country. It would be nicer if there was a way to demonstrate the number of migrations by colour coding to distinct each country. Yes there is, colorRampPalette() function which returns n number of RGB values between two colours specified. Next before applying colours vector to lines calculate the colour based on number of count current row has compared to maximum count in for loop. Then apply that to lines() function.

#load maps and geosphere packages 
library(maps)
library(geosphere)

#order rows in data frame by values of their count this is to ensure when lines are drawn into map
# higher count lines gets drawn on top of smaller count lines 
dataset <- dataset[order(dataset$Count),]

# create basic map 
map("world", col="grey50", fill=TRUE, bg="black", lwd=0.05)

#get max of count column
maxcnt <- max(dataset$Count)

#getting list of colours between two colours
colors <- colorRampPalette(c("orange", "yellow"))(25)

#iterate over data frame, plot points on map using from and to coordinates and draw lines connecting two points
for (j in 1:length((dataset$Year)))
{
a <- c(dataset[j,]$longitude_from, dataset[j,]$latitude_from)
b <- c(dataset[j,]$longitude_to, dataset[j,]$latitude_to)
points <- gcIntermediate(a,b,n=100,addStartEnd=TRUE)
colindex <- round( (dataset[j,]$Count / maxcnt) * length(colors) )
lines(points, col=colors[colindex], lwd=0.5)
}

It is clear to distinct that density around Asian countries is more than that of African countries and so migration count has been higher for former than the latter.

avinashbeepath · ‎04-14-2016

this is brilliant...so much ideas we can use this for!

Good stuff! Keep it up!

mjh123 · ‎04-20-2016

Perhaps I missed it... Where do I get the two sample csv datasets?

Anand · ‎04-26-2016

Can we get the sample files please

Anand · ‎04-26-2016

Same logic I tries was unsucessful U showed smart way

ankitpatira · ‎04-26-2016

@Anand @mjh123 Sorry guys I missed to upload csv files. Will get them uploaded asap.

Vkrishnan · ‎07-21-2016

Can someone explain if I can use a R file instead of pasting the code in the R script window. I read someone saying that we can give the full path name of the file, but that is not working for me. Also need some explanation on how to rerefresh the R-script using personal gateway, rather than the source file (In the Power BI service, only the output file(data frame) can be refreshed, if I specify the path of the script, it throws error.

xstodeepak · ‎08-07-2016

Brilliant article.....thanks.

xstodeepak · ‎08-07-2016

@ankitpatira Waiting for the sample CSV files used....

Sirmark · ‎02-16-2017

Hello

Without trying the two CVS sets, I tried to use the script where map libraries are available:

library(maps)
library(geosphere)

got following error:

Details: "ADO.NET: R script error.
Error in library(maps) : there is no package called 'maps'
Execution halted

any reason?

Sirmark · ‎02-16-2017

I was able to install the missing packages with Rstudio, (maps, geosphere, data.table)

But now got another error:

Feedback Type:
Frown (Error)

Timestamp:
2017-02-16T10:03:45.1555118Z

Local Time:
2017-02-16T11:03:45.1555118+01:00

Product Version:
2.43.4647.541 (PBIDesktop) (x86)

Release:
June, 2017

IE Version:
11.0.9600.18537

OS Version:
Microsoft Windows NT 6.1.7601 Service Pack 1 (x86 en-US)

CLR Version:
4.5.1 or later [Release Number = 378758]

Workbook Package Info:
1* - en-US, fastCombine: Disabled, runBackgroundAnalysis: True.

Peak Working Set:
457 MB

Private Memory:
398 MB

Peak Virtual Memory:
1.44 GB

Error Message:
R script error.
Loading required package: sp
Loading required package: methods
Error in order(dataset$count) : argument 1 is not a vector
Calls: [ -> [.data.frame -> order
Execution halted

User ID:
d4ec9cd4-f1b8-4176-a1b9-9e126e5d9e11

Session ID:
04872e61-a54a-4786-bee9-0a618e08d1e9

Telemetry Enabled:
True

Model Default Mode:
Import

Snapshot Trace Logs:
C:\Users\gisirei\AppData\Local\Microsoft\Power BI Desktop\FrownSnapShot1531301364.zip

Performance Trace Logs:
C:\Users\gisirei\AppData\Local\Microsoft\Power BI Desktop\PerformanceTraces.zip

Enabled Preview Features:
PBI_RedShift
PBI_Impala
PBI_Snowflake
PBI_shapeMapVisualEnabled
PBI_allowBiDiCrossFilterInDirectQuery
PBI_clusteringEnabled
PBI_esriEnabled

Disabled DirectQuery Options:
DirectQuery_Unrestricted

Cloud:
GlobalCloud

Activity ID:
d542171f-e60c-5f8e-d07e-f61d782e4d9d

Time:
Thu Feb 16 2017 11:03:17 GMT+0100 (W. Europe Standard Time)

Version:
2.43.4647.541 (PBIDesktop)

Client Error Code:
ServiceErrorToClientError

Error Details:
R script error.
Loading required package: sp
Loading required package: methods
Error in order(dataset$count) : argument 1 is not a vector
Calls: [ -> [.data.frame -> order
Execution halted

Stack Trace:
Microsoft.PowerBI.ExploreServiceCommon.ScriptHandlerException: R script error.
Loading required package: sp
Loading required package: methods
Error in order(dataset$count) : argument 1 is not a vector
Calls: [ -> [.data.frame -> order
Execution halted
---> Microsoft.PowerBI.Radio.RScriptRuntimeException: R script error.
Loading required package: sp
Loading required package: methods
Error in order(dataset$count) : argument 1 is not a vector
Calls: [ -> [.data.frame -> order
Execution halted

at Microsoft.PowerBI.Radio.RScriptWrapper.RunScript(String originalScript, Int32 timeoutMs)
at Microsoft.PowerBI.Radio.RScriptWrapper.RunScriptFromText(String scriptPlainText, Int32 timeoutMs)
at Microsoft.PowerBI.Client.Windows.R.RScriptHandler.GenerateVisual(ScriptHandlerOptions options)
--- End of inner exception stack trace ---
at Microsoft.PowerBI.Client.Windows.R.RScriptHandler.GenerateVisual(ScriptHandlerOptions options)
at Microsoft.PowerBI.ExploreServiceCommon.ScriptVisualCommandFlow.RunInternal(Stream dataShapeResultStream, QueryBindingDescriptor& bindingDescriptor)
at Microsoft.PowerBI.ExploreServiceCommon.ScriptVisualCommandFlow.Run(Stream dataShapeResultStream, QueryBindingDescriptor& bindingDescriptor)
at Microsoft.PowerBI.ExploreHost.SemanticQuery.ExecuteSemanticQueryFlow.TransformDataShapeResult(QueryCommand transformCommand, String dataShapeId, SemanticQueryDataShapeCommand command, Stream dataShapeResultStream, QueryBindingDescriptor& bindingDescriptor)
at Microsoft.PowerBI.ExploreHost.SemanticQuery.ExecuteSemanticQueryFlow.ProcessAndWriteDataQuery(IQueryResultDataWriter queryResultDataWriter, DataShapeGenerationContext dsqGenContext, EntityDataModel model, DataQuery query, ServiceErrorStatusCode& serviceErrorStatusCode)
at Microsoft.PowerBI.ExploreHost.SemanticQuery.ExecuteSemanticQueryFlow.ProcessAndWriteSemanticQueryCommands(IQueryResultsWriter queryResultsWriter, ExecuteSemanticQueryRequest request, IConceptualSchema conceptualSchema, EntityDataModel model)

DPI Scale:
100%

Supported Services:
Power BI

Formulas:

section Section1;

shared Sheet3 = let
Source = Excel.Workbook(File.Contents("N:\Planning\Project Reports Area\PROJECT & PROCESSES\Yoann Albrecht\TRADE STOWAGE UTILIZATION\Trade Utilization Test_3.xlsx"), null, true),
Sheet3_Sheet = Source{[Item="Sheet3",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Sheet3_Sheet),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Month", type text}, {"Service", type text}, {"Location", type text}, {"Commercial Region", type text}, {"Values", type text}, {"Total", type number}})
in
#"Changed Type";

BIsteht · ‎04-20-2017

Is it possible to do this with one 1 - Origins and 2+ Destinations? This would be extremely helpful. I am not an R expert so just looking for a starting point.

Anonymous · ‎05-02-2017

Hello,

you also have a visual called 'Flow map' for PowerBI at the MS Office Store: https://store.office.com/en-us/app.aspx?assetid=WA104380901&ui=en-US&rs=en-US&ad=US&appredirect=fals...

Thank you

Pablo

BIsteht · ‎05-03-2017

Thanks Pablo! I'll check it out.

MAwbre · ‎10-03-2017

Why does it remove duplicates by default? When performing univariate qualitative analysis, I want to be able to drop in a single qualitative field. This emans I WANT duplicates and having keys complicates the analysis meaninglessly.

That automatic removal should be made an option. I believe that it was added because of the limitation of R scripts in Power BI to 150k rows. The removal of duplicates by Power BI (in what I assume to be some sort of pre-processor directive like call judging by the invalid code syntax shows in the editor) probably helps mitigate that limitation in certain types of data sets. Unfortunately, without the option to turn off that "pre-processor" like call, an entire segment of potential analysis is complicated or even impossible (if the original data set has no key).

Anonymous · ‎10-03-2017

MAwbre, I used to have the same problem, and what I did was to add one extra column with row counting, then I have distinct lines and Power BI won't remove any line of my data set.

Thanks

R script and R visuals in Power BI

Unveiling the Power of Lakehouse's SQL Analytics E...

How to provide sample data in the Power BI Forum

Re: Power BI Tip: How To Sort A Column With Duplic...

Important Links every Power BI enthusiastic shoul...

Enhanced Data Profiling in Power Query: GUI and Ta...