Unexpected behavior of R Script Visual caused by misused 'read.csv' parameter

ZSAV01 · 08-27-2018

When rendering an R Script Visual, Power BI appears to write all the related columns to a UTF-8 encoded .csv file, and then read that file into R with the "read.csv" function.

The call looks like this:

`dataset` = read.csv('C:/Users/***/REditorWrapper_5c23c015-1ba8-4bb0-a824-3f8179725132/input_df_73ea8250-a892-46d5-8e7d-c38f5993bbed.csv', check.names = FALSE, encoding = "UTF-8", blank.lines.skip = FALSE)

When the data contains Chinese characters, this line can cause a serious problem: R cannot read the data correctly.

Specifically, R fails to split the fields at the commas. The correct way to read a UTF-8 encoded .csv file is to replace

encoding = 'UTF-8'

with

fileEncoding = 'UTF-8'
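A minimal sketch of the corrected call in a plain R session. The sample file and its contents are made up for illustration; the other arguments mirror the ones Power BI passes. It assumes the session locale can represent Chinese text (for example CP936 or a UTF-8 locale); in a plain ASCII "C" locale the re-encoding step would fail.

```r
# Made-up sample data written as raw UTF-8 bytes to a temporary file.
# "\u8d44\u8baf" and "\u6ee1\u610f" are two short Chinese words.
path <- tempfile(fileext = ".csv")
lines <- c("id,answer", "1,\u8d44\u8baf", "2,\u6ee1\u610f")
# useBytes = TRUE writes the UTF-8 bytes unchanged, whatever the session locale
writeLines(enc2utf8(lines), path, useBytes = TRUE)

# fileEncoding makes the file connection re-encode from UTF-8 to the
# native encoding while reading (assumes the locale can represent Chinese)
dataset <- read.csv(path, check.names = FALSE, fileEncoding = "UTF-8",
                    blank.lines.skip = FALSE, stringsAsFactors = FALSE)
str(dataset)
```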

See this file for an example: https://1drv.ms/u/s!An3qTCClETscjq41_6sBSz5RlJ2C0w

This is the original data:

Original Data

This is the data loaded into R using 'read.csv' with encoding = "UTF-8", which is clearly wrong. This appears to be how R handles the data when working with Power BI.

Data loaded into R

If I change the code in REditorWrapper.R to use fileEncoding = "UTF-8", the problem is fixed. However, I haven't figured out how to change the code that the R Script Visual runs; right now, the way R loads the data is controlled by BI.

Fixed

v-qiuyu-msft · 08-29-2018

Hi @ZSAV01,

Does the issue happen when you get data via an R script data source, or when you use an R visual to plot data?

In my test, we can get the data via the R script data source below:

let
    Source = R.Execute("MyData <- read.csv('C:/Users/<user>/Downloads/input_df_73ea8250-a892-46d5-8e7d-c38f5993bbed.csv', check.names = FALSE, encoding = ""UTF-8"", blank.lines.skip = FALSE)"),
    MyData1 = Source{[Name="MyData"]}[Value]
in
    MyData1

If you add an R visual, you can also run the code below:

library(ggplot2)
library(gridExtra)
MyData <- read.csv('C:/Users/<user>/Downloads/input_df_73ea8250-a892-46d5-8e7d-c38f5993bbed.csv', check.names = FALSE, encoding = "UTF-8", blank.lines.skip = FALSE)
grid.table(MyData)

In your scenario, please update Power BI Desktop to the same version as ours, 2.61.5192.601 64-bit (August 2018), and then test again.

Best Regards,
Qiuyun Yu

ZSAV01 · 08-29-2018

Hi Qiuyu,

I confirmed that my BI version is 2.61.5192.601 64-bit (August 2018).

I ran into this problem when I loaded the data into BI and then used an R Script Visual directly in BI.

As you can see below, BI throws an error saying the data cannot be loaded into R.

企业微信截图_20180829175922.png

Chinese R users may be familiar with this error. It is usually caused by passing the wrong parameter to read.table / read.csv. When loading this kind of data into R, the preferred way to set the encoding is the "fileEncoding" parameter, not the "encoding" parameter.

If you open the RStudio IDE and look at the code in REditorWrapper.R, the reason becomes clear.

企业微信截图_20180829182320.png

BI tells R to call read.csv(..., encoding = "UTF-8"), which is a bad idea.

The "encoding" parameter only marks the character strings that are read in, while the "fileEncoding" parameter applies to the file itself, so its contents are re-encoded.

When Chinese characters are involved, encoding = "UTF-8" may result in unexpected behavior.

See the R documentation quoted below.

--------------------------

R documentation for read.table (Data Input):

fileEncoding
character string: if non-empty declares the encoding used on a file (not a connection) so the character data can be re-encoded. See the ‘Encoding’ section of the help for file, the ‘R Data Import/Export Manual’ and ‘Note’.

encoding
encoding to be assumed for input strings. It is used to mark character strings as known to be in Latin-1 or UTF-8 (see Encoding): it is not used to re-encode the input, but allows R to handle encoded strings in their native encoding (if one of those two). See ‘Value’ and ‘Note’.

-------------------------
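To make the documented difference concrete, here is a small base-R sketch using a single character (U+8D44) instead of a file. Declaring an encoding with `Encoding<-` roughly corresponds to what encoding = "UTF-8" does (mark, never convert), while iconv() shows what re-encoding means. UTF-16LE is used as the target only because its bytes are easy to check; with fileEncoding the target would be the session's native encoding (CP936 in this thread).

```r
# The UTF-8 bytes of the single Chinese character U+8D44
bytes <- as.raw(c(0xe8, 0xb5, 0x84))
x <- rawToChar(bytes)

# Marking: declare the string as UTF-8; the bytes themselves are untouched.
# This is the kind of thing the 'encoding' argument does.
Encoding(x) <- "UTF-8"
identical(charToRaw(x), bytes)

# Re-encoding: convert to another encoding, producing different bytes.
# 'fileEncoding' asks the file connection to perform this kind of conversion.
y <- iconv(x, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
y
```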

Error Message from BI:

Feedback Type:
Frown (Error)

Timestamp:
2018-08-29T10:03:11.9536001Z

Local Time:
2018-08-29T18:03:11.9536001+08:00

Session ID:
6b145a69-5cbe-4449-8ee8-5e6f45e3d070

Release:
August 2018

Product Version:
2.61.5192.601 (18.08) (x64)

Error Message:
R script error.
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
more columns than column names (列的数目比列的名字要多)
Calls: read.csv -> read.table
Execution halted (停止执行)

OS Version:
Microsoft Windows NT 10.0.15063.0 (x64 zh-CN)

CLR Version:
4.7 or later [Release Number = 460798]

Peak Virtual Memory:
34.3 GB

Private Memory:
361 MB

Peak Working Set:
498 MB

IE Version:
11.1088.15063.0

User ID:
9f2e364a-333c-4c09-adf6-a5d7e344fdcb

Workbook Package Info:
1* - zh-CN, Query Groups: 0, fastCombine: Disabled, runBackgroundAnalysis: True.

Telemetry Enabled:
True

Model Default Mode:
Import

Snapshot Trace Logs:
C:\Users\Zheng\AppData\Local\Microsoft\Power BI Desktop\FrownSnapShot1398057214.zip

Performance Trace Logs:
C:\Users\Zheng\AppData\Local\Microsoft\Power BI Desktop\PerformanceTraces.zip

Disabled Preview Features:
PBI_shapeMapVisualEnabled
PBI_newFromWeb
PBI_SpanishLinguisticsEnabled
CustomConnectors
PBI_variationUIChange
PBI_canvasTooltips
PBI_PythonSupportEnabled
PBI_showIncrementalRefreshPolicy
PBI_compositeModels
PBI_DB2DQ

Disabled DirectQuery Options:
PBI_DirectQuery_Unrestricted
TreatHanaAsRelationalSource

Cloud:
GlobalCloud

DPI Scale:
150%

Supported Services:
Power BI

Formulas:

section Section1;

shared #"input_df_73ea8250-a892-46d5-8e7d-c38f5993bbed (1)" = let
Source = Csv.Document(File.Contents("C:\Users\Zheng\Downloads\input_df_73ea8250-a892-46d5-8e7d-c38f5993bbed (1).csv"),[Delimiter=",", Columns=6, Encoding=65001, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"1.近三个月，您最常用哪个版块看资讯？[图片]", type text}, {"编号", Int64.Type}, {"2.整体来说，您看资讯的总满意度是", Int64.Type}, {"3.能根据我一直感兴趣的兴趣点推荐相关资讯", Int64.Type}, {"4.能根据我最近感兴趣的兴趣点推荐相关资讯", Int64.Type}, {"6.我选择对某类资讯不感兴趣后，就没有再推荐给我", Int64.Type}})
in
#"Changed Type";

ZSAV01 · 08-29-2018

@v-qiuyu-msft Thank you!

v-qiuyu-msft · 08-30-2018

Hi @ZSAV01,

Which R version is Power BI Desktop set to use? I tested with R 3.5.1, and the same R script doesn't throw an error.

However, based on my research, ggplot can't be used this way to plot a visual. To plot a table visual, you can run the script below, as I mentioned previously:

library(gridExtra)
grid.table(dataset)

Best Regards,
Qiuyun Yu

ZSAV01 · 08-30-2018

Hi @v-qiuyu-msft,

This data is only meant to demonstrate the read.csv error; I understand that it cannot be plotted with ggplot.

I tested this code on R 3.5.0. I'm not sure whether that is the reason; this looks more like a system-locale problem. Which locale are you using? I'm on CP936.

Sys.getlocale()
[1] "LC_COLLATE=Chinese (Simplified)_People's Republic of China.936;LC_CTYPE=Chinese (Simplified)_People's Republic of China.936;LC_MONETARY=Chinese (Simplified)_People's Republic of China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_People's Republic of China.936"
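When comparing results between machines, a small helper can report the locale details that matter here. `describe_native_encoding()` below is hypothetical (not part of R or Power BI) and only wraps the base functions Sys.getlocale() and l10n_info(); the "codepage" element is only present on Windows. On a CP936 session like the one above one would expect codepage 936 and utf8_locale FALSE, but that is an expectation, not output reported in this thread.

```r
# Hypothetical helper: summarise the session's native character encoding
describe_native_encoding <- function() {
  info <- l10n_info()  # list with MBCS, UTF-8, Latin-1 (+ codepage on Windows)
  list(
    ctype       = Sys.getlocale("LC_CTYPE"),
    utf8_locale = isTRUE(info[["UTF-8"]]),
    multibyte   = isTRUE(info[["MBCS"]]),
    codepage    = if (is.null(info[["codepage"]])) NA_integer_
                  else as.integer(info[["codepage"]])
  )
}

enc <- describe_native_encoding()
str(enc)
```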

I'll test it on R 3.5.1 when I have time. For now, I added the code below to Rprofile.site as a temporary fix.

In short, I overrode the "read.csv" function so that the value passed as "encoding" is forwarded to "fileEncoding" instead.

.First = function() {
  # Forward the value passed as 'encoding' to 'fileEncoding' so the file is re-encoded
  read.csv <<- function(file, header = TRUE, sep = ",", quote = "\"", dec = ".",
                        fill = TRUE, comment.char = "", encoding = "", ...) {
    read.table(file = file, header = header, sep = sep, quote = quote, dec = dec,
               fill = fill, comment.char = comment.char, fileEncoding = encoding, ...)
  }
}
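The override can be tried in an ordinary R session before editing Rprofile.site. This sketch defines the same wrapper under a hypothetical name, `read_csv_file_encoding`, so the real read.csv is not masked, and calls it with the arguments REditorWrapper.R passes. The sample file is made up, and the session locale is assumed to be able to represent Chinese text.

```r
# Same logic as the Rprofile.site override, under a hypothetical name
read_csv_file_encoding <- function(file, header = TRUE, sep = ",", quote = "\"",
                                   dec = ".", fill = TRUE, comment.char = "",
                                   encoding = "", ...) {
  # The requested encoding becomes fileEncoding, so the file is re-encoded
  read.table(file = file, header = header, sep = sep, quote = quote, dec = dec,
             fill = fill, comment.char = comment.char, fileEncoding = encoding, ...)
}

# Made-up UTF-8 sample file with a Chinese column name ("\u7f16\u53f7")
path <- tempfile(fileext = ".csv")
writeLines(enc2utf8(c("\u7f16\u53f7,score", "1,5", "2,4")), path, useBytes = TRUE)

# Call it the way REditorWrapper.R calls read.csv
dataset <- read_csv_file_encoding(path, check.names = FALSE, encoding = "UTF-8",
                                  blank.lines.skip = FALSE)
str(dataset)
```

Because the wrapper only renames encoding to fileEncoding, the other arguments (check.names, blank.lines.skip) still reach read.table through `...`.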

Anonymous · 05-05-2019

I have a similar problem when the data contains newline characters.

Unexpected behavior of R Script Visual caused by misused 'read.csv' parameter

TypeConversionFailure when not trying to convert

filter pane values in service are showing in black...

Wrong french translation for "reader" permission

'Select All' option in a slicer is not intuitive w...

Error could not copy visual