Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi Guys,
I'm trying to cleanse my data in Power BI with R. The data comes from a SQL query and then in power query I'm using the "Run R Script" as a final step.
I have 6 columns of data (Score [int], Date [date], Comments [varchar(max)], username [varchar(20)], Language [varchar(20)] and Source [int].
I'm new to R so this could be simple or I may be barking up the wrong tree completely. The R script I have attempted is:
# 'dataset' holds the input data for this script library(tm) library(wordcloud) words <- Corpus(VectorSource(dataset[,3])) words <- tm_map(words, stripWhitespace) words <- tm_map(words, content_transformer(tolower)) words <- tm_map(words, removeNumbers) words <- tm_map(words, removePunctuation) words <- tm_map(words, removeWords, stopwords("english")) words <- tm_map(words, stemDocument) output <- dataset output$cleansed <- words
It outputs all original columns as per before the R Script EXCEPT the date column which returns "Microsoft.OleDb.Date" for every value. The final column "Cleansed" just returns null.
Any ideas on what needs to be changed to get this to work?
Cheers
Hey @braybrookj,
I just used the code above and had no issues creating the visual (see below). What is the date column formatting? Mine is formatted as data type Date, format M/d/yyyy, "Do Not Summarize".
Hope this helps,
Alan
Hi @alanhodgson
Thanks for the response.
The issue I have is that I publish a data model to the service and build a collection of reports in the based off a common data model. R Visuals cannot be created in the service.
I'm trying to use the R script in power query to cleanse the data within teh data model and then use the Wordcloud custom visual.
Any thoughts on this process?
James
UPDATE
I have overcome the formatting issue by making every column text before running the R scipt, I will then re-format the columns after the script has run.
I'm still having issues with the script though:
# 'dataset' holds the input data for this script library(tm) library(wordcloud) words <- Corpus(VectorSource(dataset[,3])) words <- tm_map(words, stripWhitespace) words <- tm_map(words, content_transformer(tolower)) words <- tm_map(words, removeNumbers) words <- tm_map(words, removePunctuation) words <- tm_map(words, removeWords, stopwords("english")) output <- dataset output$text <- data.frame(text = sapply(words, as.character), stringsAsFactors = FALSE)
The output i get is my original 6 columns, plus a 7th column named "text". This column however is populated with null except for the first row which has an error.
DataFormat.Error: We cannot convert this value from its .RData representation.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
111 | |
100 | |
80 | |
64 | |
58 |
User | Count |
---|---|
148 | |
111 | |
93 | |
84 | |
66 |