Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
Anonymous
Not applicable

R script regex with spaces

Hi,

 

I tried creating small R script that is able to parse incorrect JSON column. The data is poor quality as it lacks quotations. Here is my code:

# 'dataset' holds the input data for this script
pattern <- "(:)(?!null)(?!true)(?!false)(?!\"[a-z0-9A-Z.:-[[:space:]]]+\")([a-z0-9A-Z.:-[[:space:]]]+)(,|})"
replacer <- function(x) { gsub(pattern, "\\1_TEST_\\2_TEST_\\3", as.character(x), perl = TRUE) }
output <- within(dataset, { PARSED=replacer(dataset$WORKSTATUSES) } )

What is happening the regex matches incorrect data in R runner but doesn't in Power BI. The problem are spaces. I tried it all; \\s, [:blank:], [[:blank:]], [:space:], [[:space:]] as well as inserting the space character itself into my patter. Some gave errors, some did nothing (regex must had not match). Can you tell me how to do it properly?

1 ACCEPTED SOLUTION
Anonymous
Not applicable

It turned out the problem is my script didn't handle all cases of string-type values that occure in JSON I parse. Also some cells contain empty strings (not null) and therefore aren't valid for parsing (produce Errors as well). Filtering empty rows out and then filtering rows with error let me investigate what the problem really was. 

View solution in original post

3 REPLIES 3
v-juanli-msft
Community Support
Community Support

Hi @Anonymous 

First, could you show me the error in Power BI when running the R script?

 

What is your purpose to use this code?

Some requiremnets can be directly achieved by Power Query, if your requirement is clear, we can make a test for it.

 

Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Anonymous
Not applicable

Hi,

 

What I needed was parsing improper JSON from mongoDB. JSON I get is missing quotations around keys and some of values. I managed to work this out through the weekend and the code that works looks like this:

'dataset' holds the input data for this script
pattern_key <- "(\"?)([a-zA-Z_-]+)(\"?:)"
pattern_value <- "(:\"?)(?!null|true|false)([a-zA-Z0-9\\s,-_:.]+)(,|})"
key_replacer <- function(x) { gsub(pattern_key, "\"\\2\":", x, perl = TRUE) }
value_replacer <- function(x) { gsub(pattern_value, "\\1\"\\2\"\\3", x, perl = TRUE) }
output <- within(dataset, { PARSED=value_replacer(key_replacer(dataset$WORKSTATUSES)) } )

Although this may lack some characters that should be treated as part of keys or values and as the result some quotations aren't being inserted in the right places. The error I get now says there are some unexpected characters at the end of JSON file. What's interesting I am able to parse JSON with my R script to new column, then parse JSON into Lists, further into separate rows and then expand as columns and it works fine in Query Editor. Once I click 'Apply changes' on the yellow bar in the main window it loads the data from remote server, tries to bend them to my needs and then this error occures. So I managed to make some progress in this area but it still doesn't work.

Anonymous
Not applicable

It turned out the problem is my script didn't handle all cases of string-type values that occure in JSON I parse. Also some cells contain empty strings (not null) and therefore aren't valid for parsing (produce Errors as well). Filtering empty rows out and then filtering rows with error let me investigate what the problem really was. 

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel1

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.