bartvandervurst Frequent Visitor

How to remove the default 'deduplication' from the R custom visual

Hi,


Power BI has R custom visuals that let R programmers build their graphs and analyses within Power BI.

While trying to do so, I noticed that the default code (which we can't change) loads the data into a data.frame and then deduplicates it (see the code line "unique(dataset)").

This deduplication limits what you can do with the R visual. For example, you can't create a correct histogram, because all duplicate rows are removed before anything is counted, and you can't build a proper decision tree, because again all duplicates are removed and the tree ends up biased.
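A small sketch of why this matters. The column `score` and its values are made up as a stand-in for the fields passed to the visual; the snippet applies the same `unique(dataset)` call the preamble runs and compares the counts before and after.

```r
# Hypothetical stand-in for the fields dragged into the R visual:
# the value 5 occurs three times and 8 twice.
raw <- data.frame(score = c(5, 5, 5, 7, 8, 8))

# The same call the visual's fixed preamble runs before your script.
dataset <- unique(raw)

nrow(raw)             # 6
nrow(dataset)         # 3
table(raw$score)      # 5: 3, 7: 1, 8: 2
table(dataset$score)  # 5: 1, 7: 1, 8: 1, so any histogram comes out flat
```

Every value survives, but its frequency is lost, which is exactly the information a histogram or a decision tree depends on.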

Can the deduplication be removed from the R custom visual's core code, or at least made optional? And is there any way to bypass it in the meantime?

PS: Because of this deduplication, the results produced by the 'Decision Tree' custom visual are wrong. That is actually how I found out.

Am I overlooking something?

3 REPLIES
Super User


I have actually posted an idea about this here:

https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/13505508--r-don-t-remove-duplicat...

It is under review; please vote for it.

The only workaround I have is to ignore the data frame that is created automatically and load the data from the source into my own data frame within the R script itself. Not optimal at all.
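A minimal sketch of this workaround, assuming the source is a CSV file. The path is hypothetical: here a temporary file stands in for the real source so the snippet runs on its own, and the `score` column is made up for illustration.

```r
# Stand-in for the real source: in practice this would be the (hypothetical)
# path of the CSV file the report was built from.
source_path <- tempfile(fileext = ".csv")
write.csv(data.frame(score = c(5, 5, 5, 7, 8, 8)), source_path, row.names = FALSE)

# Ignore the deduplicated `dataset` and read the full source directly.
full_data <- read.csv(source_path)

nrow(full_data)         # 6: the duplicate rows are still there
table(full_data$score)  # 5: 3, 7: 1, 8: 2, the true frequencies
```

The drawback is the one noted above: the script no longer follows the report's filters and slicers, because it bypasses the data Power BI passes to the visual.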




koenverbeeck Regular Visitor


+3 votes for your idea :)

Blog: sqlkover.com

Booth070 Frequent Visitor


Until they adopt the idea of taking that out, what I do is create an ID column that runs from 1 to nrow of the dataset and pass it into R as well. That makes every row distinct, so none gets deleted; you can then drop the dummy column in R and use your normal code.
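A sketch of the ID-column trick, simulated entirely in R. The `score` values and the `RowID` name are hypothetical; in Power BI the index column would be created in the data model (for example with Power Query's Add Index Column) and added to the visual's fields, rather than in R as done here.

```r
# Hypothetical data with repeated rows.
raw <- data.frame(score = c(5, 5, 5, 7, 8, 8))

# Simulate the index column added before the data reaches the visual.
raw$RowID <- seq_len(nrow(raw))

# The preamble's deduplication now keeps every row, since no two rows match.
dataset <- unique(raw)

# Drop the helper column and continue with the usual analysis.
dataset$RowID <- NULL

nrow(dataset)         # 6
table(dataset$score)  # 5: 3, 7: 1, 8: 2
```

Any column that is unique per row would work the same way; a 1..n index is simply the easiest to generate.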