Dear all, I already asked for help with this case, and thanks to many participants here I was able to find a solution. Now I have an additional question that I hope someone can help me with 🙏

Here is a short description of my case: I'm running an "lm" regression on a large dataset, by location. My dataset consists of 4 columns:
Y: response variable (perf in the code)
X1: 1st explanatory variable (factor1 in the code)
X2: 2nd explanatory variable (factor2 in the code)
Location: the column used to group my data, so "lm" runs by Location.

The code below is what I'm using to get the predicted values and studentized residuals for each observation.
The object "df" gives me exactly what I need, i.e. my original dataset plus the output of "augment". What I need now is to extend this table with the SNK.test output, specifically the "group" column that holds the letters of the SNK test for the comparison of means of factor1. Can anyone help me integrate this command into my code?
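library(tidyverse)
library(broom)
library(dplyr)

# Make sure the columns have the types lm() expects
dataset <- as.data.frame(dataset)
dataset$perf <- as.numeric(dataset$perf)
dataset$factor1 <- as.factor(dataset$factor1)
dataset$factor2 <- as.factor(dataset$factor2)

# Row index within each location, used later to join the results back
dataset <- dataset %>%
  group_by(location) %>%
  mutate(row = row_number())

df <- dataset %>%
  group_by(location) %>%
  # The second count must be on factor2, otherwise the filter below
  # refers to a column (unique_factor2) that does not exist
  mutate(unique_factor1 = n_distinct(factor1),
         unique_factor2 = n_distinct(factor2),
         var = var(perf)) %>%
  # Keep only locations where the model can be fitted
  filter(unique_factor1 != 1 & unique_factor2 != 1 & var != 0) %>%
  # Fit the model per location and keep the per-observation output of augment
  do(cbind(row = .$row, lm(perf ~ factor1 + factor2, data = .) %>% augment)) %>%
  right_join(. , dataset, by=c("location", "row"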
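
One possible way to get the SNK letters into such a table is sketched below. It assumes that SNK.test comes from the agricolae package (the post does not say so) and that its result holds a "groups" data frame with the factor levels as row names and the letters in a column called "groups". The toy data, the helper name snk_letters and the column name snk_group are made up for illustration. The idea is to run the test once per location, turn the letters into a small lookup table with one row per location and factor1 level, and join that table back onto the observations.

```r
library(dplyr)
library(agricolae)  # assumption: SNK.test is taken from this package

# Made-up toy data with the same column names as in the post
set.seed(1)
toy <- expand.grid(
  location = c("A", "B"),
  factor1  = c("a", "b", "c"),
  factor2  = c("x", "y"),
  rep      = 1:2,
  stringsAsFactors = FALSE
)
toy$perf    <- rnorm(nrow(toy))
toy$factor1 <- as.factor(toy$factor1)
toy$factor2 <- as.factor(toy$factor2)

# Hypothetical helper: fit the model for one location and return
# one row per factor1 level together with its SNK letter
snk_letters <- function(d) {
  fit <- lm(perf ~ factor1 + factor2, data = d)
  res <- SNK.test(fit, "factor1", group = TRUE, console = FALSE)
  data.frame(
    factor1   = rownames(res$groups),
    snk_group = as.character(res$groups$groups),
    stringsAsFactors = FALSE
  )
}

# Lookup table: one row per location and factor1 level
snk <- toy %>%
  group_by(location) %>%
  filter(n_distinct(factor1) > 1, n_distinct(factor2) > 1, var(perf) != 0) %>%
  group_modify(~ snk_letters(.x)) %>%
  ungroup()

# Attach the letters to every observation
toy_snk <- toy %>%
  mutate(factor1 = as.character(factor1)) %>%
  left_join(snk, by = c("location", "factor1"))
```

Because the letters describe factor1 levels rather than single observations, every row with the same location and factor1 gets the same letter. The same join could be applied to "df", provided it is keyed on whichever factor1 column survives the earlier join.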