AutoML is advertised as being designed for "business analysts to build machine learning models". In practice, however, it can lead to real problems for anyone who does not have a data science background.
Here is the case:
Use the Power BI sample projects (Supplier Quality Analysis Sample)
Extend the Metrics query by adding a HasDefect column, which we want to use as the label for future predictions. In my case: = Table.AddColumn(#"Added Custom", "HasDefect", each if [Defect Type ID]<=1 then 0 else 1)
Create a dataflow and an entity based on the Metrics query
Proceed to creating an ML model and follow the wizard
For historical outcome field select "HasDefect"
When you reach the "Customize inputs" step, you will see something like:
The problem is that HasDefect is very closely correlated with the DefectType column (the label is derived directly from the defect type ID), so a user without a data science background will "successfully" train a model with 100% accuracy.
Here is what a simple Python visual with Pearson correlation shows: there is an evident correlation between the 2nd and 4th columns.
Below is the code to generate this. As you can see, I tried to handle the "Defect" string column by encoding it as integers so that it could take part in the correlation: the defect text is a 1-to-1 match with the defect ID, which determines the defect type, which in turn determines the "HasDefect" label. However, I did not manage to make the correlation identify this relationship, because of the shuffled order of the encoded values.
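For illustration only (this is not the code from the visual): a minimal pure-Python sketch of why an integer-encoded text column can hide a relationship that Pearson correlation finds easily on the ID column. The rows, the defect names, and the alphabetical encoding are made-up assumptions, not the actual Supplier Quality data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical rows: the defect text matches the defect type ID 1-to-1.
defect_type_id = [1, 2, 3, 1, 2, 3, 1, 2]
defect_text = ["None", "Scratch", "Dent", "None", "Scratch", "Dent", "None", "Scratch"]

# Same rule as the HasDefect column: 0 if the ID is <= 1, else 1.
has_defect = [0 if i <= 1 else 1 for i in defect_type_id]

# Encode the text as integers in alphabetical order (an assumption).
# The resulting codes (Dent=0, None=1, Scratch=2) no longer follow the ID order.
codes = {text: code for code, text in enumerate(sorted(set(defect_text)))}
defect_code = [codes[t] for t in defect_text]

r_id = pearson(defect_type_id, has_defect)
r_text = pearson(defect_code, has_defect)
print(f"Defect Type ID vs HasDefect: {r_id:.2f}")
print(f"Encoded Defect vs HasDefect: {r_text:.2f}")
```

On this toy data the ID column shows a strong linear correlation with the label, while the encoded text column shows a weak one, even though both determine the label completely. Pearson only measures linear association, so the result depends on the arbitrary order in which the codes happen to be assigned.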
My suggestion: please run a correlation check (or improve the existing one) so that features that are too closely correlated with the label are not suggested as inputs. Otherwise, "business analysts" will create models that are not useful, and this would diminish the value of the excellent work done here.
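One possible shape for such a check, sketched in pure Python. This is an assumption about what could work, not a description of how AutoML selects inputs. Instead of linear correlation, it measures how well each feature value alone predicts the label, which does not depend on encoding order. The function names, the thresholds, and the sample columns (including "Plant") are hypothetical.

```python
from collections import Counter, defaultdict

def label_purity(feature, label):
    """Fraction of rows whose label equals the majority label for their feature value."""
    groups = defaultdict(Counter)
    for value, y in zip(feature, label):
        groups[value][y] += 1
    majority_hits = sum(max(counts.values()) for counts in groups.values())
    return majority_hits / len(label)

def leaky_features(columns, label, threshold=0.99, max_distinct_ratio=0.5):
    """Names of columns whose values (almost) fully determine the label."""
    flagged = []
    for name, values in columns.items():
        # Near-unique columns (row IDs, measurements) trivially reach purity 1.0,
        # so they are skipped here; they would need a different test.
        if len(set(values)) > max_distinct_ratio * len(values):
            continue
        if label_purity(values, label) >= threshold:
            flagged.append(name)
    return flagged

# Hypothetical sample, for illustration only.
columns = {
    "Defect Type ID": [1, 2, 3, 1, 2, 3, 1, 2],
    "Defect": ["None", "Scratch", "Dent", "None", "Scratch", "Dent", "None", "Scratch"],
    "Plant": ["A", "B", "A", "B", "A", "B", "A", "B"],
}
has_defect = [0, 1, 1, 0, 1, 1, 0, 1]

flagged = leaky_features(columns, has_defect)
print(flagged)
```

On this sample both the ID column and the text column are flagged, while "Plant" is not. Flagged columns could then be left unselected by default in the "Customize inputs" step, or shown with a warning.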