Showing results for 
Search instead for 
Did you mean: 
Frequent Visitor



I am working with a data sample that contains numeric values on "y" axis (pH results) and text values on "x" axis (sample/batch code) and need to build a linear regression analysis.

So far, I found a website "IterationInsights" that published an article "How to do simple linear regression in Power BI", 

which really helped me to analyse using date, but not sample code (since it is a text). I need to evaluate the trend between 2 samples, independent of when they were collected.

Note that if x is a Date, we have: y = 0.0023x - 99.065, r² = 0.2566

But if x is the sample, we have: y = 0.0297x + 5.5252, r² = 0.3024


How am I able to build a variable table that converts the sample code to a numeric sequence (A1AA/22 = 1, A2AA/22 = 2, ...., A21AA/22 = 21)?
Besides, it's possible to filter this table? Ex: analyse A12AA/22 to A20AA/22 (interval which would automatically be converted to the numbers: A12AA/22 = 1, A13AA/22 = 2, ....,  A20AA/22 = 9, always starting with 1).


Since slope formula is: [n(Σxy) - (Σx)(Σy)] / [n(Σx²) - (Σx)²],

x should be refeered as a sequence of natural numbers representing the samples.

n is the distinct count of the samples.


I suppose that I should build a data var table to convert sample code to a number, and use All selected function to filter the samples I'm working with, but I'm really having a hard time to combine these functions and obtain a functional linear regression equation.

I exemplified the solpe formula because I think the same solution would be applied to intercep, r², etc.


For slope, I used the formula:









--Variables to Solve for:

var y_sum = SUMX(data,[y_values])

var x_sum = SUMX(data,[x_values])

var x2 = SUMX(data,[x_values]^2)

var xy = SUMX(data,[x_values]*[y_values])

var row_count = COUNTROWS(data)


--Solve for formula


var Slope = ((row_count * xy) - (x_sum * y_sum))/((row_count*x2)-((x_sum)^2))


return Slope



Anyone could help me?

Super User
Super User

@pedroluccas So, duplicate your query in Power Query Editor. Remove all rows except the SAMPLE CODE column. Do a Remove Duplicates. Create an Index column (Add Columns | Index). You can then either do a merge to get your index or in a DAX calculated column, add a column to your original table that grabs the Index using RELATED.


You can find sample code here:

Correlation coefficient - Microsoft Power BI Community

If you have seasonality in your data:

De-Seasonalized Correlation Coefficient - Microsoft Power BI Community

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...

Thank you very much!

It worked for Slope and R². But now seems that I have other problem.

When I have a huge amount of samples of different products: A1AA/22 (Product A), B1AA/22 (Product B), C1AA/22 (Product C), etc, it generates a index with huge numbers (10,000 or more).

Other scenario is when I wnat to use just an interval of the samples (sample 5 - 12).

Since Intercept formula is b = [(Σy)*(Σx²) - (Σx) * (Σxy)] / [n * (Σx²) - (Σx)²], and now appears the factor (Σx²) on the numerator, huge numbers seems to affect the calculation.


See below the calculation and graphs





What can I do to make the index change according with my selection and restart the count with the number 1?





Helpful resources

Vote for T-Shirt Design

Power BI T-Shirt Design Challenge 2023

Vote for your favorite t-shirt design now through March 28.

March 2023 Update3

Power BI March 2023 Update

Find out more about the March 2023 update.

March Events 2023A

March 2023 Events

Find out more about the online and in person events happening in March!