Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Bootstrapping - Statistical Method

Hi, I've got a very large set of data (let's say it's about attendence at schools during covid), which I've done some analysis on to determine rates of change of the attendence of students from day to day. I have taken the median rate of change, and I am using the statistical method called 'bootstrapping' to obtain a confidence interval for this. For anyone unfamiliar with bootstrapping, the process is:

1. Randomly sample the data (with replacement) (this is called obtaining a 'bootstrap sample') - I have done this using two calculated columns. One contains a random date which I determined using: 

 

Random Sampling 1 = 
VAR startdate = MIN(Bootstrapping[Date])
VAR enddate = MAX(Bootstrapping[Date])
 
return CALCULATE(RANDBETWEEN(startdate, enddate), ALL(Bootstrapping)) 

 

The next has the corresponding rate of change:

 

Sampled Rate of Change 1 = 
VAR WANTEDDATE = Bootstrapping[Random Sampling 1]
return CALCULATE(AVERAGE(Bootstrapping[Rate of change (Normalised)]), FILTER(ALL(Bootstrapping), Bootstrapping[Date] = WANTEDDATE))

 

This works fine - although I struggled to combine them into one column - whenever I tried, the same piece of data was sampled for each row in the column, rather than a different piece of data.

 

2. You repeat this say 2000 times, and for each bootstrap sample you calculate the median value. This gives a distribution of medians, which can be used to obtain a confidence interval.

 

My ideal output is a column with the medians of each bootstrap sample I take.

 

The issue I have here is that I need to repeat this many, many times. As is, I have 2 columns for each bootstrap sample, which I have to create manually. So that would be 4000+ columns if I do it properly. Anyone have any suggestions on how to cut this down using some functions I may not have come across? Or any other recommendations? I'm quite new to DAX and having trouble adapting to how it works!

 

Thanks a lot.

1 ACCEPTED SOLUTION
v-yuaj-msft
Community Support
Community Support

Hi @Anonymous ,

 

Sorry for that DAX does not currently have the function to repeat a formula in a loop. You can consider using other statistical software such as SAS, SPSS.

 

Hope that's what you were looking for.

Best Regards,

Yuna

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

3 REPLIES 3
benfedit
Advocate I
Advocate I

Hi  @Anonymous 

 

I actually happened to have written a post in my blog about implementing bootstrap sampling in Power BI.

For the "sort of" loops, I used the generateseries function which allows me to create multiples sample with replacements and also to define the size of my sample.

Here is my post:

https://datakuity.com/2021/02/15/bootstrap-analysis-with-power-bi/

 

With regards to the performance, I tested only on a small sample of about 100k rows so I'd be very interested to know how it goes on a large sample.

 

Feel free to reach out if you any questions or want to discuss it 

 

Ben

v-yuaj-msft
Community Support
Community Support

Hi @Anonymous ,

 

Sorry for that DAX does not currently have the function to repeat a formula in a loop. You can consider using other statistical software such as SAS, SPSS.

 

Hope that's what you were looking for.

Best Regards,

Yuna

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

lbendlin
Super User
Super User

DAX has a concept of table variables, accessible via ADDCOLUMNS or SELECTCOLUMNS.  You can use these to create temporary tables for your computation, and then return the results as scalars.

 

Since this is statistics you can also consider running R or python scripts instead of plain DAX.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.