Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
bamba98
Helper I
Helper I

Create table with random samples based on conditions from another table

Hi,

 

I have a table consisiting of approx 50k rows and 15 columns. The file looks, for example, as followes:

IDPeriod IDCategory
12020 Q11
22020 Q21
32020 Q41
42020 Q12
52020 Q22
72020 Q42
92020 Q13
102021 Q11
122020 Q41
112021 Q13
62021 Q12
202020 Q12
182020 Q23
162021 Q13
302020 Q13
402020 Q42
352020 Q43
342021 Q13
382021 Q12
292020 Q22
212021 Q11
222020 Q31
332020 Q32
442020 Q33
82020 Q31
412020 Q23
422020 Q21

From this table, I would like to create random table. The random table must only contain data from the latest quarter and only one 1 row per category. The new table would look as follows:

IDPeriod IDCategory
122021 Q11
302021 Q13
42021 Q12

I found the following expression posted on this forum:

 

 

 

Random Selection = 
VAR SampleSize = 10
VAR TableWithRand = ADDCOLUMNS(Table,"Rand", RAND())
RETURN TOPN(SampleSize, TableWithRand, [Rand])

 

 

 

However, I want the samplesize to be the same size as the number of categories available during the latest quarter. For example:

 

 

 

SampleSize = calculate(distinctcount(Category), allexcept(Table, PeriodID))

 

 

 

How do I add this and that it only selects samples from the latest period to the previous code sample? 

 

 

1 ACCEPTED SOLUTION

Hi @bamba98 

Please try summarize function to get "Period ID" and "Category".

Rand LAST Period = 
VAR _LastPeriod = MAX('Table'[Period ID])
RETURN
ADDCOLUMNS(SUMMARIZE(FILTER('Table','Table'[Period ID] = _LastPeriod),'Table'[Period ID],'Table'[Category]),"RAND",INT(RAND()*100))

Result is as below.

1.png

 

Best Regards,

Rico Zhou

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly. 

View solution in original post

2 REPLIES 2
bamba98
Helper I
Helper I

I've added the following for selecting the correct number of samples:

 

 

Random Selection = 
VAR SampleSize = CALCULATE(DISTINCTCOUNT(Table[Category]), ALLEXCEPT(Table, Table[Period]),FILTER(ALL(Table), Table[Latest Period]=Table[Period]))
VAR TableWithRand = ADDCOLUMNS(Table,"Rand", RAND())
RETURN TOPN(SampleSize, TableWithRand, [Rand])

 

Where:

Latest Period = MAX(Table[Period])

This only gets me the correct number of samples to select. However, it still selects samples from different periods and same categories. 

Hi @bamba98 

Please try summarize function to get "Period ID" and "Category".

Rand LAST Period = 
VAR _LastPeriod = MAX('Table'[Period ID])
RETURN
ADDCOLUMNS(SUMMARIZE(FILTER('Table','Table'[Period ID] = _LastPeriod),'Table'[Period ID],'Table'[Category]),"RAND",INT(RAND()*100))

Result is as below.

1.png

 

Best Regards,

Rico Zhou

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly. 

Helpful resources

Announcements
PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.