Solved: Using Intersect function

sfmike99 · ‎03-03-2021

I have a table of survey data with four fields of interest:

Respondent: unique identifier for each participant
QuestionRoot: fieldname to link data with questions
RowIndex: an index to the specific response chosen for that question
Value: the individual response to that question/response pair (0 or 1)

We have created a dashboard to allow exploration of the data. One question that often comes up however is of the form: "How many people who chose reponse 1 to question A also chose response 1 to question B?"

Here is an example which uses temporary tables and the Intersect function to answer the question: "What % of respondents who collect data on paper also store the data on paper?"

ManualCollectAndStore = 
    VAR ManualCollect = 
        CALCULATETABLE(
          DISTINCT('Raw Data'[Respondent]),
          FILTER('Raw Data',
            'Raw Data'[QuestionRoot] = "CollectMethod" && 'Raw Data'[RowIndex] = 1 && 'Raw Data'[Value] > 0
          )
    )
    VAR ManualStore = 
        CALCULATETABLE(
          DISTINCT('Raw Data'[Respondent]),
          FILTER('Raw Data',
            'Raw Data'[QuestionRoot] = "StorageMethod" && 'Raw Data'[RowIndex] = 1 && 'Raw Data'[Value] > 0
          )
    )
    RETURN
        COUNTROWS(INTERSECT(ManualCollect, ManualStore)) / 
        COUNTROWS(ManualCollect)

(Answer is 43.6% in case you are interested)

Question 1: Make sense? Any suggestions?

Question 2: Is it possible in Power BI to generalize this to the entire dataset? In other words, allow the user to select any two question/response pairs and see the resulting intersect value?

Thanks for any guidance!

sfmike99 · ‎03-07-2021

Armed with the suggestion for disconnected tables and some head scratching time, I think I figured this out.

Solution was to clone the data table (QuestionA and QuestionB) and give each of them separate slicers for the questions and response options. I use DAX to create calculated tables based on the current selection for each, and use Intersect to get the numbers I need. The calculation works like this:

Numerator: everyone who chose selected question/response options for BOTH questions
Denominator: everyone who chose selected question/response for A and ANY response for B question

Here's the DAX:

PctRespAthenB = 
    VAR ThisResponseA = CALCULATETABLE(
        DISTINCT(QuestionA[Respondent]),
        ALLEXCEPT(QuestionA, QuestionA[QuestionRoot], QuestionA[RowIndex])
    )
    VAR ThisResponseB = CALCULATETABLE(
        DISTINCT(QuestionB[Respondent]),
        ALLEXCEPT(QuestionB, QuestionB[QuestionRoot], QuestionB[RowIndex])
    )
    VAR AllResponseB = CALCULATETABLE(
        DISTINCT(QuestionB[Respondent]),
        ALLEXCEPT(QuestionB, QuestionB[QuestionRoot])
    )
RETURN
   COUNTROWS(INTERSECT(ThisResponseA, ThisResponseB)) /
   COUNTROWS(INTERSECT(ThisResponseA, AllResponseB))

And here's how it looks in a rough version of report. I will swap in text for questions & responses later but you get the idea.

I tested this against some crosstab matrices I created before and the numbers lined up. Nifty that you can even select multiple response options to essentially bin the data.

Make sense? Any suggestions?

View solution in original post

Ashish_Mathur · ‎03-03-2021

Hi,

With disconnected tables this should be possible to solve. Share some data to work with.

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

sfmike99 · ‎03-07-2021

Armed with the suggestion for disconnected tables and some head scratching time, I think I figured this out.

Solution was to clone the data table (QuestionA and QuestionB) and give each of them separate slicers for the questions and response options. I use DAX to create calculated tables based on the current selection for each, and use Intersect to get the numbers I need. The calculation works like this:

Numerator: everyone who chose selected question/response options for BOTH questions
Denominator: everyone who chose selected question/response for A and ANY response for B question

Here's the DAX:

PctRespAthenB = 
    VAR ThisResponseA = CALCULATETABLE(
        DISTINCT(QuestionA[Respondent]),
        ALLEXCEPT(QuestionA, QuestionA[QuestionRoot], QuestionA[RowIndex])
    )
    VAR ThisResponseB = CALCULATETABLE(
        DISTINCT(QuestionB[Respondent]),
        ALLEXCEPT(QuestionB, QuestionB[QuestionRoot], QuestionB[RowIndex])
    )
    VAR AllResponseB = CALCULATETABLE(
        DISTINCT(QuestionB[Respondent]),
        ALLEXCEPT(QuestionB, QuestionB[QuestionRoot])
    )
RETURN
   COUNTROWS(INTERSECT(ThisResponseA, ThisResponseB)) /
   COUNTROWS(INTERSECT(ThisResponseA, AllResponseB))

And here's how it looks in a rough version of report. I will swap in text for questions & responses later but you get the idea.

I tested this against some crosstab matrices I created before and the numbers lined up. Nifty that you can even select multiple response options to essentially bin the data.

Make sense? Any suggestions?

v-kelly-msft · ‎03-07-2021

Hi @sfmike99 ,

Could you pls upload your .pbix file if there' s no confidential information?It would be beneficial for test.

Best Regards,
Kelly

Did I answer your question? Mark my post as a solution!

sfmike99 · ‎03-08-2021

Thanks @v-kelly-msft but I think I'm good. This was a really fun project to work on, and an interesting alternative to crosstab / matrices. Did a demo for the client today and they loved it.

Also I had an aha this morning. Instead of duplicating the large unpivoted dataset (3m+ rows) I think I can just duplicate the Question/Response table (already used for reports) and join both to the data. The two slicers should then work to produce two different views of the same underlying data.

LMK if anyone has any questions.

sfmike99 · ‎03-04-2021

Great - here's a sample with four questions and ten respondents. (source table is > 3m rows):

https://1drv.ms/x/s!AvmRZllooYiOnyuQ3NNFa7laXSL9?e=giUbQ9

If I understand your comment, then treating this data as two different tables would enable any intersection to be calculated - letting you have two differents slicers for the same data. But how would that work with the measure?

Thanks for looking at this.

Ashish_Mathur · ‎03-04-2021

Hi,

What is the use of the RowIndex column? Not only do i fund that confusing, i see that it has the same values as those shown in the Value column. Don't we need just the first 3 columns to answer your question?

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

sfmike99 · ‎03-05-2021

Ah sorry - I see where that is confusing.

There are multiple question types in this dataset, of which this is one example. The meaning of the Value field varies according to the question type. It could be a 0/1 or a ranking for example. For simplicity here I just included one type where the Value and RowIndex are the same.

The purpose of RowIndex is a lookup for the specific response option chosen, while value contains their response (which may or may not be the same). For this sample data you can ignore one of them.

Using Intersect function

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly