cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
sfmike99
Advocate II
Advocate II

Using Intersect function

I have a table of survey data with four fields of interest: 

  • Respondent: unique identifier for each participant
  • QuestionRoot: fieldname to link data with questions
  • RowIndex: an index to the specific response chosen for that question
  • Value: the individual response to that question/response pair (0 or 1)

We have created a dashboard to allow exploration of the data. One question that often comes up however is of the form: "How many people who chose reponse 1 to question A also chose response 1 to question B?"

 

Here is an example which uses temporary tables and the Intersect function to answer the question: "What % of respondents who collect data on paper also store the data on paper?"

 

ManualCollectAndStore = 
    VAR ManualCollect = 
        CALCULATETABLE(
          DISTINCT('Raw Data'[Respondent]),
          FILTER('Raw Data',
            'Raw Data'[QuestionRoot] = "CollectMethod" && 'Raw Data'[RowIndex] = 1 && 'Raw Data'[Value] > 0
          )
    )
    VAR ManualStore = 
        CALCULATETABLE(
          DISTINCT('Raw Data'[Respondent]),
          FILTER('Raw Data',
            'Raw Data'[QuestionRoot] = "StorageMethod" && 'Raw Data'[RowIndex] = 1 && 'Raw Data'[Value] > 0
          )
    )
    RETURN
        COUNTROWS(INTERSECT(ManualCollect, ManualStore)) / 
        COUNTROWS(ManualCollect)

 

(Answer is 43.6% in case you are interested)

 

Question 1: Make sense? Any suggestions?

 

Question 2: Is it possible in Power BI to generalize this to the entire dataset? In other words, allow the user to select any two question/response pairs and see the resulting intersect value?

Thanks for any guidance!

 

 

1 ACCEPTED SOLUTION

Armed with the suggestion for disconnected tables and some head scratching time, I think I figured this out.

 

Solution was to clone the data table (QuestionA and QuestionB) and give each of them separate slicers for the questions and response options. I use DAX to create calculated tables based on the current selection for each, and use Intersect to get the numbers I need. The calculation works like this:

  • Numerator: everyone who chose selected question/response options for BOTH questions
  • Denominator: everyone who chose selected question/response for A and ANY response for B question

Here's the DAX: 

PctRespAthenB = 
    VAR ThisResponseA = CALCULATETABLE(
        DISTINCT(QuestionA[Respondent]),
        ALLEXCEPT(QuestionA, QuestionA[QuestionRoot], QuestionA[RowIndex])
    )
    VAR ThisResponseB = CALCULATETABLE(
        DISTINCT(QuestionB[Respondent]),
        ALLEXCEPT(QuestionB, QuestionB[QuestionRoot], QuestionB[RowIndex])
    )
    VAR AllResponseB = CALCULATETABLE(
        DISTINCT(QuestionB[Respondent]),
        ALLEXCEPT(QuestionB, QuestionB[QuestionRoot])
    )
RETURN
   COUNTROWS(INTERSECT(ThisResponseA, ThisResponseB)) /
   COUNTROWS(INTERSECT(ThisResponseA, AllResponseB))

 

And here's how it looks in a rough version of report. I will swap in text for questions & responses later but you get the idea.

 

sfmike99_1-1615158234333.png


I tested this against some crosstab matrices I created before and the numbers lined up. Nifty that you can even select multiple response options to essentially bin the data.

Make sense? Any suggestions?

View solution in original post

7 REPLIES 7
Ashish_Mathur
Super User III
Super User III

Hi,

With disconnected tables this should be possible to solve.  Share some data to work with.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Armed with the suggestion for disconnected tables and some head scratching time, I think I figured this out.

 

Solution was to clone the data table (QuestionA and QuestionB) and give each of them separate slicers for the questions and response options. I use DAX to create calculated tables based on the current selection for each, and use Intersect to get the numbers I need. The calculation works like this:

  • Numerator: everyone who chose selected question/response options for BOTH questions
  • Denominator: everyone who chose selected question/response for A and ANY response for B question

Here's the DAX: 

PctRespAthenB = 
    VAR ThisResponseA = CALCULATETABLE(
        DISTINCT(QuestionA[Respondent]),
        ALLEXCEPT(QuestionA, QuestionA[QuestionRoot], QuestionA[RowIndex])
    )
    VAR ThisResponseB = CALCULATETABLE(
        DISTINCT(QuestionB[Respondent]),
        ALLEXCEPT(QuestionB, QuestionB[QuestionRoot], QuestionB[RowIndex])
    )
    VAR AllResponseB = CALCULATETABLE(
        DISTINCT(QuestionB[Respondent]),
        ALLEXCEPT(QuestionB, QuestionB[QuestionRoot])
    )
RETURN
   COUNTROWS(INTERSECT(ThisResponseA, ThisResponseB)) /
   COUNTROWS(INTERSECT(ThisResponseA, AllResponseB))

 

And here's how it looks in a rough version of report. I will swap in text for questions & responses later but you get the idea.

 

sfmike99_1-1615158234333.png


I tested this against some crosstab matrices I created before and the numbers lined up. Nifty that you can even select multiple response options to essentially bin the data.

Make sense? Any suggestions?

View solution in original post

Hi  @sfmike99 ,

 

Could you pls upload your .pbix file if there' s no confidential information?It would be beneficial for test.

 

Best Regards,
Kelly

Did I answer your question? Mark my post as a solution!

 

Thanks @v-kelly-msft but I think I'm good. This was a really fun project to work on, and an interesting alternative to crosstab / matrices. Did a demo for the client today and they loved it. 

Also I had an aha this morning. Instead of duplicating the large unpivoted dataset (3m+ rows) I think I can just duplicate the Question/Response table (already used for reports) and join both to the data. The two slicers should then work to produce two different views of the same underlying data.

LMK if anyone has any questions.

Great - here's a sample with four questions and ten respondents. (source table is > 3m rows):

https://1drv.ms/x/s!AvmRZllooYiOnyuQ3NNFa7laXSL9?e=giUbQ9

 

If I understand your comment, then treating this data as two different tables would enable any intersection to be calculated - letting you have two differents slicers for the same data. But how would that work with the measure?

 

Thanks for looking at this.

Hi,

What is the use of the RowIndex column?  Not only do i fund that confusing, i see that it has the same values as those shown in the Value column.  Don't we need just the first 3 columns to answer your question?


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Ah sorry - I see where that is confusing. 

 

There are multiple question types in this dataset, of which this is one example. The meaning of the Value field varies according to the question type. It could be a 0/1 or a ranking for example. For simplicity here I just included one type where the Value and RowIndex are the same. 

 

The purpose of RowIndex is a lookup for the specific response option chosen, while value contains their response (which may or may not be the same). For this sample data you can ignore one of them. 

Helpful resources

Announcements
PBI User Groups

Welcome to the User Group Public Preview

Check out new user group experience and if you are a leader please create your group!

MBAS on Demand

Microsoft Business Applications Summit sessions

On-demand access to all the great content presented by the product teams and community members! #MSBizAppsSummit #CommunityRocks

MBAS Attendee Badge

Claim Your Badge & Digital Swag!

Check out how to claim yours today!

Top Solution Authors
Top Kudoed Authors