Re: Best way to handle cohort type queries?

Anonymous · ‎07-17-2018

Hi, we're a medical center, and it's extremely common for us to write a query that selects a cohort of patients, and then multiple queries against multiple fact tables pulling related information for that cohort. I'm uncertain what the best way to do this in Power BI is. When we use SQL, we create temp tables with the cohort and/or use common table expressions.

I'm trying to figure out the "best" way to accomplish this in Power BI. My options seem to be:

1. Write SQL for each of the fact table pulls, adding the cohort selection logic to each one

Pros: Relatively easy to do

Cons: Reruns the cohort query over and over. If cohort logic needs changed, creates multiple points where change must be made

2. Pull the cohort data into a Power BI table, and then use Power Query "merge" to join that to each of the fact table queries

Pros: Seems to follow the Power BI GUI better, easier to understand

Cons: SQL doesn't seem to get pushed down, this performs like the entire set of data is returned and only then filtered to the matching rows

Is there a better way to do this?

Thanks,

Scott

Greg_Deckler · ‎07-17-2018

Haven't done this in M but I did build a Patient Cohort Quick Measure in DAX here:

https://community.powerbi.com/t5/Quick-Measures-Gallery/Patient-Cohort/m-p/391883

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Anonymous · ‎07-17-2018

Hi Greg - that's excellent work, but it's not quite the problem I'm facing. I'm not struggling with "how do I allow users to select cohorts using AND vs. OR logic", it's "given an initial cohort type SQL query, how do I run 5 other SQL queries against fact tables using the cohort to filter the rows returned."

My bad - I definitely didn't make that clear in my original email.

I do love the work you did, I've been trying to figure out how to switch slicer AND vs. OR logic for a while, looks like you've already got this solved. Thanks!

Scott

Greg_Deckler · ‎07-18-2018

Hmm, so, you get an initial list of ID's lets say from some initial SQL query and you want to then use that list of ID's to run 5 other queries that essentially bring back additional information about those ID's. So, in simple terms:

Initial Query returns: 204, 300, 405

Query 1 would essentially be: "SELECT * FROM table WHERE ID = 204 OR ID = 300 OR ID = 405"

Is that along the lines of what you are trying to do? I'm going to invoke @ImkeF because she is a genius with M.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

ImkeF · ‎07-18-2018

True, most techniques will not fold down to the SQL-server. But this one does: https://community.powerbi.com/t5/Desktop/Parameterized-SQL-Query-with-query-folding/td-p/171503

Pls let me know if you need help how to implement it.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous · ‎07-18-2018

Hi lmkeF, that's exactly the solution we ended up using, but I'm very concerned about whether it will work once the cohorts get large. I'll have to find out how many values SQL Server can accept with an "in" clause - in Oracle it was only a few thousand before it started throwing errors.

Thanks!!!

Scott

ImkeF · ‎07-18-2018

Hi Scott,

If it gets too large, you can try this method: https://www.thebiccountant.com/2017/07/20/sql-query-folding-bug-still-alive-sucking-powerbi-powerque... (but I haven't used it for a while, so not sure if it still works)

Cheers, Imke

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous · ‎07-18-2018

Thank you very much for the leads! I'll also continue playing with query folding, I see inconsistent behavior when trying to use "merge" in Power Query and choosing inner vs. left vs. right vs. outer. Sometimes it seems like it is pushing down properly...but maybe that's my imagination.

Thank you!

Scott

ImkeF · ‎07-18-2018

Disabling privacy settings and background refresh are also worth trying out: https://www.thebiccountant.com/speedperformance-aspects/

If you cannot run SQL-profiler, check out Process Monitor to see whats actually going on: https://t.co/OUs7beEfgU

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous · ‎07-18-2018

Hi Greg - yep - that's it exactly. When we write it in SQL it looks like:

with

cohort as

( select blah from tables where some conditions are true...),

procedures as

(select blah from procedures inner join cohort using encounter ID),

diagnoses as

(select blah from diagnoses inner join cohort using encounter ID)

...

etc.

Unfortunately my goal in this isn't really to "be a genius with M" - I'm looking for a general solution that I can teach our 150+ and growing power users. I really feels like using "merge queries" in Power Query should work...but it doesn't seem like the SQL gets pushed down correctly, resulting in (for this example) all rows for procedures and diagnoses getting loaded, and THEN filtered down to the cohorts. Obviously what I'd prefer is for only the matching results to come back.

One other solution that did work, but is very ineligant and I suspect very liable to break, was to write the cohort to a "list" instead of a table in Power Query. Then, we were able to take the list and dynamically build a "where" clause in custom SQL and M to pass those values in. But this isn't easy to understand, isn't easy to teach to non-technical folk, and I strongly suspect would break once the "where" clause gets too long (not sure how long it can be in MS SQL Server, but oracle used to limit to just a few thousand values - and a diabetes cohort for example could have WAY more than that...)

Appreciate the help - thank you very much Greg!

Scott

Best way to handle cohort type queries?

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly