Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi, we're a medical center, and it's extremely common for us to write a query that selects a cohort of patients, and then multiple queries against multiple fact tables pulling related information for that cohort. I'm uncertain what the best way to do this in Power BI is. When we use SQL, we create temp tables with the cohort and/or use common table expressions.
I'm trying to figure out the "best" way to accomplish this in Power BI. My options seem to be:
1. Write SQL for each of the fact table pulls, adding the cohort selection logic to each one
Pros: Relatively easy to do
Cons: Reruns the cohort query over and over. If cohort logic needs changed, creates multiple points where change must be made
2. Pull the cohort data into a Power BI table, and then use Power Query "merge" to join that to each of the fact table queries
Pros: Seems to follow the Power BI GUI better, easier to understand
Cons: SQL doesn't seem to get pushed down, this performs like the entire set of data is returned and only then filtered to the matching rows
Is there a better way to do this?
Thanks,
Scott
Haven't done this in M but I did build a Patient Cohort Quick Measure in DAX here:
https://community.powerbi.com/t5/Quick-Measures-Gallery/Patient-Cohort/m-p/391883
Hi Greg - that's excellent work, but it's not quite the problem I'm facing. I'm not struggling with "how do I allow users to select cohorts using AND vs. OR logic", it's "given an initial cohort type SQL query, how do I run 5 other SQL queries against fact tables using the cohort to filter the rows returned."
My bad - I definitely didn't make that clear in my original email.
I do love the work you did, I've been trying to figure out how to switch slicer AND vs. OR logic for a while, looks like you've already got this solved. Thanks!
Scott
Hmm, so, you get an initial list of ID's lets say from some initial SQL query and you want to then use that list of ID's to run 5 other queries that essentially bring back additional information about those ID's. So, in simple terms:
Initial Query returns: 204, 300, 405
Query 1 would essentially be: "SELECT * FROM table WHERE ID = 204 OR ID = 300 OR ID = 405"
Is that along the lines of what you are trying to do? I'm going to invoke @ImkeF because she is a genius with M.
True, most techniques will not fold down to the SQL-server. But this one does: https://community.powerbi.com/t5/Desktop/Parameterized-SQL-Query-with-query-folding/td-p/171503
Pls let me know if you need help how to implement it.
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Hi lmkeF, that's exactly the solution we ended up using, but I'm very concerned about whether it will work once the cohorts get large. I'll have to find out how many values SQL Server can accept with an "in" clause - in Oracle it was only a few thousand before it started throwing errors.
Thanks!!!
Scott
Hi Scott,
If it gets too large, you can try this method: https://www.thebiccountant.com/2017/07/20/sql-query-folding-bug-still-alive-sucking-powerbi-powerque... (but I haven't used it for a while, so not sure if it still works)
Cheers, Imke
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Thank you very much for the leads! I'll also continue playing with query folding, I see inconsistent behavior when trying to use "merge" in Power Query and choosing inner vs. left vs. right vs. outer. Sometimes it seems like it is pushing down properly...but maybe that's my imagination.
Thank you!
Scott
Disabling privacy settings and background refresh are also worth trying out: https://www.thebiccountant.com/speedperformance-aspects/
If you cannot run SQL-profiler, check out Process Monitor to see whats actually going on: https://t.co/OUs7beEfgU
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Hi Greg - yep - that's it exactly. When we write it in SQL it looks like:
with
cohort as
( select blah from tables where some conditions are true...),
procedures as
(select blah from procedures inner join cohort using encounter ID),
diagnoses as
(select blah from diagnoses inner join cohort using encounter ID)
...
etc.
Unfortunately my goal in this isn't really to "be a genius with M" - I'm looking for a general solution that I can teach our 150+ and growing power users. I really feels like using "merge queries" in Power Query should work...but it doesn't seem like the SQL gets pushed down correctly, resulting in (for this example) all rows for procedures and diagnoses getting loaded, and THEN filtered down to the cohorts. Obviously what I'd prefer is for only the matching results to come back.
One other solution that did work, but is very ineligant and I suspect very liable to break, was to write the cohort to a "list" instead of a table in Power Query. Then, we were able to take the list and dynamically build a "where" clause in custom SQL and M to pass those values in. But this isn't easy to understand, isn't easy to teach to non-technical folk, and I strongly suspect would break once the "where" clause gets too long (not sure how long it can be in MS SQL Server, but oracle used to limit to just a few thousand values - and a diabetes cohort for example could have WAY more than that...)
Appreciate the help - thank you very much Greg!
Scott
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
114 | |
97 | |
85 | |
70 | |
61 |
User | Count |
---|---|
151 | |
121 | |
104 | |
87 | |
67 |