Solved: Redshift Direct Query - Using (List) Parameters

MarkEvansHW · ‎02-10-2022

Have a client with request to be able to select multiple values in a filter (any or all from a list of 12) and have the aggregated totals respond accordingly. The setup is Redshift with Direct Query.

Was going down the path of using parameters bound to a single column table to pass a list of values to match against, but when I try to test this, any condition I use to filter from the parameter list queries the entire table rather than the aggregated results that are the output of the query. That isn't viable with the size of these tables. Simple conditions on a literal value (i.e. [fieldname]="Mark") seem to work correctly and query the aggregated results.

So far I've tried using List.Contains(parameterList, [fieldname]) and converting the parameter list to a single string and using Text.Contains(parameterString, [fieldname]). As I said, once I add this, it appears to be querying the entire table to complete the aggregation within Power BI.

I also looked at constructing the full sql text but it looks like that isn't an option for Redshift. Thanks in advance for any help.

MarkEvansHW · ‎02-14-2022

Alexis - Thanks so much for your help, but when I tried using your method as a template for my situation it generated a query at the detail level. That is, it tried to import the full seven million row result into memory as part of its execution.

However, I think I did find a solution: by using a fully contructed raw sql string in a native query, I was able to insert the parameter values directly into the sql string itself so that it executed remotely on Redshift and retrieved the pre-aggregated results. So far this works locally in Desktop and in the workspace using a data gateway connection to Redshift.

I know this doesn't map to your example, but here's how my solution works:

let
programList = 
if 
Type.Is(Value.Type(param_programid), List.Type) then 
let

// convert parameter to comma delimited text string with values wrapped in quotes for 'IN' expression

// conversion logic if multiple values (i.e. is a list)

AddSingleQuotes = List.Transform(param_programid, each "'" & _ & "'"),
DelimitedList = Text.Combine(AddSingleQuotes, ",") in DelimitedList


// add single quotes if parameter is a single value

else "'" & param_programid& "'",
Source = Value.NativeQuery(AmazonRedshift.Database("xxxxxxxxxxxxx:5439","dev"), 
"select number_of_programs_participated, count(*) as user_count from (select userid, count(distinct programid) as number_of_programs_participated#(lf)from source_data_table #(lf)
where programid in (" & programList & ") #(lf)group by userid) pc #(lf)group by number_of_programs_participated", null, [EnableFolding=true])
in
Source

View solution in original post

AlexisOlson · ‎02-11-2022

I think I have a working example that's analogous to your question and connects the sample database mentioned in this Power BI Community Blog post.

I did all of the grouping logic in the query editor and let it fold that back into a native query to be sent back to the source. The parameter selection is inserted as a where-clause into the native query that gets sent back to the source. Please see attached.

Thanks to Chris Webb for help with the single/multiple value parameter handling:

https://blog.crossjoin.co.uk/2020/11/08/handling-multi-select-in-power-bi-dynamic-m-parameters/

MarkEvansHW · ‎02-14-2022

Alexis - Thanks so much for your help, but when I tried using your method as a template for my situation it generated a query at the detail level. That is, it tried to import the full seven million row result into memory as part of its execution.

However, I think I did find a solution: by using a fully contructed raw sql string in a native query, I was able to insert the parameter values directly into the sql string itself so that it executed remotely on Redshift and retrieved the pre-aggregated results. So far this works locally in Desktop and in the workspace using a data gateway connection to Redshift.

I know this doesn't map to your example, but here's how my solution works:

let
programList = 
if 
Type.Is(Value.Type(param_programid), List.Type) then 
let

// convert parameter to comma delimited text string with values wrapped in quotes for 'IN' expression

// conversion logic if multiple values (i.e. is a list)

AddSingleQuotes = List.Transform(param_programid, each "'" & _ & "'"),
DelimitedList = Text.Combine(AddSingleQuotes, ",") in DelimitedList


// add single quotes if parameter is a single value

else "'" & param_programid& "'",
Source = Value.NativeQuery(AmazonRedshift.Database("xxxxxxxxxxxxx:5439","dev"), 
"select number_of_programs_participated, count(*) as user_count from (select userid, count(distinct programid) as number_of_programs_participated#(lf)from source_data_table #(lf)
where programid in (" & programList & ") #(lf)group by userid) pc #(lf)group by number_of_programs_participated", null, [EnableFolding=true])
in
Source

MarkEvansHW · ‎02-10-2022

Just adding the current results are delivered via Direct Query by a VIEW in Redshift and are not (obviously) at the detailed level. It currently delivers the results for all programs.

AlexisOlson · ‎02-10-2022

I'm not quite sure I understand the setup. You mention "aggregated results that are the output of the query" but DirectQuery doesn't store aggregates; it pulls targeted subsets of the entire table as needed. I don't think you need to do any parameter binding to do basic filtering. DirectQuery should handle this automatically.

MarkEvansHW · ‎02-10-2022

Thanks - I realized I haven't explained it very well. The results are not a straight aggregation (sum, count, etc) of the detail.

The source is a table like this:

userid	programid
1	A
1	B
2	B
3	A
3	B
3	C
4	A

The end result is a table that looks like this:

user_count	number_of_programs_participated
1	3
1	2
2	1

Where the result shows count of distinct users by number of programs those users participated in. In the example above, there are three programs. In reality, there are 12 programs and they want to be able to select any combination of the 12 and get the results. The underlying query looks like:

select count(*) as user_count, number_of_programs_participated

from (

select userid, count(distinct programid) as number_of_programs_participated

from source_data_table

group by userid

) as pc

group by number_of_programs_participated

I hope this helps. The idea would be to be able to pass the equivalent of a programid "in ('A', 'B')" to the WHERE clause of the query based on the programs selected in the Power BI filter.

AlexisOlson · ‎02-10-2022

Given this logic, you definitely have to go back to the original table since, as you mention, results are not simple aggregations that can be filtered by [programid].

Note that unless your view is actually materialized on your database, you're always going back to the original table anyway (usually views are just queries, not actual tables). Supposing your view is materialized, it would need to be recomputed for different combinations of [programid] (due to the non-additive nature), so you can't get away from hitting the original table unless you pre-compute and store all 2^12 = 4096 combinations of selections for the 12 programs.

MarkEvansHW · ‎02-10-2022

The view isn't materialized, but regardless you're totally right that this will need to query from the original tables. This would require replacing the view entirely with a dynamic query with an IN expression in the WHERE clause. I was trying to write an M query to do that and it did work until I tried to build in the parameter within Power BI.

My non-parameterized M query correctly queried back just the aggregated totals using a group by. Once I tried to add parameters, it started querying back the entire table.

AlexisOlson · ‎02-10-2022

Let's back up a bit. The number of unique user/program combinations shouldn't be more than a few times larger than the number of unique users. Why is such a table too large to handle? How many users are we talking about?

MarkEvansHW · ‎02-11-2022

Hey Alexis - Total user-program combinations are just at seven million. We could import this table for this purpose, if that's what you're getting at, but we were hoping to avoid that for this one request.

Even importing it, I've been unclear on how to summarize an imported table in this way but let's set that to the side right now as it is a different question.

Redshift Direct Query - Using (List) Parameters

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024