cancel
Showing results for
Did you mean:

## Combinatorics in Power Query -part1

Combinatorics is one of the most important areas of data analysis. It helps in painting a meaningful picture from tons of data very quickly.

We are often faced with a situation when we need to quickly detemine which combination of factors are actually driving the subtotal. In order to know that, it is important to generate all the subsets of a set and perform aggregation by those combinations in order to answer that question.

It is possible to generate all possible combinations of items in PowerBI using Power query and this post focuses on how to do that.

To elaborate, let's suppose there is a list as following

let
Source = {"powerBI","powerQuery","DAX"}
in
Source

and the end goal is to generate all possible subsets of this set which would be following

{},

{"powerBI"},{"powerQuery"},{"DAX"},

{"powerBI","powerQuery"},{"powerQuery","DAX"},{"powerBI","DAX"},

{"powerBI","powerQuery","DAX"}

The following code generates all possible subsets of this set.

let
Source = {"powerBI","powerQuery","DAX"}
p1 = List.Transform({1 .. Number.Power(2, List.Count(Source))}, each _ - 1),
#"Converted to Table" = Table.FromList(p1, Splitter.SplitByNothing(), {"Value"}),
#"Converted to Table",
"Custom",
each
let
Loop = List.Generate(
() =>
[i = [Value], j = Number.IntegerDivide(i, 2), k = Number.Mod(i, 2), l = Text.From(k)],
each [i] > 0,
each [
i = [j],
j = Number.IntegerDivide(i, 2),
k = Number.Mod(i, 2),
l = Text.From(k) & [l]
],
each [l]
),
y = try Loop{List.Count(Loop) - 1} otherwise "0",
in
z
),
"Custom.2",
each
let
x = [Custom.1],
Terminate = List.Count(x),
Loop = List.Generate(
() => [i = 0, j = x{i}, k = if j = "1" then Source{i} else null],
each [i] < Terminate,
each [i = [i] + 1, j = x{i}, k = if j = "1" then Source{i} else null],
each [k]
)
in
List.RemoveNulls(Loop)
),
"Combinations",
each
let
x = [Custom.2],
y = List.Generate(
() => [i = 0, j = x{i}, k = j],
each [i] < List.Count(x),
each [i = [i] + 1, j = x{i}, k = [k] & "," & j],
each [k]
)
in
try y{List.Count(y) - 1} otherwise null
),
in
Combinations

The code starts from here

and generates this

An optimized version of the above code is following

let
Source = {"powerBI","powerQuery","DAX"},
Initiator={{}},
Loop = List.Generate(
()=>[i=0,j=Source{i},k=List.Combine({Initiator{i},{j}}),l=List.InsertRange(Initiator,List.Count(Initiator),{k})],
each[i]<List.Count(Source),
each[i=[i]+1,j=Source{i},k=[l],l=
let x = List.Generate(
()=>[a=0,b=k{a},c=List.Combine({b,{j}}),d=List.Combine({k,{c}})],
each [a]<List.Count(k),
each [a=[a]+1,b=k{a},c=List.Combine({b,{j}}),d=List.Combine({[d],{c}})],
each[d]  ) in x{List.Count(x)-1}],
each [l]

)
in
List.Transform(Loop{List.Count(Loop)-1},each Text.Combine(_,","))

A parameterized version with the choice a different delimiter is avaialble here

In my next post, I will show how this concept applies to a real-life scenario.

Here's another approach:

let
Source = {"powerBI", "powerQuery", "DAX"},
N = List.Count(Source),
Subsets =
List.Transform(
{0..Number.Power(2, N)-1},
(i) => List.Transform(
{0..N-1},
(j) => if Number.Mod(Number.IntegerDivide(i, Number.Power(2, j)), 2) = 1
then Source{j}
else null
)
),
Concatenate = List.Transform(Subsets, each Text.Combine(List.RemoveNulls(_), ","))
in
Concatenate

The way it works:

A set with N elements has 2^N possible subsets which can be represented with N bits.

For this set of three values, there are 2^3 = 8 subsets that can be represented as

0 = 000 = {}
1 = 001 = {"DAX"}
2 = 010 = {"powerQuery"}
3 = 011 = {"powerQuery", "DAX"}
4 = 100 = {"powerBI"}
5 = 101 = {"powerBI", "DAX"}
6 = 110 = {"powerBI", "powerQuery"}
7 = 111 = {"powerBI", "powerQuery", "DAX"}

My code starts with the list {0..2^N-1} = {0,1,2,3,4,5,6,7} and transforms each of the elements in this list to into a subset of {"powerBI", "powerQuery", "DAX"} according to the representation above.

For each of these integers 0-7 (index = i), I calculate each of the digits of the binary number and return the corresponding element of the original set. For three elements, there are three digits (index = j).

Let's look at i = 6 as an example. The innermost transform is on the list of binary digit indices, {0,1,2}.

The 0th digit is mod( 6 / (2^0), 2 ) = mod ( 6, 2 ) = 0, so I insert null instead of an set element.

The 1st digit is mod( 6 / (2^1), 2 ) = mod ( 3, 2 ) = 1, so I include the 1st element of the original set.
The 2nd digit is mod( 6 / (2^2), 2 ) = mod ( 1, 2 ) = 1, so I include the 2nd element of the original set.

Note: In the expression mod( i / (2^j), 2), the "/" represents integer division in the above and the digits indices correspond to the powers of 2, which grow from right-to-left. Hence {0,1,2} --> {0,1,1} above corresponds to 110 (binary) = 6 (decimal).

Thus for the element 6, I transform the index list {0,1,2} according to the binary digits of 6 and return  the list {null, "powerBI", "powerQuery"}. The final step is to remove nulls and concatenate the remaining elements to get "powerBI,powerQuery".

@AlexisOlson  is there any way this can be translated to DAX? If the combinations can be generated as per the filter context, it would be simply insane. I tried but I could not make it to work.

@smpa01 It can be done quite similarly in DAX.

All Subsets =
VAR WordList = VALUES ( Words[Word] )
VAR N = COUNTROWS ( WordList )
VAR RankedList =
ADDCOLUMNS ( WordList, "j", RANKX ( WordList, [Word],, ASC ) )
VAR Subsets =
SELECTCOLUMNS ( GENERATESERIES ( 0, 2 ^ N - 1 ), "i", [Value] )
RETURN
CONCATENATEX (
Subsets,
CONCATENATEX (
FILTER (
RankedList,
MOD ( INT ( [i] / 2 ^ ( [j] - 1 ) ), 2 ) = 1
),
[Word],
", "
),
UNICHAR ( 10 )
)

Here's what this looks like as a Christmas tree in a card visual:

@AlexisOlson  this is epic. I need to study this and make it suitable for my case.  But thank you very much for this.

This is awesome !!!  I might come back with some follow-up questions to you.

This method seams rather convoluted to generate all possible Combinations, IMO.

Surely, there must be a simple way to generate all possible Combinations of n elements in k-size sequences.

(no repetition, so n>=k)

Also, I think we can restrict the study to just whole numbers.

(we can always have a dimension table at the side to translate those whole numbers to whtatever we want.)

Top Kudoed Posts
Latest Articles
Archives