Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

smpa01

Combinatorics in Power Query -part1

Combinatorics is one of the most important areas of data analysis. It helps in painting a meaningful picture from tons of data very quickly.

We are often faced with a situation when we need to quickly detemine which combination of factors are actually driving the subtotal. In order to know that, it is important to generate all the subsets of a set and perform aggregation by those combinations in order to answer that question.

It is possible to generate all possible combinations of items in PowerBI using Power query and this post focuses on how to do that. 

To elaborate, let's suppose there is a list as following

 

let 
   Source = {"powerBI","powerQuery","DAX"}
in
  Source

 

and the end goal is to generate all possible subsets of this set which would be following

 

{},

{"powerBI"},{"powerQuery"},{"DAX"},

{"powerBI","powerQuery"},{"powerQuery","DAX"},{"powerBI","DAX"},

{"powerBI","powerQuery","DAX"}

 

The following code generates all possible subsets of this set.

 

let
Source = {"powerBI","powerQuery","DAX"}
  p1 = List.Transform({1 .. Number.Power(2, List.Count(Source))}, each _ - 1), 
  #"Converted to Table" = Table.FromList(p1, Splitter.SplitByNothing(), {"Value"}), 
  #"Added Custom" = Table.AddColumn(
    #"Converted to Table", 
    "Custom", 
    each 
      let
        Loop = List.Generate(
          () =>
            [i = [Value], j = Number.IntegerDivide(i, 2), k = Number.Mod(i, 2), l = Text.From(k)], 
          each [i] > 0, 
          each [
            i = [j], 
            j = Number.IntegerDivide(i, 2), 
            k = Number.Mod(i, 2), 
            l = Text.From(k) & [l]
          ], 
          each [l]
        ), 
        y = try Loop{List.Count(Loop) - 1} otherwise "0", 
        z = Text.PadStart(y, List.Count(Source), "0")
      in
        z
  ), 
  #"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each Text.ToList([Custom])), 
  #"Added Custom2" = Table.AddColumn(
    #"Added Custom1", 
    "Custom.2", 
    each 
      let
        x = [Custom.1], 
        Terminate = List.Count(x), 
        Loop = List.Generate(
          () => [i = 0, j = x{i}, k = if j = "1" then Source{i} else null], 
          each [i] < Terminate, 
          each [i = [i] + 1, j = x{i}, k = if j = "1" then Source{i} else null], 
          each [k]
        )
      in
        List.RemoveNulls(Loop)
  ), 
  #"Added Custom3" = Table.AddColumn(
    #"Added Custom2", 
    "Combinations", 
    each 
      let
        x = [Custom.2], 
        y = List.Generate(
          () => [i = 0, j = x{i}, k = j], 
          each [i] < List.Count(x), 
          each [i = [i] + 1, j = x{i}, k = [k] & "," & j], 
          each [k]
        )
      in
        try y{List.Count(y) - 1} otherwise null
  ), 
  Combinations = #"Added Custom3"[Combinations]
in
  Combinations

 

 

The code starts from here

smpa01_0-1634763818622.png

and generates this

smpa01_1-1634763910168.png

An optimized version of the above code is following

 

 

 

let
    Source = {"powerBI","powerQuery","DAX"},
    Initiator={{}},
    Loop = List.Generate(
    ()=>[i=0,j=Source{i},k=List.Combine({Initiator{i},{j}}),l=List.InsertRange(Initiator,List.Count(Initiator),{k})],
    each[i]<List.Count(Source),
    each[i=[i]+1,j=Source{i},k=[l],l=
                         let x = List.Generate(
                                ()=>[a=0,b=k{a},c=List.Combine({b,{j}}),d=List.Combine({k,{c}})],
                                each [a]<List.Count(k),
                                each [a=[a]+1,b=k{a},c=List.Combine({b,{j}}),d=List.Combine({[d],{c}})],
                                each[d]  ) in x{List.Count(x)-1}],
                                each [l]
                                
    )
in
    List.Transform(Loop{List.Count(Loop)-1},each Text.Combine(_,","))

 

 

 

A parameterized version with the choice a different delimiter is avaialble here 

In my next post, I will show how this concept applies to a real-life scenario.

Comments

Here's another approach:

let
  Source = {"powerBI", "powerQuery", "DAX"},
  N = List.Count(Source),
  Subsets =
      List.Transform(
          {0..Number.Power(2, N)-1},
          (i) => List.Transform(
                     {0..N-1},
                     (j) => if Number.Mod(Number.IntegerDivide(i, Number.Power(2, j)), 2) = 1
                            then Source{j}
                            else null
                 )
      ),
  Concatenate = List.Transform(Subsets, each Text.Combine(List.RemoveNulls(_), ","))
in
  Concatenate

 

The way it works:

A set with N elements has 2^N possible subsets which can be represented with N bits.

 

For this set of three values, there are 2^3 = 8 subsets that can be represented as

0 = 000 = {}
1 = 001 = {"DAX"}
2 = 010 = {"powerQuery"}
3 = 011 = {"powerQuery", "DAX"}
4 = 100 = {"powerBI"}
5 = 101 = {"powerBI", "DAX"}
6 = 110 = {"powerBI", "powerQuery"}
7 = 111 = {"powerBI", "powerQuery", "DAX"}

My code starts with the list {0..2^N-1} = {0,1,2,3,4,5,6,7} and transforms each of the elements in this list to into a subset of {"powerBI", "powerQuery", "DAX"} according to the representation above.

 

For each of these integers 0-7 (index = i), I calculate each of the digits of the binary number and return the corresponding element of the original set. For three elements, there are three digits (index = j).

 

Let's look at i = 6 as an example. The innermost transform is on the list of binary digit indices, {0,1,2}.

The 0th digit is mod( 6 / (2^0), 2 ) = mod ( 6, 2 ) = 0, so I insert null instead of an set element.

The 1st digit is mod( 6 / (2^1), 2 ) = mod ( 3, 2 ) = 1, so I include the 1st element of the original set.
The 2nd digit is mod( 6 / (2^2), 2 ) = mod ( 1, 2 ) = 1, so I include the 2nd element of the original set.

 

Note: In the expression mod( i / (2^j), 2), the "/" represents integer division in the above and the digits indices correspond to the powers of 2, which grow from right-to-left. Hence {0,1,2} --> {0,1,1} above corresponds to 110 (binary) = 6 (decimal).

 

Thus for the element 6, I transform the index list {0,1,2} according to the binary digits of 6 and return  the list {null, "powerBI", "powerQuery"}. The final step is to remove nulls and concatenate the remaining elements to get "powerBI,powerQuery".

@AlexisOlson  is there any way this can be translated to DAX? If the combinations can be generated as per the filter context, it would be simply insane. I tried but I could not make it to work. 

@smpa01 It can be done quite similarly in DAX.

All Subsets =
VAR WordList = VALUES ( Words[Word] )
VAR N = COUNTROWS ( WordList )
VAR RankedList =
    ADDCOLUMNS ( WordList, "j", RANKX ( WordList, [Word],, ASC ) )
VAR Subsets =
    SELECTCOLUMNS ( GENERATESERIES ( 0, 2 ^ N - 1 ), "i", [Value] )
RETURN
    CONCATENATEX (
        Subsets,
        CONCATENATEX (
            FILTER (
                RankedList,
                MOD ( INT ( [i] / 2 ^ ( [j] - 1 ) ), 2 ) = 1
            ),
            [Word],
            ", "
        ),
        UNICHAR ( 10 )
    )

 

Here's what this looks like as a Christmas tree in a card visual:

AlexisOlson_0-1642102983999.png

@AlexisOlson  this is epic. I need to study this and make it suitable for my case.  But thank you very much for this. 

 

This is awesome !!!  I might come back with some follow-up questions to you.

This method seams rather convoluted to generate all possible Combinations, IMO.

Surely, there must be a simple way to generate all possible Combinations of n elements in k-size sequences.

(no repetition, so n>=k)

Also, I think we can restrict the study to just whole numbers.

(we can always have a dimension table at the side to translate those whole numbers to whtatever we want.)