Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Nolock

Transitive Closure in Power Query

In the last few months I have been answering dozens of users’ questions. And there were at least three or four of you with different use cases but the same nominator: I have a table with two columns From and To. And for every distinct value of the column From I want to see all nodes which I can reach by traversing the edges From --> To. This problem has a name: transitive closure.

 

What is a transitive closure?

The omniscient Wikipedia says to transitive closure: “In mathematics, the transitive closure of a binary relation R on a set X is the smallest relation on X that contains R and is transitive.
For example, if X is a set of airports and xRy means "there is a direct flight from airport x to airport y" (for x and y in X), then the transitive closure of R on X is the relation R+ such that x R+ y means "it is possible to fly from x to y in one or more flights". Informally, the transitive closure gives you the set of all places you can get to from any starting place.”

A picture is worth a thousand words:

 

1.png

 

What we want to achieve is shown on the following image:

23.png

 

The code

Let’s start with our data sample from the previous illustration. It is a table with two columns From and To. Every row is a directed edge from From to To.

It will be recursion, again, I know, but I don’t know any other way to solve this problem in PowerQuery. But before we come to a recursive function, let us create key-value pairs where key is a node and value is a list of all nodes, which you can reach directly from the key. The result won’t be a table but a record instead. (Why do I do that? It will be much faster – you can read more about this performance boost in my recent post Lookup Table in Power Query.)

 

// get all direct descendants as a record (key-value pairs From ==> To)
DirectDescendantsRecord = List.Accumulate(
    FromDistinctList,
    [],
    (state, current) =>
        state & 
        Expression.Evaluate(
            "[" & current & "= Table.Column(Table.SelectRows(BufferedTable, each [From] = current), ""To"")]", 
            [Table.Column = Table.Column, Table.SelectRows = Table.SelectRows, BufferedTable = BufferedTable, current = current]
        )
),

 

This fast access to all direct descendants of a node will be very helpful in the next step, in the recursive function fnTransitiveRelationList. This function gets a list of all nodes which we should visit and the newly created record with key-value pairs. It goes recursively through all the nodes until the list is empty. As a result, you get all nodes which you can visit from the current node.

 

// get recursively list of all descendants
fnTransitiveRelationList = (toBeDoneList as list, directDescendantsRecord as record) as list =>
    let
        result = 
            if List.IsEmpty(toBeDoneList) then
                {}
            else
                let
                    newToBeDoneList = List.RemoveItems(
                        // combine all lists together
                        List.Combine(
                            // get a list of direct descendants for every value
                            List.Transform(
                                toBeDoneList, 
                                each Record.FieldOrDefault(directDescendantsRecord, _, {})
                            )
                        ),
                        toBeDoneList
                    )
                in
                    List.Union({toBeDoneList, newToBeDoneList, @fnTransitiveRelationList(newToBeDoneList, directDescendantsRecord)})
    in
        result,

 

And that’s all. The last part is only a call of the recursive function.

 

// get recursive all descendants of current value
TransitiveClosure = Table.AddColumn(
    TableFromList, 
    "AllDescendantRelations", 
    (_) =>
        let 
            allRelations = fnTransitiveRelationList({[From]}, DirectDescendantsRecord),
            removeItself = List.RemoveItems(allRelations, {[From]}),
            toText = Text.Combine(removeItself, ";")
        in
            toText,
    type list
)

And the whole code in one piece.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTJSitWBsIzBLCMgywTMMgayLMAsUyDLDMwyA7LM4SygbCwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [From = _t, To = _t]),
    ChangedType = Table.TransformColumnTypes(Source,{{"From", type text}, {"To", type text}}),

    // buffer the source table
    BufferedTable = Table.Buffer(ChangedType),

    // create a distinct sorted list of From values
    FromDistinctList = List.Sort(List.Distinct(Table.Column(BufferedTable, "From"))),

    // get all direct descendants as a record (key-value pairs From ==> To)
    DirectDescendantsRecord = List.Accumulate(
        FromDistinctList,
        [],
        (state, current) =>
            state & 
            Expression.Evaluate(
                "[" & current & "= Table.Column(Table.SelectRows(BufferedTable, each [From] = current), ""To"")]", 
                [Table.Column = Table.Column, Table.SelectRows = Table.SelectRows, BufferedTable = BufferedTable, current = current]
            )
    ),

    // create a table of distinct From values
    TableFromList = Table.FromList(FromDistinctList, Splitter.SplitByNothing(), {"From"}),

    // get recursively list of all descendants
    fnTransitiveRelationList = (toBeDoneList as list, directDescendantsRecord as record) as list =>
        let
            result = 
                if List.IsEmpty(toBeDoneList) then
                    {}
                else
                    let
                        newToBeDoneList = List.RemoveItems(
                            // combine all lists together
                            List.Combine(
                                // get a list of direct descendants for every value
                                List.Transform(
                                    toBeDoneList, 
                                    each Record.FieldOrDefault(directDescendantsRecord, _, {})
                                )
                            ),
                            toBeDoneList
                        )
                    in
                        List.Union({toBeDoneList, newToBeDoneList, @fnTransitiveRelationList(newToBeDoneList, directDescendantsRecord)})
        in
            result,

    // get recursive all descendants of current value
    TransitiveClosure = Table.AddColumn(
        TableFromList, 
        "AllDescendantRelations", 
        (_) =>
            let 
                allRelations = fnTransitiveRelationList({[From]}, DirectDescendantsRecord),
                removeItself = List.RemoveItems(allRelations, {[From]}),
                toText = Text.Combine(removeItself, ";")
            in
                toText,
        type list
    )
in
    TransitiveClosure

 

Comments