Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Nolock

Transitive Closure in Power Query

In the last few months I have been answering dozens of users’ questions. And there were at least three or four of you with different use cases but the same nominator: I have a table with two columns From and To. And for every distinct value of the column From I want to see all nodes which I can reach by traversing the edges From --> To. This problem has a name: transitive closure.

 

What is a transitive closure?

The omniscient Wikipedia says to transitive closure: “In mathematics, the transitive closure of a binary relation R on a set X is the smallest relation on X that contains R and is transitive.
For example, if X is a set of airports and xRy means "there is a direct flight from airport x to airport y" (for x and y in X), then the transitive closure of R on X is the relation R+ such that x R+ y means "it is possible to fly from x to y in one or more flights". Informally, the transitive closure gives you the set of all places you can get to from any starting place.”

A picture is worth a thousand words:

 

1.png

 

What we want to achieve is shown on the following image:

23.png

 

The code

Let’s start with our data sample from the previous illustration. It is a table with two columns From and To. Every row is a directed edge from From to To.

It will be recursion, again, I know, but I don’t know any other way to solve this problem in PowerQuery. But before we come to a recursive function, let us create key-value pairs where key is a node and value is a list of all nodes, which you can reach directly from the key. The result won’t be a table but a record instead. (Why do I do that? It will be much faster – you can read more about this performance boost in my recent post Lookup Table in Power Query.)

 

// get all direct descendants as a record (key-value pairs From ==> To)
DirectDescendantsRecord = List.Accumulate(
    FromDistinctList,
    [],
    (state, current) =>
        state & 
        Expression.Evaluate(
            "[" & current & "= Table.Column(Table.SelectRows(BufferedTable, each [From] = current), ""To"")]", 
            [Table.Column = Table.Column, Table.SelectRows = Table.SelectRows, BufferedTable = BufferedTable, current = current]
        )
),

 

This fast access to all direct descendants of a node will be very helpful in the next step, in the recursive function fnTransitiveRelationList. This function gets a list of all nodes which we should visit and the newly created record with key-value pairs. It goes recursively through all the nodes until the list is empty. As a result, you get all nodes which you can visit from the current node.

 

// get recursively list of all descendants
fnTransitiveRelationList = (toBeDoneList as list, directDescendantsRecord as record) as list =>
    let
        result = 
            if List.IsEmpty(toBeDoneList) then
                {}
            else
                let
                    newToBeDoneList = List.RemoveItems(
                        // combine all lists together
                        List.Combine(
                            // get a list of direct descendants for every value
                            List.Transform(
                                toBeDoneList, 
                                each Record.FieldOrDefault(directDescendantsRecord, _, {})
                            )
                        ),
                        toBeDoneList
                    )
                in
                    List.Union({toBeDoneList, newToBeDoneList, @fnTransitiveRelationList(newToBeDoneList, directDescendantsRecord)})
    in
        result,

 

And that’s all. The last part is only a call of the recursive function.

 

// get recursive all descendants of current value
TransitiveClosure = Table.AddColumn(
    TableFromList, 
    "AllDescendantRelations", 
    (_) =>
        let 
            allRelations = fnTransitiveRelationList({[From]}, DirectDescendantsRecord),
            removeItself = List.RemoveItems(allRelations, {[From]}),
            toText = Text.Combine(removeItself, ";")
        in
            toText,
    type list
)

And the whole code in one piece.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTJSitWBsIzBLCMgywTMMgayLMAsUyDLDMwyA7LM4SygbCwA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [From = _t, To = _t]),
    ChangedType = Table.TransformColumnTypes(Source,{{"From", type text}, {"To", type text}}),

    // buffer the source table
    BufferedTable = Table.Buffer(ChangedType),

    // create a distinct sorted list of From values
    FromDistinctList = List.Sort(List.Distinct(Table.Column(BufferedTable, "From"))),

    // get all direct descendants as a record (key-value pairs From ==> To)
    DirectDescendantsRecord = List.Accumulate(
        FromDistinctList,
        [],
        (state, current) =>
            state & 
            Expression.Evaluate(
                "[" & current & "= Table.Column(Table.SelectRows(BufferedTable, each [From] = current), ""To"")]", 
                [Table.Column = Table.Column, Table.SelectRows = Table.SelectRows, BufferedTable = BufferedTable, current = current]
            )
    ),

    // create a table of distinct From values
    TableFromList = Table.FromList(FromDistinctList, Splitter.SplitByNothing(), {"From"}),

    // get recursively list of all descendants
    fnTransitiveRelationList = (toBeDoneList as list, directDescendantsRecord as record) as list =>
        let
            result = 
                if List.IsEmpty(toBeDoneList) then
                    {}
                else
                    let
                        newToBeDoneList = List.RemoveItems(
                            // combine all lists together
                            List.Combine(
                                // get a list of direct descendants for every value
                                List.Transform(
                                    toBeDoneList, 
                                    each Record.FieldOrDefault(directDescendantsRecord, _, {})
                                )
                            ),
                            toBeDoneList
                        )
                    in
                        List.Union({toBeDoneList, newToBeDoneList, @fnTransitiveRelationList(newToBeDoneList, directDescendantsRecord)})
        in
            result,

    // get recursive all descendants of current value
    TransitiveClosure = Table.AddColumn(
        TableFromList, 
        "AllDescendantRelations", 
        (_) =>
            let 
                allRelations = fnTransitiveRelationList({[From]}, DirectDescendantsRecord),
                removeItself = List.RemoveItems(allRelations, {[From]}),
                toText = Text.Combine(removeItself, ";")
            in
                toText,
        type list
    )
in
    TransitiveClosure

 

Comments

Very cool stuff @Nolock !

 

With regards to recursion, you can use List.Generate as replacement, as it us usually supposed to be faster: https://www.thebiccountant.com/2017/09/26/recursion-m-beginners/

 

And with regards to the lookups, an inner merge could also be an alternative (giving sufficient performance for me at least 😉 https://www.thebiccountant.com/2017/02/14/dynamically-flatten-parent-child-hierarchies-in-dax-and-po...

Hi @ImkeF,

thank you for your ideas 🙂

I will try to rewrite the recursion using an interation and I will also prepare some performance tests.

Hi @Nolock,

 

Thank you for publishing this. Can you tell me how I can make the step from the column 'AllDescendantRelations' to records? On the TransitiveClosure-result, I can choose for "Expand to new rows" (code: Table.ExpandListColumn(TransitiveClosure, "AllDescendantRelations")) but I get the error "We cannot convert the value "2;3;4;8" to type Table.".

 

BTW: I see you omitted the part of the solution for cyclic relations (which is relevant for your airplanes example, too).

 

JVos

Hi @Nolock,

I found a solution to the problem mentioned in my previous comment. I changed the 'TransitiveClosure' part as follows (I commented out what is in your code):

 

    TransitiveClosure = Table.AddColumn(
        TableFromList, 
        "AllDescendantRelations", 
        (_) =>
            let 
                allRelations = fnTransitiveRelationList({[From]}, DirectDescendantsRecord),
                removeItself = List.RemoveItems(allRelations, {[From]})
                //toText = Text.Combine(removeItself, ";")
            in
                //toText,
                removeItself,
        type list
    ),

 

I would like to know: why did you add Text.Combine? I can't imagine you did this without a reason.

 

JVos

@Nolock: I added functionality for cyclic relations to 'fnTransitiveRelationList':

 

    // get recursively list of all descendants
    //fnTransitiveRelationList = (toBeDoneList as list, directDescendantsRecord as record) as list =>
      fnTransitiveRelationList = (toBeDoneList as list, directDescendantsRecord as record, alreadyDoneList as list) as list =>
        let
            result = 
                if List.IsEmpty(toBeDoneList) then
                    {}
                else
                    let
                        // added by JVos:
                        alreadyDoneList = List.Combine({alreadyDoneList, toBeDoneList}),

                        newToBeDoneList = List.RemoveItems(
                            // combine all lists together
                            List.Combine(
                                // get a list of direct descendants for every value
                                List.Transform(
                                    toBeDoneList, 
                                    each Record.FieldOrDefault(directDescendantsRecord, _, {})
                                )
                            ),
                            toBeDoneList
                        //)
                        ),

                        // added by JVos:
                        newToBeDoneList1 = List.Difference(newToBeDoneList, alreadyDoneList)

                    in
                        //List.Union({toBeDoneList, newToBeDoneList, @fnTransitiveRelationList(newToBeDoneList, directDescendantsRecord)})
                          List.Union({toBeDoneList, newToBeDoneList1, @fnTransitiveRelationList(newToBeDoneList1, directDescendantsRecord, alreadyDoneList)})
        in
            result,

And in 'TransitiveClosure':

 

//allRelations = fnTransitiveRelationList({[ROUTING_ID_From]}, DirectDescendantsRecord),
  allRelations = fnTransitiveRelationList({[ROUTING_ID_From]}, DirectDescendantsRecord,{}),

JVos

@Nolock: I had a performance problem with my data. Investigating it with a test set, I found that a chain of 10 relations (1 => 2, 2 => 3, ... , 9 => 10) can just be done on my computer. With a chain of 11, I loose my patience...

@Nolock: I see a comment regarding cyclic relations is not there anymore, so here it is again.

 

To support cyclic relations, I adjusted the code of the 'fnTransitiveRelationList' part as follows:

 

    // get recursively list of all descendants
    //fnTransitiveRelationList = (toBeDoneList as list, directDescendantsRecord as record) as list =>
      fnTransitiveRelationList = (toBeDoneList as list, directDescendantsRecord as record, alreadyDoneList as list) as list =>
        let
            result = 
                if List.IsEmpty(toBeDoneList) then
                    {}
                else
                    let
                        // added by JVos:
                        alreadyDoneList = List.Combine({alreadyDoneList, toBeDoneList}),

                        newToBeDoneList = List.RemoveItems(
                            // combine all lists together
                            List.Combine(
                                // get a list of direct descendants for every value
                                List.Transform(
                                    toBeDoneList, 
                                    each Record.FieldOrDefault(directDescendantsRecord, _, {})
                                )
                            ),
                            toBeDoneList
                        //)
                        ),

                        // added by JVos:
                        newToBeDoneList1 = List.Difference(newToBeDoneList, alreadyDoneList)

                    in
                        //List.Union({toBeDoneList, newToBeDoneList, @fnTransitiveRelationList(newToBeDoneList, directDescendantsRecord)})
                          List.Union({toBeDoneList, newToBeDoneList1, @fnTransitiveRelationList(newToBeDoneList1, directDescendantsRecord, alreadyDoneList)})
        in
            result,

And in the 'TransitiveClosure' part:

 

//allRelations = fnTransitiveRelationList({[ROUTING_ID_From]}, DirectDescendantsRecord),
  allRelations = fnTransitiveRelationList({[ROUTING_ID_From]}, DirectDescendantsRecord,{}),

@Nolock: I've already posted several comments - still another one. This is about extra functionality I need, because there is an extra attribute that determines which indirect relations can be derived from the direct relations.

 

See the picture below. nodes 1, 2, 4 and 5 are country-specific, node 3 is country independent. Indirect relations 1 => 4 and 2 => 5 are allowed. But 1 => 5 and 2 => 4 not.

 

Forbidden indirect relations.png

 

Do you have a solution for this? In the earlier provided solutions - with tables instead of lists - I can figure out a solution.

 

JVos

Hi @ImkeF and @JVos,

I have rewritten the algorithm. It uses an iteration instead of a recursion.

It can also handle cycles as you can see in the test data sample.

In the beginning of the code is also another data sample which shows, that it works with paths longer than 10.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("Hcu5EQAwCAPBXhQT+DfUwtB/G5aV3GxymegwDJQla5jSpLq0qC1t6kiHutKlXHIqpPhvQ9UD", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [From = _t, To = _t]),
    ChangedType = Table.TransformColumnTypes(Source,{{"From", type text}, {"To", type text}}),

/*
    x1 = Table.FromList({1..20}, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    x2 = Table.RenameColumns(x1, {"Column1", "From"}),
    x3 = Table.AddColumn(x2, "To", each [From] + 1),
    x4 = Table.TransformColumns(x3, {{"From", Text.From}, {"To", Text.From}}),
    ChangedType = Table.TransformColumnTypes(x4,{{"From", type text}, {"To", type text}}),
*/

    // buffer the source table
    BufferedTable = Table.Buffer(ChangedType),

    // create a distinct sorted list of From values
    FromDistinctList = List.Sort(List.Distinct(Table.Column(BufferedTable, "From"))),

    // get all direct descendants as a record (key-value pairs From ==> To)
    DirectDescendantsRecord = List.Accumulate(
        FromDistinctList,
        [],
        (state, current) =>
            state & 
            Expression.Evaluate(
                "[" & current & "= Table.Column(Table.SelectRows(BufferedTable, each [From] = current), ""To"")]", 
                [Table.Column = Table.Column, Table.SelectRows = Table.SelectRows, BufferedTable = BufferedTable, current = current]
            )
    ),    

    // get a count of all nested elements
    fnCountOfRecordElements = (rec as record) => List.Count(List.Combine(Record.FieldValues(rec))),

    fnNextStep = (
        lastStepResult as record
    ) as record =>
        let
            // get descendants of the last step
            lastDescendents = lastStepResult[result],
            
            newDescendents = List.Transform(
                Record.FieldNames(lastDescendents),
                (fieldName) => 
                    let
                        // last descendents of a field name
                        values = Record.Field(lastDescendents, fieldName),
                        // for all last descendents get their descendents
                        newIndirectDescendents = List.Combine(
                            List.Transform(
                                values, 
                                (value) => Record.FieldOrDefault(lastDescendents, value, {})
                            )
                        ),
                        // remove descendents which we have already visited (cares of cycles)
                        removeAlreadyVisited = List.RemoveItems(newIndirectDescendents, values),
                        // combine old and new descendents
                        allDescendants = List.Distinct(
                            List.Combine({values, removeAlreadyVisited})
                        ),
                        // create a record of field name and all its descendants
                        result = Expression.Evaluate("[" & fieldName & "=allDescendants]", [allDescendants = allDescendants])
                    in
                        result
            ),

            // create a new descendents record as a combination of all till now found descendents
            combinedRecord = Record.Combine(newDescendents)
        in
            [
                result = combinedRecord, 
                countOfRecordElements = fnCountOfRecordElements(combinedRecord), 
                lastCountOfRecordElements = lastStepResult[countOfRecordElements]
            ],

    // get a list of all descendants (iteration)
    fnTransitiveRelationList = (
        directDescendantsRecord as record
    ) as table =>
        let
            // iterate until 2 following each other iterations have the same count of elements
            whileTrue = 
                List.Generate(
                    // init
                    () => 
                        [
                            // all descendants found till now
                            result = directDescendantsRecord, 
                            // count of all nested elements
                            countOfRecordElements = fnCountOfRecordElements(directDescendantsRecord), 
                            // count of last all nested elements
                            lastCountOfRecordElements = -1
                        ],
                    // do while counts of all nested elements are different
                    (current) => current[lastCountOfRecordElements] <> current[countOfRecordElements],
                    // calculate next step
                    (last) => fnNextStep(last)                                                
                ),
            // get last item of the list (the list contains all iterations of the algorithm)
            lastItem = List.Last(whileTrue),
            // get property result
            combinedRecord = lastItem[result],
            // create a table
            toTable = Record.ToTable(combinedRecord),         
            // concat values
            combineDescendants = Table.TransformColumns(toTable, {{"Value", each Text.Combine(_, ";")}})
        in
            combineDescendants,

    // get recursive all descendants of current value
    TransitiveClosure = fnTransitiveRelationList(DirectDescendantsRecord)
in
    TransitiveClosure

Hi @JVos,

to your last request about forbidden transitive relations: If you had a table with all forbidden relations, it should be possible to remove them from the result in every step of the iteration. But I think such table will be huge. Also a table of allowed relations will be huge. You know your scenario better than anybody else. You have to decide what is better.

 

Hi @Nolock:

 

1) Thank you for your new algorithm! It might last a week before I will work on it and give you feedback.

 

2) About forbidden transitive relations: indeed it's not workable to have a table with all forbidden relations. So I was thinking that at the moment of defining a transitive relation to check whether it's an allowed one. So my initial table of from-to's should not only have the identification of the node, but also the country-attribute for both the from-node as well as the to-node. But I have to look at your latest code to see if I can extend it for this subject.

 

JVos

@Nolock, Thanks for putting this together!  Wanted to ask why you are using the Evaluate statement to add records, instead of the AddRecord function?

 

state & 
Expression.Evaluate(
    "[" & current & "= Table.Column(Table.SelectRows(BufferedTable, each [From] = current), ""To"")]", 
    [Table.Column = Table.Column, Table.SelectRows = Table.SelectRows, BufferedTable = BufferedTable, current = current]
        )

 

My dataset caused issues because my "From" values contained a hyphen (-).  I was receiving an invalid identifier error.  I rewrote the line above as below

Record.AddField(state, current, Table.Column(Table.SelectRows(BufferedTable, each [From] = current), "To"))

 

I no longer get the error and it appears that I get the desired output.  So I am curious as to the benefit of one over the other?

 

Thanks Again!

Hello @ChuckDeal,

that's an excellent point. I haven't thought about that in this way. It's simpler and more elegant. Thumb up!