Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The ultimate Microsoft Fabric, Power BI, Azure AI & SQL learning event! Join us in Las Vegas from March 26-28, 2024. Use code MSCUST for a $100 discount. Register Now

Nolock

Transitive Closure in Power Query - Improvements

It was about a week ago when I published a blog post about transitive closure in Power Query: https://community.powerbi.com/t5/Community-Blog/Transitive-Closure-in-Power-Query/bc-p/786063. As you can see, under the original blog post there are some great comments about possible performance boosts and functionality improvements. I’d like to present you the improved solution.

 

If you have not read the original blog post, yet, I highly recommend that you do so, since today I will present only some improvements.

 

Replace recursion with iteration

First of all, I’d like to say thank you to @ImkeF. She has proposed to use iteration instead of recursion. When I was writing the original code, I was also thinking about using iteration, but I didn’t know how to write the code in M. Imke’s blog post (https://www.thebiccountant.com/2017/09/26/recursion-m-beginners/) has inspired me to use the function List.Generate in a completely new way.

On the next screenshot you see that you always have access to the result of the last step. It was the Aha! Moment for me. Now it seems obvious to me, but before I have never thought about that this way. The keyword each used in all examples on the internet has kept this obvious fact hidden from me for a long time.

 

5.PNG

 

The same logic written in c#:

 

for (var i = 1; i < 10; i *= 2) 
    Console.WriteLine(i);

 

I should also mention that you can generate a list of more complex objects than just a list of integers. In my transitive closure solution, I use a record which contains an intermediate result. The while-cycle ends if nothing has changed since the last iteration.

 

List.Generate(
    // init
    () => 
        [
            // all descendants found till now
            result = directDescendantsRecord, 
            // count of all nested elements
            countOfRecordElements = fnCountOfRecordElements(directDescendantsRecord), 
            // count of last all nested elements
            lastCountOfRecordElements = -1
        ],
    // do while counts of all nested elements are different
    (current) => current[lastCountOfRecordElements] <> current[countOfRecordElements],
    // calculate next step
    (last) => fnNextStep(last),
    // select only the property result
    (current) => current[result]                                          
)

 

Cyclic directed graph

The other improvement is adding a test for cycles. A graph isn’t always acyclic like a tree but can contains cycles of any length. More about cycles on https://en.wikipedia.org/wiki/Cycle_(graph_theory).

The simplest example is a node having an edge directed to itself: 1 => 1. Or a more complex one: 1 => 2, 2 => 1, and so on.

4.png

 

The community user @JVos has reported having this problem with a cyclic graph and I am happy to be able to solve it. Luckily, thanks to the iteration the problem has disappeared. But even for the recursive algorithm the solution is only 1 row long 😉

And at the end, as usual, the whole code:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("Pcu5DQAwDALAXahd5FNiz2J5/zVCKNKgE4hMdBgGypJpmNKk+u+WtP66qSMdyiWnQor3bai6", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [From = _t, To = _t]),
    ChangedType = Table.TransformColumnTypes(Source,{{"From", type text}, {"To", type text}}),

    // buffer the source table
    BufferedTable = Table.Buffer(ChangedType),

    // create a distinct sorted list of From values
    FromDistinctList = List.Sort(List.Distinct(Table.Column(BufferedTable, "From"))),

    // get all direct descendants as a record (key-value pairs From ==> To)
    DirectDescendantsRecord = List.Accumulate(
        FromDistinctList,
        [],
        (state, current) =>
            state & 
            Expression.Evaluate(
                "[" & current & "= Table.Column(Table.SelectRows(BufferedTable, each [From] = current), ""To"")]", 
                [Table.Column = Table.Column, Table.SelectRows = Table.SelectRows, BufferedTable = BufferedTable, current = current]
            )
    ),    

    // get a count of all nested elements
    fnCountOfRecordElements = (rec as record) => List.Count(List.Combine(Record.FieldValues(rec))),

    fnNextStep = (
        lastStepResult as record
    ) as record =>
        let
            // get descendants of the last step
            lastDescendents = lastStepResult[result],
            
            newDescendents = List.Transform(
                Record.FieldNames(lastDescendents),
                (fieldName) => 
                    let
                        // last descendents of a field name
                        values = Record.Field(lastDescendents, fieldName),
                        // for all last descendents get their descendents
                        newIndirectDescendents = List.Combine(
                            List.Transform(
                                values, 
                                (value) => Record.FieldOrDefault(lastDescendents, value, {})
                            )
                        ),
                        // combine old and new descendents
                        allDescendants = List.Distinct(
                            List.Combine({values, newIndirectDescendents})
                        ),
                        // create a record of field name and all its descendants
                        result = Expression.Evaluate("[" & fieldName & "=allDescendants]", [allDescendants = allDescendants])
                    in
                        result
            ),

            // create a new descendents record as a combination of all till now found descendents
            combinedRecord = Record.Combine(newDescendents)
        in
            [
                result = combinedRecord, 
                countOfRecordElements = fnCountOfRecordElements(combinedRecord), 
                lastCountOfRecordElements = lastStepResult[countOfRecordElements]
            ],

    // get a list of all descendants (iteration)
    fnTransitiveRelationList = (
        directDescendantsRecord as record
    ) as table =>
        let
            // iterate until 2 following each other iterations have the same count of elements
            whileTrue = 
                List.Generate(
                    // init
                    () => 
                        [
                            // all descendants found till now
                            result = directDescendantsRecord, 
                            // count of all nested elements
                            countOfRecordElements = fnCountOfRecordElements(directDescendantsRecord), 
                            // count of last all nested elements
                            lastCountOfRecordElements = -1
                        ],
                    // do while counts of all nested elements are different
                    (current) => current[lastCountOfRecordElements] <> current[countOfRecordElements],
                    // calculate next step
                    (last) => fnNextStep(last),
                    // select only the property result
                    (current) => current[result]                                          
                ),
            // get last item of the list (the list contains all iterations of the algorithm)
            lastItem = List.Last(whileTrue),
            // create a table
            toTable = Record.ToTable(lastItem),   
            // concat values
            combineDescendants = Table.TransformColumns(toTable, {{"Value", each Text.Combine(_, ";")}})
        in
            combineDescendants,

    // get recursive all descendants of current value
    TransitiveClosure = fnTransitiveRelationList(DirectDescendantsRecord)
in
    TransitiveClosure

 

Comments

Hi @Nolock: Great! Due to changed priorities in our project, it took some time before I could get back to you. I did a performance test with over 200 from-to's, and also one including a long chain. The performance is perfect! Cycles are also dealt with. Thank you very much!

 

Question: How can I best create seperate records containing all found from-to's, so from e.g. 8 - 9;10 to 8 - 9 and 8 - 10? My current solution (I repeat the last lines of your code):

 

    // get recursive all descendants of current value
    TransitiveClosure = fnTransitiveRelationList(DirectDescendantsRecord),
    SeparateFromToRecords = Table.ExpandListColumn(Table.TransformColumns(TransitiveClosure, {{"Value", Splitter.SplitTextByDelimiter(";", QuoteStyle.None), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Value") // added by JVos
in
    // TransitiveClosure // disabled by JVos
    SeparateFromToRecords // added by JVos

@Nolock: I have problems with bigger input datasets. I had one with 1,000 records, but after 30 minutes I cancelled the query. Testing with 250 input records was OK. The 251st one led to a problem. It turned out that changing the from-value and/or the to-value of that record gives no performance problem. So it depends on the values. I have no idea what the problem might be. Can I provide you the input dataset to have a look at it?

@Nolock: Additional information. I created a small, straightforward testset with from-to's as follows: 1-2, 2-3, 3-4, and so on until 16-17 and then 17-1. At me, until 16 and then 16-1 is OK, but then it stops. I have the idea this is more or less happening with my real dataset.