cancel
Showing results for 
Search instead for 
Did you mean: 

Find articles, guides, information and community news

Most Recent
Super User
Super User

Every respected computer scientist has heard about recursive functions (more details on Wiki: https://en.wikipedia.org/wiki/Recursion). There are many algorithms which are mentioned as example of such functions, i.e. Fibonacci sequence as the simplest example:

Fib(0) = 0
Fib(1) = 1
For all integers n > 1, Fib(n) = Fib(n-1) + Fib(n-2)

 

How do we rewrite this pseudo code into a function in PowerQuery?

let
    fnFibonacci = (value as number) as number =>
        if value = 0 then
            0
        else if value = 1 then
            1
        else
            @fnFibonacci(value - 1) + @fnFibonacci(value - 2),
    Source = List.Transform({0..10}, fnFibonacci)
in
    Source

 

The most important part is the use of @ before we call the recursive function. We tell the PowerQuery to reference its own name from inside itself.

Let’s test our function:

Screenshot1.PNG

 

Well, that is great but not very useful in real life. I have been searching for long time for a scenario which can use recursive functions in a meaningful way. And I have found one!

In PowerQuery, you can select or rename many columns in one step. Moreover, you can change their data types at once. But repetitive modifications of a table are not so easy anymore.

 

In my ETL process written in PowerQuery, I wanted to split every datetime column in a fact table into 2 separate columns: date and time. It has at least 2 big advantages. First of all, date and time values become a foreign key for my date and time dimensions; second, the size of my dataset will decrease dramatically. This means that instead of many unique values I’ll get only 365 unique values a year for date dimension and 86 400 unique values for time dimension with the precision of one second.

What are my options? I can create a new custom column of date datatype for every datetime column and another custom column for time. Yes, I can, but I do not like doing a repetitive work. Let’s try to automate it.

 

Firstly, we need some test data.

// create a table with some datetime columns
    Source = Table.FromRecords(
        {  
            [OrderID = 1, CustomerID = 1, OrderPlaced_dt = #datetime(2019,6,16,10,0,0), OrderPaid_dt = #datetime(2019,6,16,10,5,0), OrderShipped_dt = #datetime(2019,6,16,11,0,0), Price = 100.0],
            [OrderID = 2, CustomerID = 1, OrderPlaced_dt = #datetime(2019,6,16,12,12,12), OrderPaid_dt = #datetime(2019,6,16,13,13,13), OrderShipped_dt = null, Price = 200.0]  
        },
        type table[OrderID = Int64.Type, CustomerID = Int64.Type, OrderPlaced_dt = DateTime.Type, OrderPaid_dt = DateTime.Type, OrderShipped_dt = DateTime.Type, Price = Decimal.Type]
    ),

 

Next, let’s create a function, which has 2 parameters - a source table and a name of a datetime column. This function does the same what you would do in the UI. It creates 2 new columns and removes the original one.

    // split a datetime column into 2 columns: date and time
    fnSplitDateTimeColumn = (parTbl as table, parColumnName as text) as table =>
        let
            // add a new column with date
            transformDateExpression = Expression.Evaluate("each Date.From([" & parColumnName & "])", #shared),
            addDateColumn = Table.AddColumn(parTbl, parColumnName & "_date", transformDateExpression, type date),

            // add a new column with time
            transformTimeExpression = Expression.Evaluate("each try #time(Time.Hour([" & parColumnName & "]),Time.Minute([" & parColumnName & "]),Time.Second([" & parColumnName & "])) otherwise null", #shared),
            addTimeColumn = Table.AddColumn(addDateColumn, parColumnName & "_time", transformTimeExpression, type time),

            // remove datetime column
            removeDateTimeColumn = Table.RemoveColumns(addTimeColumn, parColumnName)
        in
            removeDateTimeColumn,

 

And finally, in the last step we create another function which is recursive. This function has also 2 parameters - a source table and a list of all datetime column names which we haven’t transformed yet. The function takes the first item from the list (a datetime column name), splits this datetime column into 2 columns, and calls itself recursively. The recursive call uses the last step as the new source table and a list of all datetime column names except the first one (which is already processed). Once the list is empty, the recursion terminates. The recursion function returns a modified source table – instead of one datetime columns there are now 2 columns.

    // recursive function which splits all datetime columns into date and time columns
    // parTbl is a source table to modify, parColumnNameList is a list of columns to split
    fnSplitAllDateTimeColumns = (parTbl as table, parColumnNameList as list) as table =>
        // if parColumNameList is empty, terminate the recursion
        if List.IsEmpty(parColumnNameList) then
            parTbl
        else
            let
                // get one column name to process
                currentColumnName = List.First(parColumnNameList),
                // remove first item from the parColumnNameList
                nextColumNameList = List.RemoveFirstN(parColumnNameList, 1),

                // split current column
                splitOneColumnTable = fnSplitDateTimeColumn(parTbl, currentColumnName),
                // call itself recursively with a new created table and a shortend column name list
                nextIterationTable = @fnSplitAllDateTimeColumns(splitOneColumnTable, nextColumNameList)
            in
                nextIterationTable,

And everything together:

let
    // create a table with some datetime columns
    Source = Table.FromRecords(
        {  
            [OrderID = 1, CustomerID = 1, OrderPlaced_dt = #datetime(2019,6,16,10,0,0), OrderPaid_dt = #datetime(2019,6,16,10,5,0), OrderShipped_dt = #datetime(2019,6,16,11,0,0), Price = 100.0],
            [OrderID = 2, CustomerID = 1, OrderPlaced_dt = #datetime(2019,6,16,12,12,12), OrderPaid_dt = #datetime(2019,6,16,13,13,13), OrderShipped_dt = null, Price = 200.0]  
        },
        type table[OrderID = Int64.Type, CustomerID = Int64.Type, OrderPlaced_dt = DateTime.Type, OrderPaid_dt = DateTime.Type, OrderShipped_dt = DateTime.Type, Price = Decimal.Type]
    ),

    // split a datetime column into 2 columns: date and time
    fnSplitDateTimeColumn = (parTbl as table, parColumnName as text) as table =>
        let
            // add a new column with date
            transformDateExpression = Expression.Evaluate("each Date.From([" & parColumnName & "])", #shared),
            addDateColumn = Table.AddColumn(parTbl, parColumnName & "_date", transformDateExpression, type date),

            // add a new column with time
            transformTimeExpression = Expression.Evaluate("each try #time(Time.Hour([" & parColumnName & "]),Time.Minute([" & parColumnName & "]),Time.Second([" & parColumnName & "])) otherwise null", #shared),
            addTimeColumn = Table.AddColumn(addDateColumn, parColumnName & "_time", transformTimeExpression, type time),

            // remove datetime column
            removeDateTimeColumn = Table.RemoveColumns(addTimeColumn, parColumnName)
        in
            removeDateTimeColumn,

    // recursive function which splits all datetime columns into date and time columns
    // parTbl is a source table to modify, parColumnNameList is a list of columns to split
    fnSplitAllDateTimeColumns = (parTbl as table, parColumnNameList as list) as table =>
        // if parColumNameList is empty, terminate the recursion
        if List.IsEmpty(parColumnNameList) then
            parTbl
        else
            let
                // get one column name to process
                currentColumnName = List.First(parColumnNameList),
                // remove first item from the parColumnNameList
                nextColumNameList = List.RemoveFirstN(parColumnNameList, 1),

                // split current column
                splitOneColumnTable = fnSplitDateTimeColumn(parTbl, currentColumnName),
                // call itself recursively with a new created table and a shortend column name list
                nextIterationTable = @fnSplitAllDateTimeColumns(splitOneColumnTable, nextColumNameList)
            in
                nextIterationTable,

    // get all columns having the datatype datetime
    DateTimeColumnNames = Table.ColumnsOfType(Source, {type datetime}),
    // split all datetime columns
    SplitAllDateTimeColumns = fnSplitAllDateTimeColumns(Source, DateTimeColumnNames)
in
    SplitAllDateTimeColumns

 

And the result:

Screenshot2.PNG

 

Is it a lot of code for such a simple task which you can manage in PowerQuery Editor with shiny UI? Yes and no. If you have many fact tables with a lot of datetime columns and you do an ETL, you’ll be very happy having the opportunity to automate that. But if you have just one table with 2 datetime columns, it is an overkill.

 

Do you know any other pragmatic use case for recursive functions in PowerQuery? Please, let me know down in the comments 😊

Helpful resources

Join Blog
Interested in blogging for the community? Let us know.