Solved: Generate table on availability

DouweMeer · ‎09-29-2020

I was curious whether you can do the following in DAX.

I got these products with a range.

Id	Start	End
A	1	5
B	1	2

And I wanted to change this table in such a way that I generate a record per step of the range.

Id	Level
A	1
A	2
A	3
A	4
A	5
B	1
B	2

Can this be done dynamically? Like when start is 2, it starts in the second table from 2 as well.

OwenAuger · ‎09-29-2020

@DouweMeer

You can certainly do that in DAX, or in Power Query for that matter.

In DAX, the code would look something like this:

ModifiedTable = 
SELECTCOLUMNS ( 
    GENERATE ( 
        YourTable,
        GENERATESERIES ( YourTable[Start], YourTable[End] )
    ),
    "Id", YourTable[Id],
    "Level", [Value]
)

Alternatively, in Power Query I would do something similar, by adding a column containing a list of numbers {[Start]..[End]}, and expanding that to rows.

Regards,

Owen

Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

View solution in original post

v-alq-msft · ‎10-01-2020

Hi, @DouweMeer

Based on your description, I created data to reproduce your scenario. The pbix file is attached in the end.

Table:

In order not to modify the raw data, you may go to 'Query Editor', duplicate 'Table', paste the following m codes in 'Advanced Editor'.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUTIEYlOlWJ1oJScoz0gpNhYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Id = _t, Start = _t, End = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Id", type text}, {"Start", Int64.Type}, {"End", Int64.Type}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Level", each let s=[Start],e=[End] in
List.Generate(
    ()=>s,
    each _<e+1,
    each _+1
)),
    #"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Level"),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Custom",{"Start", "End"})
in
    #"Removed Columns"

Result:

Best Regards

Allan

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

DouweMeer · ‎10-08-2020

@v-alq-msft

Your solution is in M, I requested it in DAX. If you find me a solution in DAX... only then you can accept your comment as a solution.

OwenAuger · ‎09-29-2020

@DouweMeer

You can certainly do that in DAX, or in Power Query for that matter.

In DAX, the code would look something like this:

ModifiedTable = 
SELECTCOLUMNS ( 
    GENERATE ( 
        YourTable,
        GENERATESERIES ( YourTable[Start], YourTable[End] )
    ),
    "Id", YourTable[Id],
    "Level", [Value]
)

Alternatively, in Power Query I would do something similar, by adding a column containing a list of numbers {[Start]..[End]}, and expanding that to rows.

Regards,

Owen

Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

DouweMeer · ‎09-29-2020

@OwenAuger

Thanks for the reply. I've tried it but I'm struggling with the context of the generateseries as it expects a scalar value and in its implicit context there is no reference to a record specific value. So I'm hoping the below does take into record level specific values into account.

So... that's like 7 million records vs approximally 365 records on generateseries... I'll comeback tomorrow to see if the results are satisfying.

OwenAuger · ‎10-08-2020

@DouweMeer just on the DAX code you posted: I believe line 7 should be changed to

GENERATESERIES ( [Start], [End], 1 )

This is because GENERATE iterates over the rows of the 1st argument (t1 in your case), and evaluates the table expression in the 2nd argument in that row context, so [Start] and [End] refer to the values on specifix rows of t1. Using MINX and MAXX the way you have would give overall max & min values which is not what you want.

Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

DouweMeer · ‎10-08-2020

@OwenAuger

It seems that I have to correct myself on the 'blank' argument error. It looks like, unexpectly, that my dataset contains blanks on the position of start or end. I'm giving it another try. It appears to struggle longer with the expression this time...

OwenAuger · ‎10-08-2020

HI again @DouweMeer

That's good to know that there are some blank Start/End values - then it's a case of handling them appropriately 🙂

Also in the code you posted just before it looked like you had STRUCT_LI_ID in place of Product Start - you may have fixed that already.

In any case, you could try the below code to eliminate blank Start/End values and see if that at least produces a result.

If needed you could tweak this to replace blank Start/End with an appropriate value.

Contract line Date Table =
VAR t1 =
    CALCULATETABLE (
        SELECTCOLUMNS (
            'Model N - Contract Line',
            "Line Id", 'Model N - Contract Line'[STRUCT_LI_ID],
            "Start", 'Model N - Contract Line'[Product Start],
            "End", 'Model N - Contract Line'[Product End]
        ),
        // Eliminate blank Start/End for now
        NOT ISBLANK ( 'Model N - Contract Line'[Product Start] ),
        NOT ISBLANK ( 'Model N - Contract Line'[Product End] )
    )
RETURN
    SELECTCOLUMNS (
        GENERATE ( t1, GENERATESERIES ( [Start], [End], 1 ) ),
        "Line Id", [Line Id],
        "Level", [Value]
    )

Regards,

Owen

Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

DouweMeer · ‎10-09-2020

That's just amazing....

I had to create a separate table as it ran out of my 32 GB memoryafter executing the expression quite quickly :).

One thing...

This...:

    CALCULATETABLE (
        SELECTCOLUMNS (
            'Model N - Contract Line',
            "Line Id", 'Model N - Contract Line'[STRUCT_LI_ID],
            "Start", 'Model N - Contract Line'[Product Start],
            "End", 'Model N - Contract Line'[Product End]
        ),
        // Eliminate blank Start/End for now
        NOT ISBLANK ( 'Model N - Contract Line'[Product Start] ),
        NOT ISBLANK ( 'Model N - Contract Line'[Product End] )
    )

Can be written as such (or is there a performance advantage not adding the filter in the table context of the selectcolumns?):

SELECTCOLUMNS (
	Filter ( 'Model N - Contract Line',
		NOT ISBLANK ( 'Model N - Contract Line'[Product Start] ),
		NOT ISBLANK ( 'Model N - Contract Line'[Product End] )	
		)
	"Line Id", 'Model N - Contract Line'[STRUCT_LI_ID],
	"Start", 'Model N - Contract Line'[Product Start],
	"End", 'Model N - Contract Line'[Product End]
	)

By the way...

Why do people feel like to put the comma in the end of the line rather in front? Any particulary reason for this that I'm not aware of?

OwenAuger · ‎10-09-2020

Glad the DAX logic works, even if it may not perform well enough on your full dataset. Perhaps this table could be constructed either with Power Query or further upstream.

On the question of whether to use FILTER or CALCULATETABLE:

Generally it is best to apply FILTER to tables of as small cardinality as possible. FILTER iterates row-by-row over the table supplied, and evaluates the 2nd argument for each row.

So this expression could be quite expensive, as it iterates over all rows of 'Model N - Contract Line':

FILTER( 'Model N - Contract Line', ... )

In the CALCULATETABLE version, the two expressions

NOT ISBLANK ( 'Model N - Contract Line'[Product Start] ),
NOT ISBLANK ( 'Model N - Contract Line'[Product End] )

are actually converted to

FILTER (
    ALL ( 'Model N - Contract Line'[Product Start] ),
    NOT ISBLANK ( 'Model N - Contract Line'[Product Start] )
),
FILTER (
    ALL ( 'Model N - Contract Line'[Product End] ),
    NOT ISBLANK ( 'Model N - Contract Line'[Product End] )
)

which results in two iterations: over all values of Product Start and all values of Product End, which I would guess might be a smaller total number of iterations than the entire table. However it would be worth testing performance with your actual data 🙂

This article covers the topic well:

https://www.sqlbi.com/articles/filter-vs-calculatetable-optimization-using-cardinality-estimation/

Regarding leading or trailing commas, I know that many follow the DAX Formatter conventions with trailing commas. However I certainly know some folks who prefer leading commas as well. I guess it comes down to preference.

Regards

Owen

Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

DouweMeer · ‎10-08-2020

@OwenAuger For what I see is that when I get up to the generate part...

Contract line Date Table = 
VAR t1 = SELECTCOLUMNS( 'Model N - Contract Line' , "Line Id" , 'Model N - Contract Line'[STRUCT_LI_ID] , "Start" , 'Model N - Contract Line'[STRUCT_LI_ID] , "End" , 'Model N - Contract Line'[Product End] )
RETURN
SELECTCOLUMNS(
     GENERATE(
          t1 
         , GENERATESERIES( [

within this context, no field values can be 'found'. The value in expected by it is scalar rather than a column.

As you can see, the [Start] and [End] is not picked up as the row level context is not recognized in this part of the expression.

Continuing on the expression...

Contract line Date Table = 
VAR t1 = SELECTCOLUMNS( 'Model N - Contract Line' , "Line Id" , 'Model N - Contract Line'[STRUCT_LI_ID] , "Start" , 'Model N - Contract Line'[STRUCT_LI_ID] , "End" , 'Model N - Contract Line'[Product End] )
RETURN
SELECTCOLUMNS(
     GENERATE(
          t1 
         , GENERATESERIES( [Start] , [End] , 1 )
         )
     , "Line Id" , [Line Id]
     , "Level" , [Value]
     )

Will return the error 'The arguments in GenerateSeries cannot be blank'. Somehow you have to specify it has to take the value from the record context of the first table in generate. With the current expression it doesn't seem to be possible.

Generate table on availability

Helpful resources

Fabric certifications survey

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly