Solved: Very Slow Nested Iterator

rks · ‎10-19-2020

Hi Community.

I have the following scenario:

* I have a product table containing 15 million products

* I have a stock table (a billion or so rows) with a relationship to the product

* The stock table also contains other references such as partno, storage location, etc.

Now, a product could also be a big cupboard consisting of several pieces. Each piece has got a partno and each partno has a quantity.

Let's say I have a wooden-cupboard with 3 parts.

Product table: 1 row with attributes such as itemno, description, category and so on

Stock Table: one row per part, aggregated over partno it would look like:
Partno 1: 5 rows

Partno 2: 5 rows

Partno3: 6 rows

The business wants to know: how many products can I sell given my current stock situation. The answer in this case: I can sell the wooden-cupboard 5 times.

I have authored a measure which produces the correct results:

SUMX (
                'Filter Product',
                MINX (
                    VALUES ( 'Filter Parts'[PartNo] ),
                    CALCULATE ( COUNTROWS ( 'Stock' ) )
                )
            )

The result is a number which represents the number of "complete" items on stock.

However, it's slow, very slow; it runs 2 minutes to produce the results which most of the load in FE ( around 80%). How can I move the payload to SE?

Thank you and best regards

Konstantin

Anonymous · ‎10-21-2020

Your formula will be faster if you create an aggregation table that will squeeze the 'Stock' table so that you don't have to count rows for PartNo in CALCULATE ( COUNTROWS ('Stock') ) but use the number of rows directly. So, you'd have a table aggStock which would be created by appropriately agrouping rows and adding one more column that would tell you the number of rows in the grouping (aggStock[PartCount]). Then the measure would be:

sumx(
    'Products',
    calculate( minx( 'aggStock'[PartCount] ) )
)

In aggStock you'd have the same columns as in Stock but it would not be as granular, so that you don't have to count the rows. This is the first speed-up that comes to my mind...

View solution in original post

Anonymous · ‎10-21-2020

Your formula will be faster if you create an aggregation table that will squeeze the 'Stock' table so that you don't have to count rows for PartNo in CALCULATE ( COUNTROWS ('Stock') ) but use the number of rows directly. So, you'd have a table aggStock which would be created by appropriately agrouping rows and adding one more column that would tell you the number of rows in the grouping (aggStock[PartCount]). Then the measure would be:

sumx(
    'Products',
    calculate( minx( 'aggStock'[PartCount] ) )
)

In aggStock you'd have the same columns as in Stock but it would not be as granular, so that you don't have to count the rows. This is the first speed-up that comes to my mind...

rks · ‎10-26-2020

I accept the change in a data model as a solution. Obviously the nested iterator in itself is not to be optimized...

Anonymous · ‎10-26-2020

By the way, you don't have to use MINX. You can do:

sumx(
    'Products',
    calculate( min( 'aggStock'[PartCount] ) )
)

However, under the hood MIN is always MINX:

min( T[Col] ) = minx( T, T[Col] )

Anonymous · ‎10-26-2020

Can you tell us what kind of performance boost you've got from the suggestion? How have the runtimes changed?

rks · ‎10-20-2020

@Anonymous @AlB Thank you for looking into this and the feedback for more details. I have created a quick sample file with this simplified data model:

* Product contains the product's master data with about 13 million rows in the original model

* PartAgg is an aggregated table. In the real scenario each item on stock has a distinctive ID, the model would grow quickly to several hundred million rows in a dimension without aggregating. Of interest is the partNo and one or two other filterable columns. It contains around 200 rows.

* StorageLocation depicts the physical location of each item. The is just to illustrate that there are several other dimensions on a higher grain related to Stock (~5 million rows)

* Stock is a snapshot once per day with a snapshot date. The fact table contains up to a billion rows

Download here: http://s000.tinyupload.com/?file_id=57163484993199174102

AlB · ‎10-20-2020

@rks

You cannot upload the file here directly. You have to share the URL to the file hosted elsewhere: Dropbox, Onedrive... or just upload the file to a site like tinyupload.com (no sign-up required).

Please mark the question solved when done and consider giving a thumbs up if posts are helpful.

Contact me privately for support with any larger-scale BI needs, tutoring, etc.

Cheers

Anonymous · ‎10-19-2020

You should give more details. Especially post a picture of the model itself.

AlB · ‎10-19-2020

Hi @rks

You talk about two tables but your code refers to three: 'Filter Product', 'Filter Parts' and 'Stock'. Can you explain a bit more and show the relationships between the tables (what fields are involved)?

Please mark the question solved when done and consider giving kudos if posts are helpful.

Contact me privately for support with any larger-scale BI needs, tutoring, etc.

Cheers

Very Slow Nested Iterator

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024