stephane513
Regular Visitor

Linking on aggregated columns or LOOKUPVALUE

Hello

I have been developing a lot of fairly large datasets, with tables typically containing 20M+ rows.

I often need to reconcile data between tables

 

For example, I have an item master with columns like owner / source / part number, among others.

And I have a pricing table with owner / source / part, among others.

 

Two methods currently work for me:

1- Create a calculated column in DAX that concatenates the three columns, then create a relationship between the two tables on that key and use the RELATED function.

2- Use LOOKUPVALUE in an expression (both options are sketched just below with placeholder names).
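
For concreteness, here is a minimal sketch of both options with placeholder names (ItemMaster, Pricing, PartNumber and Description are my assumptions; adjust to your own model):

-- Option 1: concatenated key on both tables (a delimiter avoids collisions),
-- then a relationship Pricing[ItemKey] (many side) -> ItemMaster[ItemKey] (one side),
-- then RELATED on the many side.
-- Calculated column on ItemMaster:
ItemKey = ItemMaster[Owner] & "|" & ItemMaster[Source] & "|" & ItemMaster[PartNumber]
-- Calculated column on Pricing:
ItemKey = Pricing[Owner] & "|" & Pricing[Source] & "|" & Pricing[Part]
-- Calculated column on Pricing, once the relationship exists:
Item Description = RELATED ( ItemMaster[Description] )

-- Option 2: no key column and no relationship. LOOKUPVALUE takes several
-- search column / search value pairs, so nothing has to be concatenated first.
-- Calculated column on Pricing:
Item Description =
LOOKUPVALUE (
    ItemMaster[Description],
    ItemMaster[Owner], Pricing[Owner],
    ItemMaster[Source], Pricing[Source],
    ItemMaster[PartNumber], Pricing[Part]
)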

 

My opinion is that option 1 is better for performance. With many LOOKUPVALUE columns, the refresh takes longer and can fail due to memory constraints. I upgraded from 16 GB to 32 GB to 64 GB of RAM and I don't want to go further. On the Service, however, it refreshes fine in less than 40 minutes.

 

The drawback of option 1 is the size of the pbix.

I did some tests. My original pbix is 1 GB. One of the tables contains 23M rows and 40 columns.

As soon as I create a key owner & source & part, the size goes to 1.5 GB. When I create a second key on the other table, it goes to 1.9 GB.

Publishing to the Service is not really the issue, since I am using parameters or ALM Toolkit, but I still don't want to end up with a 10 GB pbix.

 

The advantage of option 2 is that the size of the pbix remains quite reasonable.

But with a lot of LOOKUPVALUE columns, the refresh fails due to memory issues.

 

On the Service, once the model is loaded, the refresh time is quite similar for both options from what I have seen, which is a little surprising.

 

I would appreciate some guidance from any of you as I believe many people must have the same headache

 

Thanks in advance

 

Stéphane

3 REPLIES
lbendlin
Super User

Needs more details. Are you on shared capacity or Premium? If Premium, which SKU? Gen1 or Gen2?

 

My opinion is that option 1 is better for performance

I would agree. Beware, though: linking via high-cardinality columns can be extremely costly, especially in composite data models.

Thanks for your reply

I am not sure I can answer your question. I was referring to my pbix in Power BI Desktop, which I afterwards publish to the Service.

 

I am clearly not doing the linking on high-cardinality columns in a composite model, as that doesn't work.

 

If you do this only on your desktop, then you are limited by the available memory, disk space, and processing power of your PC. Power BI uses the VertiPaq engine, which is a columnar data store with in-column compression. The final size of your PBIX will depend on the compressibility of the source data.
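
If you want to see where that size goes, one rough way (a sketch, assuming DAX Studio or another tool that can send DMV queries to the Desktop model) is to query the storage DMV and look at the per-column dictionary sizes. A concatenated owner & source & part key usually tops the list, because every distinct combination gets its own dictionary entry and long hash-encoded strings compress poorly. The restricted column list below is an assumption; if a name differs on your engine version, SELECT * returns everything.

SELECT DIMENSION_NAME, COLUMN_ID, DICTIONARY_SIZE
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS

Sort the result grid by DICTIONARY_SIZE descending to find the columns that inflate the pbix.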
