Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Kashuo
Helper I
Helper I

Joins are creating MILLIONS of rows to process...why?

Hi,

 

The way I'm joining and merging my data is creating millions of rows to process and it takes hours to compute anything. There must be a better way!

 

My data is from the real estate development industry.

 

  • I have 3 tables: Property, Floor, and Zone.
  • Properties have multiple Floors, and Floors have multiple Zones. (In other words: Each Zone row has a Floor ID field of the Floor it belongs to, and each Floor row has a Property ID field of the Property it belongs to).
  • The three tables are joined on their respective IDs.
  • Each Zone has a field called Opening Date, which is the date we finish construction on that Zone (a portion of a Floor).
  • It's easy to filter a table visualization of Zones by Zone Opening Date.
  • In order to filter a table visualization of Floors by Earliest Floor Opening Date, I created a separate table (merge as new) that merged Zones and Floors (on Floor ID) and then aggregated the Opening Date column of the Zones table to show only "Min" of Opening Date. Then I merged that table with the Floors table so that each row of the Floors table has the "Earliest Floor Opening Date” for that floor.
  • With this data structure in place, I'm able to add a range slicer to the page for Earliest Floor Opening Date and it will filter out Floors from a table visualization of the floors table have have an Earliest Floor Opening Date outside my range slicer. The same slicer works for the areas.
  • However, this filter will not filter out the Properties. Therefore, I have done the same process to find the Earliest Property Opening Date for each row of the Property Table.
  • I created a separate table (merge as new) that merged Floors and Properties this time. But when I try to then aggregate the Earliest Floor Opening Date column to create an "Earliest Property Opening Date" column, I run into trouble.
  • This technically works: the area date slicer filters the areas, the floor date slicer filters floors and areas, and the property date slicer filters all three.
  • The problem is that when I refresh my data source or apply any changes from the query editor, the update dialogue box takes FOREVER as it updates the more than TWELVE MILLION rows of the table of Property Opening Dates.
  • Is there a better way for me to do this without it taking so long?

 2017-06-14 11_31_39-Greenshot image editor.png

 

1 ACCEPTED SOLUTION
v-jiascu-msft
Employee
Employee

Hi @Kashuo,

 

Any changes in one query (table) will be applied to the related merged query. That’s why it takes so much time.  Adding a calculated column of the earliest date to the table respectively would be a good practice.

      1. In order to simply the model, it keeps the essential columns.

      2. The relationships are clear.

      3. Add a column in table “Floor” with this formula.

Earlist_Opening_Date_Each_Floor = calculate(min('Zone'[opening_date]))

      4. Add a column in table “Property” with this formula.

Earliest_Date_Each_Porperty = CALCULATE(min('Floor'[Earlist_Opening_Date_Each_Floor]))

 Filtering by date takes many hours to compute .jpg

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Best Regards!
Dale

 

Community Support Team _ Dale
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

6 REPLIES 6
v-jiascu-msft
Employee
Employee

Hi @Kashuo,

 

Any changes in one query (table) will be applied to the related merged query. That’s why it takes so much time.  Adding a calculated column of the earliest date to the table respectively would be a good practice.

      1. In order to simply the model, it keeps the essential columns.

      2. The relationships are clear.

      3. Add a column in table “Floor” with this formula.

Earlist_Opening_Date_Each_Floor = calculate(min('Zone'[opening_date]))

      4. Add a column in table “Property” with this formula.

Earliest_Date_Each_Porperty = CALCULATE(min('Floor'[Earlist_Opening_Date_Each_Floor]))

 Filtering by date takes many hours to compute .jpg

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Best Regards!
Dale

 

Community Support Team _ Dale
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hello again,

 

This worked for a while, but no longer.

 

Now I'm getting Circular Dependency Errors when I use Calulate. If I take out the "calculate" then the circular dependency error goes away, but then the min gives me an absolute minumum (the first date ever), rather than the earliest date per floor.

 

If I use a measure, then the tabular view gives me the earliest date per floor, but then I can't use Measures in the date slicers at the bottom of my sheet.

 

Any idea how to remove the Circular Dependency? Other than using a measure like this thread suggests?

Hi @Kashuo,

 

Could you please open a new thread? Because the new issue isn't related to this topic, which would be hard for others to search solutions in this community.

 

Could you please provide more information in the new thread?

1. A little sample;

2. The formula you already tried to use;

3. The result you want to get from this formula. (maybe we can create a new one.)

 

Best Regards!

Dale

Community Support Team _ Dale
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

This worked perfectly, thank you so much!

Kashuo
Helper I
Helper I

Hi,

 

My data is from the real estate development industry.

 

  • I have 3 tables: Property, Floor, and Zone.
  • Properties have multiple Floors, and Floors have multiple Zones. (In other words: Each Zone row has a Floor ID field of the Floor it belongs to, and each Floor row has a Property ID field of the Property it belongs to).
  • The three tables are joined on their respective IDs.
  • Each Zone has a field called Opening Date, which is the date we finish construction on that Zone (a portion of a Floor).
  • It's easy to filter a table visualization of Zones by Zone Opening Date.
  • In order to filter a table visualization of Floors by Earliest Floor Opening Date, I created a separate table (merge as new) that merged Zones and Floors (on Floor ID) and then aggregated the Opening Date column of the Zones table to show only "Min" of Opening Date. Then I merged that table with the Floors table so that each row of the Floors table has the "Earliest Floor Opening Date” for that floor.
  • With this data structure in place, I'm able to add a range slicer to the page for Earliest Floor Opening Date and it will filter out Floors from a table visualization of the floors table have have an Earliest Floor Opening Date outside my range slicer. The same slicer works for the areas.
  • However, this filter will not filter out the Properties. Therefore, I have done the same process to find the Earliest Property Opening Date for each row of the Property Table.
  • I created a separate table (merge as new) that merged Floors and Properties this time. But when I try to then aggregate the Earliest Floor Opening Date column to create an "Earliest Property Opening Date" column, I run into trouble.
  • This technically works: the area date slicer filters the areas, the floor date slicer filters floors and areas, and the property date slicer filters all three.
  • The problem is that when I refresh my data source or apply any changes from the query editor, the update dialogue box takes FOREVER as it updates the more than TWELVE MILLION rows of the table of Property Opening Dates. (None of the tables have anywhere near this many rows, bnut that's what the dialogue box says it's processing through).
  • Is there a better way for me to do this without it taking so long?

2017-06-14 11_31_39-Greenshot image editor.png

Hi @Kashuo

 

Any changes in one query (table) will be applied to the related merged query. That’s why it takes so much time.  Adding a calculated column of the earliest date to the table respectively would be a good practice.

      1. In order to simply the model, it keeps the essential columns.

      2. The relationships are clear.

      3. Add a column in table “Floor” with this formula.

Earlist_Opening_Date_Each_Floor = calculate(min('Zone'[opening_date]))

      4. Add a column in table “Property” with this formula.

Earliest_Date_Each_Porperty = CALCULATE(min('Floor'[Earlist_Opening_Date_Each_Floor]))

Filtering by date takes many hours to compute .jpg

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Best Regards!
Dale

 

Community Support Team _ Dale
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors