Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
CoonPageDub
Frequent Visitor

Help with Calibration of Data Model

Good Evening Fellow Power BI'ers!

 

I need help with calibrating my data model, but I have no idea what I'm doing so all suggestions and advice are desperately needed! The purpose of this model is to utilize a vast amount of data I've been aggregating to help predict what commercial real estate buildings I should buy as investments. This model is going to use four tables (Building, HistoricalData, Demographics, and DateTable). We're solving to the Average Historical Vacancy % (the lower, the better). for each building.

 

I need help calibrating all these metrics as weights so I can understand which ones have most impact that will ultimately produce the Lowest Average Historical Vacancy %, in the future. This number is what we're solving to - the best buildings stay full with tenants, and those that could have a couple of defiiciences, will lose tenants in tough times. 

 

Here is an example of the Buildings Table. I have about 30,000 buildings across 8 cities. There are about 8 more columns than those shown below. 

 

ID_BuildingAddressBuilding SFLatitudeLongitudeSubmarketClassStarYear BuiltYear RenovatedHist Avg Vacancy %
10022101 Nancy St. 100,25095.325212-36.22351MidtownA4198019955.2%
552101000 Smith St.250,00095.325231-37.36985DowntownC12020-11.2%
632515 Broad St.50,00095.35125-36.25363UptownB2201620102.3%

 

Right now I'm grouping and categorizing each of these categories so I can assign number values to them, and then scoring based upon those groups on a scale of 1-100. The scores are given relatively arbitrarily so that I can at least get a working model. 

 

For example:

 

CLASS:

  • A = 100
  • B = 50
  • C = 0

STAR:

  • 5 = 100
  • 4 = 75
  • 3 = 50
  • 2 = 25
  • 1 = 0

 

YEAR BUILT:

  • 1925-1959 = 10
  • 1960-1972 = 20
  • 1973-1984 = 30
  • 1985-1996 = 50
  • 1997-2004 = 70
  • 2005-2011 = 80
  • 2012-2015 = 90
  • 2015-2020 = 100

 

I have the standard DateTable connected between the HistoricalData table to perform necessary time intelligence calculations.

 

We're also incorporating a tremendous amount of of demographrics into this model. For every building (~30,000), we have a about 30 columns of demographic information connected to each building.

 

Below is a sample DemographicsTable:

 

ID_BuildingEnt-Cons Spdng - 2020 (1m)Total Emp(1m)2000 Househlds(1m)% HH Grwth 20-25(1m)'20 Families(1m)'25 Avg HH Inc(2m)'25 Popltion(5m)'20 Pop Bach Deg(5m)2020 Pop Wrk Trav Time 60+ Min(10m)
55411 $        1,145,244           9,500           2,5085.31%         2,444           69,679         106,362           6,806           7,197
90025 $        2,058,049           8,895           6,8472.07%         5,259           56,710         292,507         14,767         40,585
11252 $        5,838,175         20,156         15,3303.79%         5,325         129,532         490,592         96,072         47,449
25877 $        1,585,805           4,684           4,3628.30%         3,166         125,965         138,150         24,094         20,327

 

I'm performing a similar scoring method for the demographics tables, but using the PERCENTILEX.INC function. For buildings in the top 5%, get a 95 score, top 10%, get a 90 score, top 25%, a 75 score and so on.

 

I then take all the scores from the building and demographics tables, and add those together. The higher the final score, the better, and also should have the Lowest Average Historical Vacancy %.

 

I've created parameters for each of the metrics, so I can weight them from 0-1 so I can calibrate or fine tune each of the metrics. Theoretically, once these scores and weights are "calibrated" the end result is the list of buildings that have the highest final score that also has the Lowest Average Historical Vacancy %.

 

 

Lastly, below is a sample HistoricalDataTable:

 

IDID_BuildingMonthRecordedBuilding SFVacant SFVacancy %
1225463554117/1/2020152,00012,2258.04%
5874534554114/1/2020152,00015,22510.02%
5542335554111/1/2020152,00015,22510.02%
9987451900257/1/2020540,000142,32126.36%
858412112527/1/2020325,1005,0000.02%

Note: ID_Building is the 1-M relationship to the Building table

 

This table has 2-3 million rows of information. There is a historical occupancy for every building, on a quarterly basis since 2000. I can calculate the current vacancy % by dividing the Vacant SF by the Building SF - only for the most recent date (using the LASTDATE function). I also will be calculating the average historical vacancy since 2000 by iterating the previously explained calculation all the way back to 2000.

 

So please let me know if you have any ideas on what I should do next, because again, I have no idea what I'm doing and definitely need the help on this very important, and exciting project!

 

Thanks!

2 REPLIES 2
harshnathani
Community Champion
Community Champion

Hi @CoonPageDub ,

 

Common steps of a project.

First 2 are very important

 

1. Decide your goals.

2. Decide what you want to build and understand who will be your audience.

3. Then get your data , enrich,

4 Find Insights

5. Deliver

 

Iterate, Iterate and Iterate

 

Have a look at samples of your project.

 

https://www.youtube.com/watch?v=DEzbf1Q1b4k

https://www.youtube.com/watch?v=0ZV0kbObHf0

 

See what you want to build and then build your data model accordingly.

 


Regards,

Harsh Nathani


Appreciate with a Kudos!! (Click the Thumbs Up Button)

Did I answer your question? Mark my post as a solution!

I'm with you and thanks for the reply. 

1. the goal is simple: to build a data model that predicts the future.

2. Audience is me. I want to build a model that, upon entering a set of parameters, evaluates the buildings we want to buy. The output is a decision from the model that will have the highest probability of the lowest average vancancy over the next five years. 
3. I have all the data- mentioned below.

4. Some insights are being found, but I don't know how to program the model to iterate the weights ands measures. How do I do that?

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.