Good Evening Fellow Power BI'ers!
I need help with calibrating my data model, but I have no idea what I'm doing, so all suggestions and advice are desperately needed! The purpose of this model is to use a vast amount of data I've been aggregating to help predict which commercial real estate buildings I should buy as investments. The model uses four tables (Building, HistoricalData, Demographics, and DateTable). We're solving for the Average Historical Vacancy % for each building (the lower, the better).
I need help calibrating all these metrics as weights so I can understand which ones have the most impact on producing the lowest Average Historical Vacancy % in the future. That number is what we're solving for: the best buildings stay full with tenants, while those with a few deficiencies lose tenants in tough times.
Here is an example of the Building table. I have about 30,000 buildings across 8 cities. There are about 8 more columns than those shown below.
| ID_Building | Address | Building SF | Latitude | Longitude | Submarket | Class | Star | Year Built | Year Renovated | Hist Avg Vacancy % |
|---|---|---|---|---|---|---|---|---|---|---|
| 10022 | 101 Nancy St. | 100,250 | 95.325212 | -36.22351 | Midtown | A | 4 | 1980 | 1995 | 5.2% |
| 55210 | 1000 Smith St. | 250,000 | 95.325231 | -37.36985 | Downtown | C | 1 | 2020 | - | 11.2% |
| 63251 | 5 Broad St. | 50,000 | 95.35125 | -36.25363 | Uptown | B | 2 | 2016 | 2010 | 2.3% |
Right now I'm grouping and categorizing each of these fields so I can assign numeric values to them, and then scoring based on those groups on a scale of 1-100. The scores are assigned fairly arbitrarily so that I can at least get a working model.
For example:
CLASS:
STAR:
YEAR BUILT:
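The grouping-and-scoring step can be written as a calculated column on the Building table using SWITCH. A minimal sketch for CLASS, assuming placeholder scores (40/25/10) that you would later calibrate — your own groupings and values would replace these:

```dax
-- Hypothetical calculated column on the Building table.
-- The 40/25/10 scores are placeholders, not calibrated values.
Class Score =
SWITCH (
    Building[Class],
    "A", 40,
    "B", 25,
    "C", 10,
    0    -- fallback for blanks or unexpected values
)
```

The same pattern (with SWITCH ( TRUE (), ... ) for numeric ranges) works for STAR and YEAR BUILT.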
I have a standard DateTable connected to the HistoricalData table to perform the necessary time intelligence calculations.
We're also incorporating a tremendous amount of demographics into this model. For every building (~30,000), we have about 30 columns of demographic information connected to each building.
Below is a sample DemographicsTable:
| ID_Building | Ent-Cons Spdng - 2020 (1m) | Total Emp(1m) | 2000 Househlds(1m) | % HH Grwth 20-25(1m) | '20 Families(1m) | '25 Avg HH Inc(2m) | '25 Popltion(5m) | '20 Pop Bach Deg(5m) | 2020 Pop Wrk Trav Time 60+ Min(10m) |
|---|---|---|---|---|---|---|---|---|---|
| 55411 | $ 1,145,244 | 9,500 | 2,508 | 5.31% | 2,444 | 69,679 | 106,362 | 6,806 | 7,197 |
| 90025 | $ 2,058,049 | 8,895 | 6,847 | 2.07% | 5,259 | 56,710 | 292,507 | 14,767 | 40,585 |
| 11252 | $ 5,838,175 | 20,156 | 15,330 | 3.79% | 5,325 | 129,532 | 490,592 | 96,072 | 47,449 |
| 25877 | $ 1,585,805 | 4,684 | 4,362 | 8.30% | 3,166 | 125,965 | 138,150 | 24,094 | 20,327 |
I'm performing a similar scoring method for the Demographics table, but using the PERCENTILEX.INC function: buildings in the top 5% get a score of 95, the top 10% get 90, the top 25% get 75, and so on.
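One way to sketch that percentile bucketing in DAX, shown here for a single demographic column. This is an assumed pattern, not your exact measure: the column name, the 50-point fallback, and the use of ALL ( Demographics ) as the comparison population are all placeholders to adjust:

```dax
-- Sketch of a percentile-bucket score for one demographic column.
-- Mirrors the 95 / 90 / 75 scheme described above; the fallback of 50
-- and the column reference are assumptions.
Emp Score =
VAR ThisValue = SELECTEDVALUE ( Demographics[Total Emp(1m)] )
VAR P95 = PERCENTILEX.INC ( ALL ( Demographics ), Demographics[Total Emp(1m)], 0.95 )
VAR P90 = PERCENTILEX.INC ( ALL ( Demographics ), Demographics[Total Emp(1m)], 0.90 )
VAR P75 = PERCENTILEX.INC ( ALL ( Demographics ), Demographics[Total Emp(1m)], 0.75 )
RETURN
    SWITCH (
        TRUE (),
        ThisValue >= P95, 95,   -- top 5%
        ThisValue >= P90, 90,   -- top 10%
        ThisValue >= P75, 75,   -- top 25%
        50                      -- everyone else (placeholder)
    )
```

With ~30 demographic columns, you may prefer to unpivot the Demographics table in Power Query and drive one generic measure off a metric-name column rather than write 30 near-identical measures.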
I then take all the scores from the Building and Demographics tables and add them together. The higher the final score, the better, and a higher score should also correspond to the lowest Average Historical Vacancy %.
I've created parameters for each of the metrics so I can weight them from 0-1 to calibrate or fine-tune each metric. Theoretically, once these scores and weights are "calibrated", the end result is a list of the buildings with the highest final scores, which should also have the Lowest Average Historical Vacancy %.
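Combining the component scores with the what-if parameter weights is a simple weighted sum. A minimal sketch, assuming measure names like [Class Weight Value] for the auto-generated parameter measures and the component scores sketched earlier (all names here are hypothetical):

```dax
-- Sketch of a weighted final score. [Class Weight Value] etc. are the
-- "Value" measures Power BI generates for each what-if parameter
-- (0-1 range); the component score names are assumptions.
Final Score =
      [Class Weight Value] * [Class Score Measure]
    + [Star Weight Value]  * [Star Score Measure]
    + [Emp Weight Value]   * [Emp Score Measure]
-- ...one weight * score term per metric
```

Ranking buildings by [Final Score] in a table visual then lets you eyeball whether high scores line up with low Hist Avg Vacancy % as you move the weight sliders.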
Lastly, below is a sample of the HistoricalData table:
| ID | ID_Building | MonthRecorded | Building SF | Vacant SF | Vacancy % |
|---|---|---|---|---|---|
| 1225463 | 55411 | 7/1/2020 | 152,000 | 12,225 | 8.04% |
| 5874534 | 55411 | 4/1/2020 | 152,000 | 15,225 | 10.02% |
| 5542335 | 55411 | 1/1/2020 | 152,000 | 15,225 | 10.02% |
| 9987451 | 90025 | 7/1/2020 | 540,000 | 142,321 | 26.36% |
| 858412 | 11252 | 7/1/2020 | 325,100 | 5,000 | 1.54% |
Note: ID_Building is the key for the one-to-many relationship to the Building table.
This table has 2-3 million rows. There is a historical occupancy record for every building, on a quarterly basis, since 2000. I can calculate the current vacancy % by dividing Vacant SF by Building SF for the most recent date only (using the LASTDATE function). I will also calculate the average historical vacancy since 2000 by iterating that same calculation over every quarter back to 2000.
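The two vacancy calculations described above can be sketched as measures. This is an assumed implementation: it presumes the DateTable relates to HistoricalData[MonthRecorded], and that averaging the quarterly ratios (rather than a SF-weighted average) is the intent:

```dax
-- Current vacancy: the Vacant SF / Building SF ratio for the
-- latest quarter only.
Current Vacancy % =
CALCULATE (
    DIVIDE ( SUM ( HistoricalData[Vacant SF] ), SUM ( HistoricalData[Building SF] ) ),
    LASTDATE ( DateTable[Date] )
)

-- Average historical vacancy: iterate the same ratio over every
-- recorded quarter and average the results.
Avg Hist Vacancy % =
AVERAGEX (
    VALUES ( HistoricalData[MonthRecorded] ),
    CALCULATE (
        DIVIDE ( SUM ( HistoricalData[Vacant SF] ), SUM ( HistoricalData[Building SF] ) )
    )
)
```

Note that DIVIDE handles the divide-by-zero case, and the CALCULATE inside AVERAGEX performs the context transition so each quarter's ratio is computed separately.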
So please let me know if you have any ideas on what I should do next, because again, I have no idea what I'm doing and definitely need help on this very important and exciting project!
Thanks!
Hi @CoonPageDub ,
These are the common steps of a project. The first two are very important:
1. Decide your goals.
2. Decide what you want to build and understand who your audience will be.
3. Get your data and enrich it.
4. Find insights.
5. Deliver.
Then iterate, iterate, and iterate.
Have a look at some sample projects similar to yours:
https://www.youtube.com/watch?v=DEzbf1Q1b4k
https://www.youtube.com/watch?v=0ZV0kbObHf0
See what you want to build and then build your data model accordingly.
Regards,
Harsh Nathani
Appreciate with a Kudos!! (Click the Thumbs Up Button)
Did I answer your question? Mark my post as a solution!
I'm with you and thanks for the reply.
1. The goal is simple: to build a data model that predicts the future.
2. The audience is me. I want to build a model that, upon entering a set of parameters, evaluates the buildings we want to buy. The output is a decision from the model that has the highest probability of the lowest average vacancy over the next five years.
3. I have all the data, as described above.
4. Some insights are being found, but I don't know how to program the model to iterate the weights and measures. How do I do that?