Good Evening Fellow Power BI'ers!
I need help with calibrating my data model, but I have no idea what I'm doing, so all suggestions and advice are desperately needed! The purpose of this model is to use a vast amount of data I've been aggregating to help predict which commercial real estate buildings I should buy as investments. The model uses four tables (Building, HistoricalData, Demographics, and DateTable). We're solving for the Average Historical Vacancy % for each building (the lower, the better).
I need help calibrating all these metrics as weights so I can understand which ones have the most impact on producing the lowest Average Historical Vacancy % in the future. That number is what we're solving for: the best buildings stay full with tenants, while those with a few deficiencies lose tenants in tough times.
Here is an example of the Building table. I have about 30,000 buildings across 8 cities. There are about 8 more columns than those shown below.
| ID_Building | Address | Building SF | Latitude | Longitude | Submarket | Class | Star | Year Built | Year Renovated | Hist Avg Vacancy % |
|---|---|---|---|---|---|---|---|---|---|---|
| 10022 | 101 Nancy St. | 100,250 | 95.325212 | -36.22351 | Midtown | A | 4 | 1980 | 1995 | 5.2% |
| 55210 | 1000 Smith St. | 250,000 | 95.325231 | -37.36985 | Downtown | C | 1 | 2020 | - | 11.2% |
| 63251 | 5 Broad St. | 50,000 | 95.35125 | -36.25363 | Uptown | B | 2 | 2016 | 2010 | 2.3% |
Right now I'm grouping and categorizing each of these fields so I can assign numeric values to them, and then scoring based on those groups on a scale of 1-100. The scores are assigned fairly arbitrarily so that I can at least get a working model.
For example:
CLASS:
STAR:
YEAR BUILT:
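The grouping-and-scoring step can be written as a calculated column on the Building table using SWITCH. A minimal sketch for CLASS, assuming placeholder scores (40/25/10) that you would later calibrate — your own groupings and values would replace these:

```dax
-- Hypothetical calculated column on the Building table.
-- The 40/25/10 scores are placeholders, not calibrated values.
Class Score =
SWITCH (
    Building[Class],
    "A", 40,
    "B", 25,
    "C", 10,
    0    -- fallback for blanks or unexpected values
)
```

The same pattern (with SWITCH ( TRUE (), ... ) for numeric ranges) works for STAR and YEAR BUILT.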
I have a standard DateTable connected to the HistoricalData table to perform the necessary time intelligence calculations.
We're also incorporating a tremendous amount of demographics into this model. For every building (~30,000), we have about 30 columns of demographic information connected to each building.
Below is a sample DemographicsTable:
| ID_Building | Ent-Cons Spdng - 2020 (1m) | Total Emp(1m) | 2000 Househlds(1m) | % HH Grwth 20-25(1m) | '20 Families(1m) | '25 Avg HH Inc(2m) | '25 Popltion(5m) | '20 Pop Bach Deg(5m) | 2020 Pop Wrk Trav Time 60+ Min(10m) |
|---|---|---|---|---|---|---|---|---|---|
| 55411 | $ 1,145,244 | 9,500 | 2,508 | 5.31% | 2,444 | 69,679 | 106,362 | 6,806 | 7,197 |
| 90025 | $ 2,058,049 | 8,895 | 6,847 | 2.07% | 5,259 | 56,710 | 292,507 | 14,767 | 40,585 |
| 11252 | $ 5,838,175 | 20,156 | 15,330 | 3.79% | 5,325 | 129,532 | 490,592 | 96,072 | 47,449 |
| 25877 | $ 1,585,805 | 4,684 | 4,362 | 8.30% | 3,166 | 125,965 | 138,150 | 24,094 | 20,327 |
I'm performing a similar scoring method for the Demographics table, but using the PERCENTILEX.INC function: buildings in the top 5% get a score of 95, the top 10% get 90, the top 25% get 75, and so on.
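One way to sketch that percentile bucketing in DAX, shown here for a single demographic column. This is an assumed pattern, not your exact measure: the column name, the 50-point fallback, and the use of ALL ( Demographics ) as the comparison population are all placeholders to adjust:

```dax
-- Sketch of a percentile-bucket score for one demographic column.
-- Mirrors the 95 / 90 / 75 scheme described above; the fallback of 50
-- and the column reference are assumptions.
Emp Score =
VAR ThisValue = SELECTEDVALUE ( Demographics[Total Emp(1m)] )
VAR P95 = PERCENTILEX.INC ( ALL ( Demographics ), Demographics[Total Emp(1m)], 0.95 )
VAR P90 = PERCENTILEX.INC ( ALL ( Demographics ), Demographics[Total Emp(1m)], 0.90 )
VAR P75 = PERCENTILEX.INC ( ALL ( Demographics ), Demographics[Total Emp(1m)], 0.75 )
RETURN
    SWITCH (
        TRUE (),
        ThisValue >= P95, 95,   -- top 5%
        ThisValue >= P90, 90,   -- top 10%
        ThisValue >= P75, 75,   -- top 25%
        50                      -- everyone else (placeholder)
    )
```

With ~30 demographic columns, you may prefer to unpivot the Demographics table in Power Query and drive one generic measure off a metric-name column rather than write 30 near-identical measures.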
I then take all the scores from the Building and Demographics tables and add them together. The higher the final score, the better, and a higher score should also correspond to the lowest Average Historical Vacancy %.
I've created parameters for each of the metrics so I can weight them from 0-1 to calibrate or fine-tune each metric. Theoretically, once these scores and weights are "calibrated", the end result is a list of the buildings with the highest final scores, which should also have the Lowest Average Historical Vacancy %.
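Combining the component scores with the what-if parameter weights is a simple weighted sum. A minimal sketch, assuming measure names like [Class Weight Value] for the auto-generated parameter measures and the component scores sketched earlier (all names here are hypothetical):

```dax
-- Sketch of a weighted final score. [Class Weight Value] etc. are the
-- "Value" measures Power BI generates for each what-if parameter
-- (0-1 range); the component score names are assumptions.
Final Score =
      [Class Weight Value] * [Class Score Measure]
    + [Star Weight Value]  * [Star Score Measure]
    + [Emp Weight Value]   * [Emp Score Measure]
-- ...one weight * score term per metric
```

Ranking buildings by [Final Score] in a table visual then lets you eyeball whether high scores line up with low Hist Avg Vacancy % as you move the weight sliders.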
Lastly, below is a sample of the HistoricalData table:
| ID | ID_Building | MonthRecorded | Building SF | Vacant SF | Vacancy % |
|---|---|---|---|---|---|
| 1225463 | 55411 | 7/1/2020 | 152,000 | 12,225 | 8.04% |
| 5874534 | 55411 | 4/1/2020 | 152,000 | 15,225 | 10.02% |
| 5542335 | 55411 | 1/1/2020 | 152,000 | 15,225 | 10.02% |
| 9987451 | 90025 | 7/1/2020 | 540,000 | 142,321 | 26.36% |
| 858412 | 11252 | 7/1/2020 | 325,100 | 5,000 | 1.54% |
Note: ID_Building is the key for the one-to-many relationship to the Building table.
This table has 2-3 million rows. There is a historical occupancy record for every building, on a quarterly basis, since 2000. I can calculate the current vacancy % by dividing Vacant SF by Building SF for the most recent date only (using the LASTDATE function). I will also calculate the average historical vacancy since 2000 by iterating that same calculation over every quarter back to 2000.
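The two vacancy calculations described above can be sketched as measures. This is an assumed implementation: it presumes the DateTable relates to HistoricalData[MonthRecorded], and that averaging the quarterly ratios (rather than a SF-weighted average) is the intent:

```dax
-- Current vacancy: the Vacant SF / Building SF ratio for the
-- latest quarter only.
Current Vacancy % =
CALCULATE (
    DIVIDE ( SUM ( HistoricalData[Vacant SF] ), SUM ( HistoricalData[Building SF] ) ),
    LASTDATE ( DateTable[Date] )
)

-- Average historical vacancy: iterate the same ratio over every
-- recorded quarter and average the results.
Avg Hist Vacancy % =
AVERAGEX (
    VALUES ( HistoricalData[MonthRecorded] ),
    CALCULATE (
        DIVIDE ( SUM ( HistoricalData[Vacant SF] ), SUM ( HistoricalData[Building SF] ) )
    )
)
```

Note that DIVIDE handles the divide-by-zero case, and the CALCULATE inside AVERAGEX performs the context transition so each quarter's ratio is computed separately.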
So please let me know if you have any ideas on what I should do next, because again, I have no idea what I'm doing and definitely need help on this very important and exciting project!
Thanks!
Hi @CoonPageDub ,
These are the common steps of a project. The first two are very important:
1. Decide your goals.
2. Decide what you want to build and understand who your audience will be.
3. Get your data and enrich it.
4. Find insights.
5. Deliver.
Then iterate, iterate, and iterate.
Have a look at some sample projects similar to yours:
https://www.youtube.com/watch?v=DEzbf1Q1b4k
https://www.youtube.com/watch?v=0ZV0kbObHf0
See what you want to build and then build your data model accordingly.
Regards,
Harsh Nathani
Appreciate with a Kudos!! (Click the Thumbs Up Button)
Did I answer your question? Mark my post as a solution!
I'm with you and thanks for the reply.
1. The goal is simple: to build a data model that predicts the future.
2. The audience is me. I want to build a model that, upon entering a set of parameters, evaluates the buildings we want to buy. The output is a decision from the model that has the highest probability of the lowest average vacancy over the next five years.
3. I have all the data, as described above.
4. Some insights are being found, but I don't know how to program the model to iterate the weights and measures. How do I do that?