cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
siljamardla
New Member

Best data model for project sales and budget with two granularities (customer and customer category)

I am trying to build a solid data model to support the budget vs sales analysis. 

 

Sales is project based, it has to be possible to slice by project. Sales facts do not store customer information, this comes only through project. Budget exists with two different granularities: all the customer categories have a budget, but also some customers have their own budget. Sum of customer budgets <= customer category budget. 

It has to be possible to slice both sales and budget by customer and customer category. 

 

The data model currently looks like this. I have drawn up all the possible relations (this would not be possible in PBI as some of them create ambiguity).

 

siljamardla_1-1659739339725.png

So, how to turn this into a perfect star schema? If perfect can't be done then what could be the preferred way to go?

Anything can be changed in this data model as long as the existing data is used.

 

Problem 1. Sales, Project and Customer are currently in snowflake, not a star schema.

I could copy the CustomerKey to Sales table. Is this a good or bad idea?

siljamardla_2-1659739410470.png

+ Project and Customer would be independent dimensions.

- It would not be obvious (without documentation) that the current business logic dictates that sales can't have customers other than the project customer. 

- There would be a redundant column in Sales table. 

+ Sales, Project and Customer would form a star schema. 

+ One less join in PBI, better performance in PBI.


Problem 2. How to slice the sales and budget both by Customer and by Customer Category?
Option 1. Make Customer and Customer Category independent dimensions (similar to Problem 1)

siljamardla_3-1659739449025.png

+ Star schema

- Not obvious that Customer Category is an attribute of Customer

Option 2. Create an extra table that somehow combines Customers and CustomerCategories, give it a Key column and calculate suitable Keys into the budget table (rephrasing, create dummy rows in to Customer table for each Classifier row so that the dummy rows could be connected to the rows in Budget table that do not have specific customer).

siljamardla_4-1659739482840.png

- Messy, extra work
+ Star schema

- There will be blank customer values in vDimCustomerAndClassifier

- Customer and Classifier relation not obvious without documentation


Any other options? Any other considerations?

Code to recreate the original data model in dbdiagram.io

 

 

// Creating tables

Table vSales as SALE {
 ProjectKey int
 TransactionDate date
 ProjectRevenueEurAmt numeric
 }

Table vBudget as BUD {
 BudgetDate date
 CustomerKey int
 ClassifierCustomerKey int
 PlannedRevenueEurAmt numeric
 }

Table vDimProject as DP {
  Key int [pk]
  ProjectNumber varchar
  ProjectName varchar
  CustomerKey int
 }
 
Table vDimCustomer as DC {
  Key int [pk]
  Name varchar
  Category varchar
  Industry varchar
  ClassifierCustomerKey int
 }
 
Table vClassifierCustomer as CC {
  Key int [pk]
  Industry varchar
  Category varchar
 }
 
Table vDimDate as DD {
  Key int [pk]
  Date date
 }

// Creating references
// You can also define relaionship separately
// > many-to-one; < one-to-many; - one-to-one
Ref: SALE.ProjectKey > DP.Key  
Ref: DP.CustomerKey > DC.Key
Ref: SALE.TransactionDate > DD.Date
Ref: BUD.BudgetDate > DD.Date
Ref: BUD.CustomerKey > DC.Key
Ref: BUD.ClassifierCustomerKey > CC.Key
Ref: DC.ClassifierCustomerKey > CC.Key

 

 

Option 1

 

 

// Creating tables

Table vSales as SALE {
ProjectKey int
CustomerKey int
ClassifierCustomerKey int
TransactionDate date
ProjectRevenueEurAmt numeric
}

Table vBudget as BUD {
BudgetDate date
CustomerKey int
ClassifierCustomerKey int
PlannedRevenueEurAmt numeric
}

Table vDimProject as DP {
Key int [pk]
ProjectNumber varchar
ProjectName varchar
CustomerKey int
}

Table vDimCustomer as DC {
Key int [pk]
Name varchar
Category varchar
Industry varchar
ClassifierCustomerKey int
}

Table vClassifierCustomer as CC {
Key int [pk]
Industry varchar
Category varchar
}

Table vDimDate as DD {
Key int [pk]
Date date
}

// Creating references
// You can also define relaionship separately
// > many-to-one; < one-to-many; - one-to-one
Ref: SALE.ProjectKey > DP.Key
Ref: SALE.CustomerKey > DC.Key
Ref: SALE.TransactionDate > DD.Date
Ref: BUD.BudgetDate > DD.Date
Ref: BUD.CustomerKey > DC.Key
Ref: BUD.ClassifierCustomerKey > CC.Key
Ref: SALE.ClassifierCustomerKey > CC.Key

 

 

Option 2

 

 

Table vSales as SALE {
ProjectKey int
CustomerAndClassifierKey int
TransactionDate date
ProjectRevenueEurAmt numeric
}

Table vBudget as BUD {
BudgetDate date
CustomerAndClassifierKey int
PlannedRevenueEurAmt numeric
}

Table vDimProject as DP {
Key int [pk]
ProjectNumber varchar
ProjectName varchar
CustomerKey int
}

Table vDimCustomerAndClassifier as CAC {
Key int [pk]
Name varchar
Category varchar
Industry varchar
CustomerAndClassifierKey int
}

Table vDimDate as DD {
Key int [pk]
Date date
}

// Creating references
// You can also define relaionship separately
// > many-to-one; < one-to-many; - one-to-one
Ref: SALE.ProjectKey > DP.Key
Ref: SALE.CustomerAndClassifierKey > CAC.Key
Ref: SALE.TransactionDate > DD.Date
Ref: BUD.BudgetDate > DD.Date
Ref: BUD.CustomerAndClassifierKey > CAC.Key

 

 

3 REPLIES 3
lbendlin
Super User
Super User

I would go with your very first version and deal with the ambiguities via inactive relationships and USERELATIONSHIP

And what would be the reasoning?

It's the closest you can get to a snowflake schema, and it seems to match your business process best.

Helpful resources

Announcements
November 2022 Update

Check it Out!

Click here to read more about the November 2022 updates!

Microsoft 365 Conference â__ December 6-8, 2022

Microsoft 365 Conference - 06-08 December

Join us in Las Vegas to experience community, incredible learning opportunities, and connections that will help grow skills, know-how, and more.