Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
siljamardla
New Member

Best data model for project sales and budget with two granularities (customer and customer category)

I am trying to build a solid data model to support the budget vs sales analysis. 

 

Sales is project based, it has to be possible to slice by project. Sales facts do not store customer information, this comes only through project. Budget exists with two different granularities: all the customer categories have a budget, but also some customers have their own budget. Sum of customer budgets <= customer category budget. 

It has to be possible to slice both sales and budget by customer and customer category. 

 

The data model currently looks like this. I have drawn up all the possible relations (this would not be possible in PBI as some of them create ambiguity).

 

siljamardla_1-1659739339725.png

So, how to turn this into a perfect star schema? If perfect can't be done then what could be the preferred way to go?

Anything can be changed in this data model as long as the existing data is used.

 

Problem 1. Sales, Project and Customer are currently in snowflake, not a star schema.

I could copy the CustomerKey to Sales table. Is this a good or bad idea?

siljamardla_2-1659739410470.png

+ Project and Customer would be independent dimensions.

- It would not be obvious (without documentation) that the current business logic dictates that sales can't have customers other than the project customer. 

- There would be a redundant column in Sales table. 

+ Sales, Project and Customer would form a star schema. 

+ One less join in PBI, better performance in PBI.


Problem 2. How to slice the sales and budget both by Customer and by Customer Category?
Option 1. Make Customer and Customer Category independent dimensions (similar to Problem 1)

siljamardla_3-1659739449025.png

+ Star schema

- Not obvious that Customer Category is an attribute of Customer

Option 2. Create an extra table that somehow combines Customers and CustomerCategories, give it a Key column and calculate suitable Keys into the budget table (rephrasing, create dummy rows in to Customer table for each Classifier row so that the dummy rows could be connected to the rows in Budget table that do not have specific customer).

siljamardla_4-1659739482840.png

- Messy, extra work
+ Star schema

- There will be blank customer values in vDimCustomerAndClassifier

- Customer and Classifier relation not obvious without documentation


Any other options? Any other considerations?

Code to recreate the original data model in dbdiagram.io

 

 

// Creating tables

Table vSales as SALE {
 ProjectKey int
 TransactionDate date
 ProjectRevenueEurAmt numeric
 }

Table vBudget as BUD {
 BudgetDate date
 CustomerKey int
 ClassifierCustomerKey int
 PlannedRevenueEurAmt numeric
 }

Table vDimProject as DP {
  Key int [pk]
  ProjectNumber varchar
  ProjectName varchar
  CustomerKey int
 }
 
Table vDimCustomer as DC {
  Key int [pk]
  Name varchar
  Category varchar
  Industry varchar
  ClassifierCustomerKey int
 }
 
Table vClassifierCustomer as CC {
  Key int [pk]
  Industry varchar
  Category varchar
 }
 
Table vDimDate as DD {
  Key int [pk]
  Date date
 }

// Creating references
// You can also define relaionship separately
// > many-to-one; < one-to-many; - one-to-one
Ref: SALE.ProjectKey > DP.Key  
Ref: DP.CustomerKey > DC.Key
Ref: SALE.TransactionDate > DD.Date
Ref: BUD.BudgetDate > DD.Date
Ref: BUD.CustomerKey > DC.Key
Ref: BUD.ClassifierCustomerKey > CC.Key
Ref: DC.ClassifierCustomerKey > CC.Key

 

 

Option 1

 

 

// Creating tables

Table vSales as SALE {
ProjectKey int
CustomerKey int
ClassifierCustomerKey int
TransactionDate date
ProjectRevenueEurAmt numeric
}

Table vBudget as BUD {
BudgetDate date
CustomerKey int
ClassifierCustomerKey int
PlannedRevenueEurAmt numeric
}

Table vDimProject as DP {
Key int [pk]
ProjectNumber varchar
ProjectName varchar
CustomerKey int
}

Table vDimCustomer as DC {
Key int [pk]
Name varchar
Category varchar
Industry varchar
ClassifierCustomerKey int
}

Table vClassifierCustomer as CC {
Key int [pk]
Industry varchar
Category varchar
}

Table vDimDate as DD {
Key int [pk]
Date date
}

// Creating references
// You can also define relaionship separately
// > many-to-one; < one-to-many; - one-to-one
Ref: SALE.ProjectKey > DP.Key
Ref: SALE.CustomerKey > DC.Key
Ref: SALE.TransactionDate > DD.Date
Ref: BUD.BudgetDate > DD.Date
Ref: BUD.CustomerKey > DC.Key
Ref: BUD.ClassifierCustomerKey > CC.Key
Ref: SALE.ClassifierCustomerKey > CC.Key

 

 

Option 2

 

 

Table vSales as SALE {
ProjectKey int
CustomerAndClassifierKey int
TransactionDate date
ProjectRevenueEurAmt numeric
}

Table vBudget as BUD {
BudgetDate date
CustomerAndClassifierKey int
PlannedRevenueEurAmt numeric
}

Table vDimProject as DP {
Key int [pk]
ProjectNumber varchar
ProjectName varchar
CustomerKey int
}

Table vDimCustomerAndClassifier as CAC {
Key int [pk]
Name varchar
Category varchar
Industry varchar
CustomerAndClassifierKey int
}

Table vDimDate as DD {
Key int [pk]
Date date
}

// Creating references
// You can also define relaionship separately
// > many-to-one; < one-to-many; - one-to-one
Ref: SALE.ProjectKey > DP.Key
Ref: SALE.CustomerAndClassifierKey > CAC.Key
Ref: SALE.TransactionDate > DD.Date
Ref: BUD.BudgetDate > DD.Date
Ref: BUD.CustomerAndClassifierKey > CAC.Key

 

 

5 REPLIES 5
MikeSolvesBI
Helper II
Helper II

Hello,

I assume in PowerBI it is best to keep the Sales & Budget Data in separate tables as opposed to union the data together?

Thanks,

 

Michael

Most of the time you want to keep your fact data separate, especially if it serves a very different purpose.  If you had Sales and Backlog it would be a slightly different story as you could argue that they supplement each other.  But in the case of Sales and Budget there is no conceptual overlap.

lbendlin
Super User
Super User

I would go with your very first version and deal with the ambiguities via inactive relationships and USERELATIONSHIP

And what would be the reasoning?

It's the closest you can get to a snowflake schema, and it seems to match your business process best.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.