Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
TimmK
Helper IV
Helper IV

Need Help with Data Model

I have difficulties making my data model a star schema. The report is mainly about showing the consumption of material needed for manufacturing our products.

 

Tables

  • Date = Clearly a DIM table
  • Item = List of materials; Clearly a DIM table
  • Vendor = Clearly a DIM table
  • Outs = Shows consumption for each specific material; Clearly a FACT table
  • Unit = Shows the purchasing unit and quantity; For example, four pieces per box; Support table that I merged with the item table
  • Stock = A duplicate of the outs table; I grouped by with SUM of "Remaining Quantity" as it shows me the current stock level; Not sure if it is a fact table anymore as there are only unique item key rows
  • Last Vendor = Sorted descending by date, removed duplicate item key, removed the date column afterwards; Shows the vendor from who we ordered the last time; Not sure if it is a fact table anymore as there are only unique item key rows
  • Ordered = Shows the SUM of outstanding quantity of open orders for each specific item; Not sure if it is a fact table anymore as there are only unique item key rows
  • Purchase Number = It is a duplicate of the ordered table; Support table that I merged with the ordered table so I see the last order number

Outs only has rows from this and last year as this is sufficient for the report. Stock is not filtered by date and is thus based on all rows.

 

Could you please tell me how I can make a star schema out of this? As what kind of tables should I treat stock, last vendor and ordered? How should I make the connections? Should I maybe just merge these tables with the item table?

 

Data Model.PNG

1 ACCEPTED SOLUTION

Hi @TimmK ,

the best practices refer to Power BI as a model based tool and not as a report based tool. The goal here is to design a data model that can be applied to various questions and reports.

 

With a good data model you have common dimensions by which you can view your different metrics. At the same time, your model should be stable in case of changes, e.g. if you need to add another dimension or metric.

 

For example, the date dimension is one of the most important dimensions from my point of view and should be linked to almost every fact table. So, if modeled correctly, you can determine the inventory at any point in time (running total), your consumptions or receipts at certain points in time, and the delivery quantity at the delivery date.

 

A good BI data model should give analysis capabilities and not just solve a reporting question.

Did I answer your question?
Please mark my post as solution, this will also help others.
Please give Kudos for support.

Marcus Wegener works as Full Stack Power BI Engineer at BI or DIE.
His mission is clear: "Get the most out of data, with Power BI."
twitter - LinkedIn - YouTube - website - podcast


View solution in original post

7 REPLIES 7
v-henryk-mstf
Community Support
Community Support

Hi @TimmK ,

 

To build a reasonable star model, you first need to understand its design principles: Dimension tables support filtering and grouping, Fact tables support summarization. There are cardinal attributes and screening directions, both of which also play a key role in the construction of the model. Based on the above and refer to the official documents, I hope to help you.

v-henryk-mstf_0-1619071136881.png

 

Here is the reference link:

Understand star schema and the importance for Power BI - Power BI | Microsoft Docs
If the problem is still not resolved, please provide detailed error information and let me know immediately. Looking forward to your reply.

 

Best Regards,
Henry

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi Henryk,

 

Thank you, I already know the reference, but this does not really help in my practical example.

 

For example, is a table that is originally a fact table still a fact table after I have aggregated it with the query editor with a group by function? Because then it only has rows with a unique PK and in this case it would be a 1:1 relationship to the item table. Thus, I wonder if I should just merge it with the item table?

Instead of me providing answers, let me provide questions. 

what are your transactions and data tables?  This is where you start. I think you have 

consumption

stock

orders. 

Don't assume you need to use the same names as your source system. You can call it "outs" if you like, but to me it is "consumption"

 

these are all fact tables. 

Next step is to determine if you need 3 fact tables, or if you can combine these into 2 or 1 table. I think 3 is fine.

 

then ask, who, what, when, where. 

who: vendor

what: item

when: date

where: not sure. 

Keep the minimum i of about who, what, when, where in the fact table: just the keys. Put all the detail in the dim tables. 

these are general guidelines. I hope it helps. 

 

 



* Matt is an 8 times Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.

Thank you, Matt.

 

Yes, Consumption may be a better table name.

 

So would you say fact tables with distinct rows only are still considered fact tables? Stock, Last Vendor and Ordered have been reduced by GROUP BY or remove duplicates in the query editor. Thus, each row of them has only exactly one unique item key.

 

Together with the Consumption table I would then have four fact tables (Consumption, Stock, Last Vendor, Ordered).

 

I agree, Item, Vendor and Date must be the DIM tables. The where question is not relevant for this report, I think.

 

Do you think the following would be the star schema according to best practice then? Or would you do something differently?

 

DM 2.PNG

Hi @TimmK ,

the best practices refer to Power BI as a model based tool and not as a report based tool. The goal here is to design a data model that can be applied to various questions and reports.

 

With a good data model you have common dimensions by which you can view your different metrics. At the same time, your model should be stable in case of changes, e.g. if you need to add another dimension or metric.

 

For example, the date dimension is one of the most important dimensions from my point of view and should be linked to almost every fact table. So, if modeled correctly, you can determine the inventory at any point in time (running total), your consumptions or receipts at certain points in time, and the delivery quantity at the delivery date.

 

A good BI data model should give analysis capabilities and not just solve a reporting question.

Did I answer your question?
Please mark my post as solution, this will also help others.
Please give Kudos for support.

Marcus Wegener works as Full Stack Power BI Engineer at BI or DIE.
His mission is clear: "Get the most out of data, with Power BI."
twitter - LinkedIn - YouTube - website - podcast


Thank you, @mwegener 

 

This makes sense to me in general having this kind of model perspective being able to answer various questions, also those that may not be seen yet.

 

However, could possible performance improvements also be important/useful in some cases? For example, when I aggregate in Power Query I can reduce the rows from let's say 1.5 million to just a few ten thousand rows. This may result in reducing overall loading time four times compared to no aggregation. So maybe one could trade-off universal flexibility/changeability with better performance in some cases?

 

Kimball briefly describes here "Aggregate Fact Tables" and their use to accelerate query performance:

Aggregate Fact Tables or Cubes | Kimball Group

 

 

Hi @TimmK,

 

I would consider the performance improvement secondary. Since Power BI's analytics database is optimized for aggregating data. The main focus should be on usability and maintainability at the beginning. In my view, an increased effort for performance is only justified in case of concrete performance problems.

 

Did I answer your question?
Please mark my post as solution, this will also help others.
Please give Kudos for support.

Marcus Wegener works as Full Stack Power BI Engineer at BI or DIE.
His mission is clear: "Get the most out of data, with Power BI."
twitter - LinkedIn - YouTube - website - podcast


Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors