Solved: Re: Recommended table setup

Hanspw · ‎02-14-2021

Hi.

I wonder if someone have some tips for best practice for seting up my datamodel.

I have one main fact table with connected dim tables (fact_table1). Hovever some of the transactions in the main fact_table1 is on an aggregate level. And then i have a underlying datatable (fact_table2) with the underlying transactions split up on a more detailed level. But fact_table1 also includes transactions which dont come from fact_table2.

So lets say fact_table2 have 5 underlying transactions. These 5 transactions aggregate together to 1 transaction in fact_table1. And then fact_table1 have 10 more transactions from another source. fact_table1 and fact_table2 would have a relationsship based on a transaction ID.

So:

i would need to establish my reports based on the main transaction table (fact_table1), to include all transactions
from the report i want to be able to drill to the data from the underlying transaction table (fact_table2), to see the 5 underlying transactions

I want to keep the tables appart to not get a too big data table and use relationships to drill the data, not merge the data to one big table.

I guess this could be done with an ordinary relationsship based on transaction ID?

So trans_table1 and trans_table2 are connected. Then i connect the dim_tables to trans_table 1. Would i then be able to filter on the dim tables, and the filters would apply to trans_table2 as well? does the relationsships then need to be many-to-many, or can i still use one-to-many? And wouls many-to-many in that case make some trouble?

belvoir99 · ‎02-14-2021

Sounds like you've got the right idea here. Separate out dimension tables from fact tables. Link them with keys (foreign and primary key). Share the dimension tables with the fact tables. Eg. dim x is linked to fact a and fact b using the same key, using 1:M relationships. Then you can group and slice using the dimension column.

You should avoid M:M relationships between fact tables. Performance will be poor. If you design the model well, you shouldn't need that relationship.

For simplicity, where the granularity of the data is different, you should separate the datasets into separate fact tables. If the summary data is simply a summary of one of the fact tables, you should (if size/performance allows) remove the summary data and just use DAX and/or the filtering on the visuals to group the data.

Spend time getting the model right. This will save you many hours of frustration and grief later on. Talk to the business/end users about what they want. Can you capture that with your model?

View solution in original post

belvoir99 · ‎02-15-2021

Hi @Hanspw

Sorry, I probably wasn't clear enough. You link the fact table (many side) to the dimension table (one side) using a key. In the fact table, the key is called the foreign key. In the dimension table, the key is called the primary key.

If you then link the dimension table to another fact table, then the dimension table is a shared dimension table.

You can then use the dimension column as a slicer (or row or column on a visual) and can then drag and drop columns from both of the fact tables onto the same visual. It's powerful.

The standard model design is called a star model - a fact table with many dimension tables attached. This is the best form for tabular systems such as Power BI.

SQLBI.com have a great introductory DAX course (which is free!) - it's worth working through. There is also an article on budget vs actuals with different granularities (e.g. budget is more summarized than actuals) on their website, but the DAX is more complex to understand.

Keeping the two tables separate is simpler but may have limitations. A definitive answer can't be given without in-depth review. The other simpler way is to use Power Query to summarize the actuals and then join together with the budget data into a single table adding a column called say Type with the values of either "Budget" or "Actuals". That makes visual production much easier.

100,000 rows is a small to medium table. You shouldn't need to consider performance until your tables are 1,000,000 or more rows (many factors determine performance). The biggest issue is no of rows x cardinality of the columns. Basically if your column has lots of different values (e.g. Amount $) that's bad ; if it has few (e.g. Gender: Male, Female), that's really good. Removing high cardinality columns such as ID is usually a good thing to do.

I don't think I can answer all your detailed questions I'm afraid but you are definitely heading in the right direction!

I would start with the 2 fact tables joined to the shared dimension table. Note that, in the filter context, the filters automatically flow from dimension to fact, so you just drag and drop from the tables onto the visuals.

A really good beginners book on DAX and modelling is 'Power Pivot & Power BI' by Rob Collie and Avi Singh. Really old now but clear and simple, and still relevant for Power BI, even though it was written in 2016 when Power BI had just come out! I still refer to it at times.

If you can give me a 'like' for my posts, that would be great - thanks!

Hanspw · ‎02-15-2021

Thank you belvoir.

As for budget vs actual i guess my current method is a OK way to do it. I have done almost what you suggest "summarize the actuals and then join together with the budget data into a single table adding a column called say Type with the values of either "Budget" or "Actuals".",

Only that i have not used summarized data for actuals, i hav just put them together. I dont have a problem with the budget data being more aggregated than actuals (if it doesnt affect model performance).

For the two different fact tables, as i understand it could be an solution to append them as well to a combined data table. That would make the star schema easier. I will try both solutions later on and look how they work.

Thanks again for the good info, il look into the dax courses!

belvoir99 · ‎02-14-2021

Sounds like you've got the right idea here. Separate out dimension tables from fact tables. Link them with keys (foreign and primary key). Share the dimension tables with the fact tables. Eg. dim x is linked to fact a and fact b using the same key, using 1:M relationships. Then you can group and slice using the dimension column.

You should avoid M:M relationships between fact tables. Performance will be poor. If you design the model well, you shouldn't need that relationship.

For simplicity, where the granularity of the data is different, you should separate the datasets into separate fact tables. If the summary data is simply a summary of one of the fact tables, you should (if size/performance allows) remove the summary data and just use DAX and/or the filtering on the visuals to group the data.

Spend time getting the model right. This will save you many hours of frustration and grief later on. Talk to the business/end users about what they want. Can you capture that with your model?

Hanspw · ‎02-15-2021

Thank you belvoir.

So, if i understand it correct:

fact_table1 and fact_table 2 is linked with 1:M relationship.
dim_tables are linked both to fact_table 1 and fact_table2, this way i dont need to use M:M releationsship,

Correct?

I also have some followup questions.

@belvoir99 wrote:
For simplicity, where the granularity of the data is different, you should separate the datasets into separate fact tables.

I have a budget table and a table with real data. As of now i have combined(append) this to one fact table (fact_table1). I then have one columnd for actual amount and one columnd for budget amount. Since the budget is on a less detailed level some of the columns would be empty in the budget table. Does that mean that i should have an own fact table for the budget data? As i understand the empty columns does not take up data storage because of the column based database storage, and it would not be a problem to have empty columns, or lower granularity of the budget data, in the same fact table? I guess budget could be in a own fact table but it seems more practically to have it in the same fact table?

Is there any best practice on how to include budget data?

@belvoir99 wrote:
If the summary data is simply a summary of one of the fact tables, you should (if size/performance allows) remove the summary data and just use DAX and/or the filtering on the visuals to group the data.

This seems smart. I would really like the reports to show the data with all detail. And not to have one report with data from fact_table1 and then one report with data from fact_table2. If there is a 1:M relationsship i guess i could delete/hide the aggregate data in fact_table1 and then append the more detailed data from fact_table2 to fact_table1, to make one datatable with all underlying transactions. Then i would get all transactions in the same "amount column".

Do i understand your comment so that we can do the same but skip the append part, and keep the data in two fact tables, and use relationsships and dax to make the combined "amount" measure? Something like this?:

keep fact_table 1 and fact_table2
hide/delete the aggregate transactions (coming from fact_table2) in fact_table1
make a dax measure to sum amount from both fact_table1 and fact_table2?
or even better: make a dax measure which sums amounts from fact_table1 only if not coming from fact_table2, and sum amount coming from fact_table2

Am i on the right track here?

And one more question: when using more fact_tables like this. If i want to make a filter based on a primary key. Then the key would exist in both fact_tables. So i cant put the filter here like i ordinary do, then i would need to put the filter twice (one for each fact_table). Istead i should put the filter on the key in the dim_table, which is linked to both fact_tables, correct?

BTW: my dataset is maybe 100.000 rows a year in fact_table1, and and i guess the same in fact_table2. I dont know what is consideres a big dataset but i would guess this would be in the lower size so that size/performance is not a big issue here?

Recommended table setup

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly