Joao_Vitor
Regular Visitor

Opinion on how to handle a database with possible duplicate values from different sources

Hello everyone. I would like an opinion from experienced people on how to handle my particular case. I will try to keep it simple (I wrote this before finishing), and by no means do I want to outsource my work.

 

TL;DR: I have 3 different datasets and want an opinion on how to show them together.

 

My Dashboard, among several plot visuals, has table visuals that show info on the Facts in my dataset based on the slicers and cross-filtering applied by the user. The table visuals are intended as a form of "on-screen drilling".

 

The problem is that I have 3 different sources for my Facts, as my ERP handles Incomes, Outcomes and Bank Movements differently, and each type of Fact has different fields (columns) associated with it.

 

I made a very simple flow diagram to show how things work.

[Image: Data flow.jpg — flow diagram of the data sources]

 

The 3 final elements are the 3 JSON files provided through an API and treated in Power Query.

For context: we are a construction company, and past and future figures are equally important, since management usually compares them on the go, so my Dashboards must be able to show past and future dynamically, most often through slicing and cross-filtering.

 

My main problem is the link between External Outcomes and Bank Movements, since my "past" is a combination of Paid Outcomes + Paid Incomes + External Movements, but my future is solely Open Incomes and Outcomes. I'm struggling to build a simple table or matrix visual that shows data from these 3 sources as in the intended result below (the Origin column is just for context).

[Image: image.png — mock-up of the intended combined table]

 

I'm a Civil Engineer with little tech literacy, currently taking some Python 101 classes. I need an opinion on:

  1. Combining the 3 tables. Would my time and effort be well spent? Reasons below (a rough pandas sketch of what I have in mind follows this list):
    1. Income and Outcome share some "unique" IDs, since the ERP treats them as different things, though I could append a prefix to solve that.
    2. External Movements don't have proper unique IDs; these would need to be created.
    3. The Income and Outcome JSONs have nested dimension tables that split the fact values across different projects, and I can't foresee the impact of merging.
    4. The 3 datasets have similar but not quite identical columns, so I think special treatment will be necessary.
    5. The biggest dataset is still quite small (27k rows, 10-15 columns), but Power BI is already taking a good 5 minutes to refresh the data.
  2. Is there a way unknown to me to achieve the desired result without merging?
    1. With a table visual, Client and Supplier become two distinct columns since they come from different tables. The same goes for every column except Date.
    2. With a matrix, I wasn't able to put together a measure that filters the info from the row headers, since each of them comes from a different table; Income and Outcome share common IDs, and External Movements don't even have a unique ID, as said before.
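
To make item 1 a bit more concrete, here is a rough pandas sketch of what I imagine the combining step would look like. I'm only starting with Python, and every column name below is a placeholder, not my real ERP fields:

```python
import pandas as pd

# Rough sketch only: column names are placeholders, not the real ERP fields.
incomes = pd.DataFrame({"Id": [101, 102], "Date": ["2024-03-01", "2024-06-15"],
                        "Client": ["Client A", "Client B"], "Value": [1000.0, 2500.0]})
outcomes = pd.DataFrame({"Id": [101, 205], "Date": ["2024-02-10", "2024-07-01"],
                         "Supplier": ["Supplier X", "Supplier Y"], "Value": [-800.0, -1200.0]})
movements = pd.DataFrame({"Date": ["2024-01-05"], "Description": ["Bank fee"],
                          "Value": [-15.0]})

# 1.1 -- prefix the ERP ids so Income 101 and Outcome 101 no longer collide
incomes["Key"] = "INC-" + incomes["Id"].astype(str)
outcomes["Key"] = "OUT-" + outcomes["Id"].astype(str)

# 1.2 -- External Movements have no id, so invent a surrogate one from the row order
movements["Key"] = ["MOV-" + str(i) for i in range(len(movements))]

# 1.4 -- map source-specific columns onto a common name (Client/Supplier -> Entity)
incomes = incomes.rename(columns={"Client": "Entity"}).assign(Origin="Income")
outcomes = outcomes.rename(columns={"Supplier": "Entity"}).assign(Origin="Outcome")
movements = movements.rename(columns={"Description": "Entity"}).assign(Origin="Movement")

# Stack everything into one fact table with an Origin column, as in the intended result.
common = ["Key", "Date", "Entity", "Value", "Origin"]
facts = pd.concat([incomes[common], outcomes[common], movements[common]],
                  ignore_index=True)
print(facts)
```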
1 ACCEPTED SOLUTION
v-yilong-msft
Community Support

Hi @Joao_Vitor ,

Firstly, your time and effort in combining the tables could be well spent if it leads to a more streamlined and efficient dashboard. To handle the issue of shared "Unique" IDs between Income and Outcome, appending a prefix is a good solution. For External Movements without proper unique IDs, you could consider creating a composite key based on multiple columns that together can act as a unique identifier. If the Income and Outcome JSONs have nested dimension tables, you might need to flatten these tables before merging to avoid complexity.
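
For illustration only, since you mention taking Python classes, a composite key and a flattening step could look roughly like this in pandas. The field names and the JSON shape are assumptions, so adapt them to your actual payloads; the same logic can also be reproduced with Power Query steps:

```python
import pandas as pd

# External Movements: build a composite key from columns that together identify a row.
# (Assumed fields: Date, Description, Value.)
movements = pd.DataFrame({"Date": ["2024-01-05", "2024-01-09"],
                          "Description": ["Bank fee", "Interest"],
                          "Value": [-15.0, 3.2]})
movements["Key"] = (movements["Date"] + "|" + movements["Description"]
                    + "|" + movements["Value"].astype(str))

# Income/Outcome: flatten the nested project split so each project share becomes a row.
# (The JSON shape below is an assumption about the API payload.)
income_records = [
    {"id": 101, "date": "2024-03-01",
     "projects": [{"project": "Site A", "value": 600.0},
                  {"project": "Site B", "value": 400.0}]},
]
incomes_flat = pd.json_normalize(income_records,
                                 record_path="projects",  # explode the nested table
                                 meta=["id", "date"])     # keep the parent fact fields
print(movements)
print(incomes_flat)
```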

 

Secondly, special treatment will be necessary for columns that are similar but not equal. You can create calculated columns in Power Query to standardize the data. For the larger dataset that is taking a long time to refresh, consider optimizing your Power Query steps or using incremental refresh if the data size grows.

 

If merging is not feasible, you could use Power BI’s relationship model to create relationships between the tables where possible. For the Table Visual, you can use DAX measures to calculate and display values from different tables in the same column.

 

Finally, ensure that your data model is as simple as possible, with fewer columns and only the necessary rows. Use calculated columns and measures wisely, as they can impact performance. Consider summarizing data at a higher level if that granularity is not required for all analyses.
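
As a minimal sketch of that last point, assuming the combined fact table ends up with Date, Origin and Value columns, a pre-aggregation to monthly level before loading could look like this:

```python
import pandas as pd

# Minimal example of pre-aggregating the combined fact table to monthly level.
facts = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03"]),
    "Origin": ["Income", "Outcome", "Movement"],
    "Value": [1000.0, -800.0, -15.0],
})
monthly = (facts
           .assign(Month=facts["Date"].dt.to_period("M"))   # collapse dates to months
           .groupby(["Month", "Origin"], as_index=False)["Value"]
           .sum())
print(monthly)
```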

 

 

 

Best Regards

Yilong Zhou

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.


4 REPLIES

Thank you for your input @v-yilong-msft, I will take that into consideration when deciding my next steps.

Hi  @Joao_Vitor ,

Have you solved your problem? If so, can you share your solution here and mark the correct answer as the accepted solution to help other members find it faster? Thank you very much for your kind cooperation!

 

 

Best Regards

Yilong Zhou

 

Sure. The provided suggestions are valid, and I decided to attempt the data treatment using pandas.
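
For anyone finding this later, the general shape of the treatment I'm attempting is something like the sketch below. The file names are placeholders for local copies of the three API payloads:

```python
import pandas as pd

# Placeholder file names for local copies of the three API payloads.
sources = {"Income": "incomes.json",
           "Outcome": "outcomes.json",
           "Movement": "movements.json"}

frames = []
for origin, path in sources.items():
    df = pd.read_json(path)    # each payload becomes its own DataFrame
    df["Origin"] = origin      # tag the source so every row stays traceable
    frames.append(df)

# Single fact table; columns missing in one source simply come out as NaN.
facts = pd.concat(frames, ignore_index=True, sort=False)

# Export one file for Power BI to load instead of the three raw JSONs.
facts.to_csv("facts_combined.csv", index=False)
```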
