Joao_Vitor
Regular Visitor

Opinion on how to handle a database with possible duplicate values from different sources

Hello everyone. I would like an opinion from experienced people on how to handle my particular case. I will try to keep it simple (I wrote this before finishing), and by no means do I want to outsource my work.

 

TL;DR: I have 3 different datasets and want an opinion on how to show them together.

 

My Dashboard, among several plot visuals, has table visuals that show info on the Facts in my dataset based on the slicers and cross-filtering applied by the user. The table visuals are intended as a form of "on-screen drilling".

 

The problem is that I have 3 different sources for my Facts, as my ERP handles Incomes, Outcomes and Bank Movements differently, and each type of Fact has different fields (columns) associated with it.

 

I made a very simple flow diagram to show how things work.

[Image: Data flow.jpg — flow diagram of the data sources]

 

The 3 final elements are the 3 JSON files provided through an API and treated in Power Query.

For context: we are a construction company, and past and future figures are equally important, since management usually compares them on the go, so my Dashboards must be able to show past and future dynamically, most often through slicing and cross-filtering.

 

My main problem is the link between External Outcomes and Bank Movements, since my "past" is a combination of Paid Outcomes + Paid Incomes + External Movements, but my future is solely Open Incomes and Outcomes. I'm struggling to build a simple table or matrix visual that shows data from these 3 sources as in the intended result below (the Origin column is just for context).

[Image: image.png — mock-up of the intended combined table]

 

I'm a Civil Engineer with little tech literacy, currently taking some Python 101 classes. I need an opinion on:

  1. Combining the 3 tables. Would my time and effort be well spent? Reasons below (a rough pandas sketch of what I have in mind follows this list):
    1. Income and Outcome share some "unique" IDs, since the ERP treats them as different things, though I could append a prefix to solve that.
    2. External Movements don't have proper unique IDs; these would need to be created.
    3. The Income and Outcome JSONs have nested dimension tables that split the fact values across different projects, and I can't foresee the impact of merging.
    4. The 3 datasets have similar but not quite identical columns, so I think special treatment will be necessary.
    5. The biggest dataset is still quite small (27k rows, 10-15 columns), but Power BI is already taking a good 5 minutes to refresh the data.
  2. Is there a way unknown to me to achieve the desired result without merging?
    1. With a table visual, Client and Supplier become two distinct columns since they come from different tables. The same goes for every column except Date.
    2. With a matrix, I wasn't able to put together a measure that filters the info from the row headers, since each of them comes from a different table; Income and Outcome share common IDs, and External Movements don't even have a unique ID, as said before.
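
To make item 1 a bit more concrete, here is a rough pandas sketch of what I imagine the combining step would look like. I'm only starting with Python, and every column name below is a placeholder, not my real ERP fields:

```python
import pandas as pd

# Rough sketch only: column names are placeholders, not the real ERP fields.
incomes = pd.DataFrame({"Id": [101, 102], "Date": ["2024-03-01", "2024-06-15"],
                        "Client": ["Client A", "Client B"], "Value": [1000.0, 2500.0]})
outcomes = pd.DataFrame({"Id": [101, 205], "Date": ["2024-02-10", "2024-07-01"],
                         "Supplier": ["Supplier X", "Supplier Y"], "Value": [-800.0, -1200.0]})
movements = pd.DataFrame({"Date": ["2024-01-05"], "Description": ["Bank fee"],
                          "Value": [-15.0]})

# 1.1 -- prefix the ERP ids so Income 101 and Outcome 101 no longer collide
incomes["Key"] = "INC-" + incomes["Id"].astype(str)
outcomes["Key"] = "OUT-" + outcomes["Id"].astype(str)

# 1.2 -- External Movements have no id, so invent a surrogate one from the row order
movements["Key"] = ["MOV-" + str(i) for i in range(len(movements))]

# 1.4 -- map source-specific columns onto a common name (Client/Supplier -> Entity)
incomes = incomes.rename(columns={"Client": "Entity"}).assign(Origin="Income")
outcomes = outcomes.rename(columns={"Supplier": "Entity"}).assign(Origin="Outcome")
movements = movements.rename(columns={"Description": "Entity"}).assign(Origin="Movement")

# Stack everything into one fact table with an Origin column, as in the intended result.
common = ["Key", "Date", "Entity", "Value", "Origin"]
facts = pd.concat([incomes[common], outcomes[common], movements[common]],
                  ignore_index=True)
print(facts)
```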
1 ACCEPTED SOLUTION
v-yilong-msft
Community Support

Hi @Joao_Vitor ,

Firstly, your time and effort in combining the tables could be well spent if it leads to a more streamlined and efficient dashboard. To handle the issue of shared "Unique" IDs between Income and Outcome, appending a prefix is a good solution. For External Movements without proper unique IDs, you could consider creating a composite key based on multiple columns that together can act as a unique identifier. If the Income and Outcome JSONs have nested dimension tables, you might need to flatten these tables before merging to avoid complexity.
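
For illustration only, since you mention taking Python classes, a composite key and a flattening step could look roughly like this in pandas. The field names and the JSON shape are assumptions, so adapt them to your actual payloads; the same logic can also be reproduced with Power Query steps:

```python
import pandas as pd

# External Movements: build a composite key from columns that together identify a row.
# (Assumed fields: Date, Description, Value.)
movements = pd.DataFrame({"Date": ["2024-01-05", "2024-01-09"],
                          "Description": ["Bank fee", "Interest"],
                          "Value": [-15.0, 3.2]})
movements["Key"] = (movements["Date"] + "|" + movements["Description"]
                    + "|" + movements["Value"].astype(str))

# Income/Outcome: flatten the nested project split so each project share becomes a row.
# (The JSON shape below is an assumption about the API payload.)
income_records = [
    {"id": 101, "date": "2024-03-01",
     "projects": [{"project": "Site A", "value": 600.0},
                  {"project": "Site B", "value": 400.0}]},
]
incomes_flat = pd.json_normalize(income_records,
                                 record_path="projects",  # explode the nested table
                                 meta=["id", "date"])     # keep the parent fact fields
print(movements)
print(incomes_flat)
```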

 

Secondly, special treatment will be necessary for columns that are similar but not equal. You can create calculated columns in Power Query to standardize the data. For the larger dataset that is taking a long time to refresh, consider optimizing your Power Query steps or using incremental refresh if the data size grows.

 

If merging is not feasible, you could use Power BI’s relationship model to create relationships between the tables where possible. For the Table Visual, you can use DAX measures to calculate and display values from different tables in the same column.

 

Finally, ensure that your data model is as simple as possible, with fewer columns and only the necessary rows. Use calculated columns and measures wisely, as they can impact performance. Consider summarizing data at a higher level if that granularity is not required for all analyses.
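
As a minimal sketch of that last point, assuming the combined fact table ends up with Date, Origin and Value columns, a pre-aggregation to monthly level before loading could look like this:

```python
import pandas as pd

# Minimal example of pre-aggregating the combined fact table to monthly level.
facts = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03"]),
    "Origin": ["Income", "Outcome", "Movement"],
    "Value": [1000.0, -800.0, -15.0],
})
monthly = (facts
           .assign(Month=facts["Date"].dt.to_period("M"))   # collapse dates to months
           .groupby(["Month", "Origin"], as_index=False)["Value"]
           .sum())
print(monthly)
```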

 

 

 

Best Regards

Yilong Zhou

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.


4 REPLIES

Thank you for your input @v-yilong-msft, I will take that into consideration when deciding my next steps.

Hi  @Joao_Vitor ,

Have you solved your problem? If so, can you share your solution here and mark the correct answer as the accepted solution to help other members find it faster? Thank you very much for your kind cooperation!

 

 

Best Regards

Yilong Zhou

 

Sure. The provided suggestions are valid, and I decided to attempt the data treatment using pandas.
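
For anyone finding this later, the general shape of the treatment I'm attempting is something like the sketch below. The file names are placeholders for local copies of the three API payloads:

```python
import pandas as pd

# Placeholder file names for local copies of the three API payloads.
sources = {"Income": "incomes.json",
           "Outcome": "outcomes.json",
           "Movement": "movements.json"}

frames = []
for origin, path in sources.items():
    df = pd.read_json(path)    # each payload becomes its own DataFrame
    df["Origin"] = origin      # tag the source so every row stays traceable
    frames.append(df)

# Single fact table; columns missing in one source simply come out as NaN.
facts = pd.concat(frames, ignore_index=True, sort=False)

# Export one file for Power BI to load instead of the three raw JSONs.
facts.to_csv("facts_combined.csv", index=False)
```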
