
gemcityzach
Helper III

Managing multiple sources and their relationships

Hey folks,

 

What is the best way to manage relationships between tables when I need to filter one table based on another? Should I build a relationship between the two, or just merge table 2 into table 1? Since all of this data is operational data about various parts of our program, is it okay to have it all in one report, or should I break it up across multiple files? I'm building it in Desktop and will publish to a PBI Premium workspace, which will handle the monthly incremental refreshes once published.

 

The majority of my data sources come from an application that allows no direct connection and has a poor reporting layer, so I'm essentially having to rebuild large parts of its schema within Power BI to do decent reporting. We're also supplementing this application with newly built SharePoint applications that use data from the source application, plus some data generated within SharePoint itself.

 

CSV Source

tbl1_id (distinct) | tbl1_field1 | tbl_1field2 | tbl1_field3
123                | abc         | xyz         | pqd
123                | cde         | lll         | awe

 

SharePoint Source

tbl2  | tbl1_foreign_key | tbl2_field2 | tbl2_field3
R3145 | 123              | ppp         | pll
R3111 | 555              | aaa         | zzz

 

I have about 20 unique data sources that are month-end snapshot extracts as CSV, loaded each month into 20 SharePoint folders (e.g., dim-table1, dim-table2, fact-tableZZ, fact-tableQQ). Each file has < 20k rows. I use Power Query to append each folder's monthly files into a single table, adding a mm-yyyy column to differentiate history. Additionally, there are multiple SharePoint lists in this model.
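For reference, the folder-append pattern above can be sketched in Power Query M. This is a sketch under assumptions, not my exact query: the site URL, folder name, and file-naming convention (names like "dim-table1_03-2024.csv") are placeholders for the real ones.

```
let
    // SharePoint.Files lists all files on the site; narrow to one snapshot folder.
    // Site URL and folder name below are hypothetical.
    allFiles = SharePoint.Files("https://contoso.sharepoint.com/sites/ops", [ApiVersion = 15]),
    folderFiles = Table.SelectRows(allFiles, each Text.Contains([Folder Path], "dim-table1")),
    // Parse each CSV and tag its rows with the mm-yyyy snapshot month taken
    // from the file name (assumes names like "dim-table1_03-2024.csv")
    parsed = Table.AddColumn(folderFiles, "Data", each Table.PromoteHeaders(Csv.Document([Content]))),
    tagged = Table.AddColumn(parsed, "SnapshotMonth", each Text.BetweenDelimiters([Name], "_", ".csv")),
    slim = Table.SelectColumns(tagged, {"SnapshotMonth", "Data"}),
    // Expand using the column names of the first parsed file
    combined = Table.ExpandTableColumn(slim, "Data", Table.ColumnNames(tagged{0}[Data]))
in
    combined
```

Deriving the mm-yyyy value from the file name (rather than keying it in) keeps the monthly append step hands-off.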

 

I'd like to have date filters that let me change my report view based on the month of data selected (01-2024 vs 03-2024, for example). That selection would change the view of all of the visualizations in the file.
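A common way to get that single month selection driving every visual is a small shared period dimension, related one-to-many to each snapshot table's mm-yyyy column and used as the slicer. A minimal Power Query M sketch, assuming the appended tables each carry a SnapshotMonth column (query names here are placeholders):

```
let
    // Stack the SnapshotMonth values from each appended snapshot table
    // (hypothetical query names; list all 20 in practice)
    months = Table.Combine({
        Table.SelectColumns(#"dim-table1", {"SnapshotMonth"}),
        Table.SelectColumns(#"fact-tableZZ", {"SnapshotMonth"})
    }),
    // One row per month; relate this table one-to-many to every
    // snapshot table's SnapshotMonth column in the model view
    distinctMonths = Table.Distinct(months)
in
    distinctMonths
```

With those relationships in place, one slicer on this table filters all related tables at once, which is the behavior described above.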

 

I'd also like to take data from one of the SharePoint sources that has complementary data in a CSV source and build a 'delta table'. I.e., things may show up in the CSV that are not yet in SharePoint but should be. There is a common key between the two when the relationship is satisfied. E.g., report_id is a field that comes in the CSV, and when a complementary entry is put into SharePoint, someone will manually add the report_id field to the list. This is because the source of the CSV is a different tool than the SharePoint intake list.
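That delta table is essentially a left anti join: keep the CSV rows whose report_id has no match in the SharePoint list yet. A hedged Power Query M sketch, with CsvSource and SharePointIntake as placeholder query names:

```
let
    // Left anti join: rows of CsvSource with no matching report_id in SharePointIntake.
    // Both query names (and the assumption that the key column is named
    // "report_id" in both) are placeholders for the real model.
    joined = Table.NestedJoin(
        CsvSource, {"report_id"},
        SharePointIntake, {"report_id"},
        "Match", JoinKind.LeftAnti
    ),
    // The nested "Match" column is empty by construction, so drop it
    delta = Table.RemoveColumns(joined, {"Match"})
in
    delta
```

Loading this as its own query gives a standing "in CSV but missing from SharePoint" table that updates on every refresh.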

 

I think that's about it. Best way to handle relationships and filter with relationships. Monthly snapshot filtering and a delta table between two complementary data sources. I hope that's not too much for one question!

christinepayton
Super User

Usually what people do instead of connecting directly to the source database is use an ETL tool to import the data into a data lake or data warehouse (e.g. Databricks, Fivetran, etc). You will absolutely want to look into that vs exporting CSVs to SharePoint if you want to avoid losing your mind. 🙂

 

Power BI likes multi-dimensional models like a star schema vs single large flattened tables. 

DL/DWH not possible for too many infuriating reasons. I get that PBI likes a star schema and I understand why. From your response it seems you're in favor of the relationship building functionality vs merging tables.

Does it matter that I have all of these data sources in one file? I'm guessing no, because eventually I'll push it to PBI Premium and the Service will refresh it by pulling the files off the SharePoint Online folders.

Well, if it's already all in one file I guess it kind of depends... if it's a small project I'd just leave it in one, but if you've got 200 million rows or something and need to write measures that like to use dimensions (e.g. iterator functions), then I'd split the dimensions out in a dataflow or something. You might run into refresh speed issues if you try to fully dimensionalize a large csv that's based in SharePoint - it pulls the whole file for each table query, and the speed for that connector can be slower sometimes (particularly if you're doing a lot of transforms). 

Small data set for sure. Across the 20 data sources we're talking less than 200k rows in total. Each folder gets a new data load every month, so the row count grows by roughly one file's worth per source each month. That being said, I'll be implementing incremental refresh on each of these data sources. My understanding is that keeps the Service from having to do a full pull each month.

Yeah, incremental is totally the way to go if the older files are static. 🙂
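For anyone setting this up: incremental refresh works by filtering a real date/time column between the reserved RangeStart and RangeEnd parameters, which the Service substitutes per partition at refresh time. A minimal sketch, assuming a SnapshotDate datetime column has been derived from the mm-yyyy text (the query name is a placeholder; also note that CSV/SharePoint sources can't fold this filter, so Desktop will warn that the files may still be read before filtering):

```
let
    // Placeholder query name; SnapshotDate is assumed to be a datetime
    // column parsed from the mm-yyyy snapshot text
    source = #"fact-tableZZ",
    // RangeStart/RangeEnd are the reserved parameters incremental refresh requires
    filtered = Table.SelectRows(source, each [SnapshotDate] >= RangeStart and [SnapshotDate] < RangeEnd)
in
    filtered
```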

Yep static!
