
gemcityzach
Helper III

Managing multiple sources and their relationships

Hey folks,

 

What is the best way to manage relationships between tables when I need to filter one table based on another? Should I build a relationship between the two, or just merge table 2 into table 1? Since all of this data is operational data about various parts of our program, is it okay to have it all in one report, or should I break it up across multiple files? I'm building it in Desktop and will publish to a PBI Premium workspace, which will handle the monthly incremental refreshes once published.

 

The majority of my data sources come from an application that allows no direct connection and has a poor reporting layer, so I'm essentially having to rebuild large parts of its schema within Power BI to do decent reporting. We're also supplementing this application with newly built SharePoint applications that use data from the source application, plus some data generated within SharePoint itself.

 

CSV Source

tbl1_id (distinct) | tbl1_field1 | tbl_1field2 | tbl1_field3
123                | abc         | xyz         | pqd
123                | cde         | lll         | awe

 

SharePoint Source

tbl2  | tbl1_foreign_key | tbl2_field2 | tbl2_field3
R3145 | 123              | ppp         | pll
R3111 | 555              | aaa         | zzz

 

I have about 20 unique data sources that are month-end snapshot extracts as CSV, loaded each month into 20 SharePoint folders (e.g., dim-table1, dim-table2, fact-tableZZ, fact-tableQQ). Each file has < 20k rows. I use Power Query to append each folder's monthly files into a single table, adding a mm-yyyy column to differentiate history. Additionally, there are multiple SharePoint lists in this model.
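For reference, the folder-append pattern above can be sketched in Power Query M. This is a sketch under assumptions, not my exact query: the site URL, folder name, and file-naming convention (names like "dim-table1_03-2024.csv") are placeholders for the real ones.

```
let
    // SharePoint.Files lists all files on the site; narrow to one snapshot folder.
    // Site URL and folder name below are hypothetical.
    allFiles = SharePoint.Files("https://contoso.sharepoint.com/sites/ops", [ApiVersion = 15]),
    folderFiles = Table.SelectRows(allFiles, each Text.Contains([Folder Path], "dim-table1")),
    // Parse each CSV and tag its rows with the mm-yyyy snapshot month taken
    // from the file name (assumes names like "dim-table1_03-2024.csv")
    parsed = Table.AddColumn(folderFiles, "Data", each Table.PromoteHeaders(Csv.Document([Content]))),
    tagged = Table.AddColumn(parsed, "SnapshotMonth", each Text.BetweenDelimiters([Name], "_", ".csv")),
    slim = Table.SelectColumns(tagged, {"SnapshotMonth", "Data"}),
    // Expand using the column names of the first parsed file
    combined = Table.ExpandTableColumn(slim, "Data", Table.ColumnNames(tagged{0}[Data]))
in
    combined
```

Deriving the mm-yyyy value from the file name (rather than keying it in) keeps the monthly append step hands-off.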

 

I'd like to have date filters that let me change my report view based on the month of data selected (01-2024 vs 03-2024, for example). That selection would change the view of all of the visualizations in the file.
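A common way to get that single month selection driving every visual is a small shared period dimension, related one-to-many to each snapshot table's mm-yyyy column and used as the slicer. A minimal Power Query M sketch, assuming the appended tables each carry a SnapshotMonth column (query names here are placeholders):

```
let
    // Stack the SnapshotMonth values from each appended snapshot table
    // (hypothetical query names; list all 20 in practice)
    months = Table.Combine({
        Table.SelectColumns(#"dim-table1", {"SnapshotMonth"}),
        Table.SelectColumns(#"fact-tableZZ", {"SnapshotMonth"})
    }),
    // One row per month; relate this table one-to-many to every
    // snapshot table's SnapshotMonth column in the model view
    distinctMonths = Table.Distinct(months)
in
    distinctMonths
```

With those relationships in place, one slicer on this table filters all related tables at once, which is the behavior described above.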

 

I'd also like to take data from one of the SharePoint sources that has complementary data in a CSV source and build a 'delta table'. I.e., things may show up in the CSV that are not yet in SharePoint but should be. There is a common key between the two when the relationship is satisfied. E.g., report_id is a field that comes in the CSV, and when a complementary entry is put into SharePoint, someone will manually add the report_id field to the list. This is because the source of the CSV is a different tool than the SharePoint intake list.
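That delta table is essentially a left anti join: keep the CSV rows whose report_id has no match in the SharePoint list yet. A hedged Power Query M sketch, with CsvSource and SharePointIntake as placeholder query names:

```
let
    // Left anti join: rows of CsvSource with no matching report_id in SharePointIntake.
    // Both query names (and the assumption that the key column is named
    // "report_id" in both) are placeholders for the real model.
    joined = Table.NestedJoin(
        CsvSource, {"report_id"},
        SharePointIntake, {"report_id"},
        "Match", JoinKind.LeftAnti
    ),
    // The nested "Match" column is empty by construction, so drop it
    delta = Table.RemoveColumns(joined, {"Match"})
in
    delta
```

Loading this as its own query gives a standing "in CSV but missing from SharePoint" table that updates on every refresh.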

 

I think that's about it. Best way to handle relationships and filter with relationships. Monthly snapshot filtering and a delta table between two complementary data sources. I hope that's not too much for one question!

christinepayton
Super User

Usually what people do instead of connecting directly to the source database is use an ETL tool to import the data into a data lake or data warehouse (e.g. Databricks, Fivetran, etc). You will absolutely want to look into that vs exporting CSVs to SharePoint if you want to avoid losing your mind. 🙂

 

Power BI likes multi-dimensional models like a star schema vs single large flattened tables. 

DL/DWH not possible for too many infuriating reasons. I get that PBI likes a star schema and I understand why. From your response it seems you're in favor of the relationship building functionality vs merging tables.

Does it matter that I have all of these data sources in one file? I'm guessing no, because eventually I'll push it to PBI Premium and the Service will refresh it by pulling the files off the SharePoint Online folders.

Well, if it's already all in one file I guess it kind of depends... if it's a small project I'd just leave it in one, but if you've got 200 million rows or something and need to write measures that like to use dimensions (e.g. iterator functions), then I'd split the dimensions out in a dataflow or something. You might run into refresh speed issues if you try to fully dimensionalize a large csv that's based in SharePoint - it pulls the whole file for each table query, and the speed for that connector can be slower sometimes (particularly if you're doing a lot of transforms). 

Small data set for sure. Across the 20 data sources we're talking less than 200k rows in total. Each folder gets a new data load every month, so the row count grows by roughly one file's worth per source each month. That being said, I'll be implementing incremental refresh on each of these data sources. My understanding is that keeps the Service from having to do a full pull each month.

Yeah, incremental is totally the way to go if the older files are static. 🙂
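For anyone setting this up: incremental refresh works by filtering a real date/time column between the reserved RangeStart and RangeEnd parameters, which the Service substitutes per partition at refresh time. A minimal sketch, assuming a SnapshotDate datetime column has been derived from the mm-yyyy text (the query name is a placeholder; also note that CSV/SharePoint sources can't fold this filter, so Desktop will warn that the files may still be read before filtering):

```
let
    // Placeholder query name; SnapshotDate is assumed to be a datetime
    // column parsed from the mm-yyyy snapshot text
    source = #"fact-tableZZ",
    // RangeStart/RangeEnd are the reserved parameters incremental refresh requires
    filtered = Table.SelectRows(source, each [SnapshotDate] >= RangeStart and [SnapshotDate] < RangeEnd)
in
    filtered
```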

Yep static!
