
michael95677
Helper III
How to share datasets with very large tables or complex joins needed

We are trying to figure out how to provide datasets with multiple tables and complex joins to our users so they can create their own reports and share them. If we don't filter the tables in SQL before creating the dataset, many of the tables could have 50-100K rows. Creating the proper relationships to include everything in one report would require joins that contain their own SELECT statements (using the ROW_NUMBER function) and other difficult join scenarios that do not seem possible in the Power BI model.
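For context, the kind of join meant here might look something like the following T-SQL: a derived table using ROW_NUMBER to keep only the latest status row per order before joining. All table and column names are invented for illustration.

```sql
-- Hypothetical example only: pick the most recent status row per order
-- inside the join itself, which has no direct equivalent in the Power BI model.
SELECT o.OrderID, o.CustomerID, s.StatusCode
FROM dbo.Orders AS o
JOIN (
    SELECT OrderID, StatusCode,
           ROW_NUMBER() OVER (PARTITION BY OrderID
                              ORDER BY UpdatedAt DESC) AS rn
    FROM dbo.OrderStatusHistory
) AS s
  ON s.OrderID = o.OrderID
 AND s.rn = 1;
```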

 

How can we avoid having users download datasets that must all refresh in their desktop app when the tables are very large, while still allowing for complex joins?

1 ACCEPTED SOLUTION

Load everything into Dataflows in a workspace. They can handle millions of rows. Set them to refresh periodically; your users never need access to the true source data. Shape the tables properly as DIM or FACT tables, and show them (or document) the unique keys (product number, for example) that they would need to use in a filter relationship in Power BI Desktop.

Dataflows come with Power BI, so there is no additional cost; a Power BI Pro license covers it.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting


15 REPLIES
edhans
Super User

Power BI is not SQL Server. It is not a relational database. You need to provide tables to your users in a star schema format, with dimension and fact tables. There are no joins in Power BI at all; there are filter relationships, and that is more than a difference in terminology.

You can absolutely create what is known as a "golden dataset" that users build their own reports from, or you can create and stage these tables so users can import them and create their own models. But the worst thing you can do is give them access to dozens of tables in your relational database and expect them to properly create a star schema model without proper training. It isn't hard, but it may not be intuitive, and the typical DBA will balk at it because it runs so counter to the third normal form a typical DBA uses.


Microsoft Guidance on Importance of Star Schema




Thank you so much for that information. While I realize Power BI is not SQL, I am still confused about how to offer all of the tables needed for the request they've made without giving them a query, or a connection to the database, that pulls in the entire table.

We've tried exporting and updating Excel spreadsheets, and queries that access the data in SQL and are then saved as a dataset, but neither seems to be the best option.

If there are 9 tables needed and some may have hundreds of thousands of rows, then how best to provide those tables? Yes, I can help them with the relationships (I say joins only because that is the terminology I'm familiar with). Thanks.

Load everything into Dataflows in a workspace. They can handle millions of rows. Set them to refresh periodically; your users never need access to the true source data. Shape the tables properly as DIM or FACT tables, and show them (or document) the unique keys (product number, for example) that they would need to use in a filter relationship in Power BI Desktop.

Dataflows come with Power BI, so there is no additional cost; a Power BI Pro license covers it.
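Once the tables are staged in a dataflow, a user consumes them from Power BI Desktop via Get Data > Power BI dataflows. The generated M looks roughly like this; the workspace, dataflow, and entity names below are made up, and real generated code usually navigates by ID rather than name:

```m
let
    // Top-level navigation table of dataflows the user can access
    Source = PowerBI.Dataflows(null),
    Workspace = Source{[workspaceName = "Shared Data"]}[Data],
    Dataflow = Workspace{[dataflowName = "Sales Staging"]}[Data],
    // Pull one shaped dimension table into the user's own model
    DimProduct = Dataflow{[entity = "DimProduct"]}[Data]
in
    DimProduct
```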




Very helpful. What happens when the join or relationship, even in SQL, requires SELECT statements within the join? My guess is that trying to combine too many tables into one report is a case where it's better to build 2-3 different reports that can then be analyzed or compared separately?

I am not sure what specifically you are referring to. If you load a bunch of tables into Dataflows and let your users do joins in Power Query before loading, you can show them how to do a conditional join that lets you select rows in the join: Conditional Merge in Power Query — ehansalytics
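As a rough sketch of that conditional-merge pattern (all table and column names here are invented for illustration): add a custom column that selects the matching rows from the other table, then expand the nested table back into columns.

```m
let
    Orders = #table({"OrderID", "Amount"}, {{1, 100}, {2, 250}}),
    Rates  = #table({"Threshold", "Rate"}, {{0, 0.05}, {200, 0.10}}),
    // For each order, keep only the rate rows whose threshold the amount meets
    Added = Table.AddColumn(Orders, "Matched",
        each Table.SelectRows(Rates, (r) => r[Threshold] <= [Amount])),
    // Expand the nested table; each order gets one row per matching rate
    Result = Table.ExpandTableColumn(Added, "Matched", {"Threshold", "Rate"})
in
    Result
```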

But again, you should shape the tables for them so they just use the FACT and DIM tables they need. Ideally, don't just replicate SQL database tables in Dataflows. You can, but then you will need to educate your users on how to manually write M code for joins like the ones you describe, and on how to properly model data.




@michael95677 - it seems you are trying to build a data warehouse or data mart for your users. That is good. But in a good data mart for Power BI, there are no complex joins. Those happen at the data warehouse level or in the source data, not at the endpoint the users will use to build their datasets.

If you want to stage tables for them to build their own models, and they are properly trained in how to use M code, then just copy the SQL tables to the dataflows and let them build proper star schemas in Power BI.

But I think it is a mistake to let your users make complex joins unless they are experienced in that. And if they are, then I think dataflows are the wrong course. You should let them build views in schemas on your SQL Server, where they can do whatever they need, then connect to Power BI from there. If you aren't comfortable with them writing SQL views within a schema like that, then you shouldn't be comfortable with them making "complex joins" anywhere.
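A sketch of that staging approach, assuming a dedicated schema and a database role for the report builders (all object names are illustrative):

```sql
-- Dedicated schema so report builders never touch dbo directly
CREATE SCHEMA reporting;
GO
-- A staged view that performs the complex join once, centrally
CREATE VIEW reporting.OrderFacts AS
SELECT o.OrderID, o.OrderDate, o.CustomerID,
       l.ProductID, l.Quantity, l.LineTotal
FROM dbo.Orders AS o
JOIN dbo.OrderLines AS l ON l.OrderID = o.OrderID;
GO
-- Report builders get read-only access to the whole schema
GRANT SELECT ON SCHEMA::reporting TO ReportBuilders;
```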




I think I didn't explain properly. I am hoping that I can do the complex joins in the dataflows so that they don't have to. I am mainly trying to add tables that can be refreshed once each night without having to give each user access to tables (in datasets) that would all then be refreshed from our main database. No one but me can do these joins in SQL, which is why I do not want to give our users access to the source data. However, we do have some power users who can handle relationships, DAX measures, calculated columns, etc. very well; they just do not understand the database structure and how to join the tables properly to get the data they want.

If I can do this in the dataflows, that is great. The main purpose, however, is to be able to add many large tables, create multiple views/datasets from them, and reuse the tables for various departments where there is crossover.

Yes, you absolutely can do that in M code in the dataflow. But since you are creating the dataflow, you can also do the joins in your own SQL views and just load the views to the dataflow.




Just out of curiosity, can SSIS be used instead of M code to extract the data, create the joins, etc., and then create the dataflows or push the data to Azure Data Lake Storage? I'm more familiar with SSIS and SSRS, so I thought that might still be an option. It seems like the Dataflows model is now the method Microsoft is moving toward, as SSIS and SSRS do not seem to be utilized.

Yes, but now you are getting well beyond the general scope of this forum. 😁

You could use SSIS to put it into a data lake, or use Azure Data Factory to do all of the transformations for you. Once you crack open tools like that, the possibilities increase, as does the complexity. 😉




One potential problem with dataflows is that there seems to be a 10GB storage limit per user. If I planned to provide access to some of our larger SQL tables, they would easily exceed that amount. This really seems to be a useful option, but only with a Premium license. Is this correct?

Premium Per User (an extra $10/mo on top of Pro, or $20/mo per user) would allow 100TB. But MS does not recommend you use Dataflows as your full data warehouse solution. Pick the right tool. Dataflows are great for staging data for Power BI; if you are going to be storing over 10GB, I'd look at other solutions and let Dataflows cover the last mile of transformations specific to Power BI users.




Totally confused now. Since this was the recommendation for easily storing millions of rows of data, which will easily exceed 10GB, what other tools or options are there? We have a physical server for our data storage but do not want to build multiple datasets for everyone, and I thought everyone was pointing to dataflows to solve that problem. We are pulling the data from a report server anyway, so it is always a day behind and thus not live data. In addition, we are talking about a small-scale business with only 10-20 users (accessing or building reports).

It should not be this difficult to just pull data from a SQL server and make it available in datasets for our users to create their own reports, but every attempt fails in one way or another. It's very surprising to me that this functionality is not built in. Users with more advanced report-building skills will want to modify datasets to an extent so they can build their own reports, but that does not mean they have the skills to create the joins/relationships in a database with a complex architecture.

Entire books have been written on data warehouses (Kimball, for example), so that goes well beyond the scope of both this forum and the Microsoft product. Power BI itself doesn't provide full data warehouse technology; it relies on other MS technologies, Azure Data Lake for example. Dataflows are a part of this process, but they are not designed to be "the data warehouse."


To do it right, IMHO, you'd need data marts. I'd be more inclined to follow Inmon's methodology vs. Kimball's for Power BI. And that would involve either golden datasets or AAS (Azure Analysis Services) instances.

I think you are overcomplicating it for 10-20 users, though. If you just want flat tables in the service, with multiple gigs of data, buy everyone a PPU license, store the data in a dataflow, and call it a day.

Best practices for creating a data warehouse using dataflows - Power Query | Microsoft Docs




Some of the joins are very complex to pull in and combine all the data they want. I think creating the data model for them as a dataset, with some of the joins already built, would be helpful. I do think that our only option for accessing large amounts of data is dataflows, so thank you for that. Unfortunately, linked tables require the Premium edition, which we do not have, but I believe there are many other good reasons for using it. Thanks.
