Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Power BI Community,
How do you deal with large datasets? Do you always utilize a live connection to a SSAS cube or import the data?
I am at my wits end. I am attempting to figure out the best way to deal with a large dataset in Power BI. I have a dataset that is about 21M+ rows of data and around 40 columns. Several of these columns are "value" columns, like Sales, COGS, etc. The rest are dimension columns that will be required to create one of several slices of the data. Unfortunately, I can't seem to come to a successful outcome. When I import this data, it ballons the file size to over 1GB. While shockingly PBI Service does allow me to upload the .pbix file onto the server (I thought there was a 1GB limit), once it is there, the performance of the report is too slow to be acceptable performance for an end user. I'm not using very complicated dax, just "standard" time intelligence functions or simple sums/subtractions/divisions.
My data is sitting in a SQL Server database. I do have access to a SSAS Tabular instance. From the community's experience, are you more likely to be successful utilizing a live connection to a SSAS cube or import the data? If it's import the data, how can I optimize file size and/or report performance?
Help, you're my only hope...
When possible the live connection is the better way to go. The security is inherited from the server and the model is a lot faster. If you import, the big trick is add things in steps. First pick your most restrictive or smallest filters (member filter). Generally, this would be the year column, department column, or financial statement column. Then use add items to add in your next most restrictive filter. Once all your filters are set then you add in your data. Please note, that you can add in multiple filters at once, but I generally do one at a time for performance purposes.
@TheOckieMofo one suggestion.. query your table (select only the necesary columns) and create a dataflow. put it on schedule referesh.
Now for the power bi report (end user report) query that dataflow. You can atleast avoid the query evaluation time on the report refresh.
@smpa01 I have a similar issue with a large data set that I want to try to create dimenstion tables from, but it's in CSV format. Is there a way to pull select columns and distinct values of those columns from a CSV file?
@AnonymousI experimented with a CSV file.
Following is the syntax
Csv.Document(File.Contents("C:\Users\x\Desktop\csv.csv"),[Delimiter=",", Columns=2, Encoding=65001, QuoteStyle=QuoteStyle.None])
Is there a way to pull select columns - yes, you can do that by mentioning th number of columns you want by overwriting in Columns=2. But I guess the column selection works from left to right and you can't do a SQL like SELECT to choose your columns.
and distinct values of those columns from a CSV file? - since a SQL statement will not work, I doubt whether it is possible to run additional scripting.
Please read to understand the parameters allowed inside CSV.Document
https://docs.microsoft.com/en-us/powerquery-m/csv-document
You can try connecting it through ODBC connector and there you can fully utilize SQL statements
Thank you for such a quick response!
I did see the CSV parameters and for columns it does say for columns you can use "a list of column names" but when I list the columns as they are in my first row (example below) it just names them but doesn't select those columns. I tried finding if there was a way to select certain columns in CSV, but with no luck.
Example data:
"InsuredID","AccountName","PlanName","FirstName","MiddleInitial","LastName","IndivTripCostInsured","DepartDate","ReturnDate","CovgBeginDate ","CovgEndDate"
Example code (pulls first two columns and names the respective column InsuredID and PlanName:
= Csv.Document(File.Contents("Z:\Data\Travel\Legacy Data\Premium\AXA_PREM.csv"),[Delimiter=",", Columns={"InsuredID", "PlanName"}, Encoding=1252, QuoteStyle=QuoteStyle.None])
Example code (pulls first two columns and names the respective column Column1 and Column3:
= Csv.Document(File.Contents("Z:\Data\Travel\Legacy Data\Premium\AXA_PREM.csv"),[Delimiter=",", Columns={"Column1", "Column3"}, Encoding=1252, QuoteStyle=QuoteStyle.None])
@Anonymous tested and yes I can confirm the usage of list inside column parameter
Csv.Document(File.Contents("C:\Users\X\Desktop\csv.csv"),[Delimiter=",", Columns={"Column1","Column3"}, Encoding=65001, QuoteStyle=QuoteStyle.None])
When I tested that code, it renamed columns 1 and 2 as "Column1" and "Column3", respectively, rather than pulling columns 1 and 3. As you mentioned in your first response, it seems to pull left to right rather than being able to select defined columns.
@TheOckieMofo,
Firstly, Power BI Premium supports uploads of Power BI Desktop (.pbix) files that are up to 10 GB in size. For more details, please review this article.
Secondly, if you choose import mode, you can follow the guide in this article to optimize your data model.
Thirdly, if you don't have Power BI premium license, it is better for you to use DirectQuery or Connect Live mode to connect to the data source as 1GB limitation doesn't apply to directquery datasets. Also in this case, your report performance depends largely on the performance of the underlying data source.
Regards,
Lydia
DirectQuery is limited to Data sets < 1 million rows. So I would argue it is not good for large data. However, I don't see any limitation placed on columns. See: https://docs.microsoft.com/en-us/power-bi/desktop-use-directquery
The next suggestion is: Live Connection, I have found less infomation about this online. It currently, appear to have no limits. As everything is powered by SSAS.
https://powerbi.tips/2018/02/power-bi-connections-live-connection/
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
110 | |
98 | |
79 | |
64 | |
57 |
User | Count |
---|---|
143 | |
109 | |
89 | |
84 | |
66 |