Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
TheOckieMofo
Resolver II
Resolver II

Dealing with Large Datasets

Power BI Community,

 

How do you deal with large datasets? Do you always utilize a live connection to a SSAS cube or import the data?

 

I am at my wits end. I am attempting to figure out the best way to deal with a large dataset in Power BI. I have a dataset that is about 21M+ rows of data and around 40 columns. Several of these columns are "value" columns, like Sales, COGS, etc. The rest are dimension columns that will be required to create one of several slices of the data. Unfortunately, I can't seem to come to a successful outcome. When I import this data, it ballons the file size to over 1GB. While shockingly PBI Service does allow me to upload the .pbix file onto the server (I thought there was a 1GB limit), once it is there, the performance of the report is too slow to be acceptable performance for an end user. I'm not using very complicated dax, just "standard" time intelligence functions or simple sums/subtractions/divisions.

 

My data is sitting in a SQL Server database. I do have access to a SSAS Tabular instance. From the community's experience, are you more likely to be successful utilizing a live connection to a SSAS cube or import the data? If it's import the data, how can I optimize file size and/or report performance?

 

Help, you're my only hope...

10 REPLIES 10
keeganm
Frequent Visitor

When possible the live connection is the better way to go. The security is inherited from the server and the model is a lot faster. If you import, the big trick is add things in steps. First pick your most restrictive or smallest filters (member filter). Generally, this would be the year column, department column, or financial statement column. Then use add items to add in your next most restrictive filter. Once all your filters are set then you add in your data. Please note, that you can add in multiple filters at once, but I generally do one at a time for performance purposes.

smpa01
Super User
Super User

@TheOckieMofo one suggestion.. query your table (select only the necesary columns) and create a dataflow. put it on schedule referesh.

 

Now for the power bi report (end user report) query that dataflow. You can atleast avoid the query evaluation time on the report refresh.

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs
Anonymous
Not applicable

@smpa01 I have a similar issue with a large data set that I want to try to create dimenstion tables from, but it's in CSV format. Is there a way to pull select columns and distinct values of those columns from a CSV file?

@AnonymousI experimented with a CSV file.

Following is the syntax

Csv.Document(File.Contents("C:\Users\x\Desktop\csv.csv"),[Delimiter=",", Columns=2, Encoding=65001, QuoteStyle=QuoteStyle.None])

Is there a way to pull select columns - yes, you can do that by mentioning th number of columns you want by overwriting in Columns=2. But I guess the column selection works from left to right and you can't do a SQL like SELECT to choose your columns.

 

and distinct values of those columns from a CSV file? - since a SQL statement will not work, I doubt whether it is possible to run additional scripting.

 

Please read to understand the parameters allowed inside CSV.Document

https://docs.microsoft.com/en-us/powerquery-m/csv-document

 

You can try connecting it through ODBC connector and there you can fully utilize SQL statements

 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs
Anonymous
Not applicable

Thank you for such a quick response!

 

I did see the CSV parameters and for columns it does say for columns you can use "a list of column names" but when I list the columns as they are in my first row (example below) it just names them but doesn't select those columns. I tried finding if there was a way to select certain columns in CSV, but with no luck.

Example data:

"InsuredID","AccountName","PlanName","FirstName","MiddleInitial","LastName","IndivTripCostInsured","DepartDate","ReturnDate","CovgBeginDate ","CovgEndDate"

 

Example code (pulls first two columns and names the respective column InsuredID and PlanName:

= Csv.Document(File.Contents("Z:\Data\Travel\Legacy Data\Premium\AXA_PREM.csv"),[Delimiter=",", Columns={"InsuredID", "PlanName"}, Encoding=1252, QuoteStyle=QuoteStyle.None])

 

Example code (pulls first two columns and names the respective column Column1 and Column3:

= Csv.Document(File.Contents("Z:\Data\Travel\Legacy Data\Premium\AXA_PREM.csv"),[Delimiter=",", Columns={"Column1", "Column3"}, Encoding=1252, QuoteStyle=QuoteStyle.None])

 

@Anonymous  tested and yes I can confirm the usage of list inside column parameter

Csv.Document(File.Contents("C:\Users\X\Desktop\csv.csv"),[Delimiter=",", Columns={"Column1","Column3"}, Encoding=65001, QuoteStyle=QuoteStyle.None])

 

 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs
Anonymous
Not applicable

When I tested that code, it renamed columns 1 and 2 as "Column1" and "Column3", respectively, rather than pulling columns 1 and 3. As you mentioned in your first response, it seems to pull left to right rather than being able to select defined columns.

MarkCBB
Helper V
Helper V

Is it not possible to remove the dims from the fact table and create dim tables with ids to the fact.
I have a 80m row 9 column fact and it is about 0.6gb
v-yuezhe-msft
Employee
Employee

@TheOckieMofo,

Firstly, Power BI Premium supports uploads of Power BI Desktop (.pbix) files that are up to 10 GB in size. For more details, please review this article.

Secondly, if you choose import mode, you can follow the guide in this article to optimize your data model.

Thirdly, if you don't have Power BI premium license, it is better for you to use DirectQuery or Connect Live mode to connect to the data source as 1GB limitation doesn't apply to directquery datasets. Also in this case, your report performance  depends largely on the performance of the underlying data source.

Regards,
Lydia

Community Support Team _ Lydia Zhang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Anonymous
Not applicable

DirectQuery is limited to Data sets < 1 million rows.  So I would argue it is not good for large data. However, I don't see any limitation placed on columns.  See: https://docs.microsoft.com/en-us/power-bi/desktop-use-directquery

 

The next suggestion is: Live Connection, I have found less infomation about this online. It currently, appear to have no limits. As everything is powered by SSAS. 

https://powerbi.tips/2018/02/power-bi-connections-live-connection/

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.