cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
TheOckieMofo
Resolver II
Resolver II

Dealing with Large Datasets

Power BI Community,

 

How do you deal with large datasets? Do you always utilize a live connection to a SSAS cube or import the data?

 

I am at my wits end. I am attempting to figure out the best way to deal with a large dataset in Power BI. I have a dataset that is about 21M+ rows of data and around 40 columns. Several of these columns are "value" columns, like Sales, COGS, etc. The rest are dimension columns that will be required to create one of several slices of the data. Unfortunately, I can't seem to come to a successful outcome. When I import this data, it ballons the file size to over 1GB. While shockingly PBI Service does allow me to upload the .pbix file onto the server (I thought there was a 1GB limit), once it is there, the performance of the report is too slow to be acceptable performance for an end user. I'm not using very complicated dax, just "standard" time intelligence functions or simple sums/subtractions/divisions.

 

My data is sitting in a SQL Server database. I do have access to a SSAS Tabular instance. From the community's experience, are you more likely to be successful utilizing a live connection to a SSAS cube or import the data? If it's import the data, how can I optimize file size and/or report performance?

 

Help, you're my only hope...

9 REPLIES 9
smpa01
Resident Rockstar
Resident Rockstar

@TheOckieMofo one suggestion.. query your table (select only the necesary columns) and create a dataflow. put it on schedule referesh.

 

Now for the power bi report (end user report) query that dataflow. You can atleast avoid the query evaluation time on the report refresh.


New Animated Dashboard: Sales Calendar


@smpa01 I have a similar issue with a large data set that I want to try to create dimenstion tables from, but it's in CSV format. Is there a way to pull select columns and distinct values of those columns from a CSV file?

@lorenamschaferI experimented with a CSV file.

Following is the syntax

Csv.Document(File.Contents("C:\Users\x\Desktop\csv.csv"),[Delimiter=",", Columns=2, Encoding=65001, QuoteStyle=QuoteStyle.None])

Is there a way to pull select columns - yes, you can do that by mentioning th number of columns you want by overwriting in Columns=2. But I guess the column selection works from left to right and you can't do a SQL like SELECT to choose your columns.

 

and distinct values of those columns from a CSV file? - since a SQL statement will not work, I doubt whether it is possible to run additional scripting.

 

Please read to understand the parameters allowed inside CSV.Document

https://docs.microsoft.com/en-us/powerquery-m/csv-document

 

You can try connecting it through ODBC connector and there you can fully utilize SQL statements

 


New Animated Dashboard: Sales Calendar


Thank you for such a quick response!

 

I did see the CSV parameters and for columns it does say for columns you can use "a list of column names" but when I list the columns as they are in my first row (example below) it just names them but doesn't select those columns. I tried finding if there was a way to select certain columns in CSV, but with no luck.

Example data:

"InsuredID","AccountName","PlanName","FirstName","MiddleInitial","LastName","IndivTripCostInsured","DepartDate","ReturnDate","CovgBeginDate ","CovgEndDate"

 

Example code (pulls first two columns and names the respective column InsuredID and PlanName:

= Csv.Document(File.Contents("Z:\Data\Travel\Legacy Data\Premium\AXA_PREM.csv"),[Delimiter=",", Columns={"InsuredID", "PlanName"}, Encoding=1252, QuoteStyle=QuoteStyle.None])

 

Example code (pulls first two columns and names the respective column Column1 and Column3:

= Csv.Document(File.Contents("Z:\Data\Travel\Legacy Data\Premium\AXA_PREM.csv"),[Delimiter=",", Columns={"Column1", "Column3"}, Encoding=1252, QuoteStyle=QuoteStyle.None])

 

@lorenamschafer  tested and yes I can confirm the usage of list inside column parameter

Csv.Document(File.Contents("C:\Users\X\Desktop\csv.csv"),[Delimiter=",", Columns={"Column1","Column3"}, Encoding=65001, QuoteStyle=QuoteStyle.None])

 

 


New Animated Dashboard: Sales Calendar


When I tested that code, it renamed columns 1 and 2 as "Column1" and "Column3", respectively, rather than pulling columns 1 and 3. As you mentioned in your first response, it seems to pull left to right rather than being able to select defined columns.

MarkCBB
Helper V
Helper V

Is it not possible to remove the dims from the fact table and create dim tables with ids to the fact.
I have a 80m row 9 column fact and it is about 0.6gb
v-yuezhe-msft
Microsoft
Microsoft

@TheOckieMofo,

Firstly, Power BI Premium supports uploads of Power BI Desktop (.pbix) files that are up to 10 GB in size. For more details, please review this article.

Secondly, if you choose import mode, you can follow the guide in this article to optimize your data model.

Thirdly, if you don't have Power BI premium license, it is better for you to use DirectQuery or Connect Live mode to connect to the data source as 1GB limitation doesn't apply to directquery datasets. Also in this case, your report performance  depends largely on the performance of the underlying data source.

Regards,
Lydia

Community Support Team _ Lydia Zhang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

DirectQuery is limited to Data sets < 1 million rows.  So I would argue it is not good for large data. However, I don't see any limitation placed on columns.  See: https://docs.microsoft.com/en-us/power-bi/desktop-use-directquery

 

The next suggestion is: Live Connection, I have found less infomation about this online. It currently, appear to have no limits. As everything is powered by SSAS. 

https://powerbi.tips/2018/02/power-bi-connections-live-connection/

 

Helpful resources

Announcements
PBI User Groups

Welcome to the User Group Public Preview

Check out new user group experience and if you are a leader please create your group!

MBAS on Demand

Microsoft Business Applications Summit sessions

On-demand access to all the great content presented by the product teams and community members! #MSBizAppsSummit #CommunityRocks

MBAS Attendee Badge

Claim Your Badge & Digital Swag!

Check out how to claim yours today!

secondImage

Are You Ready?

Test your skills now with the Cloud Skills Challenge.

Top Solution Authors