Anonymous
Not applicable

Large dataset refresh in Power BI Service

Hi all,

I'm a newbie in this field.

I created a customer report, using MySQL Server as my data warehouse and connecting it to Power BI. Then I published it online.

The dataset is about 15 million rows covering 6 months, and it takes 2-3 hours to refresh in app.powerbi.com (not a refresh in Power BI Desktop).

 

So here are my two problems:

1. My dataset will keep getting larger in the near future, and a refresh may soon take even more than 3 hours.

Is there something I can do to improve this situation?

 

2. The refresh often fails, with errors such as the following:

- "Before the data import for Total_Identifies finished, its data source timed out. Double-check whether that data source can process import queries, and if it can, try again."
- "Unable to connect to the data source undefined."
- "Microsoft SQL: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding."

 

What can I do to fix this?

 

Thank you guys in advance! (Y)

 

 

1 ACCEPTED SOLUTION
v-lid-msft
Community Support

Hi @Anonymous ,

 

We can use Incremental Refresh to reduce the refresh time if the dataset is on a Premium capacity (a minimal sketch of the required filter step follows the list below). Alternatively, we can follow these tips, based on this document, to reduce the size of the dataset or optimize its model; note that some tips reduce size but may not reduce refresh time:

 

  • Remove unused tables or columns, where possible.
  • Avoid distinct counts on fields with high cardinality – that is, millions of distinct values.
  • Take steps to avoid fields with unnecessary precision and high cardinality. For example, you could split highly unique datetime values into separate columns – for example, month, year, date, and so on. Or, where possible, use rounding on high-precision fields to lower cardinality (for example, 13.29889 -> 13.3).
  • Use integers instead of strings, where possible.
  • Be wary of DAX functions that need to test every row in a table – for example, RANKX – in the worst case, these functions can exponentially increase run-time and memory requirements given linear increases in table size.
  • When connecting to data sources via DirectQuery, consider indexing columns that are commonly filtered or sliced against. Indexing greatly improves report responsiveness.
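
For illustration, here is a minimal sketch of the filter step that Incremental Refresh requires: the query filters the fact table on the two reserved datetime parameters, RangeStart and RangeEnd, which the service supplies for each partition. The server, database, table, and column names below are hypothetical placeholders, not taken from your model:

let
    // RangeStart and RangeEnd must exist as datetime parameters in Power
    // Query; the incremental refresh policy fills in their values.
    Source = MySQL.Database("my-server", "warehouse"),
    // Placeholder schema/table names -- adjust to the real warehouse.
    Orders = Source{[Schema = "warehouse", Item = "fact_orders"]}[Data],
    // Keep only the rows inside this partition's window. Filtering the raw
    // datetime column keeps the step foldable into the MySQL query.
    Filtered = Table.SelectRows(
        Orders,
        each [created_at] >= RangeStart and [created_at] < RangeEnd
    )
in
    Filtered

With a policy such as "store 2 years, refresh the last 7 days", a scheduled refresh then reloads only the newest partitions instead of all 15 million rows.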

 

For the second question, please increase the timeout value in the connector function, such as the following:

 

MySQL.Database(server, database, [CommandTimeout = #duration(0,2,0,0)])
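
Here #duration(0, 2, 0, 0) means 0 days, 2 hours, 0 minutes, 0 seconds. For the "Unable to connect" error, the MySQL connector also accepts a ConnectionTimeout option (per the M function reference); a sketch with placeholder server and database names:

// Placeholder server and database names -- replace with the real ones.
MySQL.Database(
    "my-server",
    "warehouse",
    [
        ConnectionTimeout = #duration(0, 0, 5, 0), // wait up to 5 minutes to connect
        CommandTimeout = #duration(0, 2, 0, 0)     // allow each query up to 2 hours
    ]
)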


We may also need to increase the timeout value on the data source side.


Best regards,

 

Community Support Team _ Dong Li
If this post helps, please consider accepting it as the solution to help the other members find it more quickly.

2 REPLIES

@Anonymous The quick things you can do are to ensure that only the data you are using is being loaded, and remove the rest. Use the Power Query Analyzer to tune the ingestion if possible.

Another alternative is to use Analysis Services as a standalone model. I love working in Power BI more, but sometimes, if you have to scale up, you need to jump over to Analysis Services with a live connection to offload the long processing times.


Looking for more Power BI tips, tricks & tools? Check out PowerBI.tips, the site I co-own with Mike Carlo. Also, if you are near SE WI, join our PUG: Milwaukee Brew City PUG.
