Frequent Visitor

Refresh large datasets on Power BI service

Hi,

 

I have data imported from BigQuery using the Simba ODBC driver. My organisation has Power BI Premium capacity. There are billions of rows in the views, as the data is in long format.

 

Data structure preview: around 50,000 SKUs are tracked across 40 regions on a daily basis. That is around 2 million rows, with dates as column headers. Since this structure is not supported for visualisation, the data is unpivoted, which brings the row count up to roughly 300 million for 5 months of data. I'm concerned about the additional 2 million rows arriving daily, and the dashboard is expected to be in place for at least a year.
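For scale, the figures in the post work out as follows (a rough sketch; 150 days is used as an approximation for 5 months):

```python
# Back-of-the-envelope row counts for the unpivoted (long) table,
# using the figures from the post: 50,000 SKUs x 40 regions, daily grain.
skus = 50_000
regions = 40

rows_per_day = skus * regions        # 2,000,000 new rows per day
five_months = rows_per_day * 150     # ~300 million rows, matching the post
one_year = rows_per_day * 365        # ~730 million rows if kept for a full year
```

So a year of retention more than doubles the current 300-million-row table, which is why the refresh strategy matters.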

 

Here is the major issue. I wish to refresh the data on a daily basis, and these are the roadblocks:

  • Refresh on the service fails with a timeout after 2 hours. I couldn't increase the command timeout, as that is not supported by the BigQuery import.
  • DirectQuery is very slow.
  • Data goes back to 1 January 2020, and incremental refresh is not preferred because the historic data has to be updated from the start date.
  • I have tried a dataflow as well; it takes around 6 hours to refresh, with no transformations except appending tables in Power BI.

 

Could someone please help me schedule a successful daily refresh for this huge dataset? Any suggestions would be greatly appreciated.

 

Regards,

Swetha

1 ACCEPTED SOLUTION

Community Support

Hi @swethabonthu ,

 

You may follow these tips to reduce the size of the dataset or optimize the data model, based on this document. Note that some tips may not reduce the refresh time.

 

  • Remove unused tables or columns, where possible.
  • Avoid distinct counts on fields with high cardinality – that is, millions of distinct values.
  • Take steps to avoid fields with unnecessary precision and high cardinality. For example, split highly unique datetime values into separate columns (month, year, date, and so on), or, where possible, round high-precision fields to lower cardinality (for example, 13.29889 -> 13.3).
  • Use integers instead of strings, where possible.
  • Be wary of DAX functions that need to test every row in a table (for example, RANKX); in the worst case, these functions can exponentially increase run-time and memory requirements given linear increases in table size.
  • When connecting to data sources via DirectQuery, consider indexing columns that are commonly filtered or sliced on. Indexing greatly improves report responsiveness.
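The datetime-splitting, rounding, and integer-encoding tips can be sketched like this (pandas is used purely as an illustration, and the column names ts/price/region are made up for the example; the same transforms apply in Power Query before load):

```python
import pandas as pd

# Tiny sample table standing in for one slice of the unpivoted data.
df = pd.DataFrame({
    "ts": pd.to_datetime(["2020-01-01 08:13:29", "2020-01-02 17:45:02"]),
    "price": [13.29889, 7.40112],
    "region": ["EMEA", "APAC"],
})

# 1. Split a high-cardinality datetime into coarser parts.
df["year"] = df["ts"].dt.year
df["month"] = df["ts"].dt.month
df["day"] = df["ts"].dt.day
df = df.drop(columns=["ts"])

# 2. Round high-precision numbers to lower cardinality (13.29889 -> 13.3).
df["price"] = df["price"].round(1)

# 3. Replace repeated strings with small integer codes plus a lookup table.
df["region_id"], region_lookup = pd.factorize(df["region"])
df = df.drop(columns=["region"])
```

Each step shrinks the number of distinct values the engine has to encode, which is what drives both model size and refresh cost in an import model.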

Alternatively, you can increase the timeout value in the connector function.

 

Best Regards,

Jay

 

Community Support Team _ Jay Wang

If this post helps, then please consider accepting it as the solution, to help other members find it more quickly.



6 REPLIES 6
Super User II

Please explain the reasoning for not using incremental refresh in a bit more detail. Do your data rows come with a "last modified date" tag?


The data imported from BigQuery is produced by complex machine learning techniques, and the historic data is not fixed. Also, I don't have a "last modified date" tag.


Helper I

Hi @swethabonthu 

 

You can use PBI Spy (www.pbispy.com) to quickly identify columns that are not being used.

Reach out to me if you need any assistance doing this.


OMG! I wish I had known about this before!

I was given a report with 5 tables, each with about 350 columns, and I was going crazy trying to figure out what data was actually used.

 

THANK YOU!


Glad to be of assistance. This is the first real public release of PBI Spy, so if anything doesn't work or is unclear, please don't hesitate to contact me. I'm trying to make it as good as it can be, but it's a journey.
