First off, this isn't strictly a Desktop question, but I don't know where else to ask it, so forgive me if this isn't the right forum.
I have PBI developers spread across the country (USA). Most of the data we load into PBI comes from a single location (i.e., we don't have data centers distributed across the country). Most of our PBI reports involve lots of complicated data-load transformations, but the data volumes aren't gigantic (~2M rows for our largest table). We have found that, during developer work in Desktop (not report refreshes in the Service):
1) when the transformations are written in M (instead of directly in the data source language), data loads are really slow (sometimes taking around 45 minutes)
2) developers who are physically near our data center experience much faster data loads than those farther away (exacerbated by the fact that we're all working from home now because of the virus, i.e., over VPN)
To resolve issue #1, we have implemented a design standard of writing the transformations in the data source language, not in M / Power Query. To resolve issue #2, we have begun standing up machines near our data center that remote developers can RDP into.
I understand that our solution for issue #1 isn't generally recommended (e.g., because it prevents all subsequent query folding, should we ever want to do M transformations after the source step). I also understand that data load performance in Desktop can be very different from the Service. (For example, a SQL query in Desktop runs twice, whereas it runs only once in the Service.) Still, I'm very uncomfortable with our design standard for issue #1. Is slow data loading in Desktop typical for complex M transformations? If not, what is the best solution?
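For what it's worth, there is a documented middle ground between "all M" and "raw SQL pasted into the source step": wrapping the native query in `Value.NativeQuery` with the `EnableFolding` option, which on SQL Server sources lets later M steps still fold into the query. A minimal sketch (server, database, table, and column names here are hypothetical placeholders):

```powerquery-m
let
    // Hypothetical server/database names; substitute your own
    Source = Sql.Database("myserver", "mydb"),

    // The native SQL runs at the source, but EnableFolding = true
    // allows subsequent M steps to fold back into this query
    Rows = Value.NativeQuery(
        Source,
        "SELECT * FROM dbo.Sales",
        null,
        [EnableFolding = true]
    ),

    // This filter can then fold into the native query as a WHERE clause
    Filtered = Table.SelectRows(Rows, each [Amount] > 100)
in
    Filtered
```

Whether folding actually happens still depends on the connector and the steps involved, so it's worth verifying with "View Native Query" in the step's right-click menu.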
Also, is it typical for companies to stand up machines near data centers for PBI developers to RDP into in order to overcome Desktop load performance challenges? If not, how is this problem solved?
@Anonymous Well, putting views in your source data system is the recommended fastest way to load data. It's when you paste the SQL code into a native query statement that problems can arise. @edhans has specific opinions on this, if memory serves. If the M transformations can fold into the database, that is the second-best option after putting the views in the database itself.
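To make the "views in the source" suggestion concrete, here's a minimal sketch (table and column names are hypothetical): the heavy join/aggregation logic lives in the database, and Power BI just does a plain table load of the view, which also folds cleanly.

```sql
-- Hypothetical view: the heavy transformation work runs in SQL Server,
-- so Power BI Desktop only performs a simple, foldable table load.
CREATE VIEW dbo.vw_SalesSummary AS
SELECT
    s.OrderDate,
    c.Region,
    SUM(s.Amount) AS TotalAmount
FROM dbo.Sales AS s
JOIN dbo.Customer AS c
    ON c.CustomerID = s.CustomerID
GROUP BY s.OrderDate, c.Region;
```

In Power Query you would then just pick `dbo.vw_SalesSummary` from the navigator instead of writing the joins and grouping in M.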
I will consider moving the heavy transformation logic into SQL Server as views.
How about my second question? To overcome data-speed challenges, is it typical to put "developer" machines near data centers and have PBI developers RDP into them for development? If so, are there any gotchas or best practices? For example, should I assign one machine per developer (since PBI saves credentials in the Windows profile)?
@Anonymous Yeah, distance creates latency, no question. If you could sit your developers on top of the servers, that would be optimal! :D I don't see why developers couldn't share jump boxes; that's pretty common, but I'm sure they'd be much happier not having to share. I mean, they are developers, after all!