mmilegal
Frequent Visitor

Desktop refresh vs gateway refresh on service

Hi 

 

I feel sure my question must have already been asked and answered but I’ve been unable to find a search term that finds an answer that matches my experience. 

 

I’ve tried “desktop refresh vs gateway refresh” and “scheduled refresh vs manual refresh” and both find many discussions about refresh issues but the underlying issue is never quite what I’m looking for.  The ‘Service’ forum seems like it might be a suitable place to post.  Here’s a description of my scenario that hopefully has sufficient detail. 

  • Dataset draws data exclusively from a SQL Server 2012 database living on an AWS cloud server
  • No native queries are used in the Get Data phase
  • There is extensive processing of the data in the Transform Data phase using Power Query, including:
    • Several merge query operations
    • One pivoted table, duplicated, with each copy filtered in a mutually exclusive way to allow different post-filter data type formatting (Example 1)
    • Use of 3 custom functions, created from imported SQL UDFs, for the data type formatting (see the sketch just after this list)
    • One imported table that is duplicated into several variants, each filtered and processed using DAX – the peculiarity of that particular source table is that it allows different data types in the same column across the whole table (Example 2)
  • Total size of the pbix after a successful refresh of the data in Desktop = 171 MB.
  • Time to refresh the entire dataset in a single refresh operation in Desktop = 16 minutes.  Sixteen minutes to refresh the entire database, ready to use the full reporting power of Power BI, is fine.  That allows for daily refresh and even intra-day refreshes.
  • Two hours after publishing to the Service (which triggers a refresh), the refresh showed no signs of completing.  I’ve monitored Task Manager on the server and it has been pinned at 99% CPU throughout.  The heavy consumer processes are ‘Microsoft Mashup Evaluation Container’.  The refresh eventually timed out.
  • The dataset is configured to refresh in the Service via an On-Premises Gateway installed on the same AWS server as SQL Server 2012.
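To illustrate the custom-function bullet above, here is a stripped-down Power Query (M) sketch of how one of those imported SQL UDFs gets used - not my actual code. The server, database, UDF and column names are made-up placeholders; the pattern is simply that a scalar UDF exposed through Sql.Database can be picked up from the navigator as a function value and then applied to a column.

let
    // Hypothetical connection - substitute real server/database names
    Source = Sql.Database("my-aws-sql-server", "CaseDb"),

    // A scalar UDF (hypothetical dbo.fnFormatNumber) surfaces in the navigator as an invocable function
    fnFormatNumber = Source{[Schema = "dbo", Item = "fnFormatNumber"]}[Data],

    // Hypothetical staging query to which the formatting is applied
    Staged = PivotedNumberFields,

    // Apply the imported function to one of the pivoted value columns
    Formatted = Table.TransformColumns(Staged, {{"ClaimAmount", each fnFormatNumber(_)}})
in
    Formatted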

Example 1

Source Format has 3 columns:

CaseID (integer), FieldID (integer), Value (text)

There is a merge to another table on FieldID that expands out to give Field Type

Pivot turns data into:

1 row per case, one column for each field containing the field value. 

The pivoted data is then duplicated, filtered by type (text, date and number), and the imported custom functions are used to format the values.
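To make Example 1 concrete, here is a minimal M sketch of the merge / filter / pivot pattern rather than my actual queries. The query and field names (CaseFieldValues, FieldLookup) are placeholders, and the ordering of the filter relative to the pivot is illustrative; this copy keeps only the date-type fields, and the text and number copies are filtered the same way.

let
    // Hypothetical source query: CaseID (integer), FieldID (integer), Value (text)
    Source = CaseFieldValues,

    // Merge against a hypothetical lookup table to bring in the field name and field type
    Merged = Table.NestedJoin(Source, {"FieldID"}, FieldLookup, {"FieldID"}, "Lookup", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Lookup", {"FieldName", "FieldType"}),

    // Each duplicate of the query keeps one field type only; this copy keeps the date fields
    DatesOnly = Table.SelectRows(Expanded, each [FieldType] = "Date"),

    // Keep only the columns needed so the pivot groups on CaseID alone
    Trimmed = Table.SelectColumns(DatesOnly, {"CaseID", "FieldName", "Value"}),

    // Pivot: one row per case, one column per field, holding that field's value
    Pivoted = Table.Pivot(Trimmed, List.Distinct(Trimmed[FieldName]), "FieldName", "Value")

    // The imported custom functions are then applied to the pivoted columns to set the data types
in
    Pivoted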

Example 2

Source data looks something like this:

GroupTableExample

GroupID | ID | InstanceID | Column01   | Column02   | Column03
1       | 1  | 1001       | "20200101" | "[Hello]"  | "123"
1       | 2  | 1009       | "20211212" | "[World]"  | "456"
2       | 3  | 1210       | "[Lorem]"  | "20210101" | "789"
2       | 4  | 1223       | "[Ipsum]"  | "20210131" | "100"

Examples of the DAX to give formatted data for ‘table type 1’ and ‘table type 2’ based on GroupID are:

Filter1 =
SUMMARIZE(
    FILTER( GroupTableExample, GroupTableExample[GroupID] = 1 ),
    GroupTableExample[ID],
    "Caseid", MAX( GroupTableExample[InstanceID] ),
    "DateColumn",
        IF(
            LEN( MAX( GroupTableExample[Column01] ) ) = 10
                && LEFT( MAX( GroupTableExample[Column01] ), 1 ) = """"
                && RIGHT( MAX( GroupTableExample[Column01] ), 1 ) = """",
            DATE(
                LEFT( SUBSTITUTE( MAX( GroupTableExample[Column01] ), """", "" ), 4 ),
                MID( SUBSTITUTE( MAX( GroupTableExample[Column01] ), """", "" ), 5, 2 ),
                RIGHT( SUBSTITUTE( MAX( GroupTableExample[Column01] ), """", "" ), 2 )
            ),
            BLANK()
        ),
    "TextColumn", SUBSTITUTE( SUBSTITUTE( MAX( GroupTableExample[Column02] ), "[""", "" ), """]", "" ),
    "NumberColumn", SUBSTITUTE( MAX( GroupTableExample[Column03] ), """", "" )
)

Filter2 =
SUMMARIZE(
    FILTER( GroupTableExample, GroupTableExample[GroupID] = 2 ),
    GroupTableExample[ID],
    "Caseid", MAX( GroupTableExample[InstanceID] ),
    "TextColumn", SUBSTITUTE( SUBSTITUTE( MAX( GroupTableExample[Column01] ), "[""", "" ), """]", "" ),
    "DateColumn",
        IF(
            LEN( MAX( GroupTableExample[Column02] ) ) = 10
                && LEFT( MAX( GroupTableExample[Column02] ), 1 ) = """"
                && RIGHT( MAX( GroupTableExample[Column02] ), 1 ) = """",
            DATE(
                LEFT( SUBSTITUTE( MAX( GroupTableExample[Column02] ), """", "" ), 4 ),
                MID( SUBSTITUTE( MAX( GroupTableExample[Column02] ), """", "" ), 5, 2 ),
                RIGHT( SUBSTITUTE( MAX( GroupTableExample[Column02] ), """", "" ), 2 )
            ),
            BLANK()
        ),
    "NumberColumn", SUBSTITUTE( MAX( GroupTableExample[Column03] ), """", "" )
)

These work in their real-life form - if I've made any blunders while manually amending them into generic and anonymous examples please disregard.  

I’ve used all of these techniques in some form in other datasets that are published to the Service, for which scheduled refresh is configured to use the On-Premises Gateway, and they all refresh within tolerances of the desktop refresh time.  They all used native queries at the Get Data stage, which did reduce the size of the dataset in the pbix to approximately 40 MB.  However, my requirement now is to deliver the entire database for use in a single dataset, and it seems native queries are to be avoided in general terms anyway.

 

I recreated this dataset from the ground up to exclude native queries because I had concerns about the processing load on the SQL server.  However, if the heavy processing on the server is ‘Microsoft Mashup Evaluation Container’, does that mean that the bottleneck with this approach is the processing power of Power BI and/or the Gateway?

 

What might I need to change about my approach to be able to deliver a dataset that can be successfully refreshed on a schedule?

 

Apologies if this duplicates any other question - I've looked hard at related questions in the hope of working out the possible issues for myself, even if no other single post perfectly reflects my issue, but there's a lot going on and my knowledge isn't quite sufficient to connect all the dots.  I would greatly appreciate any help from the community and would be happy to contribute if this chimes with any other user having similar difficulties.

 

Kind regards

mmilegal

8 REPLIES
mmilegal
Frequent Visitor

Hi @aj1973 

 

Many thanks for your suggestion - I appreciate your help.  My use of DAX for the transformation of some of the data was expedient for two reasons.  I had used that successfully in the past and it was quicker to be able to text-edit a formula to drop into 'create table' commands than to go through the process of adding transformation steps in Power Query. 

 

However, expedient isn't always best! 

 

As the removal of DAX seemed to help in the instance on that other thread I deleted all my tables created in DAX.  As a test and before trying to recreate the transformations in Power Query, I saved and published the dataset just with all DAX removed.  Once again, the desktop refresh went fine but the refresh in the service hogged resources and timed out.  

 

I cannot easily regress the Gateway version and I cannot change the SQL Server version.  I have checked our Gateway and it is not the current version.  I have considered updating the Gateway to the current version but it is demanding a .NET Framework version update.  That's well above me, so I've referred that on.  

 

I'm keeping an open mind.  I'm far from expert with Power BI, but I'm inclined to think if the refresh in Desktop can be successful and relatively quick that I can't have made too much of a mess of my methods of importing and transforming data.  Our Gateway version is 3000.66.4 (November 2020 Release 1) so we're not far out of date on that.  Whether updating it to 3000.77.7 (which seems to be the current version) will be significant, I don't know, but it seems worth trying.  I'll update the thread with actions/outcomes as they unfold.  

 

Cheers

 

mmilegal

aj1973
Community Champion

Hi @mmilegal 

It seems like your model is facing a version compatibility issue. Please try to use a dataflow as the connection mode and then rebuild a sample report out of 2 tables (for example), then publish it. Logically, when you try to connect to the SQL server through a dataflow, the Gateway is needed and a refresh is requested. If it works, then you know what to do.

Please let me know.

 

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

Hi @aj1973 

I've not used dataflows before, so I'm studying up on them now.  I know I'll be limited to some degree because I understand that merging tables in the dataflow requires a premium workspace, which we do not have.  
Progress might be a little slow from here but I will work on your suggestion.  Please forgive me if it takes a few days for me to report back on what I've tried and how it turned out.  
Cheers

mmilegal

aj1973
Community Champion

Hi @mmilegal 

No, you don't need premium capacity to use Dataflows. Premium capacity is for AI and ML capabilities.

To make it simple for you to understand: the dataflow uses the Gateway to connect to your SQL Server 2012; if that works, good. After you connect to your source you will be using the Power Query Online platform (just like in your Desktop) to ETL your data. After you prepare your data and refresh it, you can then call it in your Desktop and build the report. After that you just publish it and you'll be good to go.

Let me know for more details

 

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

Hi @aj1973 

I'm embarrassed to keep missing the point - I suspect I am a bit like a kid who belongs in the beginner class trying to ask sensible questions in the advanced class!  That's probably fairly accurate, I'm afraid.  I've read a fair bit of content about dataflows vs datasets and watched a number of the 'Guy in a Cube' videos, but I ended up feeling like my only chance to grasp it was to try a practical implementation that relates to my real-world requirements.  Unfortunately, I think my 'in at the deep end' approach has, this time, been more sink than swim.  

Experiment 1
I tried a test dataflow with two of the tables from my old dataset.  To test some of what I knew I'd need, I included the tables used in Example 1 of my original post but decided to try only one of the three pivots.  When I tried to edit the dataflow (in what I assume is the Power Query Online platform - it looks near-identical to Power Query in Desktop) to merge the 2 tables, I got the message 'Computed tables require Premium to refresh. To enable refresh, upgrade this workspace to Premium capacity, or remove this table.'  

Experiment 2
At that point, I considered that I was only supposed to use the Dataflow to pull the source data so I saved the Dataflow without any transformation steps then opened a fresh pbix in the desktop and used the dataflows connector in the Get Data dialog in the desktop.  In Desktop, I recreated the steps to merge, filter and pivot the data.  When I applied the changes, the refresh began and was obviously going to take a while.  I timed the refresh as processing around 950 rows per minute.  The data being processed measures rows in the millions so I abandoned that. 

Experiment 3
My last throw of the dice was to replicate the merge/pivot steps in the online editor of the dataflow, then create a new pbix in Desktop and refresh the data there.  I suspected that was a waste of time, and I wasn't terribly surprised to see that the imported structure matched the output of the pivot I had saved in the dataflow but there was no data, presumably because the dataflow had not refreshed for the reason above.  

Do those three experiments make sense?

Cheers

mmilegal

aj1973
Community Champion

It all makes sense, I like the way you described your 3 experiments.

From the first experiment I could tell that you don't have permission to use or store the dataflow in Azure Data Lake Storage: https://docs.microsoft.com/en-us/power-query/dataflows/configuring-storage-and-compute-options-for-a.... That's why the dataflow didn't refresh at first. The good news from your experiments is that your data source (SQL Server 2012) is accessible through your Gateway - that was the aim of all this, wasn't it?

 

So what is left for you to do is ask your Azure admin for permission to use the Data Lake in order to save your dataflow. Once you have it, then right after you build your dataflow in Power Query Online and save the changes, the service will ask you to refresh the dataflow; if it goes through, you will be good to go.

 

Let me know

   

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

Hi @aj1973 

I think I've managed to achieve the result but in a slightly different way - I might have dodged the issue in a manner that almost feels like cheating!

The bit about access to ADLS has kind of thrown me as I checked with the one person in our organisation likely to know about that and he doesn't believe we have a dedicated subscription for that.  

However, it had occurred to me that everything else was working according to your suggestions and I could make the dataflow work so long as I didn't try to use merged or computed tables in the Power Query Online platform.  All the other fancy stuff like the pivots worked just fine.  My 'cheat' solution - and I appreciate I'm fortunate that it was a solution available to me and not every user will be so lucky - is based on the fact that it was possible for me to use the SQL database from which I am pulling all my data.  It was a very quick and simple process to create views in SQL to perform the limited actions that I might otherwise have done with a merge in PQOP.  So, in case it helps anyone else with a similar challenge, the structure of my solution is something like this:

1. create views in SQL Server with the smallest number of joins possible to allow filtering for further processing.  That really was just limited to examples like pulling in a 'type' column from table B to apply to rows in table A, to allow me to filter table A by type.  

2. create a dataflow in a workspace to pull the data from SQL Server tables and views as per step 1. The most significant processing in PQOP is the pivots described as Example 1 in my original post.  

3. create a dataset in Desktop and use the dataflow connector in the Get Data dialog to pull data for the dataset from the dataflow created in step 2.  

4. the most significant processing in the dataset is the DAX SUMMARIZE/FILTER processing outlined as Example 2 in my original post.  

I think I now have the entirety of the data required built into a single dataset that should be able to support the reporting requirements.  The dataflow has refreshed in the cloud several times and, although it did fall over once, it seems to be refreshing in about 22 minutes.  I've published the dataset and, scheduling its refresh for one hour after the scheduled refresh time of the dataflow, it refreshed last night in around 16 minutes.  Size of the pbix is 199 MB.  I'm not sure whether all of those statistics are meaningful without lots of other info not included here, but they are at least indicative of one user's experience.  

I suspect much of my solution still bears the hallmarks of the enthusiastic amateur and is likely to fall short of best practice in some areas but I'm sure that will be the reality for many using Power BI in small organisations.  

A very big thank you @aj1973 - I know for certain I would not have worked out all of this on my own and I'm sure I've had a learning experience on the journey too.  I very much appreciate both the knowledge you possess and your generosity with your time in sharing it.  

Cheers

mmilegal

 

 

aj1973
Community Champion

Hi @mmilegal 

I think your issue is close to this one, since your SQL Server is the 2012 version.

https://community.powerbi.com/t5/Service/Data-Gateway-update-Causes-SQL-Server-error-An-expression/m...

 

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook
