Hi,
I have a fairly simple semantic model hosted on Azure Analysis Services (S2 tier). The volume in each of the tables is quite high (millions of rows). There is also a Power BI report hosted in a Power BI Service workspace, connecting to this semantic model in AAS. Whenever the report is opened, there is a sudden surge in QPU (AAS maxes out at 200). The report is embedded, and when many users hit it at the same time, the QPU stays maxed out for a long duration, as do the Query Pool Busy Threads.
After analyzing the report through profiler traces, Performance Analyzer in Power BI Desktop, DAX Studio, etc., I figured out that a DAX query generated by one particular visual in the report causes the surge in QPU.
I have created a trimmed sample version of the model (other dimensions and facts removed) and of the Power BI report, using import mode, to better explain the problem for this forum. The sample version of the report is shared here.
Here is the model diagram, which contains two dimensions (JobPostingAds, SkillListing) and a fact table (JobPostingSkillDemand). The JobPostingAds table contains nearly 10 million records at any point in production, each with a Job Status (Open or Closed). SkillListing contains nearly 10K distinct skills. The JobPostingSkillDemand fact table lists the skills demanded by a particular job advertisement and contains roughly 10-15 times the volume of JobPostingAds.
I have created a measure to show the current market demand for a skill, and this measure is used against SkillName in a clustered bar chart visual.
Postings Seeking Required Skill =
VAR TotalCount =
CALCULATE ( DISTINCTCOUNT ( JobPostingSkillDemand[JobId] ) )
RETURN
IF ( ISBLANK ( TotalCount ), 0, TotalCount )
Analyzing the generated DAX query in DAX Studio, we can see there is high CPU usage.
// DAX Query
DEFINE
VAR __DS0FilterTable =
    TREATAS ( { "Open", "Close" }, 'JobPostingAds'[JobStatus] )
VAR __DS0Core =
SUMMARIZECOLUMNS(
'JobPostingSkillDemand'[SkillName],
__DS0FilterTable,
"Postings_Seeking_Required_Skill", 'JobPostingSkillDemand'[Postings Seeking Required Skill]
)
VAR __DS0PrimaryWindowed =
TOPN(
1001,
__DS0Core,
[Postings_Seeking_Required_Skill],
0,
'JobPostingSkillDemand'[SkillName],
1
)
EVALUATE
__DS0PrimaryWindowed
ORDER BY
[Postings_Seeking_Required_Skill] DESC, 'JobPostingSkillDemand'[SkillName]
Is there any way we can optimize this? Please help.
Thanks!
Here are my thoughts (from the supplied sample pbix, so they may not carry through to the real thing, but hey-ho, there we go):
Do we need the filter to check 'Open' or 'Closed' status? It seems redundant.
Do we need SkillName in the JobPostingSkillDemand table? Seems redundant.
Does the measure need to use DISTINCTCOUNT? We can already see that the CALCULATE is redundant, but why are we checking for duplicate JobIds for the same skill? Even if there are duplicates (which would probably be a data quality issue), are they significant in terms of the count? If not, the measure becomes a simpler COUNT (and you can replace the check for blank with '+ 0' too).
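To illustrate, a sketch of the simplified measure that suggestion implies (assuming JobId really is unique per skill in the fact table, as discussed):

Postings Seeking Required Skill =
COUNT ( JobPostingSkillDemand[JobId] ) + 0

COUNT can be satisfied by a plain scan of the column segment, whereas DISTINCTCOUNT forces the engine to materialize and deduplicate JobId values per skill, and the '+ 0' turns a blank result into 0 without the extra IF/ISBLANK evaluation.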
In business terms, is it important to see 12,855 records in the visual? No-one is going to scroll through them. What is it important to see here? The top 50? Maybe putting some thought into that would result in a rewrite of the measure.
---
Also, in terms of the model (not this particular visual), JobPostingAds is being used as a dimension table (even though it's really another fact table), and it has 10 million records, so that's going to cause its own performance problems.
----
Let me know what you think.
@HotChilli , thank you so much for opening my eyes 😄
You are right, I don't need a distinct count, as there won't be duplicate JobIds for a given skill. Changing this to just COUNT improved the performance drastically, I mean seriously drastically: CPU cycles dropped from 35K to 1K and duration from 10+ seconds to a few ms.
Tons and tons of thanks!!
To answer your other questions:
Regarding the status, the filter will always be on 'Open' only in this visual.
SkillName is not redundant in the actual model; I was trying out a few changes to see if they help.
Filtering to the top 50 or 100 in the visual causes an extra SE query with its own CPU cycles, so with the TOP 50/100 filter the visual actually renders slower than without it 😞
And finally, JobPostingAds really is another fact table, and the business wants it that way for a few reasons 😞 I couldn't do much there.
Hey @Sparks ,
In general, DISTINCTCOUNT is an expensive operation.
I don't really understand why you use CALCULATE in this case; from my point of view it doesn't make sense. But anyway, I don't think this is the bottleneck.
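For illustration, the original measure with that redundant CALCULATE stripped out would be (a sketch; in a measure definition with no row context, the wrapping CALCULATE performs no context transition, so behavior should be identical):

Postings Seeking Required Skill =
VAR TotalCount =
    DISTINCTCOUNT ( JobPostingSkillDemand[JobId] )
RETURN
    IF ( ISBLANK ( TotalCount ), 0, TotalCount )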
In some cases the combination of SUMX and DISTINCT is faster; maybe give it a try:
SUMX ( DISTINCT ( JobPostingSkillDemand[JobId] ), 1 )
Hi @selimovd ,
SUMX causes more rows to be read and higher CPU compared to DISTINCTCOUNT.
Here is the DAX query generated:
// DAX Query --SUMX ( DISTINCT ( JobPostingSkillDemand[JobId] ), 1 )
DEFINE
VAR __DS0FilterTable =
    TREATAS ( { "Open", "Close" }, 'JobPostingAds'[JobStatus] )
VAR __DS0Core =
SUMMARIZECOLUMNS(
'JobPostingSkillDemand'[SkillName],
__DS0FilterTable,
"Measure", 'JobPostingSkillDemand'[Measure]
)
VAR __DS0PrimaryWindowed =
TOPN(1001, __DS0Core, [Measure], 0, 'JobPostingSkillDemand'[SkillName], 1)
EVALUATE
__DS0PrimaryWindowed
ORDER BY
[Measure] DESC, 'JobPostingSkillDemand'[SkillName]
Any other solution?
Regards