sc_gus
Regular Visitor

Trying to find the most efficient way to do a COUNTIF in DAX to calculate a measure in a Pivot Table

I want to calculate the number of records in the 'Data Dump' table where the [Stage] column contains the value "Enquiry". I would also like to understand the difference in how these formulas are executed.

 

I have used the DAX formulas below to evaluate this - which formula is more efficient?

 

=SUMX('Data Dump',MIN(SEARCH("Enquiry",'Data Dump'[Stage],,0),1))

 

=CALCULATE(COUNTROWS('Data Dump'),SEARCH("Enquiry",'Data Dump'[Stage],,0)>0)

 

=CALCULATE(COUNT('Data Dump'[Stage]),SEARCH("Enquiry",'Data Dump'[Stage],,0)>0)

 

=CALCULATE(COUNTROWS(FILTER('Data Dump',SEARCH("Enquiry",'Data Dump'[Stage],,0)>0)))

 

Also, would it be more efficient if I simply added formula columns to my data source table with 1 and 0 for the different criteria (e.g. the value is "Enquiry", "Pipeline", "Converted", etc.) and then summed them up to get the total count?
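For illustration, the flag-column idea would look something like the sketch below (the column and measure names are invented examples, not from the actual model):

Calculated column in 'Data Dump':

Is Enquiry = IF ( SEARCH ( "Enquiry", 'Data Dump'[Stage],, 0 ) > 0, 1, 0 )

Measure used in the Pivot Table:

=SUM ( 'Data Dump'[Is Enquiry] )

The column is evaluated once, at refresh time, so at query time the measure is a plain SUM over a numeric column.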

8 REPLIES
lbendlin
Super User

"I would also like to understand the difference in execution of these formulas."

Familiarize yourself with DAX Studio - it has extensive capabilities to examine query plans and server timings.  It will be able to show you the differences.

I am just starting out with DAX; I will give DAX Studio a go. But how do I compare performance against creating and summing up formula fields?

You run each of the measure queries in DAX Studio and compare their query plans. Then you pick the version that produces the best plan - for that particular situation.
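For example, a minimal measure query of the kind you would paste into DAX Studio (with Server Timings enabled) could look like the sketch below - the measure name is invented for illustration, and you would swap in each of your four formulas in turn:

DEFINE
    MEASURE 'Data Dump'[Enquiry Count Test] =
        CALCULATE (
            COUNTROWS (
                FILTER ( 'Data Dump', SEARCH ( "Enquiry", 'Data Dump'[Stage],, 0 ) > 0 )
            )
        )
EVALUATE
    ROW ( "Enquiry Count", [Enquiry Count Test] )

Server Timings then reports the Total, FE and SE figures for each run.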

I meant that in the 'Data_Dump' table itself I might create columns with 1 and 0 that satisfy my conditions, and simply do a SUM on the column in the Pivot Table. How do I compare this approach with the DAX formulas in terms of performance?

As for the measures, I got the result below in DAX Studio. The performance looks similar for each of the formulas. I cannot figure out the meaning of the CPU and Parallelism metrics - can you shed some light on them?

[Screenshot: DAX Studio Server Timings results]

Hi, @sc_gus 

I see that @lbendlin has replied to you - has your problem been solved? lbendlin put it well: you can choose based on the overall running time. Here's a small addition:

Metric descriptions:

Total: The total query duration in milliseconds, taken from the Query End profiler event. This is the total time the server took to process the query (so it excludes any time the client took to process the result set).

SE CPU: The amount of CPU time spent on Storage Engine queries (note that this figure may not be 100% reliable, so don't place a high amount of importance on it). It is calculated by adding up the CPU duration from the Storage Engine events. The blue ratio under SE CPU is the factor of SE CPU over SE and is a very rough indicator of the average number of parallel operations the SE was running.

FE: The amount of time spent in the Formula Engine, calculated by subtracting the SE duration from the Total. The blue figure underneath is the percentage FE / Total.

SE: The amount of time spent in the Storage Engine, calculated by adding up the duration of all the Storage Engine queries. The blue figure underneath is the percentage SE / Total. As a rough rule of thumb, you want your queries to spend more time in the Storage Engine, as it is multi-threaded and can do more operations in parallel, while the Formula Engine is single-threaded and cannot make use of multiple CPU cores. Note that you cannot have a query that is 100% handled in the Storage Engine, because the FE sits on top of the SE: the FE issues the requests to the SE, and it also serializes the result set before it is sent back to the client.

SE Queries: The number of Storage Engine queries performed during the processing of the query.

SE Cache: The number of Storage Engine cache hits.
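A purely illustrative example with invented numbers: if Total is 120 ms and the Storage Engine queries add up to SE = 90 ms, then FE = 120 - 90 = 30 ms, i.e. the query spent 75% of its time in the SE and 25% in the FE, which by the rule of thumb above would be a healthy profile.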

These descriptions are taken from the official DAX Studio documentation:

Query Plan Tracing | DAX Studio

You can choose the DAX expression that suits you based on the metrics you tested.

How to Get Your Question Answered Quickly

If it does not help, please provide more details with your desired output and a pbix file without private information (or some sample data).

Best Regards

Jianpeng Li

If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.

I have gone through this documentation, but could not find anything on the CPU and Parallelism columns. It would be helpful if you could guide me on these as well.

Hi, @sc_gus 

In DAX Studio, the CPU and Parallelism metrics play a crucial role in understanding the performance of your DAX queries and evaluating the efficiency of storage engine operations. You can refer to the following insights:

1. CPU (Central Processing Unit):

The CPU metric represents the amount of CPU time spent on executing storage engine queries. It provides insights into the computational workload required by the storage engine during query execution.

Keep in mind that the accuracy of this figure may not be perfect, so it's essential to interpret it with caution. Additionally, the blue ratio under SE CPU (Storage Engine CPU) serves as a rough indicator of the average parallel operations that the storage engine was running. This ratio helps you understand how efficiently the storage engine utilizes available CPU resources.

2. Parallelism Metrics:

Parallelism refers to the ability of a system to perform multiple tasks simultaneously. In the context of DAX queries, it relates to how well the storage engine leverages multiple CPU cores for parallel execution.

DAX Studio provides several parallelism-related metrics:

Parallelism Indicator (Par.):

1. The Par. column in the Server Timings pane shows the ratio between CPU and Duration for each operation in the storage engine.

2. This ratio is not computed when either Duration or CPU is 0.
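A purely illustrative example (invented numbers, not taken from your results): a storage engine operation with Duration = 100 ms and CPU = 380 ms gets Par. = 380 / 100 = 3.8, suggesting that on average close to four cores were working on that operation in parallel. The blue ratio under SE CPU described earlier works the same way, only aggregated over all storage engine events.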

Batch Operations:

1. A batch operation refers to multiple storage engine operations executed within a single batch request made by the formula engine.

2. In complex models, understanding batch operations is crucial for evaluating materialization costs.

3. The Batch subclass is now visible by default in DAX Studio 2.17.

4. The Duration and CPU for the batch operation are displayed as the difference between the values reported by profiler events and the sum of internal Scan events for that batch.

5. The materialization produced by the batch to the formula engine usually corresponds to the last Scan operation belonging to the batch.

6. Other Scan events within the same batch are used only by the storage engine and are not copied to the formula engine.

7. By considering the Batch CPU and Duration costs, the calculation of the Formula Engine cost becomes more accurate in complex models.
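To illustrate points 4-7 with invented numbers: if the profiler reports a batch event with Duration = 250 ms and CPU = 900 ms, while the internal Scan events of that batch add up to 200 ms of Duration and 750 ms of CPU, then DAX Studio displays the batch row as 50 ms Duration and 150 ms CPU - the portion not already accounted for by the scans inside the batch.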

These insights come from the following documents:
Server Timing Trace | DAX Studio

Analyzing the parallelism of storage engine operations in DAX Studio - SQLBI

Model Metrics | DAX Studio

How to Get Your Question Answered Quickly

If it does not help, please provide more details with your desired output and a pbix file without private information (or some sample data).

Best Regards

Jianpeng Li

If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.

You spend most of your time in the storage engine  - good.

There are CallBackDataIDs - bad  (the storage engine still needs to ask the formula engine for help)
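If that becomes the bottleneck, one option - tying back to the flag-column idea discussed earlier, and assuming a precomputed 1/0 column such as the illustrative [Is Enquiry] - is to filter on a plain column value, which the storage engine can resolve natively without a CallbackDataID:

=CALCULATE ( COUNTROWS ( 'Data Dump' ), 'Data Dump'[Is Enquiry] = 1 )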


Compare the overall timing across the queries for five runs each, then make your pick.
