Solved: Re: Show the category with MAX AVERAGE between cat...

mateusluzzi · ‎11-11-2020

Hello, folks, how are you?

I'm a student, new to PBI, but i'm learning how to work with it because i'll need soon.

I am having problems on my very first dashboard. Let me explain (4 key info):

Table 1: I have a table that show all leads from a company (I'll refer as my company) and where they came from (category), and the date (first time) when they contacted us.

Table 2: Then I have another table that shows those leads that became costumers, and the date they became costumers (second time, selling date).

Categories: There are 9 categories that I want to measure, such as: direct traffic, organic search, paid search, social, etc...

How long to convert: basically a datediff between first contact and last contact

I created a row that counts how much days it passed from the first contact to the last contact (the day they bought our product), so that I can track how long its taking to convert the prospect by each channel (direct traffic, organic search, paid search...)

It may be confusing, but let's go...:

Now I want to show the slowest/fastest channel (direct traffic, organic search, paid search...)

The logical sequence that Ive came up but I don't know how to apply:

✔️calculate for each new costumer how long it took to convert it (that row that I created before)
❌somehow take the average separated by category (direct traffic, organic search, paid search...)
❌somehow take the max from those values that returned from the average
1. ❌apply the max value into a card
2. ❌somehow apply the category name that returned max into another card
❌The same for the min value

Is this possible? If so, can somebody help me?

In advice, thank you very much!

daxer-almighty · ‎11-13-2020

Hi @mateusluzzi

I've written 4 measures that give you what you want. I mean I have checked them and they behave reasonably. The tables are connected via 1-1 and the table 'Closed Deals' must be hidden as this is a fact table and it should not be selected from. The other will be visible as it's a dimension. Only on this assumption you're guaranteed correct results.

Highest Avg Gap = 
MAXX(
    DISTINCT( 'Marketing Qualified Leads'[origin] ),
    CALCULATE(
        AVERAGEX(
            'Closed Deals',
            'Closed Deals'[won_date]
                - RELATED( 'Marketing Qualified Leads'[first_contact_date] )
        )
    )
)

Highest Avg Gap gives you the biggest of the average gaps across all origins visible in the current context.

Highest Origin = 
MAXX(
    TOPN(1,
        // Need only those origins that have
        // at least one entry in Closed Deals.
        SUMMARIZE(
            'Closed Deals',
            'Marketing Qualified Leads'[origin]
        ),
        [Highest Avg Gap],
        DESC
    ),
    'Marketing Qualified Leads'[origin]
)

Highest Origin returns the origin with the highest average gap. If there are many origins with the same average gap, it returns the one that is alphabetically last. If you need to return a string with all origins that have the same average gap, you'll have to adjust the code by using CONCATENATEX somewhere and changing the logic a bit. You could as well create a simple measure that would return the number of origins with the same highest average gap. Same is true for the next measure that deals with the Lowest Origin.

Lowest Avg Gap = 
MINX(
    DISTINCT( 'Marketing Qualified Leads'[origin] ),
    CALCULATE(
        AVERAGEX(
            'Closed Deals',
            'Closed Deals'[won_date]
                - RELATED( 'Marketing Qualified Leads'[first_contact_date] )
        )
    )
)

Lowest Origin = 
MAXX(
    TOPN(1,
        SUMMARIZE(
            'Closed Deals',
            'Marketing Qualified Leads'[origin]
        ),
        [Highest Avg Gap],
        ASC
    ),
    'Marketing Qualified Leads'[origin]
)

Bear in mind that when only one origin is visible in the current context, [Highest Avg Gap] and [Lowest Avg Gap] will always be the same, hence I could use both measures in the definition of [Lowest Origin] (I used [Highest Avg Gap]). If [Higest Avg Gap] (and [Lowest Avg Gap] out of necessity) return BLANK, it means that in the current context there are no entries in the fact table 'Closed Deals'.

View solution in original post

daxer-almighty · ‎11-14-2020

Go to www.sqlbi.com and learn from there (they have the best courses on DAX and data modeling in the world and all I know about DAX I know from them). If you want a DAX bible, read "The Definitive Guide to DAX." If you want videos, go to YT and learn from the Guy in The Cube, or Curbal or EnterpiseDNA. If you want to learn about modeling, go to the Microsoft documentation on Power BI.

daxer-almighty · ‎11-13-2020

Hi @mateusluzzi

I've written 4 measures that give you what you want. I mean I have checked them and they behave reasonably. The tables are connected via 1-1 and the table 'Closed Deals' must be hidden as this is a fact table and it should not be selected from. The other will be visible as it's a dimension. Only on this assumption you're guaranteed correct results.

Highest Avg Gap = 
MAXX(
    DISTINCT( 'Marketing Qualified Leads'[origin] ),
    CALCULATE(
        AVERAGEX(
            'Closed Deals',
            'Closed Deals'[won_date]
                - RELATED( 'Marketing Qualified Leads'[first_contact_date] )
        )
    )
)

Highest Avg Gap gives you the biggest of the average gaps across all origins visible in the current context.

Highest Origin = 
MAXX(
    TOPN(1,
        // Need only those origins that have
        // at least one entry in Closed Deals.
        SUMMARIZE(
            'Closed Deals',
            'Marketing Qualified Leads'[origin]
        ),
        [Highest Avg Gap],
        DESC
    ),
    'Marketing Qualified Leads'[origin]
)

Highest Origin returns the origin with the highest average gap. If there are many origins with the same average gap, it returns the one that is alphabetically last. If you need to return a string with all origins that have the same average gap, you'll have to adjust the code by using CONCATENATEX somewhere and changing the logic a bit. You could as well create a simple measure that would return the number of origins with the same highest average gap. Same is true for the next measure that deals with the Lowest Origin.

Lowest Avg Gap = 
MINX(
    DISTINCT( 'Marketing Qualified Leads'[origin] ),
    CALCULATE(
        AVERAGEX(
            'Closed Deals',
            'Closed Deals'[won_date]
                - RELATED( 'Marketing Qualified Leads'[first_contact_date] )
        )
    )
)

Lowest Origin = 
MAXX(
    TOPN(1,
        SUMMARIZE(
            'Closed Deals',
            'Marketing Qualified Leads'[origin]
        ),
        [Highest Avg Gap],
        ASC
    ),
    'Marketing Qualified Leads'[origin]
)

Bear in mind that when only one origin is visible in the current context, [Highest Avg Gap] and [Lowest Avg Gap] will always be the same, hence I could use both measures in the definition of [Lowest Origin] (I used [Highest Avg Gap]). If [Higest Avg Gap] (and [Lowest Avg Gap] out of necessity) return BLANK, it means that in the current context there are no entries in the fact table 'Closed Deals'.

mateusluzzi · ‎11-13-2020

Hello, friend @daxer-almighty

Thank you very much for your assistance. It worked!

Why? I couldn't understand, I'll just use the code you provide me to this application, but certainly something Ive learned. I need to go deeper in OLAP study. For sure I underestimated the necessary knowledge to work with Power BI, as it is an OLAP tool. If somebody could provide some good study source about data modelling and OLAP I would appreciate.

Thank you all @vivran22, @wdx223_Daniel and @daxer-almighty

mateusluzzi · ‎11-12-2020

Hello, @vivran22!

I tried to import but for some reason I cannot upload PBIX or CSV files. If you desire, these datasets are available on Kaggle, but it may be a lot of data, IDK if it is worth. I'll try to explain better.

https://www.kaggle.com/olistbr/brazilian-ecommerce#olist_products_dataset.csv

https://www.kaggle.com/olistbr/marketing-funnel-olist/home

I have 2 tables:

Marketing Qualified Leads
- mql_id: basically an ID that links both table
- origin: where did the lead come from (organic search, paid search, direct traffic...)
- first contact date: the first time the lead contacted the company
Closed Deals
- mql_id: the link between tables
- won date: the date that the deal was closed

MARKETING QUALIFIED LEADS

mql_id	first_contact_date	origin	....
aof123j1oa0j323j	13/05/2018	paid_search	....
faspf1k12123fsa	21/05/2018	paid_search	....
dfas0fajsfkaak13	29/05/2018	organic_search	....
daj102jdaafdjakf	04/06/2018	email	....
....	....	....	....

CLOSED DEALS

mql_id	won_date	....
faspf1k12123fsa	29/05/2018	....
daj102jdaafdjakf	04/07/2018	....
....	....	....

There are more rows, but irrelevant to this question.

These tables relationship are 1:1, for each mql_id there is a unique mql_id in the other table, but there may be entries at MARKETING QUALIFIED LEADS that has no corresponding at CLOSED DEALS

There are 8.000 distinct mql_id values on MARKETING QUALIFIED LEADS and 842 corresponding mql_id values on CLOSED DEALS

Basically I need to calculate the difference between those dates (first contact and won date, this means how long it took to close the deal), then group this for each origin and take its average, for each origin. Then, I need to take the MAX/MIN average value, to know how many days it took (in average) for each category, and the fastest and slowest category.

I managed to make this works, but I dont think it was the best practice. I've created a few other tables linking and gathering the info I needed. I managed somehow (dont remember) to create a new table with 2 rows, origin and average time for each category, so that I could simple do a "MAX/MIN" from that table and not from a formula. Is it a good way to do this? When I was learning about DB/SQL one of the "principles" was not to create more rows and tables, always using smart select with join... Is this true to Business Intelligence also?

~~Sure I can! Here:~~

~~I excluded some tables that do not interfer because it can be too large. There are only 2 tables left on this file:~~

~~- "Olist Closed Deals": shows us when the deal was closed~~

~~- "Olist Marketing Qualified Leads": shows us when the first contact was made and the origin of the contact (paid search, direct traffic...)~~

~~I added 3 extra rows in "Olist Closed Deals":~~

~~First contact (from the other table)~~
~~How long to close the deal (as 'tempo_venda'), this is a simple DATADIFF~~
~~The origin from the other table (as: 'origem_lead')~~

~~They are linked by "mql_id" column.~~

I managed to make this work, but I dont think it was the best practice. I've created a few other tables linking and gathering the info I needed, so that I could simple do a "MAX/MIN" from a table and not from a formula. Is it a good way to do this? When I was learning about DB/SQL one of the "principles" was not to create another rows, and always use a smart select with join... Is this true to Business Intelligence also?

Thank you very much!

wdx223_Daniel · ‎11-12-2020

@mateusluzzi if only two table in the module, this code might work. otherwise you need double check the filter context to get correct result.

MaxAvgCat:=CONCATENATEX(TOPN(1,ALL(MQ[origin]),MAXX(ALL(MQ[mql_id]),MAX(0,CALCULATE(MAX(CD[won_date])-MAX(MQ[first_contact_date]))))),MQ[origin],",")

mateusluzzi · ‎11-13-2020

Hello @wdx223_Daniel

Thank you for your effort helping me!

But it didnt work...

MaxAvgCat:=CONCATENATEX(TOPN(1,ALL(olist_marketing_qualified_leads_dataset[origin]),MAXX(ALL(olist_marketing_qualified_leads_dataset[mql_id]),MAX(0,CALCULATE(MAX(olist_closed_deals_dataset[won_date])-MAX(olist_marketing_qualified_leads_dataset[first_contact_date]))))),olist_marketing_qualified_leads_dataset[origin],",")

"A sintaxe de 'ALL' está incorreta. (DAX(CONCATENATE....)

It comes up with an error that says the 'ALL' syntax isnt correct

I'm very new to Power BI / DAX, so that I couldn't understand why it is incorrect. You said about filter context, what do you mean?

Thank you!