Solved: Calcule similarity between two item

Anonymous · ‎03-03-2020

Hi everybody !

Here is my issue :

I have this set of data :

SAMU FEATURE Y/N

SAMU 1 A Yes

SAMU 1 B Yes

SAMU 1 C No

SAMU 2 A Yes

SAMU 2 B No

SAMU 2 C No

I would like to calculate the "coefficient of similarity" between SAMU 1 and SAMU 2.

I think it's a measure, but i don't know how to say "Between Samu 1 and samu 2, For each feature, calculate if the features have the same value and calculate the percentage of similarity"

For the exemple, the correct percentage shoud be : 66%

Of course, my real data set have smg like 100 SAMU, and 30+ features. But the Y/N column will always be yes or no.

On my report, i ask to chose two SAMU, and i would like to screen the percentage.

Any idea ?

Thank you so muck for your help and have good day !!!!

v-juanli-msft · ‎03-17-2020

Hi @Anonymous

Sorry for the late reply.

Add a measure

Measure =
VAR left =
    CONCATENATEX (
        VALUES ( 'left slicer' ),
        'left slicer'[SAMU],
        ","
    )
VAR right =
    CONCATENATEX (
        VALUES ( 'right slicer' ),
        'right slicer'[SAMU],
        ","
    )
RETURN
    IF (
        FIND (
            MAX ( [SAMU] ),
            left,
            1,
            0
        )
            || FIND (
                MAX ( [SAMU] ),
                right,
                1,
                0
            ),
        1,
        0
    )

then modify a measure as below

percent =
VAR P =
    CALCULATE (
        DISTINCTCOUNT ( 'Table'[FEATURE] ),
        FILTER (
            'Table',
            [similarity1] = 1
                && [Measure] = 1
        )
    )
        / CALCULATE (
            DISTINCTCOUNT ( 'Table'[FEATURE] ),
            FILTER (
                ALLSELECTED ( 'Table' ),
                [Measure] = 1
            )
        )
RETURN
    IF (
        ISBLANK ( P ),
        0,
        P
    )

Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

Anonymous · ‎03-25-2020

Hello Maggie

This is exactrly what i needed !!!

Thank you so much for your amazing work !

love this community !

Accepted as solution ofc !

v-juanli-msft · ‎03-03-2020

Hi @Anonymous

Could you clear me that?

the correct percent 66% is for all features or for each feature?

We look into the column "Y/N" for the similarity based on different "SAUM" category for each feature, right?

To test it on my side, i recongize "yes" as 1, "no" as 0, then calcualte the similarity, is my job going on the right way?

 
Best Regards
Maggie

Anonymous · ‎03-04-2020

Hey v_juan_li !

Thx for your help !

So the report will be smg like that :

The user chose one SAMU (SAMU is an emergency structure in french) on the left dropdown and an other on the right one.

The goal is to identify the percentage of similarity between the 2 SAMU.

In the set of data i gave here, of course we only have the choice to compare SAMU 1 with SAMU 2.

How the calcul of similarty is :

SAMU 1 FEATURE A = YES and SAMU 2 FEATURE A = YES

then FEATURE A is similar (+1)

SAMU 1 FEATURE B = YES and SAMU 2 FEATURE B = NO

then FEATURE B is not similar (+0)

SAMU 1 FEATURE C = NOand SAMU 2 FEATURE C = NO

then FEATURE B is similar (+1)

Similarity :

Similarity (2) divided by the total of features (3) = 2/3 = 66%

So we look into the column "Y/N" for the similarity based on different "SAMU" category for all feature.

Thank you

v-juanli-msft · ‎03-04-2020

Hi @Anonymous

Create two tables

left slicer = VALUES('Table'[SAMU])

right slicer = VALUES('Table'[SAMU])

add columns to slicers.

Create two measures in table

similarity1 =
VAR left =
    CONCATENATEX (
        VALUES ( 'left slicer' ),
        'left slicer'[SAMU],
        ","
    )
VAR right =
    CONCATENATEX (
        VALUES ( 'right slicer' ),
        'right slicer'[SAMU],
        ","
    )
VAR left_yn =
    CALCULATE (
        MAX ( 'Table'[Y/N] ),
        FILTER (
            ALLSELECTED ( 'Table' ),
            'Table'[FEATURE]
                = MAX ( 'Table'[FEATURE] )
                && FIND (
                    [SAMU],
                    left,
                    1,
                    0
                ) > 0
        )
    )
VAR right_yn =
    CALCULATE (
        MAX ( 'Table'[Y/N] ),
        FILTER (
            ALLSELECTED ( 'Table' ),
            'Table'[FEATURE]
                = MAX ( 'Table'[FEATURE] )
                && FIND (
                    [SAMU],
                    right,
                    1,
                    0
                ) > 0
        )
    )
RETURN
    IF (
        left_yn = right_yn,
        1,
        0
    )

percent =
CALCULATE (
    DISTINCTCOUNT ( 'Table'[FEATURE] ),
    FILTER (
        'Table',
        [similarity1] <> 0
    )
)
    / CALCULATE (
        DISTINCTCOUNT ( 'Table'[FEATURE] ),
        ALLSELECTED ( 'Table' )
    )

Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Anonymous · ‎03-05-2020

Hi Maggi !

Thanks for you amazing job !!!!!!

Itis realy near to be functonial !!

Indeed, it works with a dataset with only 2 SAMU, but when i do it with my real dataset (which contains 100 SAMU), the percentage screened is wrong.

I think it's because he counts in the total of similarity with SAMU which are not chosen in the slicers.

Let's have a look :

I guess he counts in the total of similarity the number for all SAMU, even if they are not chosen in the slicers.

That's why the percentage screened is always > 90%.

I dont have the skills to correct the measure you wrote, this is too much complicated for me ...

I don't know how to attach a document (the .pix) with my post to give you the full dataset ...

EDIT : I think the answer shoud be to modify only the percentage measure to count in the first total only the SAMU chosen in the slicer .. but don't know how to do that !

v-juanli-msft · ‎03-05-2020

Hi @Anonymous

Add some data to your original data example(you could paste here), and share me the expected result for the example.

Best Regards

Maggie

Anonymous · ‎03-06-2020

Hey Maggie,

i hope you're good today.

So here is a better view of my dataset :

And here is the number i want, dpeending on what SAMU i chose in the left and right slicers.

thanks !

v-juanli-msft · ‎03-17-2020

Hi @Anonymous

Sorry for the late reply.

Add a measure

Measure =
VAR left =
    CONCATENATEX (
        VALUES ( 'left slicer' ),
        'left slicer'[SAMU],
        ","
    )
VAR right =
    CONCATENATEX (
        VALUES ( 'right slicer' ),
        'right slicer'[SAMU],
        ","
    )
RETURN
    IF (
        FIND (
            MAX ( [SAMU] ),
            left,
            1,
            0
        )
            || FIND (
                MAX ( [SAMU] ),
                right,
                1,
                0
            ),
        1,
        0
    )

then modify a measure as below

percent =
VAR P =
    CALCULATE (
        DISTINCTCOUNT ( 'Table'[FEATURE] ),
        FILTER (
            'Table',
            [similarity1] = 1
                && [Measure] = 1
        )
    )
        / CALCULATE (
            DISTINCTCOUNT ( 'Table'[FEATURE] ),
            FILTER (
                ALLSELECTED ( 'Table' ),
                [Measure] = 1
            )
        )
RETURN
    IF (
        ISBLANK ( P ),
        0,
        P
    )

Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Anonymous · ‎03-11-2020

Hi everybody ?

Does someone can help me ? 😞

Thank you

amitchandak · ‎03-03-2020

Refer if this can help

https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/td-p/196274

!! Power BI 101 Interview questions !! !! Master Microsoft Fabric- 36 Videos !!
Microsoft Power BI Learning Resources, 2023 !!
Learn Power BI - Full Course with Dec-2022, with Window, Index, Offset, 100+ Topics !!
Did I answer your question? Mark my post as a solution! Appreciate your Kudos !! Proud to be a Super User! !!

Anonymous · ‎03-03-2020

Thanks for your answer

This look complicated .. and i dont really understand how to apply it !

To start, i try to find the number of rows similar when filters are applied :

XXX =

CALCULATE(

COUNTROWS(TEST);

FILTER(TEST_2;

TEST[SAMU]=TEST_2[SAMU]

&& TEST[ITEM] = TEST_2[ITEM]

&& TEST[CRITERE] = TEST_2[CRITERE]

)

but it does not really work ..

Any idea, ?

Calcule similarity between two item

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly