Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Calcule similarity between two item

Hi everybody !

 

Here is my issue :

I have this set of data :

 

SAMU             FEATURE        Y/N

SAMU 1          A                     Yes

SAMU 1          B                     Yes

SAMU 1          C                     No

SAMU 2          A                     Yes

SAMU 2          B                     No

SAMU 2          C                     No

 

I would like to calculate the "coefficient of similarity" between SAMU 1 and SAMU 2.

I think it's a measure, but i don't know how to say "Between Samu 1 and samu 2, For each feature, calculate if the features have the same value and calculate the percentage of similarity"

For the exemple, the correct percentage shoud be : 66%

 

Of course, my real data set have smg like 100 SAMU, and 30+ features. But the Y/N column will always be yes or no.

 

On my report, i ask to chose two SAMU, and i would like to screen the percentage.

 

Any idea ?

 

Thank you so muck for your help and have good day !!!!

1 ACCEPTED SOLUTION

Hi @Anonymous 

Sorry for the late reply.

Add a measure

 

Measure =
VAR left =
    CONCATENATEX (
        VALUES ( 'left slicer' ),
        'left slicer'[SAMU],
        ","
    )
VAR right =
    CONCATENATEX (
        VALUES ( 'right slicer' ),
        'right slicer'[SAMU],
        ","
    )
RETURN
    IF (
        FIND (
            MAX ( [SAMU] ),
            left,
            1,
            0
        )
            || FIND (
                MAX ( [SAMU] ),
                right,
                1,
                0
            ),
        1,
        0
    )

 

then modify a measure as below

 

percent =
VAR P =
    CALCULATE (
        DISTINCTCOUNT ( 'Table'[FEATURE] ),
        FILTER (
            'Table',
            [similarity1] = 1
                && [Measure] = 1
        )
    )
        / CALCULATE (
            DISTINCTCOUNT ( 'Table'[FEATURE] ),
            FILTER (
                ALLSELECTED ( 'Table' ),
                [Measure] = 1
            )
        )
RETURN
    IF (
        ISBLANK ( P ),
        0,
        P
    )

 

15.JPG

Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

11 REPLIES 11
Anonymous
Not applicable

Hello Maggie 

 

This is exactrly what i needed !!!

Thank you so much for your amazing work !

 

love this community !

 

Accepted as solution ofc !

v-juanli-msft
Community Support
Community Support

Hi @Anonymous 

Could you clear me that?

the correct percent 66% is for all features or for each feature?

We look into the column "Y/N" for the similarity based on different "SAUM" category for each feature, right?

To test it on my side, i recongize "yes" as 1, "no" as 0, then calcualte the similarity, is my job going on the right way?

 
Best Regards
Maggie

 

Anonymous
Not applicable

Hey v_juan_li !

 

Thx for your help !

 

So the report will be smg like that :

SAMU.png

The user chose one SAMU (SAMU is an emergency structure in french) on the left dropdown and an other on the right one.

The goal is to identify the percentage of similarity between the 2 SAMU.

In the set of data i gave here, of course we only have the choice to compare SAMU 1 with SAMU 2.

 

How the calcul of similarty is :

SAMU 1 FEATURE A = YES and SAMU 2 FEATURE A = YES

then FEATURE A is similar (+1)

 

SAMU 1 FEATURE B = YES and SAMU 2 FEATURE B = NO

then FEATURE B is not similar (+0)

 

SAMU 1 FEATURE C = NOand SAMU 2 FEATURE C = NO

then FEATURE B is similar (+1)

 

Similarity : 

Similarity (2) divided by the total of features (3) = 2/3 = 66%

 

So we look into the column "Y/N" for the similarity based on different "SAMU" category for all feature.

 

Thank you

 

 

Hi @Anonymous 

Create two tables

left slicer = VALUES('Table'[SAMU])

right slicer = VALUES('Table'[SAMU])

add columns to slicers.

 

Create two measures in table

Capture15.JPG

 

similarity1 =
VAR left =
    CONCATENATEX (
        VALUES ( 'left slicer' ),
        'left slicer'[SAMU],
        ","
    )
VAR right =
    CONCATENATEX (
        VALUES ( 'right slicer' ),
        'right slicer'[SAMU],
        ","
    )
VAR left_yn =
    CALCULATE (
        MAX ( 'Table'[Y/N] ),
        FILTER (
            ALLSELECTED ( 'Table' ),
            'Table'[FEATURE]
                = MAX ( 'Table'[FEATURE] )
                && FIND (
                    [SAMU],
                    left,
                    1,
                    0
                ) > 0
        )
    )
VAR right_yn =
    CALCULATE (
        MAX ( 'Table'[Y/N] ),
        FILTER (
            ALLSELECTED ( 'Table' ),
            'Table'[FEATURE]
                = MAX ( 'Table'[FEATURE] )
                && FIND (
                    [SAMU],
                    right,
                    1,
                    0
                ) > 0
        )
    )
RETURN
    IF (
        left_yn = right_yn,
        1,
        0
    )
percent =
CALCULATE (
    DISTINCTCOUNT ( 'Table'[FEATURE] ),
    FILTER (
        'Table',
        [similarity1] <> 0
    )
)
    / CALCULATE (
        DISTINCTCOUNT ( 'Table'[FEATURE] ),
        ALLSELECTED ( 'Table' )
    )

 

Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Anonymous
Not applicable

Hi Maggi !

 

Thanks for you amazing job !!!!!!

Itis realy near to be functonial !!

 

Indeed, it works with a dataset with only 2 SAMU, but when i do it with my real dataset (which contains 100 SAMU), the percentage screened is wrong.

I think it's because he counts in the total of similarity with SAMU which are not chosen in the slicers.

Let's have a look :

PBI data.png

I guess he counts in the total of similarity the number for all SAMU, even if they are not chosen in the slicers.

That's why the percentage screened is always > 90%.

 

I dont have the skills to correct the measure you wrote, this is too much complicated for me ...

I don't know how to attach a document (the .pix) with my post to give you the full dataset ...

 

EDIT : I think the answer shoud be to modify only the percentage measure to count in the first total only the SAMU chosen in the slicer .. but don't know how to do that !

Hi @Anonymous 

Add some data to your original data example(you could paste here), and share me the expected result for the example.

 

Best Regards

Maggie

Anonymous
Not applicable

Hey Maggie,

 

i hope you're good today.

 

So here is a better view of my dataset :

PBI forum.png

 

And here is the number i want, dpeending on what SAMU i chose in the left and right slicers.

 

PBI forum 2.png

 

thanks !

Hi @Anonymous 

Sorry for the late reply.

Add a measure

 

Measure =
VAR left =
    CONCATENATEX (
        VALUES ( 'left slicer' ),
        'left slicer'[SAMU],
        ","
    )
VAR right =
    CONCATENATEX (
        VALUES ( 'right slicer' ),
        'right slicer'[SAMU],
        ","
    )
RETURN
    IF (
        FIND (
            MAX ( [SAMU] ),
            left,
            1,
            0
        )
            || FIND (
                MAX ( [SAMU] ),
                right,
                1,
                0
            ),
        1,
        0
    )

 

then modify a measure as below

 

percent =
VAR P =
    CALCULATE (
        DISTINCTCOUNT ( 'Table'[FEATURE] ),
        FILTER (
            'Table',
            [similarity1] = 1
                && [Measure] = 1
        )
    )
        / CALCULATE (
            DISTINCTCOUNT ( 'Table'[FEATURE] ),
            FILTER (
                ALLSELECTED ( 'Table' ),
                [Measure] = 1
            )
        )
RETURN
    IF (
        ISBLANK ( P ),
        0,
        P
    )

 

15.JPG

Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Anonymous
Not applicable

Hi everybody ?

 

Does someone can help me ? 😞

 

Thank you

amitchandak
Super User
Super User
Anonymous
Not applicable

Thanks for your answer

 

This look complicated .. and i dont really understand how to apply it !

 

To start, i try to find the number of rows similar when filters are applied :

XXX =
CALCULATE(
COUNTROWS(TEST);
FILTER(TEST_2;
TEST[SAMU]=TEST_2[SAMU]
&& TEST[ITEM] = TEST_2[ITEM]
&& TEST[CRITERE] = TEST_2[CRITERE]
)
)
 
but it does not really work ..
 
Any idea, ?
 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.