Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
bossamus
Frequent Visitor

Conditionally removing duplicates and then finding average

Hi everyone!

 

I have a dataset that looks like this;

 

TeamCertification LevelNumber of members certified
SpidersNo Certification0
Spiders13
Spiders27
LionsNo Certification0
TigersNo Certification0
Tigers12

 

So

  • Spiders has 3 people who are Level 1 Certified and 7 people who are Level 2 Certified and no one uncertified.
  • Lions have no one who is certified
  • Tigers has 2 people who are Level 1 Certified and no one uncertified.

I want to remove the 'No Certified' rows for Spiders and Tigers because having those would mess up the accuracy of future metrics I want to calculate. E.g. I want to caluclate the number of teams that are uncertified vs certified and I also want to calculate the average number of Level 1 and Level 2 certifications amongst all the teams.

 

Thanks!

 

1 ACCEPTED SOLUTION
MFelix
Super User
Super User

Hi @bossamus 

 

Depending on the calcultion you want to do you may use a syntax similar to this one:

Number of teams by certification = 
VAR Temp_table =
    SUMMARIZE (
        Certification,
        Certification[Team],
        Certification[Certification Level],
        "N_Member", SUM ( Certification[Number of members certified] ),
        "Rows_Total",
            COUNTROWS (
                FILTER (
                    ALL ( Certification ),
                    Certification[Team] = SELECTEDVALUE ( certification[team] )
                )
            )
    )
RETURN
    COUNTROWS (
        FILTER (
            Temp_table,
             ( [Rows_Total] = 1 )
                || ( [N_Member] <> 0
                && [Rows_Total] > 1 )
        )
    )

 

The trick here is the filter part of the syntax,

 

MFelix_0-1611578273305.png

If you had the following syntax you get the teams names:

Teams names = 
VAR Temp_table =
    SUMMARIZE (
        Certification,
        Certification[Team],
        Certification[Certification Level],
        "N_Member", SUM ( Certification[Number of members certified] ),
        "Rows_Total",
            COUNTROWS (
                FILTER (
                    ALL ( Certification ),
                    Certification[Team] = SELECTEDVALUE ( certification[team] )
                )
            )
    )
RETURN
    CONCATENATEX(
        FILTER (
            Temp_table,
             ( [Rows_Total] = 1 )
                || ( [N_Member] <> 0
                && [Rows_Total] > 1 )
        )
    , Certification[Team], ",")

 

MFelix_1-1611578508119.png

I assume that you don't want to filter out the information from your main table.

 

If you want to do the calculation based on cleaning the data on the query editor you can:

  • Group Columns by team
  • Merge the groups with the certification in order to get total rows
  • Add a custom column if the following syntax:
if [Grouped Rows.Count] = 1  then 1 else if [Number of members certified]  >  0 then 1 else 0
  • filter out the 0 from the table

Full code below and in attach PBIX file:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi7ITEktKlbSUfLLV3BOLSrJTMtMTizJzM8DChkoxeogKzEEYmM0MSMgNgeL+QA14TUoJDOdgFVwFYZgk2NjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Team = _t, #"Certification Level" = _t, #"Number of members certified" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Team", type text}, {"Certification Level", type text}, {"Number of members certified", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Team"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Team"}, #"Grouped Rows", {"Team"}, "Grouped Rows", JoinKind.LeftOuter),
    #"Expanded Grouped Rows" = Table.ExpandTableColumn(#"Merged Queries", "Grouped Rows", {"Count"}, {"Grouped Rows.Count"}),
    #"Added Custom" = Table.AddColumn(#"Expanded Grouped Rows", "Custom", each if [Grouped Rows.Count] = 1  then 1 else if [Number of members certified]  >  0 then 1 else 0),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Custom] = 1))
in
    #"Filtered Rows"

This way you don't need to use any dax.

 

 


Regards

Miguel Félix


Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog: Power BI em Português



View solution in original post

3 REPLIES 3
v-cazheng-msft
Community Support
Community Support

Hi, @bossamus 

 

Is your problem solved? If MFelix's post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

Best Regards,

Caiyun Zheng

MFelix
Super User
Super User

Hi @bossamus 

 

Depending on the calcultion you want to do you may use a syntax similar to this one:

Number of teams by certification = 
VAR Temp_table =
    SUMMARIZE (
        Certification,
        Certification[Team],
        Certification[Certification Level],
        "N_Member", SUM ( Certification[Number of members certified] ),
        "Rows_Total",
            COUNTROWS (
                FILTER (
                    ALL ( Certification ),
                    Certification[Team] = SELECTEDVALUE ( certification[team] )
                )
            )
    )
RETURN
    COUNTROWS (
        FILTER (
            Temp_table,
             ( [Rows_Total] = 1 )
                || ( [N_Member] <> 0
                && [Rows_Total] > 1 )
        )
    )

 

The trick here is the filter part of the syntax,

 

MFelix_0-1611578273305.png

If you had the following syntax you get the teams names:

Teams names = 
VAR Temp_table =
    SUMMARIZE (
        Certification,
        Certification[Team],
        Certification[Certification Level],
        "N_Member", SUM ( Certification[Number of members certified] ),
        "Rows_Total",
            COUNTROWS (
                FILTER (
                    ALL ( Certification ),
                    Certification[Team] = SELECTEDVALUE ( certification[team] )
                )
            )
    )
RETURN
    CONCATENATEX(
        FILTER (
            Temp_table,
             ( [Rows_Total] = 1 )
                || ( [N_Member] <> 0
                && [Rows_Total] > 1 )
        )
    , Certification[Team], ",")

 

MFelix_1-1611578508119.png

I assume that you don't want to filter out the information from your main table.

 

If you want to do the calculation based on cleaning the data on the query editor you can:

  • Group Columns by team
  • Merge the groups with the certification in order to get total rows
  • Add a custom column if the following syntax:
if [Grouped Rows.Count] = 1  then 1 else if [Number of members certified]  >  0 then 1 else 0
  • filter out the 0 from the table

Full code below and in attach PBIX file:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi7ITEktKlbSUfLLV3BOLSrJTMtMTizJzM8DChkoxeogKzEEYmM0MSMgNgeL+QA14TUoJDOdgFVwFYZgk2NjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Team = _t, #"Certification Level" = _t, #"Number of members certified" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Team", type text}, {"Certification Level", type text}, {"Number of members certified", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Team"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Team"}, #"Grouped Rows", {"Team"}, "Grouped Rows", JoinKind.LeftOuter),
    #"Expanded Grouped Rows" = Table.ExpandTableColumn(#"Merged Queries", "Grouped Rows", {"Count"}, {"Grouped Rows.Count"}),
    #"Added Custom" = Table.AddColumn(#"Expanded Grouped Rows", "Custom", each if [Grouped Rows.Count] = 1  then 1 else if [Number of members certified]  >  0 then 1 else 0),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Custom] = 1))
in
    #"Filtered Rows"

This way you don't need to use any dax.

 

 


Regards

Miguel Félix


Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog: Power BI em Português



Hi @MFelix 

 

Thanks for providing that! It was a great idea to group by, count number of rows and then filter out rows if NumberofRows > 1 and CertificationLevel = 0. It was a very simple and elegant solution!

 

Have a great day!

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.