cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Frequent Visitor

Conditionally removing duplicates and then finding average

Hi everyone!

 

I have a dataset that looks like this;

 

TeamCertification LevelNumber of members certified
SpidersNo Certification0
Spiders13
Spiders27
LionsNo Certification0
TigersNo Certification0
Tigers12

 

So

  • Spiders has 3 people who are Level 1 Certified and 7 people who are Level 2 Certified and no one uncertified.
  • Lions have no one who is certified
  • Tigers has 2 people who are Level 1 Certified and no one uncertified.

I want to remove the 'No Certified' rows for Spiders and Tigers because having those would mess up the accuracy of future metrics I want to calculate. E.g. I want to caluclate the number of teams that are uncertified vs certified and I also want to calculate the average number of Level 1 and Level 2 certifications amongst all the teams.

 

Thanks!

 

1 ACCEPTED SOLUTION
Super User III
Super User III

Hi @bossamus 

 

Depending on the calcultion you want to do you may use a syntax similar to this one:

Number of teams by certification = 
VAR Temp_table =
    SUMMARIZE (
        Certification,
        Certification[Team],
        Certification[Certification Level],
        "N_Member", SUM ( Certification[Number of members certified] ),
        "Rows_Total",
            COUNTROWS (
                FILTER (
                    ALL ( Certification ),
                    Certification[Team] = SELECTEDVALUE ( certification[team] )
                )
            )
    )
RETURN
    COUNTROWS (
        FILTER (
            Temp_table,
             ( [Rows_Total] = 1 )
                || ( [N_Member] <> 0
                && [Rows_Total] > 1 )
        )
    )

 

The trick here is the filter part of the syntax,

 

MFelix_0-1611578273305.png

If you had the following syntax you get the teams names:

Teams names = 
VAR Temp_table =
    SUMMARIZE (
        Certification,
        Certification[Team],
        Certification[Certification Level],
        "N_Member", SUM ( Certification[Number of members certified] ),
        "Rows_Total",
            COUNTROWS (
                FILTER (
                    ALL ( Certification ),
                    Certification[Team] = SELECTEDVALUE ( certification[team] )
                )
            )
    )
RETURN
    CONCATENATEX(
        FILTER (
            Temp_table,
             ( [Rows_Total] = 1 )
                || ( [N_Member] <> 0
                && [Rows_Total] > 1 )
        )
    , Certification[Team], ",")

 

MFelix_1-1611578508119.png

I assume that you don't want to filter out the information from your main table.

 

If you want to do the calculation based on cleaning the data on the query editor you can:

  • Group Columns by team
  • Merge the groups with the certification in order to get total rows
  • Add a custom column if the following syntax:
if [Grouped Rows.Count] = 1  then 1 else if [Number of members certified]  >  0 then 1 else 0
  • filter out the 0 from the table

Full code below and in attach PBIX file:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi7ITEktKlbSUfLLV3BOLSrJTMtMTizJzM8DChkoxeogKzEEYmM0MSMgNgeL+QA14TUoJDOdgFVwFYZgk2NjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Team = _t, #"Certification Level" = _t, #"Number of members certified" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Team", type text}, {"Certification Level", type text}, {"Number of members certified", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Team"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Team"}, #"Grouped Rows", {"Team"}, "Grouped Rows", JoinKind.LeftOuter),
    #"Expanded Grouped Rows" = Table.ExpandTableColumn(#"Merged Queries", "Grouped Rows", {"Count"}, {"Grouped Rows.Count"}),
    #"Added Custom" = Table.AddColumn(#"Expanded Grouped Rows", "Custom", each if [Grouped Rows.Count] = 1  then 1 else if [Number of members certified]  >  0 then 1 else 0),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Custom] = 1))
in
    #"Filtered Rows"

This way you don't need to use any dax.

 

 


Regards

Miguel Félix


Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog:

Power BI em Português





View solution in original post

3 REPLIES 3
Microsoft
Microsoft

Hi, @bossamus 

 

Is your problem solved? If MFelix's post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

Best Regards,

Caiyun Zheng

Super User III
Super User III

Hi @bossamus 

 

Depending on the calcultion you want to do you may use a syntax similar to this one:

Number of teams by certification = 
VAR Temp_table =
    SUMMARIZE (
        Certification,
        Certification[Team],
        Certification[Certification Level],
        "N_Member", SUM ( Certification[Number of members certified] ),
        "Rows_Total",
            COUNTROWS (
                FILTER (
                    ALL ( Certification ),
                    Certification[Team] = SELECTEDVALUE ( certification[team] )
                )
            )
    )
RETURN
    COUNTROWS (
        FILTER (
            Temp_table,
             ( [Rows_Total] = 1 )
                || ( [N_Member] <> 0
                && [Rows_Total] > 1 )
        )
    )

 

The trick here is the filter part of the syntax,

 

MFelix_0-1611578273305.png

If you had the following syntax you get the teams names:

Teams names = 
VAR Temp_table =
    SUMMARIZE (
        Certification,
        Certification[Team],
        Certification[Certification Level],
        "N_Member", SUM ( Certification[Number of members certified] ),
        "Rows_Total",
            COUNTROWS (
                FILTER (
                    ALL ( Certification ),
                    Certification[Team] = SELECTEDVALUE ( certification[team] )
                )
            )
    )
RETURN
    CONCATENATEX(
        FILTER (
            Temp_table,
             ( [Rows_Total] = 1 )
                || ( [N_Member] <> 0
                && [Rows_Total] > 1 )
        )
    , Certification[Team], ",")

 

MFelix_1-1611578508119.png

I assume that you don't want to filter out the information from your main table.

 

If you want to do the calculation based on cleaning the data on the query editor you can:

  • Group Columns by team
  • Merge the groups with the certification in order to get total rows
  • Add a custom column if the following syntax:
if [Grouped Rows.Count] = 1  then 1 else if [Number of members certified]  >  0 then 1 else 0
  • filter out the 0 from the table

Full code below and in attach PBIX file:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCi7ITEktKlbSUfLLV3BOLSrJTMtMTizJzM8DChkoxeogKzEEYmM0MSMgNgeL+QA14TUoJDOdgFVwFYZgk2NjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Team = _t, #"Certification Level" = _t, #"Number of members certified" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Team", type text}, {"Certification Level", type text}, {"Number of members certified", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Team"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Team"}, #"Grouped Rows", {"Team"}, "Grouped Rows", JoinKind.LeftOuter),
    #"Expanded Grouped Rows" = Table.ExpandTableColumn(#"Merged Queries", "Grouped Rows", {"Count"}, {"Grouped Rows.Count"}),
    #"Added Custom" = Table.AddColumn(#"Expanded Grouped Rows", "Custom", each if [Grouped Rows.Count] = 1  then 1 else if [Number of members certified]  >  0 then 1 else 0),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Custom] = 1))
in
    #"Filtered Rows"

This way you don't need to use any dax.

 

 


Regards

Miguel Félix


Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog:

Power BI em Português





View solution in original post

Hi @MFelix 

 

Thanks for providing that! It was a great idea to group by, count number of rows and then filter out rows if NumberofRows > 1 and CertificationLevel = 0. It was a very simple and elegant solution!

 

Have a great day!

Helpful resources

Announcements
secondImage

Congratulations!

We are excited to announce the Power BI Super Users!

Wave Release 2

Check out the updates in Power BI.

Overview of Power BI 2020 release wave 2!

Microsoft Ignite

Microsoft Ignite

Join digitally, March 2–4, 2021 to explore new tech that's ready to implement. Experience the keynote in mixed reality through AltspaceVR!

secondImage

The largest Power BI virtual conference

100+ sessions, 100+ speakers, Product managers, MVPs, and experts. All about Power BI. Attend online or watch the recordings.

Top Solution Authors
Top Kudoed Authors