cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
Frequent Visitor

Remove Rows in Dataset based on most recent ID and email address

Hi everyone hope someone can help me.....I'm pretty new to PowerBI but loving how flexible it is. However, as anything new I've come up against something I think should be pretty simple.

 

I have a dataset that I'm brining into Power Bi (it's an SQL table)

 

The table has an incremental ID and an email address amongst many others. For example....

 

ID, Email

1, 123@xyz.com

2, 345@abc.com

3, 123@xyz.com

4, abc@123.com

5, dce@456.com

6, abc@123.com

 

I need to remove all the rows from the dataset that aren't the latest ones for each email.

 

The resulting table would then ONLY include ;

 

ID, Email

2, 345@abc.com

3, 123@xyz.com

5, dce@456.com

6, abc@123.com

 

I've tried many ways to do this and using the GROUP function and MAX as a condition which is returning the correct ID but I need to actually remove them completely.

 

I'm returning all rows below so that I can cherry pick which columns I want once I've (hopefully) removed the rows I don't want.

 

latestguaranteeid.PNG

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

I need to remove the rows as I need the email address to be unique but ensure it's the most recent record so that I can join it to another table. 

 

Thanks in advance, Mark.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
Microsoft
Microsoft

Re: Remove Rows in Dataset based on most recent ID and email address

Hi @markpendlebury

 

I think this calculated table might be getting close.  Just replace my Table2 with the name of your table

 

Table 3 = SELECTCOLUMNS(
            FILTER(
                CROSSJOIN(
                    SELECTCOLUMNS( 
                        SUMMARIZECOLUMNS(
                            'Table2'[Email],
                            "MAX ID",
                            MAX('Table2'[ID])
                            ),
                            "MAX Email",[Email],
                            "MAX ID",[MAX ID]
                          ) ,
            'Table2'
            ),
            'Table2'[Email]=[MAX Email] && 
            Table2[ID] = [MAX ID]
            ),
            "ID",[ID],
            "Email" , [MAX Email]
            )

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

View solution in original post

5 REPLIES 5
Highlighted
Microsoft
Microsoft

Re: Remove Rows in Dataset based on most recent ID and email address

Hi @markpendlebury

 

I think this calculated table might be getting close.  Just replace my Table2 with the name of your table

 

Table 3 = SELECTCOLUMNS(
            FILTER(
                CROSSJOIN(
                    SELECTCOLUMNS( 
                        SUMMARIZECOLUMNS(
                            'Table2'[Email],
                            "MAX ID",
                            MAX('Table2'[ID])
                            ),
                            "MAX Email",[Email],
                            "MAX ID",[MAX ID]
                          ) ,
            'Table2'
            ),
            'Table2'[Email]=[MAX Email] && 
            Table2[ID] = [MAX ID]
            ),
            "ID",[ID],
            "Email" , [MAX Email]
            )

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

View solution in original post

Highlighted
Microsoft
Microsoft

Re: Remove Rows in Dataset based on most recent ID and email address

Or you could go the calculated column way and have a column that shows a 1 or 0 depending if the row is the latest.

 

Just add the following calcualted column and then you can filter on it in the Report, Page or Visual filters.

 

Is Latest = IF( CALCULATE(
                COUNTROWS('Table2'),
                FILTER(
                    ALL(Table2),
                    'Table2'[ID] > EARLIER('Table2'[ID]) 
                    && 'Table2'[Email] = EARLIER('Table2'[Email])
                    )
                    )+0=0,1,0) 

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Highlighted
Community Support
Community Support

Re: Remove Rows in Dataset based on most recent ID and email address

@markpendlebury,

 

You could modify the code in Advanced Editor.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTI0MnaoqKzSS87PVYrViVYyAooZm5g6JCYlw8WMsagzAYoB1TgAxeFipkCxlORUBxNTM7iYGbq6WAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, Email = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Email", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Email"}, {{"Data", each _, type table}}),
    Transformed = Table.TransformColumns(#"Grouped Rows", {{"Data", each Table.AddIndexColumn(Table.Sort(_, {{"ID", Order.Descending}}), "Rank", 1, 1)}}),
    #"Expanded IDs" = Table.ExpandTableColumn(Transformed, "Data", {"ID", "Rank"}, {"ID", "Rank"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded IDs", each ([Rank] = 1)),
    #"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"ID", Order.Ascending}})
in
    #"Sorted Rows"
Community Support Team _ Sam Zha
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Highlighted
Frequent Visitor

Re: Remove Rows in Dataset based on most recent ID and email address

@Phil_Seamark Thanks for this answer. I thought I might need to create this as a Table and this appears to work perfectly.

I did need to add my additional fields but that was pretty simple. Thanks

 

My original source table was 200,000 rows so I've had to restrict the incoming dataset (which is ok) as the refresh was timing out the Power BI Gateway but I think I've got that working now.

 

Thanks

Highlighted
Frequent Visitor

Re: Remove Rows in Dataset based on most recent ID and email address

@v-chuncz-msft Thanks for the answer. In the end I didn't attempt this option as I wasn't sure how or which elements to incorporate and I got the other solution working.

 

I do potentially like this option as it would solve the problem without creating a new table.

Helpful resources

Announcements
Community Conference

Power Platform Community Conference

Check out the on demand sessions that are available now!

Community Conference

Microsoft Power Platform Communities

Check out the Winners!

secondImage

Power Platform 2020 release wave 2 plan

Features releasing from October 2020 through March 2021

Tech Marathon

Maratón de Soluciones de Negocio Microsoft

Una semana de contenido con +100 sesiones educativas, consultorios, +10 workshops Premium, Hackaton, EXPO, Networking Hall y mucho más!

Top Solution Authors
Top Kudoed Authors
Users online (857)