Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
markpendlebury
Frequent Visitor

Remove Rows in Dataset based on most recent ID and email address

Hi everyone hope someone can help me.....I'm pretty new to PowerBI but loving how flexible it is. However, as anything new I've come up against something I think should be pretty simple.

 

I have a dataset that I'm brining into Power Bi (it's an SQL table)

 

The table has an incremental ID and an email address amongst many others. For example....

 

ID, Email

1, 123@xyz.com

2, 345@abc.com

3, 123@xyz.com

4, abc@123.com

5, dce@456.com

6, abc@123.com

 

I need to remove all the rows from the dataset that aren't the latest ones for each email.

 

The resulting table would then ONLY include ;

 

ID, Email

2, 345@abc.com

3, 123@xyz.com

5, dce@456.com

6, abc@123.com

 

I've tried many ways to do this and using the GROUP function and MAX as a condition which is returning the correct ID but I need to actually remove them completely.

 

I'm returning all rows below so that I can cherry pick which columns I want once I've (hopefully) removed the rows I don't want.

 

latestguaranteeid.PNG

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

I need to remove the rows as I need the email address to be unique but ensure it's the most recent record so that I can join it to another table. 

 

Thanks in advance, Mark.

 

 

1 ACCEPTED SOLUTION
Phil_Seamark
Employee
Employee

Hi @markpendlebury

 

I think this calculated table might be getting close.  Just replace my Table2 with the name of your table

 

Table 3 = SELECTCOLUMNS(
            FILTER(
                CROSSJOIN(
                    SELECTCOLUMNS( 
                        SUMMARIZECOLUMNS(
                            'Table2'[Email],
                            "MAX ID",
                            MAX('Table2'[ID])
                            ),
                            "MAX Email",[Email],
                            "MAX ID",[MAX ID]
                          ) ,
            'Table2'
            ),
            'Table2'[Email]=[MAX Email] && 
            Table2[ID] = [MAX ID]
            ),
            "ID",[ID],
            "Email" , [MAX Email]
            )

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

View solution in original post

5 REPLIES 5
v-chuncz-msft
Community Support
Community Support

@markpendlebury,

 

You could modify the code in Advanced Editor.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTI0MnaoqKzSS87PVYrViVYyAooZm5g6JCYlw8WMsagzAYoB1TgAxeFipkCxlORUBxNTM7iYGbq6WAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, Email = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Email", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Email"}, {{"Data", each _, type table}}),
    Transformed = Table.TransformColumns(#"Grouped Rows", {{"Data", each Table.AddIndexColumn(Table.Sort(_, {{"ID", Order.Descending}}), "Rank", 1, 1)}}),
    #"Expanded IDs" = Table.ExpandTableColumn(Transformed, "Data", {"ID", "Rank"}, {"ID", "Rank"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded IDs", each ([Rank] = 1)),
    #"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"ID", Order.Ascending}})
in
    #"Sorted Rows"
Community Support Team _ Sam Zha
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

@v-chuncz-msft Thanks for the answer. In the end I didn't attempt this option as I wasn't sure how or which elements to incorporate and I got the other solution working.

 

I do potentially like this option as it would solve the problem without creating a new table.

Phil_Seamark
Employee
Employee

Hi @markpendlebury

 

I think this calculated table might be getting close.  Just replace my Table2 with the name of your table

 

Table 3 = SELECTCOLUMNS(
            FILTER(
                CROSSJOIN(
                    SELECTCOLUMNS( 
                        SUMMARIZECOLUMNS(
                            'Table2'[Email],
                            "MAX ID",
                            MAX('Table2'[ID])
                            ),
                            "MAX Email",[Email],
                            "MAX ID",[MAX ID]
                          ) ,
            'Table2'
            ),
            'Table2'[Email]=[MAX Email] && 
            Table2[ID] = [MAX ID]
            ),
            "ID",[ID],
            "Email" , [MAX Email]
            )

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

@Phil_Seamark Thanks for this answer. I thought I might need to create this as a Table and this appears to work perfectly.

I did need to add my additional fields but that was pretty simple. Thanks

 

My original source table was 200,000 rows so I've had to restrict the incoming dataset (which is ok) as the refresh was timing out the Power BI Gateway but I think I've got that working now.

 

Thanks

Or you could go the calculated column way and have a column that shows a 1 or 0 depending if the row is the latest.

 

Just add the following calcualted column and then you can filter on it in the Report, Page or Visual filters.

 

Is Latest = IF( CALCULATE(
                COUNTROWS('Table2'),
                FILTER(
                    ALL(Table2),
                    'Table2'[ID] > EARLIER('Table2'[ID]) 
                    && 'Table2'[Email] = EARLIER('Table2'[Email])
                    )
                    )+0=0,1,0) 

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.