cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
MichaelHutchens
Helper II
Helper II

Help creating a table of unique strings from list of duplicates

Hi folks, I'm hoping someone can help. I have a large dataset of search keyword strings that customers have used to search a knowledgebase. Due to the way that the platform records the searches, it contains duplicates that I'd like to remove.

Each search is defined by a unique search session ("Session Id"), a "Created" datestamp, the search words themselves, and the User who performed the keyword search. I'd like to, in unique each Session Id, remove any search term strings that are duplicates (if two or more strings appear from the same user in the same Session Id, the earliest search string will be retained, based on the datestamp), or search term strings that are already contained in longer strings (e.g. if string 'a' is "hello world" and string 'b' is "hello", string 'b' will be removed).

I've had a convo with a helpful person in this thread who provided the following solution:

let
Source = SourceData,
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Created", type text}, {"Session Id", type text}, {"Search term", type text}, {"User", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
#"Added Index1" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Merged Queries" = Table.NestedJoin(#"Added Index1", {"Session Id", "Index"}, #"Added Index", {"Session Id", "Index"}, "Table1", JoinKind.LeftOuter),
#"Expanded Table1" = Table.ExpandTableColumn(#"Merged Queries", "Table1", {"Search term"}, {"Table1.Search term"}),
#"Sorted Rows" = Table.Sort(#"Expanded Table1",{{"Index", Order.Ascending}}),
#"Added Custom" = Table.AddColumn(#"Sorted Rows", "Custom", each Text.Contains(Text.Lower([Table1.Search term]),Text.Lower([Search term]))),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each [Custom] <> true),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Created", "Session Id", "Search term", "User"})
in
#"Removed Other Columns"

I've attempted to apply the above to some test data, the PIBX & data sources (Excel spreadsheets) of which are in the Onedrive share here.

I'm having some real trouble understanding what I need to do however, and can't make it work. If someone could take a look at provide some assistance, it would be a huge help for me. Thanks so much 🙂

1 ACCEPTED SOLUTION

Hi @MichaelHutchens ,

I created a sample pbix(see attachment), please check whether that is what you want.

yingyinr_0-1618372671306.png

Best Regards

Community Support Team _ Rena
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

3 REPLIES 3
yingyinr
Community Support
Community Support

Hi @MichaelHutchens ,

What's your expected result? Assume that you have the below table, whether only the rows which marked with red square is what you want?

yingyinr_0-1618308213735.png

Best Regards

Community Support Team _ Rena
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi @yingyinr, apologies for not being clearer with the results I want, and yes you are correct. You've marked with red the results that I would like to display.

For completeness I should perhaps clarify that I'm looking for unique results for each grouping of 'Session IdFILTER'. So for example if there were two different strings in the 'Session IdFILTER rows (one in the green box, one not in the green box) then the green circled result would also be displayed:

 

MichaelHutchens_0-1618342246901.png


Hope that makes sense 🙂

Hi @MichaelHutchens ,

I created a sample pbix(see attachment), please check whether that is what you want.

yingyinr_0-1618372671306.png

Best Regards

Community Support Team _ Rena
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

Helpful resources

Announcements
PBI User Groups

Welcome to the User Group Public Preview

Check out new user group experience and if you are a leader please create your group!

MBAS on Demand

Microsoft Business Applications Summit sessions

On-demand access to all the great content presented by the product teams and community members! #MSBizAppsSummit #CommunityRocks

MBAS Attendee Badge

Claim Your Badge & Digital Swag!

Check out how to claim yours today!

Top Solution Authors
Top Kudoed Authors