Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
MichaelHutchens
Helper IV
Helper IV

Help creating a table of unique strings from list of duplicates

Hi folks, I'm hoping someone can help. I have a large dataset of search keyword strings that customers have used to search a knowledgebase. Due to the way that the platform records the searches, it contains duplicates that I'd like to remove.

Each search is defined by a unique search session ("Session Id"), a "Created" datestamp, the search words themselves, and the User who performed the keyword search. I'd like to, in unique each Session Id, remove any search term strings that are duplicates (if two or more strings appear from the same user in the same Session Id, the earliest search string will be retained, based on the datestamp), or search term strings that are already contained in longer strings (e.g. if string 'a' is "hello world" and string 'b' is "hello", string 'b' will be removed).

I've had a convo with a helpful person in this thread who provided the following solution:

let
Source = SourceData,
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Created", type text}, {"Session Id", type text}, {"Search term", type text}, {"User", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
#"Added Index1" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
#"Merged Queries" = Table.NestedJoin(#"Added Index1", {"Session Id", "Index"}, #"Added Index", {"Session Id", "Index"}, "Table1", JoinKind.LeftOuter),
#"Expanded Table1" = Table.ExpandTableColumn(#"Merged Queries", "Table1", {"Search term"}, {"Table1.Search term"}),
#"Sorted Rows" = Table.Sort(#"Expanded Table1",{{"Index", Order.Ascending}}),
#"Added Custom" = Table.AddColumn(#"Sorted Rows", "Custom", each Text.Contains(Text.Lower([Table1.Search term]),Text.Lower([Search term]))),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each [Custom] <> true),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Created", "Session Id", "Search term", "User"})
in
#"Removed Other Columns"

I've attempted to apply the above to some test data, the PIBX & data sources (Excel spreadsheets) of which are in the Onedrive share here.

I'm having some real trouble understanding what I need to do however, and can't make it work. If someone could take a look at provide some assistance, it would be a huge help for me. Thanks so much 🙂

1 ACCEPTED SOLUTION

Hi @MichaelHutchens ,

I created a sample pbix(see attachment), please check whether that is what you want.

yingyinr_0-1618372671306.png

Best Regards

Community Support Team _ Rena
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

3 REPLIES 3
v-yiruan-msft
Community Support
Community Support

Hi @MichaelHutchens ,

What's your expected result? Assume that you have the below table, whether only the rows which marked with red square is what you want?

yingyinr_0-1618308213735.png

Best Regards

Community Support Team _ Rena
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi @v-yiruan-msft, apologies for not being clearer with the results I want, and yes you are correct. You've marked with red the results that I would like to display.

For completeness I should perhaps clarify that I'm looking for unique results for each grouping of 'Session IdFILTER'. So for example if there were two different strings in the 'Session IdFILTER rows (one in the green box, one not in the green box) then the green circled result would also be displayed:

 

MichaelHutchens_0-1618342246901.png


Hope that makes sense 🙂

Hi @MichaelHutchens ,

I created a sample pbix(see attachment), please check whether that is what you want.

yingyinr_0-1618372671306.png

Best Regards

Community Support Team _ Rena
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.