Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Removing duplicate and sorting in a keywords column

All,

My customer has a following requirement. This will form of a report I am generating. 

 

I have a keywords column that contains words like following:

- Grape Grape Apple

- Orange Apple Grape Orange

- Melon Orange Apple

- Melon Melon Grape

 

1. I want to remove the duplicates in each. For example - "Grape Grape Apple" should be "Grape Apple"

2. I wanted to sort the column. For example - "Grape Apple" to "Apple Grape"

 

Eventually this keyword combination will be used to slice the data.

Is PowerBI the right solution? Can I integrate with PowerApps or Power Automate to get this done. Unfortunately, for Python, I have to install a personal gateway, which might not work when we go into production.

 

Thanks for any thoughts/ideas. Appreciate it.

Sundar

1 ACCEPTED SOLUTION

Hi @Anonymous ,

 

After sorting, you can merge rows as follows.

1. Select "Group By", then name the new column "Rows", group the keywords column by Index column and sum.

 

vkkfmsft_0-1629867453572.png

image.png


2. You can see that the aggregated column returns error value.

 

image.png


3. At this point, select "Advanced Editor", make the changes shown in the figure below. Which function List.RemoveItems retains the other columns except the keywords column. And the rows of the keywords column are merged group by these columns. For merging rows, you can refer to the video https://www.youtube.com/watch?v=ODLTky9z5LE .

 

image.png

List.RemoveItems(Table.ColumnNames ( #"Sort Rows"), {"keywords column"})

 

4. Finally, you can choose whether to delete the Index column or not.

 

vkkfmsft_1-1629867989585.png

 

If the problem is still not resolved, please provide detailed error information or the expected result you expect. Let me know immediately, looking forward to your reply.

Best Regards,
Winniz

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

7 REPLIES 7
Anonymous
Not applicable

Thank You @v-kkf-msft and @AllisonKennedy. Your solution solves my problem partially. The challenge I have is that Keywords is one of 40 columns in that table and Keywords has null values as well for some of the rows.

 

I won't be able to create a separate dim, since the dim won't map to fact table, as the sort order would have changed. any thoughts on removing the null value and update the fact table with the sorted data.

 

Thanks

Veyron

Hi @Anonymous ,

 

Try the following code:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUXIvSixIVYCQjgUFOalKsTrRSk5AGf+ixLx0qCBUAUQIrMIZqMI3NSc/TwFZHVjKBS4FIcF6wTKuOC10JWghSAWY4QZjuGO3JxYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, #"keywords column" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"keywords column", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"keywords column", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "keywords column"),
    #"Removed Duplicates" = Table.Distinct(#"Split Column by Delimiter"),
    #"Sort Rows" = Table.Buffer(Table.Sort(#"Removed Duplicates",{{"Index",Order.Ascending},{"keywords column",Order.Ascending}})),
    #"Grouped Rows" = Table.Group(#"Sort Rows",  List.RemoveItems(Table.ColumnNames ( #"Sort Rows"), {"keywords column"}), {{"Rows", each Text.Combine([keywords column]," "), type nullable text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Index"})
in
    #"Removed Columns"

image.png


If the problem is still not resolved, please provide detailed error information or the expected result you expect. Let me know immediately, looking forward to your reply.

Best Regards,
Winniz

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

Anonymous
Not applicable

Hi @v-kkf-msft Thanks for sharing the script. I tried to recreate, but unable to for the Grouped Rows. Can you let me know how you arrived at that?

 

    #"Grouped Rows" = Table.Group(#"Sort Rows",  List.RemoveItems(Table.ColumnNames ( #"Sort Rows"), {"keywords column"}), {{"Rows", each Text.Combine([keywords column]," "), type nullable text}}),

  

Hi @Anonymous ,

 

After sorting, you can merge rows as follows.

1. Select "Group By", then name the new column "Rows", group the keywords column by Index column and sum.

 

vkkfmsft_0-1629867453572.png

image.png


2. You can see that the aggregated column returns error value.

 

image.png


3. At this point, select "Advanced Editor", make the changes shown in the figure below. Which function List.RemoveItems retains the other columns except the keywords column. And the rows of the keywords column are merged group by these columns. For merging rows, you can refer to the video https://www.youtube.com/watch?v=ODLTky9z5LE .

 

image.png

List.RemoveItems(Table.ColumnNames ( #"Sort Rows"), {"keywords column"})

 

4. Finally, you can choose whether to delete the Index column or not.

 

vkkfmsft_1-1629867989585.png

 

If the problem is still not resolved, please provide detailed error information or the expected result you expect. Let me know immediately, looking forward to your reply.

Best Regards,
Winniz

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Anonymous
Not applicable

Thank you for your help. For some reason, when I did text.combine, it did not merge it alphabetically. So I added List.Sort and it worked fine. Appreciate your help.

 

= Table.Group(#"Removed Columns", List.RemoveItems(Table.ColumnNames (#"Removed Columns"), {"Value"}), {{"SortedKeywords", each Text.Combine(List.Sort([Value]), " "), type nullable text}})

v-kkf-msft
Community Support
Community Support

Hi @Anonymous ,

 

Try the following code in Power Query Editor:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wci9KLEhVgJCOBQU5qUqxOtFK/kWJeelQAagkRAgs65uak5+ngKwGSRhCgvUoxcYCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"keywords column" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"keywords column", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"keywords column", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "keywords column"),
    #"Removed Duplicates" = Table.Distinct(#"Split Column by Delimiter"),
    #"Sort Rows" = Table.Buffer(Table.Sort(#"Removed Duplicates",{{"Index",Order.Ascending},{"keywords column",Order.Ascending}})),
    #"Grouped Rows" = Table.Group(#"Sort Rows", {"Index"}, {{"Rows", each Text.Combine([keywords column]," "), type nullable text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Index"})
in
    #"Removed Columns"

 image.png

 

If the problem is still not resolved, please provide detailed error information or the expected result you expect. Let me know immediately, looking forward to your reply.

Best Regards,
Winniz

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

AllisonKennedy
Super User
Super User

@Anonymous 

 

Can you paste in table format a sample of the raw data? Does each row have multiple keywords? Do you assume that the keywords are separated by a space? If so you could use Power Query:

 

1) Import data

2) Click Transform to open Power Query

3) Add Column tab > Index column to assign unique ID to each row

4) Select the keyword column > Transform Column tab > Split Column > By Delimiter EXPAND ADVANCED Options here and choose split into rows

5) Do a custom sort on the data to sort by Index, then by Keyword

 

Then you can group them back and concatenate them if needed, but I would leave them in separate rows and add a DimKeyword table as the slicer. Does that make any sense?


Please @mention me in your reply if you want a response.

Copying DAX from this post? Click here for a hack to quickly replace it with your own table names

Has this post solved your problem? Please Accept as Solution so that others can find it quickly and to let the community know your problem has been solved.
If you found this post helpful, please give Kudos C

I work as a Microsoft trainer and consultant, specialising in Power BI and Power Query.
www.excelwithallison.com

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.