cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Bigglerum
Frequent Visitor

Comparing Delimited Unique Text Values In a Column and Posting a Count of matches.

Hi

In a table of 6000 rows of chemical compunds, I have a Column with Chemical element symbols separated by ";" in each cell, eg [ H;O;Pb;Ti]

 

There are between 1 - 16 symbols in each cell from a list of 78 valid elements in another table. 

- these are not ordered, so the next may be [ Pb;O;H;Ti]


I want to run a comparison of the cell contents for each row (with others in the table) and in a new column post a count of matching values, ie if the (combination) value is unique = 1, or if there are a pair of matches the count is marked for each = 2 in the appropriate rows, etc.

The goal is to understand the relationship between combinations of elements in the table and their "uniqueness", and thereon the relationships between compounds with the same elements,,,,,,,

I am sure the simple way to start is to order the cell contents, but maybe not necessary, and then to run the comparison of arrays.

I would appreciate suggestions with the process and/or the functions/expressions.

Cheers

Bigglerum

1 ACCEPTED SOLUTION

Hi @Bigglerum ,

I get data from CSV and show you how to update your code.

Here is my original code like yours.

let
    Source = Csv.Document(File.Contents("...\unique1.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Column1", type text}, {"Value", Int64.Type}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Column1", "ElementsRaw"}})
in
    #"Renamed Columns"

My Sample:

1.png

Add codes after step "Renamed Columns".

let
    Source = Csv.Document(File.Contents("...\unique1.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Column1", type text}, {"Value", Int64.Type}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Column1", "ElementsRaw"}}),
    #"Added Index" = Table.AddIndexColumn( #"Renamed Columns", "Index", 1, 1, Int64.Type),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"ElementsRaw", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "ElementsRaw"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"ElementsRaw", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type1",{{"ElementsRaw", Order.Ascending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"ElementsRaw"}, {{"Rows", each _, type table [ElementsRaw=nullable text, Value=nullable text, Index=number]}}),
    #"Added Index1" = Table.AddIndexColumn(#"Grouped Rows", "Index", 1, 1, Int64.Type),
    #"Expanded Rows" = Table.ExpandTableColumn(#"Added Index1", "Rows", {"Value", "Index"}, {"Rows.Value", "Rows.Index"}),
    #"Group Value"= Table.Group(#"Expanded Rows", {"Rows.Index","Rows.Value"}, {{"New", each Text.Combine([ElementsRaw], "; "), type text}}),
    #"Sorted Rows1" = Table.Sort(#"Group Value",{{"Rows.Index", Order.Ascending}}),
    #"Grouped Rows1" = Table.Group(#"Sorted Rows1", {"New"}, {{"Count", each _, type table [Rows.Index=nullable number, Rows.Value=nullable text, New=text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows1", "Custom", each Table.RowCount([Count])),
    #"Expanded Count" = Table.ExpandTableColumn(#"Added Custom", "Count", {"Rows.Index", "Rows.Value"}, {"Count.Rows.Index", "Count.Rows.Value"}),
    #"Sorted Rows2" = Table.Sort(#"Expanded Count",{{"Count.Rows.Index", Order.Ascending}}),
    #"Removed Columns" = Table.RemoveColumns(#"Sorted Rows2",{"Count.Rows.Index"}),
    #"Renamed Columns2" = Table.RenameColumns(#"Removed Columns",{{"Count.Rows.Value", "Value"}})
in
    #"Renamed Columns2"

Result is as below.

2.png

 Best Regards,
Rico Zhou

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

11 REPLIES 11
RicoZhou
Community Support
Community Support

Hi @Bigglerum 

According to your statement, I know you want to count the same values inculding values which are combined in same cell with different sort, like [ Pb;O;H;Ti] and [ H;O;Pb;Ti].

Try my way by M code. Here is my sample.

1.png

1. Firstly add an Inedx in this table. 

2. Split by ";" in Rows, sort your new table by [Column1].

3. Group All Rows by [Column1] , add a new Index and Expand Table.

1.png

4. Group Values by Code.

 #"Group Value"= Table.Group(#"Expanded Rows", {"Rows.Index","Rows.Value"}, {{"New", each Text.Combine([Column1], "; "), type text}}),

3.png

5. Then it's easy to count Rows by same [New].

1.png

My M Code:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkiy9rf2sA7JVNJRMlaK1YlWCgYxjcBMD6AcUAFY0hQs4m8NljZXio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Value = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"Column1", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type1",{{"Column1", Order.Ascending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"Column1"}, {{"Rows", each _, type table [Column1=nullable text, Value=nullable text, Index=number]}}),
    #"Added Index1" = Table.AddIndexColumn(#"Grouped Rows", "Index", 1, 1, Int64.Type),
    #"Expanded Rows" = Table.ExpandTableColumn(#"Added Index1", "Rows", {"Value", "Index"}, {"Rows.Value", "Rows.Index"}),
    #"Group Value"= Table.Group(#"Expanded Rows", {"Rows.Index","Rows.Value"}, {{"New", each Text.Combine([Column1], "; "), type text}}),
    #"Sorted Rows1" = Table.Sort(#"Group Value",{{"Rows.Index", Order.Ascending}}),
    #"Grouped Rows1" = Table.Group(#"Sorted Rows1", {"New"}, {{"Count", each _, type table [Rows.Index=nullable number, Rows.Value=nullable text, New=text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows1", "Custom", each Table.RowCount([Count])),
    #"Expanded Count" = Table.ExpandTableColumn(#"Added Custom", "Count", {"Rows.Index", "Rows.Value"}, {"Count.Rows.Index", "Count.Rows.Value"}),
    #"Sorted Rows2" = Table.Sort(#"Expanded Count",{{"Count.Rows.Index", Order.Ascending}}),
    #"Removed Columns" = Table.RemoveColumns(#"Sorted Rows2",{"Count.Rows.Index"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Count.Rows.Value", "Value"}})
in
    #"Renamed Columns"

Best Regards,
Rico Zhou

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi Rico,

 

Thanks I am really struggling with this - I understand the functions and what is happening with your code, my weakness is being a beginner and trying to transpose your query into my data.
This begins with defining the source! 

Your code says 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkiy9rf2sA7JVNJRMlaK1YlWCgYxjcBMD6AcUAFY0hQs4m8NljZXio0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Value = _t]),


 I need to source from a csv with a column I need to name ElementsRaw - here is a query that imports the data and generates the column ElementsRaw

 

let
Source = Csv.Document(File.Contents("D:\Users\Mark\Documents\unique1.csv"),[Delimiter=",", Columns=2, Encoding=65001, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Formula", type text}, {"Mineral Count", Int64.Type}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Formula", "ElementsRaw"}, {"Mineral Count", "Minerals with Same Complexity"}})
in
#"Renamed Columns"

I am sure once I get this fundamental working I will be able to continue


Regards

Bigglerum



 

Hi @Bigglerum ,

I get data from CSV and show you how to update your code.

Here is my original code like yours.

let
    Source = Csv.Document(File.Contents("...\unique1.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Column1", type text}, {"Value", Int64.Type}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Column1", "ElementsRaw"}})
in
    #"Renamed Columns"

My Sample:

1.png

Add codes after step "Renamed Columns".

let
    Source = Csv.Document(File.Contents("...\unique1.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Column1", type text}, {"Value", Int64.Type}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Column1", "ElementsRaw"}}),
    #"Added Index" = Table.AddIndexColumn( #"Renamed Columns", "Index", 1, 1, Int64.Type),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"ElementsRaw", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "ElementsRaw"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"ElementsRaw", type text}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type1",{{"ElementsRaw", Order.Ascending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"ElementsRaw"}, {{"Rows", each _, type table [ElementsRaw=nullable text, Value=nullable text, Index=number]}}),
    #"Added Index1" = Table.AddIndexColumn(#"Grouped Rows", "Index", 1, 1, Int64.Type),
    #"Expanded Rows" = Table.ExpandTableColumn(#"Added Index1", "Rows", {"Value", "Index"}, {"Rows.Value", "Rows.Index"}),
    #"Group Value"= Table.Group(#"Expanded Rows", {"Rows.Index","Rows.Value"}, {{"New", each Text.Combine([ElementsRaw], "; "), type text}}),
    #"Sorted Rows1" = Table.Sort(#"Group Value",{{"Rows.Index", Order.Ascending}}),
    #"Grouped Rows1" = Table.Group(#"Sorted Rows1", {"New"}, {{"Count", each _, type table [Rows.Index=nullable number, Rows.Value=nullable text, New=text]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows1", "Custom", each Table.RowCount([Count])),
    #"Expanded Count" = Table.ExpandTableColumn(#"Added Custom", "Count", {"Rows.Index", "Rows.Value"}, {"Count.Rows.Index", "Count.Rows.Value"}),
    #"Sorted Rows2" = Table.Sort(#"Expanded Count",{{"Count.Rows.Index", Order.Ascending}}),
    #"Removed Columns" = Table.RemoveColumns(#"Sorted Rows2",{"Count.Rows.Index"}),
    #"Renamed Columns2" = Table.RenameColumns(#"Removed Columns",{{"Count.Rows.Value", "Value"}})
in
    #"Renamed Columns2"

Result is as below.

2.png

 Best Regards,
Rico Zhou

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Bigglerum
Frequent Visitor

Hi Alexis,

 
Thanks again, I maybe need to seek a developer for this, it feels a little advanced and I need to build something robust!
 
I love Powerbi and wish I had the skills and time to give to become an expert.
 
Thanks again 
 
Bigglerum 

It seems no more than a simple game of data transformation and filtering. Paste enough dummy data set, I'll show you the way.


Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is beyond their comprehension!

DAX is simple, but NOT EASY!

Perhaps. This would be a good learning experience but somewhat daunting if you're newish to DAX and modeling concepts.

 

This looks like an interesting challenge, so if you've got a sample file you can link to (doesn't need to be real data), I'd be interested in tinkering with it now and again to keep you moving forward.

Hi Alex,
That is very kind of you - the project is actually a not-for-profit enterprise my company is funding in Science Education.

Will happily share - are we allowed to do this outside of the forum? I am new here so not sure what is legit.

Regards

Bigglerum




The most common way to share files is to post a link to a copy saved in the cloud (e.g. Dropbox/Google Drive/SharePoint/OneDrive).

AlexisOlson
Super User
Super User

I've demonstrated how you can compare delimited lists like this regardless of order here:
https://community.powerbi.com/t5/Desktop/Match-between-two-columns-in-different-order/m-p/2204459

 

However, you can probably make your life a lot easier if you split that column by delimiter into rows via the query editor:

AlexisOlson_0-1638725983130.png

AlexisOlson_1-1638726027819.png

 

With this, you can create a relationship ValidElements[Element] 1--> * Compounds[Element] and analyzing patterns and relationships should be easier.

B5CC86F3-886A-43FE-8F09-37AC660A0D4D.jpeg

Hi Alexis,

I am very grateful for your help and suggestions.

Looking at the link to the earlier solution this points to two column values side by side, in my data the values are in one column across 6000 odd rows.

I am not sure I can use the splitting approach? 

Regards

Bigglerum

 




 

The link I posted isn't exactly what you need. It just gives the flavor for comparing delimited strings.

 

I'm suggesting you expand your single column across 6k rows to a single column that splits each list of elements into as many rows as there are elements. The new table will have 6k * (avg elements per compound) rows.

Helpful resources

Announcements
Microsoft Build 768x460.png

Microsoft Build is May 24-26. Have you registered yet?

Come together to explore latest innovations in code and application development—and gain insights from experts from around the world.

May 23 2022 epsiode 5 without aka link.jpg

The Power BI Community Show

Welcome to the Power BI Community Show! Jeroen ter Heerdt talks about the importance of Data Modeling.

Power BI Dev Camp Session 22 with aka link 768x460.jpg

Check it out!

Mark your calendars and join us on Thursday, May 26 at 11a PDT for a great session with Ted Pattison!

charticulator_carousel_with_text (1).png

Charticulator Design Challenge

Put your data visualization and design skills to the test! This exciting challenge is happening now through May 31st!