Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
gioris
Regular Visitor

Text Replace specific character only if it occurs as the last character of a word in a string

Hi

 

I have a very nasty bug that prevents me from creating a unique values table.

 

I have a list of search terms and I need to make all occurrences unique.

I use the build in remove duplicates function and all seems to be working fine in powerquery.

 

when I port this into powerpivot and try to connect the tables I get a message that the table has duplicate values (even though I have removed them in power query)

After some digging around I found a very nasty bug probably present only in the Greek language and I need to find a way around it.

 

In greek the letter S should be written with "σ" if it is in the middle of the sentence and with "ς" if it is present as the last character of the word.

 

The problem is that some users might input either σ ορ ς,

the problem is:

power query in  the remove duplicate function does recognizes this distinction,
even though in the visual interface the query seems to be the same (crazy)

so to make it more clear this search term:

αγγελιες εργασιας

 

 

could be written

αγγελιεσ εργασιας

αγγελιες εργασιασ

αγγελιεσ εργασιασ

αγγελιες εργασιας (correct version)

 

so after porting to power pivot I have this query that has count of rows =4 and distinct count of query=1

since users have actualy given the input in all 4 possible ways

 

the result is that I cannot have distinct occurrences in the table.

 

 

1.JPG

 

It was really difficult to pinpoint the problem since in the visual interface in power query I get one row and in power pivot visual 4 rows all identical

 

 

One solution that I experimented with is trying to search and replace the occurrence of "σ" with "ς"

the problem here is that I would need to do the replacement only when the occurrence of the letter is in the last letter/position of each word in the string.

 

so in this string:

"αγγελιες εργασιασ θεσσαλονικη"

I need to replace the bold σ only if it is the last letter of each word not if it is present in the middle of the word

 

any help would be really appreciated

 

 

 

 

 

 

 

 

 

 

1 ACCEPTED SOLUTION

In your example, with the 4 different spellings, it is still possible to get 1 distinct value in Power Query by adding the second parameter to Table.Distinct as: Ordinal.FromCulture("GR-gr", true)

in which true means: ignore case.

 

However, the result can be any of the 4 occurrences, not necessarily the correct one.

 

This is illustrated in the code below, along with the answer to your second question, which required just a small adjustment of the original code.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WOrfx3GYg3Hpu97md57aeb1YAEo1AgY3nm4ECG883KcXqYKhqQlfVjE0VhllYVWGY1aRAnJUK53aALAFyNwLV7D+3Fyi869x2pdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Trimmed Text" = Table.TransformColumns(Source,{},Text.Trim),
    #"Removed Duplicates" = Table.Distinct(#"Trimmed Text", Comparer.FromCulture("GR-gr", true)),
    #"Changed Type" = Table.TransformColumnTypes(#"Removed Duplicates",{{"Column1", type text}}),
    Splitted = Table.TransformColumns(#"Changed Type",{{"Column1", each Text.Split(_," ")}}),
    Replaced = Table.TransformColumns(Splitted,{{"Column1", each List.Transform(_,
        each if _ = "" then "" else Text.Replace(Text.Middle(_,0,Text.Length(_)-1),"ς","σ")&
                                    Text.Replace(Text.At(_,Text.Length(_)-1),"σ","ς"))}}),
    Joined = Table.TransformColumns(Replaced,{{"Column1", each Text.Combine(_," ")}})
in
    Joined
Specializing in Power Query Formula Language (M)

View solution in original post

4 REPLIES 4
MarcelBeug
Community Champion
Community Champion

You can split the strings on " ", replace any last "σ" by "ς" and combine again,

(the if..then..else is because: after the split there may be empty items).

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WOrfx3GYg3Hpu97md57aeb1YAEo1AgY3nm4ECG883KcXqYKhqQlfVjE0VhllYVWGY1aRAnJUK53aALAFyNwLV7D+3Fyi869x2pdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    Splitted = Table.TransformColumns(#"Changed Type",{{"Column1", each Text.Split(_," ")}}),
    Replaced = Table.TransformColumns(Splitted,{{"Column1", each List.Transform(_,each if _ = "" then "" else Text.Middle(_,0,Text.Length(_)-1)&Text.Replace(Text.At(_,Text.Length(_)-1),"σ","ς"))}}),
    Joined = Table.TransformColumns(Replaced,{{"Column1", each Text.Combine(_," ")}})
in
    Joined
Specializing in Power Query Formula Language (M)

This is amazing!
Thank you.

 

I will bother you with one more thing if you can help, since another similar bug appeared after cleaning up the data.

 

It seems that even if the "ς" is in the middle of the sentence power query runs into the same bug/problem.
Now I need to substitute "ς" with "σ" if "ς" appears in any position except the last

thanks again

Hi @gioris,

 

Have you tried the solution provided by @MarcelBeug above? Does it work in your scenario? If it works, could you accept it as solution to close this thread?

 

If you still have any question on this issue, feel free to post here. Smiley Happy

 

Regards

In your example, with the 4 different spellings, it is still possible to get 1 distinct value in Power Query by adding the second parameter to Table.Distinct as: Ordinal.FromCulture("GR-gr", true)

in which true means: ignore case.

 

However, the result can be any of the 4 occurrences, not necessarily the correct one.

 

This is illustrated in the code below, along with the answer to your second question, which required just a small adjustment of the original code.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WOrfx3GYg3Hpu97md57aeb1YAEo1AgY3nm4ECG883KcXqYKhqQlfVjE0VhllYVWGY1aRAnJUK53aALAFyNwLV7D+3Fyi869x2pdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Trimmed Text" = Table.TransformColumns(Source,{},Text.Trim),
    #"Removed Duplicates" = Table.Distinct(#"Trimmed Text", Comparer.FromCulture("GR-gr", true)),
    #"Changed Type" = Table.TransformColumnTypes(#"Removed Duplicates",{{"Column1", type text}}),
    Splitted = Table.TransformColumns(#"Changed Type",{{"Column1", each Text.Split(_," ")}}),
    Replaced = Table.TransformColumns(Splitted,{{"Column1", each List.Transform(_,
        each if _ = "" then "" else Text.Replace(Text.Middle(_,0,Text.Length(_)-1),"ς","σ")&
                                    Text.Replace(Text.At(_,Text.Length(_)-1),"σ","ς"))}}),
    Joined = Table.TransformColumns(Replaced,{{"Column1", each Text.Combine(_," ")}})
in
    Joined
Specializing in Power Query Formula Language (M)

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.