Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi
I have a very nasty bug that prevents me from creating a unique values table.
I have a list of search terms and I need to make all occurrences unique.
I use the build in remove duplicates function and all seems to be working fine in powerquery.
when I port this into powerpivot and try to connect the tables I get a message that the table has duplicate values (even though I have removed them in power query)
After some digging around I found a very nasty bug probably present only in the Greek language and I need to find a way around it.
In greek the letter S should be written with "σ" if it is in the middle of the sentence and with "ς" if it is present as the last character of the word.
The problem is that some users might input either σ ορ ς,
the problem is:
power query in the remove duplicate function does recognizes this distinction,
even though in the visual interface the query seems to be the same (crazy)
so to make it more clear this search term:
αγγελιες εργασιας |
could be written
αγγελιεσ εργασιας
αγγελιες εργασιασ
αγγελιεσ εργασιασ
αγγελιες εργασιας (correct version)
so after porting to power pivot I have this query that has count of rows =4 and distinct count of query=1
since users have actualy given the input in all 4 possible ways
the result is that I cannot have distinct occurrences in the table.
It was really difficult to pinpoint the problem since in the visual interface in power query I get one row and in power pivot visual 4 rows all identical
One solution that I experimented with is trying to search and replace the occurrence of "σ" with "ς"
the problem here is that I would need to do the replacement only when the occurrence of the letter is in the last letter/position of each word in the string.
so in this string:
"αγγελιες εργασιασ θεσσαλονικη"
I need to replace the bold σ only if it is the last letter of each word not if it is present in the middle of the word
any help would be really appreciated
Solved! Go to Solution.
In your example, with the 4 different spellings, it is still possible to get 1 distinct value in Power Query by adding the second parameter to Table.Distinct as: Ordinal.FromCulture("GR-gr", true)
in which true means: ignore case.
However, the result can be any of the 4 occurrences, not necessarily the correct one.
This is illustrated in the code below, along with the answer to your second question, which required just a small adjustment of the original code.
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WOrfx3GYg3Hpu97md57aeb1YAEo1AgY3nm4ECG883KcXqYKhqQlfVjE0VhllYVWGY1aRAnJUK53aALAFyNwLV7D+3Fyi869x2pdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]), #"Trimmed Text" = Table.TransformColumns(Source,{},Text.Trim), #"Removed Duplicates" = Table.Distinct(#"Trimmed Text", Comparer.FromCulture("GR-gr", true)), #"Changed Type" = Table.TransformColumnTypes(#"Removed Duplicates",{{"Column1", type text}}), Splitted = Table.TransformColumns(#"Changed Type",{{"Column1", each Text.Split(_," ")}}), Replaced = Table.TransformColumns(Splitted,{{"Column1", each List.Transform(_, each if _ = "" then "" else Text.Replace(Text.Middle(_,0,Text.Length(_)-1),"ς","σ")& Text.Replace(Text.At(_,Text.Length(_)-1),"σ","ς"))}}), Joined = Table.TransformColumns(Replaced,{{"Column1", each Text.Combine(_," ")}}) in Joined
You can split the strings on " ", replace any last "σ" by "ς" and combine again,
(the if..then..else is because: after the split there may be empty items).
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WOrfx3GYg3Hpu97md57aeb1YAEo1AgY3nm4ECG883KcXqYKhqQlfVjE0VhllYVWGY1aRAnJUK53aALAFyNwLV7D+3Fyi869x2pdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]), #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}), Splitted = Table.TransformColumns(#"Changed Type",{{"Column1", each Text.Split(_," ")}}), Replaced = Table.TransformColumns(Splitted,{{"Column1", each List.Transform(_,each if _ = "" then "" else Text.Middle(_,0,Text.Length(_)-1)&Text.Replace(Text.At(_,Text.Length(_)-1),"σ","ς"))}}), Joined = Table.TransformColumns(Replaced,{{"Column1", each Text.Combine(_," ")}}) in Joined
This is amazing!
Thank you.
I will bother you with one more thing if you can help, since another similar bug appeared after cleaning up the data.
It seems that even if the "ς" is in the middle of the sentence power query runs into the same bug/problem.
Now I need to substitute "ς" with "σ" if "ς" appears in any position except the last
thanks again
Hi @gioris,
Have you tried the solution provided by @MarcelBeug above? Does it work in your scenario? If it works, could you accept it as solution to close this thread?
If you still have any question on this issue, feel free to post here.
Regards
In your example, with the 4 different spellings, it is still possible to get 1 distinct value in Power Query by adding the second parameter to Table.Distinct as: Ordinal.FromCulture("GR-gr", true)
in which true means: ignore case.
However, the result can be any of the 4 occurrences, not necessarily the correct one.
This is illustrated in the code below, along with the answer to your second question, which required just a small adjustment of the original code.
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WOrfx3GYg3Hpu97md57aeb1YAEo1AgY3nm4ECG883KcXqYKhqQlfVjE0VhllYVWGY1aRAnJUK53aALAFyNwLV7D+3Fyi869x2pdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]), #"Trimmed Text" = Table.TransformColumns(Source,{},Text.Trim), #"Removed Duplicates" = Table.Distinct(#"Trimmed Text", Comparer.FromCulture("GR-gr", true)), #"Changed Type" = Table.TransformColumnTypes(#"Removed Duplicates",{{"Column1", type text}}), Splitted = Table.TransformColumns(#"Changed Type",{{"Column1", each Text.Split(_," ")}}), Replaced = Table.TransformColumns(Splitted,{{"Column1", each List.Transform(_, each if _ = "" then "" else Text.Replace(Text.Middle(_,0,Text.Length(_)-1),"ς","σ")& Text.Replace(Text.At(_,Text.Length(_)-1),"σ","ς"))}}), Joined = Table.TransformColumns(Replaced,{{"Column1", each Text.Combine(_," ")}}) in Joined
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
106 | |
93 | |
75 | |
62 | |
50 |
User | Count |
---|---|
147 | |
107 | |
105 | |
87 | |
61 |