Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
bandaid
New Member

split by non ascii characters

Hi I have a row that contains ascii and non-ascii characters. 

eg: Tokyo 東京, 大阪Osaka

 

The entry is not always separated by spaces.
Is there anyway to split the columns with words that start(contains?) with ascii / non-ascii characters?

 

I know this is possible using programing langugaes but I would like to accomplish this with just powerbi.

1 ACCEPTED SOLUTION
HotChilli
Super User
Super User

Can you try these 2 lines please (in Power Query).  Add them in the Advanced Editor. I think I've got the syntax correct but if you could test them that would be great.  You might need to rename the parts that refer to previous steps or column names.

#"Split Column by Character Transition" = Table.SplitColumn(#"Renamed Columns", "Column1", Splitter.SplitTextByCharacterTransition({"A".."z"}, (c) => not List.Contains({"A".."z"}, c)), {"Column1.1", "Column1.2"}),
    #"Split Column by Character Transition1" = Table.SplitColumn(#"Split Column by Character Transition", "Column1.1", Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"A".."z"}, c), {"A".."z"}), {"Column1.1.1", "Column1.1.2"})
in
    #"Split Column by Character Transition1"

Essentially it's performing two splits on the same column. One from english alphabet to non-english alphabet and the 2nd from non-english alphabet to english alphabet.

Let me know how you get on.

View solution in original post

4 REPLIES 4
HotChilli
Super User
Super User

Can you try these 2 lines please (in Power Query).  Add them in the Advanced Editor. I think I've got the syntax correct but if you could test them that would be great.  You might need to rename the parts that refer to previous steps or column names.

#"Split Column by Character Transition" = Table.SplitColumn(#"Renamed Columns", "Column1", Splitter.SplitTextByCharacterTransition({"A".."z"}, (c) => not List.Contains({"A".."z"}, c)), {"Column1.1", "Column1.2"}),
    #"Split Column by Character Transition1" = Table.SplitColumn(#"Split Column by Character Transition", "Column1.1", Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"A".."z"}, c), {"A".."z"}), {"Column1.1.1", "Column1.1.2"})
in
    #"Split Column by Character Transition1"

Essentially it's performing two splits on the same column. One from english alphabet to non-english alphabet and the 2nd from non-english alphabet to english alphabet.

Let me know how you get on.

Thank you! You made my day! 🙂 

The actual data had special charactes such as apostrophes which were getting recognized as non-ascii but I have a better understanding of how to use powerquery to split the data. Thank you again! 

amitchandak
Super User
Super User

@bandaid , do you need check just 1 digit as ?

the in power query a new column

= if(Text.Start([col],1)  ="?" , [Col], null)

Thanks for your reply! So does '?' mean to check if the string starts with non-ascii? 

The result I want is 

 

| English | japanese |
| ------- | -------- |
| tokyo | 東京 |
| osaka | 大阪 |

Helpful resources

Announcements
PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.