Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
bandaid
New Member

split by non ascii characters

Hi I have a row that contains ascii and non-ascii characters. 

eg: Tokyo 東京, 大阪Osaka

 

The entry is not always separated by spaces.
Is there anyway to split the columns with words that start(contains?) with ascii / non-ascii characters?

 

I know this is possible using programing langugaes but I would like to accomplish this with just powerbi.

1 ACCEPTED SOLUTION
HotChilli
Super User
Super User

Can you try these 2 lines please (in Power Query).  Add them in the Advanced Editor. I think I've got the syntax correct but if you could test them that would be great.  You might need to rename the parts that refer to previous steps or column names.

#"Split Column by Character Transition" = Table.SplitColumn(#"Renamed Columns", "Column1", Splitter.SplitTextByCharacterTransition({"A".."z"}, (c) => not List.Contains({"A".."z"}, c)), {"Column1.1", "Column1.2"}),
    #"Split Column by Character Transition1" = Table.SplitColumn(#"Split Column by Character Transition", "Column1.1", Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"A".."z"}, c), {"A".."z"}), {"Column1.1.1", "Column1.1.2"})
in
    #"Split Column by Character Transition1"

Essentially it's performing two splits on the same column. One from english alphabet to non-english alphabet and the 2nd from non-english alphabet to english alphabet.

Let me know how you get on.

View solution in original post

4 REPLIES 4
HotChilli
Super User
Super User

Can you try these 2 lines please (in Power Query).  Add them in the Advanced Editor. I think I've got the syntax correct but if you could test them that would be great.  You might need to rename the parts that refer to previous steps or column names.

#"Split Column by Character Transition" = Table.SplitColumn(#"Renamed Columns", "Column1", Splitter.SplitTextByCharacterTransition({"A".."z"}, (c) => not List.Contains({"A".."z"}, c)), {"Column1.1", "Column1.2"}),
    #"Split Column by Character Transition1" = Table.SplitColumn(#"Split Column by Character Transition", "Column1.1", Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"A".."z"}, c), {"A".."z"}), {"Column1.1.1", "Column1.1.2"})
in
    #"Split Column by Character Transition1"

Essentially it's performing two splits on the same column. One from english alphabet to non-english alphabet and the 2nd from non-english alphabet to english alphabet.

Let me know how you get on.

Thank you! You made my day! 🙂 

The actual data had special charactes such as apostrophes which were getting recognized as non-ascii but I have a better understanding of how to use powerquery to split the data. Thank you again! 

amitchandak
Super User
Super User

@bandaid , do you need check just 1 digit as ?

the in power query a new column

= if(Text.Start([col],1)  ="?" , [Col], null)

Thanks for your reply! So does '?' mean to check if the string starts with non-ascii? 

The result I want is 

 

| English | japanese |
| ------- | -------- |
| tokyo | 東京 |
| osaka | 大阪 |

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel1

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.

Europe Fabric Conference

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.