Solved: DAX will get false Unicode from character

Eastwood · ‎04-29-2022

Hello Community,

this is my first time asking a question here! 🙂

I want to find every lower and upper character in a string. To achieve this, I use the following DAX statement:

DEFINE
VAR Repl_Char = FORMAT ("Adma Sandler","STRING")
VAR Mylen = LEN ( Repl_Char )
VAR mytable =
ADDCOLUMNS (
GENERATESERIES ( 1, mylen ), "Original", MID ( Repl_Char, [Value], 1 ))
VAR Result_Uni =
ADDCOLUMNS (
SELECTCOLUMNS (
ADDCOLUMNS ( mytable, "uni_char", UNICODE ( [Original] ) ),
"Number", [Value],
"Original",[Original],
"Unicode", [uni_char]
),
"uppercase_lowercase",
IF (
[Unicode] >= 97
&& [Unicode] <= 122,
"lower",
IF ( [Unicode] >= 65 && [Unicode] <= 90, "upper","non alpha" )
)
)

EVALUATE
Result_Uni

If I execute this I will receive the follow:

Number	Original	Unicode	uppercase_lowercase
1	A	65	upper
2	d	100	lower
3	m	109	lower
4	A	65	upper
5		32	non alpha
6	S	83	upper
7	A	65	upper
8	n	110	lower
9	d	100	lower
10	l	108	lower
11	e	101	lower
12	r	114	lower

Question:

1. Could someone help to understand why this situation is happend?

2. It would be really helpful if someone has an idea to solve this.

BTW: I only have the option to use DAX for this challenge. 😉

Many thanks

AlexisOlson · ‎04-30-2022

I'm pretty sure this has to do with case-insensitive column compression as I explain here:

https://community.powerbi.com/t5/Desktop/Single-character-in-Text-field-changing-from-upper-to-lower...

While I don't think there's a way to make the [Original] column as you expect, you should be able to fix the other columns by immediately converting to Unicode like this:

DEFINE
VAR Repl_Char = FORMAT ( "Adma Sandler", "STRING" )
VAR Mylen = LEN ( Repl_Char )
VAR mytable =
    ADDCOLUMNS (
        GENERATESERIES ( 1, mylen ),
        "Unicode", UNICODE ( MID ( Repl_Char, [Value], 1 ) )
    )
VAR Result_Uni =
    SELECTCOLUMNS (
        mytable,
        "Number", [Value],
        "Original", UNICHAR ( [Unicode] ),
        "Unicode", [Unicode],
        "uppercase_lowercase",
            IF (
                [Unicode] >= 97 && [Unicode] <= 122,
                "lower",
                IF ( [Unicode] >= 65 && [Unicode] <= 90, "upper", "non alpha" )
            )
    )
EVALUATE
    Result_Uni

View solution in original post

AlexisOlson · ‎04-30-2022

I'm pretty sure this has to do with case-insensitive column compression as I explain here:

https://community.powerbi.com/t5/Desktop/Single-character-in-Text-field-changing-from-upper-to-lower...

While I don't think there's a way to make the [Original] column as you expect, you should be able to fix the other columns by immediately converting to Unicode like this:

DEFINE
VAR Repl_Char = FORMAT ( "Adma Sandler", "STRING" )
VAR Mylen = LEN ( Repl_Char )
VAR mytable =
    ADDCOLUMNS (
        GENERATESERIES ( 1, mylen ),
        "Unicode", UNICODE ( MID ( Repl_Char, [Value], 1 ) )
    )
VAR Result_Uni =
    SELECTCOLUMNS (
        mytable,
        "Number", [Value],
        "Original", UNICHAR ( [Unicode] ),
        "Unicode", [Unicode],
        "uppercase_lowercase",
            IF (
                [Unicode] >= 97 && [Unicode] <= 122,
                "lower",
                IF ( [Unicode] >= 65 && [Unicode] <= 90, "upper", "non alpha" )
            )
    )
EVALUATE
    Result_Uni

Eastwood · ‎04-30-2022

Hello @AlexisOlson, many thanks for your explanation! 🙂 I now understand why my approach to identify the characters did not work! 😉 Your solution will save my day!

amitchandak · ‎04-29-2022

@Eastwood , Why is there a problem with the other A = 65?

!! Power BI 101 Interview questions !! !! Master Microsoft Fabric- 36 Videos !!
Microsoft Power BI Learning Resources, 2023 !!
Learn Power BI - Full Course with Dec-2022, with Window, Index, Offset, 100+ Topics !!
Did I answer your question? Mark my post as a solution! Appreciate your Kudos !! Proud to be a Super User! !!

Eastwood · ‎04-29-2022

@amitchandak many thx for your reply. The search string "Adma" has the last lower character "a". But the Unicode are referenced to an upper character.

DAX will get false Unicode from character

Helpful resources

Fabric certifications survey

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024