Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
kushal_chawda
Frequent Visitor

M function to encode column values

Hi All,

 

Is there any M function which I can use to encode column values. For eg. I have one column with values A,B, C. I want these values to be represented as masked values like ..z1Bc01Wx. If there is no direct function is there is any alternate way. My data resides in on- premise big data platform and we use hive connector to get the data in power bi.

1 ACCEPTED SOLUTION
kushal_chawda
Frequent Visitor

@AlienSx @BA_Pete  I am using below functions to mask data. Not sure whether it is correct way to do it or not. It does not give actual hash value but kind of masks the data.

Binary.ToText(Text.ToBinary([FieldToMasK]))

View solution in original post

8 REPLIES 8
kushal_chawda
Frequent Visitor

@AlienSx @BA_Pete  I am using below functions to mask data. Not sure whether it is correct way to do it or not. It does not give actual hash value but kind of masks the data.

Binary.ToText(Text.ToBinary([FieldToMasK]))

 

If this works well/quickly over very large datasets then it seems a perfectly fine option.

The only thing I'd add would be a length limit on the output, in case the binary output becomes too long, something like this:

Text.Start(Binary.ToText(Text.ToBinary([FieldToMask])), 15)

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




AlienSx
Super User
Super User

Hi, @kushal_chawda yet another option - this post

@AlienSx  Thanks for your input. Can I ask further question? How performant this code will be? I want to apply this on large dataset (200 M rows) and column on which I want to perform contains 8M distinct values

@kushal_chawda have no idea. I have not tested this on big numbers. Why don't you try and get back to us with some performance assessment? 

BA_Pete
Super User
Super User

Hi @kushal_chawda ,

 

Try this out:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkgtSVWK1YlW8i4tzkjMATO9MnNzK8Gs8MySDF2XxOIMMM83MyI1RcE5MSAzJNGnGCzkX5KRWqRTXJCanJmYo5eckVgEFI8FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [toAnon = _t]),
    addAnon = Table.AddColumn(Source, "Anon", each if [toAnon] = null then null
		else Text.Combine(
            List.Transform(
                List.FirstN( Text.ToList([toAnon]), 15),
                each if List.Contains({" ", "-"}, _)
					then _
                else if _ = Text.Upper(_)
					then Character.FromNumber(Number.Round(Number.RandomBetween(65, 90)))
                else Character.FromNumber(Number.Round(Number.RandomBetween(97, 122)))
            )
        )
    )
in
    addAnon

 

Example output:

BA_Pete_0-1688723685287.png

 

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




@BA_Pete  Thanks for your input. Can I ask further question? How performant this code will be? I want to apply this on large dataset (200 M rows) and column on which I want to perform contains 8M distinct values

 

No idea I'm afraid.

It certainly won't fold back to the source so, for that quantity of rows, you probably don't want to be doing anything that won't fold.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Helpful resources

Announcements
LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

Top Solution Authors
Top Kudoed Authors