Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hi All,
Is there any M function which I can use to encode column values. For eg. I have one column with values A,B, C. I want these values to be represented as masked values like ..z1Bc01Wx. If there is no direct function is there is any alternate way. My data resides in on- premise big data platform and we use hive connector to get the data in power bi.
Solved! Go to Solution.
If this works well/quickly over very large datasets then it seems a perfectly fine option.
The only thing I'd add would be a length limit on the output, in case the binary output becomes too long, something like this:
Text.Start(Binary.ToText(Text.ToBinary([FieldToMask])), 15)
Pete
Proud to be a Datanaut!
@AlienSx Thanks for your input. Can I ask further question? How performant this code will be? I want to apply this on large dataset (200 M rows) and column on which I want to perform contains 8M distinct values
@kushal_chawda have no idea. I have not tested this on big numbers. Why don't you try and get back to us with some performance assessment?
Hi @kushal_chawda ,
Try this out:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkgtSVWK1YlW8i4tzkjMATO9MnNzK8Gs8MySDF2XxOIMMM83MyI1RcE5MSAzJNGnGCzkX5KRWqRTXJCanJmYo5eckVgEFI8FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [toAnon = _t]),
addAnon = Table.AddColumn(Source, "Anon", each if [toAnon] = null then null
else Text.Combine(
List.Transform(
List.FirstN( Text.ToList([toAnon]), 15),
each if List.Contains({" ", "-"}, _)
then _
else if _ = Text.Upper(_)
then Character.FromNumber(Number.Round(Number.RandomBetween(65, 90)))
else Character.FromNumber(Number.Round(Number.RandomBetween(97, 122)))
)
)
)
in
addAnon
Example output:
Pete
Proud to be a Datanaut!
@BA_Pete Thanks for your input. Can I ask further question? How performant this code will be? I want to apply this on large dataset (200 M rows) and column on which I want to perform contains 8M distinct values
No idea I'm afraid.
It certainly won't fold back to the source so, for that quantity of rows, you probably don't want to be doing anything that won't fold.
Pete
Proud to be a Datanaut!