Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hi,
There are a few similar posts like this but I haven't been able to adapt them to solve my problem. I'm looking to create a function that will do this replacement so that it can be dynamic.
Sample Data:
= Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fc5NC8MgDAbg/+K5F933MbbbsdeBxcOwZcrEgtP/X53bCNschBDIQ94MA4GobvbiRtIQ4G3qqShbpc4YyzOlRDa/3Xqzxa4Od/tDgbWT3fH0jE67LPHF82Stcdcwu2L62Qddxof7RkKIsn+lVhxGb9Fqb+5B6ehVTgEAxPBf/yB2XXTTaHIe5/yTECkX", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [City = _t, Customer = _t, Location = _t, Supplier = _t, Data1 = _t, Data2 = _t])
I want to conditionally replace values in the City column using the Rules table.
Rules:
= Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXIsTc7OScxLATGdnIEkEBkaGYO42Tl++UUlGUqxOtFKRjiUmpiaQZQG55dClRrjUGpuYQlRGp5aXAJWaQLkh6fm5GTmpZfk50FUQawEM4FSCAeYoquNioqCKINIQO2PBQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Id = _t, City = _t, Customer = _t, Location = _t, Supplier = _t, ActualCity = _t])
I would like to use all the columns in the Rules table (except the Id and ActualCity) to compare with the Data table. The Rules table will grow including new columns. Blanks can be ignored, e.g. Rule 4 only needs to use 'City' and 'Location' to check the Data table. For all matching rows, replace the values in the City column with the 'ActualCity' value.
Desired Result:
= Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fc+9CsMgEADgd3HOov0fNWnHLh0CBodipErEQKLvXzU1vXQoHMfhfdydXYfoYO/j5DWqEGV1zDEw2cVMCEk1xkhU2T3GsHX7wxG6FbZq9tCdzpfFbScGOdin6+NTc719NsdeknBgq9YLY4Ayq0LKcZzzpVs2bpQ17uVHt4rv+VnUejKzlzpMMn+UUsDgTf8gdE1wqjdpH2PslyAh3g==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [City = _t, Customer = _t, Location = _t, Supplier = _t, Data1 = _t, Data2 = _t])
I'm happy to unpivot the rules if it makes it easier to solve the problem:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXIsTc7OScxLATGdnIEkEBkaGYO42Tl++UUlGUqxOtFKRjiUmpiaQZQG55dClRrjUGpuYQlRGp5aXAJWaQLkh6fm5GTmpZfk50FUQawEM4FSCAeYoquNioqCKINIQO2PBQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Id = _t, City = _t, Customer = _t, Location = _t, Supplier = _t, ActualCity = _t]),
#"Replaced Value" = Table.ReplaceValue(Source,"",null,Replacer.ReplaceValue,{"Customer", "Location", "Supplier"}),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Replaced Value", {"Id", "ActualCity"}, "Attribute", "Value")
in
#"Unpivoted Columns"
I'd really appreciate any help from the M code gurus out there.
Regards,
Kim
Have I solved your problem? Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;). |
If you found this post helpful, please give Kudos. It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen. If you find my signature vaguely amusing, please give Kudos. | Proud to be a Super User! |
Solved! Go to Solution.
test if this is right for you:
Table.FromRecords(Table.TransformRows(dataTab, (x)=> Record.Combine({x,[City=
(try Table.SelectRows(ruleTab, each ([Customer]=x[Customer] or x[Customer]="") and ([location]=x[location] or x[location]="" ))[ActualCity]{0} otherwise x[City])]})))
this solution tries to allow control over the list of relevant columns:
let
relCols={"Customer","location"},
matchCity=(rTab,dTabRow,rCols) => [ dTabRowValues= Record.FieldValues(Record.SelectFields(dTabRow,rCols)),
newCity= try Table.SelectRows(rTab, each List.ContainsAll( Record.FieldValues(_),dTabRowValues))[ActualCity]{0}
otherwise dTabRow[City]] [newCity],
tab=Table.FromRecords(Table.TransformRows(dataTab, each _&[City=matchCity(ruleTab,_,relCols)]))
in
tab
Thanks @Anonymous , your first solution works so thank you for that. I would like to dive a little deeper into your second solution as I think it will be more future proof. I'm getting some results that don't match what I'm expecting and I think it is because I don't fully understand how it is working.
See screenshots below. In this example I would have wanted the City to stay as is (row 7 of the results) i.e. Wellington, like it has for row 4 in the results below (Auckland). Are you able to help my understand your code a little more?
Source (highlight = the row that shouldn't change):
Rules (highlight = the rule being applied to the result):
Results:
Have I solved your problem? Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;). |
If you found this post helpful, please give Kudos. It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen. If you find my signature vaguely amusing, please give Kudos. | Proud to be a Super User! |
The problem could be the interpretation (mine or yours) of the "relevant columns" 😀.
Which columns do you intend to use as a key to select the rows on which to make the changes?
Another important aspect to be clarified is related to the logical consistency of the rules.
Take, for example, rules 4 and 5. If there is a record in the data table that has city = Wellington, customer = ZZZ and location = North, what do you do?
The relevant columns would be any in the rules table except Id and ActualCity if not blank.
City will always be populated. The other columns would be optional.
Blank should be ignored so only columns that have values should be used to evaluate the data table.
I'm happy to unpivot the rules if it makes it easier.
I think the short answer to your second question is that the rules won't exist like that (or data won't).
What would happen in your code if they did? Process twice (in order) therefore getting the second result after processing is complete or only process once and then not match again?
Have I solved your problem? Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;). |
If you found this post helpful, please give Kudos. It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen. If you find my signature vaguely amusing, please give Kudos. | Proud to be a Super User! |
It seems to me that works. perhaps you should change column type to text and cell value "null" to empty, i.e. ""
here how (with little change), although I used fewer rows and fewer columns than you used:
let
relCols={"Customer","Location","Supplier"},
matchCity=(rTab,dTabRow,rCols) => [ dTabRowValues= Record.FieldValues(Record.SelectFields(dTabRow,rCols)),
newCity= try Table.SelectRows(rTab, each List.ContainsAll( Record.FieldValues(_),dTabRowValues))[ActualCity]{0} otherwise dTabRow[City]]
[newCity]
in Table.FromRecords(Table.TransformRows(dataTab, each _&[City=matchCity(ruleTab,_,relCols)]))
about your question
What would happen in your code if they did? Process twice (in order) therefore getting the second result after processing is complete or only process once and then not match again?
the following piece of code
Table.SelectRows(rTab, each List.ContainsAll( Record.FieldValues(_),dTabRowValues))[ActualCity]{0}
select the first of the rules that the current row of the data table meets.
But this behavior can eventually be changed
this version should take into account the type of columns and the list of irrelevant chars / text (like: null, empy, blank, and so on).
So maybe 😁you shouldn't worry about it 😀.
let
relCols={"Customer","Location","Supplier"},
irrFieldValues={""," ",null, "QuelloCheVogliO"},
MyList.ContainsAll=[MLCA=(listRules,listData,nullChars)=> List.ContainsAll(listRules,listData,
(x,y)=>List.Contains({Text.From(x)}&irrFieldValues,Text.From(y)))][MLCA],
matchCity=(rTab,dTabRow,rCols) => [ dTabRowValues= Record.FieldValues(Record.SelectFields(dTabRow,rCols)),
newCity= try Table.SelectRows(rTab, each MyList.ContainsAll( Record.FieldValues(_),dTabRowValues,irrFieldValues))[ActualCity]{0} otherwise dTabRow[City]]
[newCity]
in Table.FromRecords(Table.TransformRows(dataTab, each _&[City=matchCity(ruleTab,_,relCols)]))
T
It seems to me that works,although in my test I used tables with fewer rows and columns than those shown by you.
Perhaps you should make typeof your column text type and modify, if any, "null" cell to empty cell, i.e. "".
Here how (little change to the previuos version):
let
relCols={"Customer","Location","Supplier"},
matchCity=(rTab,dTabRow,rCols) => [ dTabRowValues= Record.FieldValues(Record.SelectFields(dTabRow,rCols)),
newCity= try Table.SelectRows(rTab, each List.ContainsAll( Record.FieldValues(_),dTabRowValues))[ActualCity]{0} otherwise dTabRow[City]]
[newCity]
in Table.FromRecords(Table.TransformRows(dataTab, each _&[City=matchCity(ruleTab,_,relCols)]))
About your question:
What would happen in your code if they did? Process twice (in order) therefore getting the second result after processing is complete or only process once and then not match again?
this piece of code:
Table.SelectRows(rTab, each List.ContainsAll( Record.FieldValues(_),dTabRowValues))[ActualCity]{0}
takes the first matched rule. but you can eventually change the behavior
Hi, @KNP
my code as below:
// Output
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fc5NC8MgDAbg/+K5F933MbbbsdeBxcOwZcrEgtP/X53bCNschBDIQ94MA4GobvbiRtIQ4G3qqShbpc4YyzOlRDa/3Xqzxa4Od/tDgbWT3fH0jE67LPHF82Stcdcwu2L62Qddxof7RkKIsn+lVhxGb9Fqb+5B6ehVTgEAxPBf/yB2XXTTaHIe5/yTECkX", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [City = _t, Customer = _t, Location = _t, Supplier = _t, Data1 = _t, Data2 = _t]),
chType = Table.TransformColumnTypes(Source,{{"City", type text}, {"Customer", type text}, {"Location", type text}, {"Supplier", Int64.Type}, {"Data1", Int64.Type}, {"Data2", Int64.Type}}),
rules = Table.Buffer(Rules),
fx = (rec, matchlist)=>try rules[ActualCity]{Table.PositionOf(rules, rec, 0, matchlist )} otherwise rec[City],
trans = Table.TransformRows(chType,each let m =List.FirstN(Record.FieldNames(_),4), r=Record.SelectFields(_, m) in _&[City= fx(r, m)]),
result = Table.FromRecords(trans)
in
result
test if this is right for you:
Table.FromRecords(Table.TransformRows(dataTab, (x)=> Record.Combine({x,[City=
(try Table.SelectRows(ruleTab, each ([Customer]=x[Customer] or x[Customer]="") and ([location]=x[location] or x[location]="" ))[ActualCity]{0} otherwise x[City])]})))
@Anonymous - Thanks for all your help on this. After much testing I've decided to go with a variation of your first answer. The others were very promising but I was getting mixed results. Performance was certainly a consideration on a relatively large dataset.
Have I solved your problem? Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;). |
If you found this post helpful, please give Kudos. It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen. If you find my signature vaguely amusing, please give Kudos. | Proud to be a Super User! |