Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
One of the most required functionalities in terms of data transformation for Power BI is the ability to do Fuzzy Lookup on two datasets so that input text values with minor errors can still be mapped to a dimension in PowerBI.
I know best practices would redirect this to the usage of a ETL tool and take care of this in a DW etc.... but most uses cases are within the universe of Self Service BI using Excel files.
Let me know if there is a workaround either using M, R or other approach.
Thank you
let
Table1 = #table(type table [value = text],
{ {"spruce"}, {"cedar"}, {"birch"}, {"pine"}, {"poplar"}, {"aspen"},
{"baobab"}, {"larch"}, {"willow"} }),
Table2 = #table(type table [word = text],
{ {"sprce"}, {"cidar"}, {"pane"}, {"aspern"}, {"babab"} }),
AddCol = Table.AddColumn(Table2, "Custom", each Table1),
Expand = Table.ExpandTableColumn(AddCol, "Custom", {"value"}, {"value"}),
FuzzyCalc = Table.AddColumn(Expand, "Prct", each
2 * List.Count( List.Intersect(
{ Text.ToList([word]), Text.ToList([value]) } ) )
/ (List.Count(Text.ToList([word])) + List.Count(Text.ToList([value])))),
SortRows = Table.Sort(FuzzyCalc,{{"word", Order.Ascending}, {"Prct", Order.Descending}}),
AddIndex = Table.AddIndexColumn(SortRows, "Index", 1, 1),
RemoveDupls = Table.Distinct(AddIndex, {"word"})
in
RemoveDupls
Hi @Anonymous,
Currently, power bi not support fuzzy lookup. Is only supported to use full query(direct support) and part query(need some additional operations to work through it). It not support to lookup the similar result(e.g. use "abc" to find out the "aac","a1c") and search with wildcard.
BTW, you can submit your requirement to ideas to help us improve the power bi.
Regards,
Xiaoxin Sheng
HI @Anonymous,
If you meant use r script to deal with original data source, r script support these operations.
I think t-sql fuzzy matching is more simple than r script, you can refer to below links:
Fuzzy Logic function in R as in Matlab
Regards,
Xiaoxin Sheng
Why not just use binning?
Binning would require me to know all the values to select... but this is dynamic and tomorrow the items may be different.
Or am I understanding your suggestion wrong?