Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
prankin
Frequent Visitor

Find and Replace text string in one table using value from another table

Hi

I am trying to create a query using M language that allows me to correct text in a certain column in a certain table using data from two columns in another table.

I have a table of bank transactions with a column for Description. Due to OCR errors, there are numerous errors in the text in that column. I have a second table called Replacements that has two columns: one for incorrect text called Find and one for corrected text called Replace. I want to create a recursive function that searches BankRecords[Description] column for the incorrect text in Replacements[Find] and if there is a match replace it with the text in Replacements [Replace]. There are two stumbling blocks I can't seem to get past: the first is I want the search for the incorrect text to begin at position 0 in the Descriptions column, and the second is that I am only replacing parts of the text, not the whole cell or the full text string.

I have based my code on code I found in John McDougal's Blog "How to Excel", specifically his 2019 article "Bulk Find And Replace In Power Query". Here's the code I'm using:

let BulkReplace = (DataTable as table, FindReplaceTable as table, DataTableColumn as list) =>
	let
	//Buffer the FindReplaceTable and remove the “Bank” column
	BufferFRT = Table.Buffer(Table.RemoveColumns(FindReplaceTable, “Bank”),
	//Convert the buffered FindReplaceTable to a buffered list using the List.Buffer and Table.ToRows
	//functions so we can reference the list with an index number
	FindReplaceList = List.Buffer(Table.ToRows(BufferFRT)),
 	//Count number of rows in the FindReplaceTable to determine
	//how many iterations are needed
	Counter = Table.RowCount(BufferFRT),
	//Define a function to iterate over our list 
	//with the Table.ReplaceValue function
	BulkReplaceValues = (DataTableTemp, n) => 
	let 
		//Set length of replace string
		ReplaceLength = Text.Length(FindReplaceList{n}{0})
		DataColumnLength = Text.Length(DataTableColumn{n}{0})
		//Replace values using nth item in FindReplaceList
		ReplaceTable = Table.ReplaceValue(
			DataTableTemp,
			//set string from Find column as ‘oldValue’
			FindReplaceList{n}{0},
			//where Find column string matches start of Description column, set
                        //newValue equal to  combination of Replace column string plus
                        //remainder of Description column string
			If FindReplaceList{n}{0} = Text.Start(DataTableColumn{n}{0}, ReplaceLength)
			Then
			ReplaceString = Text.Combine(FindReplaceList{n}{1}, Text.Middle(DataTableColumn{n}{0}, ReplaceLength+1, DataColumnLength))
			Else  "",
			Replacer.ReplaceText,
			DataTableColumn
			)
	   in
		//if we are not at the end of the FindReplaceList
		//then iterate through Table.ReplaceValue again
		if n = Counter - 1 
			then ReplaceTable
			else @BulkReplaceValues(ReplaceTable, n + 1),
	//Evaluate the sub-function at the first row
	Output = BulkReplaceValues(DataTable, 0)   
	in
	Output
in
	BulkReplace

If I imbed the BulkReplace function at the end of a query so that it reads: =BulkReplaceFunction(#"Reordered Columns", Replacements, {"Description"}), I get the error message: "Expression.Error: We cannot convert the value "Description" to type List."

If I invoke the BulkReplace function, so that it reads: =BulkReplaceFunction(BankRecords, Replacements, BankRecords[Description]), I get the error message: "Expression.Error: We cannot convert the value "#290 WINNERS" to type List." Note that "#290 WINNERS" is the value from first cell in my Description Column in the table BankRecords.

Thanks in advance for your help.

1 ACCEPTED SOLUTION
CNENFRNL
Community Champion
Community Champion

Hi, @prankin , why bother to use such an intimidating chunk of codes given that there already exists a native function List.Accumulate? You might want to try such a solution,

let

    BulkReplaceFunction = (DataTable as table, FindReplaceTable as table, DataTableColumn as list) =>
    let
        BufferedFRT = List.Buffer(Table.ToRecords(FindReplaceTable)),
        Substitution = List.Accumulate(//outer List.Accumulate for iteration
            DataTableColumn,
            Table.Buffer(DataTable),
            (s, c) =>
                Table.TransformColumns(
                    s, {{c, each List.Accumulate(//core function, inner List.Accumulate to find & replace
                            BufferedFRT, _, (x, y) => Text.Replace(x, y[Find], y[Replace])
                            )}}
                )
        )
    in
        Substitution,

    Replacements = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fZGxDoIwEIZfpeJGcGhwcWxrSRtDSSvSGMLgQKKLC2H3WXw0n8S7Ahowcbn+33eXa5rWdRQznkdJhLVJAIUDghJAcgAoATJxBsIaUJtSOs7MAeQ3YyuZr3w9nkQ6LZhBmY9x7Cwnj7BIMLcHm7MPYJOS/21TVR409d4Hkfbd7d52HTg+RfSCKGLX+EgFRzB0U9vtzOwWE0quyEkGJS2EIK/tpW8XTsniZ07ZeFg24fwuTVOBQnMxiMKQUruwZT9G9JLAe/E/SJnB/zRv", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Find = _t, Replace = _t]),

    BankRecords = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZfLbuM2FIZf5TTpog0kmodXcSnL9FhTS05kx5k0mEUKGBhggBS9AV3OoxR9lD7KPMmQEikzmcyCBQz70LL9+T+Xn9TDwwVeFBdvH59OsPr15EJkXJRSKl2Vxhj3RvPhN/e8fHz6CP6jfIFiwSj18SUzFO7avrfD3i2VIrRyr+5RxjgE74uHC5YHEgFEXfw9IBJhdGm39mj9l5EoGVAxLrUmmo8onofCBZNR0w/HentrYdfDtr35cUJxPKPGuKwMoWJEiTyUWSCLqq7qZQdNvd/A3WIFd/UWuno4uAuMOrUBGePS5ZqIiSlzM8lplHfVDNDZbgfrYdcBZ1i6b5fhdUJyQednYYhUI1Nl68RZp13Cst1u4fq+O0Dd2Xcw2De323rwnEJol8xYzGTpY2Zc2Ue+zubLmb9u7uEw1P1+bYeJI0xhKkoofRmrQkpKJI7MKrONcIFmhrb9wQ7Luv/JodfQkTVpyH//THiGhUBGZPUiVrowihM6STb/v7WKnNZyVK3cEE1UpLlYNmf686d/wQ5tU/dg310Pdr9PKs3GjhrBY+Swkp2xmT6UNpjDOrkre73bt4eYY1G4i6Gq51DjM2imJ73QuncVbuphVcCy23mFZ84o87xU5hk2059SLMIrYCcqBSdLVRU0AWe6FV8wHp0DW04RDuuhXA+vGwdOfzByMeFmOpZvZJzB/fF4B3xYwfXh3lc5zNBZZBJqWqiEm+laDhvzzG/3be8beLe6hrUdN53XCls942W6lN/gosx5YuvVse4bO4qJxCny9pDAMu3JTQyP6lJ7cEs5g2QEiQSUaUSucmZW1TS72/4Aezsc28ZCs6mHN2kuMfJw5rFMB/LdoqOy5enpD+gef/94+tNfqojWkRRiB+OGEcMmWPb5h0VtDWzg5tJhqRpnMB1+ziUZ8+bNlVUEpx2FZRpOSsPy4Ua8TpNIjAwwv5VNWzbLtJkUZr4lzHVIAPEzKNNWXMH4PN7Nxn4H7sQV7HuiOPcwQRNqJJqF7nc7lg6nyUxLkcm5tflwevzrBJeIlRconwlERSINNTGBlm0kKtG3c/IuNfU/W4WDDjJFVDUZCBWCqJDJTANBenasZnNzBaYC7swCeUX1eLqQQZYolKoI95XSrNDIveARmWkjaZdsvtmQiogqoJTx8YjKNBIe5npktcgbOLb7GgZ2I95+zWSuWiYwjSZCT7cCmWYiEi9p3F3AoR0sdM1F3NhGJ/HntYmkKkFMIGU6CYYjqo8tuB0dGvrz379UVn49dIiGVKF2Ujlrcfc5778A", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Index = _t, #"Account Name" = _t, #"Account Number" = _t, #"Account Type" = _t, #"Branch Reference" = _t, Date = _t, Description = _t, Debit = _t, Credit = _t, Value = _t, Balance = _t]),

    Invoking = BulkReplaceFunction(BankRecords, Replacements, {"Description"})
in
    Invoking

 

As you can see, the very essential part of the definition of the function

BulkReplaceFunction = (DataTable as table, FindReplaceTable as table, DataTableColumn as list) =>
let
    BufferedFRT = List.Buffer(Table.ToRecords(FindReplaceTable)),
    Substitution = List.Accumulate(//outer List.Accumulate for iteration
        DataTableColumn,
        Table.Buffer(DataTable),
        (s, c) =>
            Table.TransformColumns(
                s, {{c, each List.Accumulate(//core function, inner List.Accumulate to find & replace
                        BufferedFRT, _, (x, y) => Text.Replace(x, y[Find], y[Replace])
                        )}}
            )
    )
in
    Substitution,

 


Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension!

DAX is simple, but NOT EASY!

View solution in original post

11 REPLIES 11
watkinnc
Super User
Super User

You could just do a join:

NewStep = Table.Join(PriorStep, {"Description"}, ReplacementTable, {"NameOfColumnOneOfReplacementTable"}, JoinKind.LeftOuter)

Then add a column:

NewColumn = Table.AddColumn(NewStep, "Final Column", each if [NameOfColumnOneOfReplacementTable] = null then [Description] else [NameOfColumnTwoReplacementTable], type text)

 

That'll do it!---Nate

 


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

Sorry @watkinnc . Your code will not work for me. It requires that the complete string in both columns referenced in the join to match. I'm only matching the first few characters of the complete string. I ran your solution through a larger sample of my data set, and it only corrected 94 out of 73,000 rows. Same with my other problem. I'm searching for bits of text in the "Description" column, and if there is a match, then I want to do the next steps.

Hi @watkinnc: except that it adds three new columns (Find, Replace, NewColumn). Getting rid of the first two new columns is not difficult. However, I don't was the corrected text in a new column, I want the incorrect text replaced in the old column. Yours is also slightly slower when processing 100K+ rows of data. However, your solution may solve another problem I've run into, which is categorizing each transaction based on a separate table of data.

prankin
Frequent Visitor

Hi @AlB. I have added some rows of the BankRecords table, as you requested, plus some rows of the Replacements table. Thanks.

@prankin 

=Table.ReplaceValue(BankRecords,"","",(x,y,z)=>let a=Table.SelectRows(Replacements,each Text.Contains(x,[Find])){0}? in if a=null then x else Text.Replace(x,a[Find],a[Replace]),{"Description"})

CNENFRNL
Community Champion
Community Champion

Hi, @prankin , why bother to use such an intimidating chunk of codes given that there already exists a native function List.Accumulate? You might want to try such a solution,

let

    BulkReplaceFunction = (DataTable as table, FindReplaceTable as table, DataTableColumn as list) =>
    let
        BufferedFRT = List.Buffer(Table.ToRecords(FindReplaceTable)),
        Substitution = List.Accumulate(//outer List.Accumulate for iteration
            DataTableColumn,
            Table.Buffer(DataTable),
            (s, c) =>
                Table.TransformColumns(
                    s, {{c, each List.Accumulate(//core function, inner List.Accumulate to find & replace
                            BufferedFRT, _, (x, y) => Text.Replace(x, y[Find], y[Replace])
                            )}}
                )
        )
    in
        Substitution,

    Replacements = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fZGxDoIwEIZfpeJGcGhwcWxrSRtDSSvSGMLgQKKLC2H3WXw0n8S7Ahowcbn+33eXa5rWdRQznkdJhLVJAIUDghJAcgAoATJxBsIaUJtSOs7MAeQ3YyuZr3w9nkQ6LZhBmY9x7Cwnj7BIMLcHm7MPYJOS/21TVR409d4Hkfbd7d52HTg+RfSCKGLX+EgFRzB0U9vtzOwWE0quyEkGJS2EIK/tpW8XTsniZ07ZeFg24fwuTVOBQnMxiMKQUruwZT9G9JLAe/E/SJnB/zRv", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Find = _t, Replace = _t]),

    BankRecords = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZfLbuM2FIZf5TTpog0kmodXcSnL9FhTS05kx5k0mEUKGBhggBS9AV3OoxR9lD7KPMmQEikzmcyCBQz70LL9+T+Xn9TDwwVeFBdvH59OsPr15EJkXJRSKl2Vxhj3RvPhN/e8fHz6CP6jfIFiwSj18SUzFO7avrfD3i2VIrRyr+5RxjgE74uHC5YHEgFEXfw9IBJhdGm39mj9l5EoGVAxLrUmmo8onofCBZNR0w/HentrYdfDtr35cUJxPKPGuKwMoWJEiTyUWSCLqq7qZQdNvd/A3WIFd/UWuno4uAuMOrUBGePS5ZqIiSlzM8lplHfVDNDZbgfrYdcBZ1i6b5fhdUJyQednYYhUI1Nl68RZp13Cst1u4fq+O0Dd2Xcw2De323rwnEJol8xYzGTpY2Zc2Ue+zubLmb9u7uEw1P1+bYeJI0xhKkoofRmrQkpKJI7MKrONcIFmhrb9wQ7Luv/JodfQkTVpyH//THiGhUBGZPUiVrowihM6STb/v7WKnNZyVK3cEE1UpLlYNmf686d/wQ5tU/dg310Pdr9PKs3GjhrBY+Swkp2xmT6UNpjDOrkre73bt4eYY1G4i6Gq51DjM2imJ73QuncVbuphVcCy23mFZ84o87xU5hk2059SLMIrYCcqBSdLVRU0AWe6FV8wHp0DW04RDuuhXA+vGwdOfzByMeFmOpZvZJzB/fF4B3xYwfXh3lc5zNBZZBJqWqiEm+laDhvzzG/3be8beLe6hrUdN53XCls942W6lN/gosx5YuvVse4bO4qJxCny9pDAMu3JTQyP6lJ7cEs5g2QEiQSUaUSucmZW1TS72/4Aezsc28ZCs6mHN2kuMfJw5rFMB/LdoqOy5enpD+gef/94+tNfqojWkRRiB+OGEcMmWPb5h0VtDWzg5tJhqRpnMB1+ziUZ8+bNlVUEpx2FZRpOSsPy4Ua8TpNIjAwwv5VNWzbLtJkUZr4lzHVIAPEzKNNWXMH4PN7Nxn4H7sQV7HuiOPcwQRNqJJqF7nc7lg6nyUxLkcm5tflwevzrBJeIlRconwlERSINNTGBlm0kKtG3c/IuNfU/W4WDDjJFVDUZCBWCqJDJTANBenasZnNzBaYC7swCeUX1eLqQQZYolKoI95XSrNDIveARmWkjaZdsvtmQiogqoJTx8YjKNBIe5npktcgbOLb7GgZ2I95+zWSuWiYwjSZCT7cCmWYiEi9p3F3AoR0sdM1F3NhGJ/HntYmkKkFMIGU6CYYjqo8tuB0dGvrz379UVn49dIiGVKF2Ujlrcfc5778A", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Index = _t, #"Account Name" = _t, #"Account Number" = _t, #"Account Type" = _t, #"Branch Reference" = _t, Date = _t, Description = _t, Debit = _t, Credit = _t, Value = _t, Balance = _t]),

    Invoking = BulkReplaceFunction(BankRecords, Replacements, {"Description"})
in
    Invoking

 

As you can see, the very essential part of the definition of the function

BulkReplaceFunction = (DataTable as table, FindReplaceTable as table, DataTableColumn as list) =>
let
    BufferedFRT = List.Buffer(Table.ToRecords(FindReplaceTable)),
    Substitution = List.Accumulate(//outer List.Accumulate for iteration
        DataTableColumn,
        Table.Buffer(DataTable),
        (s, c) =>
            Table.TransformColumns(
                s, {{c, each List.Accumulate(//core function, inner List.Accumulate to find & replace
                        BufferedFRT, _, (x, y) => Text.Replace(x, y[Find], y[Replace])
                        )}}
            )
    )
in
    Substitution,

 


Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension!

DAX is simple, but NOT EASY!

Hi @CNENFRNL

Thanks for the post. It took me a while to fully understand what you did. I also had to slightly modify it because it was replacing the string in the Description column at any position in the string. I needed the replacement process to only replace beginning at the start of the string.

 

Here's my mod:

let
    DataTable = BankRecords,
    BufferedFRT = List.Buffer(Table.ToRecords(Replacements)),
    Substitution = List.Accumulate(
        {"Description"},
        Table.Buffer(BankRecords),
        (s, c) =>
            Table.TransformColumns(
                s, {{c, each List.Accumulate(
                        BufferedFRT, _, (x, y) => 
                                        if Text.StartsWith(x, y[Find])
					then
					Text.Replace(x, y[Find], y[Replace])
					else
					x)}}
            )
    )
in
   Substitution

This is so easy though!

 

You could just do a join:

NewStep = Table.Join(PriorStep, {"Description"}, ReplacementTable, {"NameOfColumnOneOfReplacementTable"}, JoinKind.LeftOuter)

Then add a column:

NewColumn = Table.AddColumn(NewStep, "Final Column", each if [NameOfColumnOneOfReplacementTable] = null then [Description] else [NameOfColumnTwoReplacementTable], type text)

 

---Nate


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!
prankin
Frequent Visitor

Here's a sample of the Find and Replace columns from the Replacements Table:

FindReplace
*ABMABM
*CRCR
*EBEB
*FCYFCY
*INTERBANKINTERBANK
,ABMABM
• ERICANAMERICAN
•BMABM
•STERCARDMASTERCARD
1 •STERCARDMASTERCARD
1NVVW1WWW
3usinessBusiness
C H Q#CHQ#
C1-[Q4CHQ#
C9Q#CHQ#
CHE! UECHEQUE
CheaueCHEQUE
CHEOUECHEQUE
CHQ*CHQ
CHQ4CHQ#
CI13CCIBC
CON TIRECDN TIRE
E TERE TFR
prankin
Frequent Visitor

Here's a anonymized table of data from BankRecords with some of the OCR errors I encounter in the Description column.

IndexAccount NameAccount NumberAccount TypeBranch ReferenceDateDescriptionDebitCreditValueBalance
1Jane Doe1234-55678-999ChqBank 13/14/2001#290 WINNERS66.08 (66.08)(66.08)
2Jane Doe1234-55678-999ChqBank 14/14/2000$ 11.497-ELEVE11.65 (11.65)(77.73)
3Jane Doe1234-55678-999ChqBank 11/25/2001(VALUE ON LIQ)11.31 (11.31)(89.04)
4Jane Doe1234-55678-999ChqBank 19/12/2000*ABM CASH W/D WAL MART201.40 (201.40)(290.44)
5Jane Doe1234-55678-999ChqBank 14/30/2001*CR MEMO FROM 321-123-321-123 340.00340.0049.56
6Jane Doe1234-55678-999ChqBank 19/11/2000*EB BILL PYMT AMEX REGULAR3,479.05 (3,479.05)(3,429.49)
7Jane Doe1234-55678-999ChqBank 19/15/2000*FCY TRANSFER 49,980.0049,980.0046,550.51
8Jane Doe1234-55678-999ChqBank 111/19/2000*INTERBANK TRF M.F.C.  21,412.5821,412.5867,963.09
9Jane Doe1234-55678-999ChqBank 19/12/2000,ABM CASH W/D WAL MART201.40 (201.40)67,761.69
10Jane Doe1234-55678-999ChqBank 19/25/2000• ERICAN EXPRESS REGULAR240.00 (240.00)67,521.69
11Jane Doe1234-55678-999ChqBank 19/11/2000•BM DEPOSIT  4,000.004,000.0071,521.69
12Jane Doe1234-55678-999ChqBank 19/25/2000•STERCARD, BMO2,000.00 (2,000.00)69,521.69
13Jane Doe1234-55678-999ChqBank 19/25/20001 •STERCARD, BMO1,500.00 (1,500.00)68,021.69
14Jane Doe1234-55678-999ChqBank 13/23/20011I301 TFR-FR 321-123-321-123 100.00100.0068,121.69
15Jane Doe1234-55678-999ChqBank 112/21/20011NVVW 3RD PTY DEP 2,500.002,500.0070,621.69
16Jane Doe1234-55678-999ChqBank 12/2/20003USINESS ODP FEE10.00 (2,000.00)68,621.69
17Jane Doe1234-55678-999ChqBank 14/20/2001ABM CASH ADVANCE700.00 (700.00)67,921.69
18Jane Doe1234-55678-999ChqBank 19/13/2000ABM CASH W/D500.00 (500.00)67,421.69
19Jane Doe1234-55678-999ChqBank 11/29/2001ACCOUNT SERVICE CHARG10.00 (10.00)67,411.69
20Jane Doe1234-55678-999ChqBank 112/27/2000Bens Market18.77 (18.77)67,392.92
21Jane Doe1234-55678-999ChqBank 13/2/2001C H Q#000061,000.00 335.1967,728.11
22Jane Doe1234-55678-999ChqBank 13/2/2001C1-[Q4000061,000.00 51.9567,780.06
23Jane Doe1234-55678-999ChqBank 13/2/2001C9Q#000061,000.00 (50.00)67,730.06
24Jane Doe1234-55678-999ChqBank 112/31/2001CHE! UE DEPOSIT 8,191.95171.7267,901.78
25Jane Doe1234-55678-999ChqBank 15/14/2001Cheaue #11865,000.00 16.2067,917.98
26Jane Doe1234-55678-999ChqBank 12/6/2001CHEOUE #70285.00 126.6868,044.66
27Jane Doe1234-55678-999ChqBank 110/2/2000CHQ* 98 3700138078250.00 4,668.3472,713.00
28Jane Doe1234-55678-999ChqBank 13/2/2001CHQ4000061,000.00 56.4872,769.48
29Jane Doe1234-55678-999ChqBank 13/27/2001CI13C VISA R2Q4J61,000.00 27.9972,797.47
30Jane Doe1234-55678-999ChqBank 14/27/2000CON TIRE MC100.00 (112.50)72,684.97
31Jane Doe1234-55678-999ChqBank 11/15/2001E TER C0Zxb8E51,000.00 (119.84)72,565.13
AlB
Super User
Super User

Hi  @prankin 

Can you show at least a few rows of the BankRecords table? In text-taular format so that the contents can be easily copied

Please mark the question solved when done and consider giving a thumbs up if posts are helpful.

Contact me privately for support with any larger-scale BI needs, tutoring, etc.

Cheers 

SU18_powerbi_badge

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors