Moving valuem from columns (update to previous que...

brunofds · ‎05-14-2021

Hello ,

Previously i posted this question : Solved: Re: Moving values from columns - Microsoft Power BI Community which was kindly replied from @Jimmy801 .

It was fine until i've faced new challenges from my PDF tables sources, now i'm stuck again.

This time i've put a fraction of my original pdf archives here: https://1drv.ms/u/s!BJPuttHgFDRRgY1myLNkrZm5G__Rlg?e=MIiQ9K

I did some small cleaning with this code:

let
Fonte = Folder.Files(path to onedrive folder),
#"FiltraPDF, APRM e MULTAS OUTORG" = Table.SelectRows(Fonte, each Text.EndsWith([Name], "pdf") and not Text.Contains([Name], "APRM") and not Text.Contains([Name], "MULT")),
#"Personalização Adicionada" = Table.AddColumn(#"FiltraPDF, APRM e MULTAS OUTORG", "Personalizar", each Pdf.Tables([Content])),
#"Personalizar Expandido" = Table.ExpandTableColumn(#"Personalização Adicionada", "Personalizar", {"Id", "Name", "Kind", "Data"}, {"Id", "Name.1", "Kind", "Data"}),
#"Linhas Filtradas3" = Table.SelectRows(#"Personalizar Expandido", each ([Kind] = "Table")),
#"Personalizar.Data Expandido" = Table.ExpandTableColumn(#"Linhas Filtradas3", "Data", {"Column1", "Column2", "Column3", "Column4", "Column5"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5"}),
#"Removed Other Columns" = Table.SelectColumns(#"Personalizar.Data Expandido",{"Name", "Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each ([Data.Column3] <> "SALDO ANTERIOR" and [Data.Column3] <> "Taxa de Administração - Agente Financeiro Cobrança ref. Mês Anterior" and [Data.Column3] <> "TOTAL") and ([Data.Column2] <> "SALDO ANTERIOR" and [Data.Column2] <> "Taxa de Administração - Agente Financeiro Cobrança ref. Mês Anterior" and [Data.Column2] <> "TOTAL" and [Data.Column2] <> "^{( - )} Taxa de Administração - Agente Financeiro Cobrança ref. Mês Anterior") and ([Data.Column1] <> "TOTAL"))
in
#"Filtered Rows"

As i said in the original topic, although those pdf are table structured, they dont follow the exactly same structure.

i.e. Data.Colum3, Data.Colum4 e Data.Colum5 have financial values, although only one column of this type is desirable.

Also some columns were split in, but should have been merged:

But i cant just merge them because there are values in Data.Column3, so merging them would result in a undesirable results.

OUTPUT DESIRED (just a small fraction as example!😞

CBH	Ano	Mes	Atribute	Data Taxa Agente Financeiro	Data Saldo	R$
ALPA	2021	2		null	28/02/2021	136613632
ALPA	2021	3		null	28/02/2021	136613632
AP	2021	2		null	28/02/2021	552702984
AP	2021	3		null	28/02/2021	552702984
ALPA	2021	1		01/02/2021	null	214775
AP	2021	1		01/02/2021	null	875524
ALPA	2020	3	Lançamentos à Crédito	null	null	null
ALPA	2020	3	( + )Rendas de Aplicações Financeiras - Cobrança	null	null	45935
ALPA	2020	3	( + )Aportes Secretaria	null	null	0
ALPA	2020	3	( + )Rendimento Repassado pelo Tomador	null	null	0
ALPA	2020	3	( + )Devolução de parcelas - Contratos Não Reembolsáveis	null	null	0
ALPA	2020	3	( + )Pagamento de parcelas - Contratos com Retorno	null	null	0
ALPA	2020	3	( + )Crédito de Cobrança	null	null	2428
ALPA	2020	3	Lançamentos à Débito	null	null	null
ALPA	2020	3	Liberação de parcelas para contratos não reembolsáveis	null	null	0
ALPA	2020	3	Liberação de parcelas para contratos reembolsáveis	null	null	0
ALPA	2020	3	Repasses para Cobertura de Custos Operacionais	null	null	0
ALPA	2020	3	Repasses sobre Valores Arrecadados	null	null	0
ALPA	2020	3	Restituição de valores cobrados pelo uso da água	null	null	0
ALPA	2020	3	Resgate para transferência ao DAEE ref.Pagto. de tarifas de cobrança	null	null	0

The first 3 columnns are base on the PDF archive name :

RELATORIO COBRANCA ALPA_2020_03.pdf

ALPA = CBH

2020 = ANO (YEAR)

03 = MES (MONTH)

Data Taxa Agente Financeiro

Based on rows with "Taxa de Administração do Agente Financeiro cobrada em". Note that are rows without the space between "em" and the date itself, like in the last examples below.

Data Saldo

Based on rows with "SALDO EM 'dd/mm/yyyy'". Those values are also presented in more than one column

Atribute

Are the main atributes, which i'll do the calculation with the "R$" column

R$

The values itself, expressed in brazilian reais (BRL)

_____________

I thought i had completed this task within this code:

let
Fonte = Folder.Files("one drive link"),
#"FiltraPDF, APRM e MULTAS OUTORG" = Table.SelectRows(Fonte, each Text.EndsWith([Name], "pdf") and not Text.Contains([Name], "APRM") and not Text.Contains([Name], "MULT")),
#"Personalização Adicionada" = Table.AddColumn(#"FiltraPDF, APRM e MULTAS OUTORG", "Personalizar", each Pdf.Tables([Content])),
#"Personalizar Expandido" = Table.ExpandTableColumn(#"Personalização Adicionada", "Personalizar", {"Id", "Name", "Kind", "Data"}, {"Personalizar.Id", "Personalizar.Name", "Personalizar.Kind", "Personalizar.Data"}),
#"Linhas Filtradas3" = Table.SelectRows(#"Personalizar Expandido", each ([Personalizar.Kind] = "Table")),
#"Personalizar.Data Expandido" = Table.ExpandTableColumn(#"Linhas Filtradas3", "Personalizar.Data", {"Column1", "Column2", "Column3", "Column4", "Column5"}, {"Personalizar.Data.Column1", "Personalizar.Data.Column2", "Personalizar.Data.Column3", "Personalizar.Data.Column4", "Personalizar.Data.Column5"}),
#"PADRAO ""COBRANCA"" NO NOME" = Table.ReplaceValue(#"Personalizar.Data Expandido","COBR_","COBRANCA ",Replacer.ReplaceText,{"Name"}),
#"PADRAO ""COBRANCA"" NO NOME2" = Table.ReplaceValue(#"PADRAO ""COBRANCA"" NO NOME","COBRANÇA","COBRANCA",Replacer.ReplaceText,{"Name"}),
#"Removed Other Columns" = Table.SelectColumns(#"PADRAO ""COBRANCA"" NO NOME2",{"Name", "Personalizar.Data.Column1", "Personalizar.Data.Column2", "Personalizar.Data.Column3", "Personalizar.Data.Column4", "Personalizar.Data.Column5"}),
#"Filtra linhas inúteis" = Table.SelectRows(#"Removed Other Columns", each ([Personalizar.Data.Column2] <> "SALDO ANTERIOR" and [Personalizar.Data.Column2] <> "TOTAL") and ([Personalizar.Data.Column1] <> "AG. 1897-X - SETOR PÚBLICO SÃO PAULO / MSE" and [Personalizar.Data.Column1] <> "FEHIDRO COBRANÇA" and [Personalizar.Data.Column1] <> "RESUMO DA MOVIMENTAÇÃO" and [Personalizar.Data.Column1] <> "TOTAL" and [Personalizar.Data.Column1] <> "[image]")),
Custom3 = #"Filtra linhas inúteis",
Custom2 = Table.ColumnNames(Custom3),
#"Troca Null por Blank" = Table.ReplaceValue(Custom3,null,"",Replacer.ReplaceValue,Custom2),
Custom1 = Table.AddColumn(#"Troca Null por Blank", "CreateNewTable", each [1 = if Text.StartsWith([Personalizar.Data.Column1],"Taxa") then [Personalizar.Data.Column1] else [Personalizar.Data.Column2],
2 = if Text.StartsWith([Personalizar.Data.Column1],"Taxa") then [Personalizar.Data.Column2] else [Personalizar.Data.Column3],
3= if Text.StartsWith([Personalizar.Data.Column1],"Taxa") then [Personalizar.Data.Column3] else [Personalizar.Data.Column4],
4 = if Text.StartsWith([Personalizar.Data.Column1],"Taxa") then [Personalizar.Data.Column4] else [Personalizar.Data.Column5]
]),
#"Expanded CreateNewTable" = Table.ExpandRecordColumn(Custom1, "CreateNewTable", {"1", "2", "3", "4"}, {"CreateNewTable.1", "CreateNewTable.2", "CreateNewTable.3", "CreateNewTable.4"}),
#"Removed Other Columns2" = Table.SelectColumns(#"Expanded CreateNewTable",{"Name", "CreateNewTable.1", "CreateNewTable.2", "CreateNewTable.3", "CreateNewTable.4"}),
#"Merged Columns1" = Table.CombineColumns(#"Removed Other Columns2",{"CreateNewTable.3", "CreateNewTable.4"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged")

//= Table.AddColumn(Source, "CreateNewTable", each [FirstColumn = if Text.StartsWith([A],"Taxa de Administração do Agente Financeiro cobrada em") then [A] else [B],
//SecondColumns = if Text.StartsWith([A],"Taxa de Administração do Agente Financeiro cobrada em") then [B] else [C]])
,
#"É numero?" = Table.AddColumn(#"Merged Columns1", "isnumber?", each try Number.From([Merged]) otherwise "", type number),
#"eh texto?" = Table.AddColumn(#"É numero?", "isText", each try Text.Contains([CreateNewTable.2], "a") otherwise ""),
#"eh texto 2" = Table.AddColumn(#"eh texto?", "isText2", each if ([isText]) = true then [CreateNewTable.1] & "" & "" & [CreateNewTable.2] else [CreateNewTable.1], type text),
#"Removed Other Columns1" = Table.SelectColumns(#"eh texto 2",{"Name", "isnumber?", "isText2"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Other Columns1",{"isText2", "isnumber?"}),
#"Renamed Columns" = Table.RenameColumns(#"Reordered Columns",{ {"isText2", "Atributo"}, {"isnumber?", "R$"}}),
#"Tira o .PDF" = Table.TransformColumns(#"Renamed Columns", {{"Name", each Text.BeforeDelimiter(_, ".pdf"), type text}}),
#"Duplicated Column" = Table.DuplicateColumn(#"Tira o .PDF", "Name", "Name - Copy"),
#"Extracted Last Characters" = Table.TransformColumns(#"Duplicated Column", {{"Name - Copy", each Text.End(_, 7), type text}}),
#"Filtered Rows" = Table.SelectRows(#"Extracted Last Characters", each true),
#"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows", "Name - Copy", Splitter.SplitTextByDelimiter("_", QuoteStyle.Csv), {"Ano", "Mes"}),
#"Extracted Text Between Delimiters" = Table.TransformColumns(#"Split Column by Delimiter", {{"Name", each Text.BetweenDelimiters(_, " ", "_", 1, 0), type text}}),
#"Data Saldo" = Table.SplitColumn(#"Extracted Text Between Delimiters", "Atributo", Splitter.SplitTextByDelimiter("SALDO EM ", QuoteStyle.Csv), {"Atributo", "Data Saldo"}),
#"Data Taxa AF" = Table.SplitColumn(#"Data Saldo", "Atributo", Splitter.SplitTextByDelimiter("Taxa de Administração do Agente Financeiro cobrada em ", QuoteStyle.None), {"Atributo", "Data Taxa AF"}),
#"Renamed Columns1" = Table.RenameColumns(#"Data Taxa AF",{{"Name", "CBH"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns1",{{"CBH", type text}, {"Atributo", type text}, {"Data Taxa AF", type date}, {"Data Saldo", type date}, {"R$", type number}, {"Ano", Int64.Type}, {"Mes", Int64.Type}}),
#"Filtra Tx Adm Mes Anterior" = Table.SelectRows(#"Changed Type", each ([Atributo] <> "( - )Taxa de Administração - Agente Financeiro Cobrança ref. Mês Anterior" and [Atributo] <> "Taxa de Administração - Agente Financeiro Cobrança ref. Mês Anterior" and [Atributo] <> "^{( - )} Taxa de Administração - Agente Financeiro Cobrança ref. Mês Anterior"))
in
#"Filtra Tx Adm Mes Anterior"

but i still got values (in BRL) in my Atributo Column:

I glad appreciate any help provided.

Cheers

brunofds · ‎05-20-2021

@ImkeF You helped me once, i summon you, power query queen 👑 !

ImkeF · ‎05-21-2021

Sorry, but I'm too busy right now and cannot pick this up.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

brunofds · ‎05-24-2021

Thanks anyhow. I've learned a lot from posts/blog.

watkinnc · ‎05-19-2021

Here you are. This will work for all of the pdfs in your folder:

let
Source = Pdf.Tables(File.Contents("xxxxxxxxxxxxxxxxxxxxx.pdf"), [Implementation="1.2"]),
#"Filtered Rows" = Table.SelectRows(Source, each [Kind] = "Table"),
#"Added Custom1" = Table.AddColumn(#"Filtered Rows", "Columns", each Table.ColumnCount([Data])),
#"Sorted Rows" = Table.Sort(#"Added Custom1",{{"Columns", Order.Descending}}),
#"Removed Other Columns1" = Table.SelectColumns(#"Sorted Rows",{"Id", "Data"}),
#"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns1", "Data", {"Column1", "Column2", "Column3", "Column4"}, {"Column1", "Column2", "Column3", "Column4"}),
#"Sorted Rows1" = Table.Sort(#"Expanded Data",{{"Id", Order.Ascending}}),
#"Removed Columns" = Table.RemoveColumns(#"Sorted Rows1",{"Id"})
in
#"Removed Columns"

--Nate

I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

brunofds · ‎05-19-2021

Hi Nate,

Still not working.

I need something to deal with a folder path, not a file, hence 'File.Contents' it's not the best alternative since there will be many pdf files.

Anyhow, i've tested with your query, and the problems are still appearing.

Check for instance with this file: RELATÓRIO COBR_BPG_2020_01.pdf

Red line circle: can't have values in the same colum as my atributes

Blue circle: SALDO EM with the date

Green line: "Taxa de administração..." is an atributte. Should be in the correct column, as i said earlier..

and this happened checking only one file... probably there would be many more when considering the other files, since the pdfs are unstructured.

Glad appreciatte your efforts, though

brunofds · ‎05-19-2021

Anyone ? This is driving me nuts!

Thanks...

watkinnc · ‎05-14-2021

I would actually improve upon the first solution to make it more dynamic:

NewColumn1= Table.AddColumn(PriorStep, "Fixed", each if [A] = null then [B] else [A]),

NewColumn2 = Table.AddColumn(NewColumn1, "Fixed2", each if [C] = null then [B] else [C])

Then remove your original columns.

As far as your table splitting issue, I would add a space after each em by using replace values "em", "em ". That way you can split by the last space instead of by that whole text string.

--Nate

I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

brunofds · ‎05-15-2021

Hi there.

I'm afraid this solution did not work as expected. There are still "values" fields (R$) where it doesnt suppose to be....

Moving valuem from columns (update to previous question)

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024