Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
brunofds
Helper I
Helper I

Moving valuem from columns (update to previous question)

Hello ,

 

Previously i posted this question : Solved: Re: Moving values from columns - Microsoft Power BI Community  which was kindly replied from @Jimmy801 .

 

It was fine until i've faced new challenges from my PDF tables sources, now i'm stuck again.

 

This time i've put a fraction of my original pdf archives here: https://1drv.ms/u/s!BJPuttHgFDRRgY1myLNkrZm5G__Rlg?e=MIiQ9K

 

I did some small cleaning with this code:

 

let
Fonte = Folder.Files(path to onedrive folder),
#"FiltraPDF, APRM e MULTAS OUTORG" = Table.SelectRows(Fonte, each Text.EndsWith([Name], "pdf") and not Text.Contains([Name], "APRM") and not Text.Contains([Name], "MULT")),
#"Personalização Adicionada" = Table.AddColumn(#"FiltraPDF, APRM e MULTAS OUTORG", "Personalizar", each Pdf.Tables([Content])),
#"Personalizar Expandido" = Table.ExpandTableColumn(#"Personalização Adicionada", "Personalizar", {"Id", "Name", "Kind", "Data"}, {"Id", "Name.1", "Kind", "Data"}),
#"Linhas Filtradas3" = Table.SelectRows(#"Personalizar Expandido", each ([Kind] = "Table")),
#"Personalizar.Data Expandido" = Table.ExpandTableColumn(#"Linhas Filtradas3", "Data", {"Column1", "Column2", "Column3", "Column4", "Column5"}, {"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5"}),
#"Removed Other Columns" = Table.SelectColumns(#"Personalizar.Data Expandido",{"Name", "Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each ([Data.Column3] <> "SALDO ANTERIOR" and [Data.Column3] <> "Taxa de Administração - Agente Financeiro Cobrança ref. Mês Anterior" and [Data.Column3] <> "TOTAL") and ([Data.Column2] <> "SALDO ANTERIOR" and [Data.Column2] <> "Taxa de Administração - Agente Financeiro Cobrança ref. Mês Anterior" and [Data.Column2] <> "TOTAL" and [Data.Column2] <> "^{( - )} Taxa de Administração - Agente Financeiro Cobrança ref. Mês Anterior") and ([Data.Column1] <> "TOTAL"))
in
#"Filtered Rows"

 

As i said in the original topic, although those pdf are table structured, they dont follow the exactly same structure.

 

i.e. Data.Colum3, Data.Colum4 e Data.Colum5 have financial values, although only one column of this type is desirable.

 

Also some columns were split in, but should have been merged:

 

brunofds_1-1621002792011.png

 

But i cant just merge them because there are values in Data.Column3, so merging them would result in a undesirable results.

 

 

 

OUTPUT DESIRED (just a small fraction as example!😞

 

CBHAnoMesAtributeData Taxa Agente FinanceiroData SaldoR$
ALPA20212 null28/02/2021136613632
ALPA20213 null28/02/2021136613632
AP20212 null28/02/2021552702984
AP20213 null28/02/2021552702984
ALPA20211 01/02/2021null214775
AP20211 01/02/2021null875524
ALPA20203Lançamentos à Créditonullnullnull
ALPA20203( + )Rendas de Aplicações Financeiras - Cobrançanullnull45935
ALPA20203( + )Aportes Secretarianullnull0
ALPA20203( + )Rendimento Repassado pelo Tomadornullnull0
ALPA20203( + )Devolução de parcelas - Contratos Não Reembolsáveisnullnull0
ALPA20203( + )Pagamento de parcelas - Contratos com Retornonullnull0
ALPA20203( + )Crédito de Cobrançanullnull2428
ALPA20203Lançamentos à Débitonullnullnull
ALPA20203Liberação de parcelas para contratos não reembolsáveisnullnull0
ALPA20203Liberação de parcelas para contratos reembolsáveisnullnull0
ALPA20203Repasses para Cobertura de Custos Operacionaisnullnull0
ALPA20203Repasses sobre Valores Arrecadadosnullnull0
ALPA20203Restituição de valores cobrados pelo uso da águanullnull0
ALPA20203Resgate para transferência ao DAEE ref.Pagto. de tarifas de cobrançanullnull0

 

 

The first 3 columnns are base on the PDF archive name :

RELATORIO COBRANCA ALPA_2020_03.pdf

ALPA = CBH

2020 = ANO  (YEAR)

03 = MES (MONTH) 

 

Data Taxa Agente Financeiro

Based on rows with "Taxa de Administração do Agente Financeiro cobrada em".  Note that are rows without the space between "em" and the date itself, like in the last examples below.

 

 

brunofds_0-1621001920137.png

 

 

 

Data Saldo

Based on rows with "SALDO EM 'dd/mm/yyyy'". Those values are also presented in more than one column

 

Atribute

Are the main atributes, which i'll do the calculation with the "R$" column

 

R$

The values itself, expressed in brazilian reais (BRL)

 

_____________

 

I thought i had completed this task within this code:

let
Fonte = Folder.Files("one drive link"),
#"FiltraPDF, APRM e MULTAS OUTORG" = Table.SelectRows(Fonte, each Text.EndsWith([Name], "pdf") and not Text.Contains([Name], "APRM") and not Text.Contains([Name], "MULT")),
#"Personalização Adicionada" = Table.AddColumn(#"FiltraPDF, APRM e MULTAS OUTORG", "Personalizar", each Pdf.Tables([Content])),
#"Personalizar Expandido" = Table.ExpandTableColumn(#"Personalização Adicionada", "Personalizar", {"Id", "Name", "Kind", "Data"}, {"Personalizar.Id", "Personalizar.Name", "Personalizar.Kind", "Personalizar.Data"}),
#"Linhas Filtradas3" = Table.SelectRows(#"Personalizar Expandido", each ([Personalizar.Kind] = "Table")),
#"Personalizar.Data Expandido" = Table.ExpandTableColumn(#"Linhas Filtradas3", "Personalizar.Data", {"Column1", "Column2", "Column3", "Column4", "Column5"}, {"Personalizar.Data.Column1", "Personalizar.Data.Column2", "Personalizar.Data.Column3", "Personalizar.Data.Column4", "Personalizar.Data.Column5"}),
#"PADRAO ""COBRANCA"" NO NOME" = Table.ReplaceValue(#"Personalizar.Data Expandido","COBR_","COBRANCA ",Replacer.ReplaceText,{"Name"}),
#"PADRAO ""COBRANCA"" NO NOME2" = Table.ReplaceValue(#"PADRAO ""COBRANCA"" NO NOME","COBRANÇA","COBRANCA",Replacer.ReplaceText,{"Name"}),
#"Removed Other Columns" = Table.SelectColumns(#"PADRAO ""COBRANCA"" NO NOME2",{"Name", "Personalizar.Data.Column1", "Personalizar.Data.Column2", "Personalizar.Data.Column3", "Personalizar.Data.Column4", "Personalizar.Data.Column5"}),
#"Filtra linhas inúteis" = Table.SelectRows(#"Removed Other Columns", each ([Personalizar.Data.Column2] <> "SALDO ANTERIOR" and [Personalizar.Data.Column2] <> "TOTAL") and ([Personalizar.Data.Column1] <> "AG. 1897-X - SETOR PÚBLICO SÃO PAULO / MSE" and [Personalizar.Data.Column1] <> "FEHIDRO COBRANÇA" and [Personalizar.Data.Column1] <> "RESUMO DA MOVIMENTAÇÃO" and [Personalizar.Data.Column1] <> "TOTAL" and [Personalizar.Data.Column1] <> "[image]")),
Custom3 = #"Filtra linhas inúteis",
Custom2 = Table.ColumnNames(Custom3),
#"Troca Null por Blank" = Table.ReplaceValue(Custom3,null,"",Replacer.ReplaceValue,Custom2),
Custom1 = Table.AddColumn(#"Troca Null por Blank", "CreateNewTable", each [1 = if Text.StartsWith([Personalizar.Data.Column1],"Taxa") then [Personalizar.Data.Column1] else [Personalizar.Data.Column2],
2 = if Text.StartsWith([Personalizar.Data.Column1],"Taxa") then [Personalizar.Data.Column2] else [Personalizar.Data.Column3],
3= if Text.StartsWith([Personalizar.Data.Column1],"Taxa") then [Personalizar.Data.Column3] else [Personalizar.Data.Column4],
4 = if Text.StartsWith([Personalizar.Data.Column1],"Taxa") then [Personalizar.Data.Column4] else [Personalizar.Data.Column5]
]),
#"Expanded CreateNewTable" = Table.ExpandRecordColumn(Custom1, "CreateNewTable", {"1", "2", "3", "4"}, {"CreateNewTable.1", "CreateNewTable.2", "CreateNewTable.3", "CreateNewTable.4"}),
#"Removed Other Columns2" = Table.SelectColumns(#"Expanded CreateNewTable",{"Name", "CreateNewTable.1", "CreateNewTable.2", "CreateNewTable.3", "CreateNewTable.4"}),
#"Merged Columns1" = Table.CombineColumns(#"Removed Other Columns2",{"CreateNewTable.3", "CreateNewTable.4"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged")

//= Table.AddColumn(Source, "CreateNewTable", each [FirstColumn = if Text.StartsWith([A],"Taxa de Administração do Agente Financeiro cobrada em") then [A] else [B],
//SecondColumns = if Text.StartsWith([A],"Taxa de Administração do Agente Financeiro cobrada em") then [B] else [C]])
,
#"É numero?" = Table.AddColumn(#"Merged Columns1", "isnumber?", each try Number.From([Merged]) otherwise "", type number),
#"eh texto?" = Table.AddColumn(#"É numero?", "isText", each try Text.Contains([CreateNewTable.2], "a") otherwise ""),
#"eh texto 2" = Table.AddColumn(#"eh texto?", "isText2", each if ([isText]) = true then [CreateNewTable.1] & "" & "" & [CreateNewTable.2] else [CreateNewTable.1], type text),
#"Removed Other Columns1" = Table.SelectColumns(#"eh texto 2",{"Name", "isnumber?", "isText2"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Other Columns1",{"isText2", "isnumber?"}),
#"Renamed Columns" = Table.RenameColumns(#"Reordered Columns",{ {"isText2", "Atributo"}, {"isnumber?", "R$"}}),
#"Tira o .PDF" = Table.TransformColumns(#"Renamed Columns", {{"Name", each Text.BeforeDelimiter(_, ".pdf"), type text}}),
#"Duplicated Column" = Table.DuplicateColumn(#"Tira o .PDF", "Name", "Name - Copy"),
#"Extracted Last Characters" = Table.TransformColumns(#"Duplicated Column", {{"Name - Copy", each Text.End(_, 7), type text}}),
#"Filtered Rows" = Table.SelectRows(#"Extracted Last Characters", each true),
#"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows", "Name - Copy", Splitter.SplitTextByDelimiter("_", QuoteStyle.Csv), {"Ano", "Mes"}),
#"Extracted Text Between Delimiters" = Table.TransformColumns(#"Split Column by Delimiter", {{"Name", each Text.BetweenDelimiters(_, " ", "_", 1, 0), type text}}),
#"Data Saldo" = Table.SplitColumn(#"Extracted Text Between Delimiters", "Atributo", Splitter.SplitTextByDelimiter("SALDO EM ", QuoteStyle.Csv), {"Atributo", "Data Saldo"}),
#"Data Taxa AF" = Table.SplitColumn(#"Data Saldo", "Atributo", Splitter.SplitTextByDelimiter("Taxa de Administração do Agente Financeiro cobrada em ", QuoteStyle.None), {"Atributo", "Data Taxa AF"}),
#"Renamed Columns1" = Table.RenameColumns(#"Data Taxa AF",{{"Name", "CBH"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns1",{{"CBH", type text}, {"Atributo", type text}, {"Data Taxa AF", type date}, {"Data Saldo", type date}, {"R$", type number}, {"Ano", Int64.Type}, {"Mes", Int64.Type}}),
#"Filtra Tx Adm Mes Anterior" = Table.SelectRows(#"Changed Type", each ([Atributo] <> "( - )Taxa de Administração - Agente Financeiro Cobrança ref. Mês Anterior" and [Atributo] <> "Taxa de Administração - Agente Financeiro Cobrança ref. Mês Anterior" and [Atributo] <> "^{( - )} Taxa de Administração - Agente Financeiro Cobrança ref. Mês Anterior"))
in
#"Filtra Tx Adm Mes Anterior"

 

 

but i still got values (in BRL) in my Atributo Column:

brunofds_2-1621003126856.png

 

 

 

I glad appreciate any help provided.

 

 

Cheers

 

8 REPLIES 8
brunofds
Helper I
Helper I

@ImkeF  You helped me once, i summon you, power query queen 👑 !

 

 

Sorry, but I'm too busy right now and cannot pick this up.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Thanks anyhow. I've learned a lot from posts/blog.

watkinnc
Super User
Super User

Here you are.  This will work for all of the pdfs in your folder:

let
Source = Pdf.Tables(File.Contents("xxxxxxxxxxxxxxxxxxxxx.pdf"), [Implementation="1.2"]),
#"Filtered Rows" = Table.SelectRows(Source, each [Kind] = "Table"),
#"Added Custom1" = Table.AddColumn(#"Filtered Rows", "Columns", each Table.ColumnCount([Data])),
#"Sorted Rows" = Table.Sort(#"Added Custom1",{{"Columns", Order.Descending}}),
#"Removed Other Columns1" = Table.SelectColumns(#"Sorted Rows",{"Id", "Data"}),
#"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns1", "Data", {"Column1", "Column2", "Column3", "Column4"}, {"Column1", "Column2", "Column3", "Column4"}),
#"Sorted Rows1" = Table.Sort(#"Expanded Data",{{"Id", Order.Ascending}}),
#"Removed Columns" = Table.RemoveColumns(#"Sorted Rows1",{"Id"})
in
#"Removed Columns"

 

watkinnc_0-1621426549200.png

 

--Nate


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

Hi Nate,

 

Still not working. 

I need something to deal with a folder path, not a file, hence 'File.Contents' it's not the best alternative since there will be many pdf files.

 

Anyhow, i've tested with your query, and the problems are still appearing.

 

Check for instance with this file: RELATÓRIO COBR_BPG_2020_01.pdf

 

Captura de tela 2021-05-19 101008.png

 

Red line circle: can't have values in the same colum as my atributes

Blue circle: SALDO EM with the date

Green line: "Taxa de administração..." is an atributte. Should be in the correct column, as i said earlier..

 

 

and this happened checking only one file... probably there would be many more when considering the other files, since the pdfs are unstructured.

 

 

Glad appreciatte your efforts, though

brunofds
Helper I
Helper I

Anyone ?  This is driving me nuts!

 

Thanks...

watkinnc
Super User
Super User

I would actually improve upon the first solution to make it more dynamic:

NewColumn1= Table.AddColumn(PriorStep, "Fixed", each if [A] = null then [B] else [A]),

NewColumn2 = Table.AddColumn(NewColumn1, "Fixed2", each if [C] = null then [B] else [C])

Then remove your original columns.

As far as your table splitting issue, I would add a space after each em by using replace values "em", "em ". That way you can split by the last space instead of by that whole text string. 

--Nate

 

 


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

Hi there.

 

I'm afraid this solution did not work as expected. There are still "values" fields (R$) where it doesnt suppose to be....

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors