Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hi,
I have a bunch, quite a lot, of csv files that have the annoying property that the date of the registers is only in one value of the first row of each file (and in the filename). As an example, 20200101.csv might look like this:
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 |
1 | 2 | 3 | 4 | 5 | 01/01/2020 | 7 | 8 |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
I've been able to load them in bulk using the "load from folder" thing in PowerQuery inside PowerBI, and now what I've got in the Power Query tool is something similar to this:
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 |
1 | 2 | 3 | 4 | 5 | 01/01/2020 | 7 | 8 |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
1 | 2 | 3 | 4 | 5 | 02/01/2020 | 7 | 8 |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
1 | 2 | 3 | 4 | 5 | 03/01/2020 | 7 | 8 |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
1 | 2 | 3 | 4 | 5 | 04/01/2020 | 7 | 8 |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
a | b | c | d | e | f | g | h |
But what I really need to get at the end of the transformation is more like this:
Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 |
a | b | c | d | e | f | g | h | 01/01/2020 |
a | b | c | d | e | f | g | h | 01/01/2020 |
a | b | c | d | e | f | g | h | 01/01/2020 |
a | b | c | d | e | f | g | h | 01/01/2020 |
a | b | c | d | e | f | g | h | 02/01/2020 |
a | b | c | d | e | f | g | h | 02/01/2020 |
a | b | c | d | e | f | g | h | 02/01/2020 |
a | b | c | d | e | f | g | h | 02/01/2020 |
a | b | c | d | e | f | g | h | 03/01/2020 |
a | b | c | d | e | f | g | h | 03/01/2020 |
a | b | c | d | e | f | g | h | 03/01/2020 |
a | b | c | d | e | f | g | h | 03/01/2020 |
a | b | c | d | e | f | g | h | 04/01/2020 |
a | b | c | d | e | f | g | h | 04/01/2020 |
a | b | c | d | e | f | g | h | 04/01/2020 |
a | b | c | d | e | f | g | h | 04/01/2020 |
Perhaps it's a really easy thing to do, and I am so sorry if it really is, but I've been looking at all the transformations and options I have been able to find, and I can't figure out how to achieve this.
Any insight?
Thanks in advance.
Josep
Solved! Go to Solution.
Hi @jsanchezm ,
Create a blank query in Query Editor and paste this mcode:
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTICYmMgNgFiUyA2MNQHIiMDIwMgxxyILZRidaKVEoGsJCBOBuIUIE4F4jQgTgfiDDqqwOpko6HnZOOh52STwevkWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Col1 = _t, Col2 = _t, Col3 = _t, Col4 = _t, Col5 = _t, Col6 = _t, Col7 = _t, Col8 = _t]), #"Changed Type" = Table.TransformColumnTypes(Source,{{"Col1", type text}, {"Col2", type text}, {"Col3", type text}, {"Col4", type text}, {"Col5", type text}, {"Col6", type text}, {"Col7", type text}, {"Col8", type text}}), #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each try Date.FromText([Col6]) otherwise null), #"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each if [Custom] = null then [Col6] else null), #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom.2", each if Text.Remove([Col7], {"0".."9"}) = "" then null else Text.Remove([Col7], {"0".."9"})), #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Custom.3", each if Text.Remove([Col8], {"0".."9"}) = "" then null else Text.Remove([Col8], {"0".."9"})), #"Removed Columns" = Table.RemoveColumns(#"Added Custom3",{"Col6", "Col7", "Col8"}), #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Custom", "Col9"}, {"Custom.1", "Col6"}, {"Custom.2", "Col7"}, {"Custom.3", "Col8"}}), #"Reordered Columns" = Table.ReorderColumns(#"Renamed Columns",{"Col1", "Col2", "Col3", "Col4", "Col5", "Col6", "Col7", "Col8", "Col9"}), #"Filled Up" = Table.FillUp(#"Reordered Columns",{"Col6", "Col7", "Col8"}), #"Filled Down" = Table.FillDown(#"Filled Up",{"Col9"}) in #"Filled Down"
If it works, use it on your sample file.
Hi @jsanchezm ,
Create a blank query in Query Editor and paste this mcode:
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTICYmMgNgFiUyA2MNQHIiMDIwMgxxyILZRidaKVEoGsJCBOBuIUIE4F4jQgTgfiDDqqwOpko6HnZOOh52STwevkWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Col1 = _t, Col2 = _t, Col3 = _t, Col4 = _t, Col5 = _t, Col6 = _t, Col7 = _t, Col8 = _t]), #"Changed Type" = Table.TransformColumnTypes(Source,{{"Col1", type text}, {"Col2", type text}, {"Col3", type text}, {"Col4", type text}, {"Col5", type text}, {"Col6", type text}, {"Col7", type text}, {"Col8", type text}}), #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each try Date.FromText([Col6]) otherwise null), #"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each if [Custom] = null then [Col6] else null), #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom.2", each if Text.Remove([Col7], {"0".."9"}) = "" then null else Text.Remove([Col7], {"0".."9"})), #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Custom.3", each if Text.Remove([Col8], {"0".."9"}) = "" then null else Text.Remove([Col8], {"0".."9"})), #"Removed Columns" = Table.RemoveColumns(#"Added Custom3",{"Col6", "Col7", "Col8"}), #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Custom", "Col9"}, {"Custom.1", "Col6"}, {"Custom.2", "Col7"}, {"Custom.3", "Col8"}}), #"Reordered Columns" = Table.ReorderColumns(#"Renamed Columns",{"Col1", "Col2", "Col3", "Col4", "Col5", "Col6", "Col7", "Col8", "Col9"}), #"Filled Up" = Table.FillUp(#"Reordered Columns",{"Col6", "Col7", "Col8"}), #"Filled Down" = Table.FillDown(#"Filled Up",{"Col9"}) in #"Filled Down"
If it works, use it on your sample file.
after a little cleaning
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTICYmMgNgFiUyA2MNQHIiMDIwMgxxyILZRidaKVEoGsJCBOBuIUIE4F4jQgTgfiDDqqwOpko6HnZOOh52STwevkWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Col1 = _t, Col2 = _t, Col3 = _t, Col4 = _t, Col5 = _t, Col6 = _t, Col7 = _t, Col8 = _t]),
dc = Table.DuplicateColumn(Source, "Col8", "Col9"),
grp= Table.Group(dc, "Col1", {"_", (r)=> Table.Skip(Table.TransformColumns(r, {"Col9", each r[Col6]{0}}),1)},GroupKind.Local, (x,y)=>Number.From(x=y)),
#"Expanded _1" = Table.ExpandTableColumn(grp, "_", {"Col2", "Col3", "Col4", "Col5", "Col6", "Col7", "Col8", "Col9"}, {"Col2", "Col3", "Col4", "Col5", "Col6", "Col7", "Col8", "Col9"})
in
#"Expanded _1"
HI!
First of all, I can't really express how grateful I am to you both for your help, I would never have found this solution by myself!
After a bit of tinkering with it (or a little more than a bit...but that's my fault, not yours) I've made it work as needed. The thing is that my sample file was "a little bit" too simplified, and had to get to understand what your steps did to adapt it to work. Actually, I prefer it this way, because I've learned a lot in the works.
Look, this is my final result at the moment:
And this is the query that creates all this mess:
let
Origen = Folder.Files("C:\Users\XXXXXXXXXXXXXX\Desktop\gener"),
#"Files filtrades" = Table.SelectRows(Origen, each Text.EndsWith([Name], ".JOU")),
#"Columnes suprimides" = Table.RemoveColumns(#"Files filtrades",{"Attributes", "Folder Path", "Date created", "Date modified", "Date accessed", "Extension"}),
#"Divideix la columna per delimitador" = Table.SplitColumn(#"Columnes suprimides", "Name", Splitter.SplitTextByDelimiter(".", QuoteStyle.Csv), {"Name.1", "Name.2"}),
#"Tipus canviat" = Table.TransformColumnTypes(#"Divideix la columna per delimitador",{{"Name.1", Int64.Type}, {"Name.2", type text}}),
#"Fitxers ocults filtrats1" = Table.SelectRows(#"Tipus canviat", each [Attributes]?[Hidden]? <> true),
#"Fitxers ocults filtrats2" = Table.SelectRows(#"Fitxers ocults filtrats1", each [Attributes]?[Hidden]? <> true),
#"Invoqueu una funció personalitzada1" = Table.AddColumn(#"Fitxers ocults filtrats2", "Transforma el fitxer (3)", each #"Transforma el fitxer (3)"([Content])),
#"Fitxers ocults filtrats3" = Table.SelectRows(#"Invoqueu una funció personalitzada1", each [Attributes]?[Hidden]? <> true),
#"Invoqueu una funció personalitzada2" = Table.AddColumn(#"Fitxers ocults filtrats3", "Transforma el fitxer (4)", each #"Transforma el fitxer (4)"([Content])),
#"Altres columnes suprimides1" = Table.SelectColumns(#"Invoqueu una funció personalitzada2", {"Transforma el fitxer (4)"}),
#"Columna de taula expandida1" = Table.ExpandTableColumn(#"Altres columnes suprimides1", "Transforma el fitxer (4)", Table.ColumnNames(#"Transforma el fitxer (4)"(#"Fitxer d'exemple (4)"))),
#"Tipus canviat1" = Table.TransformColumnTypes(#"Columna de taula expandida1",{{"Column1", type time}, {"Column2", type text}, {"Column3", Int64.Type}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}}),
#"DupliquemCol4" = Table.DuplicateColumn(#"Tipus canviat1", "Column5", "Column7"),
#"Agrupem01" = Table.Group(#"DupliquemCol4", "Column2", {"_", (r)=> Table.Skip(Table.TransformColumns(r, {"Column7", each r[Column5]{0}}),1)},GroupKind.Local, (x,y)=>Number.From(x=y)),
#"Expandim01" = Table.ExpandTableColumn(#"Agrupem01", "_", {"Column1", "Column3", "Column4", "Column5", "Column6", "Column7"}, {"Column1", "Column3", "Column4", "Column5", "Column6", "Column7"}),
#"Valor substituït" = Table.ReplaceValue(Expandim01,".",",",Replacer.ReplaceText,{"Column5"}),
#"Tipus canviat2" = Table.TransformColumnTypes(#"Valor substituït",{{"Column5", Currency.Type}}),
#"Columnes suprimides1" = Table.RemoveColumns(#"Tipus canviat2",{"Column4", "Column2"}),
#"Divideix la columna per delimitador1" = Table.SplitColumn(#"Columnes suprimides1", "Column7", Splitter.SplitTextByDelimiter(".", QuoteStyle.Csv), {"Column7.1", "Column7.2"}),
#"Tipus canviat3" = Table.TransformColumnTypes(#"Divideix la columna per delimitador1",{{"Column7.1", Int64.Type}, {"Column7.2", type text}}),
#"Columnes suprimides2" = Table.RemoveColumns(#"Tipus canviat3",{"Column7.2", "Column6"}),
#"Columnes reordenades" = Table.ReorderColumns(#"Columnes suprimides2",{"Column7.1", "Column1", "Column3", "Column5"})
in
#"Columnes reordenades"
But, (yes, there's still a "but"), my real files have 10 columns, and I can't see how can I get the program to load all of them, instead of the 6 that it consistently insists in loading. Reading the script, I don't see any limitation in columns that would cause this behaviour.
This is one real datafile content's, to show you what I mean:
11:51:50, H, 132, CDS700:09.12.00_Address:CDP4-00235, 20200101.JOU, #9783
11:51:50, d, 41934, EUR, 188.30, 0.00, 244832, a1:, 3001, #7018
14:29:23, d, 41935, EUR, 345.80, 0.00, 449582, a1:, 3015, #7013
22:26:54, d, 41936, EUR, 299.20, 0.00, 389002, a1:, 3014, #4202
22:28:41, d, 41937, EUR, 140.60, 0.00, 182822, a1:, 3013, #5728
22:41:11, d, 41938, EUR, 49.90, 0.00, 64912, a1:, 3014, #12119
23:15:09, d, 41939, EUR, 101.20, 0.00, 131602, a1:, 3014, #4456
23:32:14, d, 41940, EUR, 418.80, 0.00, 544482, a1:, 3015, #6509
Would it be better to open another thread, or can we consider this is a continuation of this one?
Thanks a lot!
the first row seems have 6 fields:
11:51:50 | H | 132 | CDS700:09.12.00_Address:CDP4-00235 | 20200101.JOU | #9783 |
don't know details about function that load in csv files, but I guess that row determine the amount of columns the file contains.
just an idea:
you could replace "f" with null, then fill-Down the col6 and last filter one of column to exclude rows containing the corrispondent number "1" in col1, "2" in col2, ...
if you need somethink more generale, colud be useful the use of Table.group in this form:
Table.Group(Source, "Col8", {"nnn", (r)=> List.Transform(r[Col6],each r[Col6]{0})},GroupKind.Local, (x,y)=>Number.From(x=y))