Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
Asantos2020
Advocate II
Advocate II

Remove duplicates, keeping the latest version of txt file for reporting

Hi guys!

Here's a good one for you, since I can't seem to find the solution for it:

 

Power BI connects to a folder, where the data is made available in txt format. In order to have high quality data, the provider generates the data as follows:

 

SourceName   Chain      Date      Product   Qty

file01               Retail A   08/12     Item 1     5

file01               Retail A   08/12     Item 2     2

file01               Retail B   08/12     Item 1     3

file01               Retail C   08/12     Item 2     5

file02              Retail A    08/13     Item 1    10

file02              Retail A    08/13     Item 2    10

file02              Retail B    08/13     Item 3    5

file02              Retail C    08/13     Item 2    10

file02              Retail A    08/12     Item 1    10 (replacing row 1)

file02              Retail A    08/12     Item 2    6 (replacing row 2)

 

Currently, Power BI is reading these files and summing items sold by chain over a time series, while it should be discarding duplicates and considering only the latest dataset received.

 

I can't figure out where I should identify the latest version to be considered and keep its data for reporting.

 

Any help is greatly appreciated.

 

Regards,

Antonio Santos

 

 

 

1 ACCEPTED SOLUTION
Zubair_Muhammad
Community Champion
Community Champion

Hi,

 

Try this.

Please see attached file for steps as well.

 

Basically I grouped by Date,Chain and Product and then used Table.Max to get SourceName with Latestversion

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSsvMSTUwVNJRCkotSczMUXAEMg2NdA0sdA0tgUzPktRcBZC0qVKsDnGqjYAMI6yqnXCYbYxVtTMOs5FcYoTmEmNMsw0NiFZuhFu5Exblxjjd4kya4biCnATlIGkzpdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [SourceName = _t, Chain = _t, Date = _t, Product = _t, Qty = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"SourceName", type text}, {"Chain", type text}, {"Date", type text}, {"Product", type text}, {"Qty", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Chain", "Date", "Product"}, {{"ALL", each _, type table [SourceName=text, Chain=text, Date=text, Product=text, Qty=number]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Max([ALL],"SourceName")),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"ALL"}),
    #"Expanded Custom" = Table.ExpandRecordColumn(#"Removed Columns", "Custom", {"SourceName", "Qty"}, {"SourceName", "Qty"})
in
    #"Expanded Custom"

Regards
Zubair

Please try my custom visuals

View solution in original post

2 REPLIES 2
Zubair_Muhammad
Community Champion
Community Champion

Hi,

 

Try this.

Please see attached file for steps as well.

 

Basically I grouped by Date,Chain and Product and then used Table.Max to get SourceName with Latestversion

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSsvMSTUwVNJRCkotSczMUXAEMg2NdA0sdA0tgUzPktRcBZC0qVKsDnGqjYAMI6yqnXCYbYxVtTMOs5FcYoTmEmNMsw0NiFZuhFu5Exblxjjd4kya4biCnATlIGkzpdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [SourceName = _t, Chain = _t, Date = _t, Product = _t, Qty = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"SourceName", type text}, {"Chain", type text}, {"Date", type text}, {"Product", type text}, {"Qty", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Chain", "Date", "Product"}, {{"ALL", each _, type table [SourceName=text, Chain=text, Date=text, Product=text, Qty=number]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Max([ALL],"SourceName")),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"ALL"}),
    #"Expanded Custom" = Table.ExpandRecordColumn(#"Removed Columns", "Custom", {"SourceName", "Qty"}, {"SourceName", "Qty"})
in
    #"Expanded Custom"

Regards
Zubair

Please try my custom visuals

Hello @Zubair_Muhammad !

 

Thanks for your response.

I've been trying to do the grouping, but without success so far, as the PC runs out of memory, trying to processo over 2mm rows.

This is an i7, though.

 

Will provide feedback as to any progresses here.

 

Cheers,

Antonio Santos

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel1

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.