Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
Asantos2020
Advocate II
Advocate II

Remove duplicates, keeping the latest version of txt file for reporting

Hi guys!

Here's a good one for you, since I can't seem to find the solution for it:

 

Power BI connects to a folder, where the data is made available in txt format. In order to have high quality data, the provider generates the data as follows:

 

SourceName   Chain      Date      Product   Qty

file01               Retail A   08/12     Item 1     5

file01               Retail A   08/12     Item 2     2

file01               Retail B   08/12     Item 1     3

file01               Retail C   08/12     Item 2     5

file02              Retail A    08/13     Item 1    10

file02              Retail A    08/13     Item 2    10

file02              Retail B    08/13     Item 3    5

file02              Retail C    08/13     Item 2    10

file02              Retail A    08/12     Item 1    10 (replacing row 1)

file02              Retail A    08/12     Item 2    6 (replacing row 2)

 

Currently, Power BI is reading these files and summing items sold by chain over a time series, while it should be discarding duplicates and considering only the latest dataset received.

 

I can't figure out where I should identify the latest version to be considered and keep its data for reporting.

 

Any help is greatly appreciated.

 

Regards,

Antonio Santos

 

 

 

1 ACCEPTED SOLUTION
Zubair_Muhammad
Community Champion
Community Champion

Hi,

 

Try this.

Please see attached file for steps as well.

 

Basically I grouped by Date,Chain and Product and then used Table.Max to get SourceName with Latestversion

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSsvMSTUwVNJRCkotSczMUXAEMg2NdA0sdA0tgUzPktRcBZC0qVKsDnGqjYAMI6yqnXCYbYxVtTMOs5FcYoTmEmNMsw0NiFZuhFu5Exblxjjd4kya4biCnATlIGkzpdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [SourceName = _t, Chain = _t, Date = _t, Product = _t, Qty = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"SourceName", type text}, {"Chain", type text}, {"Date", type text}, {"Product", type text}, {"Qty", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Chain", "Date", "Product"}, {{"ALL", each _, type table [SourceName=text, Chain=text, Date=text, Product=text, Qty=number]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Max([ALL],"SourceName")),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"ALL"}),
    #"Expanded Custom" = Table.ExpandRecordColumn(#"Removed Columns", "Custom", {"SourceName", "Qty"}, {"SourceName", "Qty"})
in
    #"Expanded Custom"

Regards
Zubair

Please try my custom visuals

View solution in original post

2 REPLIES 2
Zubair_Muhammad
Community Champion
Community Champion

Hi,

 

Try this.

Please see attached file for steps as well.

 

Basically I grouped by Date,Chain and Product and then used Table.Max to get SourceName with Latestversion

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSsvMSTUwVNJRCkotSczMUXAEMg2NdA0sdA0tgUzPktRcBZC0qVKsDnGqjYAMI6yqnXCYbYxVtTMOs5FcYoTmEmNMsw0NiFZuhFu5Exblxjjd4kya4biCnATlIGkzpdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [SourceName = _t, Chain = _t, Date = _t, Product = _t, Qty = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"SourceName", type text}, {"Chain", type text}, {"Date", type text}, {"Product", type text}, {"Qty", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Chain", "Date", "Product"}, {{"ALL", each _, type table [SourceName=text, Chain=text, Date=text, Product=text, Qty=number]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Max([ALL],"SourceName")),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"ALL"}),
    #"Expanded Custom" = Table.ExpandRecordColumn(#"Removed Columns", "Custom", {"SourceName", "Qty"}, {"SourceName", "Qty"})
in
    #"Expanded Custom"

Regards
Zubair

Please try my custom visuals

Hello @Zubair_Muhammad !

 

Thanks for your response.

I've been trying to do the grouping, but without success so far, as the PC runs out of memory, trying to processo over 2mm rows.

This is an i7, though.

 

Will provide feedback as to any progresses here.

 

Cheers,

Antonio Santos

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.