Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Mann
Resolver III
Resolver III

Consuming only updated records in Power query

Hey Guys,
I need to put a logic in Power query to consume only updated records from source. Details are:

I am using Azure blob storage as a source to my Power BI report. Data in blob is coming in form of files. Like x files for Table A. Every time when the file is generated in blob it has last x days of data in it. I only need to ingest the latest record for ID column based on ETL date to avoid duplicates.

As shown for Day1 and Day2, I am getting respective files for Table A with last two days of data.

 

Input files.PNG

 

 

 

 

 

 

 

 

Following should be the records coming in Power BI Model for Table A for these two days when refresh is done on Day 2:

 

Table A in Model.PNG

 

 

 

 

 

 

Thanks.

2 ACCEPTED SOLUTIONS
v-juanli-msft
Community Support
Community Support

Hi @Mann 

After refreshing, two files append in one table.

11.png

Ctrl+click on "ID","Name","Date", select Add Column->merge column

Then make a copy of this query->azure (2)

 

In "azure (2)",

Group by

12.png

 

In "azure", merge queries from "azure(2)" based on "Merged" column,

expand "last date" column,

Add a condition column

13.png

 

then filter "condition" column to remove blank.

14.png

 

Best Regards
Maggie

 

Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

Hi Maggie,

 

Thanks for the solution. This is great! Using this solution results into creating duplicate queries and we are dealling with huge volume of data here so I did following steps to manage it in one query:

#"Invoke Custom Function1" = Table.AddColumn(#"Removed Other Columns", "Transform File from Table A", each #"Transform File from Table A"([Content])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom Function1", {"Transform File from Table A"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from Table A", Table.ColumnNames(#"Transform File from Table A"(#"Sample File"))),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"ID", Int64.Type}, {"Name", type text}, {"Date", type text}, {"Work Status", type text}, {"ETL Date", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"ID", "Date"}, {{"All Data", each _, type table [ID=number, Name=text, Date=text, Work Status=text, ETL Date=text]}, {"Max Date", each List.Max([ETL Date]), type text}}),
    #"Expanded All Data" = Table.ExpandTableColumn(#"Grouped Rows", "All Data", {"Name", "Work Status", "ETL Date"}, {"All Data.Name", "All Data.Work Status", "All Data.ETL Date"}),
    #"Filtered Rows1" = Table.SelectRows(#"Expanded All Data", each [All Data.ETL Date] = [Max Date]),
    #"Renamed Columns" = Table.RenameColumns(#"Filtered Rows1",{{"All Data.Name", "Name"}, {"All Data.Work Status", "Work Status"}, {"All Data.ETL Date", "ETL Date"}})

I am grouping the needed columns and then expanding the data. After this I am filtering it with Max Date variable.

This also works as expected.

 

Do you know which out of these two solution is best in terms of performance?

Thanks.

View solution in original post

6 REPLIES 6
v-juanli-msft
Community Support
Community Support

Hi @Mann 

After refreshing, two files append in one table.

11.png

Ctrl+click on "ID","Name","Date", select Add Column->merge column

Then make a copy of this query->azure (2)

 

In "azure (2)",

Group by

12.png

 

In "azure", merge queries from "azure(2)" based on "Merged" column,

expand "last date" column,

Add a condition column

13.png

 

then filter "condition" column to remove blank.

14.png

 

Best Regards
Maggie

 

Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

This is awesome Maggie, this is exactly what I was looking for. 🙂

 

Regards

 

David





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




Hi Maggie,

 

Thanks for the solution. This is great! Using this solution results into creating duplicate queries and we are dealling with huge volume of data here so I did following steps to manage it in one query:

#"Invoke Custom Function1" = Table.AddColumn(#"Removed Other Columns", "Transform File from Table A", each #"Transform File from Table A"([Content])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom Function1", {"Transform File from Table A"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from Table A", Table.ColumnNames(#"Transform File from Table A"(#"Sample File"))),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"ID", Int64.Type}, {"Name", type text}, {"Date", type text}, {"Work Status", type text}, {"ETL Date", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"ID", "Date"}, {{"All Data", each _, type table [ID=number, Name=text, Date=text, Work Status=text, ETL Date=text]}, {"Max Date", each List.Max([ETL Date]), type text}}),
    #"Expanded All Data" = Table.ExpandTableColumn(#"Grouped Rows", "All Data", {"Name", "Work Status", "ETL Date"}, {"All Data.Name", "All Data.Work Status", "All Data.ETL Date"}),
    #"Filtered Rows1" = Table.SelectRows(#"Expanded All Data", each [All Data.ETL Date] = [Max Date]),
    #"Renamed Columns" = Table.RenameColumns(#"Filtered Rows1",{{"All Data.Name", "Name"}, {"All Data.Work Status", "Work Status"}, {"All Data.ETL Date", "ETL Date"}})

I am grouping the needed columns and then expanding the data. After this I am filtering it with Max Date variable.

This also works as expected.

 

Do you know which out of these two solution is best in terms of performance?

Thanks.

Hi @Mann 

I think your solution may be better for performance.

 

For my solution, there are duplicate queries.

You could click on the query, uncheck "enable load", so that this query will not be added in the data model.

It may improve the performance.

 

There are some tips to speed the performance in Power Query

Speeding up power query

Performance tip for List.Generate (1): Buffer your tables in Power BI and Power Query

Table.Buffer for cashing intermediate query results or how workaround Unnecessary Queries Issue

 

Best Regards
Maggie

 

Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

v-juanli-msft
Community Support
Community Support

Hi @Mann 

Before, your data source has "file1" (Table A),

Now a new file "file2" (Table B) is added to your data source,

You click on "refresh" button from Power BI, but only want the updated data like the last screenshot you post here.

Right?

 

Best Regards
Maggie

Hi Maggie,

These two files are for same Table A. Process is: On Day1 File 1 is present in Blob storage and then Power Bi is refreshed. Then On Day2 File 1 and File 2 both are present and then Power BI is refreshed. So files will keep adding in the source blob container everyday.
Requirement is to consume all files present in source whenever Power Bi is refreshed. Since some updates are coming in the files. Power BI should load only the updates as per ETL date as I have shown.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.