Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
pade
Advocate III
Advocate III

How to query only updated rows and merge with historical data

Can I merge updated rows in a SQL table with historical data?

 

I have a SQL table in an event system. Data of interest are 10 years back in time. To decrease the load of the SQL DB, I would like to only read the modified data each day, and merge that data with a query that is update less frequently (e.g once per year). My data might look like below for some years ago:

  

id  open        last_mod    custom
== ========== ========== ======
1 2010-01-01 2010-01-01 a
2   2010-01-02  2010-01-02  a
3   2010-01-03  2010-01-03  a

 

 The next day, the data might look like below:

id  open        last_mod    custom
==  ==========  ==========  ======
1   2010-01-01  2010-01-01  a
2   2010-01-02  2017-04-04  b
3   2010-01-03  2010-01-03  a
4   2017-04-04  2017-04-04  a

I.E. now row-id 4 has been added and row-id 2 has changed from "a" to "b" in the custom column. This is detectable in the last_mod column as well

 

Is it possible to combine two queries where Q1 reads all the data, and Q2 reads the updated data?

 

To increase the readability of this, I have added example of underlaying SQL query codes

 

Q1: Configured to not be included in refresh

let Q1 = Sql.Database("Server", "DB", 
[Query="SELECT * FROM Table AS Q1 where [last_mod] <= '2017-01-01'"]) in Q1

Q2: Configured to refresh every time

let Q2 = Sql.Database("Server", "DB", 
      [Query="SELECT * FROM Table AS Q2 where [last_mod] >= '2017-01-01'"])
in Q2

 

6 REPLIES 6
CahabaData
Memorable Member
Memorable Member

to detect updated records, as per your example, inherently requires a compare.  therefore one can't avoid importing all the data.  because you need all the data in order to do the compare.....

 

you would need to do that compare in the SQL Server itself and set up a view - and instead link to that view in order to reduce the import volume to just updated records.

www.CahabaData.com

I don't realy understand why @Greg_Deckler solution doesn't work. Why is data updated for Q1 even though I set it to not automatically update.

Going for that solution would be quite easy using the code below.

Important if using the formula below is to use the new query (Q2 in this case) as the first source to combine since "Distinct" will keep the first item

Table.Distinct(
Table.Combine({Q2, Q1}),
{"id"}
)

But since data is loaded from both Q1 and Q2, this method doesn't work

 

My SQL queries looks something like this:

Q1: Configured to not be included in refresh

let Q1 = Sql.Database("Server", "DB", 
[Query="SELECT * FROM Table AS Q1 where [last_mod] <= '2017-01-01'"]) in Q1

Q2: Configured to refresh every time

let Q2 = Sql.Database("Server", "DB", 
      [Query="SELECT * FROM Table AS Q2 where [last_mod] >= '2017-01-01'"])
in Q2

 

@pade

 

In Query Editor, the M query can only return one dataset. You can't have two "let - in" clause. You may write your Power Query like:

 

let
    Source1 = Sql.Database("servername", "database", [Query="select * from Table where start_time > '2016/12/13'"]),
    Source2 = Sql.Database("servername", "database", [Query="select * from Table where start_time <= '2016/12/13'"]),
    #"Appended Query" = Table.Combine({Source1, Source2})

in
    #"Appended Query"

 

 And you have to refresh both queries to avoid duplicate rows on updated rows. Since you have a big dataset with 10 years data, I suggest you use Direct Query Mode.

 

Regards,

Hi @v-sihou-msft, Sorry for missleading you. There are two separate queries, I have updated my previous message to make this more clear.

Greg_Deckler
Super User
Super User

You should be able to create Q1, import it and set it to not update. Then create Q2 and leave it to the default to update. Then you can create an Append query for Q1 and Q2 and it should work just as you describe.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...

That doesn't work @Greg_Deckler. Appending the two queries creates duplicates of the rows that has been changed. And as I understand, the Append command doesn't reuse already loaded data in the other queries, it re-queries the data. I did configure as you suggested, but could confirm checking network activity that Append re-loaded data Q2 as expected, but then also for Append1 where all Q1 data was re-loaded (under the name as Append1) 

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors