Reply
Regular Visitor
Posts: 24
Registered: ‎07-01-2016

How to query only updated rows and merge with historical data

[ Edited ]

Can I merge updated rows in a SQL table with historical data?

 

I have a SQL table in an event system. Data of interest are 10 years back in time. To decrease the load of the SQL DB, I would like to only read the modified data each day, and merge that data with a query that is update less frequently (e.g once per year). My data might look like below for some years ago:

  

id  open        last_mod    custom
== ========== ========== ======
1 2010-01-01 2010-01-01 a
2   2010-01-02  2010-01-02  a
3   2010-01-03  2010-01-03  a

 

 The next day, the data might look like below:

id  open        last_mod    custom
==  ==========  ==========  ======
1   2010-01-01  2010-01-01  a
2   2010-01-02  2017-04-04  b
3   2010-01-03  2010-01-03  a
4   2017-04-04  2017-04-04  a

I.E. now row-id 4 has been added and row-id 2 has changed from "a" to "b" in the custom column. This is detectable in the last_mod column as well

 

Is it possible to combine two queries where Q1 reads all the data, and Q2 reads the updated data?

 

To increase the readability of this, I have added example of underlaying SQL query codes

 

Q1: Configured to not be included in refresh

let Q1 = Sql.Database("Server", "DB", 
[Query="SELECT * FROM Table AS Q1 where [last_mod] <= '2017-01-01'"]) in Q1

Q2: Configured to refresh every time

let Q2 = Sql.Database("Server", "DB", 
      [Query="SELECT * FROM Table AS Q2 where [last_mod] >= '2017-01-01'"])
in Q2

 

Super Contributor
Posts: 4,598
Registered: ‎07-11-2015

Re: How to query only updated rows and merge with historical data

You should be able to create Q1, import it and set it to not update. Then create Q2 and leave it to the default to update. Then you can create an Append query for Q1 and Q2 and it should work just as you describe.

Regular Visitor
Posts: 24
Registered: ‎07-01-2016

Re: How to query only updated rows and merge with historical data

That doesn't work @smoupre. Appending the two queries creates duplicates of the rows that has been changed. And as I understand, the Append command doesn't reuse already loaded data in the other queries, it re-queries the data. I did configure as you suggested, but could confirm checking network activity that Append re-loaded data Q2 as expected, but then also for Append1 where all Q1 data was re-loaded (under the name as Append1) 

New Contributor
Posts: 463
Registered: ‎08-03-2016

Re: How to query only updated rows and merge with historical data

to detect updated records, as per your example, inherently requires a compare.  therefore one can't avoid importing all the data.  because you need all the data in order to do the compare.....

 

you would need to do that compare in the SQL Server itself and set up a view - and instead link to that view in order to reduce the import volume to just updated records.

Highlighted
Regular Visitor
Posts: 24
Registered: ‎07-01-2016

Re: How to query only updated rows and merge with historical data

[ Edited ]

I don't realy understand why @smoupre solution doesn't work. Why is data updated for Q1 even though I set it to not automatically update.

Going for that solution would be quite easy using the code below.

Important if using the formula below is to use the new query (Q2 in this case) as the first source to combine since "Distinct" will keep the first item

Table.Distinct(
Table.Combine({Q2, Q1}),
{"id"}
)

But since data is loaded from both Q1 and Q2, this method doesn't work

 

My SQL queries looks something like this:

Q1: Configured to not be included in refresh

let Q1 = Sql.Database("Server", "DB", 
[Query="SELECT * FROM Table AS Q1 where [last_mod] <= '2017-01-01'"]) in Q1

Q2: Configured to refresh every time

let Q2 = Sql.Database("Server", "DB", 
      [Query="SELECT * FROM Table AS Q2 where [last_mod] >= '2017-01-01'"])
in Q2

 

Moderator
Posts: 1,937
Registered: ‎03-06-2016

Re: How to query only updated rows and merge with historical data

@pade

 

In Query Editor, the M query can only return one dataset. You can't have two "let - in" clause. You may write your Power Query like:

 

let
    Source1 = Sql.Database("servername", "database", [Query="select * from Table where start_time > '2016/12/13'"]),
    Source2 = Sql.Database("servername", "database", [Query="select * from Table where start_time <= '2016/12/13'"]),
    #"Appended Query" = Table.Combine({Source1, Source2})

in
    #"Appended Query"

 

 And you have to refresh both queries to avoid duplicate rows on updated rows. Since you have a big dataset with 10 years data, I suggest you use Direct Query Mode.

 

Regards,

Regular Visitor
Posts: 24
Registered: ‎07-01-2016

Re: How to query only updated rows and merge with historical data

Hi @Simon_Hou-MSFT, Sorry for missleading you. There are two separate queries, I have updated my previous message to make this more clear.