Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Can I merge updated rows in a SQL table with historical data?
I have a SQL table in an event system. Data of interest are 10 years back in time. To decrease the load of the SQL DB, I would like to only read the modified data each day, and merge that data with a query that is update less frequently (e.g once per year). My data might look like below for some years ago:
id open last_mod custom
== ========== ========== ======
1 2010-01-01 2010-01-01 a
2 2010-01-02 2010-01-02 a
3 2010-01-03 2010-01-03 a
The next day, the data might look like below:
id open last_mod custom == ========== ========== ====== 1 2010-01-01 2010-01-01 a 2 2010-01-02 2017-04-04 b 3 2010-01-03 2010-01-03 a 4 2017-04-04 2017-04-04 a
I.E. now row-id 4 has been added and row-id 2 has changed from "a" to "b" in the custom column. This is detectable in the last_mod column as well
Is it possible to combine two queries where Q1 reads all the data, and Q2 reads the updated data?
To increase the readability of this, I have added example of underlaying SQL query codes
Q1: Configured to not be included in refresh
let Q1 = Sql.Database("Server", "DB",
[Query="SELECT * FROM Table AS Q1 where [last_mod] <= '2017-01-01'"]) in Q1
Q2: Configured to refresh every time
let Q2 = Sql.Database("Server", "DB", [Query="SELECT * FROM Table AS Q2 where [last_mod] >= '2017-01-01'"]) in Q2
to detect updated records, as per your example, inherently requires a compare. therefore one can't avoid importing all the data. because you need all the data in order to do the compare.....
you would need to do that compare in the SQL Server itself and set up a view - and instead link to that view in order to reduce the import volume to just updated records.
I don't realy understand why @Greg_Deckler solution doesn't work. Why is data updated for Q1 even though I set it to not automatically update.
Going for that solution would be quite easy using the code below.
Important if using the formula below is to use the new query (Q2 in this case) as the first source to combine since "Distinct" will keep the first item
Table.Distinct(
Table.Combine({Q2, Q1}),
{"id"}
)
But since data is loaded from both Q1 and Q2, this method doesn't work
My SQL queries looks something like this:
Q1: Configured to not be included in refresh
let Q1 = Sql.Database("Server", "DB",
[Query="SELECT * FROM Table AS Q1 where [last_mod] <= '2017-01-01'"]) in Q1
Q2: Configured to refresh every time
let Q2 = Sql.Database("Server", "DB", [Query="SELECT * FROM Table AS Q2 where [last_mod] >= '2017-01-01'"]) in Q2
In Query Editor, the M query can only return one dataset. You can't have two "let - in" clause. You may write your Power Query like:
let Source1 = Sql.Database("servername", "database", [Query="select * from Table where start_time > '2016/12/13'"]), Source2 = Sql.Database("servername", "database", [Query="select * from Table where start_time <= '2016/12/13'"]), #"Appended Query" = Table.Combine({Source1, Source2}) in #"Appended Query"
And you have to refresh both queries to avoid duplicate rows on updated rows. Since you have a big dataset with 10 years data, I suggest you use Direct Query Mode.
Regards,
Hi @v-sihou-msft, Sorry for missleading you. There are two separate queries, I have updated my previous message to make this more clear.
You should be able to create Q1, import it and set it to not update. Then create Q2 and leave it to the default to update. Then you can create an Append query for Q1 and Q2 and it should work just as you describe.
That doesn't work @Greg_Deckler. Appending the two queries creates duplicates of the rows that has been changed. And as I understand, the Append command doesn't reuse already loaded data in the other queries, it re-queries the data. I did configure as you suggested, but could confirm checking network activity that Append re-loaded data Q2 as expected, but then also for Append1 where all Q1 data was re-loaded (under the name as Append1)
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.