I need some ideas on how I should do this properly.
Many months ago I created a PBI report in import mode. I set the parameters with custom SQL code ( Get data -> SQL server -> advanced options -> SQL statement ). I used WHERE >= "date" to get the data I need instead of the whole table that goes back to 10+ years.
Now with this report the query for scheduled refresh got way too big ( because its overwriting every day ) and started to cause issues on our data warehouse so I had to close it down.
Because I used "SQL statement" under advanced options I can't implement incremential refresh to that report.
Now I need some suggestions what would be the right and most logical thing to do to still get this report running.
a) Make a whole new report and avoid "SQL statements". Use Get data -> SQL server, pick the table, click on transform data and put on filters like for the last ~1-2 weeks. ( Which I tried and I'm afraid it's not working the way it should since in my test I only picked last 2 days of data and it was loading 10+ minutes before i shut it down, it can not load that long for 2 days data so I'm afraid that it still trying to load EVERYTHING and then filtering ) Add incremental refresh. ( Problem1 - how can I get the last ~7-8 months of data in that report without downloading everything in 1 go? I can not have a 1 time run for that huge query, I need a solution to slowly but surely suck that data inside that report. Problem2 - The old report I made has so many measures and custom columns created it will be a pain to recreate everything. )
b) Try to fix the old report and somehow get the incremental refresh working ( no idea how or even if possible atm ), since this report already has ~6months of data inside it and all measures/custom columns that I could use and only add the missing dates. I was thinking maybe making a new query just like the old one and changeing the WHERE to the dates im missing and then merging those 2 tables but that would not be a long term solution.
What would be your suggestions if you would be in this situation?
"Because I used "SQL statement" under advanced options I can't implement incremential refresh to that report."
That is not necessarily true. As long as you incorporate RangeStart and RangeEnd filters in your custom query you can use incremental refresh.
Read up about "Query Folding". In its basic implementation it means that Power Query is lazy and is trying to offload as much of the processing to the upstream data source as possible.
Let's say you have a SQL table connection. Then you do a filter in Power Query to the last two years. You would think that Power Query has to do that work, but what may actually happen is that Power Query changes the "Select * from Table" query and adds your filter to it, and pushes the modified query to the data source. That way you don't load the entire table in Power Query and then throw away what you don't want - you only ask the source for what you want.
There's two issues with this. First - at some point you do a transform that can no longer be "folded" and that has to be done in Power Query. Second - Power Query assumes that the data source can handle the folded query in a timely manner. That is a big issue if your data source is not tuned for the types of queries you run (indexes, statistics).
I would work with the data source owners, explain to them how Power BI works, and then see if you can find a performance compromise using any of the following: Import Mode, Direct Query, Aggregations, Incremental Refresh.