Anonymous
Not applicable

Refresh of large and constantly updating data

Hi,

 

I have a very large table of around 600M rows. It has mainly two columns: the file name and the last-used date of the file, so it is constantly updated with new dates and other values. Using DirectQuery makes my report slow, so I want to go with Import mode.

 

I am stuck on how to refresh this data daily. I can't refresh all 600M rows every day; that would not be efficient. I have read about the 'detect data changes' option, but I don't have the two date columns it needs. I only have one date column, 'last used date', which changes constantly, so I can't use detect data changes as-is (one possible workaround is sketched below).

Is there a way through this?

Thanks!
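One commonly cited workaround for the single-date-column limitation, sketched below with assumed object names: expose the same date column twice through a view on the source side, so one copy drives the RangeStart/RangeEnd partition filter while the duplicate is offered to 'detect data changes', which should not be the same column used for partitioning.

```sql
-- Sketch only: dbo.FileUsage and all column names are assumptions.
CREATE VIEW dbo.FileUsageForRefresh AS
SELECT
    FileName,
    LastUsedDate,                       -- filter column for incremental refresh
    LastUsedDate AS LastChangeDetected  -- duplicate for "detect data changes"
FROM dbo.FileUsage;
```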

18 REPLIES
v-easonf-msft
Community Support

Hi, @Anonymous

Could you please tell us whether your problem has been solved?
If so, you could accept the helpful answer as the solution, or share your own solution here. If you still need help, please share more details with us.

Best Regards,
Community Support Team _ Eason
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.

Anonymous
Not applicable

Hi @Greg_Deckler,

Each row in the table I am using is unique, and the row count of that table is around 1,200M+, so I don't think aggregations help much in my case.

Is there any other solution?

 

Thanks

@Anonymous - That is the purpose of aggregations. Say I have a table:

 

ID  Category  Dimension  Value
1   Red       One        1
2   Blue      One        10
3   Green     One        12
4   Red       Two        5
5   Blue      Two        5

Aggregation table(s) store:

Category  TotalValue
Red       6
Blue      15
Green     12

Dimension  TotalValue
One        23
Two        10

 

So, if you want the sum of Red, you get it from the aggregation table instead of hitting the millions of rows to compute the sum. This is just an example. The key is to avoid aggregations or other operations that hit your main fact table and cause, in essence, a table scan. It's hard to be more specific because your request is very vague, but this is the general route to fast performance against billion- and trillion-row datasets.
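To make that concrete, here is one way those two aggregation tables could be materialized on the SQL side; a sketch only, with dbo.FactTable as an assumed source name and the columns taken from the example above.

```sql
-- Sketch: pre-aggregate once at the source so visuals read a handful of
-- rows instead of scanning the fact table. dbo.FactTable is assumed.
SELECT Category, SUM(Value) AS TotalValue
INTO dbo.AggByCategory            -- yields Red 6, Blue 15, Green 12
FROM dbo.FactTable
GROUP BY Category;

SELECT Dimension, SUM(Value) AS TotalValue
INTO dbo.AggByDimension           -- yields One 23, Two 10
FROM dbo.FactTable
GROUP BY Dimension;
```

In Power BI, such tables can then be mapped to the detail table via the Manage aggregations feature, so matching queries are redirected to the small tables automatically.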


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

ID  Category  Dimension  Value
1   Red       One        1
2   Blue      One        10
3   Green     One        12
4   Red       Two        5
5   Blue      Two        5

Aggregation table(s) store:

Category  TotalValue
Red       6
Blue      15
Green     12

Dimension  TotalValue
One        23
Two        10

 

@Greg_Deckler : let's take the same example. What if I want the sum of Value where Category is Red and Dimension is One?

Then I should aggregate by taking both columns (Category and Dimension) into the 'group by', right? That again makes the same table!

Do you get my point?

@Anonymous - I understand your point. As I stated, the goal is to avoid operations that hit all rows in the fact table. You are asking me for advice specific to your situation, but you are:

1. Not giving me any sense at all of your data

2. Not providing me any details about how you want to present that data

 

If you want a report that simply lists a billion rows of data, then I've got bad news for you: nothing is going to do that fast, short of maybe a supercomputer. What possible use would simply listing out a billion rows of data be anyway? It doesn't make sense to me. So, figure out how you want to present those billion rows, build the aggregations, and use the aggregations in your visuals. Otherwise all hope is lost; you cannot expect any program to quickly list out a billion rows of information. And how any human being could comprehend a billion rows of data displayed to them in the first place is nonsensical.


Anonymous
Not applicable

@Greg_Deckler 

 

class_name  user_name  file_name  category  date    time_s  (other columns)
first       guru       file1      H         8/5/20  40
first       guru       file2      H         8/5/20  50
first       john       file1      H         8/5/20  25
second      guru       file1      F         6/5/20  60

 

My data is similar to this, where each row is unique and the filters are 'date range', 'class_name', and 'category'.

I want to display the top 10 rows based on the 'time_s' column as filtered by the user. I am using a Top N filter for that.

Now do you get the idea of the data and what I want?

@Anonymous - Right, so you are not truly aggregating anything and will always have to hit all rows in the base fact table. That means you are pretty much hosed. The only thing I can think of would be to create a table that contains the top N rows for every possible combination of what you will allow them to filter on.
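As a sketch of that idea in SQL (dbo.Usage is an assumed table name; the columns follow the sample data above), a windowed ROW_NUMBER can materialize the top 10 rows per filterable combination once, so the report never has to scan the full table:

```sql
-- Sketch: pre-compute the top 10 rows by time_s for each combination of the
-- filterable columns. Because the top 10 over any date range is contained in
-- the union of the per-date top 10s, the report can read this small table.
SELECT class_name, user_name, file_name, category, [date], time_s
INTO dbo.TopNByCombo
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY class_name, category, [date]
               ORDER BY time_s DESC
           ) AS rn
    FROM dbo.Usage
) AS ranked
WHERE rn <= 10;
```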


Anonymous
Not applicable

@Greg_Deckler

Yeah, I thought of this option, but I also have the date column, right? So I couldn't go that way!

@Anonymous - Well, that depends. I may be doing this incorrectly, but I believe your table would be:

 

n1 * n2 * n3 * n4 * n5 * 10

 

So, that's the distinct number of values from each of your filter columns multiplied together, times 10 (the top N).

 

So, let's say you have 3 years' worth of data; that's roughly 1,000 different dates. Then suppose each of the other four filter columns has 5 distinct values. You would have:

1000 * 5 * 5 * 5 * 5 * 10 = 6,250,000 rows

 

Better than a billion.


Anonymous
Not applicable

@Greg_Deckler 

Yeah, but one of the columns has around 1 million unique values, and I have about 9 years of data.

@Anonymous - Now I have to understand the use case here. What exactly is this data? Why is the top 10 of the rows for any combination of values important? How in the world do you end up with a million different values in a column if it isn't an index or something? And how is a user going to choose between a million different things anyway: are they going to scroll through a million choices looking for the one they want?


Anonymous
Not applicable

@Greg_Deckler 

 

The use case is to view the top 10 files based on time for a particular class_name within the selected date range.

The class_name column happens to have about 1M unique values. Each user only cares about a few of the class_name values, so they know what they want to select; they just type it instead of scrolling through all of them.

Even I don't know why class_name has about 1M unique values.

After optimizing every way I can, the visual renders in around 1 minute, and sometimes 2 or 3 minutes. That seems reasonable, but I am still trying to make it faster.
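For reference, the query behind that visual boils down to something like the following sketch, where dbo.Usage and the literal filter values are assumptions; on the SQL side, an index on (class_name, [date]) that covers time_s is the usual way to keep a query of this shape fast.

```sql
-- Sketch of what the Top N visual effectively asks for; table name and
-- literal values are illustrative assumptions.
SELECT TOP (10) class_name, user_name, file_name, category, [date], time_s
FROM dbo.Usage
WHERE class_name = 'first'
  AND [date] >= '2020-05-01'
  AND [date] <  '2020-06-01'
ORDER BY time_s DESC;
```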

@Anonymous - I guess what I was going for was the industry, the type of data, etc. In other words: is this manufacturing data about the time it takes to make a component, so that it is important to figure out which items take the most time to produce? I'm not saying that's what this data is, but I'm trying to wrap my head around the use case, as sometimes there are alternate ways to achieve something.


Anonymous
Not applicable

@Greg_Deckler Thanks for the replies.

Do post if you find a way through!

Hi, @Anonymous 

Please check whether this post helps:

https://community.powerbi.com/t5/Service/Refresh-large-datasets-on-Power-BI-service/m-p/1154745

 

Have you considered using incremental refresh, if possible?

https://docs.microsoft.com/en-us/power-bi/admin/service-premium-incremental-refresh
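For context on why incremental refresh helps here: once configured, each daily partition folds into a source query of roughly the following shape, so a refresh scans only the newest slice instead of all 600M rows (a sketch; table and column names are assumptions).

```sql
-- Sketch of the per-partition query incremental refresh folds to the source.
-- @RangeStart/@RangeEnd mirror Power BI's RangeStart/RangeEnd parameters.
DECLARE @RangeStart datetime = '20200805',
        @RangeEnd   datetime = '20200806';

SELECT FileName, LastUsedDate
FROM dbo.FileUsage
WHERE LastUsedDate >= @RangeStart
  AND LastUsedDate <  @RangeEnd;
```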

 

Best Regards,
Community Support Team _ Eason

Greg_Deckler
Super User

@Anonymous - Go with DirectQuery and use aggregation tables to speed things up. That's how Microsoft demonstrates Power BI against trillion-row datasets.


Anonymous
Not applicable

Hi @Greg_Deckler, thanks for the reply.

Sorry, I did not quite get your point; can you explain in more detail, please?

What do you mean by aggregation tables, and how do I set them up (on the SQL side or the Power BI side)?

 

Thanks
