cancel
Showing results for 
Search instead for 
Did you mean: 
Greg_Deckler

You Shouldn't Use DAX for That

Introduction

I have some grievances and while it may not quite be Festivus just yet, I'm going to air my grievances anyway. My most pressing grievance is around the phrase, "You shouldn't use DAX for that." You see this phrase all the time in the forums. Question comes in about how to do a particular calculation in DAX and someone responds "You shouldn't use DAX for that, use Power Query." Well, as you might have guessed, "I've got a lot of problems with you people...and now you're gonna hear about it." 🙂

 

Calling people idiots

The first problem I have with starting out a response with "You shouldn't use DAX for that." is that this is essentially the equivalent of stating "You are an idiot and don't understand what you are doing in the slightest nor do you have even a single clue about generally accepted best practices." First, it's not nice to call people names or tell them they are idiots. Second, you may be the one that is being an idiot, or at least incredibly arrogant and presumptive.

 

Think about it, what if they have to use DAX? An extremely popular usage model is one where business users work against Live datasets in Power BI. In these circumstances the business user doesn't have access to Power Query to change what is going on in the model, they are by-and-large relegated to using DAX measures to make any changes, etc. Sure, they could go back to whoever created the data model and ask them to modify it and six months later they will have their required changes. Not helpful. 

 

But even in Import and DirectQuery situations is it always advisable to use Power Query instead of DAX? Not necessarily. Any reasonably complex data transformation performed in Power Query has a good likelihood of breaking Native Query functionality. As a side note, the Native Query functionality pushes processing back to the source system and thus data loads tend to be faster if you don't break Native Query functionality. If you break Native Query functionality in a query the query has to use local resources on your desktop or gateway to perform the transformations. Now, in import-mode perhaps this is less of a protential problem if the speed of a single table load slows down. Although, there are situations where this could be highly problematic on an extremely large fact table with tens of millions of rows and end up exceeding the desired refresh window. More importantly, for DirectQuery models this could be disastrous. Here you have a measure or column used in a single visual that you could do in DAX but by doing it in Power Query you break Native Query functionality and end up slowing down every single visual in the entire report. Not smart.

 

By your own logic, you are also wrong

This idea of "You shouldn't use DAX for that" comes from a generally accepted best practice of pushing data transformation and calculation logic "up the data transformation chain". This is why the words "You shouldn't use DAX for that" are inevitably followed by "You should use Power Query". The logic here is that Power Query is further up the data transformation chain and thus you are transforming the data before it gets into the data model. Now, the first issue is that generally accepted best practices are just that, general. See specific cases cited above where generally accepted best practices might be bad. The second issue is that by this same logic, the phrase "You should use Power Query" is equally incorrect. Under this logic, Power Query is as equally bad as using DAX because what you *should* be doing is performing the data transformation and calculations at the source. See, it's the same principle. If it is wrong to do it in DAX then it is wrong to do it in Power Query.

 

Conclusion

Be nice, don't call people idiots. They just might know more about their needs and requirements than you do. At the end of the day, there will be people that neglect to include important details when asking questions. Other people will be arrogant, presumptive and internally inconsistent with their recommendations. The sun will rise tomorrow to witness more of the same.

Comments

Hi @Greg_Deckler ,

 

Love this blog-post of your's and do agree there can be multiple ways of achieving same thing in Power BI and therefore it doesn't mean addressing one way not be used over the other.

 

Thanks,

Pragati

Hi

"Think about it, what if they have to use DAX?"

 

I don't have to think. I know only what I've been told and I have a full right to base my opinions on this evidence. It's not my duty to think about everything and account for all and any situation. If the person has to for some strange reason do it in DAX, they should clearly state it. Which changes perspectives. But until this is done, my duty as a good and knowledgeable developer is to inform people about Best Practices and promote them. For the very simple reason that they are there to shield people from problems. Enough said.

 

"Sure, they could go back to whoever created the data model and ask them to modify it and six months later they will have their required changes."

 

Not only that they could, but they should. The best medicine is the one that addresses the root cause. Not the one that constantly merely treats the symptoms.

 

"But even in Import and DirectQuery situations is it always advisable to use Power Query instead of DAX? Not necessarily."

 

Of course not necessarily. If the transformation logic is closely related to some complex measures in the model, then it does not make sense to replicate the logic in the ETL layer. But such situations are simply rare, or VERY rare. If someone wants to just cleanse data, this should be done in the ETL layer, not in DAX (Data Analysis eXpression language is not a data mash-up language). That's so obvious that I'm ashamed that I have to state it. Even more embarassing is to remind you that the fact that something CAN be done does not mean it should be done.

 

"More importantly, for DirectQuery models this could be disastrous."

 

If a model is correctly built by a professional that knows his trade and the subject for which the cube has been built, transformations should never happen when data travels from the model to PBI. As simple as that. If this has to happen, it means the author of the cube has screwed it up. Waste of money in simple terms.

 

"Under this logic, Power Query is as equally bad as using DAX because what you *should* be doing is performing the data transformation and calculations at the source. See, it's the same principle. If it is wrong to do it in DAX then it is wrong to do it in Power Query."

 

Ideally, if you have a source that CAN transform your data into a form suitable for PBI, the source should do it. But if the source can't (say, a flat file), you should do it in PQ. Not in DAX. Once again: M - mash-up language, DAX - Data Analysis eXpression language. You seem to forget that if you do it in PQ, the data compression will be optimal. This is not true if you do it in DAX. For small models that does not matter, but for medium to large - IT SIMPLY DOES. I like to do things right the first time. I don't work according to this maxim: "There's never time to do it right but there's always time to do it twice." I stay away from it as much as possible.

 

"Be nice, don't call people idiots. They just might know more about their needs and requirements than you do. At the end of the day, there will be people that neglect to include important details when asking questions."

 

Pointing out that some technique should not be used in certain contexts is not calling people idiots. Where did you get this conclusion from, master? Let me repeat again. It's not my duty to know their needs and requirements. It's their duty to make me aware of their needs and requirements---let this sink in since you freely mix many different ideas in your text. And I really don't care about people "neglecting to include important details..." I must base my judgement on the evidence I see. That should be more than obvious.

 

Best,

daxer-almighty

@daxer-almighty IDK, it's difficult for me to tell, but it seems like I might have struck a nerve? 🙂

 

I consider that if a questioner asks a question in a DAX forum or says "I need a DAX..." that constitutes that they want a DAX solution and I shouldn't necessarily make assumptions that they shouldn't be using DAX. People can disagree but that's how I operate. Kind of like the question that brought you here. Re: How to solve "Cannot convert value '25.06.21' ... - Microsoft Power BI Community It is clear in this situation that the questioner is looking for a DAX solution as the question is asked in a DAX forum and there is actual DAX in the question itself. 

 

Compression differences only deal with import models and calculated columns or tables. And DAX columns and tables do compress but may be roughly 10% less efficient at compression than if done in Power Query or the Source. But not always. In many circumstances the compression is the same. And when you are talking measures, compression differences do not apply.

 

Otherwise, it seems like we agree. I don't see anything that refutes that generally accepted best practices are  not necessarily applicable to all situations and that's the main point of the article. Speaking in absolutes generally just makes one appear absolutely foolish.

@Greg_Deckler

 

The problem is that many people, or even most of them, who come to the fora for help are NEWBIES and they don't know how to correctly approach a problem; they think once they've learnt a bit of DAX, everything should be addressed with this tool. No, you don't use a hammer for every task. If they were professionals, they would not be asking these questions; they would know. They would know which tool to use when. We, as professionals, should be putting them on the right track, not merely tell them that something is doable and give them a recipe to treat a symptom. We should treat them as we'd like others to treat us: Give them the best solution/advice there is. This is my modus operandi.

 

As for the compression... A calculated column will never be optimally compressed. For the simple reason that this step happens AFTER the engine has finished analyzing all the tables' columns to achieve the optimal solution. The calc column does not take part in the analysis phase. This can and does have consequences in medium and large models. I'm saying that because I've seen problems with performance with such a setup in cubes that occupied at least 60GB compressed in memory.

 

"Compression differences only deal with import models and calculated columns or tables. And DAX columns and tables do compress but may be roughly 10% less efficient at compression than if done in Power Query or the Source. But not always. In many circumstances the compression is the same."

 

Can I know the source of this revelation, please? On top of that, DAX is not as efficient at data cleansing as Power Query (M) is, so it'll take more time (sometimes much more) to process data with it. Another reason to use the right tool for the job in an era when people want everything finished instantly.

 

"And when you are talking measures, compression differences do not apply."

 

But the post that has brought us here is not about measures; it's about data cleansing. And I have no idea why you'd tell me such an obvious thing. I really wouldn't like to think that you see me as an idiot.

 

Best,

daxer-almighty

 

 

@daxer-almighty First, "A calculated column will never be optimally compressed.". This is simply incorrect. They absolutely can be optimally compressed. It depends on whether the new column has a 1:1 relationship with an existing column. 

Storage differences between calculated columns and calculated tables - SQLBI.

 

Note, also one of the sources of the roughly 10% compression difference. But, there again, it all depends. Also, another source of the revelation is that Live and DirectQuery models do not tend to have to worry about compression in an import mode data model.

 

Also, I thought we were talking compression and not processing time, that's a different topic.

 

I personally feel that making the assumption that people on the forums are newbies is widely insulting. I've been here a long time and regularly see a lot of familiar faces. I did a spot check on current questions on page 1 of all forums. I got a "New Member" designation 15% of the time. Most of the question askers actually tended to have "Helper III, IV or V" levels with a lot of Frequent Visitor as well. Not certain of your definition of "newb" but wonder what data supports your position?

OK.

 

"It depends on whether the new column has a 1:1 relationship with an existing column."

 

@Greg_Deckler, how many times in your life have you seen a calc column that had a 1:1 with an existing column? I bet myriads, right? 😉 I'm not right now refuting your statement. By no means. I'm just saying that such things are so rare that they are not really even worth considering. And I don't know a single person, a single dev, that would spend a split second thinking about whether a calc column would or not have a 1:1 with some other... They all know that calc columns should be used only as a last resort, when everything else fails. For toy models, it does not matter. But for production-grade cubes that are 100GB in RAM (already compressed), IT DOES LIKE HELL.

 

" Also, another source of the revelation is that Live and DirectQuery models do not tend to have to worry about compression in an import mode data model."

 

But are calc columns not relevant only to Import mode? In DQ you can't have them. In Live, AFAIK, you can't have them. Unless you've got a composite model. But then, it again comes down to tables in memory. Of course, if your table is a small one, 1,000 rows or even, say, 10,000 rows, I'd be surprised if it mattered. However, I do like to follow Best Practices whenever I can. So, if I can do things right the first time... I just do them right the very first time. This has a lot of benefits. If later, for instance, such a table becomes big/huge (and such things do happen very often), I don't have to re-design a single bit. The right design is already there. I don't have to waste time (and money) to correct it. And, by the way, this is exactly what I'm paid some good money for. For getting things right the very first time.

 

"I personally feel that making the assumption that people on the forums are newbies is widely insulting. I've been here a long time and regularly see a lot of familiar faces. I did a spot check on current questions on page 1 of all forums. I got a "New Member" designation 15% of the time. Most of the question askers actually tended to have "Helper III, IV or V" levels with a lot of Frequent Visitor as well. Not certain of your definition of "newb" but wonder what data supports your position?"

 

I don't have to make assumptions. I often check the status of the people who pose questions that I'm answering. What's more, I often can from the way people pose their questions infer their level of advancement in DAX. Yes, it's really possible 🙂 There's nothing insulting about realizing that they are newbies. Many a time they admit this themselves, to make things funnier. Truly, I tell you today, Greg: From the way people phrase their problems it can often be seen how advanced they are. Just to cite the New Testament a little bit...

 

And by the way, if someone is Helper III, IV, V... or even a Frequent Visitor, well, that does not make them that much better than starters. I've witnessed this so many times myself that I'm just not going to argue. Any. More.

@Greg_Deckler 

 

I've only looked at the headers of the pages you "suggested" to me. Not gonna read this. Waste of time. But thanks 🙂

Hi I'm new to Power BI and have just set up my first chart which looks great and is linked to live SQL Server database. I'm just wondering how much DAX on average are we expected to use? I'm a SQL developer and was introduced to DAX programming on the course I attended several weeks ago. The trainer appeared to use DAX frequently so how important is DAX and can we live without it if we are SQL developers and use SQL server management studio etc to develop our Power BI Visualisations? Much appreciated
Polls
What is your favorite Power BI feature released this month?