ericOnline
Post Patron

BEST PRACTICE: How much data cleanup to do in DAX vs. Power Query?

As I gain a better understanding of how to work with Power BI, I'm seeing that I can do many things in DAX as well as in Power Query.

Question for you Power BI Professionals:

- How much cleanup (filtering out nulls/blanks, joining tables, etc.) do you do in DAX vs. Power Query? (See the sketch after these two questions for the kind of cleanup I mean.)

- How do you differentiate what work you do in DAX vs. Power Query?
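
Here is a minimal Power Query (M) sketch of that kind of cleanup. The file path, table, and column names are made up, and a second query named Products is assumed to exist:

let
    // Hypothetical source workbook; substitute your own connection
    Source = Excel.Workbook(File.Contents("C:\data\sales.xlsx"), true){[Name = "Sales"]}[Data],
    // Filter out rows whose key is null or blank
    NoBlanks = Table.SelectRows(Source, each [CustomerID] <> null and [CustomerID] <> ""),
    // Join (merge) against the assumed Products query
    Joined = Table.NestedJoin(NoBlanks, {"ProductID"}, Products, {"ProductID"}, "ProductDetails", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Joined, "ProductDetails", {"Category"})
in
    Expanded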

Thank you

@ImkeF , @amitchandak , @Greg_Deckler and others!

9 REPLIES
Marounnsader
Frequent Visitor

Hi all,

Since we've mentioned PQ and transformations, how does PQ stack up against other ETL tools? Can we use or depend on PQ as an enterprise solution for ETL?

Some things to consider:

- Most things (everything?) you can do in PQ can be done via the Power BI PQ GUI

  - Does this enable more "citizen developer" productivity within your org?

- Is there Power BI expertise within your org to maintain reports?

  - Or does your org already use a MODERN ETL language? 

- Is PQ (via Power BI) a potential "upgrade" for your org's existing (maybe antiquated/esoteric) ETL products?

Hi Eric,

Great question here.

I'm trying to leverage the Power BI expertise in my team; I've been newly assigned to it. Currently we are using ODICS and ADW from Oracle. The problem is that a lot of projects and cleaning were done by a much earlier team, most of whom are gone, with no proper documentation, and this is affecting performance. That said, we can say the data is clean, with no nulls so far.

 

I'm trying to use more shared datasets and dataflows so I can spread a self-service BI culture in the company, and I'm now in the process of hiring a new candidate to handle more ETL work. But expertise in ODI is more readily available than in PQ, so I'm wondering whether it's better to start using PQ's powers or keep it for special cases.

ImkeF
Super User

Very good replies!

 

My rule of thumb is:

 

1) Push everything as far upstream as possible (towards the source of the data)

2) ... unless you need a measure; then use DAX

3) Learn how to know when you need a measure: https://exceleratorbi.com.au/calculated-columns-vs-measures-dax/ 

 

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI learning resources here -- Performance Tips for M-queries

Anonymous
Not applicable

Generally, @Greg_Deckler is right.

I'm of the opinion that you should prepare data for analysis in the right tool, and DAX is not that tool. After all, it's a Data Analysis eXpressions language for a reason. A good example: would you want to do machine learning in SQL? Well, that's what I thought.

Power Query is a data mashup tool, and thanks to the mechanism of query folding, it's intelligent enough to push data transformations into the source most of the time (sometimes this also requires the coder to be intelligent). Power Query is SO powerful that it can do ANY transformation of data you can think of, and can do it really fast. For this, however, you have to know M (the PQ language) and know how to write optimized code. M is a functional language with its own type system, its own rules, and its own quirks and curiosities. You have to learn them to know how to make code lightning fast and maintainable (documenting individual steps, for instance, is one good practice). But it's the same story with any language, DAX included.
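
To make the folding point concrete, here is a minimal M sketch (the server, database, table, and column names are made up) in which both transformation steps normally fold back into a single SQL statement at the source:

let
    // Connecting to a relational source enables query folding
    Source = Sql.Database("myserver", "SalesDB"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // These steps typically fold into the generated SQL (WHERE / SELECT)
    Filtered = Table.SelectRows(Orders, each [OrderDate] >= #date(2023, 1, 1)),
    Trimmed = Table.SelectColumns(Filtered, {"OrderID", "OrderDate", "Amount"})
in
    Trimmed

You can check whether a step folds by right-clicking it in the Power Query editor and seeing whether "View Native Query" is available.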

Last thing... If you leave blanks in your dimensions, it will not only be ugly for the end user; it will also make code more complex and thus slower. Therefore you should always make sure you do not leave blanks in there, and display some sensible labels for conceptual blanks instead. Fact tables, on the other hand, can have BLANKs in their measurements (not in dimension keys, though) because these are treated as 0 by DAX arithmetic functions and should never be exposed to the end user. All calculations should always be done through measures and measures only.
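
As a minimal M sketch of that advice (DimProduct and Category are hypothetical names), replacing conceptual blanks in a dimension with a sensible label can be a single step:

let
    // DimProduct is assumed to be an existing dimension query
    Source = DimProduct,
    // Show a sensible label instead of leaving nulls in the dimension
    Labeled = Table.ReplaceValue(Source, null, "Unknown", Replacer.ReplaceValue, {"Category"})
in
    Labeled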

Violations of the ideal will always happen, of course, but it's good to give good reasons for such violations so that the next person does not curse you (even under their breath).

Best
D

Great insights @Anonymous ! You've given me many things to consider, look up and understand. Thank you.

(as an aside, and ONLY because I came across it the other day... yep... "they" are doing ML with SQL now 🙂 )

Anonymous
Not applicable

They really don't. They run Python or R scripts inside SQL Server. That's not doing ML in SQL.

Best
D
Greg_Deckler
Super User

Oh boy, that is a loaded question. @Anonymous will probably want to chime in on this.

 

So, my answer to this is that it depends. There are a few schools of thought on this. One school of thought: try to push as much data cleanup up the "chain" as possible. So if you think about it, you have:

1. Source

2. Power Query

3. Data Model/DAX

 

The obvious thinking here is: why process or store more data than you need to? So if you can weed things out in a SQL view versus connecting directly to a table, that's a good idea. It can really speed up data loads and refreshes. But if you don't have access to the source to make changes, that option is obviously out.

 

So that's data cleanup; generally it's best to do that in the source or in Power Query. What about transformation? It depends, but generally you want to do your transformations in Power Query. That being said, I have seen unpivoting data from a SharePoint list take inordinately long in Power Query. That's where UNPIVOT came from (a DAX solution for unpivoting columns in tables). So it's not absolute, but generally a good idea.
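
For reference, a typical Power Query unpivot is a one-step operation (MonthlyBudget and the column names here are hypothetical); whether this or a DAX-side workaround is faster is exactly the situational point above:

let
    // MonthlyBudget is assumed to be an existing query with one column per month
    Source = MonthlyBudget,
    // Keep the key columns fixed and rotate the remaining month columns into rows
    Unpivoted = Table.UnpivotOtherColumns(Source, {"Department", "Account"}, "Month", "Amount")
in
    Unpivoted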

 

Now, what about adding columns and such? Well, this gets into a greyer area in my opinion. Some things are a breeze to do in DAX and a real pain in Power Query, and vice versa. Then there is the maintainability piece of it. If you have to do certain calculations in DAX, is it better to do them all in DAX so that you have a single code base? In other words, so that someone coming after you doesn't have to know Power Query and R and Python and DAX to maintain your report file? Because you can use all of them in the same Power BI file if you want, and then you need to know four languages instead of one to maintain it.
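
To illustrate that trade-off, here is what a simple derived column looks like on the Power Query side (Sales, Revenue, and Cost are made-up names); the same calculation could instead be a calculated column in DAX, which is where the single-code-base question comes in:

let
    // Sales is assumed to be an existing query
    Source = Sales,
    // Add the derived column in M rather than as a DAX calculated column
    WithMargin = Table.AddColumn(Source, "Margin", each [Revenue] - [Cost], type number)
in
    WithMargin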

 

So, in very broad generalities, it's better to push the processing up the chain to the source, and failing that, to Power Query. But there are other things to consider, in my opinion, that keep it from being an absolute black-and-white issue.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

GEM of a response! I'm keying in right now on "Then there is the maintainability piece of it." As someone who just inherited a rather large "my first Power BI dashboard," decreasing the learning curve for new maintainers is key. Thank you for sharing your expertise, @Greg_Deckler.
