cancel
Showing results for 
Search instead for 
Did you mean: 

Thoughts and Details about The Mythical DAX Index

Introduction

If you know much about me from this community, you probably know that I love the Quick Measures Gallery. The Quick Measures Gallery is one of those things that I latched on to because I found myself answering what amounted to be the same question over and over again in the forums. So, I reasoned it was far better to solve the problem once, post it to the Quick Measures Gallery and reference it versus reinventing the wheel each time. So that's exactly what I do these days whenever possible. Last I checked, I actually have almost 150 quick measures posted to the Quick Measures Gallery, 147 to be exact. Not that a single one of them has ever made it into the Desktop. But I'm not bitter...

Now, I do not normally do much with my Quick Measures other than reference them. Honestly I do not even tend to document them very well, I just kind of throw them out there with minimal information and just use them for reference when needed. However, my 147th quick measure is a little different. My 147th quick measure is a pure DAX implementation for creating tables complete with an index (unique consecutive counter id) column. And this one; this particular DAX quick measure, The Mythical DAX Index, I feel deserves special treatment because up until now, the possible existence of such a thing had always been considered effectively impossible.

What

Quite simply, the Mythical DAX Index Quick Measure demonstrates how to take a table of information and add an index column to it using nothing but DAX code. So take a table of information like this:

Orange
Apple
Banana
Mango

And output this:

Orange 1
Apple 2
Banana 3
Mango 4

How

Well, the DAX code is really a combination of techniques, one of which is something I started doing as part of my quest to translate as many Excel functions to DAX as possible. You see, there are a number of functions like ARABIC and BIN2DEC where I needed to take a string and create a table from it where each of the letters or digits became a row in the table. So I created a slick little way of doing that using GENERATESERIES coupled with an ADDCOLUMN that used the Value column of the generated series to parse out the correct character within the text.

This is a technique that I personally had never seen before or at least do not remember because I can really see no other practical use for it other than doing these sorts of conversion operations from binary to decimal or from roman numeral to Arabic. So, I would not have even been thinking along these lines without having undertaken the Excel to DAX Translation project. So, serendipitous timing if nothing else.

Now we throw in a bit of good fortune. The next piece of the puzzle came from a thread where a user had created a rather novel bubble sort. You would never get that from the title of the post but that's generally the case! 🙂 It was really kind of nifty but probably something you only do if you are very new to DAX or crazy like me and just trying to see what you can come up with. Let's face it, convention is that DAX is just not very well suited for that sort of thing. Anyway, @isThisABug asked about how to use DAX table functions to improve the DAX bubble sorting code and the answer ended up being the CONCATENATEX function using the optional (and officially undocumented) sorting options, parameters 4, 5, etc.

That’s when inspiration struck. I had essentially created a sorted text string from rows of values in a table. What would happen if I didn't sort it and used my technique to convert that text string back into a table? Could I actually achieve a DAX index, something that I had always deemed impossible? Turns out, yes, yes you can.

 

DAX Indexed Table =
    VAR __SourceTable = 'Table'
    VAR __Count = COUNTROWS(__SourceTable)
    VAR __SortText = CONCATENATEX('Table',[Product],"|")
    VAR __Table =
        ADDCOLUMNS(
            GENERATESERIES(1,__Count,1),
            "Product",PATHITEM(__SortText,[Value],TEXT)
        )
RETURN
    __Table

 

So, in the end, really not very much DAX code at all and rather simple. Get your source table and store it in the variable __SourceTable. Get a count of rows using COUNTROWS and store this in __Count. Concatenate the row values together using CONCATENATEX, along with the pipe character. This is where we can do sorting if we so desire using the 4th and 5th parameter. See the PBIX attached to the Quick Measure for an example. Now use GENERATESERIES from 1 to __Count to create our index column and because we used the | character, we can use PATHITEM to retrieve values by position within a hierarchy where every element except the last has one and only one child.

Boom! Easy...

Performance and Limitations

Granted, the example supplied has a table of only a single column. Obviously things get more complex if you have multiple columns. That’s an area where I am actively experimenting on the best approach. However, I do see uses for this technique even with only a single column of values because of how this technique handles duplicates when sorting. You see, RANKX puts the same rank for duplicates whereas this technique allows a unique ranking even with duplicates. That, honestly, is worth the price of admission in certain circumstances.

But how does it perform? Certainly there must be a reasonable maximum. You can’t have a parent child hierarchy with a million levels, right? You can’t just have a string that is infinitely long, right? Well, I have kicked the tires on some of the reasonable boundaries. I started by creating tables of random numbers in powers of 10 rows, so 10 rows, 100 rows, 1000 rows, 10,000 rows and I found that performance was either sub-second or just a couple seconds.

Ah, but what about 100,000 and 1,000,000 rows you ask? Well, you will be surprised to learn that…OK, really no surprise, performance seriously falls off after about 10,000 rows. As in, 100,000 rows took about 5 hours to create the indexed table and 1,000,000 rows took about 15 hours. Soooo…not certain of why the exponential drop off in performance between 10,000 and 100,000 rows but yeah, there is a practical maximum to this thing.

I am absolutely astounded though that 1,000,000 rows can return AT ALL. I mean, think about that, that means that you essentially have a text string that is several million characters long and a parent child hierarchy with a MILLION levels. That’s, that's unbelievable!! And by unbelieveable, I mean the fact that I have nothing better to do than to wait around 15 hours for a DAX calculation to return... 😅...😗...🤔...😞

Big Deal?

So why is creating an index column with DAX a big deal. Well, it is and it isn’t. It isn’t because, I mean, let’s face it people, this is information technology. Tends to not be life or death. Sure, it can be at times and we might get the impression that it is when, for example, a database or network wifi at work goes down but the reality is that 99.99% of the time information technology just really isn’t that important with regard to real lives being on the line. I mean, ask someone trapped in a burning building or sick with COVID-19 whether they care more about their wifi network connection or their first responder or doctor and it’s not even close. So, in the grand scheme of things, it is not a big deal at all. Let’s keep some perspective here people.

And even within the tiny microcosm of Power BI and DAX, is it really a big deal that we can create a table with an index by only using DAX? Again, ehhhhh, not really. I mean the actual practical uses for such a thing are extremely limited. Let’s be perfectly clear here folks, except in incredibly rare circumstances you should not be using DAX to create index columns for you. You should be doing that within the source data or within Power Query.

But, with all that being said, here is why I think that the DAX Index is kind of a big deal. It is kind of a big deal because proving that one can create an index using only DAX defies four, five or more years of conventional wisdom. That’s not insignificant. You see, for years conventional wisdom has said that creating an index column in DAX is impossible. And I have been just as guilty as everyone else of buying into and spreading the conventional wisdom which; honestly, is not like me at all!

You see, conventional wisdom about a DAX created index column is that it was impossible because “sort order within DAX is not guaranteed” is beaten into everyone’s head that reads the documentation on DAX functions. Plus, any and all attempts to create anything resembling an index using something like RANKX have all ended in miserable failure due to the, at times, infuriating way in which RANKX handles duplicates. Let’s put it this way, RANKX’s handling of duplicates is absolutely infuriating if you are trying to create an index column using DAX. It's been tried multiple, multiple times.

So, we old souls of DAX would knowingly smile to ourselves when newcomers to the world of DAX would naively ask if such a thing as a DAX index were possible. And we would politely respond that it was just not possible for DAX to do such a thing, DAX had no concept of “before” or “next” without some reference column like a date field or an already existing index column. “Ho, ho silly, naïve youngling, how adorable a question. Obviously you have much to learn and understand about DAX young grasshopper.” we would say. Well, OK, not those exact words because…yeeeesh! But, you know, sort of kind of those words in a far less arrogantly vomitous sort of way.

And thus, everyone just accepted the fact that you cannot create an index column in DAX and the entire community perpetuated this story based on past shared experiences and the DAX index became this mythical, ever elusive creature that would certainly be magical if ever found and just as certainly never would be. As tantalizing as a unicorn but just as imaginary.

And thus people stopped trying or even stopped thinking of trying to create and find it and the DAX index slowly passed into the realm of myth and legend. And there-in lies the problem in my opinion. We must never stop trying to push the limits of things. Whether it is the limits of some obscure coding language like DAX or the limits of what we ourselves are capable of learning and doing or the limits of the entire human race and what we collectively can accomplish.

I firmly believe we learn far more by failing than succeeding. Maybe that is because of my background as a classically trained engineer. Failure is a big part of engineering. It is why engineers stress test things, to find out when and why things break. Once you know when and why something breaks, you can engineer those things to be better and more durable. This is one of the main reasons I like to at least try to do things in DAX, regardless of whether it makes logical sense or not. By trying and succeeding, I learn something. By trying and failing, I learn even more.

So, at least for me, yes, the invention of the Mythical DAX Index is very much a big deal. It is yet again a reminder for myself that I must continually try to push the envelope, try new things, question convention, think beyond artificially imposed constraints and dabble in the impossible by asking why not instead of why.

Conclusion

The Mythical DAX Index Quick Measure has extremely limited uses but so what? It proves the fallacy of listening to convention, being constrained by dogma, limiting your creativity and artificially selling yourself short just because other, “wiser” people say it cannot be done or that it is foolish. Don’t fall into the trap. Always question convention, always question authority and always attempt the impossible. If for no other reason than because you can.

We should all thank @isThisABug for not having conformed to convention or been jaded by dogma. For pushing the envelope. Without that crazy attempt at a DAX bubble sort, something many folks would immediately write off as having no business being done in DAX, the Mythical DAX Index might have remained just that, a myth.

Comments

Amazing post @Greg_Deckler.  You are a pro of DAX and I see you are so proud and defending that side. This post and the one I posted were so useful. First I understood many things about DAX and M, and with all this sort of stuff you are helping so much the community by making us think from another perspective.

 

Thank you @Greg_Deckler. Again, great display of ideas!