victorbetancurt
Regular Visitor

Distinct Count with value in same table

Hi everyone.

Apologies in advance for my English.

I have a table like this ('Table1'):

CODE | STATE
1015 | A
1016 | A
1017 | B
1015 | B
1016 | B
1017 | B
1015 | C
1015 | A
1017 | B


I want a solution that is as efficient as possible. (I'd rather calculate it with a Power Query function than with DAX.)

I want a function (or formula, whatever) that, for each row, counts the distinct states across all rows with the same code.

My result should be this:

CODE | STATE | RESULT
1015 | A     | 3
1016 | A     | 2
1017 | B     | 1
1015 | B     | 3
1016 | B     | 2
1017 | B     | 1
1015 | C     | 3
1015 | A     | 3
1017 | B     | 1

To explain the result: for code 1015 the result is 3 because there are 3 unique states for 1015 (A, B, C). For 1016 it is 2 because there are 2 unique states (A, B). For 1017 it is 1 because there is 1 unique state (B).


12 REPLIES
Cmcmahan
Resident Rockstar

I'm not sure why you want this as a Power Query function instead of DAX, but it's possible.

 

Go into the Query Editor, and at the far left of the Transform tab you should see a Group By button.  Go through that wizard and group by both Code and State.  Here's the entire Power Query script I used to create your results table from the data snippet provided; the Grouped Rows step is the important one:

let
    // Sample data from the original post, embedded as compressed JSON by the query editor
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjQwNFXSUXJUitUBc8yQOeZAjhOMY4rMMUPmYChzRuagmxYLAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Code = _t, State = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Code", Int64.Type}, {"State", type text}}),
    // Group by Code and State, counting the rows in each group
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Code", "State"}, {{"Result", each Table.RowCount(_), type number}})
in
    #"Grouped Rows"

You should be able to copy/paste the above into the advanced editor and play with it directly.

Cmcmahan, I did it, but I lost the other columns; I want to keep them (I didn't show them in the example above).

P.S.: I suggested Power Query because I think it performs better. If there is another way to achieve this without hurting performance, feel free to suggest it.

Hi @victorbetancurt 

 

Please see the M code below for a custom column:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjQwNFXSUXJUitUBc8yQOeZAjhOMY4rMMUPmYChzRuagmxYLAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Code = _t, State = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Code", Int64.Type}, {"State", type text}}),
    // Buffer the table so the per-row lookup below doesn't re-evaluate the source each time
    Buffered = Table.Buffer(#"Changed Type"),
    // For each row, count the distinct states among all rows sharing its Code
    #"Added Custom" = Table.AddColumn(Buffered, "Result", each
        let c = [Code] in
            List.Count(List.Distinct(Table.SelectRows(Buffered, each [Code] = c)[State])),
        Int64.Type)
in
    #"Added Custom"

Regards,
Mariusz

If this post helps, then please consider accepting it as the solution, to help other members find it more quickly.

It's not working. When the query loads, it processes more than 3 GB of data.

So the answer depends on what data is in those other columns.  If it's data that makes sense to aggregate (through COUNT, SUM, AVERAGE, etc.), you can add it in the Group By wizard, or with the advanced Power Query editor like so:

 #"Grouped Rows1" = Table.Group(#"Changed Type", {"Code", "State"}, {{"Result", each Table.RowCount(_), type number}, {"Highest Price", each List.Max([Price]), type text}})

 

If you do not have data you want aggregated, you need to create a separate table and then do the Group By on that like before.  

 

The problem with Power Query is that there isn't a way to dynamically copy another table.  When you press copy, it copies a snapshot of the current table, so if you upload new data, you have to copy the table over again, or set the new table to load from the same data source.
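
That said, one way to keep the other columns without copying the table at all is to merge a grouped summary back into the original query. A rough, untested sketch, with made-up sample data and column names:

let
    // Made-up sample standing in for the real table (extra columns included)
    Source = Table.FromRecords({
        [Code = 1015, State = "A", Price = 10],
        [Code = 1016, State = "A", Price = 12],
        [Code = 1017, State = "B", Price = 9],
        [Code = 1015, State = "B", Price = 11],
        [Code = 1016, State = "B", Price = 13],
        [Code = 1015, State = "C", Price = 14]
    }),
    // One row per Code with its distinct-State count
    Summary = Table.Group(Source, {"Code"}, {{"Result", each List.Count(List.Distinct([State])), Int64.Type}}),
    // A left outer join keeps every original row and column
    Merged = Table.NestedJoin(Source, {"Code"}, Summary, {"Code"}, "S", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "S", {"Result"})
in
    Expanded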

 

I'm still not sure why you would strongly prefer Power Query over DAX for this.  Creating a custom summary table is incredibly easy in DAX, and if you have millions of rows of data, having to copy and then run Power Query transformations on the second table seems like a ton of extra processing.

 

I tried to search and see if there are performance differences for Power Query vs DAX, and can't find any information on that. And since DAX is built to create measures exactly like this, I would use that.

OK, I understand. Could you give me a solution using DAX? Honestly, I got lost in your explanation. If you think it's possible with DAX, go ahead.

 

Maybe it will be clearer if I explain my scenario.

I have two CSV files with the same structure: one contains yesterday's transactions, the other today's.
I built a query whose data source is a folder containing these files; it creates a single table with a column identifying Previous Data vs. Current Data. What I need is an easy way to filter the codes whose state changes from one period to the other. That's why I thought of creating a column with the distinct count of states for each code, which I could use in a slicer or a chart.

No problem, I can get wordy and over-explain things.  I also think I misunderstood your original example and was rolling it up so you only saw each count once.

 

Let me see if I can get your use case right, because it sounds like counting all the rows with the same code and state and appending that to each row wouldn't help much. 

 

So you have sales data from X, and sales data from Y.  You've merged them into one table, with a column identifying whether it's type X or Y data.  In this data, there is a state signifying something about that day's data.  What you're trying to accomplish is a way to easily see whether data with ID 123 in X is in a different state than data with the same ID 123 in Y?

 

Do you always have exactly two sets of data (previous and current), or is the plan for there to eventually be lots of data sets?

Let me explain it better. It's the same database (unfortunately, the software that generates this data is outdated), but I created an external query that exports the entire database daily and stores it in a .csv file in a designated folder.

In Power Query I configured a query that uses that folder as a data source; it identifies the most recent file and the one immediately before it, and tags each one. What I need is an easy way to filter the records whose value in a given field changes from one day to the next. That is why I would like to create a custom column that stores the number of distinct values of that field for each record, given that the final table has two rows for ID 123: one tagged "Previous" and the other tagged "Current".
I could then use that column as the key field in a slicer listing all records with more than one value for the analyzed field (state).
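
In rough M terms, what I have in mind is something like this (untested, with made-up sample data standing in for my folder query):

let
    // Made-up sample of the combined table: two daily snapshots tagged in one query
    Source = Table.FromRecords({
        [Code = 1015, State = "A", Tag = "Previous"],
        [Code = 1015, State = "A", Tag = "Current"],
        [Code = 1016, State = "A", Tag = "Previous"],
        [Code = 1016, State = "B", Tag = "Current"]
    }),
    // Distinct-State count per Code across both snapshots
    Counts = Table.Group(Source, {"Code"}, {{"StateCount", each List.Count(List.Distinct([State])), Int64.Type}}),
    // Codes with more than one state are the ones that changed between the two days
    Changed = Table.SelectRows(Counts, each [StateCount] > 1)
in
    Changed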

Another thing you may want to try, then, is to merge the two data tables side by side instead of appending them.

 

So instead of your original table, you end up with something like this:

 

CODE | PREV_STATE | CUR_STATE | CHANGED
1015 | A          | A         | FALSE
1016 | A          | B         | TRUE
1017 | B          | B         | FALSE

 

If whatever index you're using for each row stays the same between days, this may be a better way to store your data.  Be sure to check my previous reply for DAX code that solves your original problem.
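
Here's a rough, untested sketch of that side-by-side merge in M, with made-up sample queries standing in for the two daily snapshots:

let
    Previous = Table.FromRecords({
        [Code = 1015, State = "A"],
        [Code = 1016, State = "A"],
        [Code = 1017, State = "B"]
    }),
    Current = Table.FromRecords({
        [Code = 1015, State = "A"],
        [Code = 1016, State = "B"],
        [Code = 1017, State = "B"]
    }),
    // Join on Code instead of appending the rows
    Merged = Table.NestedJoin(Previous, {"Code"}, Current, {"Code"}, "Cur", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Cur", {"State"}, {"CUR_STATE"}),
    Renamed = Table.RenameColumns(Expanded, {{"State", "PREV_STATE"}}),
    // With both states on one row, the CHANGED flag is a simple comparison
    Result = Table.AddColumn(Renamed, "CHANGED", each [PREV_STATE] <> [CUR_STATE], type logical)
in
    Result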

Thanks, I'll try the EARLIER option later (I couldn't quite follow the logic of that function). I had already built that table with a query, so this solution is perfect for me. It would still be good to know whether Power Query really performs better than DAX (I think it does).
Thank you.

As far as performance goes, I think it depends on what you're doing.  For filtering and cutting down a large data set, I believe Power Query is the better tool, but when it comes to calculating field values, especially complex ones, I think DAX is the way to go.

What EARLIER does is step out of the current row context by one level.  When a DAX calculated column is evaluated, the expression runs once per row.  Inside an iterator such as FILTER, [column] refers to the row currently being iterated; EARLIER([column]) steps back one level and returns the value of [column] on the outer row being calculated.  The same trick works across other nested row contexts.

Also, if you still want the distinct count added to each row, here's DAX for a calculated column that does it:

Result = CALCULATE( DISTINCTCOUNT(Codes[State]), FILTER(Codes, Codes[Code] = EARLIER(Codes[Code])) )

Just click your table, go to the Modeling tab, and add a new column.  
