Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Lenihan
Helper III
Helper III

append data from two columns only for distinct count.

Hi, 

So I have a table of about 25 columns, 35K rows of data.

I want to take two of those columns, append the data to make once column of about 70K of data. Then, remove duplicates.

 

Would this need to be a new table? If I create a calculated column, I'm going to end up with 70K rows of data, where 35K of them have null value in the other 24 columns columns. Unless there is a programmable way to just do a distinct count of all those rows of data. The possibility of the data that I'm trying to manage is about 50K different possibilities, so I can't just search for a key word or anything. 

If I did it in excel I would do exactly that - copy / paste all the data from one column onto the bottom of the other column, then click remove duplicates. 

 

I've searched, but the other options/questions I've found are people trying to do distinct count where they can concatenate the columns of data into a new column with the same # of rows as the original table.  This is different.

1 ACCEPTED SOLUTION

Thank you Imke for your patience and assistance. I was finally able to get it using the old copy/paste/paste/merge/remove duplicates way. I was able to get the chart right too - apparently I had the relationship pointing the wrong way. Once I changed the direction, it is now displaying the right values.

View solution in original post

19 REPLIES 19
tango1201
Frequent Visitor

Not sure if this will help, but when I want to stack two columns of a table into one, I move the two columns to the end (far right) and then simply unpivot the two columns.  This leaves you with an "Attribute" column where the attributes are the two column headers and a "Value" column which is the stacked column of data.

@tango1201 That was exactly what I was looking for. Thanks!

Greg_Deckler
Super User
Super User

I would copy your query, use an Append Query step to append your original query, use a calculated column in your query to concatenate your columns and then do a remove duplicates on that column. Although, I'm not exactly certain of your use case as it sounds a bit strange, I'm therefore not sure I have the solution correct.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

The thing though is I only need the number of distinct values in those columns.

 

For example - here is a data analogy

 

NameData1Name2Data3
a3c6
a4c7
b4a2
d5a2
c6a2
b2d4
a4c4
d5d6
d6c6
c2a4
a1b5
a5b6
c6d7
d7d5

 

I want to take all the values under Name, and Name2, and count how many unique. (except I'd have more than 4 different options of course, I have potentialy 40K unique values as they are configuration items in a CMDB). An item could appear under Name, or Name2, or could appear multiple times under both. 

If I append my data, I'm going to end up with double of the values under Name - but not necessary Name2 under Name. I essentially want to copy the full row of data in Name2 under Name.  The rest of the columns don't matter for what I need. I could create two copies of the query, delete all other columns, rename the column so they match in both new queries, then append them, but it seems like there should be a better way. 

You can take this simple formula to do this:

 

List.Count(List.Distinct(List.Union(Table.ToColumns(Source))))

 

Where "Source" is the reference to the table to be analyzed.

 

It reads inside-out as in Excel:

Table.ToColumns: transfers every column of your table to a list and then combines all those lists into a big list (nested list of lists). 

List.Union: combines them 

List.Distinct: removes dups

List.Count: does what it says 😉  

 

This is case-sensitive. Pls let me know if you want a case-insensitive count instead.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Thank you for the reply,

 

With this being just a number - where would this be created? It doesn't appear to be a new column since just a number, but when I try to create a new measure, it won't let me select any tables as the source, only existing measures I've already created.

 

I tried doing this formula too, based on a post from your earlier for someone else, 

List.Distinct(List.Union([CI Identifier], [Child CI Identifier]))

 

with CI Identifier and Child CI Identifier being the two columns I'm trying to merge and I was going to just then do a count on them, but I get this error:

 

Expression.Error: We cannot convert the value "CI00000085018" to type List.
Details:
    Value=CI00000085018
    Type=Type

 

In every row. the CI######## are the values within the rows of data. 

Having difficulties understanding your request.

 

My formula actually returns just a single value (the number of distinct values in the whole table) and would be loaded as a table with one column and one row to your data model where it could be adressed by any DAX-function.

 

Do you need the number of distinct values in just some specific columns?

Or sth totally different? If so, pls post a clear picture of input-data and desired output-data.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Hi,

Sorry didn't mean to confuse. The solution you provided sounds like what I want. Since I was having challenges, i was trying to do it other ways too

 

So I clicked on New Table, and entered this:

 

Table = List.Count(List.Distinct(List.Union(Table.ToColumns('CI Relationship Report'))))

 

 

But I got this error:

The syntax for 'Count' is incorrect. (DAX(List.Count(List.Distinct(List.Union(Table.ToColumns('CI Relationship Report')))))).

I think you are confusing Power Query (shaping) and Power Pivot (Modelling). New Table is power Pivot, List.Count is POwer query

 

http://exceleratorbi.com.au/shaping-modelling-power-bi/

 



* Matt is an 8 times Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.

Hey Matt @MattAllington - congrats to your 1000 st post 🙂

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Oh wow, I new it was coming but didn't notice it tick over.  I guess I just ruined it with this reply!



* Matt is an 8 times Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.

I'm still having challenges..  I right clicked, selected Reference. Entered the formula, but got that error.  I tried without the quotes as well, but that errors too.

 

I've also never used powerpivot or powerquery, so the blog post link from Matt didn't help me . sorry. I'm new too all this.. Excel functions and formulas are what I was only ever used to. Learning lots though!

 

error1.jpg

It looks as if you are trying to modify the first step of your new query instead of creating a second one like I tried to suggest.

 

But never mind, you can do it like that as well. You just have to use a special syntax in order to reference elements in M, who contain special characters, numbers or blanks: Wrap them in #"....".

 

So: #"CI Relationship Report" in your case.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Sorry, I didn't mean to modify the first step. I must have misunderstood.

 

I followed these steps this time then (hopefully did it as was expected)

 

Open the query editor, right click then click Reference then clicked on the Fx, pasted the formula.


This time no error - but it just keeps processing. In the bottom right corner, it counts the mb usage from the table itself. The filesize on the table that it is processing is only 9.5 MB; however, this formula was processing over 300 MB in size before I clicked on cancel.

I tried pasting this formula
List.Count(List.Distinct(List.Union(Table.ToColumns(Source))))

 

and this one

 

List.Count(List.Distinct(List.Union(Table.ToColumns(#"CI Relationship Report))))

 

But neither will finish processing. They keep running.

 

So I'm really starting to pull my hair out here.. I've spent over 30 hours for just one chart of about 40 I need.. 

 

I don't know if it is a relationship issue between my tables. Instead of trying to do count the unique values as a formula, i decided to a brute force solution - copy the data tables, delete all tables except the ones I want to stack, rename them then append them. It gave me the number I need.  I then added a column called "Reporting Month" and set it to January 2017.

 

So.. I'm trying to use that number now in a formula (Measure)

 

No Relationship % = (count('All CIs'[CI Identifier]) - COUNT('UniqueCIRelationships'[All CI Identifiers])) / COUNT('All CIs'[CI Identifier])

 

ALL Ci's is my main table. It has a list of all CIs that exist in the database

UniqueCIRelationships is the table i created with only those CIs that have a known relationship to other CIs. This is the one I appended then deleted duplicates.

 

If i do a count of (count('All CIs'[CI Identifier]) it equals 51,010  (good)

If I do a count of COUNT('UniqueCIRelationships'[All CI Identifiers])), it comes out to 23,466  (good)

The math for Relationship % comes out to 53%. Good there too

 

Now I try to put that in a table, I add the column "Reporting Month" which is a column in every table I have. They all are equal to January 2017 right now. The visual displays a percentage of -26.7%  ??  When I check the total of All CIs - Unique CIs, it comes out to a negative number on a visual if I plot it against the Reporting month column..but without adding that month and just making a card, the value is right.  

 

And unfortunately I'm not able to share my data so I'm unsure where to go next. 

Thank you Imke for your patience and assistance. I was finally able to get it using the old copy/paste/paste/merge/remove duplicates way. I was able to get the chart right too - apparently I had the relationship pointing the wrong way. Once I changed the direction, it is now displaying the right values.

Sorry @Lenihan to see that the performance of my formula is so poor.

This was a surprise for me as well and I tried to speed it up with no success. M has some unbuilt algorithms that might cause the source to be hit multiple times and there is nothing we users can do about it at the moment (even buffer doesn't work).

But I must admit that I've never come across sth bad like this... 😞

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Hi @Lenihan,

 

Great to hear the problem got resolved! Could you accept the corresponding reply as solution to help others who may have similar issue easily find the answer and close this thread?Smiley Happy

 

Regards

No prob 🙂

 

You need to use this in the query-editor and not in the data model/tab.

 

Open the query editor -> move your mouse over the query-name in the left navigation pane (Queries...) -> rightclick mouse -> check "reference":

 

Then a new query will be created with one step in it: "Source" with the reference to your table. Click on the fx like shown in the picture will create a new step where you paste the formula:

 

PBI_CountAllDistinctValuesInTable.png

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors