Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
WakeTurbulence
Frequent Visitor

Count comma delimited data

Hello,

 

I have a data dump whereby some fields can have more than one type in the same occurrence and other fields are multi-select which can give multiple categories in the same occurrence. I want to be able to count the total for each cell in Power BI but the solution seems to elude me despite searching exhaustively on the net. I'm sure the answer must be simple but can't seem to wrap my head around what I need to do.

 

Example:

 

Capture.PNG

 

Goal (aside from being able to slice, for example, "all occurrence categories for type a"):

Capture.PNGCapture.PNG

 

 

I also recognize that I can split the columns based on the comma delimiter but again, how would i agregate the data so that i am able to slice on a per occurrence basis? splitting by delimiter would work fine for the types but some of the categories have commas in their name so we would end up with categories being split in half and then not matching with the reference data.

2 ACCEPTED SOLUTIONS
Ashish_Mathur
Super User
Super User

Hi,

 

I do not have a fancy way to solve this.  All i can suggest is that we create two tables fro your source as shown below.  I have done so by using the Split column > By rows feature of Power Query.

 

Untitled.png


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

View solution in original post

Hi,

 

Do not split the data by columns.  SPlit it by rows as suggested in my previous post.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

View solution in original post

16 REPLIES 16
Guido_Beulen
Frequent Visitor

Add a column in Power Query M with: 

= Table.AddColumn(#"[laststep]", "Count separated values", each Text.Length([ColumWhereToCountSeparatedValues])-Text.Length(Replacer.ReplaceText([ColumWhereToCountSeparatedValues],";",""))+1)

 

In this example the seperator is ";"

 

The DAX variation in form of a measure is: 

CountOfItems = LEN(MAX(List[List])) - LEN(SUBSTITUTE(MAX(List[List]),",","")) + 1

 Which was discussed in this topic.

Ashish_Mathur
Super User
Super User

Hi,

 

I do not have a fancy way to solve this.  All i can suggest is that we create two tables fro your source as shown below.  I have done so by using the Split column > By rows feature of Power Query.

 

Untitled.png


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Thanks Ashish,

 

I was hoping to avoid this though. Does anyone else have any thoughts?

I have used power query to split the column into several columns so now i have columns: `type.1` `type.2``type.3` etc. in table 1. These columns all relate to the column `Type` in table 2. However, when i create relationships from each of the columns in table 1 to column 2 only the first one I create is active and the remainder are inactive. As a result, power BI is only counting the active relationship in its output. How can i make it count all of them as one statistic?

 

Untitled.png

 

In addition to the answer above, I would still prefer to leave this column unsplit and somehow count each report entered for one occurrence. for the most part there will only be 3 - 4 report types per occurrence but on the rare occasion there could be upwards of 20 reports for one occurrence. Obviously having 20 columns just for the off chance that this occurs is not ideal but if I limit the amount of columns the origin column is split in to say 10 then I am possibly loosing out on data.

 

 

Hi,

 

Please show your actual data and your expected result.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

occurrence #report typedateroutelocation
1I21/08/2017 0:001A
2I22/08/2017 0:004B
3I22/08/2017 0:006G
4I23/08/2017 2:053T
5I23/08/2017 0:006V
6G23/08/2017 15:357T
7C,F,F23/08/2017 13:309T
8V,F22/08/2017 16:014T
9I24/08/2017 0:002E
10C24/08/2017 18:004E
11I22/08/2017 18:106G
12F,F,C,G23/08/2017 19:008B
13I23/08/2017 12:005A
14I24/08/2017 17:002A
15G22/08/2017 4:562T
16G07/08/2017 15:451V
17G18/08/2017 11:253V
18G18/08/2017 11:306L
19I20/08/2017 23:008M
20G21/08/2017 7:009M

 

I cant show the actual data but here is the same idea. I want to be able to have, say, a pie chart which shows the distribution of report type. When I slice the data for location T for example, I want to be able to see what report types are related to location T. I can get this result fine but the problem lies in the rows were there are multiple report types in one occurrence; Power BI does not count those results, it only counts the rows where there is only 1 type. If I split the report type column into multiple columns (type.1, type.2, type.3, etc), the new problem is that Power BI only counts the first column because the other relationships are rendered inactive because they all relate to the same column in table 2.

Hi,

 

Do not split the data by columns.  SPlit it by rows as suggested in my previous post.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

but then the data will no longer be 'per occurrence'

Hi,

 

I believe it will if you use the the DISTINCTCOUNT() function.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

I should also say that another goal is to simply count the total number of each report type submitted without slicing. 

Did you find any solution? I have been trying to solve this exact problem for 1 week already

Hi,

 

The solution from Ashish is pretty much it. Splitting the delimited data into rows works best. The id number will get duplicated but BI will accurately count each instance of the category.

 

For me this causes an unwanted duplication of all my other fields so my solutution for this is to split the category information out into a separate table that is linked to the host data table via the ID column.

 

I'm sure theres a better, more elegant way, but its working for me at this time until I sit down and try to work out a more elegant solution.

 

Wake

I should add that I do this for each field which contains delimited data.

 

I then have a measure which uses ROUNTROWS to get the total.

 

TOTAL = COUNTROWS(tblCategory)

Greg_Deckler
Super User
Super User

Just to clarify, from your sample data, the Types image should show an "a" of 3, correct? Because a occurs 3 times in your sample data?


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...

yes that is correct. I just put together the example in excel quickly and mis-counted.

Bump.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.