Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Arul

Substring Occurrence Analysis Using DAX in Power BI

Problem Statement:

Let's assume you are tasked to analyze the data in the which comprising two columns: "Number of Items" and "Items." Each row represents a group of items, where the "Number of Items" column indicates the count of items within a group, and the "Items" column lists the items separated by commas. You need to do it without adding any additional column to your data model. 

Your objective is to calculate the occurrence of a specific substring within the "Items" column while considering comma delimiters within each item group.For example, if you are asked to count occurrences of "Apple_", your solution should count how many times "Apple_" appears within each group of items.

 

Solution:

Step 1: Create the calculated column.

Step 2: Use the following DAX expression in a calculated column.

Count Column =
LEN ( SUBSTITUTE ( 'Table'[Items], "Apple_", "Apple_0" ) )
    - LEN ( 'Table'[Items] )

 

Logic behind the DAX expression:


a. SUBSTITUTE('Table'[Items], "Apple_", "Apple_0") - The SUBSTITUTE function in Power BI replaces occurrences of a specified substring ("Apple_") within the "Items" column with a new string ("Apple_0"). This effectively cocatenate "0" to each occurrence of "Apple" in the original text.

 

b. LEN(SUBSTITUTE('Table'[Items], "Apple_", "Apple_0")) - This calculates the length of the modified text string after performing the substitution in previous step. The LEN function returns the number of characters in a text string.

 

c. LEN('Table'[Items]) - This function calculates the length of the original "Items" column values. 

 

d. LEN(SUBSTITUTE('Table'[Items], "Apple_", "Apple_0")) - LEN('Table'[Items]) -  This calculates the difference in length between the original "Items" column and the modified version with "0" appended to each occurrence of "Apple". The difference in length represents the total number of characters added by the substitution, which corresponds to the number of occurrences of "Apple_" in the original string.

 

Here's how the process works: Each occurrence of "Apple_" within an item group is replaced with "Apple_0", effectively increasing the length of the string. By comparing the lengths before and after the substitution, the difference gives the count of occurrences of "Apple_" within the item group.

 

Conclusion:

In conclusion, analyzing the dataset comprising "Number of Items" and "Items" columns without adding additional columns to the data model showcases the flexibility and efficiency of Power BI's capabilities.This approach preserves the simplicity and integrity of the original data structure while enabling robust data analysis using DAX.