How to filter multivalued column in Power BI

by stansw Frequent Visitor on ‎06-28-2018 04:03 PM

Recently someone asked me what is the best way to filter on multi-valued columns in Power BI. The question was in the context of Tags property of the Work Items - Today table in the Visual Studio Team Services [1]. Although the original question was very specific, the solution I came up with can be generalized to multivalued columns in any data model.

 

What is a multivalued column?

Multivalued column is a database design pattern where instead of normalizing and splitting data across multiple tables you keep multiple values in a single table. You can see it typically in the data warehouses where normalization would lead to a too granular fact tables. One of the best examples is the Categories column in the Product table where want to allow users to select multiple values, but you don't want to create a separate Categories table.

 

In the context of Visual Studio Team Services there is one table where this pattern was applied - Work Items - Today. It contains Tags column which is a "; " delimited list of tags like in the example below.

 

Work Item IdTitleTags
1Add column A to table Bdatabase; milestone1
2Create migration scriptdatabase
3Improve performance of slow queriesperformance; database
.........

 

Problem statement 

Given table with a multivalued column prepare data model that will allow users to easily filter on distinct values.  For example, we can start with the table below, which has multivalued Tags column.

 

let
    Source = #table(
        {"Work Item Id", "Title", "Tags"},
        {
            { "1", "Add column A to table B", "database; milestone1" },
            { "2", "Create migration script", "database" },
            { "3", "Improve performance of slow queries", "performance; database" }
        })
in
    Source
 

 

If we simply selected Tags for the slicer it would produce the following result. Instead of values users could only select combinations that appear in the dataset. That's not what we want.

 

 

A much better design is to extract distinct values from the Tags column so that we can build the following slicer.

 

 

Solution

The solution I would like to show you is based on the post by SQLJason where he talks about handling delimited rows [2]. I modernized and improved it a little to cover columns of arbitrary length and to avoid contaminating model with auxiliary tables. The idea stays the same and can be broken down into the following steps.

 

  1. Create temporary index table.
  2. Apply CROSSJOIN operation and convert source table from wide to long format.
  3. Define relationships.

 

In my previous post "Creating index table in DAX" I explained how to create index table for a given N. Here, N should be selected as the max number of elements in the multivalued column.

 

MaxLength =
VAR Separator = "; "
RETURN
    MAXX (
        'Work Items - Today',
        1 + LEN ( [Tags] )
            - LEN ( SUBSTITUTE ( [Tags], Separator, "" ) )
    )

 

Now we can use this DAX expression and create Indexes table.

 

Indexes =
VAR Separator = "; "
RETURN
    FILTER (
        SUMMARIZE (
            ADDCOLUMNS (
                CALENDAR (
                    DATE ( 2000, 1, 1 ),
                    DATE ( 2000
                        + MAXX (
                            'Work Items - Today',
                            1 + LEN ( [Tags] )
                                - LEN ( SUBSTITUTE ( [Tags], Separator, "" ) )
                        ), 1, 1 )
                ),
                "Index", YEAR ( [Date] ) - 2000
            ),
            [Index]
        ),
        [Index] > 0
    )
Index
1
2
3

 

We do not need to store this table in our model. Instead, we can simply save it in the DAX variable and reuse later.  

 

The final expression consists of the following operations:

  1. Save separator in the variable.
  2. Create index table.
  3. Add TagsCount to the Work Item - Today table to keep track of the index range.
  4. Apply CROSSJOIN with Indexes table.
  5. Filter out indexes that are outside of the range.
  6. Use PATHITEM to extract single value from the multivalued field by index and save it in Tag column.
  7. Summarize to reduce set of columns in the output table.

 

Tags =
VAR Separator = "; "
VAR Indexes =
    FILTER (
        SUMMARIZE (
            ADDCOLUMNS (
                CALENDAR (
                    DATE ( 2000, 1, 1 ),
                    DATE ( 2000
                        + MAXX (
                            'Work Items - Today',
                            1 + LEN ( [Tags] )
                                - LEN ( SUBSTITUTE ( [Tags], Separator, "" ) )
                        ), 1, 1 )
                ),
                "Index", YEAR ( [Date] ) - 2000
            ),
            [Index]
        ),
        [Index] > 0
    )
RETURN
    SUMMARIZE (
        ADDCOLUMNS (
            FILTER (
                CROSSJOIN (
                    ADDCOLUMNS (
                        'Work Items - Today',
                        "TagsCount", 1
                            + ( LEN ( [Tags] ) - LEN ( SUBSTITUTE ( [Tags], Separator, "" ) ) )
                                / LEN ( Separator )
                    ),
                    Indexes
                ),
                [Index] <= [TagsCount]
            ),
            "Tag", PATHITEM ( SUBSTITUTE ( [Tags], Separator, "|" ), [Index] )
        ),
        [Work Item Id],
        [Tag]
    )

 

It will produce the following result. This table captures relationship between work items and tags.

 

Work Item IdTag
1database
2database
3performance
1milestone1
3database

 

Now, we need to define relationships and specify cross filtering direction. First, task is easy because most likely Power BI will automatically detect the relationship like in the example below.

 

 

The automatic relationship is a standard one-to-many relationship, which means that it will allow us to filter Tags based on Work Item - Today selection. That is exactly opposite of what we need.  Double-click on the relationship to open the advanced editor and under "Cross filter direction" select "Both".

 

 

Finally, create a new slicer with Tag field from the newly created Tags table to get the best filtering experience! You can also try out amazing Smart Filter custom visual, which fits perfectly for this scenario.

 

References:

  1. Team Services & TFS - Available data tables in the Power BI Data Connector for Team Services
  2. Split a Delimited Row into Multiple Rows using DAX Queries
  3. It is not overengineering - Creating index table in DAX
  4. Custom visuals for Power BI - Smart Filter by OKViz
  5. SQLBI - Best Practices Using SUMMARIZE and ADDCOLUMNS
  6. Power BI Documentation - Calculated tables in Power BI Desktop

Comments
by natolira Frequent Visitor
on ‎06-30-2018 07:59 PM

Hey, nice approach.

 

I think this video gives an easier to implement approach:

by EdwardXAult Frequent Visitor
on ‎07-02-2018 12:55 PM

so happy to see this post as it's close to my challenge at hand. I have a tidy tag table which this described...though the tags are categorized. The goal is to cross tab tag categories and return file counts. Does this clean/tidy table lend itself to this or do I need to go the opposite way i.e. instead of having one-tag-table is a separate table needed for each tag category?

by stansw Frequent Visitor
on ‎07-12-2018 12:52 PM

@natolira, Thank you for posting the link to the video. Great to see similar solutions. There are some imporatant differences one should take into account when selecting solution that matches reporting requirements. Here are the most important ones.

 

Performance

Video uses SEARCH dax function in the measure. This means that search will ge evaluated at query time. If you work with large dataset this might be slow. In the approach I propose the Tags table which is calculated at the refresh time which means that after model is calculated queries run fast even for large datasets.

 

Flexibility

Solution proposed in the video focuses on the filtering scenario. It would be hard to include IsFiltered measure in other measures you may want to create. By creating calculated table you can treat it as a regular table in any other DAX expressions you may want to build on top of it.

 

@EdwardXAult, based on your description I would recommend ADDCOLUMNS to add Category column to Tags table. That way you will end up with a clean model while keeping number of tables small and keeping all the performance benefits of calculated table.