Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

DAX script is running slow for large dataset scenario.(1 million records)

Hi team, 

 

I have a datset with 1 million rows, which are loaded into PowerBI  by ODBC(MongoDB).  Below is some sample data.

 

I want to extract each element in multi-vlaue column "C_DS" into a new table , and then stats the value count and create a pie cart by creating a new measurement.

Here is the two Dax scripts I use:

1. Extract each element

 

 

 

C_DS_label = 

VAR SplitByCharacter = ","
VAR Table0 =
    SELECTCOLUMNS(
    ADDCOLUMNS (
        GENERATE (
            ROW ( "Text", CONCATENATEX(SampleData,SampleData[C_DS],",") ),
            VAR TokenCount =
                PATHLENGTH ( SUBSTITUTE ( [Text], SplitByCharacter, "|" ) )
            RETURN
                GENERATESERIES ( 1, TokenCount )
        ),
        "Word", PATHITEM ( SUBSTITUTE ( [Text], SplitByCharacter, "|" ), [Value] )
    ),
    "Word",[Word])
RETURN
    SUMMARIZE(Table0,[Word])

 

 

 

 

2. stats the value count

 

 

 

Count_C_DS =
var curr = SELECTEDVALUE(C_DS_label[Word])
return
calculate(COUNTROWS('SampleData'), FILTER(ALL('SampleData'[C_DS]), PATHCONTAINS(SUBSTITUTE ( SampleData[C_DS], ",", "|" ) , curr) ))

 

 

 

 

But I found that  the first DAX script(extract each element) is running very very ....very slow. It has run for 3 hours for 1 million records and hasn't been ended yet.

 

can any expert help take a look and advice how to improve the DAX script?

 

Thanks,

Cherie

1 ACCEPTED SOLUTION

Great @Anonymous 
Please use

C_DS_label =
DISTINCT (
    SELECTCOLUMNS (
        GENERATE (
            FILTER ( VALUES ( SampleData[C_DS] ), SampleData[C_DS] <> BLANK () ),
            VAR NumofWords =
                PATHLENGTH ( SUBSTITUTE ( [C_DS], ",", "|" ) )
            VAR NumSeries =
                GENERATESERIES ( 1, NumofWords )
            RETURN
                ADDCOLUMNS (
                    NumSeries,
                    "@Word", PATHITEM ( SUBSTITUTE ( [C_DS], ",", "|" ), [Value] )
                )
        ),
        "Word", [@Word]
    )
)

View solution in original post

4 REPLIES 4
tamerj1
Super User
Super User

 Hi @Anonymous 
Please try 

C_DS_label =  
SELECTCOLUMNS (
    GENERATE (
        FILTER ( VALUES ( SampleData[C_DS] ), SampleData[C_DS] <> BLANK () ),
        VAR NumofWords = PATHLENGTH ( SUBSTITUTE ( [C_DS], ",", "|" ) )
        VAR NumSeries = GENERATESERIES ( 1, NumofWords )
        RETURN
            ADDCOLUMNS ( 
                NumSeries,
                "@Word", PATHITEM ( SUBSTITUTE ( [C_DS], ",", "|" ), [Value] )
            )
    ),
    "Word", [@Word]
)

1.png

Anonymous
Not applicable

Hi @tamerj1 ,

 

Thank you for the advice. I tried, and it's very fast to extract the values. But the extracted values looks like duplicated, and not unique . What I expect is to extract the disctinct unique values from the C_DS column.

Here is the actual result in powerBI Desktop.

cheriemilk_0-1653381259530.png

 

Expected Result I want:

Word
Profile_Criteria
Keyword
Faceted_Criteria
Backgroud_Criteria
Rating_Criteria

 

Great @Anonymous 
Please use

C_DS_label =
DISTINCT (
    SELECTCOLUMNS (
        GENERATE (
            FILTER ( VALUES ( SampleData[C_DS] ), SampleData[C_DS] <> BLANK () ),
            VAR NumofWords =
                PATHLENGTH ( SUBSTITUTE ( [C_DS], ",", "|" ) )
            VAR NumSeries =
                GENERATESERIES ( 1, NumofWords )
            RETURN
                ADDCOLUMNS (
                    NumSeries,
                    "@Word", PATHITEM ( SUBSTITUTE ( [C_DS], ",", "|" ), [Value] )
                )
        ),
        "Word", [@Word]
    )
)
Anonymous
Not applicable

@tamerj1  thanks, works perfect.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.