rupertbrown
Frequent Visitor

Performance Issues - Please Help

Hi there Super Users,

 

I'm trying to import a dataset with just over 1 million rows and it is taking hours (2.5 hours so far). I am concerned it may have something to do with the way I've structured my query, particularly in relation to the merged queries.

 

In essence, some rows have to be merged. The way I've achieved this is to import the dataset twice, applying an index to the first dataset (inpatient_all) and an index to the second dataset (inpatient), with the second dataset being the one I manipulate (merging the rows).

 

If you're able to make sense of the M code (pasted below the sign-off) and provide any suggestions on how I might make it more efficient, I'd really appreciate it.

 

Thanks heaps in advance for your time and assistance (really appreciate it as I've only been using Power BI for a week).

 

Cheers

Bec

 

let
    Source = Csv.Document(File.Contents("C:\Users\rlindberg\Desktop\DoH Data\hmdc_event_data_v1.csv"),[Delimiter=",", Columns=48, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"enc_person", type text}, {"age", Int64.Type}, {"hospital", Int64.Type}, {"mh_unit", Int64.Type}, {"sor_prof", Int64.Type}, {"sor_loc", Int64.Type}, {"afrom", Int64.Type}, {"diag", type text}, {"dagger", type text}, {"ediag1", type text}, {"ediag2", type text}, {"ediag3", type text}, {"ediag4", type text}, {"ediag5", type text}, {"ediag6", type text}, {"ediag7", type text}, {"ediag8", type text}, {"ediag9", type text}, {"ediag10", type text}, {"ediag11", type text}, {"ediag12", type text}, {"ediag13", type text}, {"ediag14", type text}, {"ediag15", type text}, {"ediag16", type text}, {"ediag17", type text}, {"ediag18", type text}, {"ediag19", type text}, {"ediag20", type text}, {"ecode1", type text}, {"ecode2", type text}, {"ecode3", type text}, {"ecode4", type text}, {"mdc_c", Int64.Type}, {"mhlstat", Int64.Type}, {"los", Int64.Type}, {"adm", type date}, {"sep", type date}, {"mos", Int64.Type}, {"dischto", Int64.Type}, {"epicar", Int64.Type}, {"mh_pdx", Int64.Type}, {"mh_adx", Int64.Type}, {"readm", Int64.Type}, {"Ever_Inpatient_MH", Int64.Type}, {"Ever_ED_MH", Int64.Type}, {"Ever_Community_MH", Int64.Type}, {"inpatient_mh", Int64.Type}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"adm", "START_DATE"}, {"sep", "END_DATE"}}),
    #"Added Custom2" = Table.AddColumn(#"Renamed Columns", "EVENT_SUB_CATEGORY", each
        if [diag] <> null and Text.Start([diag], 1) = "F" then "HOSP_MH_DIAGNOSIS_F"
        else if List.Contains({"R45.81", "Z00.4", "Z03.2", "Z04.6", "Z09.3", "Z13.3", "Z13.4", "Z50.2", "Z50.3", "Z50.4", "Z54.3", "Z86.5", "Z91.4", "Z91.5"}, [diag]) then "HOSP_MH_DIAGNOSIS_Z"
        // mdc_c is typed as Int64 above, so compare against numbers rather than text
        else if [mdc_c] = 19 or [mdc_c] = 20 then "HOSP_MH_MDC"
        else if [mh_unit] = 1 then "HOSP_MH_UNIT"
        else if [mhlstat] = 1 then "HOSP_MH_INVOLUNTARY"
        else if [mhlstat] = 2 then "HOSP_MH_VOLUNTARY"
        // intentional self-harm: any of ecode1..ecode4 starting with X6, X7 or X80-X84
        else if List.AnyTrue(List.Transform({[ecode1], [ecode2], [ecode3], [ecode4]}, each
            _ <> null and (
                Text.Start(_, 2) = "X6" or
                Text.Start(_, 2) = "X7" or
                List.Contains({"X80", "X81", "X82", "X83", "X84"}, Text.Start(_, 3))))) then "HOSP_MH_INTHARM"
        else if [dagger] <> null and Text.Start([dagger], 1) = "F" then "HOSP_MH_CODIAGNOSIS_F"
        // additional diagnosis: any of ediag1..ediag20 starting with F
        else if List.AnyTrue(List.Transform({[ediag1], [ediag2], [ediag3], [ediag4], [ediag5], [ediag6], [ediag7], [ediag8], [ediag9], [ediag10], [ediag11], [ediag12], [ediag13], [ediag14], [ediag15], [ediag16], [ediag17], [ediag18], [ediag19], [ediag20]}, each
            _ <> null and Text.Start(_, 1) = "F")) then "HOSP_MH_ADDDIAG_F"
        else "HOSP_OTHER"),
    #"Sorted Rows" = Table.Sort(#"Added Custom2",{{"enc_person", Order.Ascending}, {"START_DATE", Order.Ascending}, {"END_DATE", Order.Ascending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
    #"Added Conditional Column" = Table.AddColumn(#"Added Index", "HomeIndex", each if [mos] = 5 then null else [Index]),
    #"Grouped Rows" = Table.Group(#"Added Conditional Column", {"enc_person"}, {{"SecondTable", each _, type table}}),
    #"Added Custom3" = Table.AddColumn(#"Grouped Rows", "Custom4", each Table.FillUp ([SecondTable],{"HomeIndex"})),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Added Custom3", "Custom4", {"enc_person", "START_DATE", "END_DATE", "mos", "los", "Index", "HomeIndex"}, {"enc_person.1", "START_DATE", "END_DATE", "mos", "los", "Index", "HomeIndex"}),
    #"Added Conditional Column1" = Table.AddColumn(#"Expanded Custom1", "Custom", each if [HomeIndex] = null then [Index] else [HomeIndex]),
    #"Removed Columns" = Table.RemoveColumns(#"Added Conditional Column1",{"SecondTable", "enc_person.1", "HomeIndex"}),
    #"Grouped Rows1" = Table.Group(#"Removed Columns", {"Custom", "enc_person"}, {{"START_DATE", each List.Min([START_DATE]), type anynonnull}, {"END_DATE", each List.Max([END_DATE]), type anynonnull}, {"los", each List.Sum([los]), type number}, {"min_INDEX", each List.Min([Index]), type number}, {"max_INDEX", each List.Max([Index]), type number}, {"S", each _, type table}}),
    #"Added Custom4" = Table.AddColumn(#"Grouped Rows1", "No_Stat_Sep", each Table.RowCount( Table.SelectRows([S], each [mos]=5))),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom4",{"S", "Custom"}),
    #"Merged Queries" = Table.NestedJoin(#"Removed Columns2",{"min_INDEX"},inpatient_all,{"Index"},"NewColumn",JoinKind.LeftOuter),
    #"Expanded NewColumn" = Table.ExpandTableColumn(#"Merged Queries", "NewColumn", {"hospital", "mh_unit", "EVENT_SUB_CATEGORY","age"}, {"NewColumn.hospital", "NewColumn.mh_unit", "NewColumn.EVENT_SUB_CATEGORY","NewColumn.age"}),
    #"Merged Queries1" = Table.NestedJoin(#"Expanded NewColumn",{"max_INDEX"},inpatient_all,{"Index"},"NewColumn",JoinKind.LeftOuter),
    #"Expanded NewColumn1" = Table.ExpandTableColumn(#"Merged Queries1", "NewColumn", {"EVENT_SUB_CATEGORY"}, {"NewColumn.EVENT_SUB_CATEGORY.1"}),
    #"Renamed Columns1" = Table.RenameColumns(#"Expanded NewColumn1",{{"NewColumn.hospital", "Location"}, {"NewColumn.mh_unit", "Dedicated_MH_Unit"}, {"NewColumn.EVENT_SUB_CATEGORY", "EVENT_SUB_CATEGORY"}, {"NewColumn.EVENT_SUB_CATEGORY.1", "EVENT_SUB_CATEGORY_SEP"}, {"NewColumn.age", "age"}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Renamed Columns1",{"los", "min_INDEX", "max_INDEX"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns1",{"enc_person", "age", "Location", "START_DATE", "END_DATE", "EVENT_SUB_CATEGORY", "EVENT_SUB_CATEGORY_SEP", "No_Stat_Sep", "Dedicated_MH_Unit"}),
    #"Merged Queries2" = Table.NestedJoin(#"Reordered Columns",{"Location"},#"WA Health Establishments",{"Estab ID"},"NewColumn",JoinKind.LeftOuter),
    #"Expanded NewColumn2" = Table.ExpandTableColumn(#"Merged Queries2", "NewColumn", {"Establishment Name", "Suburb"}, {"NewColumn.Establishment Name", "NewColumn.Suburb"}),
    #"Removed Columns3" = Table.RemoveColumns(#"Expanded NewColumn2",{"Location"}),
    #"Reordered Columns1" = Table.ReorderColumns(#"Removed Columns3",{"enc_person", "age", "NewColumn.Suburb", "NewColumn.Establishment Name", "START_DATE", "END_DATE", "EVENT_SUB_CATEGORY", "EVENT_SUB_CATEGORY_SEP", "No_Stat_Sep", "Dedicated_MH_Unit"}),
    #"Renamed Columns3" = Table.RenameColumns(#"Reordered Columns1",{{"NewColumn.Establishment Name", "Location"}, {"NewColumn.Suburb", "Region"}}),
    #"Added Custom" = Table.AddColumn(#"Renamed Columns3", "EVENT_CATEGORY", each "INPATIENT"),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "EVENT_GROUP_CAT", each if [EVENT_SUB_CATEGORY] = "HOSP_MH_DIAGNOSIS_F" then "HOSP_MH_1"
else if [EVENT_SUB_CATEGORY] = "HOSP_OTHER" then "HOSP_OTHER"
else "HOSP_MH_OTHER"),
    #"Reordered Columns2" = Table.ReorderColumns(#"Added Custom1",{"enc_person", "age", "Region", "Location", "START_DATE", "END_DATE", "EVENT_CATEGORY", "EVENT_SUB_CATEGORY", "EVENT_GROUP_CAT", "EVENT_SUB_CATEGORY_SEP", "No_Stat_Sep", "Dedicated_MH_Unit"})
in
    #"Reordered Columns2"

1 ACCEPTED SOLUTION
stretcharm
Memorable Member

It's a bit tricky to go through your M, but here are a few tips that might help.

 

Get your initial dataset loaded and as small as possible, e.g. remove unwanted columns and get your datatypes correct (e.g. date, not datetime). You can then create a second dataset with a reference so you don't do that work twice.
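As a rough sketch of that idea (the file path is the one from your query; the column list is just an illustrative subset, so keep whichever columns you actually need):

```m
// Base query, e.g. named "inpatient_base": load the CSV once,
// drop unwanted columns and set types as early as possible.
let
    Source = Csv.Document(File.Contents("C:\Users\rlindberg\Desktop\DoH Data\hmdc_event_data_v1.csv"), [Delimiter=",", Columns=48, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    Selected = Table.SelectColumns(Promoted, {"enc_person", "age", "hospital", "mh_unit", "diag", "mdc_c", "mhlstat", "los", "adm", "sep", "mos"}),
    Typed = Table.TransformColumnTypes(Selected, {{"adm", type date}, {"sep", type date}, {"age", Int64.Type}, {"los", Int64.Type}})
in
    Typed
```

Then right-click inpatient_base in the Queries pane and choose Reference to create the second query, so the load and typing work isn't repeated.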

 

There is quite a bit of conditional and string manipulation. Maybe try creating a new column for the Text.Start([ecode1],2) and Text.Start([ecode1],3) items and replace nulls with a dummy value. Then create a lookup query that can be joined in the model, rather than trying to do it for each row.
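For instance (a sketch only; #"Previous Step" and the dummy value are placeholders):

```m
// Derive a 3-character prefix for ecode1 (repeat for ecode2..ecode4),
// substituting a dummy "ZZZ" for nulls so the join key is never null.
#"Added ecode1 Prefix" = Table.AddColumn(#"Previous Step", "ecode1_prefix",
    each if [ecode1] = null then "ZZZ" else Text.Start([ecode1], 3), type text)
// A small lookup query mapping prefixes such as "X60".."X84" to
// "HOSP_MH_INTHARM" can then be merged once, instead of running
// many Text.Start comparisons per row.
```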

 

There is probably scope for other lookups that might have a composite key. I tend to concatenate fields with a | to get my keys.
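A minimal sketch (the column choice and step name are hypothetical):

```m
// Build a pipe-delimited composite key from two fields.
#"Added Key" = Table.AddColumn(#"Previous Step", "JoinKey",
    each Text.Combine({[enc_person], Text.From([Index])}, "|"), type text)
```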

 

Lookups can be generated from your source data, but it may be easier to create a manual lookup feed that is much quicker to load than getting a distinct list of values from 1 million rows.

 

Try moving some of your column expressions to DAX to see if that performs better.

 

Can the merge be moved into the model as a relationship? If you need to see it as a single item, you can use DAX to combine tables, e.g. NATURALINNERJOIN(<leftJoinTable>, <rightJoinTable>), then hide the 2 source tables.

 

Another option is to do some preprocessing outside of M. I have a really nasty file with 25,000 columns, and I do preprocessing in R before loading a nice, neat set of files into Power BI.



Hi @stretcharm

Thanks very much for the suggestion to use R. With your inspiration, we are now developing a solution: creating a SQL database, importing the 5 flat files, completing the "wrapping up" of rows and any other manipulation and analysis in the database, and then importing the final product into Power BI to allow the use of the custom visuals. Thanks again. Cheers 🙂

Thanks heaps for the tips. I appreciate that the M code is rather difficult to decipher, so thanks for taking the time to look at it. I'll try performing some of the transformations using DAX and see if that helps. Cheers and thanks 🙂
