Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Baykus
Frequent Visitor

how to do multivariate clustering with multiple choice answered text data type in Power BI Desktop

Hi,

 

I am using scatter chart to find best clusters by automaticly find clusters option. I have 4 dimentions (questions) data to use in clustering model. but 2 questions have multiple choice text answers. How should I add those answers in axis as numeric values in scatter chart? 

 

  1. Question (Dimension): Bussiness Profile (Multiple Choice)

Answer types in text format: Wholesaler, Exporter, Importer, Producer, Distributor, Consulting, Investor, Retailer, Supplier, Other

 

  1. Question (Dimension): Area Interest (Multiple Choice)

Answer types in text format: Greenhouse, Agricultural Analysis, Banking, Machinery, Certification, Packaging, Plant Nutrition, Plant Protection, Project & Consultancy, Seed

 

    3. Question (Dimension): Occupation (Single Choice) 

Answer Types: Farmer, Engineer, Academician,Non Profit Organization, Student, Other

 

    4. Question (Dimension): National/International (Single Choice) 

Answer Types: National, International

 

Baykus_1-1642098321653.png

Baykus_2-1642098373192.png

 

I need to find best customer persona groups clusters by using above questions data. Is there anybody can help me on how to build the Fields and what kind of data transformation steps I need to do on Transform Data? I tried unpivot columns or conditional columns contains for each answers. but since it is text, scatter chart X & Y axis need to be number format. Even I used the conditional columns (0-1) data for multiple choice answers to Tooltips, I couldnt be sure if the cluster model works correctly in terms of statistical model.

 

Thanks

Nilay

 

 

7 REPLIES 7
Baykus
Frequent Visitor

all answers are equally imortant yes. and I would like to find out customer personas according to the liked products and company bussiness. So I would like to include all answers and questions to the model to find out the best clusters divided. When I add them to tool tips, it has limit for 15 items. So I am not able to add all answers and I am not sure if the model works well. 

 

And multiple choices means multiple answers, yes.

I would expand the multi choice answers into new rows in Power Query and then treat them as if they had come from separate surveys. Please provide sanitized sample data that fully covers your issue. Paste the data into a table in your post or use one of the file services. 

Below sample includes each possible answer type divided by "|" for multi answers. Hope this is what you need for sanitized sample data. I coulndt find the way to share excel file.

 

Unique KeyNational and International*GRW_Area_Interest*GRW_Business_Profile
1International[PC_AGR-21]Greenhouse |[PC_AGR-24004]Seed|[PC_AGR-25]Plant |[PC_AGR-22001]Irrigation Systems & Equipment|[PC_AGR-26]Plant Nutrition |[PC_AGR-19001]Project & Consultancy Services |[PC_AGR-19006]Certification|[PC_AGR-19007]Packaging|[PC_AGR-12006]Agricultural Banking & Finance |[PC_AGR-19005]Agricultural Analysis & Research|[PC_AGR-21001]Agriculture Machinery & Technologies|[PC_AGR-27]Plant Protection |[PC_AGR-23]Livestock [NOB_WSTRADE]Wholesaler|[NOB_MNFC]Producer|[NOB_EXPTRD]Exporter|[NOB_IMPTRD]Importer|[NOB_PRVDRSUPLR]Supplier|[NOB_DD]Distributor|[NOB_RETAILTRDE]Retail / Dealer / Branch / Store|[NOB_INVST]Investor|[NOB_CS]Consulting|Other (Please specify):10 lacks
2International[PC_AGR-21]Greenhouse |[PC_AGR-24004]Seed|[PC_AGR-25]Plant |[PC_AGR-22001]Irrigation Systems & Equipment|[PC_AGR-26]Plant Nutrition |[PC_AGR-19001]Project & Consultancy Services |[PC_AGR-19006]Certification|[PC_AGR-19007]Packaging|[PC_AGR-12006]Agricultural Banking & Finance |[PC_AGR-19005]Agricultural Analysis & Research|[PC_AGR-21001]Agriculture Machinery & Technologies|[PC_AGR-27]Plant Protection |[PC_AGR-23]Livestock [NOB_WSTRADE]Wholesaler|[NOB_EXPTRD]Exporter|[NOB_PRVDRSUPLR]Supplier|[NOB_DD]Distributor
3National[PC_AGR-21]Greenhouse |[PC_AGR-24004]Seed|[PC_AGR-25]Plant |[PC_AGR-22001]Irrigation Systems & Equipment|[PC_AGR-26]Plant Nutrition |[PC_AGR-19001]Project & Consultancy Services |[PC_AGR-19006]Certification|[PC_AGR-19007]Packaging|[PC_AGR-12006]Agricultural Banking & Finance |[PC_AGR-19005]Agricultural Analysis & Research|[PC_AGR-21001]Agriculture Machinery & Technologies|[PC_AGR-27]Plant Protection |[PC_AGR-23]Livestock [NOB_PRVDRSUPLR]Supplier|[NOB_RETAILTRDE]Retail / Dealer / Branch / Store
4National[PC_AGR-21]Greenhouse |[PC_AGR-24004]Seed|[PC_AGR-25]Plant |[PC_AGR-22001]Irrigation Systems & Equipment|[PC_AGR-26]Plant Nutrition |[PC_AGR-19001]Project & Consultancy Services |[PC_AGR-19006]Certification|[PC_AGR-19007]Packaging|[PC_AGR-12006]Agricultural Banking & Finance |[PC_AGR-19005]Agricultural Analysis & Research|[PC_AGR-21001]Agriculture Machinery & Technologies|[PC_AGR-27]Plant Protection |[PC_AGR-23]Livestock [NOB_WSTRADE]Wholesaler|[NOB_RETAILTRDE]Retail / Dealer / Branch / Store
5International[PC_AGR-21]Greenhouse |[PC_AGR-24004]Seed|[PC_AGR-25]Plant |[PC_AGR-22001]Irrigation Systems & Equipment|[PC_AGR-26]Plant Nutrition |[PC_AGR-19001]Project & Consultancy Services |[PC_AGR-19006]Certification|[PC_AGR-19007]Packaging|[PC_AGR-12006]Agricultural Banking & Finance |[PC_AGR-19005]Agricultural Analysis & Research|[PC_AGR-21001]Agriculture Machinery & Technologies|[PC_AGR-27]Plant Protection |[PC_AGR-23]Livestock [NOB_WSTRADE]Wholesaler|[NOB_RETAILTRDE]Retail / Dealer / Branch / Store
6International[PC_AGR-21]Greenhouse |[PC_AGR-24004]Seed|[PC_AGR-25]Plant |[PC_AGR-22001]Irrigation Systems & Equipment|[PC_AGR-26]Plant Nutrition |[PC_AGR-19001]Project & Consultancy Services |[PC_AGR-19006]Certification|[PC_AGR-19007]Packaging|[PC_AGR-12006]Agricultural Banking & Finance |[PC_AGR-19005]Agricultural Analysis & Research|[PC_AGR-21001]Agriculture Machinery & Technologies|[PC_AGR-27]Plant Protection |[PC_AGR-23]Livestock [NOB_DD]Distributor|[NOB_RETAILTRDE]Retail / Dealer / Branch / Store
7National[PC_AGR-21]Greenhouse |[PC_AGR-24004]Seed|[PC_AGR-25]Plant |[PC_AGR-22001]Irrigation Systems & Equipment|[PC_AGR-26]Plant Nutrition |[PC_AGR-19001]Project & Consultancy Services |[PC_AGR-19006]Certification|[PC_AGR-19007]Packaging|[PC_AGR-12006]Agricultural Banking & Finance |[PC_AGR-19005]Agricultural Analysis & Research|[PC_AGR-21001]Agriculture Machinery & Technologies|[PC_AGR-27]Plant Protection |[PC_AGR-23]Livestock [NOB_MNFC]Producer|[NOB_RETAILTRDE]Retail / Dealer / Branch / Store
Baykus
Frequent Visitor

I also have done what you suggested above about expand answers in to raws by creating conditional colum for answers with 1-0 numeric data in Power Query. but I dont know what to do to add all those new answer columns to clustering model in scatter chart with X and Y axis to find automaticly clusters. When I add all 1-0 answer columns in add tooltips for 2 multi answer question (Area Interest and Business Profile), then the scattering will not be correctly because there wil be only 1 answer (for example below screenshot example is Greenhouse answer) in Y axis while it is whole question answers in X-axis. 

Baykus_1-1642459624849.png

 

 

Not exactly what I meant - here's how I would approach it:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7Zddb5swFIb/ipWrTeo0oPnQdkeAVkhJiiBrJ2VW5bmn4MUxzDbVkPrjZxCEIrXaot1l3JnXfs+xz3lkYLeb2JOLSSg0SEE0ywXh5nkXeffudfzBsfG1BBBZXipAz0d5allTnAA89NIMR5wI/WKRY1k2DqVkaRMYJZXScFDoW2lZzhwFP0tWHEDo3jFvY2xKLVnjOc7Zn+pokcx/ANVdCC8XquSaCFqhBOQTo6CGljn2QGr2yGizh8HcAkeE7knKRNrrTu1xU8moCVxKwtGSiL1Z0uW8YsLkg2Ga2dDimipWih2PGoMCImnWn9SuT9N7AK0JzZgAWXWeLdBM5DxPGajet2grZAqhTSEGJXIu8Yo9gdI53aO6iZub5f1dso1dP8B3Wc5BEQ7yudHXmyuvLudDSTsp+BptYx8Hv4pc6k4M140YHl6KUXzrx8mXaBXjpCwKzroJ38c+U6Z530udt1ocbN1wZYIEOAZNGEcfkQ/1TsxgKU0xMzNIzHpoU25uky0ORXOUNoiX4LbZdbdudGbc7yIOxGCpCqDssXr/2bYQNx1VE3yxmzgj1v8h1q8y/Je4NthcmgybkZjzI+ZNCE64nxpCpiMh50nIW3fKqYDMxjfPSMkfKZmPlJwxJf/4IdwQshhfNOcJxyv/XiexgX8D", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Unique Key" = _t, #"National and International" = _t, #"*GRW_Area_Interest" = _t, #"*GRW_Business_Profile" = _t]),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(Source, {{"*GRW_Area_Interest", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "*GRW_Area_Interest"),
    #"Split Column by Delimiter1" = Table.ExpandListColumn(Table.TransformColumns(#"Split Column by Delimiter", {{"*GRW_Business_Profile", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "*GRW_Business_Profile")
in
    #"Split Column by Delimiter1"
How to use this code: Create a new Blank Query. Click on "Advanced Editor". Replace the code in the window with the code provided here. Click "Done".

 

With that transformation you should now be able to do your scatter plots. Beware - scatter plots require numeric values. You may want to try other visuals, or add indexes to your response values.

Thank you very much for the codes. I applied but it gives me errors. 

 

I found a link https://www.youtube.com/watch?v=0dN1scyuZvs which explains what I need exactly. but in the example in video, there is not multiple choice answers for data columns. So what I did, I used per answer in new columns that I have created as 1-0 numeric data. then it gives me error because of 1-15 measures limit to run automacitly clustring. 

 

Baykus_0-1642599338515.png

 

lbendlin
Super User
Super User

For the multiple choice answers are all choices equally important?  If yes then you could add weights to the responses.  Single answers get a weight of 1, three choices get a weight of 1/3 each.

 

Or you could say that multiple choices are equivalent to multiple answers.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.