In the first part of the tutorial here we imported Clinton's and Trump's Facebook messages and their number of shares. In this part, we will extract key phrases from those messages using Microsoft Cognitive Services Text Analytics (AKA Azure Machine Learning).
When we are done, we will be able to create stunning reports that will correlate key phrases with our metrics. In our example: We will be able to see which key phrases triggered the highest number of shares, and slice them by positive, neutral or negative sentiment, as shown in this report (click here to view it on PowerBI.com):
Before we start
Make sure you have an account on Microsoft Cognitive Services (Signup here), and a (free) subscription to Text Analytics Preview. We'll need the API key that can be copied from here (as shown in the following screenshot).
Note: If you didn't follow the Part 1 here, and you wish to try this tutorial on your own data, please prepare a table with text messages and unique keys (e.g. Index). Make sure you don't have duplicate keys, and that you don't have any blank values as messages.
Preparing the API queries - A parameter and a query function
In this section we'll prepare the advanced Power Query code that we'll need for the Key Phrase API.
Continue with the Power BI Desktop file that was created in Part 1. If the Query Editor is not open, click Edit Queries.
In the Query Editor's Home tab, click Manager Parameters, then select New Parameter.
In the Parameters dialog, set APIKeyas Name and paste the API Key that you obtained from Microsoft Cognitive Services into the text box Current Value.
When you are done, click OK.
Note: Using a parameter here, will allow you to export the Power BI Desktop file as a template and share with colleagues and friends without sharing your API Key. Read here how to create Power BI Templates.
In the Query Editor's Home tab, click the drop down menu below the icon of New Source and select Blank Query. Still in Home tab, click Advanced Query and paste the following code:
Click Done in the Advanced Editor, and rename the new query to GetKeyPhrases.
Note: Now in Queries pane, you will notice that the new query GetKeyPhrases has an fx icon. The query above is not a normal query. It is a query function that receives a table as an input, converts it into a JSON format, sends it to Microsoft Cognitive Services and returns the JSON response.
Preparing your text for analysis
In this section, we'll prepare the text messages in the right format that will be accepted by the API.
Note: This is a crucial step that you'll need to reuse whenever you run text analytics on your own messages.
In Queries pane, right click on query Posts and select Reference. Change the new query to KeyPhrases.
In Home tab, click Choose Columns, and select the columns messageand id. When you are done click OK.
Important tips: If you run this tutorial on your own data. Follow these guidelines: Make sure you have two columns here: A column with unique keys for each message, and another column with the actual text. Make sure that the you use unique keys. Use the Query Editor's Remove Duplicates if needed. Also, make sure that you don't have empty values.
Rename the column message to text. This is crucial. Later on we convert the table into a JSON format with the name text.
Note: If you are running this tutorial on your own data, ensure that the column with the unique keys is named id.
In Add Column tab, click Custom Column.
Set language as New column name, and = "en" as Custom column formula. Then click OK.
Note: The Text Analytics Key Phrases API support the following languages: English (en), German (de), Spanish (es) and Japanese (ja). You can use any of these codes if your text is in those languages.
Group data to sub-tables of 1000 messages each
Microsoft Cognitive Services allow us to send 1000 different messages on each API call as part of the Key Phrases API. In this part, we'll group the table into sub-tables of 1000 rows.
In Add Column click Index Column.
Select the new column, Index, and in Transform tab, click Standard, then select Integer-Divide.
In Integer-Divide dialog, set 1000as Value and click OK.
Now the values in column Indexare all 0 in the first 1000 rows, all 1 in the next 1000 rows, and so forth.
Select the column Index, and click Group By in Transform tab.
In the Group By dialog, ensure that Indexis selected in the drop down menu Group By.
Set sub-table as the text in New column name.
Select All Rows as Operation, and click OK.
After the grouping step, we have a new column sub-table that contains our tables in bulks of 1000 rows each (excluding the last row that has the remainders).
We can delete the column Index, and click Invoke Custom Function in Add Column tab.
In Invoke Custom Function dialog, set GetKeyPhrasesas Function Query and select sub-table as the column. When you are done, click OK.
Delete the column sub-table, and click the expand button in the left side of the header GetKeyPhrases.
Ensure all columns are selected in the expand pane, uncheck Use original column name as prefix, and click OK.
Delete the column errors.
Note: For troubleshooting purposes, you can create a new query that will look into the column errors. In that column, you will find errors that are returned by Microsoft Cognitive Services. For example, if you have a row that contains a text message which is too big, you can learn about it in this column, and truncate the messages. The maximal allowed length for text messages is 10240 characters. To truncate the messages, select the text column, right click on the header and select Split Column, then select By Number of Characters...
Click on the expand button in the header of column documents, select all columns and click OK.
Click on the expand button in the header of column KeyPhrases.
Finally, we have a new column with a key phrase for each message id. You should note that we will usually have multiple key phrases per message, so our table is now expended to multiple rows per message id.
For a final cosmetic touch, let's capitalize each word in column KeyPhrases:
Select the column KeyPhrases, right click on its header, select Transform and then select Capitalize Each Word.
We have completed the extraction of key messages. It's time to move to the modeling and visualization.
In the Query Editor window, click Close & Apply.
Feeding Key Phrases to a WordCloud Custom Visual
We are almost done. It's time to move to the visualization part. We'll now learn how to create a simple WordCloud visual with our key phrases.
After completing the last two parts, we now have two tables Posts and KeyPhrases. We can create a relationship between column id in both tables (if Power BI Desktop didn't detect them automatically). To do it, go to Relationships view, and drag and drop the column idfrom Poststo KeyPhrasesas shown here.
Download WordCloud (from here), and import it to your report (here is how to import custom visuals).
Click the Report view, and add a WordCloud to your report. Drag and drop the column KeyPhrasesto Category and messageto Value as shown in this screenshot:
In Format tab in Visualizations, turn off Stop Words and Rotate Text.
Note: Turning off Stop Words is crucial here. Without it, you will not be able to resize the WordCloud visual.
Under section General, turn off Word-breaking.
You can now resize the WordCloud visual, and see the size of the key phrases is defined by the number of posts.
You can now create a second WordCloud visual by copying and pasting the first one, and assign sharesto Values instead ofmessage. Now, in the second WordCloud the key phrases will be bigger if their posts got higher shares.
You can add Sentiment Analysis (Learn how here), and slice your WordClouds by sentiment as demonstrated in this report.
To get the actual report above, and apply both Sentiment Analysis and Key Phrases, subscribe to DataChanthere.