Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Anonymous

Data Profiling in Power BI (Power BI Update April 2019)

As per the April 2019 update, Microsoft has introduced a data profiling capability in Power BI desktop.
Data profiling helps us easily find the issues with our imported data from data sources in to Power BI.

We can do data profiling in the Power Query editor. So, whenever we are connecting to any data source using Import mode, click on Edit instead of Load so that we can leverage the Data profiling capabilities for any column in Power BI desktop.


In Power Query Editor, Under View tab in Data Preview Section we can see the following data profiling functionalities-

  1. Column quality
  2. Column distribution
  3. Column profile


2019-04-18 09_49_28-Power BI April Update 2019 [Read-Only] - Word.png

  1. Column quality: In this section, we can easily see valid, Error and Empty percentage of data values associated with the Selected table. As you are able to see Green line below this column, it means data is 100 % valid. In some cases, if data is not valid, you can see a different percentage value for selected column.  Also, the bottom line for that column will be Red in the case of Error and black or dark gray in the case of Empty. It can be mixed if Some data is Valid and some are empty and with Error

Note: You can perform various transformations like Remove Errors, Keep Errors, Replace Errors, Keep Duplicates, Remove Duplicates, etc., by clicking on the eclipses (…) or right clicking on the visual.

 

  1. Column Distribution- In this section we can easily see the inline value distribution histogram.
    For example, # distinct records, # unique records. Once you hover on distributed histogram, you can also see the percentage associated with distinct and unique data for the selected column

Note: You can perform various transformations like Remove Errors, Keep Errors, Replace Errors, keep duplicates, Remove Duplicates, etc., by clicking on the eclipses (…) or right clicking on the visual.

 

2019-04-18 09_53_56-Power BI April Update 2019 [Read-Only] - Word.png

  1. Column Profile- This is one of the most important features. As you can see for the selected column, you can see Column statistics which includes distribution measures like #Count, #error, #empty, #distinct, #Unique, #Empty string, Min, Max

You will also able to see Column distribution that will show a bar graph with the selected column on axis and the bar height will show the count for category variable.

2019-04-18 09_55_14-Power BI April Update 2019 [Read-Only] - Word.png

Note: Column profile shows the Column statistics and column distribution for the selected column.
Whereas you can see column quality and column distribution for all the columns present in table at a time.

All features at one glance

2019-04-18 09_57_24-Power BI April Update 2019 [Read-Only] - Word.png

 

Comments

In addtion there is an M function Table.Profile that gives more advanced profile details.

@cwebb  has a good overview of how to use it.

https://blog.crossjoin.co.uk/2016/01/12/descriptive-statistics-in-power-bim-with-table-profile/

 

Anonymous

Thanks for sharing this, its super useful! 

 

Do you know if there is any way to convert these views in to a dashboard rather than only viewing in the Power Query Editor? 

Anonymous

Hi @Anonymous ,

You can create a table by using this M function
= Table.Profile(tablename)

This will give profiles of columns of the tables.


Then you can create reports in Power BI desktop, then you can publish it to Power BI service and you can pin visuals fro creating the dashboards.

For more details on this, Please, check this article from chris web
https://blog.crossjoin.co.uk/2016/01/12/descriptive-statistics-in-power-bim-with-table-profile/




Anonymous

Thanks @Anonymous 


Is it possible to add additional summary statistics beyone the base ones its provides (e.g. percentiles, count of outliers, std error, variance etc)?


I see there is an additionalAggregates option in Table.Profile() but I'm unable to figure out how it works. 

Profiles are easy

Go to your Query in Query Editor then select Insert Step after

add Table.Profile(
before the previous step name and then ) after

e,g,

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlTSUTJUitWJVkoGsozBrCogAHLMgUApNhYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", Int64.Type}}),
    Custom1 = Table.Profile(#"Changed Type")
in
    Custom1

If you want to do more the M/powerQuery as load of functions to let you do what you need.

Dax or powerBI visuals also can do lots things that let you profile as well.

 

For advanced profiling you can also try SSIS Data Profiling Task (part of sql server)

https://www.youtube.com/watch?v=HpxbhWkxaEs

There are lots of often very expensive data quality tools

You can get ideas from videos like this.

https://www.youtube.com/watch?v=wj8szJ3g0RI