Have you ever asked these questions to yourself while working with multiple reports/dashboards using common datasets - “What if I modify this dataset?”, “Will this dataset modification impact any reports/dashboards?”, “Why report refresh failed?”, etc. If yes, then you are just one step away from answers to all these questions.
Power BI has a pretty good feature to show the lineage of all the reports that are published into Power BI workspace. I have not used this feature for some time but now have started using it as it gives a clear picture on the flow around a report/dashboard when published in Power BI workspace.
Today in this blog I will show few key advantages of this feature which can be beneficial for both the report developers and the report consumers, along with all the answers to the above questions.
Whenever we login to Power BI Service and navigate to say, “My Workspace”, we end up with the following default view:
In the above screenshot, I have also highlighted the “View” option. When we try to click on the drop-down icon against it, it shows us an additional option for “Lineage”. See below:
When we click on this ‘Lineage’ option, we end up with the following view with the little icon changed as well (identifying a flow view:
Pretty cool right!
Such a lovely way to visualise the entire lineage connected with a report!
Let us now see what all information we can see in this view:
The sections highlighted in RED are the reports that are in this workspace.
The sections highlighted in BLUE are the datasets that are populating these reports.
The sections highlighted in GREEN shows the type of data-source connection that are used for the reports. These can be files (csv. Excel, etc.), web connections, etc. Along with this we can also see how this data-source refresh is setup in the workspace for the individual reports like using a gateway, etc.
Now let us just consider a single report lineage so that we can understand how this can help us:
The report in the above screenshot uses a dataset which is the combination of two:
Excel file which is a been coming from an online web source
This dashboard uses a R script which is used to automatically generate the data extracts and to automate this a Power BI Gateway is used.
When clicked on the small icon below this data source, it opens an ‘Impact Analysis’ window which shows the following information:
It shows what kind of data source it is and its path as highlighted in the screenshot.
It shows the number of workspaces where this data is used.
It also shows how many datasets are created using this data source.
It also shows all the reports in the workspace that use this data source as a dataset for reporting.
The dataset part in the next step on the flow, shows the status on the data refresh. [Refreshed: 15/02/21, 10:58:00] It shows that there was an error with the data refresh therefore shows an error icon. On the same place we see also see a REFRESH button which can be used to refresh this dataset.
When we further see there is a following option to check if there is an impact of this dataset somewhere else in this workspace.
When we click on it, we see the following:
As highlighted a new window appears on the right – Impact Analysis.
It shows few metrics against this dataset:
Number of workspaces it has been used in.
Number of reports that use this dataset.
Number of dashboards using this dataset. (the dataset is not pinned to a dashboard therefore shows 0 against this metric)
Number of Views against this dashboard.
The report part is the next and final step in the lineage which shows which report is populated using the dataset.
The icon highlighted in RED contains a link which directly navigates the user to the report.
The icon highlighted in GREEN is for seeing the lineage of this report. When clicked on it, the lineage for this report is highlighted, keeping other report’s lineages greyed-out/unhighlighted.
The following screenshot shows the lineage for the reports that are pinned as a dashboard in the workspace:
If we see the “Scoreboard” dataset, it has not got anything before it in the lineage, as it is an online real-time dataset.
Next is the “Scoreboard” report using the above dataset in the lineage.
The last in the lineage is the “Scoreboard”dashboard that has been pinned using the report.
Let us also see what all details can we see in the lineage of a report that is populated by the stand-along data-source like excel/csv files stored on a machine. See the screenshot below:
When we click on the “Variable Power BI” dataset in the lineage, a new window opens on the right.
The new window shows us that this dataset contains 29 tables in total. (not shown in this view as it needs scrolling down further)
Further when we click on one of the tables, we can see that the table expands to show the columns/fields and calculated measures in it.
Apart from this information, we also see the following:
Sensitivity of the dataset (if set)
Owner of the dataset – who has configured it.
When was the dataset last refreshed.
When is the next refresh scheduled on.
To conclude the blog, let us now summarise quickly the key points about the Power BI Lineage View:
Lineage view gives us quick view on what all reports and datasets exist within a workspace.
This view shows us which datasets are populating which all reports/dashboards.
This view helps in judging the early impact of change in a dataset and how it will affect other reports and dashboards.
Lineage flow also shows us what datasets are refreshed and if they are not then we can easily kick-off a quick data refresh from here.
This area helps in a better management of reports from a central point of view.
Troubleshooting report refresh failures is easy as this view shows an error identifier against any such failures.
So, this is my take on Lineage View in Power BI. What is yours?