The scenario here is that we have monthly finance transactional data CSV files for each month from Jan 2007 to May 2008 (e.g. extracts from a ERP system) which we need to combine and analyze, bringing in the latest data on a monthly basis and showing MTD/QTD/YTD performance as well as a comparison to the prior year.
This is a common requirement in Finance and has typically been achieved with Excel using macros to combine files, lookup formulas to bring all the data into one wide table and external links to other files where historical data volumes are large and/or scattered across multiple files. Even with the use of macros, this process is typically still quite manual, time-consuming and error-prone. Individuals work on their own versions of spreadsheets which leads to a lack of governance/version control; the reporting is also typically quite static and difficult to draw real action-driven insights from.
In this series of blog posts, I will show run through building out an interactive Income Statement report in Power BI and the benefits of doing so, with step by step instructions for building out the report along with the rationale for each step.
1 Extract and review the Files
Extract the contents of the zip file attached in this blog post to a local folder on your machine. The source files in the Finance Data folder consist of the following:
Monthly Results -> CSVs 35 monthly CSV files covering the period from July 2005 to May 2008. Each file contains transactions for the month with the following columns:
DateKey: The first of the month in YYYYMMDD format (e.g. 20050701)
MonthValue: The month in YYYYMM format (e.g. 200507)
Organisation Key: The ID that relates to the Organisations sheet in the Reference data file (e.g. ID 8 relates to Organisation ‘Canadian Division’)
Department Key: The ID that relates to the Departments sheet in the Reference data file (e.g. ID 6 relates to the department ‘Research and Development’)
ScenarioKey: The ID that relates to the Scenarios sheet in the Reference data file (ID 1 relates to Actuals, 2 to Budget)
Account Key: The GL Account ID that relates to the GL Accounts sheet in the Reference data file (e.g. GL Account 52 relates to Intercompany Sales; the GL Accounts sheet has details of each GL account e.g. the Account Type [Revenue/Expenditures], SubHeader, SubHeader2)
Amount The amount of the transaction in local currency (The currency which relates to the department)
Amount USD: The amount of the transaction in USD
New Monthly File: A transaction file for June 2008, which we can copy into the Monthly Results folder and have it automatically picked up by Power BI
Reference Data: An excel file with 6 worksheets, each for a different set of Reference data linked to in the transactional files.
If working with this data in Excel, we would have to combine all the monthly transaction data into a single file (if possible with the data volumes) in order to be able to analyze over time, and then use lookup formulas to bring in all the required reference data columns. We could then link this wide table into a fixed format Income statement report in Excel, and/or perform Pivot table analysis and graphs.
With Power BI, however, we can combine the transaction files automatically by connecting to the Monthly Results folder as our data source – each time the data is refreshed (up to 8 times per day with Power BI Pro) it will automatically pick up and combine all the files in that folder as well as apply any transformations required.
We can also keep the data tables (the transaction data and all the reference data) separate rather than having to combine them into one wide table. This is a more performant and efficient way of querying data and allows us to perform the type of analysis that we wouldn’t be able to if it were all in one table. (We will be organising our data tables in what is known as a star schema – see here for an article describing why it is required for more complex analysis).
2 Create a Parameter
Whilst we can just select the folder connector straight away from the data sources list and enter the folder path of the Monthly Results folder, we will instead first create a parameter with the folder path and use this in the connector instead.
Parameters in the Query editor (not to be confused with what-if parameters in the report view) are a way of storing values which are then referred to in one or more transformation steps. For example if this whole report was to be specific to only one particular region (though the data source contains data for all regions), you could store the required region value as a parameter and then refer to this when applying a filter on the data.
In this scenario, it is a good idea to store the folder path as a parameter because it refers to a path on your machine; if someone else opens the PBIX file on their machine instead, the folder path specified may not exist and they will need to update it to where those files are stored on their machine. Rather than updating the code in individual queries, they can just change the parameter value and any queries that refer to this folder path will then automatically reflect this updated value.
We start by creating a new report in Power BI Desktop and going to the query editor by clicking on ‘Transform Data’ under the Home Tab of the ribbon:
In Windows explorer, navigate to the folder where you saved the zip file contents in Step 1 above
From this folder navigate to the folder “\Finance Data\Monthly Results\CSVs” and copy the full folder path from the explorer bar
Back in the query editor in Power BI, click on Manage Parameters -> New Parameter
Give the parameter the name FolderPath, keep the type as Any and set the value to the folder you navigated to above and then press OK:
3 Connect to the Monthly Results Folder
Click on ‘New Source’ in the Home tab of the query editor window and navigate to More... -> Folder (Under All) and then click connect:
Click on the drop down next to the ABC box, select Parameter and ensure that the FolderPath parameter is selected then click ok:
Click on ‘Combine and Transform Data’ in the dialog box that shows up and accept the default options for combining:
By viewing the queries in the blade on the left-hand side of the screen, you will see that this step has generated a number of queries including a Sample File, a parameter linking to this Sample File and a Transform File function. The sample file is the first file in the folder (default option chosen above) which defines the format of all the files – each file in the folder will have to match the format (No of columns, data types) of this first file. The function is for any transformations that are required to be applied to each file before they are combined. By default, the only step in here is to promote headers (so that the header row isn’t repeated each time); if there are other transformations required you could modify this function or define your own.
4 Apply Transformations to the Combined Data
The Query with the combined data from all the CSV files is in a query called ‘Query1’. Rename this to something more sensible (e.g. FInanceData) by clicking on the Name under the Query settings and changing the value. Note that if this pane is not visible, it can be made so by selecting ‘Query settings’ from the view pane:
The query settings pane shows all the transformation steps applied. Clicking on a step will show the data with all steps up until that point applied; steps can be deleted or moved up/down, though be careful of doing this as certain steps might require prior steps to have taken place (e.g. the renaming of a new column can’t happen until that column is created!).
Combining the data above already created a number of steps automatically in Query1 – e.g. applying the transformation function to each file, using ‘Expand Contents’ to combine the data and changing the data types. It is often better to delete the automatic data type change step and set the data types yourself at the end as a last step before closing. However, in this case we aren’t really applying any further transformations and the data types have been applied correctly.
The only transformation step we can apply ourselves is to remove columns which are not needed – e.g. in this case we don’t need the first column ‘Source.Name’. With this column selected, right click on the column and select ‘Remove’.
5 Bring in the Reference Data
We now need to bring in the Reference data worksheets in the ‘Reference Data’ Excel file – e.g. the list of Accounts and the list of Organisations.
To do this, select ‘New Source’ in Home ribbon and Navigate to the Excel option.
Navigate to the Reference Data.xlsx file saved in the folder path Finance Data\Reference Tables
In the Navigator option that comes up , select all of the tables (denoted with a table icon, not also selecting those with the worksheet icon as this would double up the data) and then click OK:
Note that the path of the Excel file has been hardcoded into the transformation steps which is something that we were trying to avoid with the Finance data. To remedy this, we can create another parameter e.g. called ReferenceDataFolderPath and then manually update the code generated by us brining in the Excel file to refer to the new parameter instead.
Note that you can edit/tweak the M code that is automatically generated by the Query Editor UI either in the formula bar (where this is not visible, select the checkbox from the ‘View’ tab of the ribbon) and/or from the Advanced Editor, accessible via the Home tab.
6 Organise Queries
It is good practice to organise your queries into folders called Query Groups.
You can create the folders ‘Parameters’, ‘Finance Data’, ‘Reference Data’ by right clicking in the blank space under the list of queries and selecting ‘New Group’:
Then select the queries (for multiple queries holding shift and select), right click and select ‘Move to Group’ to move them into the correct folders:
Click on ‘Close and Apply’ to bring all the data into Power BI:
Navigate to the Model view in Power BI Desktop. Your data model view should now look something like this (with relationships autodetected):