cancel
Showing results for 
Search instead for 
Did you mean: 

How To Find a Column in a Haystack

Recently, I was faced with a very interesting problem. I had developed a complex dataset. Beside that there are more than 200 reports using a live connection to this dataset. A report’s user had come and asked for negating a Boolean value of a column – because I had misunderstood the meaning of the column. To fix the bug in the dataset is a simple task. But how to find all the reports which are using this column in a visual, a measure, or a filter? I must fix all of them, too. How to identify all those reports? That is my topic today.

 

By the way, a similar situation happens if you rename a column. All visuals using this column will become gray with an error message, as you can see on the screenshot below.

1.PNG

As you can imagine, it is not the fastest and the most comfortable way to open each of more than 200 reports and check all visuals, filters, and measures if they use a column or not. I had to find another solution.

Unfortunately, there is no official tool from Microsoft, which I could use for this batch job. I had to write my own one in PowerShell. I will visit every single report PBIX file and search for a column name in its code.

 

The How-To Part

I have written a simple PowerShell code which gets a root directory of all my reports and a term I search for. It visits every single PBIX file, renames its extension to ZIP and extracts it. A PBIX file is just a masquerade of a ZIP file. An unzipped file is a directory. In this directory, I am interested in the file Report/Layout, which contains the report’s own measures and a definition of all pages.

Let’s take a look at the file internals. It is a JSON file which you can open in a text editor and format. The red box on the screenshot below is a definition of a report page. The green one contains a definition of measures defined in this report. If I go back to the red box, there are two other colorful boxes. The blue one contains visuals of the page and the purple one contains filters.

2.png

 

In my use case, I don’t care about this level of detail. In the vast majority of cases, it is enough to have a list of file names using a column.

And now the well documented code (also available on GitHub ) :

 

 

 

<#
    .SYNOPSIS
        Search for all PBIX files containing a regex phrase.
    .DESCRIPTION
        It parses all PBIX files in a directory recursively and 
        searches for all occurences of a regex phrase.
        Developed as a helper tool for finding all PBIX files
        which contain a column in a measure, calculated table or
        calculated column.
    .INPUTS
        - PBIT file
        - searched phrase
    .OUTPUTS
        List of file paths
    .NOTES
        Version:        1.0
        Author:         Michal Dvorak (@nolockcz)
        Creation Date:  03.01.2020
        Purpose/Change: Initial script development
#>

##### CHANGE
    # where to search
    $rootDirectory = "C:\Users\..."
    # what to search (as regex)
    $searchedPhraseRegex = "Measure[A-Z][A-Z]C"
##### CHANGE END

# if rootDirectory doesn't exist, return
$rootDirectoryExists = Test-Path -Path $rootDirectory
if (!$rootDirectoryExists) {
    Throw "The path " + $rootDirectory + " doesn't exist."
}

# save the original working directory
$originalWorkingDirectory = Get-Location

try {
    # hide the progress of unpacking files
    $progressPreference = 'SilentlyContinue'

    # get paths of all *.pbix files in the rootDirectory or any other subdirectory
    $pbixFilePaths = Get-ChildItem -Path $rootDirectory -Include "*.pbix" -Recurse | ForEach-Object { $_.FullName }

    # a name of a temp working directory
    $tempDirectoryName = New-Guid
    # create a new temp working directory
    $tempDirectoryPath = New-Item -Path $rootDirectory -Name $tempDirectoryName -ItemType "directory"
    # go to the new directory
    Set-Location -Path $tempDirectoryPath

    # for each pbix-file
    foreach ($pbixFilePath in $pbixFilePaths) {    
        # create a copy of the file and change the extension to .zip
        Copy-Item $pbixFilePath -Destination "tmp.zip"

        # try unpack and read the content
        try {
            # unpack the zip file
            Expand-Archive -Path "tmp.zip" -DestinationPath "tmp"
            # read the content of the file Report\Layout
            $fileContent = Get-Content -Path "tmp\Report\Layout" -Encoding Unicode -Raw | ConvertFrom-Json
        }
        catch {        
            # if something goes wrong, log and continue
            Write-Output $("Not Evaluated: " + $pbixFilePath.Replace($rootDirectory, ""))
            continue
        }

        # if the file contains the search phrase, log the shorten file path
        if (($fileContent | Select-String  -Pattern $searchedPhraseRegex).length -gt 0) {
            Write-Output $($pbixFilePath.Replace($rootDirectory, ""))
        }

        # remove all temp files
        if (Test-Path -Path $tempDirectoryPath) {
            Get-ChildItem -Path $tempDirectoryPath -Recurse | Remove-Item -Force -Recurse
        }
    }

    # set the root working directory to leave the temp directory
    Set-Location -Path $rootDirectory

    # remove the tmp working direcotry
    if (Test-Path -Path $tempDirectoryPath) {
        Remove-Item -Path $tempDirectoryPath -Recurse -Force
    }
}
finally {
    # set the original working directory back again
    Set-Location -Path $originalWorkingDirectory    

    # show progress again
    $progressPreference = 'Continue'
}

 

 

If I had some spare time someday, I’d like to improve the code that it parses the nested JSON files and tells me where and how a column is used.