Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Handling Images in Power BI (Python, Power Query, DAX, Relationships - all into play.)

Hi @marcorusso @AlbertoFerrari 

 

We had a requirement to show images in Power BI visuals. A lot of them. Hosting the images at some websites and storing the URL in our SQL table was not a viable option. So we went with the following approach.

 

1) Store the images on our local computer initially.

2) Create a CSV file with image file names and the respective project codes. It could be anything, for example, an employee ID or something like that.

3) Write a Python script to convert the images into binary data (based on the listing on CSV file) and store it in an SQL Server DB along with the respective project codes. This eliminates the need for hosting the images somewhere and the need for a URL.

4) Connect the Power BI desktop to this SQL server table and do the transformations using Power Query / Transform Data and convert the binary data into text and subsequently convert that into an image URL using a prefix.

5) Then 'Close & Apply' to bring this into Power BI desktop and mark the URL field as "Image URL"

6) Use this "Image URL" on visuals to display the image.

 

I encountered a problem at Step 5:

The binary data is imported after converting to text and even though Power BI - DAX, supports text with lengths as long as 268,435,456 Unicode characters (256 mega characters), when we import that from Power Query, the text lengths are getting truncated to 32766 characters. But the URL text field has lengths more than that and in theory, it could be as long as 256 mega characters.

To fix this, I went back to step 4 and added another transformation step - Split by Length. Most importantly, since I did not know the max length that could occur, splitting the texts into multiple columns was not an option. So I chose an option to split by length as rows and used a length of 30000. This gave a result, that is usable.

 

Subsequently, I created a calculated table using ALL() to get a unique list of project codes. (Because the imported data has the same project code appearing in multiple rows because of the Split by Length as Rows transformation.)

 

I further created a One-To-Many relationship between these two tables and added a calculated column using CONCATENATEX(RELATEDTABLE(<importedtable>),<URL field>). Which concatenated back all the 30000 character long splits of each image back to a single URL of length which could be, in theory, as long as 268,435,456 Unicode characters (256 mega characters) because DAX will support that length. The truncation was happening only during the transmission from Power Query to DAX.

 

Now the concatenated URL (using CONCATENATEX) works perfectly fine.

 

Given below are the Python Script, Power Query, Relationship, and DAX codes used.

 

Python Script (The database credentials are read from the OS environment variables)

This converts the images to binary and stores them in an SQL server table. A csv file with two fields is given as input to this program. The format of CSV file is 

sreenathv_1-1645081781992.png

The following program loops through each line of the CSV file.

 

# import the libraries
import os
import subprocess

#import pyodbc
import base64
from base64 import *

subprocess.call('cls',shell=True)
print('\t\t Python Program for converting images to binary and storing in database.')
print('\n\nThe input template should be a csv file with filenames and dealernames delimited by a comma.\n\n')

list_of_records = []

inputfilename = input("Enter file name:")
print('The file name is '+inputfilename)

inputfile = open(inputfilename,'r')
if_lines = inputfile.readlines()
if_lines_list = []
for if_line in if_lines:
    if_lines_list.append(if_line.strip().split(','))
inputfile.close()

firstrecord = True
for i in if_lines_list:
    if firstrecord:
        firstrecord=False
    else:
        print(i[0],i[1])
        fn = i[1]
        img_file = open(fn,'rb')
        image_read = img_file.read()
        image_64_encode = base64.encodebytes(image_read)
        img_file.close()
        project_code = i[0]
        x = [image_64_encode,project_code]
        list_of_records.append(x)

host = os.environ['SQL_IP']
port = os.environ['SQL_PORT']
server = host + ',' + port
database =  os.environ['SQL_DB']
username = os.environ['SQL_USER'] 
password = os.environ['SQL_PASSWORD'] 
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = cnxn.cursor()

SQLCommand = (''' 
INSERT INTO [dbo].[Project Works]
           ([Image],[ProjectCode])
     VALUES
           (?,?)
''')

for r in list_of_records:
    cursor.execute(SQLCommand,r)
cnxn.commit()

cursor.close()
cnxn.close()

 

Power Query

This imports the stored binary image data from the SQL server, then converts the binary to text, splits it into 30000 characters long fields, and then stores them into rows.

let
    Source = Sql.Databases(<server>),
    Operations = Source{[Name="Operations"]}[Data],
    #"dbo_Project Works" = Operations{[Schema="dbo",Item="Project Works"]}[Data],
    #"Transformed Column" = Table.TransformColumns(#"dbo_Project Works",{{"Image", Text.FromBinary}}),
    #"Added Custom" = Table.AddColumn(#"Transformed Column", "Image URL", each "data&colon;image/jpeg;base64," & [Image]),
    #"Split Column by Position" = Table.ExpandListColumn(Table.TransformColumns(#"Added Custom", {{"Image URL", Splitter.SplitTextByRepeatedLengths(30000), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Image URL"),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Position",{{"Image URL", type text}})
in
    #"Changed Type"

 

DAX part

1) Calculated TABLE

Projects = ALL('Project Works'[ProjectCode])

2) Then create a relationship between these two tables as follows…

sreenathv_0-1645081012833.png

3) Added a Calculated Column to Projects Table

Image URL = CONCATENATEX(RELATEDTABLE('Project Works'),'Project Works'[Image URL])

That's it. Then marked this field as category "Image URL" and used on visuals everthing is working fine.

 

Here is my question. This is the first time ever that I am actually relying on the order of rows to get the result. So far, whatever DAX functions I have used, I was not bothered about in what order the data is stored in the table or in what order the formula is evaluating them. 

 

But in this case of CONCATENATEX, although it is giving the correct result, I think the order in which the data is stored in the table, or the order in which CONCATENATEX takes the data for evaluation decides whether the resultant "Image URL" is correct or not.

 

My questions are,

1) Is it guaranteed that CONCATENATEX will always concatenate the strings (each one of them 30000 characters or less long) in the same order they were split in Power Query.

2) Is there a chance that the evaluation happens in a different order and CONCATENATEX gives a URL that is joined out of order and therefore unusable?

3) Is this a correct usage?

4) Generally, I would like to know what do you think about using CONCATENATEX in such scenarios and your expert opinion.

 

Thanks in Advance.

 

 

 

1 REPLY 1
V-lianl-msft
Community Support
Community Support

Hi @Anonymous ,

 

If the sorting in power query does not change and the filtering conditions of DAX formula do not change, the running order will not change.

As far as I know, the 32766 character limit on text being loaded into Power BI, and the 2.1 million character limit in DAX functions.

In addition to the live connection, the data will always be imported or cached in the model, so 32766 is an inevitable limitation unless you use live connection.

 

Best Regards,
Liang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.