Solved: Re: Collecting data (not visual) outputs from Pyth...

sfmike99 · ‎03-02-2019

I've been moving some basic Python visuals into Power BI which is amazing with the slicers:

Now I'm trying to add in some statistical analysis, and pass data / output back to the canvas. Here's a simple Pearson example:

# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script: 

# dataset = pandas.DataFrame(ROUND, SCORE)
# dataset = dataset.drop_duplicates()

# Paste or type your script code here:

import scipy.stats
print('Pearson Correlation:',scipy.stats.pearsonr(dataset['ROUND'], dataset['SCORE']))

Normally in Spyder the Print command outputs to the iPython console. However in PBI it fails with the error "No image was created. The Python code didn't result in the creation of any visuals."

Is there a way to output text from the Python script to the PBI visual?

Alternatively (and better), can I add the outputs from the Python script to the PBI data model? (I've read a litte about the REST API, but wondering if there is a more direct way).

Thanks.

LivioLanzo · ‎03-03-2019

Hi @sfmike99

when you open a Query in the query editor you can choose Python script in the Transform tab (on the far right).

The DataFrame passed to Python is named 'dataset' and it is the table of the preceding query step. You can apply transofmrations to this DataFrame within Python by inputting the code and after you click ok you will see different DataFrames available for expansion, normally here you'd expand only the dataframe you actually want to return. Let me know if you need me to post pictures which could make it more clear

Did I answer your question correctly? Mark my answer as a solution!

Proud to be a Datanaut!

View solution in original post

LivioLanzo · ‎03-03-2019

@sfmike99 yes all the Drataframes should be visible after the script is run. You should have as many rows as dataframes created by your code. Normally then you'd filter by just the one you'd want to expand and then expand it

Did I answer your question correctly? Mark my answer as a solution!

Proud to be a Datanaut!

View solution in original post

LivioLanzo · ‎03-02-2019

Hi @sfmike99 ,

have you already tried to run the Python script from within the Power Query window?

That is where a Pandas DataFrame can be returned

Did I answer your question correctly? Mark my answer as a solution!

Proud to be a Datanaut!

sfmike99 · ‎03-02-2019

Aha that makes sense. My first day trying this...

Can you point me to any examples of the syntax? Lots for visuals but can't find much about passing data / dataframes back and forth in Power Query. For example:

import pandas
dataset = pandas.DataFrame(ROUND, SCORE)

Fails with the message "NameError: name 'ROUND' is not defined" even though that column is in an existing table.

Thanks!

LivioLanzo · ‎03-03-2019

Hi @sfmike99

when you open a Query in the query editor you can choose Python script in the Transform tab (on the far right).

The DataFrame passed to Python is named 'dataset' and it is the table of the preceding query step. You can apply transofmrations to this DataFrame within Python by inputting the code and after you click ok you will see different DataFrames available for expansion, normally here you'd expand only the dataframe you actually want to return. Let me know if you need me to post pictures which could make it more clear

Did I answer your question correctly? Mark my answer as a solution!

Proud to be a Datanaut!

Anonymous · ‎06-21-2019

Hi @LivioLanzo and @sfmike99 , this is exactly, what I'm looking for. I understand every step and it works, but now I need my new dataframes to become available outside the query editor.

sfmike99 · ‎06-24-2019

You just need to name your new dataframe in the Python script, then expand that one when it is finished.

For instance, I have a table with a bunch of programs with survey rounds and scores. I create a new dataframe based on unique program names, then compute the correlation between round and score for each program. (This is a perfect use case for Python since Power BI has very limited stats capabilities).

# 'dataset' holds the input data for this script
import scipy.stats

Results = dataset[['PROGRAM']].drop_duplicates()
Results['INDEX'] = Results['PROGRAM']
Results.set_index('INDEX', inplace=True)

for idx in Results.index:
    MeasureX = dataset['ROUND'].loc[dataset['PROGRAM'] == idx]
    MeasureY = dataset['SCORE'].loc[dataset['PROGRAM'] == idx]
    Output =  scipy.stats.pearsonr(MeasureX, MeasureY)
    Results.loc[idx, 'R Value'] =  Output[0]
    Results.loc[idx, 'P Value'] =  Output[1]

When that completes, I see both 'dataset' and 'Results' coming back - I simply click to expand 'Results' and it's available in my Power BI model like any other table.

sfmike99 · ‎03-03-2019

That works - thanks for helping me get my head around it!

Can I generate a new dataframe and pass it back to Power BI? For example, if I want to create a summary table of scores?

I can create a new table in the Python script without errors:

newtable = dataset[('ROUND')]

But I think I should see that table in the pulldown after the script executes right?

LivioLanzo · ‎03-03-2019

@sfmike99 yes all the Drataframes should be visible after the script is run. You should have as many rows as dataframes created by your code. Normally then you'd filter by just the one you'd want to expand and then expand it

Did I answer your question correctly? Mark my answer as a solution!

Proud to be a Datanaut!

sfmike99 · ‎03-03-2019

That works! I forgot the double brackets in the Python script: newtable = dataset[['ROUND']]

Way cool...

LivioLanzo · ‎03-03-2019

@sfmike99 no problem!

You can also pass to Python more than one dataframes (for instance the result of other Queries). In fact the Power Query function used to run the Python script has this synthax:

Python.Execute(script as Test, dataframes)

let's say you have two queries named Query1 and Query2 which you both need in Python, you would do:

Python.Execute('.. your python code ..', [df1=Query1, df2=Query2])

Now in Python the two dataframes are knows as df1 and df2

You can also omit the last argument, dataframes, this can be useful if you want to fetch data from the internet for instance:

    Source = Python.Execute("import pandas as pd
                            import pandas_datareader as stock_reader 
                            df1 = pd.DataFrame(stock_reader.get_data_yahoo('AMZN', start='2015-01-01')['Adj Close'])
                            df2 = pd.DataFrame(stock_reader.get_data_yahoo('MSFT', start='2015-01-01')['Adj Close'])
                            r = df1.merge(df2, how='inner', left_index=True, right_index=True, suffixes=('_AMZN', '_MSFT'))
                            r.reset_index(inplace=True)")

Did I answer your question correctly? Mark my answer as a solution!

Proud to be a Datanaut!

elria · ‎06-05-2020

LivioLanzo Nice explanation, thank you!

Is there any way to merge the two datasets/steps of the query? Or the moment we create the python output (step2) the source data (step 1) disappear?

My python script uses attributes from step1 but the returned results don't include those attributes.

How can I refer to different tables or columns of tables of the PBI file as well? Is there this possibility?

Thanks,

Elria

Collecting data (not visual) outputs from Python

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly