Showing results for 
Search instead for 
Did you mean: 

From Pandas and SQL to Power Query Code

This is an original article from LaDataWeb blog that I wanted to share with the community in English.

Many times, I have found people working with Python in data engineering or as data scientists who need to work with a Microsoft data movement tool (Data Factory, Logic Apps, etc.) or a reporting tool (Power BI, Excel, etc.) due to a company requirement. The truth here is that those people hate working without coding in Python. They don't want to manage a small environment configuring and clicking for transformations.

Today, Microsoft is pushing Power Query to its limits, integrating the language in many services or tools for data transformations. Now, it can be used with SSIS, SSAS, Power Automate, Logic Apps, Power BI and Dataflows (in Power Platform and Data Factory). Each day, the work with Power Query is growing faster and embracing not only BI roles but also data engineers and scientists.

That's why I have created this post to help them how to code the basic SQL and Pandas (Python lib) transformations with Power Query.

First, they need to understand that in most common situations (dataflows and Power BI) Power Query works in steps. Each line calls the previous result to increase a new step from the previous one and to keep it in order. Now you know that if you see #"Last Step" we are talking about a table result of the previous step.

Let's check the examples that will be listed in the following order: SQL, Python Pandas, Power Query.


Show first 5 lines of the dataset







In Pandas:






In Power Query:



Table.FirstN(#"Last Step",5)



Count rows



Table.RowCount(#"Last Step")



Select the columns of a table



SELECT column1, column2 FROM table1
df[["column1", "column2"]]
#"Last Step"[[Columna1],[Columna2]]

//Or it might be:

Table.SelectColumns(#"Last Step", {"Columna1", "Columna2"} )



Filter rows



SELECT column1, column2 FROM table1 WHERE column1 = 2
df[['column1', 'column2']].loc[df['column1'] == 2]
Table.SelectRows(#"Last Step", each [column1] == 2 )



Filtering with more than 1 column



SELECT * FROM table1 WHERE column1 > 1 AND column2 < 25
df.loc[(df['column1'] > 1) & (df['column2'] < 25)]

# Or using OR y NOT operators

df.loc[(df['column1'] > 1) | ~(df['column2'] < 25)]
Table.SelectRows(#"Last Step", each [column1] > 1 and column2 < 25 )

//Or using OR y NOT operators

Table.SelectRows(#"Last Step", each [column1] > 1 or not ([column1] < 25 ) )



Filters with complex operators



SELECT * FROM table1 WHERE column1 BETWEEN 1 and 5 AND column2 IN (20,30,40,50) AND column3 LIKE '%arcelona%'
df.loc[(df['colum1'].between(1,5)) & (df['column2'].isin([20,30,40,50])) & (df['column3'].str.contains('arcelona'))]
Table.SelectRows(#"Last Step", each ([column1] > 1 and [column1] < 5) and List.Contains({20,30,40,50}, [column2]) and Text.Contains([column3], "arcelona") )



Join tables



SELECT t1.column1, t2.column1 FROM table1 t1 LEFT JOIN table2 t2 ON t1.column_id = t2.column_id



There are two functions that can help us in this process. Merge and Join.



df_joined = df1.merge(df2, left_on='lkey', right_on='rkey', how='left')
df_joined = df1.join(df2, on='column_id', how='left')Luego seleccionamos dos columnas
df_joined.loc[['column1_df1', 'column1_df2']]



In Power Query, we first we set the join conditions and then select the columns with the ExpandTableColumn function.



#"Origen" = #"Last Step"[[column1_t1]]
#"Paso Join" = Table.NestedJoin(#"Origen", {"column_t1_id"}, table2, {"column_t2_id"}, "Prefijo", JoinKind.LeftOuter)
#"Expansion" = Table.ExpandTableColumn(#"Paso Join", "Prefijo", {"column1_t2"}, {"Prefijo_column1_t2"})



Group By



SELECT column1, count(*) FROM table1 GROUP BY column1
Table.Group(#"Last Step", {"column1"}, {{"Alias de count", each Table.RowCount(_), type number}})



Filtering a table grouped by



SELECT store, sum(sales) FROM table1 GROUP BY store HAVING sum(sales) > 1000
df_grouped = df.groupby('store')['sales'].sum()
df_grouped.loc[df_grouped > 1000]
#”Grouping” = Table.Group(#"Last Step", {"store"}, {{"Alias de sum", each List.Sum([sales]), type number}})
#"Final" = Table.SelectRows( #"Grouping" , each [Alias de sum] > 1000 )



Sort descending by column



SELECT * FROM table1 ORDER BY column1 DESC
df.sort_values(by=['column1'], ascending=False)
Table.Sort(#"Last Step",{{"column1", Order.Descending}})



UNION two tables with the same characteristics






In Pandas we have two known options, append and concat.




pd.concat([df1, df2])
Table.Combine({table1, table2})





The following transformations are only for Pandas and Power Query because the are not as regular in query languages as SQL.


Analyze table content




Table.Profile(#"Last Step")




Check for unique values in columns




Table.Profile(#"Last Step")[[Column],[DistinctCount]]




Generate a test table with manual entry data.




df = pd.DataFrame([[1,2],["Boris Yeltsin", "Mikhail Gorbachev"]], columns=["CustomerID", "Name"])
Table.FromRecords({[CustomerID = 1, Name = "Bob", Phone = "123-4567"]})




Remove column from dataset

There are two ways




df.drop(['column1'], axis=1)
Table.RemoveColumns(#"Last Step",{"column1"})




Apply transformations in a column




df.apply(lambda x : x['column1'] + 1 , axis = 1)
Table.TransformColumns(#"Last Step", {{"column1", each _ + 1, type number}})




Alright! That is all folks. It was a long way and it could be longer, but this might be the basics we need. I hope this helps some people find Power Query more user-friendly, since they can just write code directly.