I was completing the Microsoft Learn module Ingest data with Spark and Microsoft Fabric notebooks.
I tried to connect to my Azure SQL database using the example in the module:
# Placeholders for Azure SQL Database connection info
server_name = "your_server_name.database.windows.net"
port_number = 1433  # Default port for SQL Server
database_name = "your_database_name"
table_name = "YourTableName"  # Database table
client_id = "YOUR_CLIENT_ID"  # Service principal client ID
client_secret = "YOUR_CLIENT_SECRET"  # Service principal client secret
tenant_id = "YOUR_TENANT_ID"  # Azure Active Directory tenant ID

# Build the Azure SQL Database JDBC URL with Service Principal (Active Directory Integrated)
jdbc_url = f"jdbc:sqlserver://{server_name}:{port_number};database={database_name};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;Authentication=ActiveDirectoryIntegrated"

# Properties for the JDBC connection
properties = {
    "user": client_id,
    "password": client_secret,
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "tenantId": tenant_id,
}

# Read the entire table from Azure SQL Database using AAD Integrated authentication
sql_df = spark.read.jdbc(url=jdbc_url, table=table_name, properties=properties)

# Show the Azure SQL DataFrame
sql_df.show()
I changed all the parameters but keep getting an error.
For client_id I've tried both my user email address (SSO) and the application (client) ID.
For client_secret I've tried my M365/Azure password, the registered app's client secret value, and the secret ID.
Error log:
Py4JJavaError Traceback (most recent call last)
Cell In[17], line 32
24 properties = {
25 "user": client_id,
26 "password": client_secret,
27 "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
28 "tenantId": tenant_id
29 }
31 # Read entire table from Azure SQL Database using AAD Password authentication
---> 32 sql_df = spark.read.jdbc(url=jdbc_url, table=table_name, properties=properties)
34 # Show the Azure SQL DataFrame
35 sql_df.show()
File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:927, in DataFrameReader.jdbc(self, url, table, column, lowerBound, upperBound, numPartitions, predicates, properties)
925 jpredicates = utils.toJArray(gateway, gateway.jvm.java.lang.String, predicates)
926 return self._df(self._jreader.jdbc(url, table, jpredicates, jprop))
--> 927 return self._df(self._jreader.jdbc(url, table, jprop))
File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
1316 command = proto.CALL_COMMAND_NAME +\
1317 self.command_header +\
1318 args_command +\
1319 proto.END_COMMAND_PART
1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
1323 answer, self.gateway_client, self.target_id, self.name)
1325 for temp_arg in temp_args:
1326 if hasattr(temp_arg, "_detach"):
File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw)
167 def deco(*a: Any, **kw: Any) -> Any:
168 try:
--> 169 return f(*a, **kw)
170 except Py4JJavaError as e:
171 converted = convert_exception(e.java_exception)
File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))
Py4JJavaError: An error occurred while calling o4360.jdbc.
: com.microsoft.sqlserver.jdbc.SQLServerException: Failed to load ADAL4J Java library for performing ActiveDirectoryPassword authentication.
at com.microsoft.sqlserver.jdbc.SQLServerConnection.getFedAuthToken(SQLServerConnection.java:4440)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.onFedAuthInfo(SQLServerConnection.java:4415)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.processFedAuthInfo(SQLServerConnection.java:4380)
at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onFedAuthInfo(tdsparser.java:289)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:125)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:37)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:5233)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:3988)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.access$000(SQLServerConnection.java:85)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:3932)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7375)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3206)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2713)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2362)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:2213)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1276)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:861)
at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProviderBase.create(ConnectionProvider.scala:102)
at org.apache.spark.sql.jdbc.JdbcDialect.$anonfun$createConnectionFactory$1(JdbcDialects.scala:123)
at org.apache.spark.sql.jdbc.JdbcDialect.$anonfun$createConnectionFactory$1$adapted(JdbcDialects.scala:119)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:63)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:241)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:346)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:236)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:219)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:261)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
I asked ChatGPT and it suggested I may need to load the adal4j Java library, so I added the following snippet:

from pyspark.sql import SparkSession

# Initialize a Spark session with the adal4j library for Azure Active Directory authentication
spark = SparkSession.builder \
    .appName("Azure SQL Connector") \
    .config("spark.jars.packages", "com.microsoft.azure:adal4j:1.6.7") \
    .getOrCreate()
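For what it's worth, getOrCreate() in a Fabric notebook normally returns the already-running session, so a spark.jars.packages value set at that point may never take effect. A different angle that may be worth trying: newer versions of the Microsoft JDBC driver (mssql-jdbc 9.2 and later) ship MSAL-based support for authentication=ActiveDirectoryServicePrincipal, which matches a client-ID/secret pair, whereas ActiveDirectoryIntegrated expects a domain-joined client rather than a secret. A sketch of what that URL would look like, not verified against a live server (server and database names are placeholders):

```python
# Hypothetical values -- substitute your own.
server_name = "your_server_name.database.windows.net"
database_name = "your_database_name"

# With mssql-jdbc 9.2+, service-principal logins use
# authentication=ActiveDirectoryServicePrincipal, not ActiveDirectoryIntegrated.
jdbc_url = (
    f"jdbc:sqlserver://{server_name}:1433;"
    f"database={database_name};"
    "encrypt=true;trustServerCertificate=false;"
    "loginTimeout=30;"
    "authentication=ActiveDirectoryServicePrincipal"
)

properties = {
    "user": "YOUR_CLIENT_ID",          # service principal (app) client ID
    "password": "YOUR_CLIENT_SECRET",  # the client secret VALUE, not the secret ID
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}
```

The read itself would then be the same spark.read.jdbc(...) call as in the module.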
Hi @Ryan_OC ,
I tried connecting using pyodbc instead. Can you give it a try?
import pyodbc

tenant_id = "tenant_id"
service_principal_id = f"client_id@{tenant_id}"  # important: include your @fully-qualified domain or tenant_id
service_principal_secret = "client_secret"

# Define your SQL Server details
server_name = "server_connection_string"
database_name = "database_name"
queryStr = 'SELECT 1 AS a, 2 AS b UNION ALL SELECT 2 AS a, 3 AS b'

# Define the SQL Server ODBC connection string
conn_str = (
    f"DRIVER={{ODBC Driver 18 for SQL Server}};"
    f"SERVER={server_name};"
    f"DATABASE={database_name};"
    f"UID={service_principal_id};"
    f"PWD={service_principal_secret};"
    f"Authentication=ActiveDirectoryServicePrincipal"
)

# Establish the connection
conn = pyodbc.connect(conn_str)

# Execute a query
cursor = conn.cursor()
cursor.execute(queryStr)
resultList = cursor.fetchall()
columns = [column[0] for column in cursor.description]
print(str([dict(zip(columns, row)) for row in resultList]))
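As a small aside, the columns/zip idiom at the end can be pulled into a helper and sanity-checked without a live connection. A sketch using simulated cursor output (the description and rows values mimic what the test query above returns; no pyodbc needed):

```python
def rows_to_dicts(description, rows):
    """Convert pyodbc-style results (cursor.description plus fetchall rows)
    into a list of plain dicts, as in the print statement above."""
    columns = [col[0] for col in description]
    return [dict(zip(columns, row)) for row in rows]

# Simulated cursor output matching the 'SELECT 1 AS a, 2 AS b UNION ALL ...' query.
description = [("a", int), ("b", int)]
rows = [(1, 2), (2, 3)]
print(rows_to_dicts(description, rows))  # [{'a': 1, 'b': 2}, {'a': 2, 'b': 3}]
```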
In case this doesn't help, I will dig into it further.
Thank you
Hi @v-gchenna-msft,
Thank you for your assistance.
I already have an app registered and have the tenant_ID, client_id and client_secret. These were the values I was using:
Client_Id = 1 above
Tenant_id = 2 above
Client secret = 3 or 4 above (not sure which).
I then added this code as it relates to the above registered app:
--DB
CREATE USER [Appenate] FROM EXTERNAL PROVIDER;
GO
ALTER ROLE db_owner ADD member [Appenate]
GO
I then re-ran the code from my first post with my real values substituted, BUT the same error occurred 😞
Any ideas?
OK, I gave it a try and received this:
[{'a': 1, 'b': 2}, {'a': 2, 'b': 3}] and it stated it was successful. Was it?
So now do I need to change queryStr to access a table?
Great @Ryan_OC .
Yes, it was successful: you were able to access the database. You can change queryStr to access a table.
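To make that concrete, here is one hypothetical way to build the table query. The bracket-quoting helper and the dbo.SalesOrder name are made up for illustration, not part of this thread:

```python
def table_query(schema, table, top=None):
    """Build a SELECT over a table, bracket-quoting the (hypothetical)
    schema and table names so reserved words don't break the query."""
    def quote(name):
        # Escape closing brackets, then wrap the identifier in brackets.
        return "[" + name.replace("]", "]]") + "]"
    prefix = f"SELECT TOP {int(top)} * " if top else "SELECT * "
    return prefix + f"FROM {quote(schema)}.{quote(table)}"

queryStr = table_query("dbo", "SalesOrder", top=10)
print(queryStr)  # SELECT TOP 10 * FROM [dbo].[SalesOrder]
```

The resulting string slots straight into cursor.execute(queryStr) in the snippet above.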
Thanks so much! I don't understand why the first method didn't work... is it a Fabric issue, do you think?
Hi @Ryan_OC ,
I am not sure of that.
Have you followed the format below?
tenant_id = "tenant_id"
service_principal_id = f"client_id@{tenant_id}" # important to include your @fully qualified domain or tenant_id
service_principal_secret = "client_secret"
Same problem 😞
Hi @Ryan_OC
Did you manage to get this working? I think I'm essentially doing the same thing as you (the DP-600 course) and am now attempting to connect to my own instance.
As you can see from the above, I'm able to connect to the database and run the query without any error.
However, when I then change the query to select from a table in the DB, I get the following:
I'm basically trying to load the contents of a table into the lakehouse via a notebook. I can probably do this via a Gen2 pipeline and use my existing gateway, but it would be much better to run it from a notebook.
Thanks
Ahhhh - for anyone who is interested, my issue was with the query. Fabric didn't like column 13 (which used a function to return a date); when I cast it to a varchar it was fine!
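For anyone hitting the same thing, the workaround presumably looks something like this. The table and column names are invented; the point is the CAST wrapped around the date-returning expression:

```python
# Hypothetical query illustrating the fix described above: the computed
# date column is cast to varchar so the driver can map it cleanly.
queryStr = (
    "SELECT OrderId, "
    "CAST(DATEADD(day, -7, GETDATE()) AS varchar(30)) AS WeekAgo "
    "FROM dbo.Orders"
)
```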
Good news! Glad it came right for you
Hi @Ryan_OC ,
Thanks for using the Fabric Community.
As I understand it, you are trying to connect to Azure SQL using PySpark in Fabric via a service principal (tenant_id + client_id + client_secret).
Here are the steps you can follow in order to connect:
Step 1: Create a service principal in Azure Active Directory (Microsoft Entra ID).
Step 2: Note down your tenant_id, client_id and client_secret.
Step 3: Execute the queries below in your Azure SQL query editor.
--DB
CREATE USER [v-gchennaSP] FROM EXTERNAL PROVIDER;
GO
ALTER ROLE db_owner ADD member [v-gchennaSP]
GO
Note: enter your own app registration name here; you also need to be logged in with an AAD account.
Step 4: After executing successfully, check for an entry using the query below.
SELECT * FROM sys.database_principals
WHERE name = 'v-gchennaSP';
Step 5: Then use the code below to connect.
# Placeholders for Azure SQL Database connection info
server_name = "your_server_name.database.windows.net"
port_number = 1433 # Default port number for SQL Server
database_name = "your_database_name"
table_name = "YourTableName" # Database table
client_id = "YOUR_CLIENT_ID" # Service principal client ID
client_secret = "YOUR_CLIENT_SECRET" # Service principal client secret
tenant_id = "YOUR_TENANT_ID" # Azure Active Directory tenant ID
# Build the Azure SQL Database JDBC URL with Service Principal (Active Directory Integrated)
jdbc_url = f"jdbc:sqlserver://{server_name}:{port_number};database={database_name};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;Authentication=ActiveDirectoryIntegrated"
# Properties for the JDBC connection
properties = {
    "user": client_id,
    "password": client_secret,
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "tenantId": tenant_id,
}
# Read entire table from Azure SQL Database using AAD Integrated authentication
sql_df = spark.read.jdbc(url=jdbc_url, table=table_name, properties=properties)
# Show the Azure SQL DataFrame
sql_df.show()
Hope this is helpful. Please feel free to reach out in case of further queries.