Ryan_OC
Helper I

Error connecting to SQL Database using Fabric notebook

I was working through the Microsoft Learn module "Ingest data with Spark and Microsoft Fabric notebooks" and tried to connect to my SQL database using the example from the module:

# Placeholders for Azure SQL Database connection info 
server_name = "your_server_name.database.windows.net" 
port_number = 1433 # Default port number for SQL Server 
database_name = "your_database_name" 
table_name = "YourTableName" # Database table 
client_id = "YOUR_CLIENT_ID" # Service principal client ID 
client_secret = "YOUR_CLIENT_SECRET" # Service principal client secret 
tenant_id = "YOUR_TENANT_ID" # Azure Active Directory tenant ID 

# Build the Azure SQL Database JDBC URL with Service Principal (Active Directory Integrated) 
jdbc_url = f"jdbc:sqlserver://{server_name}:{port_number};database={database_name};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;Authentication=ActiveDirectoryIntegrated" 

# Properties for the JDBC connection 
properties = {
    "user": client_id,
    "password": client_secret,
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "tenantId": tenant_id
}

# Read entire table from Azure SQL Database using AAD Integrated authentication 
sql_df = spark.read.jdbc(url=jdbc_url, table=table_name, properties=properties) 

# Show the Azure SQL DataFrame 
sql_df.show()

I changed all the parameters, but I keep getting an error.

For client_id, I've tried both my user email address (SSO) and the application (client) ID; for client_secret, I've tried my M365/Azure password, the registered app's client secret value, and the secret ID.


Error log:

An error occurred during the execution of SQL statement. 1. Ensure that any SQL statements you are issuing have valid syntax. 2. Ensure that any SQL tables or functions that you are calling exist and your user as proper permissions for the actions your code is attempting to make. This applies to both standard SQL tables as well as external SQL tables. 3. Ensure that the columns referenced in your code exist in the target tables. 4. Check the logs for this Spark application by clicking the Monitor tab in left side of the Synapse Studio UI, select "Apache Spark Applications" from the "Activities" section, and find your Spark job from this list. Inspect the logs available in the "Logs" tab in the bottom part of this page.

 

 

Py4JJavaError                             Traceback (most recent call last)
Cell In[17], line 32
     24 properties = {
     25     "user": client_id, 
     26     "password": client_secret,  
     27     "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
     28     "tenantId": tenant_id  
     29 }
     31 # Read entire table from Azure SQL Database using AAD Password authentication
---> 32 sql_df = spark.read.jdbc(url=jdbc_url, table=table_name, properties=properties)
     34 # Show the Azure SQL DataFrame
     35 sql_df.show()

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:927, in DataFrameReader.jdbc(self, url, table, column, lowerBound, upperBound, numPartitions, predicates, properties)
    925     jpredicates = utils.toJArray(gateway, gateway.jvm.java.lang.String, predicates)
    926     return self._df(self._jreader.jdbc(url, table, jpredicates, jprop))
--> 927 return self._df(self._jreader.jdbc(url, table, jprop))

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw)
    167 def deco(*a: Any, **kw: Any) -> Any:
    168     try:
--> 169         return f(*a, **kw)
    170     except Py4JJavaError as e:
    171         converted = convert_exception(e.java_exception)

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o4360.jdbc.
: com.microsoft.sqlserver.jdbc.SQLServerException: Failed to load ADAL4J Java library for performing ActiveDirectoryPassword authentication.
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.getFedAuthToken(SQLServerConnection.java:4440)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.onFedAuthInfo(SQLServerConnection.java:4415)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.processFedAuthInfo(SQLServerConnection.java:4380)
	at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onFedAuthInfo(tdsparser.java:289)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:125)
	at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:37)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:5233)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:3988)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.access$000(SQLServerConnection.java:85)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:3932)
	at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7375)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3206)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2713)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:2362)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:2213)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1276)
	at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:861)
	at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
	at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProviderBase.create(ConnectionProvider.scala:102)
	at org.apache.spark.sql.jdbc.JdbcDialect.$anonfun$createConnectionFactory$1(JdbcDialects.scala:123)
	at org.apache.spark.sql.jdbc.JdbcDialect.$anonfun$createConnectionFactory$1$adapted(JdbcDialects.scala:119)
	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:63)
	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58)
	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:241)
	at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:37)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:346)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:236)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:219)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
	at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:261)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.base/java.lang.Thread.run(Thread.java:829)

 


I asked ChatGPT, and it suggested that I might need to load the adal4j Java library. So I added the following code snippet:

from pyspark.sql import SparkSession

# Initialize Spark Session with adal4j library for Azure Active Directory authentication
spark = SparkSession.builder \
    .appName("Azure SQL Connector") \
    .config("spark.jars.packages", "com.microsoft.azure:adal4j:1.6.7") \
    .getOrCreate()

What am I doing wrong?
1 ACCEPTED SOLUTION

Hi @Ryan_OC ,

I tried connecting to it using pyodbc. Can you give it a try?

import pyodbc

tenant_id = "tenant_id"
service_principal_id = f"client_id@{tenant_id}" # important to include your @fully qualified domain or tenant_id
service_principal_secret = "client_secret"

# Define your SQL Server details
server_name = "server_connection_string"
database_name = "database_name"
queryStr = 'SELECT 1 AS a, 2 AS b UNION ALL SELECT 2 AS a, 3 AS b'    

# Define the SQL Server ODBC connection string
conn_str = (
    f"DRIVER={{ODBC Driver 18 for SQL Server}};"
    f"SERVER={server_name};"
    f"DATABASE={database_name};"
    f"UID={service_principal_id};"
    f"PWD={service_principal_secret};"
    f"Authentication=ActiveDirectoryServicePrincipal"
)

# Establish the connection
conn = pyodbc.connect(conn_str)

# Execute a query
cursor = conn.cursor()
cursor.execute(queryStr)
resultList = cursor.fetchall()
columns = [column[0] for column in cursor.description]
print(str([dict(zip(columns, row)) for row in resultList]))
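The zip/dict conversion in the last line can be expressed as a small standalone helper; this is only an illustrative refactor of the code above (rows_to_dicts is my name for it, not a pyodbc API):

```python
def rows_to_dicts(columns, rows):
    """Pair each row's values with the column names, one dict per row."""
    return [dict(zip(columns, row)) for row in rows]

# With the two rows the sample query returns:
print(rows_to_dicts(["a", "b"], [(1, 2), (2, 3)]))
# [{'a': 1, 'b': 2}, {'a': 2, 'b': 3}]
```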

(screenshot: s4.png)

In case this doesn't help, I will try to do a deeper investigation.

Thank you

Ryan_OC
Helper I

Hi @v-gchenna-msft

Thank you for your assistance.

I already have an app registered and have the tenant_ID, client_id and client_secret. These were the values I was using:
(screenshots: Screenshot 2024-01-29 154211.png, Screenshot 2024-01-29 154725.png)

Client_Id = 1 above

Tenant_id = 2 above

Client secret = 3 or 4 above (Not sure which)?

 

I then added this code as it relates to the above registered app:

 

--DB
CREATE USER [Appenate] FROM EXTERNAL PROVIDER;
GO
 
ALTER ROLE db_owner ADD MEMBER [Appenate];
GO

 

 

I then ran this code:

# Placeholders for Azure SQL Database connection info
server_name = "your_server_name.database.windows.net"
port_number = 1433  # Default port number for SQL Server
database_name = "your_database_name"
table_name = "YourTableName" # Database table
client_id = "YOUR_CLIENT_ID"  # Service principal client ID
client_secret = "YOUR_CLIENT_SECRET"  # Service principal client secret
tenant_id = "YOUR_TENANT_ID"  # Azure Active Directory tenant ID


# Build the Azure SQL Database JDBC URL with Service Principal (Active Directory Integrated)
jdbc_url = f"jdbc:sqlserver://{server_name}:{port_number};database={database_name};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;Authentication=ActiveDirectoryIntegrated"

# Properties for the JDBC connection
properties = {
    "user": client_id, 
    "password": client_secret,  
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "tenantId": tenant_id  
}

# Read entire table from Azure SQL Database using AAD Integrated authentication
sql_df = spark.read.jdbc(url=jdbc_url, table=table_name, properties=properties)

# Show the Azure SQL DataFrame
sql_df.show()

 

BUT the same error occurred 😞

 

Any ideas?


OK, I gave it a try and received this:

[{'a': 1, 'b': 2}, {'a': 2, 'b': 3}]

It stated it was successful. Was it?

So now, do I need to change the queryStr to access a table?

Great @Ryan_OC .

Yes, it was successful: you were able to access the database. You can change the queryStr to access a table.
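For example, swapping the smoke-test query for a table read might look like this (dbo.YourTableName and the TOP 100 limit are illustrative placeholders, not names from the thread):

```python
# Replace the placeholder with your actual schema-qualified table name.
table_name = "dbo.YourTableName"

# TOP keeps the first look at the data small; drop it to pull every row.
queryStr = f"SELECT TOP 100 * FROM {table_name}"
```

The rest of the pyodbc code stays the same; cursor.execute(queryStr) then returns the table's rows instead of the two test rows.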

Thanks so much! I don't understand why the first method didn't work... is it a Fabric issue, do you think?

 

Hi @Ryan_OC ,

I am not sure of that.

Have you followed the format below?

tenant_id = "tenant_id"
service_principal_id = f"client_id@{tenant_id}"  # important: include your fully qualified domain or tenant_id
service_principal_secret = "client_secret"




Same problem 😞 

Hi @Ryan_OC 

 

Did you manage to get this working? I think I'm essentially doing the same thing as you (the DP-600 course) and now attempting to connect to my own instance. 

 

(screenshot: Flipper_2-1710156848603.png)

 

As you can see from the above, I'm able to connect to the database and run the query without any error.

 

However, when I then change the query to select from a table in the DB, I get the following:

 

(screenshot: Flipper_3-1710156875755.png)

 

I'm basically trying to load the contents of a table into the lakehouse via a notebook. I could probably do this via a Gen2 pipeline using my existing gateway, but it would be much better to run it via a notebook.

 

Thanks

Ahhhh! For anyone who is interested, my issue was with the query. Fabric didn't like column 13 (which used a function to return a date); when I cast it to a varchar, it was fine!
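To illustrate the fix (the table, column, and function names here are hypothetical, not from the original query):

```python
# Hypothetical shape of the query Fabric rejected: column 13 came from a
# date-returning function, e.g.
# queryStr = "SELECT Col1, Col2, dbo.GetSomeDate(Col13) AS Col13 FROM dbo.MyTable"

# Casting the function's result to varchar, as described above, resolved it:
queryStr = (
    "SELECT Col1, Col2, "
    "CAST(dbo.GetSomeDate(Col13) AS varchar(30)) AS Col13 "
    "FROM dbo.MyTable"
)
```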

Good news! Glad it came right for you

v-gchenna-msft
Community Support

Hi @Ryan_OC ,

Thanks for using Fabric Community,

As I understand it, you are trying to connect to Azure SQL using PySpark in Fabric via a service principal (tenant_id + client_id + client_secret).

Here are the steps you can follow in order to connect:

Step 1: Create a service principal in Azure Active Directory (Microsoft Entra ID).
Step 2: Note your tenant_id, client_id, and client_secret.
Step 3: Execute the queries below in your Azure SQL query editor.


--DB
CREATE USER [v-gchennaSP] FROM EXTERNAL PROVIDER;
GO
 
ALTER ROLE db_owner ADD MEMBER [v-gchennaSP];
GO

Note: use your own app registration's name here, and log in to the query editor with an AAD account.
(screenshots: s1.png, s2.png, s3.png)

Step 4: After the queries execute successfully, check for the new entry using the query below.


SELECT * FROM sys.database_principals
WHERE name = 'v-gchennaSP';

Step 5: Then use the code below to connect.

# Placeholders for Azure SQL Database connection info
server_name = "your_server_name.database.windows.net"
port_number = 1433  # Default port number for SQL Server
database_name = "your_database_name"
table_name = "YourTableName" # Database table
client_id = "YOUR_CLIENT_ID"  # Service principal client ID
client_secret = "YOUR_CLIENT_SECRET"  # Service principal client secret
tenant_id = "YOUR_TENANT_ID"  # Azure Active Directory tenant ID


# Build the Azure SQL Database JDBC URL with Service Principal (Active Directory Integrated)
jdbc_url = f"jdbc:sqlserver://{server_name}:{port_number};database={database_name};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;Authentication=ActiveDirectoryIntegrated"

# Properties for the JDBC connection
properties = {
    "user": client_id, 
    "password": client_secret,  
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "tenantId": tenant_id  
}

# Read entire table from Azure SQL Database using AAD Integrated authentication
sql_df = spark.read.jdbc(url=jdbc_url, table=table_name, properties=properties)

# Show the Azure SQL DataFrame
sql_df.show()
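One detail worth noting about the URL above: it requests Authentication=ActiveDirectoryIntegrated, while the comments and credentials describe a service principal. Recent versions of the mssql-jdbc driver also accept an ActiveDirectoryServicePrincipal mode (which requires the msal4j library on the classpath); a sketch of the URL built for that mode, under those assumptions:

```python
server_name = "your_server_name.database.windows.net"
port_number = 1433
database_name = "your_database_name"

# Same connection string as in the snippet above, but asking the driver for
# service-principal authentication; "user" then carries the client ID and
# "password" the client secret, as in the properties dict.
jdbc_url = (
    f"jdbc:sqlserver://{server_name}:{port_number};"
    f"database={database_name};encrypt=true;trustServerCertificate=false;"
    f"loginTimeout=30;authentication=ActiveDirectoryServicePrincipal"
)
```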

Hope this is helpful. Please feel free to reach out in case of further queries.
