Solved: Writing dataframe to delta table fails with error ...

JayJay11 · ‎04-10-2024

Hello there,

I try to write a manipulated dataframe back to a delta table in a Lakehouse using "overwrite". There are no schema changes, it is just less data than before.

dfCustomer = spark.read.table("LakehouseOperations.factCustomerBase")
dfScope = spark.read.table("LakehouseOperations.tecDataScopeSnapshots")

dfCustomerJoined = dfCustomer.join(dfScope, dfCustomer.snapshot_date == dfScope.scopeDate, "inner").drop("scopeDate", "scopeDateBuckets")

dfCustomerJoined.write.mode("overwrite").format("delta").option("overwriteSchema", "true").save("Tables/factCustomerBase")

Error message:

-------------------------------------------------------------------------- Py4JJavaError Traceback (most recent call last) Cell In[74], line 2 1 #dfBuildingsJoined.write.mode("overwrite").format("delta").option("overwriteSchema", "true").save("Tables/factBuildings") ----> 2 dfCustomerJoined.write.mode("overwrite").format("delta").option("overwriteSchema", "true").save("Tables/factCustomerBase") File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:1398, in DataFrameWriter.save(self, path, format, mode, partitionBy, **options) 1396 self._jwrite.save() 1397 else: -> 1398 self._jwrite.save(path) File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args) 1316 command = proto.CALL_COMMAND_NAME +\ 1317 self.command_header +\ 1318 args_command +\ 1319 proto.END_COMMAND_PART 1321 answer = self.gateway_client.send_command(command) -> 1322 return_value = get_return_value( 1323 answer, self.gateway_client, self.target_id, self.name) 1325 for temp_arg in temp_args: 1326 if hasattr(temp_arg, "_detach"😞 File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw) 167 def deco(*a: Any, **kw: Any) -> Any: 168 try: --> 169 return f(*a, **kw) 170 except Py4JJavaError as e: 171 converted = convert_exception(e.java_exception) File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name) 324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client) 325 if answer[1] == REFERENCE_TYPE: --> 326 raise Py4JJavaError( 327 "An error occurred while calling {0}{1}{2}.\n". 328 format(target_id, ".", name), value) 329 else: 330 raise Py4JError( 331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n". 332 format(target_id, ".", name, value)) Py4JJavaError: An error occurred while calling o5692.save. : org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:322) at org.apache.spark.sql.execution.OptimizeWriteExchangeExec.doExecute(OptimizeWriteExchangeExec.scala:66) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:231) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:282) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:279) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:227) at org.apache.spark.sql.delta.constraints.DeltaInvariantCheckerExec.doExecute(DeltaInvariantCheckerExec.scala:72) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:231) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:282) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:279) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:227) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$1(FileFormatWriter.scala:254) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:296) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:237) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:218) at org.apache.spark.sql.delta.files.TransactionalWrite.$anonfun$writeFiles$1(TransactionalWrite.scala:421) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:120) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:209) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:105) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:67) at org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles(TransactionalWrite.scala:385) at org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles$(TransactionalWrite.scala:348) at org.apache.spark.sql.delta.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:139) at org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles(TransactionalWrite.scala:222) at org.apache.spark.sql.delta.files.TransactionalWrite.writeFiles$(TransactionalWrite.scala:219) at org.apache.spark.sql.delta.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:139) at org.apache.spark.sql.delta.commands.WriteIntoDelta.write(WriteIntoDelta.scala:335) at org.apache.spark.sql.delta.commands.WriteIntoDelta.$anonfun$run$1(WriteIntoDelta.scala:98) at org.apache.spark.sql.delta.commands.WriteIntoDelta.$anonfun$run$1$adapted(WriteIntoDelta.scala:93) at org.apache.spark.sql.delta.DeltaLog.withNewTransaction(DeltaLog.scala:232) at org.apache.spark.sql.delta.commands.WriteIntoDelta.run(WriteIntoDelta.scala:93) at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:180) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:152) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:120) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:209) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:105) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:67) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:152) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:145) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(...

How can I solve this? Sorry I had to cut the error message due to character limitations here.

v-gchenna-msft · ‎04-12-2024

Hi @JayJay11 ,

Closing this thread as this is a duplicate thread of this - Solved: Re: Error when writing dataframe to delta table - Microsoft Fabric Community

View solution in original post

v-gchenna-msft · ‎04-12-2024

Hi @JayJay11 ,

Closing this thread as this is a duplicate thread of this - Solved: Re: Error when writing dataframe to delta table - Microsoft Fabric Community

Writing dataframe to delta table fails with error (Spark_Ambiguous_UserApp_JobAborted)

Helpful resources

Europe’s largest Microsoft Fabric Community Conference

New forum boards available in Synapse

New forum boards available in Real-Time Intelligence.

Fabric Monthly Update - May 2024

Jumpstart your career with the Fabric Career Hub