ce87
Helper I

Notebook Error: Cannot broadcast the table that is larger than 8.0 GiB: 8.5 GiB.

I have a notebook that is throwing the following error:
Py4JJavaError: An error occurred while calling o4515.execute.
: org.apache.spark.SparkException: Cannot broadcast the table that is larger than 8.0 GiB: 8.5 GiB.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotBroadcastTableOverMaxTableBytesError(QueryExecutionErrors.scala:2366)
	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:231)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

I have the following Spark config set in the notebook, which I believe is supposed to override the 8 GiB limit, but it doesn't appear to be working:

spark.conf.set('spark.sql.autoBroadcastJoinThreshold', '-1')

Does anyone have any ideas for getting around this error?

Thanks.

8 REPLIES
Expiscornovus
Resolver III

Hi @ce87,

Have you tried the -1 value without single quotes? That might make a difference and disable the broadcast threshold:

spark.conf.set('spark.sql.autoBroadcastJoinThreshold', -1)

Dropping the quotes didn't help. I was able to confirm that the Spark config settings are taking effect using spark.conf.get(), but the setting doesn't seem to make any difference.

I have opened a ticket with Microsoft Support.

Hi @ce87

Thanks for using Microsoft Fabric Community.

If you have opened a support ticket, a reference to the ticket number would be greatly appreciated. This will allow us to track the progress of your request and ensure you receive the most efficient support possible.

Thank you.

Hi @ce87

We haven't heard back from you on the last response and wanted to check whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, reply with more details and we will try to help.

Thank you.

Well, Microsoft Support has been completely useless. All I've gotten is a bunch of emails from an operations manager asking if everything is OK. I probably wouldn't have opened a support ticket if everything was OK. Microsoft support is becoming a joke.

Hi @ce87,

Thanks for double checking that. Let us know what your progress is with Microsoft Support 👍

ce87
Helper I

Thanks for the suggestions. No dice. I tried

spark.conf.set('spark.ms.autotune.enabled', 'true')

and confirmed it was properly set with

spark.conf.get('spark.ms.autotune.enabled')

and I still run into the same issue.

Expiscornovus
Resolver III

Hi @ce87,

Maybe you can use Autotune for your Spark configuration. That feature also manages the spark.sql.autoBroadcastJoinThreshold setting:

https://learn.microsoft.com/en-us/fabric/data-engineering/autotune?tabs=pyspark

I believe you can enable Autotune via the PySpark code below:

%%pyspark
spark.conf.set('spark.ms.autotune.enabled', 'true')

 
