SergioTorrinha
Resolver II

Notebook Resources | Spark not recognized in local machine | File not recognized in local machine

Hi everyone!

 

These questions are related to working with Notebook Resources, see documentation here:
https://learn.microsoft.com/en-us/fabric/data-engineering/author-notebook-resource-with-vs-code

 

1. I have downloaded an existing notebook from my Fabric remote workspace to my local machine, following the exact steps documented in the link above.
On my machine, the spark library is not being recognized, although the notebook apparently runs without any issue, both locally and in the remote Fabric workspace, as you can see in the screenshot below.

 

Capture.PNG


Am I doing something wrong here, or should I do something differently?

 

2. I have a log file stored in the Files section of my data lake, in the remote Fabric workspace.

In the Fabric workspace, the code below works fine; however, on my local machine it throws the highlighted error.

SergioTorrinha_2-1703761968065.png

 

How can/should I reference the file so that everything works fine both locally and in the remote workspace?


Thank you.

15 REPLIES
QixiaoWang
Employee

Sergio,

For the first question, related to the spark variable: in a Fabric notebook, before its execution, some pre-run code is executed by the system to define this "spark" variable as the Spark runtime context, so you don't need to manually define any Spark session. Given that this pre-run code only executes when running the notebook, that explains why it is not recognized in your own .py file. I think you can just ignore this error.

 

For your second question, about the file path, we might need some time to work on a solution. As a workaround to unblock you, could you please manually create a folder as the error requires?

 

Thanks,

Qixiao

Hi @QixiaoWang !

Thanks for your input.
Correct me if I'm wrong, but I believe you are one of the authors of the documentation about Notebook Resources. If you are, that makes you one of the best people to answer my questions (not diminishing the importance of the work of your other colleagues, of course).
Am I right? =]

SergioTorrinha_0-1704355542410.png


Anyway, to the points:

 

1 - Spark session question:

I think one of the main purposes of associating VS Code with notebooks is being able to develop my solutions in my local VS Code development environment. This environment is more convenient than developing directly in the MS Fabric UI, and it also lets me create modules that are part of my solution without having to rely on MS Fabric environments, given the general overhead they represent when managing packages (I am referring specifically to the mandatory rule of having to define wheel files to deal with that).

I don't think ignoring the error is an option, because ideally I would like to develop on my machine, provided I stay synchronized with my data lake tables and files, as that makes me more productive.

 

2 - 'No such file or directory' error:

With your input, I came to realize that, indeed, the file does not physically exist on my machine, but it does exist in my data lake, as you can see in the image below:

SergioTorrinha_0-1704357120105.png

 

Reading through the VS Code integration documentation, I understood that the notebook was synced with my data lake and that, therefore, I wouldn't need to have the tables and files from my data lake on my local machine.

Perhaps I interpreted that wrongly, and I know that Fabric is in its early stages and continuously evolving, but wouldn't it make sense not to have the files physically present on my machine?

 

3 - Bonus question:

All in all, I guess one of the perks of working with VS Code and data lake synchronization is having a familiar and productive environment in which to develop our solutions. The question here is: reading through my questions, please let me know if I'm doing something wrong as far as the development setup is concerned, because my overall goal is to find the ideal setup for developing with Fabric.

 

It's worth noting that I have been in contact with your support teams, which are also looking into this (questions 1 and 2 only).

I have also previously provided the support ticket number (please see message 9).

 

Thank you.

 

Sergio,

 

Hey, sorry for replying a little bit late.

Again, I really appreciate the feedback on the Synapse VS Code experience, and I am more than happy to get on a call to discuss further; feel free to reach out to me at qixwang@microsoft.com.

 

for the "spark" variable issue, there is the quick workaround that might help to address the warning.

The pre-run code I mentiond actually stay in the local desktop, too. It is sub-folder under your Home directory as:  .ipython/profile_default/startup/init_lighter.py. 

 

So, in your own .py file, you can import that "init_lighter.py" module.

 

 

import sys
import os

# Locate the IPython startup folder under the home directory, where the
# VS Code extension places init_lighter.py, and add it to the module search path.
home_directory = os.path.expanduser("~")
directory_to_append = os.path.join(home_directory, ".ipython/profile_default/startup/")
sys.path.append(directory_to_append)

# Import the pre-defined Spark session from the pre-run module.
from init_lighter import spark
print("Spark version: " + spark.version)

 

Now, within your own .py file, you can call methods on the "spark" variable without any warning from the Python interpreter.

 

For the second issue, one thing to call out in the current release: ONLY the Spark code is posted/synced to the remote workspace for execution; pure Python code still executes locally on your desktop. That explains why you need a local folder path matching the lakehouse log.
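
For illustration, a rough sketch of that workaround (the folder and file names below are only placeholders; use the exact path from the error message):

import os

# Mirror the folder the error asks for on the local machine, then write the
# log file with plain Python file I/O, which runs locally.
log_dir = os.path.join("Files", "logs")   # placeholder path, not from the thread
os.makedirs(log_dir, exist_ok=True)

with open(os.path.join(log_dir, "app.log"), "a") as log_file:
    log_file.write("log entry\n")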

 

Qixiao

Hi @QixiaoWang !

Just wanted to give you (and other users) an update about this issue.

I was able to get past the 'Import "init_lighter" could not be resolved' error that I described previously by adding the module path to my user settings.json in VS Code (for a reference on how I did it, please check this link).

However, even when importing this module, the spark object is not recognized, and therefore I still get an error in the end.

SergioTorrinha_0-1705569464206.png


At this point I think it is simpler to just initialize (and terminate) the Spark session within the notebook when working in the local folder, unless, of course, you have some other solution?
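
For illustration, a minimal sketch of what I mean (assuming pyspark is installed in the local environment; the app name is just an example):

from pyspark.sql import SparkSession

# Create (or reuse) a local Spark session instead of relying on the injected one.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("LocalDevSession")
    .getOrCreate()
)

# ... notebook work ...

spark.stop()  # terminate the session when the local work is done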

Thank you.

Hi @QixiaoWang !

 

No worries about the late reply. I'm glad I can contribute to the VS Code experience in Fabric, and thanks for the time you've put into trying to figure out these issues.

 

Regarding Spark session invocation:

As a quick test, I tried to invoke the Spark session inside my notebook, before even going into my modules, but apparently the init_lighter.py module is not being recognized, as you can see in the image below:

SergioTorrinha_0-1704705547445.png

 

although it was initialized when I opened the notebook on my local machine, as you can see in the output console in the image above, and it does exist on my local machine, as the image below demonstrates:

SergioTorrinha_1-1704705547446.png

 

Maybe, at this point, it's just me being bad with Python, but do you happen to have any clue on how to solve this one?

 

Regarding the logging file:

I understand what you mean, but it is still a bit odd to me that the Tables are recognized while the Files are not. I am just wondering what the best/advised practice would be in this case. Perhaps instead of a logging file I should have a logging table, so I don't have to replicate too many artifacts on my local machine, in order to avoid mistakes or maintenance complexity/overhead.
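
As a rough sketch of that logging-table idea (the table name and schema are only illustrative, not something I have implemented):

from datetime import datetime, timezone

# Build a one-row log entry and append it to a lakehouse table. Since the Spark
# code is synced to the remote workspace, this should behave the same from
# VS Code and from the Fabric UI.
log_entry = [(datetime.now(timezone.utc).isoformat(), "INFO", "pipeline started")]
log_df = spark.createDataFrame(log_entry, ["timestamp", "level", "message"])
log_df.write.mode("append").saveAsTable("notebook_log")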

 

Again, thanks for the input and for your time. 🙂

Hi @QixiaoWang !

I wonder if you have had some time to look at my questions above?

Please let me know.

 

Thank you.

SergioTorrinha
Resolver II

@v-gchenna-msft thanks for the quick reply.

Regarding the Spark question:
Maybe I'm confusing things here, but if I manually create the session in the notebook, will it also work in the remote Fabric workspace?

 

Regarding the file path question:
I was still editing my question when you answered; I'm sorry for that. Could you please have another look at my original post?

Thank you.

Hi @SergioTorrinha ,

FYI: I am unable to open the link.

vgchennamsft_0-1703763223106.png


In order to run the Fabric notebooks locally and consume your Fabric workspace, you should first configure your local system.

VS Code extension overview - Microsoft Fabric | Microsoft Learn

Thanks for sharing the documentation, but I've already set up my local machine to consume my Fabric workspace using the exact steps in the documentation you shared.

 

I don't understand why you can't open the link (to be honest, I can't either, which is weird enough).
In any case, the link points to the Notebook Resources section of the documentation you just shared:

SergioTorrinha_0-1703764095631.png

 

Hi @SergioTorrinha ,

If you were able to complete the configuration fully, then you should see your workspace notebooks and lakehouse details on the left.

Hi @v-gchenna-msft !

Yes, and indeed that's the case. I can see all those details on my local machine; sorry if that wasn't clear.
The problem is that the path I'm using in the remote workspace to reference the log file is, apparently, not the same as the path I need to use on my local machine, according to the screenshots shared in my original post.
Also, I still haven't understood whether the Spark session I need to create manually on my local machine will remain valid when I upload the local notebook to the remote workspace.


Please let me know if my messages are not clear or if you require any other information on my end.

Thank you.

Hi @SergioTorrinha ,

Apologies for the issue you have been facing. If it's a bug, we would definitely like to know about it and properly address it. Please go ahead and raise a support ticket to reach our support team: https://support.fabric.microsoft.com/support

After creating a support ticket, please provide the ticket number, as it will help us track it for more information.

Hi @v-gchenna-msft !

I'm not sure if it's a bug, but in any case I have submitted a support ticket as you suggested.
The support ticket number is 2312290050001023.

Please let me know if you require any other information from my end.
Thank you.

Hi @SergioTorrinha ,

Thank you for sharing the ticket number.
The team will look into your issue shortly and will get you some help with it.

Hope you got insights into your query. Please continue using the Fabric Community for your further queries.

v-gchenna-msft
Community Support

Hi @SergioTorrinha ,

Thank you for using Fabric Community.

A Spark session is present by default in the Fabric workspace, but when working locally you should create it manually.

Please find the code below:

import org.apache.spark.sql.SparkSession

// Build (or reuse) a Spark session that runs locally on a single core.
val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExample")
      .getOrCreate()


Spark - Create a SparkSession and SparkContext - Spark By {Examples} (sparkbyexamples.com)
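
Since the notebook in this thread is a Python notebook, the rough PySpark equivalent of the Scala snippet above would be (a sketch, not taken from the linked article):

from pyspark.sql import SparkSession

# Same builder chain as the Scala example: a local, single-core session.
spark = (
    SparkSession.builder
    .master("local[1]")
    .appName("SparkByExample")
    .getOrCreate()
)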

Hope this is helpful. Please let me know in case of further queries.
