bcdobbs
Super User

Download Binary File from SFTP in Notebook

Hi,

I know I can do this with a data pipeline Copy activity, but I'd like to be able to download a binary file from SFTP within a notebook.

 

I can download a CSV using paramiko by decoding it to a string and then writing it to OneLake, but when I try to write anything other than string data it fails, saying it can't find the path. I'd like to be able to do a direct binary copy.
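For context on the string-versus-binary part of this: writing raw bytes to a local path needs the file opened in binary mode, since text mode only accepts str. A minimal local sketch (temporary paths only, nothing Fabric-specific):

```python
import os
import tempfile

payload = bytes(range(256))  # arbitrary binary content, e.g. part of a zip/xlsx

path = os.path.join(tempfile.mkdtemp(), "sample.bin")

# Text mode ("w") expects str, so passing bytes raises TypeError.
try:
    with open(path, "w") as f:
        f.write(payload)
except TypeError:
    print("text mode rejected bytes")

# Binary mode ("wb") writes the bytes through unchanged.
with open(path, "wb") as f:
    f.write(payload)

with open(path, "rb") as f:
    assert f.read() == payload
print("binary round-trip OK")
```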

 

Has anyone got a code snippet that would work?

 

Thanks


Ben



Ben Dobbs

LinkedIn | Twitter | Blog

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!
1 ACCEPTED SOLUTION
bcdobbs
Super User

Thank you! That link gave me the clue I needed. I'm downloading data from SFTP using:

 

import os
import paramiko

# Main SFTP function
def download_files_from_sftp(hostname, port, username, password, local_directory, remote_directory):
    # Initialise the SSH client
    ssh_client = paramiko.SSHClient()
    ssh_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())

    try:
        # Connect to the server
        ssh_client.connect(hostname, port=port, username=username, password=password)

        # Initialise the SFTP client
        with ssh_client.open_sftp() as sftp:
            # List files in the specified remote directory
            file_list = sftp.listdir(remote_directory)
            for file in file_list:
                remote_filepath = f"{remote_directory}/{file}"
                local_filepath = os.path.join(local_directory, file)

                # Check that it is a regular file and not a directory
                if sftp.stat(remote_filepath).st_mode & 0o170000 == 0o100000:  # 0o100000 is the mask for a regular file
                    # Download the file; the lakehouse can be referenced as local storage,
                    # e.g. "/lakehouse/default/Files/FolderName/filename.csv"
                    sftp.get(remote_filepath, local_filepath)
                    print(f"Downloaded {file}")

    except Exception as e:
        print(f"Error: {e}")

    finally:
        # Close the SSH client
        ssh_client.close()

 

The missing part was correctly referencing the attached default lakehouse as if it were local storage.

By using a path in the format "/lakehouse/default/Files/FolderName/filename.csv", the problems I was having went away!
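As an aside, the `st_mode & 0o170000 == 0o100000` test in the snippet above is the raw POSIX check for a regular file; the standard library's `stat.S_ISREG` expresses the same check more readably, and it also works on paramiko's `SFTPAttributes.st_mode`. A quick local sketch (temporary file and directory only, no SFTP involved):

```python
import os
import stat
import tempfile

# Create a regular file and a directory to compare mode bits against.
tmpdir = tempfile.mkdtemp()
file_path = os.path.join(tmpdir, "regular.txt")
with open(file_path, "w") as f:
    f.write("hello")

for path in (file_path, tmpdir):
    mode = os.stat(path).st_mode
    raw_check = (mode & 0o170000) == 0o100000  # mask used in the SFTP snippet
    lib_check = stat.S_ISREG(mode)             # idiomatic stdlib equivalent
    assert raw_check == lib_check
    print(f"{path}: regular file = {lib_check}")
```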

 

Obviously, if you're using the code above, make sure you store the password etc. as secrets in Azure Key Vault and access them using:

sftp_password = mssparkutils.credentials.getSecret("https://keyvaultname.vault.azure.net/", "Secret-Name")

 





4 REPLIES

Hi @bcdobbs ,

Glad to know that your query got resolved. Please continue using the Fabric Community for your further queries.

v-gchenna-msft
Community Support

Hi @bcdobbs ,

Thanks for using Fabric Community.
Can you please check this similar thread: Solved: Re: Copy a file from an URL to OneLake - Microsoft Fabric Community.
I think you might get some ideas after looking into it.

Please let me know if you have further queries.

Hello @bcdobbs ,

We haven't heard from you since the last response and wanted to check whether you have found a resolution yet.
If you have, please share it with the community, as it can be helpful to others.
Otherwise, we will respond with more details and try to help.
