ManishSuryaG
Frequent Visitor

Microsoft Fabric Error handling and logging

Hello All,

I have started working on a Microsoft Fabric data engineering POC using notebooks and pipelines and am stuck on a few points. I am looking into the scenarios below but have not found much detail; if you have any references, please share.

 

1. When an error occurs in a PySpark notebook, what logs does the program output?
We need to design an error-handling rule and a log format.

 

2. Transaction Management -
Can PySpark manage a transaction scope?
E.g., we'd like to commit after the program loads Table A and Table B. If the program fails to load Table B, it should roll back Table A.

 

3. Log Management - Does MS Fabric have a log management function? Can we use Log Analytics?


4. Backup/restore function - Does MS Fabric have a backup/restore function? Can we use Synapse?

10 REPLIES
v-nikhilan-msft
Community Support

Hi @ManishSuryaG 
Thanks for using Fabric Community.

1) Please refer to this document for driver logs:
Apache Spark application detail monitoring - Microsoft Fabric | Microsoft Learn
print() output and exceptions go to the cell output and the driver log; log4j output goes to the driver log only.
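For illustration, a common PySpark pattern (not Fabric-specific) for writing to the driver's log4j log is to go through the JVM gateway. This sketch assumes a SparkSession named spark is already available, as it is in Fabric notebooks:

# Minimal sketch: write to the driver's log4j log from PySpark via the JVM
# gateway. These messages go to the driver log only, while print() goes to
# both the cell output and the driver log.
log4j = spark.sparkContext._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("MyNotebook")

logger.info("goes to the driver log only")
print("goes to the cell output and the driver log")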

2) For warehouse transactions you can refer to this:
Transactions in Warehouse tables - Microsoft Fabric | Microsoft Learn
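On the Spark side there is no multi-table transaction scope, so any rollback has to be compensating. As an illustration only (my sketch, not an official feature), here is one way to restore Table A to its pre-load Delta version if the load of Table B fails; it assumes both targets are Delta tables, that the runtime supports Delta's RESTORE command, and that df_a/df_b are prepared DataFrames with hypothetical table names:

from delta.tables import DeltaTable

# Record Table A's current Delta version so we can restore it on failure.
version_a = (DeltaTable.forName(spark, "TableA")
             .history(1)
             .select("version")
             .first()[0])

try:
    df_a.write.format("delta").mode("append").saveAsTable("TableA")
    df_b.write.format("delta").mode("append").saveAsTable("TableB")
except Exception:
    # Compensating "rollback": restore Table A to its pre-load version.
    spark.sql(f"RESTORE TABLE TableA TO VERSION AS OF {version_a}")
    raise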

3) Currently we don't support log management, and you cannot use Log Analytics.

4) What do you mean by backup/restore? Can you please explain this so that I can help you better?
Thanks.

Hope this helps. Please let me know if you have any further questions.

Thank you @v-nikhilan-msft for the updates. Below are some follow-up questions:

 

1. Can we log the PySpark notebook execution logs to tables?

 

2. What are the best practices for loading data from Bronze to Silver and Silver to Gold, and how do we manage the execution logs at each stage? Can we log at the table level if we use notebooks for execution?

 

3. Are transactions limited to the DW/Gold layer only?

 

Hope the queries are clear. Please let me know if anything needs clarification.

 

Thank you.

 

Hi @ManishSuryaG 

1) Currently we cannot log the PySpark notebook execution logs to tables.

2) For information regarding the Medallion architecture, you can refer to this link:
https://learn.microsoft.com/en-us/fabric/onelake/onelake-medallion-lakehouse-architecture

If the loading of the data happens within one Spark application, the logs will be in the driver log of that specific Spark application. We don't currently have a way to automatically emit logs to a table.
If you want to record that information in a table, it shouldn't go through the logging path; you could instead write it directly to a table using Spark DataFrames in a notebook.
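For illustration, a minimal sketch of that DataFrame approach (the table name, schema, and values are hypothetical, and the target is assumed to be a Delta lakehouse table):

from datetime import datetime, timezone

# Minimal sketch: append one run-log row to a lakehouse table from a notebook.
# The table name and schema are illustrative, not a Fabric convention.
log_row = [(
    "bronze_to_silver",          # stage
    "orders",                    # entity being loaded
    "SUCCEEDED",                 # outcome
    datetime.now(timezone.utc),  # timestamp
    "loaded 12345 rows",         # free-text message
)]

schema = "stage STRING, entity STRING, status STRING, logged_at TIMESTAMP, message STRING"
spark.createDataFrame(log_row, schema) \
    .write.format("delta").mode("append").saveAsTable("execution_log")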

3) Yes. Currently, transactions are only supported for the warehouse.

Hope this helps. Please let us know if you have any further questions.


Thank you @v-nikhilan-msft for the details.

 

But I am still looking for how to manage custom logs for each level of execution (e.g., in tables or files) to track the process, which can then be used for further analysis and reporting.

E.g., while loading data from Bronze to Silver, how can we manage execution logs and write them somewhere for further analysis, like in ADF where we can read the activity output and log it into database tables, etc.?
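For context, the kind of stage-level wrapper I have in mind would look roughly like this (an illustrative sketch only; run_stage, execution_log, and load_orders_silver are hypothetical names, and spark is the notebook session):

import time
from datetime import datetime, timezone

# Illustrative sketch: run a load stage and record its outcome to a log table,
# roughly mirroring what ADF activity-output logging provides.
def run_stage(stage_name, fn):
    started = datetime.now(timezone.utc)
    status, error = "SUCCEEDED", None
    t0 = time.monotonic()
    try:
        fn()
    except Exception as e:
        status, error = "FAILED", str(e)[:1000]
        raise
    finally:
        # Always append one row describing this stage run, pass or fail.
        row = [(stage_name, status, started, time.monotonic() - t0, error)]
        schema = ("stage STRING, status STRING, started_at TIMESTAMP, "
                  "duration_s DOUBLE, error STRING")
        spark.createDataFrame(row, schema) \
            .write.format("delta").mode("append").saveAsTable("execution_log")

# Usage: run_stage("bronze_to_silver_orders", lambda: load_orders_silver())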

 

Hope you understand the requirement.  

Hi @ManishSuryaG 
I understood the ask, but those options are unavailable at the moment, which is why I replied as I did.

Hi @ManishSuryaG 
We haven't heard from you since the last response and I just wanted to check whether you have a resolution yet. If not, please reply with more details and we will try to help.
Thanks


Hi @v-nikhilan-msft  - can you expand on "Print/exception goes to cell output and driver log"?

 

I have been using the following function to ensure messages are printed to the notebook cell output, since I have never been able to get logging statements (e.g. logging.info('This is a log')) to show up in the cell output.

 

import logging
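# Note: basicConfig() is a no-op if the root logger already has handlers,
# which is often the case in managed notebook environments; on Python 3.8+
# you can pass force=True to replace them.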
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def print_and_log(message, level='INFO'):
    """
    Print the specified message and log it with the specified level.

    Args:
        message (str): The message to be printed and logged.
        level (str, optional): The log level. Defaults to 'INFO'.
    """
    print(message)

    # Mapping log level strings to their corresponding logging functions
    log_levels = {
        'INFO': logging.info,
        'WARNING': logging.warning,
        'ERROR': logging.error,
        'CRITICAL': logging.critical
    }

    # Get the logging function based on the specified level
    log_func = log_levels.get(level.upper())

    if log_func:
        log_func(message)
    else:
        raise ValueError("Invalid log level specified")

 

Up until tonight, that function had only ever output the result of print(message) and had never displayed the log record with the format specified in logging.basicConfig (i.e., the output of log_func(message)).

 

For a short period tonight, the code above not only executed the print() statement twice (for some unknown reason) but also output the log statement to both the driver logs and the cell output. I unfortunately don't have any screenshots of the cell output, but I do have one of the driver logs, which showed the same behavior. You can see that the print statements were executed twice and that the log_func() output is present in the format DATETIME - INFO - MESSAGE.

[Screenshot: logging_screenshot.png - driver log showing duplicated print output and formatted log records]

 

I have not been able to reproduce this behavior, and I know the code did not change between previous runs and the run above because the code was contained in a .whl file and was not modified.

 

Any ideas how that happened? Is it possible now to see logging statements in the output of notebook cells? If not, is it in the roadmap?

Any info would be great - I'd love to be able to avoid print() statements.

Hi @jihool3670 
Can you please try this snippet? Choose the language as PySpark and type "snippet"; you will get a dropdown as follows:

[Screenshot: vnikhilanmsft_3-1706789614370.png - snippet dropdown in the notebook]

 


It will guide you to use a logger whose output shows in both the cell output and the driver log, like below:

[Screenshots: vnikhilanmsft_1-1706789510866.png, vnikhilanmsft_2-1706789563244.png - logger output in the cell output and the driver log]
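For reference, what such logging snippets generally boil down to is attaching a StreamHandler that writes to stdout, so the records surface in the cell output as well as the driver log. A minimal sketch of that idea (my approximation, not the verbatim Fabric snippet):

import logging
import sys

# Minimal sketch: send log records to stdout so they appear in the cell
# output as well as the driver log.
logger = logging.getLogger("MyNotebook")
logger.setLevel(logging.INFO)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
logger.addHandler(handler)

logger.info("This shows in the cell output and the driver log")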


Hope this helps. Please let me know if you have any further questions.

Oh wow awesome - I'll give it a shot. Thank you!
