Hi All,
I'm a beginner with Fabric. I'm trying to follow tutorials like https://microsoftlearning.github.io/mslearn-fabric/Instructions/Labs/03-delta-lake.html
By default my notebook is in PySpark, so I can run
df = spark.read.format("csv").option("header","true").load("Files/products.csv")
display(df)
as expected. I can also easily do all the basic R things right out of the box, which is amazing:
%%sparkr
print('hello world')
library(tidyverse)
data.frame(a = 1:3, b = letters[c(1,1,2)], c = Sys.Date() - 1:3) %>%
group_by(b) %>%
summarise(n())
But how do I load my file in the workspace into R?
%%sparkr
read_csv("Files/products.csv")
...returns "Error: 'Files/products.csv' does not exist in current working directory", so I'm guessing my R session has a different working directory than PySpark?
For bonus points, how do I load a delta table into R?
Thanks very much for any insights!
Hi @lk-u1248 ,
Thanks for using Fabric Community.
Unfortunately, I am unable to find any way to save a dataframe as a table using Spark R, even after searching everywhere on Google. It looks like we cannot load it with Spark R.
I suggest you use PySpark in order to load it into tables; you can use a combination of PySpark and Spark R.
Code Snippet -
df = spark.read.format("csv").option("header","true").load("Files/year/month/date/sales.csv")
# df now is a Spark DataFrame containing CSV data from "Files/year/month/date/sales.csv".
display(df)
df.write.format("delta").save("Tables/actual_weather")
The above code can be executed alongside your existing code, but make sure it is written in PySpark, not Spark R.
Post from Reddit -
He seemed to suggest "you can do all this in R" but then didn't know the specifics and said to use Python anyway.
Hope this is helpful. Please let me know in case of further queries.
Sorry, but your message amounts to "use PySpark" while the point of my question is how to use R. I'm afraid your response misses the point, but thank you for your time.
Hi @lk-u1248 ,
I apologize for the misunderstanding; here are a few examples using Spark R:
Lakehouse structure -
Example 1:
%%sparkr
# Load data into a SparkDataFrame from a table
# Method 1:
df <- tableToDF("gopi_lake_house.abc")
display(df)
# Method 2:
results <- sql("SELECT * FROM gopi_lake_house.abc LIMIT 1000")
head(results)
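Since you mentioned tidyverse: a SparkDataFrame is a distributed object, so tidyverse verbs won't work on it directly. One option (a sketch, reasonable only for small results that fit in driver memory) is to pull it down to an ordinary local R data.frame with `collect()`:

```r
# Run a query, then collect the distributed result into a
# local R data.frame so tidyverse functions can be used on it.
library(tidyverse)

sdf <- sql("SELECT * FROM gopi_lake_house.abc LIMIT 1000")
local_df <- collect(sdf)   # local_df is an ordinary R data.frame

local_df %>%
  count()
```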
Example 2:
%%sparkr
# Load data into a SparkDataFrame from a file
df <- loadDF(
  path = "Files/raw/Customer.csv",
  source = "csv",
  header = "true",
  inferSchema = "true"
)
display(df)
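For the bonus question (loading a Delta table): SparkR's generic `read.df()` accepts a source format, and passing `"delta"` should work on runtimes where Delta Lake is available, as it is in Fabric. A minimal sketch, where the `Tables/abc` path is illustrative and assumes a table named `abc` in the attached lakehouse:

```r
# Sketch: read a Delta table by its path in the lakehouse,
# passing "delta" as the source format. Path is illustrative.
df <- read.df(path = "Tables/abc", source = "delta")
display(df)
```

Alternatively, since lakehouse tables are registered in the metastore, the `tableToDF()` and `sql()` approaches from Example 1 also read Delta tables.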
Example 3:
%%sparkr
# Save data into a table from a SparkDataFrame
# New Table
tableName <- "gopi_lake_house.abcd"
data <- list(
  list(1L, "Raymond", "green", "apple"),
  list(2L, "Loretta", "purple", "grape"),
  list(3L, "Wayne", "yellow", "banana")
)
schema <- structType(
  structField("id", "integer"),
  structField("name", "string"),
  structField("color", "string"),
  structField("fruit", "string")
)
df <- createDataFrame(
  data = data,
  schema = schema
)
saveAsTable(
  df = df,
  tableName = tableName
)
# Verify that the table was successfully saved by
# displaying the table's contents.
display(sql(paste0("SELECT * FROM ", tableName)))
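As an alternative to `saveAsTable()`, SparkR's `write.df()` can write the same SparkDataFrame out as Delta files by path. A sketch continuing Example 3 (the output path is illustrative, not a name from the thread):

```r
# Alternative sketch: write the SparkDataFrame from Example 3
# as Delta files at a lakehouse path instead of a named table.
write.df(
  df,
  path = "Tables/abcd_files",
  source = "delta",
  mode = "overwrite"
)
```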
Docs to refer -
Tutorial: Work with SparkR SparkDataFrames on Azure Databricks - Azure Databricks | Microsoft Learn
Hope this is helpful. Please let me know in case of further queries.
Great, very helpful! Thanks a lot.
Hi @lk-u1248 ,
We haven't heard from you since the last response and were just checking back to see if your query was answered.
If not, please reply with more details and we will try to help.