Hi All,
I'm a beginner with Fabric. I'm trying to follow tutorials like https://microsoftlearning.github.io/mslearn-fabric/Instructions/Labs/03-delta-lake.html
By default my notebook is in PySpark, so I can run
df = spark.read.format("csv").option("header","true").load("Files/products.csv")
display(df)
as expected. I can also easily do all the basic R things right out of the box, which is amazing:
%%sparkr
print('hello world')
library(tidyverse)
data.frame(a = 1:3, b = letters[c(1,1,2)], c = Sys.Date() - 1:3) %>%
group_by(b) %>%
summarise(n())
But how do I load my file in the workspace into R?
%%sparkr
read_csv("Files/products.csv")
...returns "Error: 'Files/products.csv' does not exist in current working directory", so I'm guessing my R session has a different working directory than PySpark?
For bonus points, how do I load a delta table into R?
Thanks very much for any insights!
Hi @lk-u1248 ,
Thanks for using Fabric Community.
Unfortunately, I am unable to find any way to save a dataframe as a table using Spark R, even after searching everywhere on Google. It looks like we cannot load it with Spark R.
I suggest you use PySpark in order to load it into tables; you can use a combination of PySpark and Spark R.
Code Snippet -
df = spark.read.format("csv").option("header","true").load("Files/year/month/date/sales.csv")
# df now is a Spark DataFrame containing CSV data from "Files/year/month/date/sales.csv".
display(df)
df.write.format("delta").save("Tables/actual_weather")
The above code can be executed alongside your existing code, but make sure it is written in PySpark, not Spark R.
Post from Reddit -
He seemed to suggest "you can do all this in R" but then didn't know the specifics and said to use Python anyway.
Hope this is helpful. Please let me know in case of further queries.
Sorry, but your message amounts to "use PySpark" while the point of my question is how to use R. I'm afraid your response misses the point, but thank you for your time.
Hi @lk-u1248 ,
I apologize for the misunderstanding; here are a few examples using Spark R:
Lakehouse structure -
Example 1:
%%sparkr
# Load data into a SparkDataFrame from a table
# Method 1:
df <- tableToDF("gopi_lake_house.abc")
display(df)
# Method 2:
results <- sql("SELECT * FROM gopi_lake_house.abc LIMIT 1000")
head(results)
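Since you mentioned tidyverse: a SparkDataFrame is a distributed object, so tidyverse verbs won't work on it directly. One option (a sketch, reasonable only for small results that fit in driver memory) is to pull it down to an ordinary local R data.frame with `collect()`:

```r
# Run a query, then collect the distributed result into a
# local R data.frame so tidyverse functions can be used on it.
library(tidyverse)

sdf <- sql("SELECT * FROM gopi_lake_house.abc LIMIT 1000")
local_df <- collect(sdf)   # local_df is an ordinary R data.frame

local_df %>%
  count()
```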
Example 2:
%%sparkr
# Load data into a SparkDataFrame from a file
df <- loadDF(
  path = "Files/raw/Customer.csv",
  source = "csv",
  header = "true",
  inferSchema = "true"
)
display(df)
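For the bonus question (loading a Delta table): SparkR's generic `read.df()` accepts a source format, and passing `"delta"` should work on runtimes where Delta Lake is available, as it is in Fabric. A minimal sketch, where the `Tables/abc` path is illustrative and assumes a table named `abc` in the attached lakehouse:

```r
# Sketch: read a Delta table by its path in the lakehouse,
# passing "delta" as the source format. Path is illustrative.
df <- read.df(path = "Tables/abc", source = "delta")
display(df)
```

Alternatively, since lakehouse tables are registered in the metastore, the `tableToDF()` and `sql()` approaches from Example 1 also read Delta tables.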
Example 3:
%%sparkr
# Save data into a table from a SparkDataFrame
# New Table
tableName <- "gopi_lake_house.abcd"
data <- list(
  list(1L, "Raymond", "green", "apple"),
  list(2L, "Loretta", "purple", "grape"),
  list(3L, "Wayne", "yellow", "banana")
)
schema <- structType(
  structField("id", "integer"),
  structField("name", "string"),
  structField("color", "string"),
  structField("fruit", "string")
)
df <- createDataFrame(
  data = data,
  schema = schema
)
saveAsTable(
  df = df,
  tableName = tableName
)
# Verify that the table was successfully saved by
# displaying the table's contents.
display(sql(paste0("SELECT * FROM ", tableName)))
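As an alternative to `saveAsTable()`, SparkR's `write.df()` can write the same SparkDataFrame out as Delta files by path. A sketch continuing Example 3 (the output path is illustrative, not a name from the thread):

```r
# Alternative sketch: write the SparkDataFrame from Example 3
# as Delta files at a lakehouse path instead of a named table.
write.df(
  df,
  path = "Tables/abcd_files",
  source = "delta",
  mode = "overwrite"
)
```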
Docs to refer -
Tutorial: Work with SparkR SparkDataFrames on Azure Databricks - Azure Databricks | Microsoft Learn
Hope this is helpful. Please let me know in case of further queries.
Great, very helpful! Thanks a lot.
Hi @lk-u1248 ,
We haven't heard from you since the last response and were just checking back to see if your query was answered.
If not, please reply with more details and we will try to help.