Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
arpost
Advocate V
Advocate V

How do you delete specific rows from a Lakehouse table?

Greetings, all. I have some files loaded into a Lakehouse that I've also loaded into a Lakehouse table. I want to delete some of the rows in the Lakehouse table, but I can't seem to find a way to do that. How does one go about doing this?

 

 

Any help appreciated.

1 ACCEPTED SOLUTION
v-nikhilan-msft
Community Support
Community Support

Hi @arpost ,
Thanks for using Fabric Community.

It is possible to delete rows in a Lakehouse table using Spark-SQL in notebooks. I have created a repro of the scenario. I have attached the screenshots for your reference.
Trying to delete the row which has value of 46134

vnikhilanmsft_0-1696958603394.png

Query succeeded
vnikhilanmsft_1-1696958624474.png

After deletion of the row:
vnikhilanmsft_2-1696958682038.png


You can refer this link for more information: Link1 
Hope this helps. Do let us know if you have any further issues. Glad to help.

View solution in original post

9 REPLIES 9
YOUNG99
Frequent Visitor

@CalebCarter927 

Give the below a try.  I won't take credit for it.  It was given to me by a MSFT trainer when I proposed the question.

 

----------------------------------------------------------

%%pyspark

df = spark.sql("SELECT MAX(DataDate) MDate FROM Lakehouse.Table")

MDate = df.agg({"MDate":"max"}).collect()[0]['max(MDate)']
#print(MDate)

sqlStr = "DELETE From Lakehouse.Table WHERE DataDate < '"+str(MDate)+"'"
#print(sqlStr)

#dfDistinct = spark.sql("SELECT DISTINCT DataDate FROM Lakehouse.Table")
#display(dfDistinct)

dml = spark.sql(sqlStr)

Right, "<", ">" and "=" type of comparisons work.

Problem is only with the word "IN".

TimoRiikonen
Advocate III
Advocate III

It doesn't work in a programmatic manner.

If you have pre-defined solution, then you can make the SQL command.
But if you for example create a table of keys and try to delete those rows, that is not possible atm.

 

Couple of options:
- Delete everything and recreate all of it without the lines you wanted to delete.

- Use Power Query filtering in dataflow and make the dataflow to overwrite the data.

TimoRiikonen
Advocate III
Advocate III

I have the same problem: subqueries are not supported in delete.

I tried to create a temporary table and make a join, but that failed as well.

Since I have only two values, I can hard code my case for now.

Original deletion query:

DELETE FROM DE_LH_200_SILVER_Default.ServiceRequest WHERE updated NOT IN (select distinct DATE(updated) from DE_LH_100_BRONZE_Default.ServiceRequest)

Failed attempt with temp table and join:

CREATE TABLE deletecontent 
(
    todelete DATE
);
INSERT INTO deletecontent
    select distinct DATE(updated) from DE_LH_200_SILVER_Default.ServiceRequest WHERE updated NOT IN (select distinct DATE(updated) from DE_LH_100_BRONZE_Default.ServiceRequest);
SELECT * FROM deletecontent;
SELECT DISTINCT DATE(updated) FROM DE_LH_200_SILVER_Default.ServiceRequest INNER JOIN deletecontent ON deletecontent.todelete = DE_LH_200_SILVER_Default.ServiceRequest.updated;
DELETE FROM DE_LH_200_SILVER_Default.ServiceRequest SR INNER JOIN deletecontent ON deletecontent.todelete = DE_LH_200_SILVER_Default.ServiceRequest.updated;

 

 

 

arpost
Advocate V
Advocate V

Appreciate it, @v-nikhilan-msft! I was hoping there was a way to do it through a UI, but that works.

Hi @arpost ,
Glad that your query got resolved. Please continue using Fabric Community for help on your queries.

v-nikhilan-msft
Community Support
Community Support

Hi @arpost ,
Thanks for using Fabric Community.

It is possible to delete rows in a Lakehouse table using Spark-SQL in notebooks. I have created a repro of the scenario. I have attached the screenshots for your reference.
Trying to delete the row which has value of 46134

vnikhilanmsft_0-1696958603394.png

Query succeeded
vnikhilanmsft_1-1696958624474.png

After deletion of the row:
vnikhilanmsft_2-1696958682038.png


You can refer this link for more information: Link1 
Hope this helps. Do let us know if you have any further issues. Glad to help.

I'm new to SparkSQL.  Would anyone happen to know how to acheive the following SQL statement?  I get the error:  Error: Subqueries are not supported in the DELETE

 

%%SQL

DELETE FROM <Table>
WHERE [ImportDate] < (SELECT MAX([ImportDate]) FROM <Table>)

 

Thanks in advance

I'm trying to do something similar too. Did you find a solution? Thanks

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayFabricCarousel

Fabric Monthly Update - May 2024

Check out the May 2024 Fabric update to learn about new features.

Top Solution Authors