Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Ayesha_shah
Regular Visitor

Nlp data cleaning problem.

Hello to All,

kindly help me in following query .

I have a dataset i want to search over each  row and check meta data in each cell   and copy it to the root cell 3000 rows and 490 colums.

Captureo0.PNG

 the meta data mostly starts like (1) ,(2)... or a,b...,  (i),(ii),(iii)... or "gjk"in duble quets in above pictur second row has root cell 5.the  has meta data till the  6.in our

Thanx in advance

1 ACCEPTED SOLUTION

Hi @Ayesha_shah ,

According to your description, here's my solution.

1.This is the sample data.

vkalyjmsft_0-1640054110374.png

2.Select column B,C,D,E at the same time, then replace  value " with )

vkalyjmsft_1-1640054264049.png

Get this result.

vkalyjmsft_2-1640054292675.png

3.Select extract text after delimiter in the Transform tab.

vkalyjmsft_3-1640054370406.png

vkalyjmsft_6-1640054643554.png

Get this result.

vkalyjmsft_12-1640054916243.png

4.Seperately select column B,C,D,E, then select split column in the Transform tab

vkalyjmsft_18-1640055346472.png

 

vkalyjmsft_10-1640054804317.png

Get this result.

vkalyjmsft_13-1640055004675.png

5.Remove the null columns, get this result.

vkalyjmsft_14-1640055046446.png

6.Select column B,C,D,E at the same time, and then select merge columns in the Transform tab.

vkalyjmsft_15-1640055163482.png

vkalyjmsft_16-1640055261139.png

Get the expected result.

vkalyjmsft_17-1640055281191.png

This is the query code.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("Lc0xDsMgDAXQqyBPsZSl7SmydyMMyJjUKmApoJ6/QLK9768vWwub8RlWWB54qAYT9TQjPpE0SzmGX6gxCnE3uNXC+zO4CAbf/JSg1AuClXxrfHKYhx9KMT4lQ5xSnfOtFzsU5rDDZNXMN0t/c5OS0Lf24Nwf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [A = _t, B = _t, C = _t, D = _t, E = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"A", type text}, {"B", type text}, {"C", type text}, {"D", type text}, {"E", type text}}),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type","""",")",Replacer.ReplaceText,{"B", "C", "D", "E"}),
    #"Extracted Text After Delimiter" = Table.TransformColumns(#"Replaced Value", {{"B", each Text.AfterDelimiter(_, ")"), type text}, {"C", each Text.AfterDelimiter(_, ")"), type text}, {"E", each Text.AfterDelimiter(_, ")"), type text}, {"D", each Text.AfterDelimiter(_, ")"), type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Text After Delimiter", "B", Splitter.SplitTextByDelimiter(")", QuoteStyle.None), {"B.1", "B.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"B.1", type text}, {"B.2", type text}}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type1", "C", Splitter.SplitTextByDelimiter(")", QuoteStyle.Csv), {"C.1", "C.2"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"C.1", type text}, {"C.2", type text}}),
    #"Split Column by Delimiter2" = Table.SplitColumn(#"Changed Type2", "D", Splitter.SplitTextByDelimiter(")", QuoteStyle.Csv), {"D.1", "D.2"}),
    #"Changed Type3" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"D.1", type text}, {"D.2", type text}}),
    #"Split Column by Delimiter3" = Table.SplitColumn(#"Changed Type3", "E", Splitter.SplitTextByEachDelimiter({")"}, QuoteStyle.None, true), {"E.1", "E.2"}),
    #"Changed Type4" = Table.TransformColumnTypes(#"Split Column by Delimiter3",{{"E.1", type text}, {"E.2", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type4",{"B.2", "C.2", "D.2", "E.2"}),
    #"Merged Columns" = Table.CombineColumns(#"Removed Columns",{"A", "B.1", "C.1", "D.1", "E.1"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Merged")
in
    #"Merged Columns"

I attach my sample below for reference.

Best Regards,
Community Support Team _ kalyj

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

 

 

 

 

 

 

View solution in original post

3 REPLIES 3
v-yanjiang-msft
Community Support
Community Support

Hi @Ayesha_shah ,

I'm sorry for that I'm not clear about your purpose"search over each row and check meta data in each cell".

Could you please give an example and explain it more?

 

Best Regards,
Community Support Team _ kalyj

 

Yes search over each row and meta data starts with small brackets(1).. or (i).. or in double quets "asdn" i want to merge over single row with  root data  

in one row .This is solution i need

Capturejhc.PNG

 while below is problem

Capturehj,.PNG

Hi @Ayesha_shah ,

According to your description, here's my solution.

1.This is the sample data.

vkalyjmsft_0-1640054110374.png

2.Select column B,C,D,E at the same time, then replace  value " with )

vkalyjmsft_1-1640054264049.png

Get this result.

vkalyjmsft_2-1640054292675.png

3.Select extract text after delimiter in the Transform tab.

vkalyjmsft_3-1640054370406.png

vkalyjmsft_6-1640054643554.png

Get this result.

vkalyjmsft_12-1640054916243.png

4.Seperately select column B,C,D,E, then select split column in the Transform tab

vkalyjmsft_18-1640055346472.png

 

vkalyjmsft_10-1640054804317.png

Get this result.

vkalyjmsft_13-1640055004675.png

5.Remove the null columns, get this result.

vkalyjmsft_14-1640055046446.png

6.Select column B,C,D,E at the same time, and then select merge columns in the Transform tab.

vkalyjmsft_15-1640055163482.png

vkalyjmsft_16-1640055261139.png

Get the expected result.

vkalyjmsft_17-1640055281191.png

This is the query code.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("Lc0xDsMgDAXQqyBPsZSl7SmydyMMyJjUKmApoJ6/QLK9768vWwub8RlWWB54qAYT9TQjPpE0SzmGX6gxCnE3uNXC+zO4CAbf/JSg1AuClXxrfHKYhx9KMT4lQ5xSnfOtFzsU5rDDZNXMN0t/c5OS0Lf24Nwf", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [A = _t, B = _t, C = _t, D = _t, E = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"A", type text}, {"B", type text}, {"C", type text}, {"D", type text}, {"E", type text}}),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type","""",")",Replacer.ReplaceText,{"B", "C", "D", "E"}),
    #"Extracted Text After Delimiter" = Table.TransformColumns(#"Replaced Value", {{"B", each Text.AfterDelimiter(_, ")"), type text}, {"C", each Text.AfterDelimiter(_, ")"), type text}, {"E", each Text.AfterDelimiter(_, ")"), type text}, {"D", each Text.AfterDelimiter(_, ")"), type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Text After Delimiter", "B", Splitter.SplitTextByDelimiter(")", QuoteStyle.None), {"B.1", "B.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"B.1", type text}, {"B.2", type text}}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type1", "C", Splitter.SplitTextByDelimiter(")", QuoteStyle.Csv), {"C.1", "C.2"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"C.1", type text}, {"C.2", type text}}),
    #"Split Column by Delimiter2" = Table.SplitColumn(#"Changed Type2", "D", Splitter.SplitTextByDelimiter(")", QuoteStyle.Csv), {"D.1", "D.2"}),
    #"Changed Type3" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"D.1", type text}, {"D.2", type text}}),
    #"Split Column by Delimiter3" = Table.SplitColumn(#"Changed Type3", "E", Splitter.SplitTextByEachDelimiter({")"}, QuoteStyle.None, true), {"E.1", "E.2"}),
    #"Changed Type4" = Table.TransformColumnTypes(#"Split Column by Delimiter3",{{"E.1", type text}, {"E.2", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type4",{"B.2", "C.2", "D.2", "E.2"}),
    #"Merged Columns" = Table.CombineColumns(#"Removed Columns",{"A", "B.1", "C.1", "D.1", "E.1"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Merged")
in
    #"Merged Columns"

I attach my sample below for reference.

Best Regards,
Community Support Team _ kalyj

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

 

 

 

 

 

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.