Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Shimi43
Frequent Visitor

Splitting Text By Delimiter with Unpredictable Ending Delimiter

I'm rather new to Power BI. I'm working on a project in which I'm trying to split paragraphs of words into sections. The paragraphs are listed in a table   I have the titles of all the desiered sections and can put them in a list, and have found a mass replacement that allowed me to standarize all of the section names in the paragraphs as shown below.  

 

The problem is the order of section names is inconsistant. Ideally I would just put a list and it would end there

 

=Table.AddColumn(#"20. Mass Replace Section Names", "CAP Information", each Text.BetweenDelimiters([Data], "CAP", {"FRUITING_BODY", "LATEX","GILLS", "STALK", "VEIL", "SPORE_PRINT", "HABITAT", "EDIBILITY", "COMMENTS"}, type text)

 

This would just grab the information after the word CAP and stop grabbing info after running into one of the other section names. But this doesn't work and I haven't been able to find or create a function that can do this. Can someone smarter than me help me out?  Thank you in advance.

 

 

 

If you want examples of entries:

 

CAP 4-12 cm broad, convex with a central depression and strongly inrolled margin when young, shallowly depressed in age; surface viscid when moist, yellowish-buff to pinkish- buff to whitish, the center usually pinkish to pinkish-orange when fresh, or sometimes pale pinkish to pinkish-orange throughout; margin bearded with a dense white mat of soft, woolly hairs (while inrolled) that may mimic a veil; hairs sparse orevenabsentin age. Flesh thick, firm, brittle, white or tinged pinkish; taste very acrid. LATEX often scanty, white and typically unchanging (but staining gills yellowish in var. nordmanensis). GILLS white to yellowish-tan or developing a pinkish tinge, crowded, narrow, adnate to slightly decurrent. STALK 2-7 cm long, 0.5-1.5 cm thick, rigid, equal or with a narrowed base, firm, dry, often hollow in age; colored like cap or paler, sometimes with dingy ochre spots. SPORE_PRINT creamy-white; spores 7-10 x 6-8 microns, elliptical, with amyloid ridges.

 

And other one:

 

FRUITING_BODY 1-4.5 cm broad, obtusely bell-shaped or convex, expanding somewhat in age but usually retaining a broad umbo; surface dry, densely and coarsely scaly or fibrillose-scaly, the scales often erect; usually more fibrillose toward margin; brownish-purple to purplish- gray or lilac-gray, often darker in age. Flesh colored like cap; odor strongly and persistently fruity-fragrant (like grape soda or grape juice). GILLS adnate or notched, fairly well- spaced, colored more or less like cap. STALK 1-3.5 cmlong,(2)3-10(15)mmthick,arising from a swollen, sometimes hollow,juglike underground ""tuber" 1-2.5 cm high and up to 2 cm thick; colored more or less like cap and covered with conspicuous scales like those on the cap, except for the smooth, sometimes silky apex and yellowish to buff-colored "tuber"; hollow or partially hollow in age. VEIL not forming a distinct ring on stalk. SPORE_PRINT pinkish; spores 6.5-9 )( 4 microns, elliptical, smooth, not amyloid.

 

and a third example:

 

CAP 1-3.5 (5) cm broad, convex or centrally depressed with an in curved margin, be- coming plane or centrally depressed; surface viscid when moist but soon dry and often shiny, smooth, dark grayish-brown or olive-brown to blackish-brown, fading to gray or paler as it dries. Flesh thin, white to grayish. GILLS adnate to slightly decurrent, close, white to pale grayish (usually paler than cap). STALK2-6 cm long, 2-5 (6) mmthick,more or less equal, smooth, colored more or less like cap or slightly paler, but not fading as quickly. SPORE_PRINT white; spores 4.5-6.5x 3.5-4.5 microns, broadly elliptical to nearly round, smooth or very minutely ornamented, amyloid.

 

Shimi43_0-1659723649339.png

 

1 ACCEPTED SOLUTION
lbendlin
Super User
Super User

How do you identify section names?  Do you have a finite list of these, or are they distinguished by being uppercase?

 

What do you expect as a result for the second example?

 

Here is a possible implementation. First create a table "Sections" with all possible sections.

 

let
    Source = {"CAP", "FRUITING_BODY", "LATEX","GILLS", "STALK", "VEIL", "SPORE_PRINT", "HABITAT", "EDIBILITY", "COMMENTS"},
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "Section"}})
in
    #"Renamed Columns"

 

then use that table in your query.

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fVZdb9s2FP0rF36yAcmo3STt4KdsS4tgQVuk2bAhKQpaoiTWFKmSlB39+51LSZbddXkxLJu8H+eec64eH2e/XX+ii3S1pqymrbMiTyizZi+f6aBCRYIyaYITmnLZOOm9soaEyckHZ02pO1LGWa1lTrVwpTJ0qKShzramTMhXQmt7wKnhNo7hiCjlhnzrCpFJ2iufqby/VlvlQ0Kd5FvKV+m2LQoKlhpldvxM4w+HSgX8kFCoZCxROmp9i3TdePj0nnXClLJPUqAQXLSOvK1lULX01AgtX7gXKmfbsrJt2IxtbqVwOdoZUMql8TJWJXEikC0QvUAvB2u5pkoo52mOA0g0QrZAYJytRUe1qlWGOHup9GY47RvhENQ6uZdGbD267MFb0jstudBKZbuECuXqBMNTIWiZDEWgPZwuUeHQy4aC8PhjL11HInMqX9Ld9cPN36g1ABafCRO68TqPOHSNyiKirckqAIF4NN+2AcMXyvBTqbT207x4uHvhlmSsy2thAIryiyW9v727+zxEBrrTfIMwXGmODrVtOKKYxsDlg43OHgB0QkY4fE1I5Eb0cbxWZRUiu7LWOeCzpM8P13d/0Dp9w4TWlln4anmZrpaX/MOAmFOlQkT5HYTh/MMQ+wyAbCu8HHHNHUDpIaos131kcGY1ZpOTVjtwUDQciYnkkhNmxdA5WunIZpWTmKoNHnV++nh/8/XT/e2HB7QoRd2lEZ8NHwBD6U26ekXPdJW+BTkAgvEoWGvVBB5KMtRcd9pCPZhmKf1y9iV5nL27//P24fbD+6+/fvz9H1qlF33rg7jtNrReArMtgqUQaIMOUHgveqR4bjB7ngT3cGB+9v0SD36UmJMjA0QfmNp6aydRR9CiJnCYuZRZ5jIeQDN8ImGhwFjg6WUaf+ulzF/RfI+3dDKD4sakNXA5uQYGHKDBQZAbruNgmFRN6xodCRK/Rd8onYhZtdIiS/lpHGou3A7mca6sH0e7IZuzYYymxy010oHdCMEMLFyrQpcWTiC2CTSPN/G9QU82F5y7f/rWqkweNTGQGf8aG7KKeV5A/Ih44PmwB2QyenJfUMSA+4CZHqsbWb9KX8dZR9rP14vXoNB8dbmo6573winPMyucrTE4f2AXMqds7RmefGvLGLs1uXQlvA/tPs2eZqHdSvc0Q6J1T6oKAoxgtA3jvT5qbPNyxQMn4EWjhYJ/HnbT2taPJIiHQ8WjxtKJRi8aZmgmm0AFOxwzprY2VKdNeKV3GFGDHcZpJndChbw/0rG0Y0ObUdpRwS6oyLczuS/pr5vbO54SZ6576ueYvzJZgP7wjCJhjHp3Lu6j/Q66voIb/UKLOV38XNdjP5xpkHeva97U/YTnl4v/rmvWcL+qz9ZtbxOGu4BH7o97GgtDprgbO2k0vPp/IrywqaMleIu+c9drYlgllTLd1AkrLAowbnRWKaeyWu3l8MiDgS53xwOsgmhC+GeUbrRWEp5UQD4Fu5vWoEmm7TJk+kFhP10XAI+d5ORyfBEYItD8+EYRU2NZG6bgYtDbOr2alsw6xVyuFjRq7Yz2cdFMgLwsDfaZsdJhnTDOkXk9KMDgO1xkp7tzqp0vEPh+CrI9EygTl8CRbpE4iD7Rjls3eKNha2e1j7VyMfGFATRpg4zObUTN71s4M9Hzy78=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Entry = _t]),
    #"Added Custom" = Table.AddColumn(Source, "Next Section after CAP", (k)=> Table.Min(Table.AddColumn(Sections,"Found",each Text.Length(Text.BetweenDelimiters(k[Entry],"CAP",[Section]))),"Found")[Section]),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Result", each Text.BetweenDelimiters([Entry],"CAP",[Next Section after CAP]))
in
    #"Added Custom1"

 

 

How to use this code: Create a new Blank Query. Click on "Advanced Editor". Replace the code in the window with the code provided here. Click "Done".

 

View solution in original post

2 REPLIES 2
Shimi43
Frequent Visitor

Thank you so much for your help! This solves my problem!

 

But to answer your question there is going to be a column for each of the major sections: "CAP", "FRUITING_BODY", "LATEX","GILLS", "STALK", "VEIL", "SPORE_PRINT", "HABITAT", "EDIBILITY", "COMMENTS", etc.

 

Part of the problem in seperating it by going to capalization was it would put the information for the FRUITING_BODY under the CAP column if there wasn't a CAP section, or if LATEX and GILLS where in a different order from one entry to another, it would swap which column it would put it under.

 

But now that we can do it by a specific column (CAP in this example) I'm going to try to find a way to loop it cover all the sections so each section information is in it's own column. 

 

Again thank you so much for your help.

lbendlin
Super User
Super User

How do you identify section names?  Do you have a finite list of these, or are they distinguished by being uppercase?

 

What do you expect as a result for the second example?

 

Here is a possible implementation. First create a table "Sections" with all possible sections.

 

let
    Source = {"CAP", "FRUITING_BODY", "LATEX","GILLS", "STALK", "VEIL", "SPORE_PRINT", "HABITAT", "EDIBILITY", "COMMENTS"},
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "Section"}})
in
    #"Renamed Columns"

 

then use that table in your query.

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fVZdb9s2FP0rF36yAcmo3STt4KdsS4tgQVuk2bAhKQpaoiTWFKmSlB39+51LSZbddXkxLJu8H+eec64eH2e/XX+ii3S1pqymrbMiTyizZi+f6aBCRYIyaYITmnLZOOm9soaEyckHZ02pO1LGWa1lTrVwpTJ0qKShzramTMhXQmt7wKnhNo7hiCjlhnzrCpFJ2iufqby/VlvlQ0Kd5FvKV+m2LQoKlhpldvxM4w+HSgX8kFCoZCxROmp9i3TdePj0nnXClLJPUqAQXLSOvK1lULX01AgtX7gXKmfbsrJt2IxtbqVwOdoZUMql8TJWJXEikC0QvUAvB2u5pkoo52mOA0g0QrZAYJytRUe1qlWGOHup9GY47RvhENQ6uZdGbD267MFb0jstudBKZbuECuXqBMNTIWiZDEWgPZwuUeHQy4aC8PhjL11HInMqX9Ld9cPN36g1ABafCRO68TqPOHSNyiKirckqAIF4NN+2AcMXyvBTqbT207x4uHvhlmSsy2thAIryiyW9v727+zxEBrrTfIMwXGmODrVtOKKYxsDlg43OHgB0QkY4fE1I5Eb0cbxWZRUiu7LWOeCzpM8P13d/0Dp9w4TWlln4anmZrpaX/MOAmFOlQkT5HYTh/MMQ+wyAbCu8HHHNHUDpIaos131kcGY1ZpOTVjtwUDQciYnkkhNmxdA5WunIZpWTmKoNHnV++nh/8/XT/e2HB7QoRd2lEZ8NHwBD6U26ekXPdJW+BTkAgvEoWGvVBB5KMtRcd9pCPZhmKf1y9iV5nL27//P24fbD+6+/fvz9H1qlF33rg7jtNrReArMtgqUQaIMOUHgveqR4bjB7ngT3cGB+9v0SD36UmJMjA0QfmNp6aydRR9CiJnCYuZRZ5jIeQDN8ImGhwFjg6WUaf+ulzF/RfI+3dDKD4sakNXA5uQYGHKDBQZAbruNgmFRN6xodCRK/Rd8onYhZtdIiS/lpHGou3A7mca6sH0e7IZuzYYymxy010oHdCMEMLFyrQpcWTiC2CTSPN/G9QU82F5y7f/rWqkweNTGQGf8aG7KKeV5A/Ih44PmwB2QyenJfUMSA+4CZHqsbWb9KX8dZR9rP14vXoNB8dbmo6573winPMyucrTE4f2AXMqds7RmefGvLGLs1uXQlvA/tPs2eZqHdSvc0Q6J1T6oKAoxgtA3jvT5qbPNyxQMn4EWjhYJ/HnbT2taPJIiHQ8WjxtKJRi8aZmgmm0AFOxwzprY2VKdNeKV3GFGDHcZpJndChbw/0rG0Y0ObUdpRwS6oyLczuS/pr5vbO54SZ6576ueYvzJZgP7wjCJhjHp3Lu6j/Q66voIb/UKLOV38XNdjP5xpkHeva97U/YTnl4v/rmvWcL+qz9ZtbxOGu4BH7o97GgtDprgbO2k0vPp/IrywqaMleIu+c9drYlgllTLd1AkrLAowbnRWKaeyWu3l8MiDgS53xwOsgmhC+GeUbrRWEp5UQD4Fu5vWoEmm7TJk+kFhP10XAI+d5ORyfBEYItD8+EYRU2NZG6bgYtDbOr2alsw6xVyuFjRq7Yz2cdFMgLwsDfaZsdJhnTDOkXk9KMDgO1xkp7tzqp0vEPh+CrI9EygTl8CRbpE4iD7Rjls3eKNha2e1j7VyMfGFATRpg4zObUTN71s4M9Hzy78=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Entry = _t]),
    #"Added Custom" = Table.AddColumn(Source, "Next Section after CAP", (k)=> Table.Min(Table.AddColumn(Sections,"Found",each Text.Length(Text.BetweenDelimiters(k[Entry],"CAP",[Section]))),"Found")[Section]),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Result", each Text.BetweenDelimiters([Entry],"CAP",[Next Section after CAP]))
in
    #"Added Custom1"

 

 

How to use this code: Create a new Blank Query. Click on "Advanced Editor". Replace the code in the window with the code provided here. Click "Done".

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.