Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Lori_C
Frequent Visitor

Split text string column values into multiple rows at a position relative to a delimiter

Hello, I need to split the values in the Notes column (see BEFORE) into a separate row for each note (see AFTER).  A new note begins when a timestamp occurs in the text string.  The timestamp follows this pattern and is always MDT: MM-DD-YYYY HH:MM AM (or PM) MDT.  The number of individual notes in each column value varies between 1 and 70.  The idea I have come up with, although unsuccessful in implementing, is to identify the position of each MDT occurrence in the the row and split 23 positions *before* that occurrence, which puts the date at the beginning of the new row and the note after it.  I could then extract the date into a separate column and change the data type so it's a true datetime stamp.  This is only my thought process though and I am open to any and all creative ideas that get me to the result I need in the AFTER example!  Note, using the BR tags as a delimiter is not an option with this source data.  Thank you in advance for your help and ideas!

 

BEFORE

  
Order_IDItem_IDNotes
64272217995869307-03-2018 7:52 AM MDT STATUS SAVED AS COMPLETE (NOT PERFORMED) BY KATHLEEN DOE FROM COMPANY1, LLC.<BR/>07-02-2018 12:53 PM MDT STATUS SAVED AS READY (NOT PERFORMED) BY ANN ROSE FROM COMPANY1, LLC.<BR/>07-02-2018 12:53 PM MDT VENDOR CHANGED TO: NOT PERFORMED<BR/>07-02-2018 9:50 AM MDT VIOLET GRAY ASSIGNED THIS ITEM TO ANN ROSE.<BR/>07-02-2018 9:38 AM MDT ROBERT SMITH ASSIGNED THIS ITEM TO VIOLET GRAY.<BR/>07-02-2018 5:09 AM MDT ITEM CREATED BY EDWARD JONES
64577118002872607-05-2018 10:56 AM MDT STATUS SAVED AS COMPLETE (YES) BY JUSTIN TIME FROM COMPANY1, LLC.<BR/>07-03-2018 1:06 PM MDT STATUS SAVED AS READY (YES) BY DOROTHY TOTO FROM COMPANY1, LLC.<BR/>07-03-2018 1:06 PM MDT DOROTHY TOTO: <BR>EMPLOYMENT VERIFICATION PROVIDED PAM N., HUMAN RESOURCES REP&NBSP;<BR>07-03-2018 7:30 AM MDT OSCAR VILAS ASSIGNED THIS ITEM TO DOROTHY TOTO.<BR/>07-03-2018 7:20 AM MDT VIOLET GRAY ASSIGNED THIS ITEM TO OSCAR VILAS.<BR/>07-02-2018 8:45 PM MDT JOHN DOE ASSIGNED THIS ITEM TO VIOLET GRAY.<BR/>07-02-2018 3:32 PM MDT DINO SAUR ASSIGNED THIS ITEM TO JOHN DOE.<BR/>07-02-2018 1:49 PM MDT JOHN DOE ASSIGNED THIS ITEM TO DINO SAUR.<BR/>07-02-2018 1:31 PM MDT JULIA CHILDS ASSIGNED THIS ITEM TO JOHN DOE.<BR/>07-02-2018 1:31 PM MDT STATUS SAVED AS PENDING (PENDING) BY JULIA CHILDS FROM COMPANY1, LLC.<BR/>07-02-2018 1:31 PM MDT VENDOR CHANGED TO: ATTEMPT #1 - PHONE ATTEMPT<BR/>07-02-2018 1:30 PM MDT JULIA CHILDS: <BR>CALL FOR PROCESS

 

 

AFTER   
Order_IDItem_IDDatetime stampNote
64272217995869307-03-2018 7:52 AM STATUS SAVED AS COMPLETE (NOT PERFORMED) BY KATHLEEN DOE FROM COMPANY1, LLC.<BR/>
64272217995869307-02-2018 12:53 PMSTATUS SAVED AS READY (NOT PERFORMED) BY ANN ROSE FROM COMPANY1, LLC.<BR/>
64272217995869307-02-2018 12:53 PMVENDOR CHANGED TO: NOT PERFORMED<BR/>
64272217995869307-02-2018 9:50 AM VIOLET GRAY ASSIGNED THIS ITEM TO ANN ROSE.<BR/>
64272217995869307-02-2018 9:38 AM ROBERT SMITH ASSIGNED THIS ITEM TO VIOLET GRAY.<BR/>
64272217995869307-02-2018 5:09 AM ITEM CREATED BY EDWARD JONES
64577118002872607-05-2018 10:56 AMSTATUS SAVED AS COMPLETE (YES) BY JUSTIN TIME FROM COMPANY1, LLC.<BR/>
64577118002872607-03-2018 1:06 PM STATUS SAVED AS READY (YES) BY DOROTHY TOTO FROM COMPANY1, LLC.<BR/>
64577118002872607-03-2018 1:06 PM DOROTHY TOTO: <BR>EMPLOYMENT VERIFICATION PROVIDED PAM N., HUMAN RESOURCES REP&NBSP;<BR>
64577118002872607-03-2018 7:30 AM OSCAR VILAS ASSIGNED THIS ITEM TO DOROTHY TOTO.<BR/>
64577118002872607-03-2018 7:20 AM VIOLET GRAY ASSIGNED THIS ITEM TO OSCAR VILAS.<BR/>
64577118002872607-02-2018 8:45 PM JOHN DOE ASSIGNED THIS ITEM TO VIOLET GRAY.<BR/>
64577118002872607-02-2018 3:32 PM DINO SAUR ASSIGNED THIS ITEM TO JOHN DOE.<BR/>
64577118002872607-02-2018 1:49 PM JOHN DOE ASSIGNED THIS ITEM TO DINO SAUR.<BR/>
64577118002872607-02-2018 1:31 PM JULIA CHILDS ASSIGNED THIS ITEM TO JOHN DOE.<BR/>
64577118002872607-02-2018 1:31 PM STATUS SAVED AS PENDING (PENDING) BY JULIA CHILDS FROM COMPANY1, LLC.<BR/>
64577118002872607-02-2018 1:31 PM VENDOR CHANGED TO: ATTEMPT #1 - PHONE ATTEMPT<BR/>
64577118002872607-02-2018 1:30 PM JULIA CHILDS: <BR>CALL FOR PROCESS
4 REPLIES 4
sanddie
New Member

I have a similar question, but I have two columns I need to split into their own rows whenever the date/time stamp appears since each starts its own note.  I'm assuming I'd have to do the first column and then the second?  Also, I'm not clear on the instructions provided below, as it doesn't seem to be working to separate my data.  Any help would be appreciated.    

v-danhe-msft
Employee
Employee

Hi @Lori_C,

Based on my test, you can refer to below steps:

1.Use the “Split Column by Delimiter” function to different notes.

A.PNG

2.Unpivot the columns, except the [Order_ID] and the [Item_ID].

B.PNG

3.Remove the [Attribute] column and use the “Split Column by Delimiter” function again to achieve the correct result.

C.PNG

You can also download the PBIX file to have a view.

https://www.dropbox.com/s/wrut5tvgy8xfqc3/Split%20text%20string%20column%20values%20into%20multiple%...

 

Regards,

Daniel He

Community Support Team _ Daniel He
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi Greg and Daniel, thank you both for the suggestions!  Unfortunately they don't work** because of inconsistent break tags in the source data.  The actual data is much uglier than the sanitzed example I posted (shocking, right?).  Each string is littered with breaks (<BR>, <BR/>, <BR>) based on operator entry styles which vary, and multiple operators appear in the same string.  I've tried splitting using the various break tags separately and in different ordered combinations but the results aren't clean enough; also replacing the various tags with a single delimiter but I get too many splits where I don't want them.

 

**Greg I should add, I haven't gotten your code to work - I'm brand new to M and haven't figured out how to replace the posted table with my actual column name - but going through it conceptually and seeing Splitter.SplitTextByDelimiter("<BR/>" makes me think yours relies on the break tag as well.  Please correct me if I'm wrong about that!

 

I keep coming back to my "true" delimiter being the datetime stamp because it's application-generated and always appears at the beginning of each manual entry.  My SAS brain tell me to get the position of every instance of MDT and then count back 19 or 20 characters (which points to the beginning of the date) and say "split here"!  Or to split at the MM of anything formatted like MM-DD-YYYY HH:MM AM (or PM) MDT.  I just can't figure out how to tell this to Get-and-Transform.  Any thoughts?  Thanks much!       

Greg_Deckler
Super User
Super User

See if this works:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("pVTbcpswFPyVM+5LOmOnQpib8iSDbOQgiZGEM0yap04/of/fgw0UZzBxmjcNg3ZXu3vO6+sq3tKE0tV6FSRZFqVxFuKZJBsSbigJUkhYRIErUIUH57lvHDh+EgVwB7lRdSW8gAdtPNTC7o1VovgOuxaeuS8rITQURsDeGnX+m+s2WENV5Y8//xAS/trZH+fD746RXhgDyqIQ6nlKK3jRzvFxrcEa9xWuk9CFsZCXXB+QzRsGVzy3YTIWkcGkkzToCRwsR1HOyYPusErpQHqhEHWUuqArY2E6AFqzExadUNKXNxAnnAugESPZAHq+maObHrHQPlG8cFvA0WjhVm/rrhdRkgRdL1JCaJrQuO9F1DtHWBR/XIxWuHM8x8Z5qcFLdW9Cff8CRuIPyjBwYHjGly0agpb8N8kUhcFw53JF4LNMq4TuymLlXubcS6OhtuYkCxRVoyH6cQ1loziGLJxpbC46pXWHQGO9c/XTNejVsIVjj4zLucVkK3zofOpTpQuPSxj9RDkntAtVStk2Ggw7mvIy5V/pZshCOiYgtcGQG3sDcSBcmmu2ze7UN7ItwoXBCNdUkuOSkFVxK5i7BP5DfF/rGveQ1Ad46A/9AE1oP7fjJlQzK457VF17+BbABuoSF8DwaRGSzPnxfl5yXlWAy7MbEBwD3CxvfwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Order_ID = _t, Item_ID = _t, Notes = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Order_ID", Int64.Type}, {"Item_ID", Int64.Type}, {"Notes", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Notes", Splitter.SplitTextByDelimiter("<BR/>", QuoteStyle.Csv), {"Notes.1", "Notes.2", "Notes.3", "Notes.4", "Notes.5", "Notes.6", "Notes.7", "Notes.8", "Notes.9", "Notes.10", "Notes.11"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Notes.1", type text}, {"Notes.2", type text}, {"Notes.3", type text}, {"Notes.4", type text}, {"Notes.5", type text}, {"Notes.6", type text}, {"Notes.7", type text}, {"Notes.8", type text}, {"Notes.9", type text}, {"Notes.10", type text}, {"Notes.11", type text}}),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type1", {"Order_ID", "Item_ID"}, "Attribute", "Value"),
    #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Columns",{"Attribute"}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Removed Columns", "Value", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Value.1", "Value.2"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Value.1", type date}, {"Value.2", type text}}),
    #"Split Column by Delimiter2" = Table.SplitColumn(#"Changed Type2", "Value.2", Splitter.SplitTextByEachDelimiter({"MDT"}, QuoteStyle.Csv, false), {"Value.2.1", "Value.2.2"}),
    #"Changed Type3" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Value.2.1", type time}, {"Value.2.2", type text}})
in
    #"Changed Type3"

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.