Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
charleshale
Responsive Resident
Responsive Resident

Power Query Advanced Text Extraction

Hi.  I frequently work with a ~5m row table of comma-delimited email lists that, by subscriber, show what email list various newsletter subscribers sign up for.   The format of the lists is [US or BNA]_[StateAbbreviation]_[TownName]_[UniqueID].  The data isnt terribly clearn -- it looks like this:

 

LIST

US_MA_Wayland_19, BNA_IL_JOLIET_1251,CAL_IL_JOLIET_1251
BNA_CA_SANMATEO_1160,US_CA_SANMATEO_1160, US_NY_Manhattan_1340
BNA_CA_MISSIONVIEJO_1081,US_CA_MISSIONVIEJO_1081,APP_USERS,CAL_CA_MISSIONVIEJO_1081
BNA_NY_LONGISLAND_10985,US_NY_LONGISLAND_10985,APP_USERS,CAL_NY_LONGISLAND_10985

 

My goal is to exact the unique UniqueIDs in order, separated by commas so it looks like this:

NEW COLUMN

19, 1251
1160, 1340
1081

10985

 

Usually, I accomplish this exercise through tedious flattening of the table, extracting, and re-grouping.   However, the past few days I have been trying to use PowerQuery to simply extract the text, which I am able to do using Text.Select and Null.  However, I am not able to get the delimiter and deduping in the result.    Any advice?

 

 

1 ACCEPTED SOLUTION
v-lili6-msft
Community Support
Community Support

hi, @charleshale

After my test, you could try this code in Advanced editor

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("bY67CsJAEEV/JWy9RUaNJOUYF9mwj+AkioQwLFhYSKo0/r3rA1GT9sy9Z27XiZbYIh/D7RqGM0Mhk41D1oYrb7RqGBYZyBLNHxK97MQjWSITOouN8gywTmUUTlgSoTuxDcMljGMYGJar9FthNZH27qBVFStpDm/NlGNdc0tqT89Rc5GPN3403u00GXTbeCryTL6GTPCvdSYh+v4O", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [List = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"List", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Text.Select([List], {"0".."9",","})),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Custom", {{"Custom", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Custom"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Custom", Int64.Type}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type1", each ([Custom] <> null)),
    #"Removed Duplicates" = Table.Distinct(#"Filtered Rows", {"Custom"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Removed Duplicates",{{"Custom", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type2", {"List"}, {{"Result", each Text.Combine([Custom], ", "), type text}})
in
    #"Grouped Rows"

Result:

2.JPG

 

and here is pbxi file, please try it.

 

Best Regards,

Lin

Community Support Team _ Lin
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

4 REPLIES 4
v-lili6-msft
Community Support
Community Support

hi, @charleshale

 

Could you please tell me if your problem has been solved? If it is, could you please mark the helpful replies as Answered?

 

Best Regards,

Lin

Community Support Team _ Lin
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

WOW!  Yes!  It worked in a fraction of the time my code did.  Nice work!   What you wrote is really nifty -- I havent seen anything like it in the books on power query I've been reading.    Thank you

Hi, Lin.   Just seeing this (due to the new year's -- apologies) and trying it now.    Note that this is the full clumsy way I'd tried it that's been processing for days -- so looking forward to trying yours. [Note some of the bulk is dealing with dirty data -- for example making sure I only extract numbers that have BNA_ or US_ in their strings].    Attempting now and will report back.   Thank you.   Charlie 

 

 

let
    Source = Csv.Document(File.Contents("C:\Users\Open Use\Dropbox (Hale Global)\HMastr\DllrDeals\AOL Patch - Post Closing\AA CH Data Projects\Re-Model\charlie_allusersopen.csv"),[Delimiter=",", Columns=22, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
   #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Profile Id", type text}, {"Email", type text}, {"Engagement", type text}, {"Lists", type text}, {"Profile Created Date", type datetime}, {"Signup", type datetime}, {"Opens", Int64.Type}, {"Clicks", Int64.Type}, {"Pageviews", Int64.Type}, {"Last Open", type datetime}, {"Last Click", type datetime}, {"Last Pageview", type datetime}, {"Optout Time", type datetime}, {"List Signup", type text}, {"Geolocation City", type text}, {"Geolocation State", type text}, {"Geolocation Country", type text}, {"Geolocation Zip", type text}, {"Lifetime Message", Int64.Type}, {"Top Device", type text}, {"Email Status", type text}, {"Email Message", type text}}),
    #"Flatten[Lists]" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"Lists", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Lists"),
    #"MakeListsText" = Table.TransformColumnTypes(#"Flatten[Lists]",{{"Lists", type text}}),
    #"ExtractListPIDs" = Table.AddColumn(#"MakeListsText", "PIDNumber", each Text.Combine(List.RemoveNulls(List.Transform(Text.ToList([Lists]),each if Value.Is(Value.FromText(_), type number) then _ else null)))),
    #"FilterNullPIDs" = Table.SelectRows(#"ExtractListPIDs", each ([PIDNumber] <> "")),
    #"RemoveDupPIDsPerProfile" = Table.Distinct(#"FilterNullPIDs", {"PIDNumber", "Profile Id"}),
    #"Added Conditional Column" = Table.AddColumn(#"RemoveDupPIDsPerProfile", "Custom", each if Text.StartsWith([Lists], "BNA_") then "OK" else if Text.StartsWith([Lists], "US_") then "OK" else "Null"),
    #"Filtered Rows1" = Table.SelectRows(#"Added Conditional Column", each ([Custom] = "OK")),
    #"Removed Columns2" = Table.RemoveColumns(#"Filtered Rows1",{"Lists", "Custom"}),
    #"GroupPIDsByEmail(NotProfile)" = Table.Group(#"Removed Columns2", {"Email"}, {{"PIDs", each _, type table}}),
    #"Ungroup1" = Table.AddColumn(#"GroupPIDsByEmail(NotProfile)", "PIDNumber", each Table.Column ([PIDs], "PIDNumber")),
    #"Extract1" = Table.TransformColumns(#"Ungroup1", {"PIDNumber", each Text.Combine(List.Transform(_, Text.From), ","), type text}),
    #"NewColFindFirstPID" = Table.AddColumn(#"Extract1", "FirstPID", each Text.BeforeDelimiter([#"PIDNumber"], ","), type text),
    #"NewColFind#PIDs" = Table.AddColumn(#"NewColFindFirstPID", "CountItems", each List.Count(Text.Split([PIDNumber],","))),
    #"NewColIfEditor" = Table.AddColumn(#"NewColFind#PIDs", "EditorEmail", each if Text.Contains([Email], "patch.com") then 1 else null),
    #"Ungroup2" = Table.AddColumn(#"NewColIfEditor", "Profile Id", each Table.Column([PIDs], "Profile Id")),
    #"Extract2" = Table.ExpandListColumn(#"Ungroup2", "Profile Id"),
    #"RemoveDupes" = Table.Distinct(#"Extract2"),
    #"Changed Type2" = Table.TransformColumnTypes(#"RemoveDupes",{{"FirstPID", Int64.Type}}),
    #"Ungroup3" = Table.AddColumn(#"Changed Type2", "Engagement", each Table.Column([PIDs], "Engagement")),
    #"Extract3" = Table.ExpandListColumn(#"Ungroup3", "Engagement"),
    #"Ungroup4" = Table.AddColumn(#"Extract3", "Profile Created Date", each Table.Column([PIDs], "Profile Created Date")),
    #"Extract4" = Table.ExpandListColumn(#"Ungroup4", "Profile Created Date"),
    #"Ungroup5" = Table.AddColumn(#"Extract4", "Optout Time", each Table.Column([PIDs], "Optout Time")),
    #"Extract5" = Table.ExpandListColumn(#"Ungroup5", "Optout Time"),
    #"Ungroup6" = Table.AddColumn(#"Extract5", "Opens", each Table.Column([PIDs], "Opens")),
    #"Extract6" = Table.ExpandListColumn(#"Ungroup6", "Opens"),
    #"Ungroup7" = Table.AddColumn(#"Extract6", "Last Open", each Table.Column([PIDs], "Last Open")),
    #"Extract7" = Table.ExpandListColumn(#"Ungroup7", "Last Open"),
    #"Removed Columns" = Table.RemoveColumns(Extract7,{"PIDs"})

in

    #"Removed Columns"

 

v-lili6-msft
Community Support
Community Support

hi, @charleshale

After my test, you could try this code in Advanced editor

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("bY67CsJAEEV/JWy9RUaNJOUYF9mwj+AkioQwLFhYSKo0/r3rA1GT9sy9Z27XiZbYIh/D7RqGM0Mhk41D1oYrb7RqGBYZyBLNHxK97MQjWSITOouN8gywTmUUTlgSoTuxDcMljGMYGJar9FthNZH27qBVFStpDm/NlGNdc0tqT89Rc5GPN3403u00GXTbeCryTL6GTPCvdSYh+v4O", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [List = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"List", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Text.Select([List], {"0".."9",","})),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Custom", {{"Custom", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Custom"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Custom", Int64.Type}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type1", each ([Custom] <> null)),
    #"Removed Duplicates" = Table.Distinct(#"Filtered Rows", {"Custom"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Removed Duplicates",{{"Custom", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type2", {"List"}, {{"Result", each Text.Combine([Custom], ", "), type text}})
in
    #"Grouped Rows"

Result:

2.JPG

 

and here is pbxi file, please try it.

 

Best Regards,

Lin

Community Support Team _ Lin
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.