Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
diedrich08
Frequent Visitor

Extract number from text string based on conditions

Hi All,

I have done extensive searching and I don't believe this is a repeat, but is definitely and extension of previous questions. I am attempting to extract numbers from a text string within a Power BI function. I have successfully extracted the numbers from the string into a value using the below:

 

Text.Combine(
  List.RemoveNulls(
    List.Transform(
      Text.ToList([string_col]),
    each if Value.Is(Value.FromText(_), type number) 
    then _ else null)
    )
  )

 

Using this code works great when the number I am interested in is the only number in the string, for example:

 

  "Bring on the 1234567 comments" results in 1234567

 

However, I can't resolve extracting my number when multiple different numbers occur in the string, for example:

 

  "Bring on on the 1234567 comments with 50 telling me this is a repeat" results in 123456750

 

What I need to do is one pull the number within the string that meets conditions (one in my case). For my particular issue, the number I need to extract will always be the only 7 digit number in the string, so I feel like this should be a more straight forward answer?

 

Is there a way to extract only the 7 digit number using my provided function or something similar? If I am way off base, can someone please set me on the proper path?

 

As always, the communities help is greatly appreciated.

1 ACCEPTED SOLUTION
OwenAuger
Super User
Super User

@diedrich08

 

Here's one method:

  1. Create a function using the following code. This function takes a text input and returns the first 7-digit number found (if any).
  2. Invoke this function to create a custom column
(InputText as text) =>
let
  RequiredLength = 7,
  Digits = {"0".."9"},
  CharacterList = Text.ToList(InputText),
  FirstNumber =
    List.Accumulate(
    CharacterList,
    "",
    (String,CurrentChar)=>
      if Text.Length(String) = RequiredLength then String
      else if List.Contains(Digits,CurrentChar) then String & CurrentChar
      else ""
    ) ,
  ReturnValue =
    if Text.Length(FirstNumber) = RequiredLength then FirstNumber else null
in
  ReturnValue

The function works by taking the characters of InputText from left to right, and building up a string of numbers (stopping when 7 numeric characters are accumulated), otherwise resetting to an empty string when it encounters a non-numeric character. 

 

Here's another idea using table grouping to group consecutive digits together (also a function):

(InputText as text) => 
CharacterList = Text.ToList(InputText), CharacterTable = Table.FromList(CharacterList, Splitter.SplitByNothing(), type table[Character = text], null, ExtraValues.Error), AddedIndex = Table.AddIndexColumn(CharacterTable, "Index", 1, 1), AddedDigitFlag = Table.AddColumn(AddedIndex, "Digit", each List.Contains({"0".."9"},[Character]), type logical), DigitGroups = Table.Group(AddedDigitFlag, {"Digit"}, {{"Number", each Text.Combine(Table.Sort(_,{"Index"})[Character]), type text}}, GroupKind.Local), FilterNumbersLength7 = Table.SelectRows(DigitGroups, each [Digit] = true and Text.Length([Number])=7), FirstNumber = try FilterNumbersLength7{0}[Number] otherwise null in FirstNumber

 

Another option might be using some R code to find text matching an appropriate regular expression.

 

Regards,

Owen 🙂


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

View solution in original post

8 REPLIES 8
OwenAuger
Super User
Super User

@diedrich08

 

Here's one method:

  1. Create a function using the following code. This function takes a text input and returns the first 7-digit number found (if any).
  2. Invoke this function to create a custom column
(InputText as text) =>
let
  RequiredLength = 7,
  Digits = {"0".."9"},
  CharacterList = Text.ToList(InputText),
  FirstNumber =
    List.Accumulate(
    CharacterList,
    "",
    (String,CurrentChar)=>
      if Text.Length(String) = RequiredLength then String
      else if List.Contains(Digits,CurrentChar) then String & CurrentChar
      else ""
    ) ,
  ReturnValue =
    if Text.Length(FirstNumber) = RequiredLength then FirstNumber else null
in
  ReturnValue

The function works by taking the characters of InputText from left to right, and building up a string of numbers (stopping when 7 numeric characters are accumulated), otherwise resetting to an empty string when it encounters a non-numeric character. 

 

Here's another idea using table grouping to group consecutive digits together (also a function):

(InputText as text) => 
CharacterList = Text.ToList(InputText), CharacterTable = Table.FromList(CharacterList, Splitter.SplitByNothing(), type table[Character = text], null, ExtraValues.Error), AddedIndex = Table.AddIndexColumn(CharacterTable, "Index", 1, 1), AddedDigitFlag = Table.AddColumn(AddedIndex, "Digit", each List.Contains({"0".."9"},[Character]), type logical), DigitGroups = Table.Group(AddedDigitFlag, {"Digit"}, {{"Number", each Text.Combine(Table.Sort(_,{"Index"})[Character]), type text}}, GroupKind.Local), FilterNumbersLength7 = Table.SelectRows(DigitGroups, each [Digit] = true and Text.Length([Number])=7), FirstNumber = try FilterNumbersLength7{0}[Number] otherwise null in FirstNumber

 

Another option might be using some R code to find text matching an appropriate regular expression.

 

Regards,

Owen 🙂


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

Hi Owen,

 

This is great, 

 

I was wondering if you had a solution to extract 2 sets of numbers within the same row;

 

column x = 1 x 654321 and 2 x 123456

result = 2 columns, column 1= 654321 & column 2 =  123456

 

so by using your function set to 6 digits extraction, is it possible to extract two sets of (6) occurance digits from the same cell? column x extraction would create two seperate columns one with the first 6 digit number found in column x and the second column with the second occurance of 6 digit number... amended to capture if there were 3 occurance and so on?

 

cheers,

Hello Owen, is there any way to get this into a Flow action? I tried getting the file, copying your text into a Compose action, and replacing InputText with the Filename from Dynamic Content, but the result in Body looks like it displays your text, not the evaluated function... Thanks In Advance.

@jamesg21 

The M code I posted earlier can't be directly translated to Flow.

I'm sure there is a way of performing the same actions using Flow functions but I don't know enough to tell you what that would be. Perhaps try posting on the Flow forums.

 

You can use Power Query actions with a SQL Server data source using the Premium "Transform data using Power Query" action, but I suspect that may not be what you are wanting.

 

Regards,

Owen


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

@OwenAuger Thank you, no worries some bright spark in the Flow forum translated my requirement into 2 split functions: split at ' - ' and then split at '-'. For completion (and anyone else stuck on a similar problem), this is the link.

Anonymous
Not applicable

Thank you for that!

If I want to extract only a string that would start with 1 only and have a lenght of 8, what would I need to add? I've been trying a few things but nothing seems to be working. 

 

Thank you!

 

Anonymous
Not applicable

Hi,

 

Could you expand a bit more on the possibility of getting this done using R code? I have tried this solution for a similar problem but haven't managed to get it to work. Not sure what I am doing wrong but I am not very familiar with using functions so I'm sure it's something silly.

 

My problem is slightly different though, I cannot use number of characters as there will be other numbers in there which are the same length as the number I am trying to extract. However, I do know my number will always start with 45313 and always be 10 characters long. Any help is appreciated.

 

Cheers

Anonymous
Not applicable

Just in case anyone is struggling with a problem similar to mine, I have managed to solve it using a two step solution.

 

I added a custom column using the following code:

 

First Column =

if Text.Contains([YourDataColumn], "45313") then 
Text.PositionOf([YourDataColumn], "45313") else null

 

I then added a second column which references the first one using this code:

 

Second Column =

if [FirstColumn]<>null then Text.Range([YourDataColumn], [FirstColumn], 10) else null

 

The first column calculates the character position of the sequence I know my number will always start with. The second column uses Text.Range which will select a set number of characters (in my case 10) starting from a particular number of characters onwards in the overall text. Since I'm trying to extract information from an e-mail inbox it is impossible to predict which position the information needed will be in, this is overcome by using the first column in the second one. 

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.