Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

How to extract continuous string set from a column with multiple criteria

Hi,

I have a Column A with client names.
Table 1                                                                      Table 2
Client Name                                                              Ongoing Projects
Microsoft Corporation Ltd                                        Microsoft
BARCALAYS (LLP)                                                      Barcalays
Caseys General                                                          Casey's
LLP Ernst and Young                                                 Ernst and Young
ASTRAZENCALimited                                                Astrazanca

I want to search and extract the client names from Table 1 as below:
Result Column
Microsoft
Barcalays
Casey
Ernst and Young
Astrazenca

Basically the logic I am looking for is as below. The function/formula should check for continuous sequence of characters from Table B in Table A and return that entire sequence. Also this search should be case insensitive (For ex: BARCALAYS and Barcalays) and also the search text can be in between the within text column anywhere and not necessarily at the beginning (Ex: LLP Ernst and Youngand Ernst and Young).
One important criteria is it should return matches for continuous string greater than 2 (i.e; 3 and above).

I tried various formulae like Search, Find, Minx, Left but I am not getting the desired result with the above set of criteria met. Please help me with the logical formula for this.

Happy to give any additional info pertaining to this.

1 ACCEPTED SOLUTION

@Anonymous - OK, I updated this with a Client Name output version and Project output version in 2 additional columns. I checked the matches, don't see anything obviously amiss. Solved?

 

PBIX is attached. 

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

View solution in original post

53 REPLIES 53

@Anonymous - I will take a look.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Hi Greg, pleasure in greeting you... 2 queries:

1. Does the Microsoft Power BI Cookcbook (2nd Edition) have a Spanish version?

2. In the book, will you have cases to identify ranges of text and texts with momodines?

Example, if I have the value "AC23", "AJ80" and "KRZ600".

NumberSinceUntil
Caso1AB01AJ70
Caso2A*J*
Caso3KRZ600

AC23 = Case1 and Case2

AJ80 = Caso2

KRZ600 = Caso3

Thanks and greetings...

@Anonymous Glad we got there, was an interesting problem. There are still replies in this thread you haven't kudo'd... 🙂

 

This may be my record for longest thread before solution, 6 pages!!!


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

@Greg_DecklerGenius!! Thank you so much for this. You have literally spared me a grave headache. Wish I could give you a dozen Kudos haha!! Peace!

@Anonymous - OK, I updated this with a Client Name output version and Project output version in 2 additional columns. I checked the matches, don't see anything obviously amiss. Solved?

 

PBIX is attached. 

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

@Greg_Deckler  For some wierd reason, I added data to my datadump today and after refresh I dont see Blue Cross and Blue Shield of Arizona I as well as The Ernst & Young entry for Blue Cross . I only see BLUE CROSS OF IDAHO CARE PLUS, INC in the data dump (excel). However when I filter in the data view of the BI desktop under list of projects, there are two entries for Blue cross. Idaho care plus and the Ernst and Young Entry. I am not sure what could be a reason for this. I dont see Blue Shield entry in my entire data set nor in the data view. Have you come across such odd things?

On the return value, we could either return the Project Name for the match or the Client Name for the Match. (either works - Better to know when to change what for convenience).

I am still trying to understand some part of the code. May be I can play around with the threshold values and get a match for Blue cross also for all kind of entries currently present. That is if you dont come up with one earlier than me 🙂

Please share the latest improvements you could make on the existing one to have better results. Hopefully the next post from you could be our very much needed near to perfect solution or even better - A perfect Solution 😉

Anonymous
Not applicable

@Greg_DecklerThank you Greg. You are truly a genius. I will go ahead and use this for my presentation today. This is really helpful. Thank you for all your time.
However I do not understand just one thing. Why was the exception given to Blue Cross?

Even National Grid has two matches with client names similar to Blue cross. While this works for National Grid, why is it not working for Blue Cross?

(See -Ernst and YoungNATIONAL Grid     and Ernst and YoungBLUE Cross   under client names for which blue cross exception was added). While National grip is getting mapped to YoungNATIONAL Grid, Blue Cross should get mapped to YoungBLUE Cross since the client names are of similar nature or structure. Any thoughts on why this is happening? Then may be we can figure out to eliminate giving this exception as well and have a perfectly working formula without any exceptions 🙂
Let me know you thoughts. I will go ahead and accept this as a solution post your confirmation on this piece.

Anonymous
Not applicable

@Greg_DecklerThe logic I think will work is , instead of giving the matches by taking both the Client name and project columns into consideration, can we only take Project column into consideration and provide the maximum match term with project column from the client name?

Example: 

Client Name       Project Name

National Grid      National Grid

First National      ABB

 

Operation to follow as below:

Take National Grid for consideration.

Check for 'N' in all cells of Client Name and save this list in a variable.

Check for 'Na' in this saved list and update this variable with the new lesser list.

Check for 'Nat' in this newly saved list and update the variable......... and so on until we check for 'National G'.
We have to stop with the first character after the first occurance of space because if not then the return values will need to be an exact match with the client name.
This above logic will ensure that National Grid is mapped only to National Grid and not First National.

Now, The above step is iterated by taking ABB into consideration. 

This way we can ensure that every Project Name is mapped exactly to one item from the Client Name and the retun value will be the Client Name and not the matching strings.

This was my initial proposal for the PHP code but I am not sure how I lost the course on this. Probably I got overwhelmed with the results I recieved with DAX.

 

Is there a way to try the above logic with PBI? This seems to be fool proof atleast for the current set of Client Names and considering the way they appear, it could work on any additional clients that may add in to this list.

 

I need to submit this presentation in another 12 hours , i.e 12:00 PM. Hope this method is possible.

@Anonymous - Correct me if I am wrong, but this would fail in the case of the sample data you provided, specifically:

 

LLP Ernst and Young trying to match with Ernst and Young

 

If I go with the first three trying to match, it will never return Ernst and Young because LLP does not equal "Ern".

 

Right?


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

@Greg_DecklerNo. The first 3 and above characters are to be taken from the reference table(column) and then search for a match in the client name column. 

So 'Ern' will be matched with the client column to return the result for wherever the string 'ern' appears. So this way LLP 'Ern'st and Young will be mapped to 'Ern'.

 

I got the solution for this logic from a friend of mine using the PHP code. I am not sure of how to have this integrated to my Power BI. I have provided the link to my drive having the result file (.csv) and the code image for your reference. If you could help me with a way to implement this with Power BI that will fix the problem.

 

-------------------->    Drive Link 

 

Thank you for your time. Really appreciate it.

@Anonymous - Yeah, even with what I laid out if it was even possible in DAX, you would end up with the same exact problem. You will run into situations where there isn't a firm way to decide if "Barclays Capital" better matchines "Barclay's Investments" or "Tristate Capital" for instance. How do you decide? In that example, both would match 7 consequetive letters. Technically " Capital" is 8 so it would probably win. That's just an example of what you run into with fuzzy matching scenarios.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@Anonymous - I confirmed that the SEARCH approach will work. See Table (24) and Table (24a) in attached PBIX below sig. Made a small edit to get the information in the format you wanted:

Column = 
  VAR __ClientName = [Client Name]
  VAR __Table =
    ADDCOLUMNS(
      ADDCOLUMNS(
        'Table (24a)',
        "Search",SEARCH([Ongoing Projects],__ClientName,,0)
      ),
      "Result",IF([Search]>0,[Ongoing Projects],BLANK())
    )
RETURN
  MAXX(__Table,[Result])

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.