cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Helper IV
Helper IV

How to extract continuous string set from a column with multiple criteria

Hi,

I have a Column A with client names.
Table 1                                                                      Table 2
Client Name                                                              Ongoing Projects
Microsoft Corporation Ltd                                        Microsoft
BARCALAYS (LLP)                                                      Barcalays
Caseys General                                                          Casey's
LLP Ernst and Young                                                 Ernst and Young
ASTRAZENCALimited                                                Astrazanca

I want to search and extract the client names from Table 1 as below:
Result Column
Microsoft
Barcalays
Casey
Ernst and Young
Astrazenca

Basically the logic I am looking for is as below. The function/formula should check for continuous sequence of characters from Table B in Table A and return that entire sequence. Also this search should be case insensitive (For ex: BARCALAYS and Barcalays) and also the search text can be in between the within text column anywhere and not necessarily at the beginning (Ex: LLP Ernst and Youngand Ernst and Young).
One important criteria is it should return matches for continuous string greater than 2 (i.e; 3 and above).

I tried various formulae like Search, Find, Minx, Left but I am not getting the desired result with the above set of criteria met. Please help me with the logical formula for this.

Happy to give any additional info pertaining to this.

1 ACCEPTED SOLUTION

@deepakramamurth - OK, I updated this with a Client Name output version and Project output version in 2 additional columns. I checked the matches, don't see anything obviously amiss. Solved?

 

PBIX is attached. 

 


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




View solution in original post

52 REPLIES 52

@deepakramamurth - I will take a look.


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




@deepakramamurth Glad we got there, was an interesting problem. There are still replies in this thread you haven't kudo'd... 🙂

 

This may be my record for longest thread before solution, 6 pages!!!


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




@Greg_DecklerGenius!! Thank you so much for this. You have literally spared me a grave headache. Wish I could give you a dozen Kudos haha!! Peace!

@deepakramamurth - OK, I updated this with a Client Name output version and Project output version in 2 additional columns. I checked the matches, don't see anything obviously amiss. Solved?

 

PBIX is attached. 

 


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




View solution in original post

@Greg_Deckler  For some wierd reason, I added data to my datadump today and after refresh I dont see Blue Cross and Blue Shield of Arizona I as well as The Ernst & Young entry for Blue Cross . I only see BLUE CROSS OF IDAHO CARE PLUS, INC in the data dump (excel). However when I filter in the data view of the BI desktop under list of projects, there are two entries for Blue cross. Idaho care plus and the Ernst and Young Entry. I am not sure what could be a reason for this. I dont see Blue Shield entry in my entire data set nor in the data view. Have you come across such odd things?

On the return value, we could either return the Project Name for the match or the Client Name for the Match. (either works - Better to know when to change what for convenience).

I am still trying to understand some part of the code. May be I can play around with the threshold values and get a match for Blue cross also for all kind of entries currently present. That is if you dont come up with one earlier than me 🙂

Please share the latest improvements you could make on the existing one to have better results. Hopefully the next post from you could be our very much needed near to perfect solution or even better - A perfect Solution 😉

@Greg_DecklerThank you Greg. You are truly a genius. I will go ahead and use this for my presentation today. This is really helpful. Thank you for all your time.
However I do not understand just one thing. Why was the exception given to Blue Cross?

Even National Grid has two matches with client names similar to Blue cross. While this works for National Grid, why is it not working for Blue Cross?

(See -Ernst and YoungNATIONAL Grid     and Ernst and YoungBLUE Cross   under client names for which blue cross exception was added). While National grip is getting mapped to YoungNATIONAL Grid, Blue Cross should get mapped to YoungBLUE Cross since the client names are of similar nature or structure. Any thoughts on why this is happening? Then may be we can figure out to eliminate giving this exception as well and have a perfectly working formula without any exceptions 🙂
Let me know you thoughts. I will go ahead and accept this as a solution post your confirmation on this piece.

@Greg_DecklerThe logic I think will work is , instead of giving the matches by taking both the Client name and project columns into consideration, can we only take Project column into consideration and provide the maximum match term with project column from the client name?

Example: 

Client Name       Project Name

National Grid      National Grid

First National      ABB

 

Operation to follow as below:

Take National Grid for consideration.

Check for 'N' in all cells of Client Name and save this list in a variable.

Check for 'Na' in this saved list and update this variable with the new lesser list.

Check for 'Nat' in this newly saved list and update the variable......... and so on until we check for 'National G'.
We have to stop with the first character after the first occurance of space because if not then the return values will need to be an exact match with the client name.
This above logic will ensure that National Grid is mapped only to National Grid and not First National.

Now, The above step is iterated by taking ABB into consideration. 

This way we can ensure that every Project Name is mapped exactly to one item from the Client Name and the retun value will be the Client Name and not the matching strings.

This was my initial proposal for the PHP code but I am not sure how I lost the course on this. Probably I got overwhelmed with the results I recieved with DAX.

 

Is there a way to try the above logic with PBI? This seems to be fool proof atleast for the current set of Client Names and considering the way they appear, it could work on any additional clients that may add in to this list.

 

I need to submit this presentation in another 12 hours , i.e 12:00 PM. Hope this method is possible.

@deepakramamurth - Correct me if I am wrong, but this would fail in the case of the sample data you provided, specifically:

 

LLP Ernst and Young trying to match with Ernst and Young

 

If I go with the first three trying to match, it will never return Ernst and Young because LLP does not equal "Ern".

 

Right?


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




@Greg_DecklerNo. The first 3 and above characters are to be taken from the reference table(column) and then search for a match in the client name column. 

So 'Ern' will be matched with the client column to return the result for wherever the string 'ern' appears. So this way LLP 'Ern'st and Young will be mapped to 'Ern'.

 

I got the solution for this logic from a friend of mine using the PHP code. I am not sure of how to have this integrated to my Power BI. I have provided the link to my drive having the result file (.csv) and the code image for your reference. If you could help me with a way to implement this with Power BI that will fix the problem.

 

-------------------->    Drive Link 

 

Thank you for your time. Really appreciate it.

@deepakramamurth - Yeah, even with what I laid out if it was even possible in DAX, you would end up with the same exact problem. You will run into situations where there isn't a firm way to decide if "Barclays Capital" better matchines "Barclay's Investments" or "Tristate Capital" for instance. How do you decide? In that example, both would match 7 consequetive letters. Technically " Capital" is 8 so it would probably win. That's just an example of what you run into with fuzzy matching scenarios.


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




@deepakramamurth - I confirmed that the SEARCH approach will work. See Table (24) and Table (24a) in attached PBIX below sig. Made a small edit to get the information in the format you wanted:

Column = 
  VAR __ClientName = [Client Name]
  VAR __Table =
    ADDCOLUMNS(
      ADDCOLUMNS(
        'Table (24a)',
        "Search",SEARCH([Ongoing Projects],__ClientName,,0)
      ),
      "Result",IF([Search]>0,[Ongoing Projects],BLANK())
    )
RETURN
  MAXX(__Table,[Result])

 


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




Helpful resources

Announcements
secondImage

Happy New Year from Power BI

This is a must watch for a message from Power BI!

December Update

Check it Out!

Click here to read more about the December 2020 Updates!

Community Blog

Check it Out!

Click here to read the latest blog and learn more about contributing to the Power BI blog!

Get Ready for Power BI Dev Camp

Get Ready for Power BI Dev Camp

Mark your calendars and join us for our next Power BI Dev Camp!.

Top Solution Authors
Top Kudoed Authors