Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
ThomasDay
Impactful Individual
Impactful Individual

Replacing values in a text column from related table

Hello fellow DAX'ers.  I do find text handling complicated.  Hope one of you can help me out.

before.PNG

I have the following situation where entries in a system name column are very inconsistent...  For example, the Red items are variants of a proper consistent system name.  

 

I've created a list of inconsistent values and their corrections.  There will be a modest number of corrections and MANY that are just right--hence the desire to just have a list of corrections.

 

After.PNG

I want the following where only the names needing correction are updated.  I've tried dozens of variants and need a starting strategy if you don't mind.  The things I've tried have been of the form: and the issue is making a scalar out of the text rows in the relatedtable 

Any patterns or strategies would be greatly appreciated and thanks in advance! Tom

 

 

UniformSystemName = 
//Create a column in the Hospital Demographics Table--to provide an iterator/row to work with
//Find out if there is a correction (HowMany =0 says no, HowMany = 1 says yes)
//there will not be more than one record found. VAR HowMany = COUNTROWS(RELATEDTABLE('SystemNameCorrections')) RETURN IF ( HowMany = 0, HospitalDemographicsTable[HospitalSystemName],
//If found one, use the corrected IF( HowMany = 1, FIRSTNONBLANK(SystemNameCorrections[ConsistentName]),
//Ignore anything else "") )
24 REPLIES 24

Did you consider using the Query (aka Power Query) to cleanup and normalize the company names? By using Power Query for the data preps, you will have a cleaner and simpler design. If you are interested in this direction, I can show you how to create queries that will normalize your company names to the desired format.

ThomasDay
Impactful Individual
Impactful Individual

Thanks, I'd welcome knowing how to do that using a query.  That sounds promising.

 

I have a table of known corrections which can be applied to each new quarter's data...and then will surely see a few more corrections to be made as folks invent new ways to make the names inconsistent.  I'll then create an updated list of corrections and want to clean up the system names again.

 

AND I'd love to know how to do this in DAX (presumably create column) if anyone has a strategy/pattern for that.  In the DAX world, I need to know more about (the confounding) text column handling.

 

So what would you do to normalize the system name in this example?

Tom

In the exmaple below, I duplicate the original column to the column "Normalized" and perform the fixup of the new column by lowercasing the text, removing the ending dot and finding text that ends with "inc" to replace it with " inc".

 

Screenshot_39.png

You can copy the entire M statement to see the exmaple.

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCq4sLknNVfDMS9ZTitWB8YFcJJ4CjOuRX1zgXpRfWoAQ8Q8OcA/yDw3w9HPGVAI0MhYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each [Column1]),
    #"Renamed Columns" = Table.RenameColumns(#"Added Custom",{{"Column1", "Original"}, {"Custom", "Normalized"}}),
    #"Lowercased Text" = Table.TransformColumns(#"Renamed Columns",{{"Normalized", Text.Lower}}),
    
    fnReplaceSuffix = (inputText, suffixToRemove, suffixToAdd) =>
        if Text.EndsWith(inputText,suffixToRemove) then
                Text.ReplaceRange(inputText,Text.Length(inputText)-Text.Length(suffixToRemove),Text.Length(suffixToRemove),suffixToAdd)
        else inputText,

    RemoveDots = Table.TransformColumns(#"Lowercased Text",{{"Normalized", each fnReplaceSuffix(_,".","")}}),
    
    HandleInc = Table.TransformColumns(RemoveDots, {{"Normalized", each fnReplaceSuffix(_, "inc", " inc")}}),

    #"Capitalized Each Word" = Table.TransformColumns(HandleInc,{{"Normalized", Text.Proper}})
        
in
    #"Capitalized Each Word"

 

So, while the M statement may be complicated, you do get a much simplifier model.

 

ThomasDay
Impactful Individual
Impactful Individual

Hello Data--yes, if the normalizations are predictable.  I should have been clearer that some systems have hundreds of facilities...and many of the transforms aren't cleanup of an entry but using the appropriate one.

 

CHS/Community Health System

Community Health System

CHS

 

All should become Community Health System.  Hence it's a custom thing rather than a transform.  Can you find/lookup and substitute a proper name during the query?

 

Thanks,

Tom

So, create a function and then create a new column that calls that function with the value that you want to lookup and have the function return the correction. Here is an example from an upcoming blog post on Hex to Decimal conversion. You should be able to use the same basic process. Note, your list of lookups and corrections could come from a data source such as a SQL table instead of the list that I created manually in "M" code:

 

let
    fnHex2Dec = (input) =>
 
let
 
values = {
 
{"0", 0},
{"1", 1},
{"2", 2},
{"3", 3},
{"4", 4},
{"5", 5},
{"6", 6},
{"7", 7},
{"8", 8},
{"9", 9},
{"A", 10},
{"B", 11},
{"C", 12},
{"D", 13},
{"E", 14},
{"F", 15}
 
},
 
Result = Value.ReplaceType({List.First(List.Select(values, each _{0}=input)){1}},type {number})
 
in
 
Result
in
    fnHex2Dec

This function takes a single input parameter, a single text character and translates it to a decimal equivalent. You can test this function by clicking the "Invoke" button in the Power Query Editor window. Be sure to enter a single value preceded by a single quote, such as 'A. The single quote forces the input to be recognized as text. This ensures that if you enter a 7, that it is recognized as text instead of a number.

 

One other note, you would have to add some logic for lookups that were correct, they would not be found so probably get a null result back or have to handle the error so you would have to wrap your call to this function in an "if" statement in the column code and if an error or null comes back, just use the original value. If I have some time, I'll play with it a little and see if I can get you even closer.

 

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@Greg_Deckler this is very promising.  I have these in an excel file in a folder--or in a table in the model.  The mistakes are quite consistent for each provider--generally done by the software that submits the data to cms and the "system name" seldom changes.  I will have an excel file that someone that watches these things can update with a new "error".  

 

@Sean your land records watchout is a good one.  This use-case is to simplify a menu/slicer of system names to select from so all the facilities from one system should be collapsed under one "commonly expected" system name.  The "real" data we're very aware is what it is, so we're good in this particular situation.  Thanks for thinking of that.

 

Also, Sean, perhaps 1/3->1/2 of facilities are in a system.  And only 5% of those have abnormal system names.  Does your idea work if you make a relationship between entered system name (Many to 1) and corrections just on the system name.  If the relationship doesn't return an entry it means there isn't a correction---can I then just use the entered name like in smoupre's suggestion?

 

Tom

Sean
Community Champion
Community Champion

you can of course get the Name entered instead of "Unknown" if there's no value in the Lookup table...

Correct Names2.png

If the mistakes are consistent though as you say then you may not need this...

ThomasDay
Impactful Individual
Impactful Individual

@Sean--I'll try your solution in a different configuration in the am.  First more context:

 

~6000 providers

~2500 in 500 systems

~700 of the 2500 have inconsistent names entered.

There are ~150 distinct inconsistent names resolving to 50 properly configured consistent names

Each year providers change systems (acquired) or systems themselves are acquired---it's not helpful to tie providers to a correct name, RATHER it's key to tie misspells (Hospital Grup) or inconsistencies (CHS should be Community Health Systems) to the proper names.  That list is readily maintained without examining every provider every year--a slicer will have some stray names to be fixed.

 

So---I'll try your solution looking to relate the inconsistent name in the provider table--to the inconsistent name in the corrections table--and substitute the related consistent value.  That is a many to ONE relationship like you're suggesting.  That's what I was trying to do unsuccessfully. I will need to cover for no record found.

 

@Greg_Deckler looks like you're building an M version of a switch function via a table "injection" of the switch values.  That's exactly what I want to do.  I'd love to know how to have the table feed your function values--perhaps you can give me a little nudge on how to feed the function.  Actually I've not heard of building a function except through create column or table, so this is very intriguing and very useful.

 

Anyway, that's why this language is so confounding...handling text is still very foreign to me.  Let me know what you all think and Thank you...

Tom

@ThomasDay - Hey Thomas, I just created this in another thread:

 

Here is an example of a lookup function that uses a database table as a source of its information:

 

let fnHex2DecFromDB = (input) =>

let
    Source = Sql.Database("MySQLServer", "MyDatabase"),
    dbo_Hex2Dec = Source{[Schema="dbo",Item="Hex2Dec"]}[Data],
    Record = Table.First(Table.SelectRows(dbo_Hex2Dec, each [Hex] = input)),
    Result = Record.Field(Record,"Dec")
in
    Result
in
    fnHex2DecFromDB

In the example above, "MySQLServer" is the name of the SQL Server, "MyDatabase" is the name of the database. In that database on that server I have a table called "Hex2Dec" with two columns named "Hex" and "Dec". The rows are what you would imagine them to be for hex to decimial conversion. Basically, the premise here should work for any lookup situation that has unique lookup values. To use this, just create a new column with the formula:

 

 

=fnHex2DecFromDB([TextColumn])

 

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Thank you very much @Greg_Deckler  I'm traveling (today)/tomorrow and look forward to building it into my model when I get back.  It will be a real assset!  Several transformations have been done by a SQL coding friend so I have SQL Express on my computer--so I'm sure I can load the table there.  

 

Thank you again,

Tom

Cool @ThomasDay, let me know how it goes! I am actually in the process of creating a blog article on this very topic demonstrating lookups in Power BI using DAX as well as M, etc.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

So, how about using it to fix an existing column of data.  For each system name (skip blanks)--would I call the function--check if it has an entry in the conversion table and substitute.  I'm sure that's not quite it's standard use case--but a logical extension for ETL work.  Perhaps create a new column from the existing column?  Not quite sure what the wrapper to the function would be in that case.

 

Let me know what you think about that and thanks,

Tom

I would create a new column and do something like the following (this is psuedo-code)

 

= if [System] <> "" then if fnLookup([System]) = "" then [System] else fnLookup([System]) else ""

Again, that's psuedo code, you may have to break that if statement up into two columns not even sure you can do nested if's in M, will have to try that. In any case, you're getting ahead of me, I'm still working on the blog article and I have this exact scenario in mind for solving in DAX as well as M.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@Greg_Deckler

thinking this "lookup"-function already exists in M: Merge (Table.Join)

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

@ImkeF True to a point, but in my opinion, Merge does not cover all of the possible use cases because you don't have complete and utter control over the "lookup". Essentially, Merge suffers from the same limitations as you have when doing simple relationships in Power BI Desktop.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

I think I don't understand what you mean here. Will you cover that in your upcoming article?

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Yes, it will be in there. The basic concept is, if you do it via relationships or Table.Join, you effectively have to have all matching values, even for "good" values. Think of the instance where you have a list of known data errors that you wish to clean up and replace. Are there methods for handling that? Sure, it can be done but it gets messy with relationships and Table.Join(s). One method is to have translations for "good" values to that same "good" value. Or, there is the scenario where there are exceptions, so you want to replace this with that but not in this particular circumstance. Again, are there more than one way to solve that? Sure, but again it can again get very messy. The article I am writing deals with all of those different scenarios and presents about half a dozen or more ways of solving lookups along with all of the pros and cons.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Yes, moving towards "fuzzy lookup" this is the way to go. Looking forward your approaches here.

But I'd expect it costing performance. So for unconditional joins expecting that Table.Join performs much faster.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

ThomasDay
Impactful Individual
Impactful Individual

For the ~5-6K rows of this example, I used @Sean's DAX solution and made the relationship "fuzzy" coupled with @DataChants ideas of stripping consistent known issues out ahead of time (in DAX).

  • Step 1 was cleaning up the incoming names to get rid of a lot of stuff.  I did this in the col create DAX.
SystemInfoByRept = 
    	SELECTCOLUMNS(
		FILTER(2014_ALPHA,
		[Text] = "SearchString"),
		"ReptRecNo",2014_ALPHA[ReptRecNo],
		"SystemName",TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(2014_ALPHA[Text],"NAME:",""),".",""),"INC",""),"LLC","")))
  • You can see that every name--whether it's a "good one" or not will at least be clean and reliably findable plus things that are always wrong are already removed.

 

  • Step 2 was building a list of corrections from the SystemName.  I put all the names recorded by providers in Excel--got a distinct list and culled out the ones that didn't need changing.  What's left are the ones that need a change...for example "System of Bahston" should be "System of Boston".  Some are misspells, others are unconforming names...but a list of several hundred corrections.  This list, in excel is easily updated and is simply a list of changes--Before and After.
  • Step 3 Query this to make a table...and relate many to one with the Raw System Names.  Obviously the ImproperSystemNames need to be Unique

The relationship.PNG

  • And the last step was to run the Related "lookup" and if a HospitalSystemName matches a Corrected System Name row, use it otherwise use the original.  So for each row of SystemNames...it uses RELATED to see if it's improper.  ISBLANK() tests if something is found--and if it's not Blank..fetches the Corrected System Name using RELATED again.
UniformSystemName = 
	If(ISBLANK
		(RELATED(SystemNameCorrections[ImproperSystemNames])),
		SystemInfoByRept[HospitalSystemName],RELATED(SystemNameCorrections[Corrected System Name])
		)

There's a lot of trickery in the syntax here, so many thanks to @Sean for being the pattern guy on this.  RELATED is used in to see if a record in the ImproperSystemNames exists AND in fetching the Corrected System Name.  And all of this in a Column create!!

Tom

I have an entire blog article devoted to lookups in staging area of the Community Blog if you have access to it.

 

http://community.powerbi.com/t5/Community-Blog-Staging/Good-Ol-VLOOKUP-The-Ultimate-Guide-to-Lookups...

 

Very similar DAX to yours, in fact, I think your ISBLANK is a little cleaner. The blog article covers both DAX and M lookups.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.