Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hello fellow DAX'ers. I do find text handling complicated. Hope one of you can help me out.
I have the following situation where entries in a system name column are very inconsistent... For example, the Red items are variants of a proper consistent system name.
I've created a list of inconsistent values and their corrections. There will be a modest number of corrections and MANY that are just right--hence the desire to just have a list of corrections.
I want the following where only the names needing correction are updated. I've tried dozens of variants and need a starting strategy if you don't mind. The things I've tried have been of the form: and the issue is making a scalar out of the text rows in the relatedtable
Any patterns or strategies would be greatly appreciated and thanks in advance! Tom
UniformSystemName =
//Create a column in the Hospital Demographics Table--to provide an iterator/row to work with
//Find out if there is a correction (HowMany =0 says no, HowMany = 1 says yes)
//there will not be more than one record found. VAR HowMany = COUNTROWS(RELATEDTABLE('SystemNameCorrections')) RETURN IF ( HowMany = 0, HospitalDemographicsTable[HospitalSystemName],
//If found one, use the corrected IF( HowMany = 1, FIRSTNONBLANK(SystemNameCorrections[ConsistentName]),
//Ignore anything else "") )
Did you consider using the Query (aka Power Query) to cleanup and normalize the company names? By using Power Query for the data preps, you will have a cleaner and simpler design. If you are interested in this direction, I can show you how to create queries that will normalize your company names to the desired format.
Thanks, I'd welcome knowing how to do that using a query. That sounds promising.
I have a table of known corrections which can be applied to each new quarter's data...and then will surely see a few more corrections to be made as folks invent new ways to make the names inconsistent. I'll then create an updated list of corrections and want to clean up the system names again.
AND I'd love to know how to do this in DAX (presumably create column) if anyone has a strategy/pattern for that. In the DAX world, I need to know more about (the confounding) text column handling.
So what would you do to normalize the system name in this example?
Tom
In the exmaple below, I duplicate the original column to the column "Normalized" and perform the fixup of the new column by lowercasing the text, removing the ending dot and finding text that ends with "inc" to replace it with " inc".
You can copy the entire M statement to see the exmaple.
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCq4sLknNVfDMS9ZTitWB8YFcJJ4CjOuRX1zgXpRfWoAQ8Q8OcA/yDw3w9HPGVAI0MhYA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]), #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}), #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each [Column1]), #"Renamed Columns" = Table.RenameColumns(#"Added Custom",{{"Column1", "Original"}, {"Custom", "Normalized"}}), #"Lowercased Text" = Table.TransformColumns(#"Renamed Columns",{{"Normalized", Text.Lower}}), fnReplaceSuffix = (inputText, suffixToRemove, suffixToAdd) => if Text.EndsWith(inputText,suffixToRemove) then Text.ReplaceRange(inputText,Text.Length(inputText)-Text.Length(suffixToRemove),Text.Length(suffixToRemove),suffixToAdd) else inputText, RemoveDots = Table.TransformColumns(#"Lowercased Text",{{"Normalized", each fnReplaceSuffix(_,".","")}}), HandleInc = Table.TransformColumns(RemoveDots, {{"Normalized", each fnReplaceSuffix(_, "inc", " inc")}}), #"Capitalized Each Word" = Table.TransformColumns(HandleInc,{{"Normalized", Text.Proper}}) in #"Capitalized Each Word"
So, while the M statement may be complicated, you do get a much simplifier model.
Hello Data--yes, if the normalizations are predictable. I should have been clearer that some systems have hundreds of facilities...and many of the transforms aren't cleanup of an entry but using the appropriate one.
CHS/Community Health System
Community Health System
CHS
All should become Community Health System. Hence it's a custom thing rather than a transform. Can you find/lookup and substitute a proper name during the query?
Thanks,
Tom
So, create a function and then create a new column that calls that function with the value that you want to lookup and have the function return the correction. Here is an example from an upcoming blog post on Hex to Decimal conversion. You should be able to use the same basic process. Note, your list of lookups and corrections could come from a data source such as a SQL table instead of the list that I created manually in "M" code:
let fnHex2Dec = (input) => let values = { {"0", 0}, {"1", 1}, {"2", 2}, {"3", 3}, {"4", 4}, {"5", 5}, {"6", 6}, {"7", 7}, {"8", 8}, {"9", 9}, {"A", 10}, {"B", 11}, {"C", 12}, {"D", 13}, {"E", 14}, {"F", 15} }, Result = Value.ReplaceType({List.First(List.Select(values, each _{0}=input)){1}},type {number}) in Result in fnHex2Dec
This function takes a single input parameter, a single text character and translates it to a decimal equivalent. You can test this function by clicking the "Invoke" button in the Power Query Editor window. Be sure to enter a single value preceded by a single quote, such as 'A. The single quote forces the input to be recognized as text. This ensures that if you enter a 7, that it is recognized as text instead of a number.
One other note, you would have to add some logic for lookups that were correct, they would not be found so probably get a null result back or have to handle the error so you would have to wrap your call to this function in an "if" statement in the column code and if an error or null comes back, just use the original value. If I have some time, I'll play with it a little and see if I can get you even closer.
@Greg_Deckler this is very promising. I have these in an excel file in a folder--or in a table in the model. The mistakes are quite consistent for each provider--generally done by the software that submits the data to cms and the "system name" seldom changes. I will have an excel file that someone that watches these things can update with a new "error".
@Sean your land records watchout is a good one. This use-case is to simplify a menu/slicer of system names to select from so all the facilities from one system should be collapsed under one "commonly expected" system name. The "real" data we're very aware is what it is, so we're good in this particular situation. Thanks for thinking of that.
Also, Sean, perhaps 1/3->1/2 of facilities are in a system. And only 5% of those have abnormal system names. Does your idea work if you make a relationship between entered system name (Many to 1) and corrections just on the system name. If the relationship doesn't return an entry it means there isn't a correction---can I then just use the entered name like in smoupre's suggestion?
Tom
you can of course get the Name entered instead of "Unknown" if there's no value in the Lookup table...
If the mistakes are consistent though as you say then you may not need this...
@Sean--I'll try your solution in a different configuration in the am. First more context:
~6000 providers
~2500 in 500 systems
~700 of the 2500 have inconsistent names entered.
There are ~150 distinct inconsistent names resolving to 50 properly configured consistent names
Each year providers change systems (acquired) or systems themselves are acquired---it's not helpful to tie providers to a correct name, RATHER it's key to tie misspells (Hospital Grup) or inconsistencies (CHS should be Community Health Systems) to the proper names. That list is readily maintained without examining every provider every year--a slicer will have some stray names to be fixed.
So---I'll try your solution looking to relate the inconsistent name in the provider table--to the inconsistent name in the corrections table--and substitute the related consistent value. That is a many to ONE relationship like you're suggesting. That's what I was trying to do unsuccessfully. I will need to cover for no record found.
@Greg_Deckler looks like you're building an M version of a switch function via a table "injection" of the switch values. That's exactly what I want to do. I'd love to know how to have the table feed your function values--perhaps you can give me a little nudge on how to feed the function. Actually I've not heard of building a function except through create column or table, so this is very intriguing and very useful.
Anyway, that's why this language is so confounding...handling text is still very foreign to me. Let me know what you all think and Thank you...
Tom
@ThomasDay - Hey Thomas, I just created this in another thread:
Here is an example of a lookup function that uses a database table as a source of its information:
let fnHex2DecFromDB = (input) => let Source = Sql.Database("MySQLServer", "MyDatabase"), dbo_Hex2Dec = Source{[Schema="dbo",Item="Hex2Dec"]}[Data], Record = Table.First(Table.SelectRows(dbo_Hex2Dec, each [Hex] = input)), Result = Record.Field(Record,"Dec") in Result in fnHex2DecFromDB
In the example above, "MySQLServer" is the name of the SQL Server, "MyDatabase" is the name of the database. In that database on that server I have a table called "Hex2Dec" with two columns named "Hex" and "Dec". The rows are what you would imagine them to be for hex to decimial conversion. Basically, the premise here should work for any lookup situation that has unique lookup values. To use this, just create a new column with the formula:
=fnHex2DecFromDB([TextColumn])
Thank you very much @Greg_Deckler I'm traveling (today)/tomorrow and look forward to building it into my model when I get back. It will be a real assset! Several transformations have been done by a SQL coding friend so I have SQL Express on my computer--so I'm sure I can load the table there.
Thank you again,
Tom
Cool @ThomasDay, let me know how it goes! I am actually in the process of creating a blog article on this very topic demonstrating lookups in Power BI using DAX as well as M, etc.
So, how about using it to fix an existing column of data. For each system name (skip blanks)--would I call the function--check if it has an entry in the conversion table and substitute. I'm sure that's not quite it's standard use case--but a logical extension for ETL work. Perhaps create a new column from the existing column? Not quite sure what the wrapper to the function would be in that case.
Let me know what you think about that and thanks,
Tom
I would create a new column and do something like the following (this is psuedo-code)
= if [System] <> "" then if fnLookup([System]) = "" then [System] else fnLookup([System]) else ""
Again, that's psuedo code, you may have to break that if statement up into two columns not even sure you can do nested if's in M, will have to try that. In any case, you're getting ahead of me, I'm still working on the blog article and I have this exact scenario in mind for solving in DAX as well as M.
thinking this "lookup"-function already exists in M: Merge (Table.Join)
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
@ImkeF True to a point, but in my opinion, Merge does not cover all of the possible use cases because you don't have complete and utter control over the "lookup". Essentially, Merge suffers from the same limitations as you have when doing simple relationships in Power BI Desktop.
I think I don't understand what you mean here. Will you cover that in your upcoming article?
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Yes, it will be in there. The basic concept is, if you do it via relationships or Table.Join, you effectively have to have all matching values, even for "good" values. Think of the instance where you have a list of known data errors that you wish to clean up and replace. Are there methods for handling that? Sure, it can be done but it gets messy with relationships and Table.Join(s). One method is to have translations for "good" values to that same "good" value. Or, there is the scenario where there are exceptions, so you want to replace this with that but not in this particular circumstance. Again, are there more than one way to solve that? Sure, but again it can again get very messy. The article I am writing deals with all of those different scenarios and presents about half a dozen or more ways of solving lookups along with all of the pros and cons.
Yes, moving towards "fuzzy lookup" this is the way to go. Looking forward your approaches here.
But I'd expect it costing performance. So for unconditional joins expecting that Table.Join performs much faster.
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
For the ~5-6K rows of this example, I used @Sean's DAX solution and made the relationship "fuzzy" coupled with @DataChants ideas of stripping consistent known issues out ahead of time (in DAX).
SystemInfoByRept = SELECTCOLUMNS( FILTER(2014_ALPHA, [Text] = "SearchString"), "ReptRecNo",2014_ALPHA[ReptRecNo], "SystemName",TRIM(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(2014_ALPHA[Text],"NAME:",""),".",""),"INC",""),"LLC","")))
UniformSystemName = If(ISBLANK (RELATED(SystemNameCorrections[ImproperSystemNames])), SystemInfoByRept[HospitalSystemName],RELATED(SystemNameCorrections[Corrected System Name]) )
There's a lot of trickery in the syntax here, so many thanks to @Sean for being the pattern guy on this. RELATED is used in to see if a record in the ImproperSystemNames exists AND in fetching the Corrected System Name. And all of this in a Column create!!
Tom
I have an entire blog article devoted to lookups in staging area of the Community Blog if you have access to it.
Very similar DAX to yours, in fact, I think your ISBLANK is a little cleaner. The blog article covers both DAX and M lookups.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
104 | |
101 | |
79 | |
72 | |
64 |
User | Count |
---|---|
142 | |
108 | |
101 | |
81 | |
74 |