Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Data Cleansing Advice Needed

I have a number of data fields that I am trying to cleanse and am trying to find the most efficient way to go about this. Please see below sample of data: 

 

Machine Names:
Acritech Hole Popper ED-2000M
Acritech Hole Popper ED-2000M2
Acritech Hole Popper ED-2000MR
Acritech Hole Popper ED-2500M
Dener Press Brake Elite-XL-175-3.6
Dener Press Brake Elite-XL-320-4
Gantry Laser 25LMX
Gantry Laser 25LMXII
Gantry Laser 30LMXIII
Gantry Laser 35LMXRII
Gantry Laser 40LMXVII
Ingersoll Sinker EDM Eagle 500
Ingersoll Sinker EDM Gantry 1500HSJ
Ingersoll Sinker EDM Gantry 500
Ingersoll Sinker EDM Gantry 500HSJ
Ingersoll Sinker EDM Gantry 800

 

Goal: To be able to extract a specific portion of the machine name (EX: Acritech Hole Popper / Dener Press Brake / Gantry Laser / Ingersoll Sinker EDM) from a list simular to the sample data above. 

 

Problem: I am struggling on finding the most efficient way of extracting a certain portion of the machine name because there are hundreds of names like this and not all of the companies have the same number of characters or any delimiters that I can extract from. 

 

How would you go about this? Calculated column with hundreds of IF statements? 

 

Any guidance that you can share will be greatly appreciated!

 

Thank you datanauts!

1 ACCEPTED SOLUTION
zoloturu
Memorable Member
Memorable Member

@Anonymous,

 

You can solve it in below way:

 

1. As because there is no common logic between rows then you need to create a dictionary by yourself. So, create a table called 'Dictionary' which will store all correct machine names as below:

dictionary for machine.PNG 

(example - link)

 

2. Create a new blank query and call it fnFuzzyLookup. Then paste below Power Query code there:

 

(Input as text) => 
let 
    ListCount = List.Count(Dictionary[Brand]),  
    Result = List.Generate( () => 
                            [
                                i = 0,
                                t = Dictionary[Brand]{i},
                                k = if Text.Middle(Input,0,Text.Length(t)) = t then 1 else 0
                             ],
                             each (if ([i] < ListCount and [k] < 2) then true else false),
                             each [
                                        i = [i] + 1,
                                        t = Dictionary[Brand]{i},
                                        k = if [k] = 1 then 2 else (if (Text.Middle(Input,0,Text.Length(t)) = t) then 1 else 0)  
                                    ],
                             each (if [k] = 0 then "" else [t])
    )
in
    List.Last(Result)

3. Go to the main table (query) and then Add column -> Invoke custom function:

 fuzzy lookup.PNG

fuzzy lookup result.PNG

Regards,
Ruslan
-------------------------------------------------------------------
Did I answer your question? Mark my post as a solution!

View solution in original post

9 REPLIES 9
zoloturu
Memorable Member
Memorable Member

@Anonymous,

 

You can solve it in below way:

 

1. As because there is no common logic between rows then you need to create a dictionary by yourself. So, create a table called 'Dictionary' which will store all correct machine names as below:

dictionary for machine.PNG 

(example - link)

 

2. Create a new blank query and call it fnFuzzyLookup. Then paste below Power Query code there:

 

(Input as text) => 
let 
    ListCount = List.Count(Dictionary[Brand]),  
    Result = List.Generate( () => 
                            [
                                i = 0,
                                t = Dictionary[Brand]{i},
                                k = if Text.Middle(Input,0,Text.Length(t)) = t then 1 else 0
                             ],
                             each (if ([i] < ListCount and [k] < 2) then true else false),
                             each [
                                        i = [i] + 1,
                                        t = Dictionary[Brand]{i},
                                        k = if [k] = 1 then 2 else (if (Text.Middle(Input,0,Text.Length(t)) = t) then 1 else 0)  
                                    ],
                             each (if [k] = 0 then "" else [t])
    )
in
    List.Last(Result)

3. Go to the main table (query) and then Add column -> Invoke custom function:

 fuzzy lookup.PNG

fuzzy lookup result.PNG

Regards,
Ruslan
-------------------------------------------------------------------
Did I answer your question? Mark my post as a solution!

Anonymous
Not applicable

@zoloturu absolutely stunning! There was 100% chance that I could not have come up with this on my own! So much gratitude! Thank you!

 

Question for you, is there a way for me to add a further modification to this Power Query? Everything worked out great but I have a few fields that are null/blank and I have custom names that I would like to fill in. 


Ex: Any descr named "Gantry Laser" be updated to = "Tanaka Gantry Laser"? Please see screenshot below.

 

My hope is to fill the blank/null values with names that I will have the option to custom. 

 

Any guidance will be greatly appreciated!

InvokeCustomColumn - Fill Null Values.png

 

 

 

@Anonymous,

 

If I understood you correctly then below approach can help you:

 

1. Go to the table where you plan to add a column.

2. Menu -> Add column -> Custom column

3. Enter a condition similar to below

 

if Text.Middle([NewColumn],0,12) = "Gantry Laser" then "Tanaka " & [NewColumn] else [NewColumn]

left replace.PNG

 

(basically, check if the left side is 'Gantry Laser' then add 'Tanaka' to it or you can use a function Text.Replace)

 

Text.Middle function reference - https://docs.microsoft.com/en-us/powerquery-m/text-middle 

Text.Replace - https://docs.microsoft.com/en-us/powerquery-m/text-replace

 

Regards,
Ruslan
-------------------------------------------------------------------
Did I answer your question? Mark my post as a solution!

Anonymous
Not applicable

So much gratitude to you! THANK YOU!

@Anonymous,

 

You are welcome.

 

Regards,
Ruslan

If there is no repeatable rule, then there is no getting around the fact that you will need to Interveine manually at some point. I suggest, create a list of all the unique entries in Excel. Add a column with the result you want. Load the table to power query and merge it on the description. Then extract the new column you need. You can also create an audit query that shows which items are missing values. 



* Matt is an 8 times Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.

As @MattAllington said.

 

These steps might help you preparing that list, as they remove all strings with numbers in them:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jY/BCoMwEER/ZfHclBi17bVFqRYFUSiCeAgSrBiiJF76941iT9XG6743s0xZWtdatiOrXxD2nEHaDwOTEPiIYIwTqzoYDGJWsv+K9/3jM6FPqWRKwU3SjkHAdQQVMbLPHnKOJ5PmEIzcWbpTMco3xFRpl3hxUmyco+gXOHgGa2SKZGvEnTLPhUSiYVL1nEPeim7emUBAGz1cr91WlkZbS2H+MHp7unZWXaaq6gM=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"Machine Names:" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Machine Names:", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "TextToList", each Text.Split([#"Machine Names:"], " ")),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "StringToListOfCharacters", each List.Transform([TextToList], (x) => Text.ToList(x))),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "SelectStringsWithoutNumbers", each List.Select([StringToListOfCharacters], (x) => not List.ContainsAny(x, {"0".."9"}))),
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Reassemble", each Text.Combine(List.Transform([SelectStringsWithoutNumbers], (x) => Text.Combine(x, "")), " ")),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom3",{"Machine Names:", "Reassemble"})
in
    #"Removed Other Columns"

  and you'll get a table like this:

image.png

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

@Anonymous

 

Seems we can also simply split by space/delimiter ...by splitting at the right most Space/Delimiter

 

solit.png

 

 


Regards
Zubair

Please try my custom visuals

Indeed, that's much easier for the sample given. 

... But will only work if there is just one string that contains numbers 😉

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.