cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
win_toeknee Member
Member

Data Cleansing Advice Needed

I have a number of data fields that I am trying to cleanse and am trying to find the most efficient way to go about this. Please see below sample of data: 

 

Machine Names:
Acritech Hole Popper ED-2000M
Acritech Hole Popper ED-2000M2
Acritech Hole Popper ED-2000MR
Acritech Hole Popper ED-2500M
Dener Press Brake Elite-XL-175-3.6
Dener Press Brake Elite-XL-320-4
Gantry Laser 25LMX
Gantry Laser 25LMXII
Gantry Laser 30LMXIII
Gantry Laser 35LMXRII
Gantry Laser 40LMXVII
Ingersoll Sinker EDM Eagle 500
Ingersoll Sinker EDM Gantry 1500HSJ
Ingersoll Sinker EDM Gantry 500
Ingersoll Sinker EDM Gantry 500HSJ
Ingersoll Sinker EDM Gantry 800

 

Goal: To be able to extract a specific portion of the machine name (EX: Acritech Hole Popper / Dener Press Brake / Gantry Laser / Ingersoll Sinker EDM) from a list simular to the sample data above. 

 

Problem: I am struggling on finding the most efficient way of extracting a certain portion of the machine name because there are hundreds of names like this and not all of the companies have the same number of characters or any delimiters that I can extract from. 

 

How would you go about this? Calculated column with hundreds of IF statements? 

 

Any guidance that you can share will be greatly appreciated!

 

Thank you datanauts!

1 ACCEPTED SOLUTION

Accepted Solutions
zoloturu
Advisor

Re: Data Cleansing Advice Needed

@win_toeknee,

 

You can solve it in below way:

 

1. As because there is no common logic between rows then you need to create a dictionary by yourself. So, create a table called 'Dictionary' which will store all correct machine names as below:

dictionary for machine.PNG 

(example - link)

 

2. Create a new blank query and call it fnFuzzyLookup. Then paste below Power Query code there:

 

(Input as text) => 
let 
    ListCount = List.Count(Dictionary[Brand]),  
    Result = List.Generate( () => 
                            [
                                i = 0,
                                t = Dictionary[Brand]{i},
                                k = if Text.Middle(Input,0,Text.Length(t)) = t then 1 else 0
                             ],
                             each (if ([i] < ListCount and [k] < 2) then true else false),
                             each [
                                        i = [i] + 1,
                                        t = Dictionary[Brand]{i},
                                        k = if [k] = 1 then 2 else (if (Text.Middle(Input,0,Text.Length(t)) = t) then 1 else 0)  
                                    ],
                             each (if [k] = 0 then "" else [t])
    )
in
    List.Last(Result)

3. Go to the main table (query) and then Add column -> Invoke custom function:

 fuzzy lookup.PNG

fuzzy lookup result.PNG

Regards,
Ruslan
-------------------------------------------------------------------
Did I answer your question? Mark my post as a solution!

9 REPLIES 9
Super User
Super User

Re: Data Cleansing Advice Needed

If there is no repeatable rule, then there is no getting around the fact that you will need to Interveine manually at some point. I suggest, create a list of all the unique entries in Excel. Add a column with the result you want. Load the table to power query and merge it on the description. Then extract the new column you need. You can also create an audit query that shows which items are missing values. 

* Matt is a Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.
Super User
Super User

Re: Data Cleansing Advice Needed

As @MattAllington said.

 

These steps might help you preparing that list, as they remove all strings with numbers in them:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jY/BCoMwEER/ZfHclBi17bVFqRYFUSiCeAgSrBiiJF76941iT9XG6743s0xZWtdatiOrXxD2nEHaDwOTEPiIYIwTqzoYDGJWsv+K9/3jM6FPqWRKwU3SjkHAdQQVMbLPHnKOJ5PmEIzcWbpTMco3xFRpl3hxUmyco+gXOHgGa2SKZGvEnTLPhUSiYVL1nEPeim7emUBAGz1cr91WlkZbS2H+MHp7unZWXaaq6gM=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"Machine Names:" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Machine Names:", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "TextToList", each Text.Split([#"Machine Names:"], " ")),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "StringToListOfCharacters", each List.Transform([TextToList], (x) => Text.ToList(x))),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "SelectStringsWithoutNumbers", each List.Select([StringToListOfCharacters], (x) => not List.ContainsAny(x, {"0".."9"}))),
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Reassemble", each Text.Combine(List.Transform([SelectStringsWithoutNumbers], (x) => Text.Combine(x, "")), " ")),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom3",{"Machine Names:", "Reassemble"})
in
    #"Removed Other Columns"

  and you'll get a table like this:

image.png

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries




Highlighted
Super User
Super User

Re: Data Cleansing Advice Needed

@win_toeknee

 

Seems we can also simply split by space/delimiter ...by splitting at the right most Space/Delimiter

 

solit.png

 

 

Super User
Super User

Re: Data Cleansing Advice Needed

Indeed, that's much easier for the sample given. 

... But will only work if there is just one string that contains numbers Smiley Wink

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries




zoloturu
Advisor

Re: Data Cleansing Advice Needed

@win_toeknee,

 

You can solve it in below way:

 

1. As because there is no common logic between rows then you need to create a dictionary by yourself. So, create a table called 'Dictionary' which will store all correct machine names as below:

dictionary for machine.PNG 

(example - link)

 

2. Create a new blank query and call it fnFuzzyLookup. Then paste below Power Query code there:

 

(Input as text) => 
let 
    ListCount = List.Count(Dictionary[Brand]),  
    Result = List.Generate( () => 
                            [
                                i = 0,
                                t = Dictionary[Brand]{i},
                                k = if Text.Middle(Input,0,Text.Length(t)) = t then 1 else 0
                             ],
                             each (if ([i] < ListCount and [k] < 2) then true else false),
                             each [
                                        i = [i] + 1,
                                        t = Dictionary[Brand]{i},
                                        k = if [k] = 1 then 2 else (if (Text.Middle(Input,0,Text.Length(t)) = t) then 1 else 0)  
                                    ],
                             each (if [k] = 0 then "" else [t])
    )
in
    List.Last(Result)

3. Go to the main table (query) and then Add column -> Invoke custom function:

 fuzzy lookup.PNG

fuzzy lookup result.PNG

Regards,
Ruslan
-------------------------------------------------------------------
Did I answer your question? Mark my post as a solution!

win_toeknee Member
Member

Re: Data Cleansing Advice Needed

@zoloturu absolutely stunning! There was 100% chance that I could not have come up with this on my own! So much gratitude! Thank you!

 

Question for you, is there a way for me to add a further modification to this Power Query? Everything worked out great but I have a few fields that are null/blank and I have custom names that I would like to fill in. 


Ex: Any descr named "Gantry Laser" be updated to = "Tanaka Gantry Laser"? Please see screenshot below.

 

My hope is to fill the blank/null values with names that I will have the option to custom. 

 

Any guidance will be greatly appreciated!

InvokeCustomColumn - Fill Null Values.png

 

 

 

zoloturu
Advisor

Re: Data Cleansing Advice Needed

@win_toeknee,

 

If I understood you correctly then below approach can help you:

 

1. Go to the table where you plan to add a column.

2. Menu -> Add column -> Custom column

3. Enter a condition similar to below

 

if Text.Middle([NewColumn],0,12) = "Gantry Laser" then "Tanaka " & [NewColumn] else [NewColumn]

left replace.PNG

 

(basically, check if the left side is 'Gantry Laser' then add 'Tanaka' to it or you can use a function Text.Replace)

 

Text.Middle function reference - https://docs.microsoft.com/en-us/powerquery-m/text-middle 

Text.Replace - https://docs.microsoft.com/en-us/powerquery-m/text-replace

 

Regards,
Ruslan
-------------------------------------------------------------------
Did I answer your question? Mark my post as a solution!

win_toeknee Member
Member

Re: Data Cleansing Advice Needed

So much gratitude to you! THANK YOU!

zoloturu
Advisor

Re: Data Cleansing Advice Needed

@win_toeknee,

 

You are welcome.

 

Regards,
Ruslan