cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
Anonymous
Not applicable

Data Cleansing Advice Needed

I have a number of data fields that I am trying to cleanse and am trying to find the most efficient way to go about this. Please see below sample of data: 

 

Machine Names:
Acritech Hole Popper ED-2000M
Acritech Hole Popper ED-2000M2
Acritech Hole Popper ED-2000MR
Acritech Hole Popper ED-2500M
Dener Press Brake Elite-XL-175-3.6
Dener Press Brake Elite-XL-320-4
Gantry Laser 25LMX
Gantry Laser 25LMXII
Gantry Laser 30LMXIII
Gantry Laser 35LMXRII
Gantry Laser 40LMXVII
Ingersoll Sinker EDM Eagle 500
Ingersoll Sinker EDM Gantry 1500HSJ
Ingersoll Sinker EDM Gantry 500
Ingersoll Sinker EDM Gantry 500HSJ
Ingersoll Sinker EDM Gantry 800

 

Goal: To be able to extract a specific portion of the machine name (EX: Acritech Hole Popper / Dener Press Brake / Gantry Laser / Ingersoll Sinker EDM) from a list simular to the sample data above. 

 

Problem: I am struggling on finding the most efficient way of extracting a certain portion of the machine name because there are hundreds of names like this and not all of the companies have the same number of characters or any delimiters that I can extract from. 

 

How would you go about this? Calculated column with hundreds of IF statements? 

 

Any guidance that you can share will be greatly appreciated!

 

Thank you datanauts!

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
Memorable Member
Memorable Member

Re: Data Cleansing Advice Needed

@Anonymous,

 

You can solve it in below way:

 

1. As because there is no common logic between rows then you need to create a dictionary by yourself. So, create a table called 'Dictionary' which will store all correct machine names as below:

dictionary for machine.PNG 

(example - link)

 

2. Create a new blank query and call it fnFuzzyLookup. Then paste below Power Query code there:

 

(Input as text) => 
let 
    ListCount = List.Count(Dictionary[Brand]),  
    Result = List.Generate( () => 
                            [
                                i = 0,
                                t = Dictionary[Brand]{i},
                                k = if Text.Middle(Input,0,Text.Length(t)) = t then 1 else 0
                             ],
                             each (if ([i] < ListCount and [k] < 2) then true else false),
                             each [
                                        i = [i] + 1,
                                        t = Dictionary[Brand]{i},
                                        k = if [k] = 1 then 2 else (if (Text.Middle(Input,0,Text.Length(t)) = t) then 1 else 0)  
                                    ],
                             each (if [k] = 0 then "" else [t])
    )
in
    List.Last(Result)

3. Go to the main table (query) and then Add column -> Invoke custom function:

 fuzzy lookup.PNG

fuzzy lookup result.PNG

Regards,
Ruslan
-------------------------------------------------------------------
Did I answer your question? Mark my post as a solution!

View solution in original post

9 REPLIES 9
Highlighted

Re: Data Cleansing Advice Needed

If there is no repeatable rule, then there is no getting around the fact that you will need to Interveine manually at some point. I suggest, create a list of all the unique entries in Excel. Add a column with the result you want. Load the table to power query and merge it on the description. Then extract the new column you need. You can also create an audit query that shows which items are missing values. 



* Matt is a Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.
Highlighted
Super User III
Super User III

Re: Data Cleansing Advice Needed

As @MattAllington said.

 

These steps might help you preparing that list, as they remove all strings with numbers in them:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jY/BCoMwEER/ZfHclBi17bVFqRYFUSiCeAgSrBiiJF76941iT9XG6743s0xZWtdatiOrXxD2nEHaDwOTEPiIYIwTqzoYDGJWsv+K9/3jM6FPqWRKwU3SjkHAdQQVMbLPHnKOJ5PmEIzcWbpTMco3xFRpl3hxUmyco+gXOHgGa2SKZGvEnTLPhUSiYVL1nEPeim7emUBAGz1cr91WlkZbS2H+MHp7unZWXaaq6gM=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"Machine Names:" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Machine Names:", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "TextToList", each Text.Split([#"Machine Names:"], " ")),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "StringToListOfCharacters", each List.Transform([TextToList], (x) => Text.ToList(x))),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "SelectStringsWithoutNumbers", each List.Select([StringToListOfCharacters], (x) => not List.ContainsAny(x, {"0".."9"}))),
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Reassemble", each Text.Combine(List.Transform([SelectStringsWithoutNumbers], (x) => Text.Combine(x, "")), " ")),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom3",{"Machine Names:", "Reassemble"})
in
    #"Removed Other Columns"

  and you'll get a table like this:

image.png

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Highlighted
Super User III
Super User III

Re: Data Cleansing Advice Needed

@Anonymous

 

Seems we can also simply split by space/delimiter ...by splitting at the right most Space/Delimiter

 

solit.png

 

 

Try my new Power BI game Cross the River
Highlighted
Super User III
Super User III

Re: Data Cleansing Advice Needed

Indeed, that's much easier for the sample given. 

... But will only work if there is just one string that contains numbers 😉

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Highlighted
Memorable Member
Memorable Member

Re: Data Cleansing Advice Needed

@Anonymous,

 

You can solve it in below way:

 

1. As because there is no common logic between rows then you need to create a dictionary by yourself. So, create a table called 'Dictionary' which will store all correct machine names as below:

dictionary for machine.PNG 

(example - link)

 

2. Create a new blank query and call it fnFuzzyLookup. Then paste below Power Query code there:

 

(Input as text) => 
let 
    ListCount = List.Count(Dictionary[Brand]),  
    Result = List.Generate( () => 
                            [
                                i = 0,
                                t = Dictionary[Brand]{i},
                                k = if Text.Middle(Input,0,Text.Length(t)) = t then 1 else 0
                             ],
                             each (if ([i] < ListCount and [k] < 2) then true else false),
                             each [
                                        i = [i] + 1,
                                        t = Dictionary[Brand]{i},
                                        k = if [k] = 1 then 2 else (if (Text.Middle(Input,0,Text.Length(t)) = t) then 1 else 0)  
                                    ],
                             each (if [k] = 0 then "" else [t])
    )
in
    List.Last(Result)

3. Go to the main table (query) and then Add column -> Invoke custom function:

 fuzzy lookup.PNG

fuzzy lookup result.PNG

Regards,
Ruslan
-------------------------------------------------------------------
Did I answer your question? Mark my post as a solution!

View solution in original post

Highlighted
Anonymous
Not applicable

Re: Data Cleansing Advice Needed

@zoloturu absolutely stunning! There was 100% chance that I could not have come up with this on my own! So much gratitude! Thank you!

 

Question for you, is there a way for me to add a further modification to this Power Query? Everything worked out great but I have a few fields that are null/blank and I have custom names that I would like to fill in. 


Ex: Any descr named "Gantry Laser" be updated to = "Tanaka Gantry Laser"? Please see screenshot below.

 

My hope is to fill the blank/null values with names that I will have the option to custom. 

 

Any guidance will be greatly appreciated!

InvokeCustomColumn - Fill Null Values.png

 

 

 

Highlighted
Memorable Member
Memorable Member

Re: Data Cleansing Advice Needed

@Anonymous,

 

If I understood you correctly then below approach can help you:

 

1. Go to the table where you plan to add a column.

2. Menu -> Add column -> Custom column

3. Enter a condition similar to below

 

if Text.Middle([NewColumn],0,12) = "Gantry Laser" then "Tanaka " & [NewColumn] else [NewColumn]

left replace.PNG

 

(basically, check if the left side is 'Gantry Laser' then add 'Tanaka' to it or you can use a function Text.Replace)

 

Text.Middle function reference - https://docs.microsoft.com/en-us/powerquery-m/text-middle 

Text.Replace - https://docs.microsoft.com/en-us/powerquery-m/text-replace

 

Regards,
Ruslan
-------------------------------------------------------------------
Did I answer your question? Mark my post as a solution!

Highlighted
Anonymous
Not applicable

Re: Data Cleansing Advice Needed

So much gratitude to you! THANK YOU!

Highlighted
Memorable Member
Memorable Member

Re: Data Cleansing Advice Needed

@Anonymous,

 

You are welcome.

 

Regards,
Ruslan

Helpful resources

Announcements
Community Blog

Community Blog

Visit our Community Blog for articles, guides, and information created by fellow community members.

Using the Community

Using the Community

Need help with the Power BI Community? Our 'Using the Community' support articles are a great place to start.

Community Summit North America

Community Summit North America

Innovate, Collaborate, Grow. The top training and networking event across the globe for Microsoft Business Applications

Power Platform 2020 release wave 2 plan

Power Platform 2020 release wave 2 plan

Features releasing from October 2020 through March 2021

Top Solution Authors
Top Kudoed Authors