Solved: Cleaning strings effectively (power query)

Anonymous · 10-10-2019

Hi guys,

I have a list of names from two different sources, but the names are not exactly the same. Unfortunately, there is no consistent way to split a name on a delimiter and then pick only the first name and surname, because the names are inconsistent in those parts as well.

What are the possibilities in either DAX or M to clean data like this?

Example:

Source 1:
Daniel L Jones
Meredith Anne Summer
Chloe Lemaire-Trudeau
Martin van Hubert

Source 2:
Daniel Jones
Anne Summer
Chloe Lemaire
Martin van der Hubert

Anonymous · 10-11-2019

I managed to fix it by doing the following:

-Fuzzy merging (joining) the names from the two sources
-Creating IDs (an index column)
-Joining these to the fact table
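
For anyone landing here later, the three steps above could look roughly like this in M. This is only a sketch under assumptions, not the exact query used: the column names (Name1, Name2, PersonID), the tiny Fact table, and the Threshold of 0.6 are made up for illustration, and the threshold in particular needs tuning and manual checking against real data.

```powerquery
let
    // Sample names taken from the question above.
    Source1 = #table(
        type table [Name1 = text],
        {{"Daniel L Jones"}, {"Meredith Anne Summer"}, {"Chloe Lemaire-Trudeau"}, {"Martin van Hubert"}}
    ),
    Source2 = #table(
        type table [Name2 = text],
        {{"Daniel Jones"}, {"Anne Summer"}, {"Chloe Lemaire"}, {"Martin van der Hubert"}}
    ),

    // Step 1: fuzzy merge of the two name lists.
    // Threshold = 0.6 is an assumed starting value; NumberOfMatches = 1 keeps
    // at most one Source2 match per Source1 row.
    Merged = Table.FuzzyNestedJoin(
        Source1, {"Name1"},
        Source2, {"Name2"},
        "Matches",
        JoinKind.LeftOuter,
        [IgnoreCase = true, IgnoreSpace = true, NumberOfMatches = 1, Threshold = 0.6]
    ),
    Expanded = Table.ExpandTableColumn(Merged, "Matches", {"Name2"}),

    // Step 2: add an index column that serves as the ID for each matched pair.
    NameMap = Table.AddIndexColumn(Expanded, "PersonID", 1, 1),

    // Step 3: join the ID onto a fact table (hypothetical, keyed on the Source 2 name).
    Fact = #table(
        type table [Name2 = text, Amount = number],
        {{"Daniel Jones", 100}, {"Anne Summer", 250}}
    ),
    FactJoined = Table.NestedJoin(Fact, {"Name2"}, NameMap, {"Name2"}, "Map", JoinKind.LeftOuter),
    FactWithID = Table.ExpandTableColumn(FactJoined, "Map", {"PersonID"})
in
    FactWithID
```

Rows where the fuzzy merge finds no match come back with a null Name2, so it is worth reviewing the NameMap step by hand before relying on the IDs.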

v-eachen-msft · 10-10-2019

Hi @Anonymous,

I studied the examples you provided and couldn't find an inherent rule for cleaning them. As a workaround, you could use other fields (like unique IDs) to match the names.

Community Support Team _ Eads
If this post helps, please consider accepting it as the solution to help other members find it.
