Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
JohnThomas
Helper II
Helper II

Normalizing Text in a Field

Hi all,

 

Looking for some help that seems like it should have been solved before, but I can't seem to find it. If someone can provide a link I would appreciate it!

 

Problem:

I have bunch of csv files I am importing from a folder.  One of the trasfomations I need to make is to normalize the employee's name.  I have a consistant Employee ID, but evertime there is a change in status (pay increase, job title change, department change, etc.) the admin that enters the employee's name always does it differently.  Sometimes there are comas, sometimes there two spaces between first and last name, sometimes the middle name is there, other times just the middle initial, etc.  When the Employee ID and Name are placed on a pivot table, this causes all kinds of havoc!

 

What I need:

Looking for the best way to pick the most recent transaction and change all the previous names to that name. (Yes, sometimes people get married, so the name change would be a valid one.)

 

What I have tried:

Using Power Query "Replace Values" to fix the names.  This works, but they have to be done one at a time and every week there's more corrections that have to be made.  The list of "# Fix Name 1" is getting rather long.

 

What I think might be the best solution:

  1. Create another table (tblEmployeeNames) with Employee ID and Name
  2. Delete the Name from the source data using Power Query.
  3. Set a relation in the data model to the tblEmployeeNames table.

Issue with above is I don't always know when we add employees.  Is there a way to dinamically only append new names to tblEmployeeNames?

 

Thanks in advance...

1 ACCEPTED SOLUTION
VP
MVP

There are 2 options that I used in past for similar situation.

 

Option 1:- Preferred option

  • Use Active directory as data source. Pull Employee ID and Given Name from Active directory
  • Connect Employee ID from Active directory to Employee ID from CSV file.
  • Use Given Name for Power BI interaction.

 

Option 2:-Table reference (As you suggested)

 

  1. After combining All CSV files, Create new query using first query as a reference.
  2. Remove all column except Employee ID
  3. Remove Employee ID Duplicate
  4. Compare this list to separate table where you maintain employee ID Vs Name reference.
  5. Keep Only new entry:- Load this to Power BI report to keep track of new entry. Also this list all record that you need to add it to master table
  6. Append filtered list (Only new Entry) to existing records.
  7. Use new table for name display.

 

Open to learn how others have solved in past

 

 

View solution in original post

2 REPLIES 2
VP
MVP

There are 2 options that I used in past for similar situation.

 

Option 1:- Preferred option

  • Use Active directory as data source. Pull Employee ID and Given Name from Active directory
  • Connect Employee ID from Active directory to Employee ID from CSV file.
  • Use Given Name for Power BI interaction.

 

Option 2:-Table reference (As you suggested)

 

  1. After combining All CSV files, Create new query using first query as a reference.
  2. Remove all column except Employee ID
  3. Remove Employee ID Duplicate
  4. Compare this list to separate table where you maintain employee ID Vs Name reference.
  5. Keep Only new entry:- Load this to Power BI report to keep track of new entry. Also this list all record that you need to add it to master table
  6. Append filtered list (Only new Entry) to existing records.
  7. Use new table for name display.

 

Open to learn how others have solved in past

 

 

Can you expand on Step 4?  How do I compare two tables for what's in one table and not the other (in Power Query)?

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.