cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
sbollmann Frequent Visitor
Frequent Visitor

How to Compare 2 Name Tables (M and F) for Duplicates and Delete Row with Lower Ranking

I've downloaded the list of top 1000 boys' and girls' names from the U.S Census in order to create a table that will help me assign genders to a list of customer names. However, I ran into an unexpected problem.  While I did expect there would be some unisex names on the list, I was shocked to discover there are several hundred - more than I can sort through manually.  For example, in 1970, Jennifer is the top female baby name but also appears on the top 1000 male baby names. There are lots and lots of other examples like this.  Thankfully, each name has a ranking (higher number ranking means more popular), so I need to write some sort of expression that evaluates a column like this: If a name exists on both the male names table and the female names table, delete the name row with the lower ranking.  How would I do this?

1 ACCEPTED SOLUTION

Accepted Solutions
KGrice Established Member
Established Member

Re: How to Compare 2 Name Tables (M and F) for Duplicates and Delete Row with Lower Ranking

I downloaded the same to see how I would do this. My wife loves baby name stats so this is double fun for me.

 

I started with top 1000 boy names from 2015 and top 1000 girl names from 2015 in two separate tables. Each table has the name, sex, and count of names given that year. From that, I made sure to sort based on count, and then added an index column to rank them 1 to 1000.

 

You might want to prefix your column names with "Girls" and "Boys" to keep them organized. Then merge the two queries with a full outer join. This will keep all rows from both sets, but where the two sets have a match, you'll only end up with one row there. For example, Harper was number 10 for girls, and number 722 for boys. They end up on the same row with a rank for each.

 

From there, add two custom columns:

 

Name = if [Girls.Name] = null then [Boys.Name] else [Girls.Name]

 

Sex = if [Girls.Sex] = null then [Boys.Sex] else if [Boys.Sex] = null then [Girls.Sex] else if [Girls.Rank] < [Boys.Rank] then [Girls.Sex] else [Boys.Sex]

 

You should end up with something like this:

 

Names.PNG

 

Remove all of the columns but the new ones, and you'll have a consolidated list with the highest-ranking M/F names kept when they are used on both lists.

1 REPLY 1
KGrice Established Member
Established Member

Re: How to Compare 2 Name Tables (M and F) for Duplicates and Delete Row with Lower Ranking

I downloaded the same to see how I would do this. My wife loves baby name stats so this is double fun for me.

 

I started with top 1000 boy names from 2015 and top 1000 girl names from 2015 in two separate tables. Each table has the name, sex, and count of names given that year. From that, I made sure to sort based on count, and then added an index column to rank them 1 to 1000.

 

You might want to prefix your column names with "Girls" and "Boys" to keep them organized. Then merge the two queries with a full outer join. This will keep all rows from both sets, but where the two sets have a match, you'll only end up with one row there. For example, Harper was number 10 for girls, and number 722 for boys. They end up on the same row with a rank for each.

 

From there, add two custom columns:

 

Name = if [Girls.Name] = null then [Boys.Name] else [Girls.Name]

 

Sex = if [Girls.Sex] = null then [Boys.Sex] else if [Boys.Sex] = null then [Girls.Sex] else if [Girls.Rank] < [Boys.Rank] then [Girls.Sex] else [Boys.Sex]

 

You should end up with something like this:

 

Names.PNG

 

Remove all of the columns but the new ones, and you'll have a consolidated list with the highest-ranking M/F names kept when they are used on both lists.