Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
sbollmann
Frequent Visitor

How to Compare 2 Name Tables (M and F) for Duplicates and Delete Row with Lower Ranking

I've downloaded the list of top 1000 boys' and girls' names from the U.S Census in order to create a table that will help me assign genders to a list of customer names. However, I ran into an unexpected problem.  While I did expect there would be some unisex names on the list, I was shocked to discover there are several hundred - more than I can sort through manually.  For example, in 1970, Jennifer is the top female baby name but also appears on the top 1000 male baby names. There are lots and lots of other examples like this.  Thankfully, each name has a ranking (higher number ranking means more popular), so I need to write some sort of expression that evaluates a column like this: If a name exists on both the male names table and the female names table, delete the name row with the lower ranking.  How would I do this?

1 ACCEPTED SOLUTION
KGrice
Memorable Member
Memorable Member

I downloaded the same to see how I would do this. My wife loves baby name stats so this is double fun for me.

 

I started with top 1000 boy names from 2015 and top 1000 girl names from 2015 in two separate tables. Each table has the name, sex, and count of names given that year. From that, I made sure to sort based on count, and then added an index column to rank them 1 to 1000.

 

You might want to prefix your column names with "Girls" and "Boys" to keep them organized. Then merge the two queries with a full outer join. This will keep all rows from both sets, but where the two sets have a match, you'll only end up with one row there. For example, Harper was number 10 for girls, and number 722 for boys. They end up on the same row with a rank for each.

 

From there, add two custom columns:

 

Name = if [Girls.Name] = null then [Boys.Name] else [Girls.Name]

 

Sex = if [Girls.Sex] = null then [Boys.Sex] else if [Boys.Sex] = null then [Girls.Sex] else if [Girls.Rank] < [Boys.Rank] then [Girls.Sex] else [Boys.Sex]

 

You should end up with something like this:

 

Names.PNG

 

Remove all of the columns but the new ones, and you'll have a consolidated list with the highest-ranking M/F names kept when they are used on both lists.

View solution in original post

1 REPLY 1
KGrice
Memorable Member
Memorable Member

I downloaded the same to see how I would do this. My wife loves baby name stats so this is double fun for me.

 

I started with top 1000 boy names from 2015 and top 1000 girl names from 2015 in two separate tables. Each table has the name, sex, and count of names given that year. From that, I made sure to sort based on count, and then added an index column to rank them 1 to 1000.

 

You might want to prefix your column names with "Girls" and "Boys" to keep them organized. Then merge the two queries with a full outer join. This will keep all rows from both sets, but where the two sets have a match, you'll only end up with one row there. For example, Harper was number 10 for girls, and number 722 for boys. They end up on the same row with a rank for each.

 

From there, add two custom columns:

 

Name = if [Girls.Name] = null then [Boys.Name] else [Girls.Name]

 

Sex = if [Girls.Sex] = null then [Boys.Sex] else if [Boys.Sex] = null then [Girls.Sex] else if [Girls.Rank] < [Boys.Rank] then [Girls.Sex] else [Boys.Sex]

 

You should end up with something like this:

 

Names.PNG

 

Remove all of the columns but the new ones, and you'll have a consolidated list with the highest-ranking M/F names kept when they are used on both lists.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.