Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
hannahemeredith
Regular Visitor

How to weed out bad Zip Codes

I have 87000+ responses from a survey and one of the questions was zip codes.  I'm trying to filter through the zip codes and get rid of answers like "234", "V3g1s4", "no thanks", etc.  Using the filter featers works for getting rid of blanks, but for this many responses, how can I get rid of invalid answers efficiently?  I'm looking for a method to filter out any non-numeric responses and any responses less than or greater than 5 digits.

 

On a side note, if anyone knows of the best mapping app to use within PowerBi, please share! I'm working with several different features.  ArcGis cannot display the amount of zip codes I have.

1 ACCEPTED SOLUTION
Greg_Deckler
Super User
Super User

Perhaps:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjE2MDNRitWJVjIyhtBlxumGxRBmXr5CSUZiXnYxmAdWamqmFBsLAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Int32.From([Column1])),
    #"Removed Errors" = Table.RemoveRowsWithErrors(#"Added Custom", {"Custom"}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Removed Errors", "Custom", "Custom - Copy"),
    #"Calculated Text Length" = Table.TransformColumns(Table.TransformColumnTypes(#"Duplicated Column", {{"Custom - Copy", type text}}, "en-US"),{{"Custom - Copy", Text.Length, Int64.Type}}),
    #"Filtered Rows" = Table.SelectRows(#"Calculated Text Length", each ([#"Custom - Copy"] = 5)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Custom", "Custom - Copy"})
in
    #"Removed Columns"

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...

View solution in original post

5 REPLIES 5
synergised
Resolver II
Resolver II

If you are wanting to validate the zipcodes you might need to take it a step futher.  Basic validation, about the only thing you can check is if the zipcode is numeric or not (USA Only).  There are regex for zipcodes - but we didn't have much luck with them since we only have USA data - although it might help if you are getting USA and Canada zipcodes coming in. 

 

We require a 5 digit zipcode (USA Only) to be entered on our microsites.  The main goto entry for spam leads is "12345".. which is a valid zipcode (Schenectady,   NY) - so we have to use other criteria to weed out the spam leads from this region.  Point is - if you have 5 digits - it might look ok - but it could still be an invalid zipcode.

 

We resorted to buying zipcode data from:  https://www.zip-codes.com/  which gets updated monthly.  This has the added benefit of providing us additional data:  County, City, State, Longitude, Latitude, AreaCodes, etc.  We have a data cleanup step before loading the data into Power BI.  With this data, we can use a SQL join to determine if we have an invalid zipcode or not (with exceptions).  

 

 

 

Greg_Deckler
Super User
Super User

Perhaps:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjE2MDNRitWJVjIyhtBlxumGxRBmXr5CSUZiXnYxmAdWamqmFBsLAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Int32.From([Column1])),
    #"Removed Errors" = Table.RemoveRowsWithErrors(#"Added Custom", {"Custom"}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Removed Errors", "Custom", "Custom - Copy"),
    #"Calculated Text Length" = Table.TransformColumns(Table.TransformColumnTypes(#"Duplicated Column", {{"Custom - Copy", type text}}, "en-US"),{{"Custom - Copy", Text.Length, Int64.Type}}),
    #"Filtered Rows" = Table.SelectRows(#"Calculated Text Length", each ([#"Custom - Copy"] = 5)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Custom", "Custom - Copy"})
in
    #"Removed Columns"

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...

Okay, I'm a little illiterate when it comes to syntax. I tried pasting the code and it gave me this:

 

Capture.PNG

Yeah, can't do that, I started with an Enter Data query. Create a blank query and paste in my code. Basically, what I did was to create a custom column with this formula:

 

=Int32.From([Column1])

 

This returns a number if it is a number or Error if not. I then right-clicked this column and chose "Remove Errors". Then you could select this column and choose Transform | Extract | Length. Then filter to just 5. Finally, just remove this column. 

 

You should be left with your original column with just valid zip codes.

 

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...

Amazing! Thank you so much! Smiley Very Happy

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.