cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
hannahemeredith Frequent Visitor
Frequent Visitor

How to weed out bad Zip Codes

I have 87000+ responses from a survey and one of the questions was zip codes.  I'm trying to filter through the zip codes and get rid of answers like "234", "V3g1s4", "no thanks", etc.  Using the filter featers works for getting rid of blanks, but for this many responses, how can I get rid of invalid answers efficiently?  I'm looking for a method to filter out any non-numeric responses and any responses less than or greater than 5 digits.

 

On a side note, if anyone knows of the best mapping app to use within PowerBi, please share! I'm working with several different features.  ArcGis cannot display the amount of zip codes I have.

1 ACCEPTED SOLUTION

Accepted Solutions
Super User
Super User

Re: How to weed out bad Zip Codes

Perhaps:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjE2MDNRitWJVjIyhtBlxumGxRBmXr5CSUZiXnYxmAdWamqmFBsLAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Int32.From([Column1])),
    #"Removed Errors" = Table.RemoveRowsWithErrors(#"Added Custom", {"Custom"}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Removed Errors", "Custom", "Custom - Copy"),
    #"Calculated Text Length" = Table.TransformColumns(Table.TransformColumnTypes(#"Duplicated Column", {{"Custom - Copy", type text}}, "en-US"),{{"Custom - Copy", Text.Length, Int64.Type}}),
    #"Filtered Rows" = Table.SelectRows(#"Calculated Text Length", each ([#"Custom - Copy"] = 5)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Custom", "Custom - Copy"})
in
    #"Removed Columns"

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!


5 REPLIES 5
Super User
Super User

Re: How to weed out bad Zip Codes

Perhaps:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjE2MDNRitWJVjIyhtBlxumGxRBmXr5CSUZiXnYxmAdWamqmFBsLAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Int32.From([Column1])),
    #"Removed Errors" = Table.RemoveRowsWithErrors(#"Added Custom", {"Custom"}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Removed Errors", "Custom", "Custom - Copy"),
    #"Calculated Text Length" = Table.TransformColumns(Table.TransformColumnTypes(#"Duplicated Column", {{"Custom - Copy", type text}}, "en-US"),{{"Custom - Copy", Text.Length, Int64.Type}}),
    #"Filtered Rows" = Table.SelectRows(#"Calculated Text Length", each ([#"Custom - Copy"] = 5)),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Custom", "Custom - Copy"})
in
    #"Removed Columns"

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!


hannahemeredith Frequent Visitor
Frequent Visitor

Re: How to weed out bad Zip Codes

Okay, I'm a little illiterate when it comes to syntax. I tried pasting the code and it gave me this:

 

Capture.PNG

Highlighted
Super User
Super User

Re: How to weed out bad Zip Codes

Yeah, can't do that, I started with an Enter Data query. Create a blank query and paste in my code. Basically, what I did was to create a custom column with this formula:

 

=Int32.From([Column1])

 

This returns a number if it is a number or Error if not. I then right-clicked this column and chose "Remove Errors". Then you could select this column and choose Transform | Extract | Length. Then filter to just 5. Finally, just remove this column. 

 

You should be left with your original column with just valid zip codes.

 

 


Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!


synergised Member
Member

Re: How to weed out bad Zip Codes

If you are wanting to validate the zipcodes you might need to take it a step futher.  Basic validation, about the only thing you can check is if the zipcode is numeric or not (USA Only).  There are regex for zipcodes - but we didn't have much luck with them since we only have USA data - although it might help if you are getting USA and Canada zipcodes coming in. 

 

We require a 5 digit zipcode (USA Only) to be entered on our microsites.  The main goto entry for spam leads is "12345".. which is a valid zipcode (Schenectady,   NY) - so we have to use other criteria to weed out the spam leads from this region.  Point is - if you have 5 digits - it might look ok - but it could still be an invalid zipcode.

 

We resorted to buying zipcode data from:  https://www.zip-codes.com/  which gets updated monthly.  This has the added benefit of providing us additional data:  County, City, State, Longitude, Latitude, AreaCodes, etc.  We have a data cleanup step before loading the data into Power BI.  With this data, we can use a SQL join to determine if we have an invalid zipcode or not (with exceptions).  

 

 

 

hannahemeredith Frequent Visitor
Frequent Visitor

Re: How to weed out bad Zip Codes

Amazing! Thank you so much! Smiley Very Happy