Neighborhoods

Sun 17 February 2019

Figuring out which neighborhood a business is located in is not trivial. To illustrate, here are nine addresses in New York:

  • Bathtub Gin (132 9th Avenue, New York, NY 10010)
  • Beauty Bar (231 East 14th Street New York, NY 10003)
  • Botanic Lab (86 Orchard Street, New York, NY 10002)
  • City Winery (155 Varick Street, New York, NY 10013)
  • Coney Island USA (1208 Surf Ave., Brooklyn, New York 11224)
  • Dromedary Bar (266 Irving Avenue, Brooklyn, NY 11237)
  • Hill & Dale (115 Allen Street, New York, NY 10002)
  • KGB Bar (85 East 4th Street, New York, NY 10003)
  • Talon Bar (220 Wyckoff Ave, Brooklyn, New York 11237)

OpenStreetMap (OSM) was my first and most straightforward approach to mapping addresses to neighborhoods. Part of the ease comes from the Nominatim API, which makes querying OSM a trivial task. Two things become obvious immediately. First, OSM uses the spelling neighbourhood as opposed to neighborhood. Second, the concepts of suburbs and neighborhoods are hard to understand.

Location OSM Neighborhood OSM Suburb
Bathtub Gin Chelsea
Beauty Bar Park Slope BK
Botanic Lab Chinatown
City Winery Hudson Square
Coney Island USA West Brighton
Dromedary Bar Bushwick
Hill & Dale Lower East Side
KGB Bar New Dorp Staten Island
Talon Bar Ridgewood

Diving deeper into the OSM data, it becomes clear that there is a complicated ranking system for determining the primary neighborhood of an address. Notice how, for Bathtub Gin, there are actually three possible neighborhoods. Even more interesting, querying OSM from Python returns Chelsea as a suburb and not as a neighborhood. In any case, OSM is not a good solution for resolving neighborhood queries because it leaves many blanks. Additionally, some of its results are technically correct but too specific: most people would say City Winery is in Chelsea, and few know where Hudson Square is.

OpenStreetMap data for the Bathtub Gin
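Because the area name can show up under either key, one way to cope is to check both in order. The following is a minimal sketch, not part of my original script: `pick_neighborhood` is a hypothetical helper, the preference order (neighbourhood first, then suburb) is an assumption, and the sample dicts are hand-written to mimic the `address` part of a Nominatim response rather than copied from live output.

```python
def pick_neighborhood(address):
    """Return the first non-empty area name from a Nominatim-style address dict."""
    # Assumption: prefer the finer-grained key, then fall back to the coarser one.
    for key in ("neighbourhood", "suburb"):
        value = address.get(key)
        if value:
            return value
    # Neither key is present, which is the "blank" case described above.
    return None


# Hand-written sample dicts, not live API output.
bathtub_gin = {"road": "9th Avenue", "suburb": "Chelsea", "city": "New York"}
no_area = {"road": "Example Street", "city": "New York"}

print(pick_neighborhood(bathtub_gin))  # Chelsea
print(pick_neighborhood(no_area))      # None
```

This only papers over the spelling and key differences. It does nothing for addresses where OSM has no area name at all, or where the geocoder lands on the wrong point.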

My second attempt was with Yelp, which seemed to have good answers. Check this out:

Location            Yelp
Bathtub Gin         Chelsea
Beauty Bar          Gramercy
Botanic Lab         Lower East Side
City Winery         South Village
Coney Island USA    Coney Island
Dromedary Bar       Bushwick
Hill & Dale         Lower East Side
KGB Bar             East Village
Talon Bar           Bushwick

This looks perfect! Except the neighborhood information is not available through their API. Getting neighborhood data should not require scraping the internet for the answer.

At this point I realized that one place that has excellent neighborhood data is real estate sites. Check out StreetEasy; they neatly display a hierarchy of neighborhoods.

Unfortunately, StreetEasy does not make their neighborhood data available. Again, I'm not going to scrape their website for a shapefile. But their competitor, Zillow, does make their shapefile available here.

Pulling their data and narrowing the regions of interest down to the New York City area was surprisingly easy. The results were not bad, at least on par with Yelp:

Location            Zillow
Bathtub Gin         Chelsea
Beauty Bar          Park Slope
Botanic Lab         Lower East Side
City Winery         SoHo
Coney Island USA    Coney Island
Dromedary Bar       Bushwick
Hill & Dale         Lower East Side
KGB Bar             New Dorp
Talon Bar           Bushwick
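To put a number on how the Zillow results compare with Yelp, the two tables can be diffed directly. This is a small sketch that simply copies the values from the tables above. The `disagreements` helper is hypothetical, and treating an exact string match as "agreement" is an assumption, since two different names can describe overlapping areas.

```python
# Values copied from the Yelp and Zillow tables above.
yelp = {
    "Bathtub Gin": "Chelsea",
    "Beauty Bar": "Gramercy",
    "Botanic Lab": "Lower East Side",
    "City Winery": "South Village",
    "Coney Island USA": "Coney Island",
    "Dromedary Bar": "Bushwick",
    "Hill & Dale": "Lower East Side",
    "KGB Bar": "East Village",
    "Talon Bar": "Bushwick",
}
zillow = {
    "Bathtub Gin": "Chelsea",
    "Beauty Bar": "Park Slope",
    "Botanic Lab": "Lower East Side",
    "City Winery": "SoHo",
    "Coney Island USA": "Coney Island",
    "Dromedary Bar": "Bushwick",
    "Hill & Dale": "Lower East Side",
    "KGB Bar": "New Dorp",
    "Talon Bar": "Bushwick",
}


def disagreements(a, b):
    """Map each location to its (a, b) pair wherever the two sources differ."""
    return {name: (a[name], b.get(name)) for name in a if a[name] != b.get(name)}


diff = disagreements(yelp, zillow)
print(len(yelp) - len(diff), "of", len(yelp), "agree")  # 6 of 9 agree
for name, (from_yelp, from_zillow) in diff.items():
    print(name, "| Yelp:", from_yelp, "| Zillow:", from_zillow)
```

The three mismatches are Beauty Bar, City Winery and KGB Bar.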

Neighborhoods are not trivial to figure out. None of the solutions I tried worked well: either some addresses had no neighborhood data, or the neighborhood was too specific. Even if there is a correct neighborhood for every address, it should not be assumed that it is the one people use. None of the approaches I tried solved this problem elegantly, nor could any of them be generalized to other cities.

from geopy.geocoders import Nominatim
import shapefile
from shapely.geometry import Point, shape

def run():

    addresses = [
        ('Bathtub Gin', '132 9th Avenue, New York, NY 10010'),
        ('Beauty Bar', '231 East 14th Street New York, NY 10003'),
        ('Botanic Lab', '86 Orchard Street, New York, NY 10002'),
        ('City Winery', '155 Varick Street, New York, NY 10013'),
        ('Coney Island USA', '1208 Surf Ave., Brooklyn, New York 11224'),
        ('Dromedary Bar', '266 Irving Avenue, Brooklyn, NY 11237'),
        ('Hill & Dale', '115 Allen Street, New York, NY 10002'),
        ('KGB Bar', '85 East 4th Street, New York, NY 10003'),
        ('Talon Bar', '220 Wyckoff Ave, Brooklyn, New York 11237')
    ]

    # Generic client to query OSM through Nominatim
    client = Nominatim(user_agent="my-application")

    # Geocode every address once and reuse the result in both attempts
    locations = {}
    for name, address in addresses:
        location = client.geocode(address, addressdetails=True, extratags=True)
        if location is None:
            # geocode() returns None when nothing matches; skip instead of crashing on .raw
            print(name, 'could not be geocoded')
            continue
        locations[name] = location.raw

    # Attempt 1: read the neighbourhood and suburb fields straight from OSM
    for name, response in locations.items():
        details = response.get('address', {})
        print(name, details.get('neighbourhood'), details.get('suburb'))

    # Attempt 3: point-in-polygon lookup against the Zillow shapefile
    regions = {}

    with shapefile.Reader("ZillowNeighborhoods-NY") as sf:
        for shape_record in sf.iterShapeRecords():
            # Extract regions from the shapefile that are in the city of New York
            if shape_record.record[2] == 'New York':
                # shape() keeps multi-part regions intact, unlike Polygon(points)
                regions[shape_record.record[-2]] = shape(shape_record.shape.__geo_interface__)

    for name, response in locations.items():
        # Shapely points are (x, y), so longitude comes first
        point = Point(float(response['lon']), float(response['lat']))

        # Names of the neighborhoods whose shape contains the address's Lon/Lat
        names = [region for region, geometry in regions.items() if geometry.intersects(point)]

        if names:
            print(name, names)


if __name__ == '__main__':
    run()