Neighborhoods
Figuring out which neighborhood a business is located in is not trivial. To illustrate, here are nine addresses in New York:
- Bathtub Gin (132 9th Avenue, New York, NY 10010)
- Beauty Bar (231 East 14th Street New York, NY 10003)
- Botanic Lab (86 Orchard Street, New York, NY 10002)
- City Winery (155 Varick Street, New York, NY 10013)
- Coney Island USA (1208 Surf Ave., Brooklyn, New York 11224)
- Dromedary Bar (266 Irving Avenue, Brooklyn, NY 11237)
- Hill & Dale (115 Allen Street, New York, NY 10002)
- KGB Bar (85 East 4th Street, New York, NY 10003)
- Talon Bar (220 Wyckoff Ave, Brooklyn, New York 11237)
OpenStreetMap (OSM) was my first and most straightforward approach to mapping addresses to neighborhoods. Part of the ease comes from the Nominatim API, which makes querying OSM a trivial task. Two things become obvious immediately. First, OSM uses the spelling neighbourhood rather than neighborhood. Second, the distinction between suburbs and neighborhoods is hard to understand.
Location | OSM Neighborhood | OSM Suburb |
---|---|---|
Bathtub Gin | Chelsea | |
Beauty Bar | Park Slope | BK |
Botanic Lab | Chinatown | |
City Winery | Hudson Square | |
Coney Island USA | West Brighton | |
Dromedary Bar | Bushwick | |
Hill & Dale | Lower East Side | |
KGB Bar | New Dorp | Staten Island |
Talon Bar | Ridgewood | |
Diving deeper into the OSM data, it becomes clear that there is a complicated ranking system for determining the primary neighborhood for an address. For Bathtub Gin, there are actually 3 possible neighborhoods. Even more interesting, querying OSM from Python returns Chelsea as a suburb and not as a neighborhood. In any case, OSM is not a good solution for resolving neighborhood queries because it leaves many blanks. Additionally, some of its results are technically correct but too specific: most people would say City Winery is in Chelsea, and hardly anyone knows where Hudson Square is.
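One way to cope with the neighbourhood/suburb inconsistency is to fall back from one address key to the next. The following is only a minimal sketch: the helper `pick_neighborhood`, the fallback order in `FALLBACK_KEYS`, and the sample dictionary are all assumptions made for illustration, not part of the script at the end of this post and not real API output.

```python
from typing import Optional

# Assumed fallback order (an illustration, not an OSM rule).
# Note the OSM spelling "neighbourhood".
FALLBACK_KEYS = ("neighbourhood", "suburb", "quarter", "city_district")


def pick_neighborhood(address: dict) -> Optional[str]:
    """Return the first non-empty neighborhood-like value, or None."""
    for key in FALLBACK_KEYS:
        value = address.get(key)
        if value:
            return value
    return None


# Hand-written sample shaped like response['address']; not real API output.
sample = {"suburb": "Chelsea", "city": "New York"}
print(pick_neighborhood(sample))
```

A fallback like this fills some blanks, but it cannot fix the deeper issue: the value it finds may still not be the name people actually use.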
My second attempt was with Yelp, which seemed to have good answers:
Location | Yelp |
---|---|
Bathtub Gin | Chelsea |
Beauty Bar | Gramercy |
Botanic Lab | Lower East Side |
City Winery | South Village |
Coney Island USA | Coney Island |
Dromedary Bar | Bushwick |
Hill & Dale | Lower East Side |
KGB Bar | East Village |
Talon Bar | Bushwick |
This looks perfect! Except the neighborhood information is not available through their API. Getting neighborhood data should not require scraping the internet for the answer.
At this point I realized that real estate sites have excellent neighborhood data. Check out StreetEasy: they neatly display a hierarchy of neighborhoods.
Unfortunately StreetEasy does not make their neighborhood data available. Again, I'm not going to scrape their website for a shapefile. But their competitor, Zillow, does make their shapefile available here.
Pulling their data and narrowing the regions of interest down to the New York City area was surprisingly easy. The results were not bad, at least on par with Yelp:
Location | Zillow |
---|---|
Bathtub Gin | Chelsea |
Beauty Bar | Park Slope |
Botanic Lab | Lower East Side |
City Winery | SoHo |
Coney Island USA | Coney Island |
Dromedary Bar | Bushwick |
Hill & Dale | Lower East Side |
KGB Bar | New Dorp |
Talon Bar | Bushwick |
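The lookup behind this table is a point-in-polygon test: geocode the address, then find every region whose outline contains the point. The sketch below shows the idea in pure Python with a ray-casting test instead of shapely. The function names and the two toy square regions are hypothetical, and real neighborhood outlines (with holes or multiple parts) need a proper geometry library.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float]  # (lon, lat)


def point_in_ring(point: Coord, ring: Sequence[Coord]) -> bool:
    """Ray casting: count how many edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def neighborhoods_for(point: Coord, regions: Dict[str, Sequence[Coord]]) -> List[str]:
    """Return the names of all regions whose outline contains the point."""
    return [name for name, ring in regions.items() if point_in_ring(point, ring)]


# Toy, overlapping square regions (hypothetical, not Zillow data)
toy_regions = {
    "Region A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "Region B": [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)],
}
print(neighborhoods_for((1.5, 1.5), toy_regions))
```

Returning a list rather than a single name is deliberate: regions can overlap, so an address may fall inside more than one.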
Neighborhoods are not trivial to figure out. None of the solutions I tried worked well: either some addresses had no neighborhood data, or the neighborhood returned was too specific. Even if there is a correct neighborhood for every address, it should not be assumed that it is the one people use. None of the approaches I tried solved this problem elegantly, and none could be generalized to other cities.
from geopy.geocoders import Nominatim
import shapefile
from shapely.geometry import Polygon, Point


def run():
    addresses = [
        ('Bathtub Gin', '132 9th Avenue, New York, NY 10010'),
        ('Beauty Bar', '231 East 14th Street New York, NY 10003'),
        ('Botanic Lab', '86 Orchard Street, New York, NY 10002'),
        ('City Winery', '155 Varick Street, New York, NY 10013'),
        ('Coney Island USA', '1208 Surf Ave., Brooklyn, New York 11224'),
        ('Dromedary Bar', '266 Irving Avenue, Brooklyn, NY 11237'),
        ('Hill & Dale', '115 Allen Street, New York, NY 10002'),
        ('KGB Bar', '85 East 4th Street, New York, NY 10003'),
        ('Talon Bar', '220 Wyckoff Ave, Brooklyn, New York 11237')
    ]

    # Generic client to query OSM through Nominatim
    client = Nominatim(user_agent="my-application")

    # Attempt 1: read the neighbourhood and suburb fields straight from OSM
    for address in addresses:
        response = client.geocode(address[1], addressdetails=True, extratags=True).raw
        print(address[0], response['address'].get('neighbourhood'), response['address'].get('suburb'))

    # Attempt 3: look up each geocoded point in the Zillow neighborhood shapefile
    regions = {}
    with shapefile.Reader("ZillowNeighborhoods-NY") as sf:
        for i in range(len(sf)):
            # Extract regions from the shapefile that are in the city of New York
            if sf.record(i)[2] == 'New York':
                regions[sf.record(i)[-2]] = Polygon(sf.shape(i).points)

    for address in addresses:
        # Look up the address's lon/lat
        response = client.geocode(address[1], addressdetails=True, extratags=True).raw
        point = Point(float(response['lon']), float(response['lat']))
        # Keep the neighborhoods whose polygon contains the address's lon/lat
        names = [name for name, polygon in regions.items() if polygon.intersects(point)]
        if names:
            print(address[0], names)


if __name__ == "__main__":
    run()