Geocoding and Web APIs

Introduction

In this toolbox exercise, you will access web APIs directly and begin to write your own package to connect with new data sources.

This toolbox exercise will focus on dealing with geographical data. You will write a tool that takes an address or place name and prints the closest MBTA stop and the distance from the given place to that stop. For example:

>>> import mbta_finder
>>> mbta_finder.find_stop_near("Fenway Park")
Yawkey is 0.13 miles from Fenway Park

Note: There are already a few Python packages that interface with mapping services, but part of this exercise is seeing how you might write your own such package from scratch.

Get Set

Grab the starter code for this toolbox exercise via the normal fork-and-clone method from https://github.com/olin-toolboxes/ToolBox-Geocoding.

You should see mbta_finder.py within the toolbox folder, which has some optional scaffolding for this exercise.

Getting data from the web

Let’s grab some data from the Internet!

>>> from urllib.request import urlopen
>>> f = urlopen("http://www.olin.edu")
>>> print(f.read(400))
<!DOCTYPE html>
<html>
<head profile="http://www.w3.org/1999/xhtml/vocab">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="shortcut icon" href="http://www.olin.edu/favicon.ico"
type="image/vnd.microsoft.icon" />
<link rel="alternate" type="application/rss+xml" title="Front page feed"
href="http://www.olin.edu/rss.xml" />
<title>Olin College</title>

Here we’re using urllib.request to open a URL and get its contents over HTTP.

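The urlopen function returns a file-like object, from which we read and print the first 400 bytes. The result is the HTML code that your browser uses to display the olin.edu homepage. (In Python 3, read returns a bytes object, so your terminal will show the output as a b'...' literal rather than as plain text.)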
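
As a minimal sketch of what "file-like" means here, the snippet below uses io.BytesIO as an offline stand-in for the object that urlopen returns (an assumption made so the example runs without a network connection). The helper name read_text and the page content are hypothetical.

```python
import io


def read_text(file_like, num_bytes, encoding="utf-8"):
    """Read up to num_bytes from a file-like object and decode them to str."""
    raw = file_like.read(num_bytes)  # bytes, not str, in Python 3
    # errors="replace" avoids a crash if the cut lands mid-character.
    return raw.decode(encoding, errors="replace")


# Stand-in for a response from urlopen (hypothetical page content).
fake_response = io.BytesIO(b"<!DOCTYPE html>\n<html>\n<head>...</head>\n</html>")
print(read_text(fake_response, 15))
```

The same helper should work on a real response object, since it only relies on the read method.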

What if we want to get a specific piece of information, e.g. Olin’s mailing address? As you can see from the snippet above, the HTML website code includes lots of extraneous information used to display the page in your browser. We could of course wade through this (using Python!) to find what we’re looking for, but for many web services there’s an easier way to ask for what we want directly.

Accessing web data programmatically

That more direct method is an application programming interface (API). APIs let you make requests using specially constructed URLs, and they return data in a nicely structured format.

There are three main steps to using any web API:

1. Read the API documentation

You should specifically look out for whether the API can provide the data you want, how to request that data, and what the return format will be.

2. Request an API developer key

Web services generally limit the number of requests you can make by requiring a unique user key to be sent with each request. In order to get a key you’ll need to agree to their terms, which restrict how you can use the service. In this class we will never ask you to agree to terms you aren’t comfortable with.

3. Test out your application and launch to users

A.K.A. the fun part.

The first API we will use is the Google Maps Geocoding API. This tool (among other things) allows you to specify a place name or address and receive its latitude and longitude. Take a few minutes to read the documentation (it’s quite good) through “Geocoding Responses”.

Structured data responses (JSON)

Back? Ok cool, let’s try it out in Python. We’re going to request the response in JSON format, which we can decode using Python’s json module.

>>> from urllib.request import urlopen
>>> import json
>>> from pprint import pprint
>>> url = "https://maps.googleapis.com/maps/api/geocode/json?address=Fenway%20Park"
>>> f = urlopen(url)
>>> response_text = f.read()
>>> response_data = json.loads(str(response_text, "utf-8"))
>>> pprint(response_data)

We used the pprint module to “pretty print” the response data structure with indentation so it’s easier to visualize. You should see something similar to the JSON response from the documentation, except built from Python data types. The response data structure is built from nested dictionaries and lists, and you can step through it to access the fields you want.

>>> print(response_data["results"][0]["formatted_address"])
4 Yawkey Way, Boston, MA 02215, USA
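
Stepping through the structure can also be done a little more defensively. The sketch below works on a hand-written, heavily trimmed stand-in for a geocoding response, so it runs offline; the helper name first_formatted_address is hypothetical, and a real response contains many more fields.

```python
import json

# Hand-written, heavily trimmed stand-in for a geocoding response (assumption).
sample_bytes = b'''
{
  "status": "OK",
  "results": [
    {
      "formatted_address": "4 Yawkey Way, Boston, MA 02215, USA"
    }
  ]
}
'''


def first_formatted_address(response_data):
    """Return the first result's formatted address, or None if there is none."""
    if response_data.get("status") != "OK":
        return None
    results = response_data.get("results", [])
    if not results:
        return None
    return results[0].get("formatted_address")


# Decode the raw bytes to text, then parse the text into dicts and lists.
sample_data = json.loads(sample_bytes.decode("utf-8"))
print(first_formatted_address(sample_data))
```

Checking the status field and the length of the results list first avoids an IndexError when a search comes back empty.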

Note: You might notice that I didn’t provide an API key with the request. For the Google Maps API, you can actually get away without one for a small number of requests. Be sure to limit your requests (avoid making requests repeatedly in a loop, or add a delay between them with time.sleep) so that Olin’s IP range is not blocked.
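
One way to space out requests is to sleep between consecutive calls. This is only a sketch: fetch_all and fake_fetch are hypothetical names, the fetch function is a stand-in so the example runs offline, and a suitable delay depends on the service's terms.

```python
import time


def fetch_all(items, fetch, delay_seconds=1.0):
    """Call fetch(item) for each item, pausing between consecutive calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            time.sleep(delay_seconds)  # wait before every call after the first
        results.append(fetch(item))
    return results


# Stand-in fetch function so the sketch runs offline (hypothetical).
def fake_fetch(place):
    return "looked up " + place


print(fetch_all(["Fenway Park", "Faneuil Hall"], fake_fetch, delay_seconds=0.1))
```

In real use, fetch would be your own function that makes a single web request.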

Write a function to extract the latitude and longitude from the JSON response.

Speaking URL

In the above example we passed a hard-coded URL to the urlopen function, but in your code you will need to generate the parameters based on user input. Check out “Understanding URLs and their structure” for a helpful guide to URL components and encoding.

You can build up the URL string manually, but it’s probably helpful to check out urlencode.
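
As a quick illustration of what urlencode does, here is a sketch with a made-up endpoint and parameters (api.example.com, q, and page are all hypothetical and not part of any API used in this exercise):

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameters, used only to show the encoding.
base_url = "https://api.example.com/search"
params = {"q": "coffee & tea", "page": 2}

query_string = urlencode(params)
print(query_string)  # q=coffee+%26+tea&page=2
print(base_url + "?" + query_string)
```

Note that urlencode writes a space as + rather than %20. Servers that follow the usual form-encoding convention for query strings treat both as a space.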

Write a function that takes an address or place name as input and returns a properly encoded URL to make a Google Maps geocode request.

Getting local

Now that we can find the coordinates of a given place, let’s take things one step further and find the closest public transportation stop to that location.

To accomplish this, we will use the MBTA-realtime API. Read through the API quickstart guide, and check out the details for stopsbylocation in the API v2 documentation.

Note: The MBTA documentation provides a testing API key that you can use for small numbers of requests, so you may not need to register.

Write a function that takes a latitude and longitude and returns the name of the closest MBTA stop and its distance from the given coordinates.
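
If you want to sanity-check a distance against the coordinates yourself, one common approach is the haversine formula for great-circle distance. The sketch below is not part of either API; the function name haversine_miles and the mean Earth radius of 3958.8 miles are assumptions of this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # approximate mean radius (assumption)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


# One degree of longitude along the equator is roughly 69 miles.
print(round(haversine_miles(0.0, 0.0, 0.0, 1.0), 1))
```

The formula treats the Earth as a perfect sphere, which is accurate enough for distances on the scale of a city.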

What to turn in

Combine your work from the previous sections to create a tool that takes a place name or address as input, finds its latitude/longitude, and returns the nearest MBTA stop and its distance from the starting point.
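
One possible way to wire the pieces together is sketched below. It shows only the data flow: geocode and nearest_stop are hypothetical placeholders for the functions you wrote in the earlier sections, and the stand-ins here return made-up values so the example runs offline.

```python
def describe_nearest_stop(place_name, geocode, nearest_stop):
    """Chain the two lookups and format a one-line summary.

    geocode(place_name) -> (latitude, longitude)
    nearest_stop(latitude, longitude) -> (stop_name, distance_in_miles)
    Both callables are hypothetical placeholders for your own functions.
    """
    latitude, longitude = geocode(place_name)
    stop_name, distance = nearest_stop(latitude, longitude)
    return "{} is {:.2f} miles from {}".format(stop_name, distance, place_name)


# Offline stand-ins with made-up values, just to show the data flow.
def fake_geocode(place_name):
    return (42.0, -71.0)


def fake_nearest_stop(latitude, longitude):
    return ("Yawkey", 0.13)


print(describe_nearest_stop("Fenway Park", fake_geocode, fake_nearest_stop))
```

Passing the lookup functions in as arguments keeps the formatting logic testable without touching the network.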

Note: Sadly there are no MBTA stops close enough to Olin College - you have to get out into the city!

Making it cooler