Conditional GET

Conditional requests are HTTP requests that carry an If-Modified-Since and/or If-None-Match header, to which a server may simply respond "304 Not Modified" (with no body) if the resource in question hasn't changed. The idea is that your program caches the Last-Modified and/or ETag response header values from previous requests (along with, most likely, the response body itself) and sends them back in future conditional requests. When the server answers 304, the program falls back to the previously stored content.
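To make the server's side of this exchange concrete, here is a minimal sketch of how a server might decide between 200 and 304. The function name should_send_304 and its parameters are made up for illustration. It is a simplification: real servers handle more cases, and the comma split below assumes no commas inside an entity tag.

```python
from email.utils import parsedate_to_datetime


def _strip_weak(tag):
    # Drop the optional weak-validator prefix so '"abc"' matches 'W/"abc"'.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def should_send_304(request_headers, current_etag=None, current_last_modified=None):
    """Hypothetical helper: True if a 304 Not Modified would be appropriate."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        if current_etag is None:
            return False
        if if_none_match.strip() == "*":
            return True
        candidates = [_strip_weak(t) for t in if_none_match.split(",")]
        return _strip_weak(current_etag) in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None and current_last_modified is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            modified = parsedate_to_datetime(current_last_modified)
            # Unchanged if the resource is no newer than the client's copy.
            return modified <= since
        except (TypeError, ValueError):
            # Unparseable or incomparable dates: treat as a normal request.
            return False
    return False
```

A client that sends back the ETag it received earlier gets True here (a 304) for as long as the resource's current ETag stays the same.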

Surprisingly underdocumented (it seems to me): a simple way to do conditional HTTP GETs with Python's standard library.

python 3

from urllib.request import BaseHandler, Request, build_opener
import urllib.response
 
# http://www.artima.com/forums/flat.jsp?forum=122&thread=15024
class NotModifiedHandler(BaseHandler):
    def http_error_304(self, req, fp, code, message, headers):
        addinfourl = urllib.response.addinfourl(fp, headers, req.get_full_url())
        addinfourl.code = code
        return addinfourl
 
def conditional_get(url, last_modified=None, etag=None, user_agent=None):
    """Uses optional last_modified and/or etag to do a "conditional get" of the
    given url.  (when neither is given, results in a regular get) Returns:
    file-like object as returned by urllib.request.urlopen; its code is 304
    when the server reports the resource as unchanged """
 
    request = Request(url)
    if user_agent:
        request.add_header("User-Agent", user_agent)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    if etag:
        request.add_header("If-None-Match", etag)
    opener = build_opener(NotModifiedHandler())
    return opener.open(request)

python 2

import urllib2

# http://www.artima.com/forums/flat.jsp?forum=122&thread=15024
class NotModifiedHandler(urllib2.BaseHandler):
    def http_error_304(self, req, fp, code, message, headers):
        addinfourl = urllib2.addinfourl(fp, headers, req.get_full_url())
        addinfourl.code = code
        return addinfourl
 
def conditional_get(url, last_modified=None, etag=None, user_agent=None):
    """Uses optional last_modified and/or etag to do a "conditional get" of the
    given url.  (when neither is given, results in a regular get) Returns:
    file-like object as returned by urllib2.urlopen """
 
    request = urllib2.Request(url)
    if user_agent:
        request.add_header("User-Agent", user_agent)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    if etag:
        request.add_header("If-None-Match", etag)
    opener = urllib2.build_opener(NotModifiedHandler())
    return opener.open(request)

Example of use

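url = "http://example.com/"  # placeholder; use any URL whose server sends validators
f = conditional_get(url)
print (f.code == 200)
info = f.info()
# The response headers are named "Last-Modified" and "ETag" (lookup is case-insensitive)
f2 = conditional_get(url, info.get("Last-Modified"), info.get("ETag"))
print (f2.code == 304)  # True if the server honours the validators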
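
The "fall back to previously stored values" part can be sketched as a small in-memory cache wrapped around conditional_get. The class name ConditionalCache and its layout are hypothetical, not part of the recipe above. It assumes only that the fetch function accepts last_modified and etag keyword arguments and returns an object with code, info() and read().

```python
class ConditionalCache:
    """Remember validators and bodies per URL; reuse the body on a 304."""

    def __init__(self, fetch):
        # fetch(url, last_modified=None, etag=None) is assumed to behave
        # like conditional_get: it returns an object with .code, .info()
        # and .read().
        self.fetch = fetch
        self.entries = {}  # url -> (last_modified, etag, body)

    def get(self, url):
        last_modified, etag, body = self.entries.get(url, (None, None, None))
        response = self.fetch(url, last_modified=last_modified, etag=etag)
        if response.code == 304:
            # Nothing changed on the server: fall back to the stored copy.
            return body
        # New or changed resource: store the body with its validators.
        body = response.read()
        info = response.info()
        self.entries[url] = (info.get("Last-Modified"), info.get("ETag"), body)
        return body
```

With the functions above this would be used as cache = ConditionalCache(conditional_get), then cache.get(url) for each fetch. The entries live only in memory, so a program that needs them across runs would have to persist them itself.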