Fetching Data from the Web
workflow.web provides a simple API for retrieving data from the Web, modelled on the excellent requests library.
The purpose of workflow.web is to cover trivial cases at just 0.5% of the size of requests.
Features
- JSON requests and responses
- Form data submission
- File uploads
- Redirection support
The main API consists of the get() and post() functions and the Response instances they return.
Warning
As workflow.web is based on Python 2’s standard HTTP libraries, it does not verify SSL certificates when establishing HTTPS connections.
As a result, you must not use this module for sensitive connections.
If you need certificate verification for HTTPS connections (and you really should), use the requests library, on which the workflow.web API is based, or the command-line tool cURL, which is installed by default on OS X.
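For connections that need certificate verification, one option is to shell out to cURL, which verifies certificates by default. The following is a minimal sketch, not part of workflow.web: the helper names build_curl_command and fetch_with_curl are hypothetical, and the chosen cURL flags are one reasonable selection rather than a required set.

```python
import subprocess


def build_curl_command(url, timeout=60):
    """Return the argument list for a cURL download of `url`."""
    return [
        'curl',
        '--silent',                  # no progress meter
        '--show-error',              # still print errors to stderr
        '--fail',                    # non-zero exit status on HTTP errors
        '--location',                # follow redirects
        '--max-time', str(timeout),  # overall time limit in seconds
        url,
    ]


def fetch_with_curl(url, timeout=60):
    """Fetch `url` with cURL and return the response body as bytes.

    Raises subprocess.CalledProcessError if cURL exits with an error.
    """
    return subprocess.check_output(build_curl_command(url, timeout))
```

Keeping the command construction separate from the subprocess call makes it easy to inspect or log the exact command before running it.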
Examples
Other parts of the documentation contain examples of using workflow.web.
API
get() and post() are wrappers around request(). They all return Response objects.
- workflow.web.get(url, params=None, headers=None, cookies=None, auth=None, timeout=60, allow_redirects=True)
Initiate a GET request. Arguments as for request().
Returns: Response instance
- workflow.web.post(url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=60, allow_redirects=False)
Initiate a POST request. Arguments as for request().
Returns: Response instance
- workflow.web.request(method, url, params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=60, allow_redirects=False)
Initiate an HTTP(S) request. Returns a Response object.
Parameters:
- method (unicode) – 'GET' or 'POST'
- url (unicode) – URL to open
- params (dict) – mapping of URL parameters
- data (dict or str) – mapping of form data {'field_name': 'value'} or str
- headers (dict) – HTTP headers
- cookies (dict) – cookies to send to server
- files (dict) – files to upload (see below).
- auth (tuple) – username, password
- timeout (int) – connection timeout limit in seconds
- allow_redirects (Boolean) – follow redirections
Returns: Response object
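The parameters above might be combined as in the following sketch. The example.com URLs, the field names, and the function names search and send_feedback are hypothetical. The get and post arguments exist only so the functions can be exercised without the network; by default they resolve to workflow.web.get and workflow.web.post.

```python
def _web_func(name):
    """Look up get()/post() lazily so this sketch imports without the library."""
    from workflow import web
    return getattr(web, name)


def search(query, get=None):
    """GET with URL parameters and return the decoded JSON body."""
    get = get or _web_func('get')
    r = get('http://example.com/search',             # hypothetical endpoint
            params={'q': query, 'limit': 10},        # URL parameters
            headers={'Accept': 'application/json'},  # extra HTTP headers
            timeout=10)                              # seconds
    r.raise_for_status()
    return r.json()


def send_feedback(message, post=None):
    """POST form data and return the HTTP status code."""
    post = post or _web_func('post')
    r = post('http://example.com/feedback',          # hypothetical endpoint
             data={'message': message},              # form fields
             timeout=10)
    return r.status_code
```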
The files argument is a dictionary:
{'fieldname': {'filename': 'blah.txt',
               'content': '<binary data>',
               'mimetype': 'text/plain'}}
- fieldname is the name of the field in the HTML form.
- mimetype is optional. If it is not provided, the mimetypes module is used to guess it; if that fails, application/octet-stream is used.
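A files dictionary of this shape could be assembled as follows. This is a sketch only: file_field and upload are hypothetical helper names, 'attachment' is an assumed form field name, and the post argument exists so the function can be tried without the network.

```python
import os


def file_field(filename, content, mimetype=None):
    """Build the inner dictionary describing one uploaded file."""
    field = {'filename': filename, 'content': content}
    if mimetype is not None:
        field['mimetype'] = mimetype  # optional; guessed when omitted
    return field


def upload(url, path, fieldname='attachment', post=None):
    """POST the file at `path` as form field `fieldname`."""
    if post is None:
        from workflow import web      # imported lazily
        post = web.post
    with open(path, 'rb') as fp:      # read the file as binary data
        content = fp.read()
    files = {fieldname: file_field(os.path.basename(path), content)}
    return post(url, files=files)
```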
The Response object
- class workflow.web.Response(request)
Returned by request() / get() / post() functions.
A simplified version of the Response object in the requests library.
>>> r = request('GET', 'http://www.google.com')
>>> r.status_code
200
>>> r.encoding
ISO-8859-1
>>> r.content  # bytes
<html> ...
>>> r.text  # unicode, decoded according to charset in HTTP header/meta tag
u'<html> ...'
>>> r.json()  # content parsed as JSON
- iter_content(chunk_size=4096, decode_unicode=False)
Iterate over response data.
New in version 1.6.
Parameters:
- chunk_size (int) – Number of bytes to read into memory
- decode_unicode (Boolean) – Decode to Unicode using detected encoding
Returns: iterator
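Iterating in chunks is useful when the whole body should not be held in memory at once. As an illustration, this sketch hashes a response chunk by chunk. The function name sha256_of_response is hypothetical, and it assumes that iter_content() yields byte strings when decode_unicode is False.

```python
import hashlib


def sha256_of_response(response, chunk_size=4096):
    """Return the SHA-256 hex digest of a response body, read in chunks."""
    digest = hashlib.sha256()
    # decode_unicode is left at its default (False) so chunks stay as bytes
    for chunk in response.iter_content(chunk_size=chunk_size):
        digest.update(chunk)
    return digest.hexdigest()
```

The response argument would be the object returned by get(), post() or request().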
- raise_for_status()
Raise stored error if one occurred.
The error will be an instance of urllib2.HTTPError.
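A caller might handle the stored error as in the sketch below. The function name status_message is hypothetical. The import fallback is there only so the snippet also loads under Python 3, where the same exception class lives in urllib.error.

```python
try:
    from urllib2 import HTTPError       # Python 2, as used by workflow.web
except ImportError:
    from urllib.error import HTTPError  # equivalent class under Python 3


def status_message(response):
    """Return a short, human-readable summary of a response's status."""
    try:
        response.raise_for_status()
    except HTTPError as err:
        # HTTPError carries the status code and reason phrase
        return 'HTTP {0}: {1}'.format(err.code, err.msg)
    return 'OK'
```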
- save_to_path(filepath)
Save retrieved data to the file at filepath.
Parameters: filepath – Path to save retrieved data.