Style Guide: Code Examples

Introduction

A code example should demonstrate how to use a function, class, or API in a way that is easy and quick to understand. For instance, here is a code example demonstrating how to write a simple object as JSON:

# PRELUDE
import json

# CODE
print json.dumps({"a": 1})

# OUTPUT
{"a": 1}

This example is excellent because:

Here is a code example demonstrating how to reverse a deque object:

# PRELUDE
import collections

# CODE
d = collections.deque([1, 2, 3])
d.reverse()
print d

# OUTPUT
deque([3, 2, 1])

This code example is excellent because

Here is a code example showing how to inspect the arguments of a function:

# PRELUDE
import inspect

# CODE
def f(a, b=2, *args, **kwargs):
    return a

print inspect.getargspec(f)

# OUTPUT
ArgSpec(args=['a', 'b'], varargs='args', keywords='kwargs', defaults=(2,))

This code example is excellent because

Finally, here is a code example showing how to extract the query string from a request on a Werkzeug server:

# PRELUDE
from werkzeug.wrappers import Request, Response
from werkzeug.serving import run_simple
from os import environ

# CODE
def application(environ, start_response):
    request = Request(environ)
    query = request.args.get("query")

    response = Response("You searched " + query)
    return response(environ, start_response)

# POSTLUDE
port = int(environ.get("PORT", "8000"))
run_simple("0.0.0.0", port, application)

This code example is excellent because

Goals: Readability, Concision, Simplicity, Consistency

Fortunately these goals are often aligned.

But do not follow these guidelines when they do not make sense. Instead, use your best judgment about when to bend or break these rules.

In this document, good code examples are shown like this:

print "This is a positive example of how to write good code examples."

And negative examples, showing what not to do, are shown like this:

print('This is a negative example showing how NOT to write code examples.')

Basics

This section contains basic guidelines for code examples.

Note that these may vary from traditional coding style guidelines because examples are driven by different principles of construction than those used in software engineering. For example, clarity and conciseness are important here, but modularity and reusability are not.

Follow PEP8 as a basic coding style guideline

The standard Python coding conventions used by almost all Python developers are described in a document called PEP8: https://www.Python.org/dev/peps/pep-0008/. The most important of these conventions are listed below:

We diverge from PEP8 as follows:

Show only one concept per example

This example shows two independent concepts, which is bad:

data = [1, 2, 3]
print map(lambda x: x*2, data)
print filter(lambda x: x<3, data)

Instead, split this into two separate examples:

data = [1, 2, 3]
print map(lambda x: x*2, data)

data = [1, 2, 3]
print filter(lambda x: x<3, data)

However, use a single example to demonstrate the multiple ways to use a particular function.

# TITLE: Retrieve the timezone from a string

# CODE
print gettz()
print gettz("UTC")
print gettz("America/Los_Angeles")

Duplicate code instead of using loops or extra variables

It is perfectly fine to copy a line of code two or three times with small modifications. In fact this is often better than introducing a loop, because it takes less time for a human to understand three duplicated lines with small changes than to understand a loop. This is a good example:

d = defaultdict(list)

d["a"].append(1)
d["b"].append(2)
d["c"].append(3)

print d

Whereas this example takes longer to understand (despite technically having fewer lines of code):

d = defaultdict(list)

for i, v in enumerate(["a", "b", "c"]):
    d[v].append(i)

print d

Minimize look-back distance

When referring to variables, try to keep them close to the line where they are used. If a variable is used multiple times, it may be worth replacing it with its literal value: the more times it is used, the further its uses get from its definition, and the more often the reader must look back for the value.

document = "document_a.txt"
print fnmatch(document, "*.txt")
print fnmatch(document, "document_?.txt")
print fnmatch(document, "document_[abc].txt")
print fnmatch(document, "document_[!xyz].txt")

print fnmatch("document_a.txt", "*.txt")
print fnmatch("document_a.txt", "document_?.txt")
print fnmatch("document_a.txt", "document_[abc].txt")
print fnmatch("document_a.txt", "document_[!xyz].txt")

a = open("a.txt", "w")
b = open("b.txt", "w")

a.write("A")
b.write("B")

a.close()
b.close()

a = open("a.txt", "w")
a.write("A")
a.close()

b = open("b.txt", "w")
b.write("B")
b.close()

Use newlines only to separate functionality, and do not separate outputs

Aside from class and function definitions, newlines should only be used to separate one section of functionality from another to make the example more readable. Do not add newlines before or after print statements, as they will naturally be separated by inline outputs.

q = Queue.Queue()

q.put("a")

q.put("b")

q.put("c")

print q.get()

print q.get()

print q.get()

q = Queue.Queue()
q.put("a")
q.put("b")
q.put("c")

print q.get()
print q.get()
print q.get()

f = open("sample.json")

print json.load(f)

f = open("sample.json")
print json.load(f)

For example, these two examples show how to express different recurrence rules using dateutil.rrule:

# TITLE: List the dates of the 100th day of every year

# CODE
for date in rrule(YEARLY, byyearday=100, count=3):
    print date

# TITLE: List the dates of the next 10th week of the year

# CODE
for date in rrule(DAILY, byweekno=10, count=3):
    print date

Whereas if the second example were written as follows, it would be more difficult to understand the differences:

# TITLE: List the dates of the next 10th week of the year

# CODE
dates = rrule(DAILY, byweekno=10, count=3)
print list(dates)

Choose which examples to write using Kite usage metrics, official documentation, then Google

Typically, you want to prioritize the most popular classes, functions, and subpackages first. The curation tool provides usage metrics, which is a good baseline for evaluating what to cover. Using this in conjunction with the official documentation for the package will generally cover most, if not all of the major use cases. Supplement this with Google and StackOverflow search results.

General guideline:

  1. Cover what is on both the curation tool and the quick start/tutorial/overview page of the documentation
  2. Cover the rest of what is on the curation tool, until you reach niche or advanced functions
  3. Cover the rest of what is on the official documentation, until you reach niche or advanced functions
  4. Do searches online and cover any interesting use cases of the remaining content of the package

Titles

Writing high quality titles is, in many ways, the hardest part of all. You have to describe all of the essential parts of what the example does in one compact and easy-to-read sentence fragment.

Template for writing titles: [verb phrase] [(opt.) specification phrase]

Specification phrases are used to qualify or refine the verb phrase. They are often prepositional phrases.

Examples of good titles that use the template:

Verb phrase only:

Verb phrase plus specification phrase:

Start the title with a verb

Use verb roots instead of other verb forms

Good: "Construct an array"
Bad: "Constructing an array"
Bad: "Constructed an array"

Capitalize only the first letter of the first word

Use a verb phrase that captures the main behavior of the function

The verb included in the function name is often a good place to start.

# Example
import itertools

for i in itertools.count(10):
    print i
    if i > 20: break

In this case, "Count up from 10" would be a good title.

Sometimes the documentation for a function can provide good inspiration for a title.

Construct short but comprehensive titles

If a word can be removed without changing the meaning or making the title nonsensical, it should be removed.

Construct titles that reflect what a user might query

Think about the kinds of queries that users might form to look for the code example, and use the most common query terms in your title.

# Example
for date in rrule(WEEKLY, byweekday=MO, count=3):
    print date

The above example uses an rrule to return the dates of every Monday. So while the example uses terms like rrule, byweekday and weekly, the most typical query for a script like this would contain terms like every, weekly, Monday, week day, and list, so you would want to title it something like "List the dates of every Monday".

Use proper English for titles

Do not skip articles or have grammatical, spelling, or capitalization errors.

Use verbs that are most commonly used with a given noun

Use the specification to describe various ways to use the function.

For example, the specification can be the input argument type, the secondary behavior of the function, or a particular condition.

Examples:

Don't include specification phrases for incidental complexity

How to determine whether a specification is essential or incidental:

Imagine you are writing a code example based on the title with a specification. The specification is incidental if the same code (or similar code without substantial change) would be written with or without the specification.

Example: numpy.eye(3)

Candidate titles:

The last one is preferred because, given only the title "Construct an identity matrix", the code you write might be any of:

numpy.eye(2)
numpy.eye(3)
numpy.eye(4)

which are basically the same.

When a value in the title is essential, explicitly include the values; otherwise generalize

# Example
expected = [1.0, 2.0, 3.0, 4.0]
array2 = [1.0, 2.0, 3.0, 4.01]
array3 = [1.0, 2.0, 3.0, 4.1]
try:
    testing.assert_array_almost_equal(array2, expected, 2)
    print "expected and array2 are equal"
    testing.assert_array_almost_equal(array3, expected, 2)
except AssertionError:
    print "AssertionError: expected and array3 are not equal"

Candidate titles:

Both the number of arrays passed to the function and the decimal places are essential.

# Example
for i in itertools.repeat(10, 3):
    print i

Candidate titles:

In this case, because repeat by definition must repeat some number of times, the number of repetitions is essential; the value that it repeats is not.

# Example
print numpy.eye(3, dtype=numpy.int16)

Candidate titles:

The core concept being demonstrated here is the ability to specify a data type when initializing an identity matrix; therefore, the dimensions of the matrix are incidental, while the data type is essential.

Spell out terminologies

Avoid using parentheses

Titles that include parentheses, like "Compute the sum of the second dimension (rows) of an array", are unnecessarily verbose. If a title needs parentheses, it can be simplified to not include them.

Avoid unnecessary prepositional phrases

Avoid using redundant object or instance

Everything in Python is an object, so it is unnecessary to specify that something is an object or an instance.

This is also true for JSON.

You can, however, use object or instance to refer to Python objects in the general sense

Avoid all apostrophes, either for contractions or for the possessive case

# Example
numpy.all(my_matrix > 0, axis=0)

Use the suggested terminology if there is not a more appropriate term

The following are preferred default terms that we would like all code examples to use for general cases. If there is not a better term based on the name of the function or what the documentation indicates, use these.

Additionally, use the following common terms, even though they are not English words:

Use consistent title structure across examples

If there are multiple concise ways to express a title, pick one and stick with it.

Use consistent vocabulary for interchangeable object types

For example, use "element" to refer to items of an array. Do not interchange between "element" and "item".

Use terms appropriate for the demonstrated concept

For example, when writing an example that works with HTTP requests, use "Send a GET/POST/etc. request..." instead of "Request a URL..." or "Make a request...".

Use the articles "a" and "an" for general behavior, and "the" for specific behavior

# Example
print mimetypes.guess_extension("text/html")
print mimetypes.guess_extension("audio/mpeg")
print mimetypes.guess_extension("fake/type")

Use plurals to generalize behavior that applies to multiple objects

# Example
ints = array.array("i", [1, 2, 3])
print ints.pop()
print ints
print ints.pop(0)
print ints

# Example
url = "http://mock.kite.com/text"
request = urllib2.Request(url)
request.add_header("custom-header", "header")
print request.header_items()

In both examples, even though the example only works with one object at a time, the concept applies to multiple objects, so the plural forms are more appropriate.

Note that we don't pluralize the non-essential part ("array" and "request").

Titles should never be duplicates

If two examples vary by a small amount, that variation is essential and therefore should be captured in the title.
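
One way to catch accidental duplicates is to count titles across a set of examples. The following is only a sketch: `find_duplicate_titles` is a hypothetical helper, the sample titles are made up for illustration, and treating titles that differ only in case or surrounding spaces as duplicates is an assumption. It uses Python 3 syntax so that it runs on a current interpreter.

```python
import collections


def find_duplicate_titles(titles):
    """Return the normalized titles that occur more than once."""
    # Assumption: titles differing only in case or outer spaces count as equal.
    counts = collections.Counter(title.strip().lower() for title in titles)
    return sorted(title for title, count in counts.items() if count > 1)


titles = [
    "Construct an identity matrix",
    "Construct an identity matrix",
    "Construct an identity matrix of 16-bit integers",
]
print(find_duplicate_titles(titles))
```

Any title this reports needs its distinguishing variation added, as described above.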

Use backticks (`) to specify terminology that is not used as natural language

When using terms such as int, bytearray, and defaultdict, surround the term with backticks. This also applies to terms such as float and for, which are English words but not used in their English meaning, and class names like TextCalendar and ElementTree, which are composed of English words but are not themselves English words.

Note that this does not apply to abbreviations; terms like MD5, SHA256, and HMAC should not have backticks around them.

Don't blend identifiers into natural language

Prefer to use digits rather than spell out numbers

Use your best judgment on whether to use digits/numerals (e.g. 5) or spell out the word of numbers (e.g. five). Some general guidelines:

If a number is a parameter that is core to the example, use digits.

If a number expresses something that is typically written with digits (e.g. measurements, dimensions, constants), use digits.

If a number is none of the above, and is also less than 10, spell it out.

When in doubt, use digits.
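
These guidelines can be read as a small decision procedure. Here is a minimal sketch of it, assuming a hypothetical helper `number_for_title` whose flag names are not part of this guide. It uses Python 3 syntax so that it runs on a current interpreter.

```python
SMALL_NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                      "five", "six", "seven", "eight", "nine"]


def number_for_title(n, core_parameter=False, usually_digits=False):
    """Return the form of the integer n to use in a title."""
    # Core parameters and conventionally numeric values keep their digits.
    if core_parameter or usually_digits:
        return str(n)
    # Otherwise, small numbers are spelled out.
    if 0 <= n < 10:
        return SMALL_NUMBER_WORDS[n]
    # When in doubt, use digits.
    return str(n)


print(number_for_title(3, core_parameter=True))
print(number_for_title(2))
print(number_for_title(12))
```

The judgment call stays with the writer: deciding whether a number is core to the example is exactly what the two flags leave open.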

Use acronyms if they are typically used as such; otherwise, spell them out with each word capitalized

Good:

Bad:

Exceptions:

When in doubt, spell them out.

Do not put a period at the end of titles

Titles are sentence fragments, not full sentences.
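
Several of the title rules in this section are mechanical enough to check automatically. The following is a rough sketch, not an official tool: `check_title` is a hypothetical helper, it covers only four rules (capitalized first letter, no trailing period, no apostrophes, no parentheses), and it cannot tell whether a title actually starts with a verb. It uses Python 3 syntax so that it runs on a current interpreter.

```python
def check_title(title):
    """Return a list of style problems found in an example title."""
    problems = []
    if not title[:1].isupper():
        problems.append("start with a capitalized verb")
    if title.endswith("."):
        problems.append("do not end with a period")
    if "'" in title:
        problems.append("avoid apostrophes")
    if "(" in title or ")" in title:
        problems.append("avoid parentheses")
    return problems


print(check_title("Construct an array"))
print(check_title("compute the array's sum (rows)."))
```

An empty list means only that none of these four checks failed; the remaining rules still need a human reader.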

Preludes and postludes

Code examples are divided into three sections: prelude, main code, and postlude. These are combined at runtime to form the entire program, but only the main code is shown in the sidebar; the user only sees all three sections if they click on the example. We therefore put setup and teardown code in the prelude and postlude, respectively, and reserve the main code section for demonstrating the core concept.
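
To make the division concrete, here is a minimal sketch of how the three sections might be joined into one program. The names `assemble_example` and `run_example` and the dictionary layout are hypothetical illustrations, not Kite's actual runtime. The sketch uses Python 3 syntax so that it runs on a current interpreter.

```python
import contextlib
import io


def assemble_example(sections):
    """Join the prelude, main code, and postlude into one program string."""
    # Hypothetical layout: a dict with optional "prelude" and "postlude" keys.
    parts = [
        sections.get("prelude", ""),
        sections["code"],
        sections.get("postlude", ""),
    ]
    return "\n".join(part for part in parts if part)


def run_example(sections):
    """Execute the assembled program and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(assemble_example(sections), {})
    return buffer.getvalue()


example = {
    "prelude": "import json",
    "code": 'print(json.dumps({"a": 1}))',
}

# Only example["code"] would appear in the sidebar; the whole program runs.
output = run_example(example)
print(output)
```

The main code section depends on the import in the prelude, which is why all three sections have to run together even though only one is displayed.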

Use the prelude and postlude for code that isn't directly relevant to the demonstrated concept

Preludes and postludes are not visible to a user until they expand a code example, so use them for code that is needed for the example to run, but is not immediately relevant to the core concept of the example.

# TITLE: Read a basic CSV file

# PRELUDE
import csv

# CODE
with open("sample.csv", "w") as f:
    f.write("a,1\n")
    f.write("b,2\n")

f = open("sample.csv")
csv_reader = csv.reader(f)

for row in csv_reader:
    print row

This example begins with setup code which creates sample.csv, a file used in the demonstration of csv.reader. The setup code should not be included in the main code section. A better division of the example would be:

# TITLE: Read a basic CSV file

# PRELUDE
import csv

with open("sample.csv", "w") as f:
    f.write("a,1\n")
    f.write("b,2\n")

# CODE
f = open("sample.csv")
csv_reader = csv.reader(f)

for row in csv_reader:
    print row

Thus the Kite sidebar will only show code that opens and reads the file, which is the central concept of this example.

Put import statements in the prelude

# PRELUDE
import yaml

# CODE
print yaml.dump("abc")

Use import x syntax by default

We want examples to be as easy to understand as possible, so for most packages, we want to import at the package level and access its functions from the package, rather than using from x import y to import functions directly.

# PRELUDE
from json import dumps

# CODE
print dumps({"a": 1})

# PRELUDE
import json

# CODE
print json.dumps({"a": 1})

Use from a.b.c import d when there are multiple subpackages

If a package contains subpackages, accessing them through the top-level package can make the example messy and hard to read. In these cases, it makes more sense to use from.

# TITLE: Map a URL to a function using `getattr`

# PRELUDE
from werkzeug.wrappers import Response, Request
from werkzeug.routing import Map, Rule
from werkzeug.exceptions import HTTPException

# CODE
class HelloWorld(object):

    url_map = Map([
        Rule("/home", endpoint="home"),
    ])

    def dispatch_request(self, request):
        url_adapter = self.url_map.bind_to_environ(request.environ)
        try:
            endpoint, values = url_adapter.match()

            # Call the corresponding function by prepending "on_"
            return getattr(self, "on_" + endpoint)(request, **values)
        except HTTPException as e:
            return e

    def on_home(self, request):
        return Response("Hello, World!")

    def wsgi_app(self, environ, start_response):
        request = Request(environ)
        response = self.dispatch_request(request)
        return response(environ, start_response)

    def __call__(self, environ, start_response):
        return self.wsgi_app(environ, start_response)

In general, prefer to put code in the main code section

If your example involves helper classes or methods that are central to the example then you should still include those in the main code section.

# PRELUDE
import yaml

# CODE
class Dice(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return 'Dice(%d, %d)' % (self.a, self.b)

def dice_constructor(loader, node):
    value = loader.construct_scalar(node)
    a, b = map(int, value.split('d'))
    return Dice(a, b)

yaml.add_constructor('!dice', dice_constructor)

print yaml.load("gold: !dice 10d6")

Variables

Use concise and purposeful variable names

Use variable names that are short and describe what a variable is going to be used for. For example, this is good because you can see what the purpose of each variable is:

xdata = np.arange(10)
ydata = np.zeros(10)
plot(xdata, ydata)

Whereas this is bad because it's not clear what the variables do:

a = np.arange(10)
b = np.zeros(10)
plot(a, b)

Avoid "foo", "bar", etc., regardless of how or where you are considering using them.

Avoid variables like name or file that could be confused with part of the API

Consider the following example using Jinja2:

template = Template("<div>{{name}}</div>")
print template.render(name="abc")  # unclear: is "name" somehow special?

For somebody not familiar with Jinja2, it is unclear whether name has some special meaning in the Jinja2 API, or whether it's used as an arbitrary placeholder. To make it clear, use a word that could not be confused for part of the API:

template = Template("<div>{{person}}</div>")
print template.render(person="abc")

Follow language conventions for separating words in a variable name

# Python
my_variable = 1

# Java
int myVariable = 1;

Don't create variables that are only referenced once

This is unnecessarily verbose:

pattern = "abc .* def"
regex = re.compile(pattern)

Instead, put it all on one line:

regex = re.compile("abc .* def")

But do introduce temporary variables rather than split expressions across lines

This is difficult to understand:

yaml.dump({"name": "abc", "age": 7},
     open("myfile.txt", "w"),
     default_flow_style=False)

Instead, it would be better to introduce two temporary variables:

data = {"name": "abc", "age": 7}
f = open("myfile.txt", "w")
yaml.dump(data, f, default_flow_style=False)

Introduce temporary variables when the meaning of a value is not clear

This is difficult to understand:

print np.where([False, True, True], [1, 2, 3], [100, 200, 300])

Instead, introduce temporaries to indicate what the variables mean:

condition = [False, True, True]
when_true = [1, 2, 3]
when_false = [100, 200, 300]
print np.where(condition, when_true, when_false)

This will sometimes conflict with the rule about not creating variables that are only referenced once. Use your best judgment.

Values and Placeholders

Use simple placeholders

This is unnecessarily long:

print json.dumps({"first_name": "Graham", "last_name": "Johnson", "born_in": "Antarctica"})

Instead, use more concise data:

print json.dumps({"a": 1, "b": 2})

Avoid "foo", "bar", etc., regardless of how or where you are considering using them.

When appropriate, use placeholders that are relevant to the package

This is simple, but makes no sense in the context of shlex:

print shlex.split("a b")

Instead, since shlex is used for parsing Unix shell commands, use a sample command:

print shlex.split("tar -cvf kite_source.tar /home/kite/")

Similarly, since HMAC is used to hash messages using a key, express those semantics in the placeholders:

h = hmac.new("key")
h.update("Hello, World!")

Minimize placeholders to only what is necessary

Use the smallest amount of placeholder content that still clearly demonstrates the concept. You should rarely ever need more than 3.

When choosing between 1, 2, or 3 placeholders, consider the incremental cost of each additional placeholder:

Be careful when using only one placeholder, as the example may become ambiguous

Sometimes there are additional reasons to consider using two vs three items. For example, when multiplying two matrices it's required to use non-square dimensions to illustrate how the dimensions need to line up.

Use double quotes for string literals by default

Rationale: we could have chosen either one, but it's important to have a consistent standard, and double quotes are more consistent with string representations in other languages.

print "use double quotes by default"

Switch to single quotes if you need to include double-quotes inside a string

This is ugly:

s = "Greg said \"hello\" to Phil"

Instead, switch to single quotes:

s = 'Greg said "hello" to Phil'

Use triple-double-quotes for multi-line strings

document = """
{
  "a": 20,
  "b": [1,2,3,"a"],
  "c": {
    "d": [1,2,3],
    "e": 40
  }
}
"""
data = json.loads(document)

Put a new line at the start and end of multi-line strings, if possible

data = """This is a
little hard to read"""

data = """
This is a
lot easier to read
"""

Use the alphabet (a, b, c, ...) for string placeholders

my_string = "abc"

Use natural numbers (1, 2, 3, ...) for integer placeholders

numpy.array([1, 2, 3])

Use natural numbers with a ".0" suffix for float placeholders

my_list = [1.0, 2.0, 3.0]

Continue these sequences for hierarchies, sequences, or groups of placeholder content.

map(str.upper, ["abc", "def"])
numpy.array([[1, 2, 3], [4, 5, 6]])

Use strings for dictionary keys

If the key-value pairs are purposeful, use key names that correspond to the meaning of the value. Otherwise, use "a", "b"... as placeholder keys, and 1, 2... as placeholder values.

json.dumps({1: "a", 2: "b"})

json.dumps({"a": 1, "b": 2})

Use C[n] for placeholder classes and f[n] for placeholder functions

Note that this only applies to placeholder classes and functions, i.e. classes and functions that have no functionality or purpose, such as those used in demonstrating sys and inspect functionality.

class Dog:
    def bark(self):
        return "Bark bark!"

class Cat:
    def meow(self):
        return "meow"

class C:
    pass

class C:
    def f(self):
        pass

class C1:
    def f1(self):
        return 1

class C2:
    def f2(self):
        return 2

Use "/path/to/file" for directory names

Again, only for non-purposeful placeholder directory names. Note that there is a / at the beginning.

os.path.split("/home/user/docs/sample.txt")

os.path.split("/path/to/file")

Don't use the same value twice unless for the same purpose each time

The following example creates an HMAC hash using a key, then updates it with a value:

h = hmac.new("abc")
h.update("abc")

This is confusing since it leaves the user wondering whether there was some important reason to use "abc" in both places. Instead, you should use different values so that there is no confusion:

h = hmac.new("key")
h.update("Hello, World!")

On the other hand, sometimes the same value is being used for the same purpose in two different places. In this case you should use the same value in both cases, for example:

data1 = numpy.zeros(8)
data2 = numpy.zeros(8)

For placeholder functions and classes, balance "simple" with "natural"

This example has a bunch of incidental complexity:

# TITLE: Add test cases to a suite

# CODE
class MyTest(TestCase):
    def setUp(self):
        self.name = "abc"
        self.num = 123

    def test_name_equals(self):
        self.assertEqual(self.name, "abc")

    def test_num_equals(self):
        self.assertEqual(self.num, 123)

suite = TestSuite()
suite.addTest(MyTest("test_name_equals"))
suite.addTest(MyTest("test_num_equals"))

Here is a much simpler version, that forms a better example:

# TITLE: Add test cases to a suite

# CODE
class MyTest(TestCase):
    def test_a(self):
        self.assertTrue(0 < 1)

suite = TestSuite()
suite.addTest(MyTest("test_a"))

We don't need to use setUp or instance variables. We do have to choose between assertTrue(True) and assertTrue(0 < 1); here we've decided in favor of the latter, though not strongly.

Use mock.kite.com for examples that demonstrate communication with a server

A list of endpoints for mock.kite.com can be found here.

Files

Use sample files provided by Kite whenever possible

A list of sample files accessible from the examples can be found here. The following example shows how to use these files:

# CODE
f = open("sample.txt")
print f.read()

# POSTLUDE
'''
sample_files:
- sample.txt
'''

If the provided sample files are not enough, ask your correspondent about creating new sample files before explicitly creating files in new examples.

When an example requires a file to be created, create it in the prelude

# PRELUDE
import csv

with open("sample.csv", "w") as f:
    f.write("a,1\n")
    f.write("b,2\n")

# CODE
f = open("sample.csv")
csv_reader = csv.reader(f)

for row in csv_reader:
    print row

Name files following the same guidelines for naming variables

File names that appear in the main code section should reflect their purpose, just like variables.

When a file is simply a placeholder file, use a short name with a familiar extension:

Open files with a straightforward open

Unless absolutely necessary, do not use a with statement for opening files (rationale: this is a difficult one to decide on but open works fine for short examples, and with is a language-level feature that some users may not be familiar with).

f = open("input.txt")

Always write to files in the current working directory

f = open("output.txt", "w")
f.write("abc")

Never specify an explicit path (this would not run inside the sandbox environment):

f = open("/path/to/output.txt", "w")
f.write("abc")

Output

Generate the minimal output needed to clearly demonstrate the concept

Output must be read and understood by the user, too, so the more output there is, the more time it takes users to understand the example.

This code generates 24 lines of output, which is too much:

# CODE
for x in itertools.permutations([1, 2, 3, 4]):
    print x

# OUTPUT
(1, 2, 3, 4)
(1, 2, 4, 3)
(1, 3, 2, 4)
(1, 3, 4, 2)
(1, 4, 2, 3)
(1, 4, 3, 2)
(2, 1, 3, 4)
(2, 1, 4, 3)
(2, 3, 1, 4)
(2, 3, 4, 1)
(2, 4, 1, 3)
(2, 4, 3, 1)
(3, 1, 2, 4)
(3, 1, 4, 2)
(3, 2, 1, 4)
(3, 2, 4, 1)
(3, 4, 1, 2)
(3, 4, 2, 1)
(4, 1, 2, 3)
(4, 1, 3, 2)
(4, 2, 1, 3)
(4, 2, 3, 1)
(4, 3, 1, 2)
(4, 3, 2, 1)

Instead, do the following, which only generates six lines of output:

# CODE
for x in itertools.permutations([1, 2, 3]):
    print x

# OUTPUT
(1, 2, 3)
(1, 3, 2)
(2, 1, 3)
(2, 3, 1)
(3, 1, 2)
(3, 2, 1)

However, the following generates too little output and does not show the concept clearly:

# CODE
for x in itertools.permutations([1, 2]):
    print x

# OUTPUT
(1, 2)
(2, 1)

Keep print statements as simple as possible

Always use a simple print statement to output values. Note that we use Python 2-style print value, not print(value). Don't use statements like these:

[print item for item in some_list]
print(value1, value2, value3)
print value1, value2, value3
from pprint import pprint
pprint(some_dict)

Avoid unnecessary output formatting or explanations

If you feel the need to add exposition, you probably need to simplify your example so that it does not create complex outputs.

print "This is the value for a: " + a

print "Person {name}: {sex}, age {age}".format(
    name=name,
    sex=sex,
    age=str(age)
)

def print_dict(d):
    output = ""
    for k, v in d.items():
        output += "Key: " + k + " Value: " + v + "\n"
    print output

print_dict(some_dict)

Prefer to print lists with a for statement

This example requires a more advanced understanding of Python iterators and should be avoided:

print list(itertools.permutations([1, 2, 3]))

Instead, use this syntax:

for x in itertools.permutations([1, 2, 3]):
    print x

Whenever possible, produce the same output every time

When demonstrating functions such as random number generators, set a deterministic seed in the prelude if it is available:

# PRELUDE
import random
random.seed(0)

# CODE
print random.randint(0, 10)

This is good because the user will only see the main code section, which will not be cluttered with the call to random.seed.

When using numpy.random, there is a similar seed function:

# PRELUDE
from numpy.random import randn, seed
seed(0)

# CODE
print randn(2, 5)

For examples that involve time stamps, HTTP requests, and random number generators with no seed, this is not possible, which is okay.

Output binary data as raw strings

Even though this can cause broken characters to appear, we still want to keep print statements as simple as possible.

print binary_data

print repr(binary_data)
print hexlify(binary_data)

Miscellaneous

Write some examples before titling them

Often it is helpful to write some initial examples to get a sense of a package's classes and functions first. Coming up with titles is easier after you map out the different examples you want to write. Also, titling an example essentially finalizes its contents, and you may miss opportunities to improve the content if you write the title too early.

Don't use advanced concepts unless necessary

This example works, but may be difficult for beginners who are not familiar with Python's dictionary unpacking syntax:

data = {"person": "abc", "age": 5}
print "{person} is age {age}".format(**data)

Instead, this example is easy for everyone to understand:

print "{person} is age {age}".format(person="abc", age=5)

Don't write examples that demonstrate what not to do

try:
    print pickle.loads("pickle")
except IndexError:
    print "String does not contain pickle data"

However, it is sometimes good to include a demonstration of common failure modes as part of a larger example.

dictionary = {"a": 1}

print dictionary.pop("a")
print dictionary

try:
    print dictionary.pop("a")
except KeyError as e:
    print "KeyError: " + e.message

print dictionary.pop("a", None)

Don't write examples that simply construct an object

Object construction on its own is not informative; it is much more helpful to see how an object is used.

pattern = re.compile("[a-z]+[0-9]+")
print pattern

pattern = re.compile("[a-z]+[0-9]+")
print pattern.match("test123")

Don't reimplement default behavior

Here is a bad example showing a class with an explicit pickling function:

import cPickle

class Foo(object):
    def __init__(self, value):
        self.value = value
    def __getstate__(self):
        return {'value': self.value}

f = Foo(123)
s = cPickle.dumps(f)

The problem with the code example above is that, by default, cPickle pickles the instance's __dict__ attribute whenever the class defines no __getstate__ method. So the example would have produced exactly the same output if __getstate__ had been omitted, which leaves the user with no idea why the method matters. A better example would implement __getstate__ in a way that differs from the default behavior.
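As a rough sketch of what a more informative example might look like, the class below implements __getstate__ so that it differs from the default by leaving a derived attribute out of the pickled state. The class name Squarer and its cache attribute are hypothetical, and the code uses pickle with print() calls so that it also runs on Python 3, which departs from the Python 2 style used elsewhere in this guide.

```python
import pickle


class Squarer(object):
    def __init__(self, value):
        self.value = value
        # Derived data that does not need to be stored in the pickle
        self.cache = value * value

    def __getstate__(self):
        # Differs from the default: drop "cache" from the pickled state
        state = self.__dict__.copy()
        del state["cache"]
        return state


original = Squarer(12)
restored = pickle.loads(pickle.dumps(original))

print(sorted(original.__dict__))
print(sorted(restored.__dict__))

# OUTPUT
# ['cache', 'value']
# ['value']
```

Here the output makes the effect of __getstate__ visible, because the restored object has lost the attribute that the method removed.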

Use explicit keyword arguments when there are many arguments to a function

xdata = np.arange(10)
ydata = np.zeros(10)
plot(xdata, ydata, color="r", linestyle="-", label="my data")

Use keyword arguments when it is standard practice to do so

This works but is non-standard:

a = array([1, 2, 3], float)

Whereas this is standard practice for numpy code:

a = array([1, 2, 3], dtype=float)

Demonstrate the purpose, not simply the functionality

Examples should demonstrate the purpose of their functions, not merely their functionality. Use functions in a way that mirrors their intended usage, and clearly show the purpose that each function serves.

print secure_filename("a b")
# OUTPUT: 'a_b'
print secure_filename("../../../etc/passwd")
# OUTPUT: 'etc_passwd'

When copying existing examples, modify them to fit the style guidelines

Many packages will provide examples in their official documentation pages, and it's okay to use these as examples. However, they will likely not abide by this style guide as is, so modify them as needed.

Only use comments when an example cannot be written in a self-explanatory way

Strive to write examples that do not need comments to explain what they do. If an explanation is absolutely necessary, include a brief comment above the section that requires explanation; do not use inline comments.

This is obvious and does not need a comment:

# Dump to string
output_string = json.dumps([1, 2, 3])

This is also obvious, and it uses an inline comment:

json.loads(string) # load from string

This comment is helpful to include because the line is confusing on its own, but cannot be written more clearly because has_header cannot be called with a keyword argument:

# Sample the first 256 bytes of the file
print sniffer.has_header(f.read(256))

Bundle together examples that use the same function with different parameters

When several examples call the same function with different parameters and all demonstrate the same concept, they can generally be bundled together into a "cheat sheet" example that gives users a quick reference to the usages of the function.

# TITLE: Construct a dictionary

# CODE
print dict(a=1, b=2)
print {"a": 1, "b": 2}
print dict([("a", 1), ("b", 2)])
print dict({"a": 1, "b": 2})
print dict(zip(["a", "b"], [1, 2]))

Functions like dateutil.rrule are exceptions because they provide conceptually different outputs based on the arguments provided.

When using multi-variable assignment with long tuples, print the tuple first

This allows users to more easily match up the variables with their values and helps users who are not familiar with the syntax understand what is going on.

st = os.stat("sample.txt")
print st
mode, ino, dev, nlink, uid, gid, size, accessed, modified, created = st

Use the isoformat method to format datetime objects, instead of strftime

strftime tends to be long and unwieldy, and is not completely standardized. isoformat has a standard output that is decently readable and makes the code example much more concise. Of course, when you can, you should print the datetime object directly, so you can take advantage of its default string conversion, which outputs a nicely formatted representation of the date and time.
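To make the comparison concrete, here is a small sketch that formats the same datetime three ways. The variable names and the strftime format string are chosen for illustration only, and print() is called as a function so the code also runs on Python 3.

```python
import datetime

moment = datetime.datetime(2015, 1, 2, 3, 4, 5)

# Printing directly uses the default string conversion
direct = str(moment)
# isoformat needs no format string
iso = moment.isoformat()
# strftime needs a format string to produce the same text
custom = moment.strftime("%Y-%m-%dT%H:%M:%S")

print(direct)
print(iso)
print(custom)

# OUTPUT
# 2015-01-02 03:04:05
# 2015-01-02T03:04:05
# 2015-01-02T03:04:05
```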

For combinations of similar functions and similar parameters, choose one function as canonical

When a package has x similar functions, each of which takes y similar parameters, we don't want to enumerate x * y examples that show all the different ways to call each of the functions with each of the parameters. For example, csv has two different readers, each of which takes various parameters for reading different delimiters, row indicators, quotes, etc. Showing examples for how to use each of these parameters for both readers would be redundant.

Instead, choose one of the functions as the "canonical" example, and only show the different parameters or ways of calling the function for that function. For everything else, just provide one simple example for each function. In the case of csv, we would choose one reader and write an example for each of the different parameters, and only provide one simple example of using the other reader.
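A minimal sketch of how this might look for csv, assuming csv.reader is picked as the canonical reader. In practice each snippet would be its own titled example; they are shown together here only for brevity. The sample data and variable names are made up, and the code relies on io.StringIO and print() as used on Python 3.

```python
import csv
import io

# Canonical reader: one short example per parameter
semicolons = io.StringIO("a;b\n1;2\n")
delimiter_rows = list(csv.reader(semicolons, delimiter=";"))
print(delimiter_rows)

quoted = io.StringIO("'a,b',c\n")
quote_rows = list(csv.reader(quoted, quotechar="'"))
print(quote_rows)

# Other reader: a single simple example, with no parameter variations
plain = io.StringIO("a,b\n1,2\n")
dict_rows = [dict(row) for row in csv.DictReader(plain)]
print(dict_rows)
```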

This is also true for objects: hashlib has 5 different hash objects, each of which can be initialized, updated, copied, converted to a hex value, or accessed as its raw binary value. In this case, we would choose one hash to show how to do each of these actions, and for every other hash, simply have one example that shows how to initialize it.
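The following sketch illustrates the idea for hashlib, assuming sha256 is chosen as the canonical hash and sha512 stands in for "all other hashes". The input strings and variable names are illustrative, and print() is called as a function so the code also runs on Python 3.

```python
import hashlib

# Canonical hash: show each action once
h = hashlib.sha256()
h.update(b"hello ")
clone = h.copy()
h.update(b"world")

hex_value = h.hexdigest()
raw_value = h.digest()
print(hex_value)
print(len(raw_value))
# The copy was taken before the second update, so it differs
print(clone.hexdigest() == hex_value)

# Any other hash: only show how to initialize it
other = hashlib.sha512(b"hello world")
print(other.name)
```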