Guide to Building Probabilistic Models

Getting Started with Probabilistic Models

InferPy focuses on hirearchical probabilistic models structured in two different layers:

  • A prior model defining a joint distribution \(p(\mathbf{w})\) over the global parameters of the model. \(\mathbf{w}\) can be a single random variable or a bunch of random variables with any given dependency structure.
  • A data or observation model defining a joint conditional distribution \(p(\mathbf{x},\mathbf{z}|\mathbf{w})\) over the observed quantities \(\mathbf{x}\) and the the local hidden variables \(\mathbf{z}\) governing the observation \(\mathbf{x}\). This data model is specified in a single-sample basis. There are many models of interest without local hidden variables, in that case, we simply specify the conditional \(p(\mathbf{x}|\mathbf{w})\). Similarly, either \(\mathbf{x}\) or \(\mathbf{z}\) can be a single random variable or a bunch of random variables with any given dependency structure.

For example, a Bayesian PCA model has the following graphical structure,

Linear Factor Model

Bayesian PCA

The prior model are the variables \(w_k\). The data model is the part of the model surrounded by the box indexed by N.

And this is how this Bayesian PCA model is denfined in InferPy:

import edward as ed
import inferpy as inf
from inferpy.models import Normal

K, d, N = 5, 10, 200

# model definition
with inf.ProbModel() as m:
    #define the weights
    with inf.replicate(size=K):
        w = Normal(0, 1, dim=d)

    # define the generative model
    with inf.replicate(size=N):
        z = Normal(0, 1, dim=K)
        x = Normal(inf.matmul(z,w), 1.0, observed=True, dim=d)

m.compile()


The with inf.replicate(size = N) sintaxis is used to replicate the random variables contained within this construct. It follows from the so-called plateau notation to define the data generation part of a probabilistic model. Every replicated variable is conditionally idependent given the previous random variables (if any) defined outside the with statement.

Random Variables

Following Edward’s approach, a random variable \(x\) is an object parametrized by a tensor \(\theta\) (i.e. a TensorFlow’s tensor or numpy’s ndarray). The number of random variables in one object is determined by the dimensions of its parameters (like in Edward) or by the ‘dim’ argument (inspired by PyMC3 and Keras):

import inferpy as inf
import tensorflow as tf
import numpy as np


# different ways of declaring 1 batch of 5 Normal distributions

x = inf.models.Normal(loc = 0, scale=1, dim=5)         # x.shape = [5]

x = inf.models.Normal(loc = [0, 0, 0, 0, 0], scale=1)  # x.shape = [5]

x = inf.models.Normal(loc = np.zeros(5), scale=1)      # x.shape = [5]

x = inf.models.Normal(loc = 0, scale=tf.ones(5))       # x.shape = [5]

The with inf.replicate(size = N) sintaxis can also be used to define multi-dimensional objects:

with inf.replicate(size=10):
    x = inf.models.Normal(loc=0, scale=1, dim=5)       # x.shape = [10,5]

Following Edward’s approach, the multivariate dimension is the innermost (right-most) dimension of the parameters.

Note that indexing is supported:

y = x[7,4]                                              # y.shape = [1]

y2 = x[7]                                               # y2.shape = [5]

y3 = x[7,:]                                             # y2.shape = [5]

y4 = x[:,4]                                             # y4.shape = [10]

Moreover, we may use indexation for defining new variables whose indexes may be other (discrete) variables:

z = inf.models.Categorical(logits = np.zeros(5))
yz = inf.models.Normal(loc=x[0,z], scale=1)            # yz.shape = [1]

Any random variable in InferPy contain the following (optional) input parameters in the constructor:

  • validate_args : Python boolean indicating that possibly expensive checks with the input parameters are enabled. By default, it is set to False.
  • allow_nan_stats : When True, the value “NaN” is used to indicate the result is undefined. Otherwise an exception is raised. Its default value is True.
  • name: Python string with the name of the underlying Tensor object.
  • observed: Python boolean which is used to indicate whether a variable is observable or not . The default value is False
  • dim: dimension of the variable. The default value is None

Inferpy supports a wide range of probability distributions. Details of the specific arguments for each supported distributions are specified in the following sections. `

Probabilistic Models

A probabilistic model defines a joint distribution over observable and non-observable variables, \(p(\mathbf{w}, \mathbf{z}, \mathbf{x})\) for the running example. The variables in the model are the ones defined using the with inf.ProbModel() as pca: construct. Alternatively, we can also use a builder,

m = inf.ProbModel(varlist=[w,z,x])
m.compile()

The model must be compiled before it can be used.

Like any random variable object, a probabilistic model is equipped with methods such as sample(), log_prob() and sum_log_prob(). Then, we can sample data from the model and compute the log-likelihood of a data set:

data = m.sample(1000)
log_like = m.log_prob(data)
sum_log_like = m.sum_log_prob(data)

Random variables can be involved in expressive deterministic operations. Dependecies between variables are modelled by setting a given variable as a parameter of another variable. For example:

with inf.ProbModel() as m:
    theta = inf.models.Beta(0.5,0.5)
    z =  inf.models.Categorical(probs=[theta, 1-theta], name="z")


m.sample()

Moreover, we might consider using the function inferpy.case as the parameter of other random variables:

# Categorical variable depending on another categorical variable

with inf.ProbModel() as m2:
    y =  inf.models.Categorical(probs=[0.4,0.6], name="y")
    x = inf.models.Categorical(probs=inf.case({y.equal(0): [0.0, 1.0],
                                               y.equal(1): [1.0, 0.0] }), name="x")
m2.sample()


# Categorical variable depending on a Normal distributed variable

with inf.ProbModel() as m3:
    a = inf.models.Normal(0,1, name="a")
    b = inf.models.Categorical(probs=inf.case({a>0: [0.0, 1.0],
                                               a<=0: [1.0, 0.0]}), name="b")
m3.sample()


# Normal distributed variable depending on a Categorical variable

with inf.ProbModel() as m4:
    d =  inf.models.Categorical(probs=[0.4,0.6], name="d")
    c = inf.models.Normal(loc=inf.case({d.equal(0): 0.,
                                        d.equal(1): 100.}), scale=1., name="c")
m4.sample()

Note that we might use the case function inside the replicate construct. The result will be a multi-batch random variable having the same distribution for each batch. When obtaining a sample from the model, each sample of a given batch in x is independent of the rest.

with inf.ProbModel() as m:
    y =  inf.models.Categorical(probs=[0.4,0.6], name="y")
    with inf.replicate(size=10):
        x = inf.models.Categorical(probs=inf.case({y.equal(0): [0.5, 0.5],
                                                    y.equal(1): [1.0, 0.0] }), name="x")
m.sample()

We can also use the functions inferpy.case_states or inferpy.gather for defining the same model.

with inf.ProbModel() as m:
    y =  inf.models.Categorical(probs=[0.4,0.6], name="y")
    x = inf.models.Categorical(probs=inf.case_states(y, {0: [0.0, 1.0],
                                                         1: [1.0, 0.0] }), name="x")
m.sample()


with inf.ProbModel() as m:
    y =  inf.models.Categorical(probs=[0.4,0.6], name="y")
    with inf.replicate(size=10):
        x = inf.models.Categorical(probs=inf.gather([[0.5, 0.5], [1.0, 0.0]], y), name="x")

m.sample()

We can use the function inferpy.case_states with a list of variables (or multidimensional variables):


y =  inf.models.Categorical(probs=[0.5,0.5], name="y", dim=2)
p = inf.case_states(y, {(0,0): [1.0, 0.0, 0.0, 0.0], (0,1): [0.0, 1.0, 0.0, 0.0],
                        (1, 0): [0.0, 0.0, 1.0, 0.0], (1,1): [0.0, 0.0, 0.0, 1.0]} )

with inf.replicate(size=10):
    x = inf.models.Categorical(probs=p, name="x")

####

y =  inf.models.Categorical(probs=[0.5,0.5], name="y", dim=1)
z =  inf.models.Categorical(probs=[0.5,0.5], name="z", dim=1)


p = inf.case_states((y,z), {(0,0): [1.0, 0.0, 0.0, 0.0], (0,1): [0.0, 1.0, 0.0, 0.0],
                            (1, 0): [0.0, 0.0, 1.0, 0.0], (1,1): [0.0, 0.0, 0.0, 1.0]} )

with inf.replicate(size=10):
    x = inf.models.Categorical(probs=p, name="x")

####

p = inf.case_states([y,z], {(0,0): [1.0, 0.0, 0.0, 0.0], (0,1): [0.0, 1.0, 0.0, 0.0],
                            (1, 0): [0.0, 0.0, 1.0, 0.0], (1,1): [0.0, 0.0, 0.0, 1.0]} )

with inf.replicate(size=10):
    x = inf.models.Categorical(probs=p, name="x")

Supported Probability Distributions

Supported probability distributions are located in the package inferpy.models. All of them have inferpy.models.RandomVariable as superclass. A list with all the supported distributions can be obtained as as follows.

>>> inf.models.ALLOWED_VARS
['Bernoulli', 'Beta', 'Categorical', 'Deterministic', 'Dirichlet', 'Exponential', 'Gamma', 'InverseGamma', 'Laplace', 'Multinomial', 'Normal', 'Poisson', 'Uniform']

Bernoulli

Binary distribution which takes the value 1 with probability \(p\) and the value with \(1-p\). Its probability mass function is

\[\begin{split}p(x;p) =\left\{\begin{array}{cc} p & \mathrm{if\ } x=1 \\ 1-p & \mathrm{if\ } x=0 \\ \end{array} \right.\end{split}\]

An example of definition in InferPy of a random variable following a Bernoulli distribution is shown below. Note that the input parameter probs corresponds to \(p\) in the previous equation.

x = inf.models.Bernoulli(probs=0.5)

# or

x = inf.models.Bernoulli(logits=0)

This distribution can be initialized by indicating the logit function of the probability, i.e., \(logit(p) = log(\frac{p}{1-p})\).

Beta

Continuous distribution defined in the interval \([0,1]\) and parametrized by two positive shape parameters, denoted \(\alpha\) and \(\beta\).

\[p(x;\alpha,\beta)=\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}\]

where B is the beta function

\[B(\alpha,\beta)=\int_{0}^{1}t^{\alpha-1}(1-t)^{\beta-1}dt\]

The definition of a random variable following a Beta distribution is done as follows.

x = inf.models.Beta(concentration0=0.5, concentration1=0.5)

# or simply:

x = inf.models.Beta(0.5,0.5)

Note that the input parameters concentration0 and concentration1 correspond to the shape parameters \(\alpha\) and \(\beta\) respectively.

Categorical

Discrete probability distribution that can take \(k\) possible states or categories. The probability of each state is separately defined:

\[p(x;\mathbf{p}) = p_i\]

where \(\mathbf{p} = (p_1, p_2, \ldots, p_k)\) is a k-dimensional vector with the probability associated to each possible state.

The definition of a random variable following a Categorical distribution is done as follows.

x = inf.models.Categorical(probs=[0.5,0.5])

# or

x = inf.models.Categorical(logits=[0,0])

Deterministic

The deterministic distribution is a probability distribution in a space (continuous or discrete) that always takes the same value \(k_0\). Its probability density (or mass) function can be defined as follows.

\[\begin{split}p(x;k_0) =\left\{\begin{array}{cc} 1 & \mathrm{if\ } x=k_0 \\ 0 & \mathrm{if\ } x \neq k_0 \\ \end{array} \right.\end{split}\]

The definition of a random variable following a Beta distribution is done as follows:

x = inf.models.Deterministic(loc=5)

where the input parameter loc corresponds to the value \(k_0\).

Dirichlet

Dirichlet distribution is a continuous multivariate probability distribution parmeterized by a vector of positive reals \((\alpha_1,\alpha_2,\ldots,\alpha_k)\). It is a multivariate generalization of the beta distribution. Dirichlet distributions are commonly used as prior distributions in Bayesian statistics. The Dirichlet distribution of order \(k \geq 2\) has the following density function.

\[p(x_1,x_2,\ldots x_k; \alpha_1,\alpha_2,\ldots,\alpha_k) = {\frac{\Gamma\left(\sum_i \alpha_i\right)} {\prod_i \Gamma(\alpha_i)} \prod_{i=1}^k x_i^{\alpha_i-1}}{}\]

The definition of a random variable following a Beta distribution is done as follows:

x = inf.models.Dirichlet(concentration=[5,1])

# or simply:

x = inf.models.Dirichlet([5,1])

where the input parameter concentration is the vector \((\alpha_1,\alpha_2,\ldots,\alpha_k)\).

Exponential

The exponential distribution (also known as negative exponential distribution) is defined over a continuous domain and describes the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate. Its probability density function is

\[\begin{split}p(x;\lambda) =\left\{\begin{array}{cc} \lambda e^{-\lambda x} & \mathrm{if\ } x\geq 0 \\ 0 & \mathrm{if\ } x < k_0 \\ \end{array} \right.\end{split}\]

where \(\lambda>0\) is the rate or inverse scale.

The definition of a random variable following a exponential distribution is done as follows:

x = inf.models.Exponential(rate=1)

# or simply

x = inf.models.Exponential(1)

where the input parameter rate corresponds to the value \(\lambda\).

Gamma

The Gamma distribution is a continuous probability distribution parametrized by a concentration (or shape) parameter \(\alpha>0\), and an inverse scale parameter \(\lambda>0\) called rate. Its density function is defined as follows.

\[p(x;\alpha, \beta) = \frac{\beta^\alpha x^{\alpha - 1} e^{\beta x}}{\Gamma(\alpha)}\]

for \(x > 0\) and where \(\Gamma(\alpha)\) is the gamma function.

The definition of a random variable following a gamma distribution is done as follows:

x = inf.models.Gamma(concentration=3, rate=2)

where the input parameters concentration and rate corespond to \(\alpha\) and \(\beta\) respectively.

Inverse-gamma

The Inverse-gamma distribution is a continuous probability distribution which is the distribution of the reciprocal of a variable distributed according to the gamma distribution. It is also parametrized by a concentration (or shape) parameter \(\alpha>0\), and an inverse scale parameter \(\lambda>0\) called rate. Its density function is defined as follows.

\[p(x;\alpha, \beta) = \frac{\beta^\alpha x^{-\alpha - 1} e^{-\frac{\beta}{x}}}{\Gamma(\alpha)}\]

for \(x > 0\) and where \(\Gamma(\alpha)\) is the gamma function.

The definition of a random variable following a inverse-gamma distribution is done as follows:

x = inf.models.InverseGamma(concentration=3, rate=2)

where the input parameters concentration and rate corespond to \(\alpha\) and \(\beta\) respectively.

Laplace

The Laplace distribution is a continuous probability distribution with the following density function

\[p(x;\mu,\sigma) = \frac{1}{2\sigma} exp \left( - \frac{|x - \mu |}{\sigma}\right)\]

The definition of a random variable following a Beta distribution is done as follows:

x = inf.models.Laplace(loc=0, scale=1)

# or simply

x = inf.models.Laplace(0,1)

where the input parameter loc and scale correspond to \(\mu\) and \(\sigma\) respectively.

Multinomial

The multinomial is a discrete distribution which models the probability of counts resulting from repeating \(n\) times an experiment with \(k\) possible outcomes. Its probability mass function is defined below.

\[p(x_1,x_2,\ldots x_k; \mathbf{p}) = \frac{n!}{\prod_{i=1}^k x_i}\prod_{i=1}^k p_i^{x^i}\]

where \(\mathbf{p}\) is a k-dimensional vector defined as \(\mathbf{p} = (p_1, p_2, \ldots, p_k)\) with the probability associated to each possible outcome.

The definition of a random variable following a multinomial distribution is done as follows:

x = inf.models.Multinomial(total_count=4, probs=[0.5,0.5])

# or

x = inf.models.Multinomial(total_count=4, logits=[0,0])

Multivariate-Normal

A multivariate-normal (or Gaussian) defines a set of normal-distributed variables which are assumed to be idependent. In other words, the covariance matrix is diagonal.

A single multivariate-normal distribution defined on \(\mathbb{R}^2\) can be defined as follows.

x = inf.models.MultivariateNormalDiag(
    loc=[1., -1],
    scale_diag=[1, 2.]
)

Normal

The normal (or Gaussian) distribution is a continuous probability distribution with the following density function

\[p(x;\mu,\sigma) = \frac{1}{2\sigma} exp \left( - \frac{|x - \mu |}{\sigma}\right)\]

where \(\mu\) is the mean or expectation of the distribution, \(\sigma\) is the standard deviation, and \(\sigma^{2}\) is the variance.

A normal distribution can be defined as follows.

x = inf.models.Normal(loc=0, scale=1)

# or

x = inf.models.Normal(0,1)

where the input parameter loc and scale correspond to \(\mu\) and \(\sigma\) respectively.

Poisson

The Poisson distribution is a discrete probability distribution for modeling the number of times an event occurs in an interval of time or space. Its probability mass function is

\[p(x;\lambda) = e^{- \lambda} \frac{\lambda^x}{x!}\]

where \(\lambda\) is the rate or number of events per interval.

A Poisson distribution can be defined as follows.

x = inf.models.Poisson(rate=4)

# or

x = inf.models.Poisson(4)

Uniform

The continuous uniform distribution or rectangular distribution assings the same probability to any \(x\) in the interval \([a,b]\).

\[\begin{split}p(x;a,b) =\left\{\begin{array}{cc} \frac{1}{b-a} & \mathrm{if\ } x\in [a,b]\\ 0 & \mathrm{if\ } x\not\in [a,b] \\ \end{array} \right.\end{split}\]

A uniform distribution can be defined as follows.

x = inf.models.Uniform(low=1, high=3)

# or

inf.models.Uniform(1,3)

where the input parameters low and high correspond to the lower and upper bounds of the interval \([a,b]\).