Simple Python Bayesian Network Inference with PyOpenPNL

The state of Python libraries for Bayesian graph inference is a bit frustrating. libpgm is one of the few libraries that seems to exist, but it is quite limited in its abilities. OpenPNL from Intel is a great C++ implementation of the Matlab Bayes Net Toolbox, but neither its C++ nor its Matlab interface is particularly convenient. So we set about properly wrapping OpenPNL with SWIG so that it can be used quickly from Python. Additionally, some of the build infrastructure for OpenPNL was a bit dated and needed some cleaning up to work on modern Linux systems.

Our updated modules for both of these can be found at:

Not all of OpenPNL has been swigged yet, and some of the Python interface is still a little rough, but it does work. Here we’ll work through the canonical Bayes net example from Russell and Norvig, also used in the Matlab BNT docs. The Bayes network of interest is illustrated below.

[Figure: the sprinkler network]

We have a simple graph with four discrete nodes and we would like to instantiate the model, provide evidence and infer marginal probabilities given this evidence.

We focus on the example included in the repo which can be viewed in full here

The syntax for defining the DAG’s adjacency matrix and the node types for the Bayes net reads as follows.

import numpy as np
import openpnl

nnodes = 4
# set up the graph
# Dag must be square, with zero diag!
dag = np.zeros([nnodes,nnodes], dtype=np.int32)
dag[0,[1,2]] = 1
dag[2,3] = 1
dag[1,3] = 1
pGraph = openpnl.CGraph.CreateNP(dag)
# set up the node types
types = openpnl.pnlNodeTypeVector()
isDiscrete = 1
types[0].SetType( isDiscrete, 2 )
types[1].SetType( isDiscrete, 2 )
types[2].SetType( isDiscrete, 2 )
types[3].SetType( isDiscrete, 2 )
# node associations
nodeAssoc = openpnl.toConstIntVector([0]*nnodes)
# make the bayes net ...
pBNet = openpnl.CBNet.Create( nnodes, types, nodeAssoc, pGraph )
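
Before handing an adjacency matrix to the library, it can help to check it independently. The following is a minimal sketch in plain Python (no OpenPNL or numpy needed) that rebuilds the same matrix as nested lists, checks the "square, zero diagonal" requirement, and confirms the graph is acyclic with Kahn's algorithm. The helper names `parents_of` and `topological_order` are our own and are not part of PyOpenPNL.

```python
# Adjacency matrix from the post: dag[i][j] == 1 means an edge i -> j.
nnodes = 4
dag = [[0] * nnodes for _ in range(nnodes)]
for parent, child in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    dag[parent][child] = 1

def parents_of(dag, node):
    """Return the parents of `node`: the rows that have a 1 in its column."""
    return [i for i in range(len(dag)) if dag[i][node]]

def topological_order(dag):
    """Kahn's algorithm; raises ValueError if the matrix is not a valid DAG."""
    n = len(dag)
    if any(len(row) != n for row in dag):
        raise ValueError("adjacency matrix must be square")
    if any(dag[i][i] for i in range(n)):
        raise ValueError("diagonal must be zero (no self-loops)")
    # Number of incoming edges for every node.
    indegree = [sum(dag[i][j] for i in range(n)) for j in range(n)]
    ready = [j for j in range(n) if indegree[j] == 0]
    order = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for child in range(n):
            if dag[node][child]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    if len(order) != n:
        raise ValueError("graph contains a cycle")
    return order

order = topological_order(dag)
parent_lists = [parents_of(dag, node) for node in range(nnodes)]
print(order)
print(parent_lists)
```

Node 3 should come out with parents 1 and 2, and node 0 with none, which is the structure we are after.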

We can verify the DAG structure by plotting it with python-networkx, shown below, and checking that it matches our goal.


In this case we have allocated a Bayes-net with 4 nodes, each with 2 discrete states (T|F).   Next we assign CPDFs to each of the nodes.

for (node, cpdvals) in [
 (0, [0.5,0.5]),
 (1, [0.8, 0.2, 0.2, 0.8]),
 (2, [0.5, 0.9, 0.5, 0.1]),
 (3, [1, 0.1, 0.1, 0.01, 0, 0.9, 0.9, 0.99]),
 ]:
    # the CPD domain is the node's parents followed by the node itself
    parents = pGraph.GetParents(node)
    print("node: %s parents: %s" % (node, parents))
    domain = list(parents) + [node]
    cCPD = openpnl.CTabularCPD.Create( pBNet.GetModelDomain() , openpnl.toConstIntVector(domain) )
    cCPD.AllocMatrix( cpdvals, openpnl.matTable )
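
A flat CPD list only means something once we know which index varies fastest. As a rough sanity check, the sketch below groups each flat list from the loop above into one row per parent configuration under two readings, last index fastest (C order) and first index fastest (Matlab order), and tests whether every row sums to one. Which order `AllocMatrix` expects is not stated here, so treat the layout as something to verify rather than as a fact about the library. The helpers `rows` and `is_normalized` are hypothetical names of our own.

```python
from itertools import product

# Flat CPD lists exactly as given in the post, with each node's parent count.
flat_cpds = {
    0: ([0.5, 0.5], 0),
    1: ([0.8, 0.2, 0.2, 0.8], 1),
    2: ([0.5, 0.9, 0.5, 0.1], 1),
    3: ([1, 0.1, 0.1, 0.01, 0, 0.9, 0.9, 0.99], 2),
}

def rows(flat, n_parents, last_index_fastest):
    """Group a flat table into one [P(child=0), P(child=1)] row per parent config.

    The index tuple is (parent_1, ..., parent_k, child); every variable is binary.
    """
    n_vars = n_parents + 1
    if len(flat) != 2 ** n_vars:
        raise ValueError("expected %d entries, got %d" % (2 ** n_vars, len(flat)))
    if last_index_fastest:   # C / row-major order
        strides = [2 ** (n_vars - 1 - k) for k in range(n_vars)]
    else:                    # Matlab / column-major order
        strides = [2 ** k for k in range(n_vars)]
    table = {}
    for config in product([0, 1], repeat=n_parents):
        base = sum(s * v for s, v in zip(strides, config))
        table[config] = [flat[base + strides[-1] * child] for child in (0, 1)]
    return table

def is_normalized(table, tol=1e-9):
    """True if every row of the table sums to one."""
    return all(abs(sum(row) - 1.0) < tol for row in table.values())

# report[node] = (normalized in C order, normalized in Matlab order)
report = {}
for node, (flat, n_parents) in sorted(flat_cpds.items()):
    report[node] = (
        is_normalized(rows(flat, n_parents, last_index_fastest=True)),
        is_normalized(rows(flat, n_parents, last_index_fastest=False)),
    )
    print(node, report[node])
```

For the lists above, nodes 0 and 1 are normalized under either reading, while nodes 2 and 3 only sum to one per row under the first-index-fastest reading. If the library reads tables the other way around, the values for those two nodes would need reordering.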

With these known distributions assigned, we have now defined the DAG and the corresponding CPDFs. We can allocate an inference engine and begin posing problems to it.

We start by assigning the evidence Cloudy=False and then compute the resulting marginal of WetGrass.

# Set up the inference engine
infEngine = openpnl.CPearlInfEngine.Create( pBNet );
# Problem 1 P(W|C=0)
evidence = openpnl.mkEvidence( pBNet, [0], [0] )
infEngine.pyMarginalNodes( [3], 0 )

This gives an output of [0.788497 0.211503], plotted below.


Changing the evidence to Cloudy=True, we can compute this marginal and plot it in comparison.

# Problem 2 P(W|C=1)
evidence = openpnl.mkEvidence( pBNet, [0], [1] )
infEngine.pyMarginalNodes( [3], 0 )
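
For a network this small, the two queries can also be cross-checked by brute force without any inference engine. The sketch below is plain Python and assumes the textbook sprinkler tables from Russell and Norvig (as also used in the BNT docs), written out by variable name rather than as flat lists. It sums the joint distribution over Sprinkler and Rain for each value of Cloudy. The function names are our own and nothing here calls PyOpenPNL.

```python
from itertools import product

# Textbook sprinkler network, stored as P(var = True | parents).
P_C = 0.5                                  # P(Cloudy=T)
P_S = {False: 0.5, True: 0.1}              # P(Sprinkler=T | Cloudy)
P_R = {False: 0.2, True: 0.8}              # P(Rain=T | Cloudy)
P_W = {(False, False): 0.0, (True, False): 0.9,
       (False, True): 0.9, (True, True): 0.99}   # P(WetGrass=T | Sprinkler, Rain)

def bern(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(c, s, r, w):
    """Joint probability from the chain rule over the DAG."""
    return bern(P_C, c) * bern(P_S[c], s) * bern(P_R[c], r) * bern(P_W[(s, r)], w)

def wetgrass_given_cloudy(c):
    """Return [P(W=F | C=c), P(W=T | C=c)] by summing the joint over S and R."""
    unnorm = [sum(joint(c, s, r, w) for s, r in product([False, True], repeat=2))
              for w in (False, True)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

given_false = wetgrass_given_cloudy(False)
given_true = wetgrass_given_cloudy(True)
print(given_false)
print(given_true)
```

With these tables the enumeration gives P(W=T | C=F) = 0.549 and P(W=T | C=T) = 0.7452. These figures follow from the textbook tables alone. They differ from the engine output quoted above for Cloudy=False, which suggests checking how the flat CPD lists are ordered before trusting either number.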


This is an extremely simple BN, but it illustrates how straightforward it now is to set up and work with such problems through the PyOpenPNL interface. Hopefully this project will be useful to a number of people! We’ll largely be adding and testing Python API support on an as-needed basis.

Installing OpenPNL and PyOpenPNL is now relatively straightforward. I’ve gone through and put together some relatively sane build systems to make these usable; they can be built roughly as follows:

git clone
cd OpenPNL && ./configure && make && sudo make install
git clone
cd PyOpenPNL && sudo python setup.py build install

This is of course only scratching the surface of BN/PGM-style inference. OpenPNL supports DBNs, GMM-based continuous distributions, and numerous CPD and structure learning algorithms as well as inference engines under the hood. Much more to come soon.