Chapter 2 - Static Bayesian Networks:
Bayesian Networks in the Absence of
Temporal Information
Static Bayesian networks, or simply Bayesian networks, are graphical models that provide a concise representation of the probabilistic dependencies between a given set of random variables X = {X1, X2, ..., Xp} as a DAG G = (V, A) in which each node corresponds to a random variable Xi.
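The DAG encodes a factorization of the global distribution of X into local distributions, one for each variable: writing PI(Xi) for the set of parents of Xi in G,
P(X1, X2, ..., Xp) = P(X1 | PI(X1)) P(X2 | PI(X2)) ... P(Xp | PI(Xp)).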
2.1.1 Graph structure
Independence map (I-map): G is an independence map of the probabilistic dependence structure P of X if there is a one-to-one correspondence between the random variables in X and the nodes V of G, such that every graphical separation in G implies the corresponding conditional independence in P.
2.1.1 Fundamental connections
2.1.2 D-separation
V-structure (converging connection, A -> C <- B): A and B are not independent given C, even if they are marginally independent.
2.1.3 Equivalent structures
Serial (A -> C -> B) and diverging (A <- C -> B) connections: A and B are independent given C.
2.1.3 Continued. Serial and diverging connections result in equivalent factorizations, that is, each can be obtained from the other by repeated application of Bayes' theorem.
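For three variables, for instance, the serial connection A -> C -> B factorizes as P(A) P(C | A) P(B | C) and the diverging connection A <- C -> B as P(C) P(A | C) P(B | C); the two coincide because P(A) P(C | A) = P(A, C) = P(C) P(A | C) by Bayes' theorem.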
2.1.4 Markov blankets
The Markov blanket of a node consists of its parents, its children, and its children's other parents; conditional on its Markov blanket, the node is independent of all the other nodes in the network. Markov blankets facilitate the comparison of Bayesian networks with graphical models based on undirected graphs, which are known as Markov networks or Markov random fields.
2.2 Static Bayesian Network Modeling
1st step: Structure learning - identifying the structure of the Bayesian network.
Constraint based
Score based
Hybrid
2nd step: Parameter learning - estimating the parameters of the global distribution. This can be performed efficiently by estimating the parameters of the local distributions implied by the structure obtained in the previous step.
2.2.1 Step 1: Constraint-based structure learning
1. Identify which pairs of variables are connected by an arc, regardless of its direction. These variables cannot be independent given any other subset of variables, because they cannot be d-separated. This step can also be seen as a backward selection procedure starting from the saturated model with a complete graph and pruning it based on statistical tests for conditional independence.
2. Identify the v-structures among all pairs of nonadjacent nodes A and B with a common neighbor C. By definition, v-structures are the only fundamental connection in which the two nonadjacent nodes are not independent conditional on the third one.
3. Identify compelled arcs and orient them recursively to obtain the completed partially directed acyclic graph (CPDAG) describing the equivalence class identified by the previous steps.
A naive implementation of this approach cannot be applied to real-world problems, due to the exponential number of possible conditional independence relationships.
All these algorithms, with the exception of PC, first learn the Markov blanket of each node in the network. This preliminary step greatly simplifies the identification of the neighbors of each node, as the search can be limited to its Markov blanket.
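As a minimal sketch of constraint-based learning with bnlearn (the choice of the package's marks data set here is ours, for illustration):

    library(bnlearn)
    data(marks)
    # Grow-Shrink: learns the Markov blanket of each node first, then
    # uses it to restrict the candidate neighbors tested for each node
    dag.gs <- gs(marks)
    # Incremental Association (IAMB), another Markov-blanket-based algorithm
    dag.iamb <- iamb(marks)

Both return the equivalence class identified by the conditional independence tests, so some arcs may remain undirected.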
2.2.2 Step 1: Score-based structure learning
Score-based structure learning algorithms (also known as search-and-score algorithms) represent the application of general heuristic optimization techniques to the problem of learning the structure of a Bayesian network.
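A minimal sketch with bnlearn, again assuming the marks data set; hc() and tabu() are the package's hill-climbing and tabu search implementations:

    library(bnlearn)
    data(marks)
    # greedy hill-climbing on the space of DAGs, maximizing a network score
    dag.hc <- hc(marks, score = "bic-g")     # Gaussian BIC
    # tabu search keeps a list of recently visited structures to help
    # escape local optima
    dag.tabu <- tabu(marks, score = "bic-g")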
2.2.3 Step 1: Hybrid structure learning
Hybrid algorithms combine constraint-based and score-based approaches to offset their respective weaknesses and produce reliable network structures in a wide variety of situations. They are implemented in two steps, called restrict and maximize (see the sketch below):
The restrict step limits the set of candidate parents of each node
The maximize step explores the resulting smaller search space to optimize a given score function
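For example, bnlearn implements this template in rsmax2(), with mmhc() as the special case using Max-Min Parents and Children for the restrict step and hill-climbing for the maximize step; a sketch on the marks data:

    library(bnlearn)
    data(marks)
    # Max-Min Hill-Climbing, a common hybrid algorithm
    dag.mmhc <- mmhc(marks)
    # the general restrict-maximize template with user-chosen components
    dag.rs <- rsmax2(marks, restrict = "si.hiton.pc", maximize = "tabu")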
2.2.4 Choosing Distributions, Conditional Independence Tests, and Network Scores
The choice will ultimately depend on the data. The choice of a particular set of global and local distributions also determines which conditional independence tests and which network scores can be used to learn the structure:
Mutual information - a conditional independence test based on an information-theoretic distance measure, typically used for discrete data
Pearson's X² - the classic conditional independence test for discrete data
Fisher's Z test - a conditional independence test for Gaussian data, based on partial correlations
Bayesian Dirichlet equivalent (BDe) score - the posterior density associated with a uniform prior over both the space of the network structures and the parameters of each local distribution
Bayesian information criterion (BIC) - a penalized likelihood score, with the penalty determined by the number of parameters of the global distribution
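In bnlearn these choices map to the test and score arguments; a sketch, with the package's discrete learning.test and Gaussian marks data sets as stand-ins:

    library(bnlearn)
    data(learning.test)   # discrete data
    data(marks)           # continuous (Gaussian) data
    # conditional independence tests for constraint-based learning
    gs(learning.test, test = "mi")   # mutual information
    gs(learning.test, test = "x2")   # Pearson's X^2
    gs(marks, test = "zf")           # Fisher's Z
    # network scores for score-based learning
    hc(learning.test, score = "bde", iss = 10)   # BDe, imaginary sample size 10
    hc(marks, score = "bic-g")                   # Gaussian BIC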
2.2.5 Step 2: Parameter learning
The number of parameters needed to uniquely identify the global distribution, which is the sum of the numbers of parameters of the local distributions, is also reduced, because the conditional independence relationships encoded in the network structure fix large parts of the parameter space. Estimation can still be problematic for some applications, such as microarrays, which are typical of high-throughput biological data: tens or hundreds of observations versus thousands of genes.
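In bnlearn this step is performed by bn.fit() on a previously learned structure; a sketch with the discrete learning.test data:

    library(bnlearn)
    data(learning.test)
    dag <- hc(learning.test)
    # estimate each local distribution implied by the structure,
    # here by maximum likelihood
    fitted <- bn.fit(dag, data = learning.test, method = "mle")
    fitted$A   # the conditional probability table of node A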
2.2.6 Discretization
Discretization (or binning) converts all continuous variables into discrete ones, so that the techniques described in the previous sections can then be applied. It sidesteps the problem of defining a suitable probabilistic model for the data. The discretization itself can be performed by:
Using prior knowledge of the data
Using heuristics
Choosing the number of intervals directly
Performing learning and discretization iteratively until no further change occurs
Discretization can also be applied to continuous data when one or more variables present severe departures from normality (skewness, heavy tails, etc.).
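bnlearn's discretize() implements quantile, interval, and Hartemink discretization; a sketch on the marks data:

    library(bnlearn)
    data(marks)
    # quantile discretization into 3 intervals per variable
    dmarks <- discretize(marks, method = "quantile", breaks = 3)
    # Hartemink's method: start from a finer discretization (ibreaks
    # intervals, created with the idisc method) and collapse intervals
    # while losing as little mutual information as possible
    dmarks2 <- discretize(marks, method = "hartemink", breaks = 3,
                          ibreaks = 20, idisc = "quantile")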
2.3 Static Bayesian Network Modeling in R
2.3 bnlearn
Create an empty graph with the nodes
corresponding to the variables
2.3 bnlearn
Add the arcs by assigning a two-column matrix
containing the labels of their end-nodes
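A sketch of these two calls on the marks variables; the edge set below is illustrative:

    library(bnlearn)
    data(marks)
    # empty graph whose nodes correspond to the variables
    ug <- empty.graph(names(marks))
    # an undirected edge is stored as a pair of arcs in both directions;
    # assign them as a two-column matrix of end-node labels
    arcs(ug) <- matrix(
      c("MECH", "VECT",  "VECT", "MECH",
        "MECH", "ALG",   "ALG",  "MECH",
        "VECT", "ALG",   "ALG",  "VECT",
        "ANL",  "ALG",   "ALG",  "ANL",
        "STAT", "ALG",   "ALG",  "STAT",
        "STAT", "ANL",   "ANL",  "STAT"),
      ncol = 2, byrow = TRUE,
      dimnames = list(NULL, c("from", "to")))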
2.3 bnlearn
The resulting ug object belongs to the class bn and contains:
learning: a list containing some information about the results of the structure learning algorithm and its tuning parameters, including the conditional independence tests and network scores used in the analysis. It is never changed after the object is created.
nodes: a list containing one element per node. Each element contains the
Markov blanket, the neighborhood, the parents, and the children of that
particular node.
arcs: the arcs present in the network, in the same two-column format used in
the call to arcs above.
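Each component has a corresponding accessor function; for instance, for the ALG node of the ug object above:

    mb(ug, "ALG")        # Markov blanket
    nbr(ug, "ALG")       # neighborhood
    parents(ug, "ALG")   # parents
    children(ug, "ALG")  # children
    arcs(ug)             # the two-column arc matrix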
2.5.1 Model Averaging using protein signaling study
The data consist of simultaneous measurements of 11 phosphorylated proteins and phospholipids, derived from thousands of individual primary immune system cells subjected to both general and specific molecular interventions.
Structure learning was repeated several times to reduce the impact of locally optimal (but globally suboptimal) networks on learning and subsequent inference.
The networks learned in this way were then averaged to produce a more robust model (model averaging). The averaged network structure was created using the arcs present in at least 85% of the networks. This proportion measures the strength of each arc and provides the means to establish its significance given a threshold (85% in this case).
The validity of the averaged network was evaluated against the literature.
2.5.1 Model Averaging using protein signaling study
Choosing good values for the discretization arguments is a trade-off between quality and speed: high values of ibreaks preserve the characteristics of the original data to a greater extent, whereas smaller values result in much smaller memory usage and shorter running times.
2.5.1 Model Averaging using protein signaling study
Bootstrap resampling is used to learn a set of 500 network structures. Arcs are considered significant if they appear in at least 85% of the networks, and in the direction that appears most frequently.
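A sketch of this step with bnlearn's boot.strength(), assuming the discretized data is in a data frame called dsachs (a hypothetical name):

    library(bnlearn)
    # learn 500 networks from bootstrap samples with hill-climbing
    boot <- boot.strength(data = dsachs, R = 500, algorithm = "hc",
                          algorithm.args = list(score = "bde", iss = 10))
    # arcs present in at least 85% of the networks, in their most
    # frequent direction
    boot[boot$strength > 0.85 & boot$direction >= 0.5, ]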
2.5.1 Model Averaging using protein signaling study
The averaged network structure is built from the significant arcs. Alternatively, we can learn the networks with hill-climbing started from several different initial networks, instead of resampling the data.
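Continuing the sketch above (dsachs again hypothetical), averaged.network() builds the averaged structure, and custom.strength() covers the alternative of averaging networks learned from different random starting points:

    # averaged network from the arcs meeting the 85% threshold
    avg.boot <- averaged.network(boot, threshold = 0.85)
    # alternative: hill-climbing from 500 random starting networks
    start <- random.graph(nodes = names(dsachs), method = "ic-dag", num = 500)
    netlist <- lapply(start, function(net)
        hc(dsachs, score = "bde", iss = 10, start = net))
    rnd <- custom.strength(netlist, nodes = names(dsachs))
    avg.start <- averaged.network(rnd, threshold = 0.85)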
2.5.2 Choosing a significance threshold
Any threshold that falls between two consecutive observed arc strength values results in the same averaged network.
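In bnlearn, calling averaged.network() without an explicit threshold uses one estimated from the distribution of the arc strengths (stored as the threshold attribute of the boot.strength() output):

    avg.default <- averaged.network(boot)   # data-driven threshold
    attr(boot, "threshold")                 # the estimated value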
Exercises
2.1. Consider the asia synthetic data set from Lauritzen and Spiegelhalter (1988),
which describes the diagnosis of a patient at a chest clinic who has just come back
from a trip to Asia and is showing dyspnea.
(a) Load the data set from the bnlearn package and investigate its characteristics
using the exploratory analysis techniques covered in Chap. 1.
(b) Create a bn object with the network structure described in the manual page of asia.
(c) Derive the skeleton, the moral graph, and the CPDAG representing the equivalence
class of the network. Plot them using graphviz.plot.
(d) Identify the parents, the children, the neighbors, and the Markov blanket of each
node.
Exercises
2.2. Using the network structures created in Exercise 2.1 for the asia data set, produce the
following plots with graphviz.plot:
(a) A plot of the CPDAG of the equivalence class in which the arcs belonging to a
v-structure are highlighted (either with a different color or using a thicker line width).
(b) Fill the nodes with different colors according to their role in the diagnostic process:
causes (“visit to Asia” and “smoking”), effects (“Tuberculosis,” “lung cancer,”and “bronchitis”)
and the diagnosis proper (“chest X-ray,” “dyspnea,” and “either tuberculosis or lung
cancer/bronchitis”).
(c) Explore different layouts by changing the layout and shape arguments.
Exercises
2.3. Consider the marks data set analyzed in Sect. 2.3.
(a) Discretize the data using a quantile transform and different numbers of intervals
(say, from 2 to 5). How does the network structure learned from the resulting data sets
change as the number of intervals increases?
(b) Repeat the discretization using interval discretization with up to five intervals, and compare the resulting networks with those obtained previously with quantile discretization.
(c) Does Hartemink’s discretization algorithm perform better than either quantile or
interval discretization? How does its behavior depend on the number of initial
breaks?
Exercises
2.4. The ALARM network (Beinlich et al., 1989) is a Bayesian network designed to provide
an alarm message system for patients hospitalized in intensive care units (ICU). Since
ALARM is commonly used as a benchmark in the literature, a synthetic data set of 5,000
observations generated from this network is available from bnlearn as alarm.
(a) Create a bn object for the “true” structure of the network using the model string provided
in its manual page.
(b) Compare the networks learned with different constraint-based algorithms with the true
one, both in terms of structural differences and using either BIC or BDe.
(c) The overall performance of constraint-based algorithms suggests that the asymptotic
Χ² conditional independence tests may not be appropriate for analyzing alarm. Are
permutation or shrinkage tests better choices?
(d) How are the above learning strategies affected by changes to alpha?
Exercises
2.5. Consider again the alarm network used in Exercise 2.4.
(a) Learn its structure with hill-climbing and tabu search, using the posterior density BDe as
a score function. How does the network structure change with the
imaginary sample size iss?
(b) Does the length of the tabu list have a significant impact on the network structures
learned with tabu?
(c) How does the BIC score compare with BDe at different sample sizes in terms of structure
and score of the learned network?
Exercises
2.6. Consider the observational data set from Sachs et al. (2005) used in Sect. 2.5.1
(the original data set, not the discretized one).
(a) Evaluate the networks learned by hill-climbing with BIC and BGe using cross-validation and the log-likelihood loss function.
(b) Use bootstrap resampling to evaluate the distribution of the number of arcs present in
each of the networks learned in the previous point. Do they differ significantly?
(c) Compute the averaged network structure for sachs using hill-climbing with BGe and
different imaginary sample sizes. How does the value of the significance threshold change
as iss increases?