
Bayes Nets or Bayesian networks [79] are graphical representations of probabilistic relationships among a set of random variables. Let X = {X1, ..., Xn} be a finite set of discrete random variables, where each variable Xi may take values from a finite set Val(Xi). A Bayesian network is an annotated directed acyclic graph (DAG) G that encodes a joint probability distribution over X. The nodes of the graph correspond to the random variables, and the links of the graph correspond to direct influence of one variable on another. If there is a directed link from variable Xi to variable Xj, then Xi is a parent of Xj. Each node Xi is annotated with a conditional probability distribution (CPD) that represents P(Xi | Pa(Xi)), where Pa(Xi) denotes the parents of Xi in G. The pair (G, CPD) encodes the joint distribution P(X1, ..., Xn), which factorizes uniquely as:

P(X1, ..., Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi))

Figure 2 is an example of a Bayesian network from Cooper’s paper [16], a hypothetical medical domain with five variables. The corresponding CPDs are in Table 1.

Figure 2 A simple example of Bayesian network

P(X1=no)=0.8    P(X1=yes)=0.2

P(X2=absent|X1=no)=0.95     P(X2=absent|X1=yes)=0.75
P(X2=present|X1=no)=0.05    P(X2=present|X1=yes)=0.25

P(X3=absent|X1=no)=0.99995  P(X3=absent|X1=yes)=0.997
P(X3=present|X1=no)=0.00005 P(X3=present|X1=yes)=0.003

P(X4=absent|X2=absent,X3=absent)=0.95   P(X4=absent|X2=absent,X3=present)=0.5
P(X4=absent|X2=present,X3=absent)=0.9   P(X4=absent|X2=present,X3=present)=0.25
P(X4=present|X2=absent,X3=absent)=0.05  P(X4=present|X2=absent,X3=present)=0.5
P(X4=present|X2=present,X3=absent)=0.1  P(X4=present|X2=present,X3=present)=0.75

P(X5=absent|X3=absent)=0.98   P(X5=absent|X3=present)=0.4
P(X5=present|X3=absent)=0.02  P(X5=present|X3=present)=0.6

Table 1 The probabilities associated with Figure 2
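As a check, the chain-rule factorization can be applied directly to the CPDs of Table 1. A minimal Python sketch (the queried assignment is chosen purely for illustration):

```python
# Parents of each node in Figure 2 (Cooper's five-variable example).
parents = {"X1": (), "X2": ("X1",), "X3": ("X1",),
           "X4": ("X2", "X3"), "X5": ("X3",)}

# CPDs from Table 1: cpd[node][(parent values...)][value]
cpd = {
    "X1": {(): {"no": 0.8, "yes": 0.2}},
    "X2": {("no",): {"absent": 0.95, "present": 0.05},
           ("yes",): {"absent": 0.75, "present": 0.25}},
    "X3": {("no",): {"absent": 0.99995, "present": 0.00005},
           ("yes",): {"absent": 0.997, "present": 0.003}},
    "X4": {("absent", "absent"): {"absent": 0.95, "present": 0.05},
           ("absent", "present"): {"absent": 0.5, "present": 0.5},
           ("present", "absent"): {"absent": 0.9, "present": 0.1},
           ("present", "present"): {"absent": 0.25, "present": 0.75}},
    "X5": {("absent",): {"absent": 0.98, "present": 0.02},
           ("present",): {"absent": 0.4, "present": 0.6}},
}

def joint(assignment):
    """P(x1,...,x5) = product over i of P(xi | parents(xi))."""
    p = 1.0
    for node, pas in parents.items():
        p *= cpd[node][tuple(assignment[q] for q in pas)][assignment[node]]
    return p

p = joint({"X1": "yes", "X2": "present", "X3": "absent",
           "X4": "present", "X5": "absent"})
print(p)  # 0.2 * 0.25 * 0.997 * 0.1 * 0.98 ≈ 0.0048853
```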

**Causal Bayesian networks**

A causal Bayesian network [78] of a domain is similar to a
normal Bayesian network; the difference lies in the interpretation of the
links. In a normal Bayesian network, the links between variables can be
interpreted as correlation or association. In a causal Bayesian network, a
link means that the parent variable causally influences the value of the
child variable. The causal influence in this thesis is defined by the
manipulation criterion (see Section 3.7.2 of Spirtes et al. [96] for a detailed discussion of
manipulation): suppose there are two variables A and B in the domain; if we
can manipulate the variables in the domain, setting the value of variable A
to a_{1} or a_{2} and measuring the effect on variable B, then A causally
influences B when the probability distribution of B changes under the
different settings of A, i.e., P(B | do(A=a_{1})) ≠ P(B | do(A=a_{2})) (the
do(·) operator is from Pearl’s book [78]).
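The difference between conditioning and manipulating can be made concrete with a small, purely illustrative example (the three variables and all numbers below are assumptions, not taken from the cited works): a confounder C influences both A and B, so P(B | A=a) and P(B | do(A=a)) disagree.

```python
# Hypothetical numbers: a confounder C influences both A and B,
# and A also influences B.
pC = {0: 0.5, 1: 0.5}                                    # P(C)
pA_given_C = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # P(A=a | C=c)
pB1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(B=1 | A=a, C=c)

def observational(a):
    """P(B=1 | A=a): ordinary conditioning, so C stays correlated with A."""
    num = sum(pC[c] * pA_given_C[c][a] * pB1[(a, c)] for c in (0, 1))
    den = sum(pC[c] * pA_given_C[c][a] for c in (0, 1))
    return num / den

def interventional(a):
    """P(B=1 | do(A=a)): sever C -> A in the mutilated network
    and average over the prior P(C)."""
    return sum(pC[c] * pB1[(a, c)] for c in (0, 1))

print(observational(1), interventional(1))  # the two generally differ
```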

One way to construct Bayesian networks is from domain knowledge. There are three main steps to construct a Bayesian network from domain knowledge: 1) determine the number and the meanings of the variables in the domain of interest; 2) determine the direct influence relationships among the variables; and 3) determine the conditional probabilities given the structure obtained in step 2. Several methods have been proposed to facilitate constructing Bayesian networks from causal domain knowledge [22,49,74].

In many domains, however, domain knowledge is not sufficient to construct a complete Bayesian network. In this case, Bayesian networks can be learned from data when data is available, and many methods have been proposed in the literature for this purpose [17,47,79,96]. The Bayesian network learning problem can be categorized as 1) the parameter learning problem, when the structure is known, and 2) the structure learning problem, when the structure is unknown. Generally, parameter learning is part of the structure learning problem, serving as an inner loop of the score-and-search-based approach. In this thesis, we are interested in the Bayesian network structure learning problem.

The task in Bayesian network structure learning is to find the structure that best describes the observed data, which has been proved to be an NP-complete problem [11]. Many heuristics have therefore been proposed. In the literature, there are two categories of approaches to Bayesian network structure learning. The first is the score-and-search-based approach [13,17,47]. Methods in this category start from an initial structure (generated randomly or from domain knowledge) and move, deterministically or stochastically, to the neighbor with the best score in the structure space until a local maximum of the selected criterion is reached. The greedy learning process can be restarted several times with different initial structures to improve the result. The second category is the constraint-based approach [80,95]. Methods in this category test the statistical significance of pairs of variables, conditioning on other variables, to induce conditional independence; pairs of variables that pass some threshold are deemed directly connected in the Bayesian network. The complete Bayesian network structure is then constructed from the induced conditional independence and dependence information. In the following sub-sections, we discuss the score-and-search-based approach and the constraint-based approach.

There are three main issues in the score-and-search-based approach for Bayesian network structure learning: the structure space, the search strategy and the model selection criterion.

The search space in Bayesian network structure learning is the set of all possible directed acyclic graph (DAG) structures over the given number of variables in the domain. Robinson [87,88] derived an efficiently computable recursive function for the number a(n) of possible DAGs that contain n nodes:

a(n) = Σ_{k=1..n} (−1)^(k+1) · C(n, k) · 2^(k(n−k)) · a(n−k),   a(0) = 1

where C(n, k) is the binomial coefficient.

The numbers of Bayesian networks with small numbers of variables are shown in Table 2, from which we see that the number of DAGs is super-exponential in the number of nodes. It is impossible to enumerate all possible structures and score them, even for a modest number of nodes n. Heuristic methods have therefore been proposed to find a local maximum in the structure space; the idea is that a good heuristic speeds up the search toward a local maximum. Representative methods are the K2 algorithm [17], greedy search, etc.

| Number of variables in DAG | Number of possible DAGs |
| --- | --- |
| 1 | 1 |
| 2 | 3 |
| 3 | 25 |
| 4 | 543 |
| 5 | 29,281 |
| 6 | 3,781,503 |
| 7 | 1,138,779,265 |
| 8 | 783,702,329,343 |
| 9 | 1,213,442,454,842,881 |
| 10 | 4,175,098,976,430,598,143 |

Table 2 Number of DAGs with specified number of variables
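Table 2 can be reproduced directly from Robinson's recurrence; a short Python sketch:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Robinson's recurrence for the number of labelled DAGs on n nodes:
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n,k) * 2^(k*(n-k)) * a(n-k), a(0) = 1."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

print([num_dags(n) for n in range(1, 6)])  # [1, 3, 25, 543, 29281]
```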

Besides the basic structure space described above, Chickering [12] has considered the space of Markov-equivalent structures as the search space. A set of Bayesian networks are equivalent if they imply exactly the same set of independence statements among the variables. A Markov-equivalent set of Bayesian network structures can be represented by a completed partially directed DAG (CPDAG). The edges in a CPDAG are either directed, denoting that the edge has the same direction in all equivalent Bayesian networks, or undirected, denoting that either direction occurs in some equivalent Bayesian network. Since our interest is in the direct influence relationships among variables, which Markov-equivalent structures cannot model exactly, we will not consider this space in this report.

In the score-and-search-based approach, any search method from artificial intelligence, such as brute force, depth-first, breadth-first, best-first or simulated annealing, can be used. Here we only consider some basic and widely used methods.

The brute-force approach to Bayesian network structure learning is to enumerate all possible DAGs, score each one and select the best one. This provides a "gold standard" to evaluate other algorithms.

As mentioned in the last sub-section, the number of possible DAGs is super-exponential in the number of nodes. It is impossible to enumerate all possible DAGs for N > 5 and score them in a reasonable time. But one can evaluate any reasonably-sized set of structures in this way (e.g., the nearest neighbors of the best structure at the current stage).

Since we cannot enumerate all possible DAGs for Bayesian network structure
learning, we have to turn to heuristic methods. Because DAGs are acyclic and
parents precede their children in a causal ordering, knowing the ordering of
the variables reduces the structure space. If we know a total ordering on the
nodes, finding the best structure amounts to picking the best set of parents
for each node independently; this is what the K2 algorithm [17] does.

If the ordering is unknown, we can search over orderings. The space of orderings is much smaller and more regular than the space of structures, and has a smoother posterior “landscape”. As a result, searching over orderings is more efficient than searching over DAGs. Refer to Friedman and Koller’s work [32] for details of the search over orderings.

If we know nothing about the structure, we can treat the structure learning problem as a general optimization problem in a discrete space. The intuitive way to solve such a problem is greedy search. Greedy search starts at a specific point (an initial structure) in the structure space, considers all nearest neighbors of the current point, and moves to the neighbor that has the highest score; if no neighbors have higher score than the current point (i.e., we have reached a local maximum), the algorithm stops.

The greedy search process for the score-and-search-based approach is as follows:

*Input of the algorithm:*

1) Observational data set.

*Output of the algorithm:*

1) A Bayesian network.

1. Generate the initial Bayesian network, evaluate it, and set it as the current Bayesian network.

2. Evaluate the neighbors of the current Bayesian network.

3. If the best score among the neighbors is better than the score of the current Bayesian network, set the neighbor with the best score as the current Bayesian network and return to Step 2.

4. Otherwise, stop the learning process.

Three different ways to generate the initial Bayesian network structure are often used in practice: 1) the structure without any arcs, in which every node is independent of all other nodes; 2) the complete structure, with a link between every pair of nodes; and 3) a randomly generated structure.

The score can be calculated with any reasonable score function for Bayesian network.

A common definition of "neighbor" is all structures that can be generated from the current structure by adding, deleting or reversing a single arc, subject to the acyclicity constraint.
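A minimal sketch of this greedy procedure for binary variables, using the arc-change neighborhood just described and BIC as the score (the toy data and variable names are illustrative assumptions):

```python
import itertools
import math
import random

def is_acyclic(nodes, edges):
    """Kahn's algorithm: True iff the directed graph has no cycle."""
    indeg = {v: 0 for v in nodes}
    for _, v in edges:
        indeg[v] += 1
    queue = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return seen == len(nodes)

def neighbors(nodes, edges):
    """All acyclic structures one arc addition, deletion or reversal away."""
    out = []
    for u, v in itertools.permutations(nodes, 2):
        if (u, v) in edges:
            out.append(edges - {(u, v)})             # delete u -> v
            out.append(edges - {(u, v)} | {(v, u)})  # reverse u -> v
        elif (v, u) not in edges:
            out.append(edges | {(u, v)})             # add u -> v
    return [e for e in out if is_acyclic(nodes, e)]

def bic(nodes, edges, data):
    """Log-likelihood at the ML parameters minus (d/2) log N, binary variables."""
    parents = {v: sorted(u for u, w in edges if w == v) for v in nodes}
    N, loglik, d = len(data), 0.0, 0
    for v in nodes:
        counts = {}
        for row in data:
            key = tuple(row[p] for p in parents[v])
            counts.setdefault(key, [0, 0])[row[v]] += 1
        d += len(counts)  # one free parameter per observed parent configuration
        for n0, n1 in counts.values():
            for n in (n0, n1):
                if n:
                    loglik += n * math.log(n / (n0 + n1))
    return loglik - 0.5 * d * math.log(N)

def greedy_search(nodes, data):
    """Hill-climb from the empty graph (in practice the loop is wrapped
    in random restarts, as discussed below)."""
    current = frozenset()
    score = bic(nodes, current, data)
    while True:
        scored = [(bic(nodes, frozenset(e), data), frozenset(e))
                  for e in neighbors(nodes, current)]
        if not scored:
            break
        best_score, best = max(scored, key=lambda t: t[0])
        if best_score <= score:
            break
        score, current = best_score, best
    return current

# Toy data: A influences B, C is independent noise.
rng = random.Random(0)
data = []
for _ in range(1000):
    a = rng.randrange(2)
    b = a if rng.random() < 0.9 else 1 - a
    data.append({"A": a, "B": b, "C": rng.randrange(2)})

learned = greedy_search(["A", "B", "C"], data)
print(sorted(learned))  # expect an arc between A and B (either direction)
```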

When the algorithm stops, it has reached a local maximum, and which local maximum it reaches depends essentially on the initial structure. With a good initial structure, we can reach a good model in a short time; with a bad one, we reach a reasonably good model only after a very long time, or not at all. Although the initial structure is essential, we rarely have enough knowledge to judge which initial model is good. Instead of choosing one good initial structure, the alternative is to restart the algorithm with different initial structures and choose the best model among the local maxima reached.

Other methods have been proposed to address the local maximum problem, such as simulated annealing and best-first search. In simulated annealing, the process does not always move to the neighbor of the current structure with the highest score, but behaves stochastically: it moves to the neighbor with the highest score with some probability p, or does a random walk to a neighbor with probability 1 − p, where p may change over time. By following a stochastic path, the process may escape from a local maximum and find a global maximum.

In the best-first approach, the space of possible network structures is searched systematically, using a heuristic measure to determine the next best structure to examine.

Although the idea behind greedy search is simple, it can reach a good model in reasonable time compared with other methods. Chickering [12] has shown that, for a fixed amount of computational time, greedy search with random restarts produces better models than either simulated annealing or best-first search. There is also a theoretical justification of the greedy search method over the Markov equivalence class by Chickering [13].

In Bayesian network structure learning, a scoring function evaluates how well a given network G matches the data D. Given a scoring function, the best Bayesian network is the one that maximizes this scoring function.

An ad-hoc scoring function is based on the maximum likelihood (ML) principle: select the structure that predicts the data D with the highest probability,

score_ML(G) = P(D | θ̂, G^h)

where G^h denotes the hypothesis of the Bayesian structure G, Θ = (θ_1, ..., θ_n) is the vector of parameters, θ_i is the vector of parameters for the distribution of variable X_i, and θ̂ is the maximum likelihood configuration of Θ.

But the number of parameters d (here and in the following used to measure the complexity of the model) can be very large depending on G. An application of the maximum-likelihood principle alone can thus lead to overfitting: Bayesian networks that match the given data well but generalize poorly to new data. Therefore, the scoring function needs a penalty for the complexity of the model.

The scoring functions that are most frequently used to learn Bayesian networks fulfill this condition: Bayesian Information Criterion (BIC) and Bayesian score combine the likelihood with some penalty relating to the complexity of the model.

The Bayesian Information Criterion (BIC) [90] is defined as

BIC(G, D) = log P(D | θ̂, G^h) − (d/2) log N

where D is the data, θ̂ is the maximum likelihood (ML) estimate of the parameters, G^h is the hypothesis of the Bayesian structure, d is the number of parameters in the Bayesian network, and N is the number of data cases. The BIC criterion has several properties. First, it does not depend on the prior, so no prior needs to be specified. Second, the criterion is quite intuitive: it contains a term measuring how well the parameterized model predicts the data, log P(D | θ̂, G^h), and a term that punishes the complexity of the model, (d/2) log N. Third, the BIC criterion is exactly minus the Minimum Description Length (MDL) criterion [44,61]. Although BIC is intuitive and often used in practice, it tends to choose models that are too simple because of its heavy penalty on model complexity.
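A small worked example of the penalty at work (the coin-flip counts are illustrative): with 52 heads in 100 flips, the likelihood alone always prefers the fitted model, but BIC's complexity penalty outweighs the small likelihood gain and selects the simpler model.

```python
import math

# BIC comparison of two models of 100 coin flips (52 heads, 48 tails):
# M0 fixes theta = 0.5 (d = 0 parameters); M1 fits theta by ML (d = 1).
n1, n0 = 52, 48
N = n1 + n0

loglik_m0 = N * math.log(0.5)
bic_m0 = loglik_m0                          # no penalty: d = 0

theta = n1 / N
loglik_m1 = n1 * math.log(theta) + n0 * math.log(1 - theta)
bic_m1 = loglik_m1 - 0.5 * 1 * math.log(N)  # penalty (d/2) log N

# The likelihood gain of M1 is tiny (about 0.08 nats), while the penalty
# is 0.5 * log 100 ≈ 2.30, so BIC prefers the simpler model M0.
print(bic_m0 > bic_m1)  # True
```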

The Bayesian score for measuring the quality of a Bayesian network G is its posterior probability given the database:

P(G | D) = P(G) P(D | G) / P(D)

where P(D) is a normalization constant that does not depend on the Bayesian network structure G. Since P(D) is a constant and does not affect the ranking of different models, the relative posterior probability P(G) P(D | G) is often used for model selection. This criterion has two components: the prior P(G) and the marginal likelihood P(D | G). The prior can be specified by experts or simply set uniformly over all possible structures. The marginal likelihood integrates out the parameters of the model:

P(D | G) = ∫ P(D | Θ, G) P(Θ | G) dΘ

The Bayesian score was originally discussed by Cooper and Herskovits [17] as the BD metric and further developed by Heckerman et al. [47] as the BDe metric. Here only the BDe metric is covered. The BDe metric (Bayesian metric with Dirichlet priors and equivalence) [47] arises from the search for a network with the largest posterior probability given priors over network structures G and parameters. It was derived for complete data under the following assumptions: the multinomial sample assumption, parameter independence assumption, parameter modularity assumption, likelihood equivalence assumption, and structure possibility assumption. The BDe metric is based on the concept of sets of likelihood-equivalent network structures, where all members of a set of equivalent networks are given the same score.

The BDe metric also provides a simple way of identifying the virtual (prior) counts: the user specifies a prior Bayesian network S_p over the variables X and an equivalent sample size N′, from which the virtual counts are obtained as N′_{ijk} = N′ · P(X_i = k, Pa_i = j | S_p).
The Bayesian score is the more accurate criterion, but it requires more computation in practice. BIC can be derived as a large-sample approximation to the marginal likelihood, the second component of the Bayesian score. In practice, the sample size does not need to be very large for the approximation to be good.
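Under the multinomial-sample and Dirichlet-prior assumptions, the marginal likelihood has a closed form in the data counts. A sketch of the BD local score with uniform Dirichlet(1, ..., 1) priors (the K2 prior; the counts below are illustrative):

```python
import math

def log_bd_local(counts, r):
    """Log marginal-likelihood contribution of one node with r values under
    uniform Dirichlet priors (the Cooper-Herskovits / K2 form):
    sum over parent configurations j of
    log (r-1)! - log (N_j + r - 1)! + sum_k log N_jk!.
    `counts` maps each parent configuration to its r value counts."""
    s = 0.0
    for njk in counts.values():
        nj = sum(njk)
        s += math.lgamma(r) - math.lgamma(nj + r)   # log (r-1)! - log (N_j+r-1)!
        s += sum(math.lgamma(n + 1) for n in njk)   # sum_k log N_jk!
    return s

# Six cases in which X2 always equals X1 (three of each value): the model
# with the arc X1 -> X2 gives X2 a higher marginal likelihood than the
# model without a parent.
with_parent = log_bd_local({(0,): [3, 0], (1,): [0, 3]}, 2)
no_parent = log_bd_local({(): [3, 3]}, 2)
print(with_parent > no_parent)  # True
```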

Besides the criteria mentioned above, some other criteria, such as the cross-validation criterion, the entropy score, and minimum message length (MML) [101], are also used for Bayesian network structure selection; they are not covered here.

Constraint-based methods view the structure learning problem differently. Since a Bayesian network structure encodes many dependencies and (conditional) independencies of the underlying model, the algorithms of this approach try to discover the dependencies and conditional independencies from the data, and then use these dependencies and conditional independencies to infer the Bayesian network structure.

The dependence and conditional independence relationships are measured using some kind of conditional independence (CI) test; the CI tests used in practice are statistical tests on the data set. In order to use the results to reconstruct the structure, several assumptions have to be made: the causal sufficiency assumption, the causal Markov assumption, and the faithfulness assumption.

**Causal sufficiency assumption:** There exist no unobserved (also known as hidden or latent) variables in the domain that are common parents of two or more observed variables of the domain.

**Causal Markov assumption:** Given a Bayesian network model G, any variable is independent of all its non-descendants in G given its parents.

**Faithfulness assumption:** A Bayesian network structure G and a probability distribution P generated by G are faithful to one another if and only if every conditional independence relationship valid in P is entailed by the Causal Markov assumption in G.

With these assumptions in place, one can ascertain the existence of an edge between two variables or, in certain cases only, the direction of that edge. The output of constraint-based methods is a partial DAG (PDAG) representing the whole Markov equivalence class. Representative algorithms in this category are the SGS algorithm [94], the CI algorithm [80], and the PC algorithm [96].
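The CI tests these algorithms rely on are typically chi-square or G-tests on contingency counts. A sketch of the G² statistic (the chain data is synthetic, and 3.84 is the 5% critical value of the chi-square distribution with one degree of freedom):

```python
import math
import random
from collections import Counter

def g_statistic(rows, x, y, z=()):
    """G^2 statistic for testing X independent of Y given Z on discrete data:
    G^2 = 2 * sum over cells of N_xyz * ln( N_xyz * N_z / (N_xz * N_yz) ).
    Asymptotically chi-square distributed under the independence hypothesis."""
    nxyz, nxz, nyz, nz = Counter(), Counter(), Counter(), Counter()
    for r in rows:
        zv = tuple(r[v] for v in z)
        nxyz[(r[x], r[y], zv)] += 1
        nxz[(r[x], zv)] += 1
        nyz[(r[y], zv)] += 1
        nz[zv] += 1
    g = 0.0
    for (xv, yv, zv), n in nxyz.items():
        expected = nxz[(xv, zv)] * nyz[(yv, zv)] / nz[zv]
        g += 2 * n * math.log(n / expected)
    return g

# Synthetic chain A -> C -> B: A and B are dependent marginally but
# (approximately) independent given C.
rng = random.Random(1)
rows = []
for _ in range(2000):
    a = rng.randrange(2)
    c = a if rng.random() < 0.9 else 1 - a
    b = c if rng.random() < 0.9 else 1 - c
    rows.append({"A": a, "B": b, "C": c})

print(g_statistic(rows, "A", "B") > 3.84)    # marginal dependence detected
print(g_statistic(rows, "A", "B", ("C",)) <
      g_statistic(rows, "A", "B"))           # conditioning on C removes most of it
```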

In the previous two sub-sections, we discussed the score-and-search-based approach and the constraint-based approach to Bayesian network structure learning. The two kinds of approaches can also be combined: use the network learned by constraint-based methods as the starting point for the score-and-search-based methods. For example, Singh and Valtorta [92] proposed to generate the ordering of the variables with a constraint-based approach and then learn the Bayesian network structure with the score-and-search-based approach.

- UAI: Association for Uncertainty in Artificial Intelligence http://www.auai.org/
- AAAI: National Conference on Artificial Intelligence http://www.aaai.org/Conferences/AAAI/
- IJCAI: the International Joint Conference on Artificial Intelligence http://www.ijcai.org/

- Artificial Intelligence http://www.elsevier.com/locate/artint
- Journal of Artificial Intelligence Research (JAIR) http://www.jair.org/

- Home Page for Steffen Lauritzen
- Kathryn Blackmond Laskey HomePage
- Discovery Systems Laboratory website
- Probably Informative Edinburgh Machine Learning Group Pages
- Microsoft Bayesian Network Editor
- Probabilistic Networks Library (PNL) - Intel Research
- Netica Bayesian Network Software from Norsys
- The Interchange Format for Bayesian Networks
- Selected list of research links for Bayesian Networks

[1] I.A. Beinlich, H.J.
Suermondt, R.M. Chavez, G.F. Cooper, The ALARM monitoring system: A case study
with two probabilistic inference techniques for belief networks, Proceedings of the Second European Conference
on Artificial intelligence in Medical Care, Springer-Verlag, Berlin, 1989, pp.
247-256.

[2] J. Binder, D. Koller, S. Russell, K. Kanazawa, Adaptive
probabilistic networks with hidden variables, Machine Learning 29 (1997)
213-244.

[3] C. Boutilier, N. Friedman, M. Goldszmidt, D. Koller, Context-specific
independence in Bayesian networks, Proceedings of the 12th Conference on
Uncertainty in Artificial Intelligence (UAI-96), 1996, pp. 115-123.

[4] R.J. Brachman, T. Anand, The process of knowledge discovery in
databases, Advances in knowledge
discovery and data mining, American Association for Artificial Intelligence,
Menlo Park, CA, USA, 1996, pp. 37-57.

[5] N. Cartwright, Against modularity, the causal Markov condition,
and any link between the two: Comments on Hausman and Woodward, British Journal
for the Philosophy of Science 53 (2002) 411-453.

[6] N. Cartwright, What is wrong with Bayes Nets? The Monist 84
(2001) 242-264.

[7] P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C.
Shearer, R. Wirth, CRISP-DM 1.0 Step-by-step data mining guide http://www.crisp-dm.org/, 2000, pp. 1-78.

[8] P. Cheeseman, J. Stutz, Bayesian classification (AutoClass):
Theory and results, in: U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy
(Eds.), Advances in Knowledge Discovery and Data Mining, AAAI Press/The MIT
Press, 1996, pp. 153-180.

[9] Q. Chen, G. Li, B. Han, C.K. Heng, T.-Y. Leong, Coronary Artery
Disease Prediction with Bayesian Networks and Constraint Elicitation, Technical
report TRC9/06, School of Computing, National University of Singapore,
September 2006.

[10] Q. Chen, G. Li, T.-Y. Leong, C.-K. Heng, Predicting Coronary
Artery Disease with Medical Profile and Gene Polymorphisms Data, World Congress on Health (Medical)
Informatics (MedInfo), IOS Press, Brisbane, Australia, 2007, pp. 1219-1224.

[11] D.M. Chickering, Learning Bayesian networks is NP-complete, AI & STAT V, 1996.

[12] D.M. Chickering, Learning Equivalence Classes of Bayesian Network
Structures, in: E. Horvitz, F.V. Jensen (Eds.), Proceedings of the Twelfth
Annual Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann,
Reed College, Portland, Oregon, USA, 1996, pp. 150-157.

[13] D.M. Chickering, Optimal Structure Identification with Greedy
Search, Journal of Machine Learning Research 3 (2002) 507-554.

[14] D.M. Chickering, D. Heckerman, Efficient approximations for the
marginal likelihood of Bayesian networks with hidden variables, Machine
Learning 29 (1997) 181-212.

[15] G.F. Cooper, A Bayesian Method for Learning Belief Networks that
Contain Hidden Variables, Journal of Intelligent Information Systems 4 (1)
(1995) 71-88.

[16] G.F. Cooper, An overview of the representation and discovery of
causal relationships using Bayesian networks, in: C. Glymour, G.F. Cooper
(Eds.), Computation, Causation, and Discovery, AAAI Press and MIT Press, 1999,
pp. 3-62.

[17] G.F. Cooper, E. Herskovits, A Bayesian method for the induction of
probabilistic networks from data, Machine Learning 9 (1992) 309-347.

[18] G.F. Cooper, C. Yoo, Causal discovery from a mixture of
experimental and observational data,
Proceedings of the Fifteenth Annual Conference on Uncertainty in
Artificial Intelligence, Morgan Kaufmann Publishers, 1999, pp. 116-125.

[19] D.H. Dash, M.J. Druzdzel, A Hybrid Anytime Algorithm for the
Construction of Causal Models From Sparse Data, in: K. Laskey, H. Prade (Eds.),
Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence
(UAI-99), Morgan Kaufmann Publishers, San Francisco, CA, Stockholm, Sweden,
1999, pp. 142-149.

[20] A.P. Dawid, Conditional independence in statistical theory (with
discussion), Journal of the Royal Statistical Society, Series B 41 (1979) 1-31.

[21] S.K. Donoho, L.A. Rendell, Constructive Induction Using Fragmentary
Knowledge, Proceedings of International
Conference on Machine Learning (ICML'96), Morgan Kaufmann, Bari, Italy, 1996,
pp. 113-121.

[22] M.J. Druzdzel, L.C. van der Gaag, Building Probabilistic Networks:
Where Do the Numbers Come From? Guest Editors' Introduction, IEEE Transactions
on Knowledge and Data Engineering 12 (4) (2000) 481-486.

[23] F. Eberhardt, C. Glymour, R. Scheines, N-1 Experiments Suffice to
Determine the Causal Relations Among N Variables, In Department of Philosophy, Carnegie
Mellon University, Technical Report CMU-PHIL-161, 2004.

[24] F. Eberhardt, C. Glymour, R. Scheines, On the Number of
Experiments Sufficient and in the Worst Case Necessary to Identify All Causal
Relations Among N Variables, Proceedings
of the 21st Annual Conference on Uncertainty in Artificial Intelligence
(UAI-05), AUAI Press, 2005, pp. 178-184.

[25] M.B. Eisen, P.T. Spellman, P.O. Browndagger, D. Botstein, Cluster
analysis and display of genome-wide expression patterns, The Proceedings of the
National Academy of Sciences (PNAS) USA 95 (25) (1998) 14863-14868.

[26] G. Elidan, N. Lotner, N. Friedman, D. Koller, Discovering Hidden
Variables: A Structure-Based Approach,
Neural Information Processing Systems (NIPS), 2000, pp. 479-485.

[27] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From data mining to
knowledge discovery: An overview, in: U.M. Fayyad, G. Piatetsky-Shapiro, P.
Smyth, R. Uthurusamy (Eds.), Advances in knowledge discovery and data mining,
AAAI Press, Menlo Park, CA, USA, 1996, pp. 1-30.

[28] M. Firlej, D. Hellens, Knowledge Elicitation: a practical guide,
Prentice Hall, New York, 1991.

[29] N. Friedman, The Bayesian Structural EM Algorithm, in: G.F.
Cooper, S. Moral (Eds.), Proceedings of the Fourteenth Conference on
Uncertainty in Artificial Intelligence, Morgan Kaufmann, University of
Wisconsin Business School, Madison, Wisconsin, USA, 1998, pp. 129-138.

[30] N. Friedman, Learning belief networks in the presence of missing
values and hidden variables, in: D.H. Fisher (Ed.), Proceedings of the
Fourteenth International Conference on Machine Learning (ICML 1997), Morgan
Kaufmann, Nashville, Tennessee, USA, 1997, pp. 125-133.

[31] N. Friedman, M. Goldszmidt, A. Wyner, Data Analysis with Bayesian
Networks: A Bootstrap Approach,
Proceedings of the Fifteenth Conference on Uncertainty in Artificial
Intelligence (UAI), 1999, pp. 206-215.

[32] N. Friedman, D. Koller, Being Bayesian About Network Structure: A
Bayesian Approach to Structure Discovery in Bayesian Networks, Machine Learning
50 (1-2) (2003) 95-125.

[33] N. Friedman, M. Linial, I. Nachman, D. Pe'er, Using Bayesian
networks to analyze expression data, Journal of Computational Biology 7 (3-4)
(2000) 601-620.

[34] A.P. Gasch, P.T. Spellman, C.M. Kao, O. Carmel-Harel, M.B. Eisen,
G. Storz, D. Botstein, P.O. Brown, Genomic expression programs in the response
of yeast cells to environmental changes, Molecular Biology of the Cell 11 (12)
(2000) 4241-4257.

[35] D. Geiger, D. Heckerman, Knowledge representation and inference in
similarity networks and Bayesian multinets, Artificial Intelligence 82 (1-2)
(1996) 45-74.

[36] Z. Ghahramani, Learning Dynamic Bayesian Networks, in: C.L. Giles,
M. Gori (Eds.), Adaptive Processing of Sequences and Data Structures. Lecture
Notes in Artificial Intelligence, Springer-Verlag, Berlin, 1998, pp. 168-197.

[37] A. Ginsberg, S.M. Weiss, P. Politakis, Automatic knowledge base
refinement for classification systems, Artificial Intelligence 35 (2) (1988)
197-226.

[38] A.L. Givan, Flow cytometry: first principles, 2nd ed., Wiley-Liss,
New York, 2001.

[39] C. Glymour, G.F. Cooper (Eds.), Computation, Causation, and
Discovery, MIT Press, Cambridge, MA, USA, 1999. ISBN 0262571242

[40] G. Gorry, G. Barnett, Experience with a model of sequential
diagnosis, Computers and Biomedical Research 1 (1968) 490-507.

[41] T.L. Griffiths, E.R. Baraff, J.B. Tenenbaum, Using physical
theories to infer hidden causal structure,
Proceedings of the 26th Annual Conference of the Cognitive Science
Society, 2004.

[42] J. Han, L.V.S. Lakshmanan, R.T. Ng, Constraint-Based,
Multidimensional Data Mining, Computer 32 (8) (1999) 46-50.

[43] D. Heckerman, A Bayesian Approach to Learning Causal
Networks, Proceedings of Eleventh
Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers,
San Francisco, CA, 1995, pp. 285-295.

[44] D. Heckerman, A Tutorial on Learning with Bayesian Networks, in:
M. Jordan (Ed.), Learning in Graphical Models, MIT Press, Cambridge, MA, 1999.

[45] D. Heckerman, A Tutorial on Learning with Bayesian Networks, in:
M. Jordan (Ed.), Learning in Graphical Models, MIT Press, Cambridge, MA, 1998,
pp. 301-354.

[46] D. Heckerman, J.S. Breese, K. Rommelse, Troubleshooting Under
Uncertainty, DX-94-5th Int'l. Workshop
on Principles of Diagnosis, American Association for Artificial Intelligence,
1994, pp. 121-130.

[47] D. Heckerman, D. Geiger, D.M. Chickering, Learning Bayesian
networks: The combination of knowledge and statistical data, Machine Learning
20 (1995) 197-243.

[48] D. Heckerman, C. Meek, G.F. Cooper, A Bayesian approach to causal
discovery, in: C. Glymour, G.F. Cooper (Eds.), Computation, Causation,
Discovery, MIT Press, Cambridge, MA, USA, 1999, pp. 141-165.

[49] D.E. Heckerman, Probabilistic similarity networks, MIT Press,
Cambridge, Mass., 1991.

[50] L. Herzenberg, D. Parks, B. Sahaf, O. Perez, M. Roederer, L.
Herzenberg, The history and future of the fluorescence activated cell sorter
and flow cytometry: a view from Stanford, Clinical chemistry 48 (10) (2002)
1819-1827.

[51] P. Humphreys, D. Freedman, The Grand Leap, The British Journal for
the Philosophy of Science 47 (1) (1996) 113-123.

[52] M. Jaeger, On the complexity of inference about probabilistic
relational models, Artificial Intelligence 117 (2) (2000) 297-308.

[53] M. Jaeger, Relational Bayesian Networks, Proceedings of the 13th Conference on
Uncertainty in Artificial Intelligence, 1997, pp. 266-273.

[54] R. Joshi, T.Y. Leong, Towards a Context-Sensitive Probabilistic
Graphical Framework, AAAI-2006(In
review), 2006.

[55] R. Joshi, G. Li, T.-Y. Leong, Context-aware Probabilistic
Reasoning for Proactive Healthcare, Work
Notes of 2nd Workshop on Artificial Intelligence Techniques for Ambient
Intelligence (AITAmI 07), jointed with IJCAI2007, 2007.

[56] G. Karp, Cell and molecular biology: concepts and experiments, 3rd
ed., John Wiley & Sons, New York; Singapore, 2002.

[57] K.B. Korb, A.E. Nicholson, Bayesian artificial intelligence,
Chapman & Hall/CRC, 2003.

[58] K.B. Korb, C.S. Wallace, In search of the philosopher's stone:
Remarks on Humphreys and Freedman's critique of causal discovery, British
Journal for the Philosophy of Science 48 (1997) 543-553.

[59] L.A. Kurgan, P. Musilek, A survey of Knowledge Discovery and Data
Mining process models, The Knowledge Engineering Review 21 (2006) 1-24.

[60] W. Lam, Bayesian Network Refinement Via Machine Learning Approach,
IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (3) (1998)
240-251.

[61] W. Lam, F. Bacchus, Learning Bayesian belief networks: An approach
based on the MDL principle, Computational Intelligence 10 (4) (1994) 269-293.

[62] S.L. Lauritzen, D. Spiegelhalter, Local computations with
probabilities on graphical structures and their application to expert systems,
Journal of the Royal Statistical Society, Series B 50 (2) (1988) 157-224.

[63] G. Li, T.-Y. Leong, Biomedical Knowledge Discovery with
Topological Constraints Modeling in Bayesian Networks: A Preliminary
Report, World Congress on Health
(Medical) Informatics (MedInfo), IOS Press, Brisbane, Australia, 2007, pp.
560-565.

[64] G. Li, T.-Y. Leong, A framework to learn Bayesian Networks from
changing, multiple-source biomedical data,
Proceedings of the 2005 AAAI Spring Symposium on Challenges to Decision
Support in a Changing World, Stanford University, CA, USA, 2005, pp. 66-72.

[65] T.-C. Lu, M.J. Druzdzel, T.-Y. Leong, Causal Mechanism-based Model
Constructions, in: C. Boutilier, M. Goldszmidt (Eds.), Proceedings of the 16th
Conference in Uncertainty in Artificial Intelligence, Morgan Kaufmann, Stanford
University, Stanford, California, USA, 2000, pp. 353-362.

[66] J.D. Martin, K. VanLehn, Discrete factor analysis: learning hidden
variables in Bayesian networks. Technical Report LRGC ONR-94-1, Department of
Computer Science, University of Pittsburgh, 1994.

[67] V.R. McKim, S.P. Turner (Eds.), Causality in crisis? statistical
methods and the search for causal knowledge in the social sciences, University
of Notre Dame Press, Notre Dame, 1996.

[68] S. Meganck, P. Leray, B. Manderick, Learning Causal Bayesian
Networks from Observations and Experiments: A Decision Theoretic Approach, Proceedings of Modelling Decisions in
Artificial Intelligence (MDAI 2006), LNAI 3885, 2006, pp. 58-69.

[69] B. Milch, S. Russell, First-Order Probabilistic Languages: Into
the Unknown, Proceedings of the 16th
International Conference on Inductive Logic Programming (ILP 2006), 2006, pp.
10-24.

[70] K. Murphy, Active Learning of Causal Bayes Net Structure.
Technical report, Computer Science Division, University of California,
Berkeley, CA, 2001.

[71] K. Murphy, Bayes Net Toolbox for Matlab, http://www.cs.ubc.ca/~murphyk/Software/BNT/bnt.html,
2007.

[72] K. Murphy, Software Packages for Graphical Models / Bayesian
Networks, http://www.cs.ubc.ca/~murphyk/Software/BNT/bnsoft.html,
2005.

[73] K.P. Murphy, Dynamic Bayesian Networks: Representation, Inference
and Learning, Doctoral dissertation, Computer Science, University of
California, Berkeley, 2002.

[74] S. Nadkarni, P.P. Shenoy, A Causal Mapping Approach to
Constructing Bayesian Networks, Decision Support Systems 38 (2) (2004) 259-281.

[75] R.E. Neapolitan, Learning Bayesian Networks, Prentice Hall, 2004.

[76] R.S. Niculescu, T.M. Mitchell, R.B. Rao, Bayesian Network Learning
with Parameter Constraints, Journal of Machine Learning Research 7 (2006)
1357-1383.

[77] D.J. Oliver, B. Nikolau, E.S. Wurtele, Functional Genomics:
High-Throughput mRNA, Protein, and Metabolite Analyses, Metabolic Engineering 4
(1) (2002) 98-106.

[78] J. Pearl, Causality: models, reasoning, and inference, Cambridge
University Press, New York, 2000. ISBN-13: 978-0521773621

[79] J. Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan
Kaufmann, San Francisco, California, 1988. ISBN 0-934613-73-7

[80] J. Pearl, T. Verma, A theory of inferred causation, in: J. Allen,
R. Fikes, E. Sandewall (Eds.), Principles of Knowledge Representation and
Reasoning: Proceeding of the Second International Conference, Morgan Kaufmann,
San Mateo, CA, 1991, pp. 441-452.

[81] M. Peot, R. Shachter, Learning From What You Don't Observe, Proceedings of the 14th Annual Conference on
Uncertainty in Artificial Intelligence (UAI-98), Morgan Kaufmann, 1998, pp.
439-444.

[82] K.L. Poh, E.J. Horvitz, A graph-theoretic analysis of information
value, in: E. Horvitz, F. Jensen (Eds.), Proceedings of the Twelfth Conference
on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 1996, pp. 427-435.

[83] K.L. Poh, E.J. Horvitz, Reasoning about the value of
decision-model refinement: methods and application, Proceedings of the Ninth Conference on
Uncertainty in Artificial Intelligence, Morgan Kaufmann, 1993, pp. 174-182.

[84] K.R. Popper, Conjectures and Refutations: The Growth of Scientific
Knowledge, Routledge & Kegan Paul, 1963.

[85] K.R. Popper, The Logic of Scientific Discovery, Routledge, 1999
(first published 1934).

[86] H. Price, Agency and causal asymmetry, Mind 101 (403) (1992)
501-520.

[87] R.W. Robinson, Counting labeled acyclic digraphs, in: F. Harary
(Ed.), New Directions in the Theory of Graphs, Academic Press, New York, 1973,
pp. 239-273.

[88] R.W. Robinson, Counting unlabeled acyclic digraphs, in: C.H.C.
Little (Ed.), Combinatorial Mathematics V, volume 622 of Lecture Notes in
Mathematics, Springer-Verlag, Australia, 1977, pp. 28-43.

[89] K. Sachs, O. Perez, D. Pe'er, D.A. Lauffenburger, G.P. Nolan,
Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data,
Science 308 (5721) (2005) 523-529.

[90] G. Schwarz, Estimating the dimension of a model, The Annals of
Statistics 6 (1978) 461-464.

[91] E. Segal, D. Pe'er, A. Regev, D. Koller, N. Friedman, Learning
Module Networks, Proceedings of the 19th
Conference in Uncertainty in Artificial Intelligence (UAI), Acapulco, Mexico,
2003, pp. 525-534.

[92] M. Singh, M. Valtorta, An Algorithm for the Construction of
Bayesian Network Structures from Data,
Proceedings of the 9th Conference on Uncertainty in Artificial
Intelligence, Morgan Kaufmann, 1993, pp. 259-265.

[93] P.T. Spellman, G. Sherlock, M.Q. Zhang, V.R. Iyer, K. Anders, M.B.
Eisen, P.O. Brown, D. Botstein, B. Futcher, Comprehensive identification of
cell cycle-regulated genes of the yeast saccharomyces cerevisiae by microarray
hybridization, Molecular Biology of the Cell 9 (12) (1998) 3273-3297.

[94] P. Spirtes, C. Glymour, R. Scheines, Causality from
Probability, Proceedings of the
Conference on Advanced Computing for the Social Sciences, Williamsburg, VA,
1990.

[95] P. Spirtes, C. Glymour, R. Scheines, Causation, Prediction, and
Search (Second Edition), MIT Press, Cambridge, MA, USA, 2000.

[96] P. Spirtes, C. Glymour, R. Scheines, Causation, Prediction, and
Search. Number 81 in Lecture Notes in Statistics, Springer Verlag, New York,
1993. ISBN 0-262-19440-6

[97] J.E. Tiles, G.T. McKee, G.C. Dean (Eds.), Evolving knowledge in
natural science and artificial intelligence, Pitman, London, 1990.

[98] S. Tong, D. Koller, Active Learning for Structure in Bayesian
Networks, in: B. Nebel (Ed.), Proceedings of the Seventeenth International
Joint Conference on Artificial Intelligence, IJCAI 2001, Morgan Kaufmann,
Seattle, Washington, USA, 2001, pp.
863-869.

[99] G.G. Towell, J.W. Shavlik, M.O. Noordewier, Refinement of
Approximate Domain Theories by Knowledge-Based Neural Networks, Proceedings of
the 8th National Conference on Artificial Intelligence (AAAI-90), 1990, pp. 861-866.

[100] M. Valtorta, Knowledge base refinement: A bibliography, Applied
Intelligence 1 (1) (1990) 87-94.

[101] C.S. Wallace, K.B. Korb, H. Dai, Causal discovery via MML, Proceedings of the Thirteenth International
Conference on Machine Learning (ICML'96), Morgan Kaufmann, San Francisco, CA,
USA, 1996, pp. 516-524.

[102] G. Wiederhold, Foreword: On the Barriers and Future of Knowledge
Discovery, in: U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy
(Eds.), Advances in Knowledge Discovery and Data Mining, AAAI Press/The MIT
Press, 1996, pp. vii-xi.

[103] J. Woodward, Making things happen: a theory of causal explanation,
Oxford University Press, 2003.

[104] C. Yoo, G.F. Cooper, Causal Discovery of Latent Variable Models
from a Mixture of Experimental and Observational Data. CBMI Research Report
CBMI-173, 2001.

[105] C. Yoo, G.F. Cooper, A Computer-Based Microarray Experiment
Design-System for Gene-Regulation Pathway Discovery, Proceedings of AMIA Symposium, 2003, pp.
733-737.

[106] C. Yoo, V. Thorsson, G.F. Cooper, Discovery of Causal Relationships
in a Gene-regulation Pathway from a Mixture of Experimental and Observational
DNA Microarray Data, Proceedings of
Pacific Symposium on Biocomputing, World Scientific, 2002, pp. 498-509.

Last updated: 09/18/2007. © Copyright 2007.