Site Management | WordPress Blog

Bayes Nets

Links to other web pages: Basic Concepts | Causality | Classifiers | Inference | Learning | Reading List | Related Topics


Links in this web page: Conferences | Bayes Nets Construction | Bayes Nets Structure Learning | Database | Groups | Introduction | Journals | References | Researchers | Resources | Software



Bayes Nets or Bayesian networks [79] are graphical representation for probabilistic relationships among a set of random variables. Given a finite set of discrete random variables where each variable may take values from a finite set, denoted by . A Bayesian network is an annotated directed acyclic graph (DAG) G that encodes a joint probability distribution over. The nodes of the graph correspond to the random variables. The links of the graph correspond to the direct influence from one variable to the other. If there is a directed link from variable to variable , variable will be a parent of variable . Each node is annotated with a conditional probability distribution (CPD) that represents, where denotes the parents of in. The pair (, CPD) encodes the joint distribution. A unique joint probability distribution over from is factorized as:

Figure 2 is an example of a Bayesian network from Cooper’s paper [16], which is hypothetically about the medical domain with 5 variables. The corresponding CPDs are in Table 1.

Figure 2 A simple example of Bayesian network



P(X1 = yes)=0.2

P(X2=absent | X1=no)=0.95

P(X2=absent | X1=yes)=0.75

P(X2=present | X1=no)=0.05

P(X2=present | X1=yes)=0.25

P(X3=absent | X1=no)=0.99995

P(X3=absent | X1=yes)=0.997

P(X3=absent | X1=no)=0.00005

P(X3=absent | X1=yes)=0.003

P(X4=absent | X2=absent, X3=absent)=0.95

P(X4=absent | X2=absent, X3=present)=0.5

P(X4=absent | X2=present, X3=absent)=0.9

P(X4=absent | X2=present, X3=present)=0.25

P(X4=present | X2=absent, X3=absent)=0.05

P(X4=present | X2=absent, X3=present)=0.5

P(X4=present | X2=present, X3=absent)=0.1

P(X4=present | X2=present, X3=present)=0.75

P(X5=absent | X3=absent)=0.98

P(X5=absent | X3=present)=0.4

P(X5=present | X3=absent)=0.02

P(X5=present | X3=present)=0.6

Table 1 The probabilities associated with Figure 1

Causal Bayesian networks

A causal Bayesian network [78] of a domain is similar as the normal Bayesian network. The difference is in the explanation of the links in the Bayesian networks. In the normal Bayesian networks, the links between variables can be explained as correlation or association. In a causal Bayesian network, the links mean that the parent variables will causally influence the values of the child variables. The causal influence in this thesis is defined based on manipulation criteria (see Section 3.7.2 of Spirtes et al [96] for a detailed discussion of manipulation): Suppose there are two variables A and B in the domain; If we can manipulate the variables in the domain, set the value of variable A as a1 or a2 and measure its effect on variable B, then the probability distribution of variable B will change under the conditions of the different values of variable A – (the operator is from Pearl’s book [78]).

Bayesian network construction

One way to construct Bayesian networks is from domain knowledge. There are three main steps to construct a Bayesian network by domain knowledge: 1) determine the number and the meanings of the variables in the interested domain; 2) determine the direct influence relationships among variables in the domain; and 3) determine the conditional probabilities given the structure of the Bayesian networks from step 2. Some methods have been proposed to facilitate the process to construct Bayesian networks with causal domain knowledge [22,49,74].

In many domains, however, domain knowledge is not sufficient to construct a complete Bayesian network. In this case, Bayesian networks can be learned from data when the data is available. In literature, many methods have been proposed for this purpose [17,47,79,96]. The Bayesian network learning problem can be categorized as 1) parameter learning problem when the structure is known, and 2) structure learning problem when the structure is unknown. Generally the parameter learning will be a part of the structure learning problem and used as an inner loop of structure learning in the score-and-search-based approach. In this thesis, we are interested in the Bayesian network structure learning problem.

Bayesian network structure learning

The task in Bayesian network structure learning is to find a structure of Bayesian network that describes the observed data the most, which is proved to be a NP-complete problem [11]. Many heuristics have been proposed to learn Bayesian network structure. In literature, there are two categories of approaches for Bayesian network structure learning. The first one is the score-and-search-based approach [13,17,47]. The methods in this category start from an initial structure (generated randomly or from domain knowledge) and move to the neighbors with the best score in the structure space determinately or stochastically until a local maximum of the selected criteria is reached. The greedy learning process can re-start several times with different initial structures to improve the result. Another category of the Bayesian network structure learning approach is the constraint-based approach [80,95]. The methods under this category start to test the statistical significance of the pairs of variables conditioning on other variables to induce conditional independence. The pairs of variables which pass some threshold are deemed as directly connected in the Bayesian networks. The complete Bayesian network structure is constructed from the induced conditional independence and dependence information. In the later sub-sections, we will discuss the score-and-search-based approach and the constraint-based approach.

Score-and-search-based approach

There are three main issues in the score-and-search-based approach for Bayesian network structure learning: the structure space, the search strategy and the model selection criterion.

Structure space

The search space in Bayesian network structure learning is all the possible structures of directed acyclic graphs (DAGs) given the number of variables in the domain. Robinson [87,88] derived the efficiently computable recursive function to determine the number of possible DAGs that contain n nodes:

The numbers of Bayesian networks with small number of variables are shown in Table 2. From Table 2, we know that the number of DAGs is super-exponential in the number of nodes. It is impossible to enumerate all possible structures and score them, even only with a reasonably-sized number of nodes n in the Bayesian network. Then heuristic-based methods have been proposed to find a local maximum in the structure space. The idea is that a good heuristic will speed up the search process in order to reach a local maximum. The representative methods are K2 algorithm [17], greedy search, etc.

Number of variables in DAG

Number of the possible DAGs





















Table 2 Number of DAGs with specified number of variables

Besides the basic structure space described above, Checking [12] has considered the Markov-equivalent structures as the search space. A set of Bayesian networks are equivalent if they imply exactly the same set of independence statements among variables. Markov-equivalent set of Bayesian network structures can be represented with complete partial DAG (CPDAG). The edges in a CPDAG can be either directed, denoting that the edge direction is the same in all equivalent Bayesian networks, or undirected, denoting that either direction is possible in some equivalent Bayesian networks. Since our interest is the direct influence relationships among variables, the Markov-equivalent structures cannot model the exact direct influence relationship and we will not consider it in this report.

Search methods

In the score-and-search-based approach, any search methods from artificial intelligence, such as brute-force, depth-first, width-first, best-first or simulated annealing, can be used. Here we only consider some basic and widely-used methods.

Exhaustive search

The brute-force approach to Bayesian network structure learning is to enumerate all possible DAGs, score each one and select the best one. This provides a "gold standard" to evaluate other algorithms.

As mentioned in last sub-section, the number of possible DAGs is super exponential to the number of the nodes. It is impossible to enumerate all possible DAGs for N > 5 and score them in a reasonable time. But one can evaluate any reasonably-sized set of structures in this way (e.g., the nearest neighbors of the best structure at the current stage).

K2 algorithm

Since we cannot enumerate all the possible DAGs for Bayesian network structure learning, we have to try the heuristic methods. Since DAGs are acyclic and the parents of the variables are before children in causal ordering, knowing the ordering of the variables can reduce the structure space. If we know a total ordering on the nodes, finding the best structure amounts to picking the best set of parents for each node independently. This is what the K2 algorithm [17] does. K2 algorithm is a greedy search algorithm that works as follows. Suppose we know the total ordering of the nodes. Initially each node has no parents. The algorithm then incrementally adds the parent whose addition most increases the score of the resulting structure. When no addition of a single parent can increase the score, it stops adding parents to the node. Since an ordering of the nodes is known beforehand, the search space under this constraint is much smaller than the entire space. And we do not need to check for cycles, since the total ordering guarantees that there is no cycle in the deduced structures. Furthermore, based on some appropriate assumptions, we can choose the parents for each node independently.

If the ordering is unknown, we can search over orderings. The space of orderings is much smaller and more regular than the space of the structures, and has a smoother posterior “landscape”. As a result, the search over ordering is more efficient than searching over DAGs. Refer to Friedman and Koller’s work [32] for details of search over ordering.

Greedy search

If we know nothing about the structure, we can treat the structure learning problem as a general optimization problem in a discrete space. The intuitive way to solve such a problem is greedy search. Greedy search starts at a specific point (an initial structure) in the structure space, considers all nearest neighbors of the current point, and moves to the neighbor that has the highest score; if no neighbors have higher score than the current point (i.e., we have reached a local maximum), the algorithm stops.

The greedy search process for score-and-search-based approach is as follows:

Input of the algorithm:

1)      Observational data set;

Output of the algorithm:

1)      A Bayesian network.

1.      Generate the initial Bayesian network, evaluate it and set it as the current Bayesian network

2.      Evaluate the neighbors of the current Bayesian network

3.      If the best score of the neighbors is better than the score of the current Bayesian network, set the neighbor with the best score as the current Bayesian network and return to Step 2;

4.      Otherwise, stop the learning process

Three different ways to generate the initial Bayesian network structures are often used in practice: 1) the structure without any arcs – it means every node is independent of any other nodes, 2) the structure with complete arcs – it means there are links between any pairs of nodes; and 3) randomly generated structures.

The score can be calculated with any reasonable score function for Bayesian network.

A common definition of "neighbor" is all structures that can be generated from the current structures by adding, deleting or reversing a single arc, subject to the acyclicity constraint.

When the algorithm stops, it always reaches a local maximum. The local maximum reached by the algorithm is essentially dependent on the initial structure. If a good initial structure is chosen, we can reach a good model in a short time. If a bad initial structure is chosen, we will reach a reasonable good model in a very long time, or can’t reach a reasonable good model. Although we know the initial structure is essential, we have no enough knowledge to justify which initial model is good. Instead of choosing one good initial structure, the alternative way is to restart the algorithm with different initial structures and choose the best model in the reached local maximums.

Other methods have been proposed to solve the local maximum problem, such as simulated annealing and best-first search. In the simulated annealing, the process does not always go to the neighbors of the current structure with the highest score, but in a stochastic way. It goes to the neighbor with the highest score with a probability or does a random walk to a neighbor with probability. The probability may change over time. Following a stochastic way, the process may escape from a local maximum and find a global maximum.

In the best-first approach, the space of the possible network structure will be searched systematically using a heuristic measure to determine the next best structure to examine.

Although the idea in greedy search is intuitive, it can reach good model in reasonable time compared with other methods. Chickering [12] has shown that, for a fixed amount of computational time, greedy search with random restarts produces better models than does either simulated annealing or best-first search. There is also a theoretic justification of greedy search method over the Markov equivalent class by Chickering [13].

Model selection criteria

In Bayesian network structure learning, a scoring function evaluates how well a given network G matches the data D. Given a scoring function, the best Bayesian network is the one that maximizes this scoring function.

An ad-hoc scoring function is based on the maximum likelihood (ML) principle: Selecting the structure which predicts the data D with the highest probability

where denotes the hypothesis of the Bayesian structure, is the vector of parameters , is the vector of parameters for the distribution of variable and is the maximum likelihood configuration of .

But the number of parameters (here and in the following used to measure the complexity of the model) can be very large depending on G. Thus an application of the maximum-likelihood principle might lead to an overfitting to Bayesian networks that match the given data well, but have low generalization quality for new data. Therefore, a penalty in the scoring function for the complexity of the model is needed.

The scoring functions that are most frequently used to learn Bayesian networks fulfill this condition: Bayesian Information Criterion (BIC) and Bayesian score combine the likelihood with some penalty relating to the complexity of the model.

The Bayesian Information Criterion (BIC) [90] is defined as

where D is the data, is the maximum likelihood (ML) estimate of the parameters, is the hypothesis of the Bayesian structure, d is the number of parameters in the Bayesian network, and N is the number of data cases. The BIC criterion has several properties. First, it does not depend on the prior, so we do not need to specify the prior. Second, the criterion is quite intuitive. Namely, it contains a term measuring how well the parameterized model predicts the data and a term that punishes the complexity of the model . Third, the BIC criterion is exactly minus the Minimum description Length (MDL) criterion [44,61]. Although BIC is intuitive and often used in practice, it tends to choose models that are too simple due to the heavy penalty on the complexity of the model.

The Bayesian score for measuring the quality of a Bayesian network G is its posterior probability given the database:

where is a normalization constant which does not depend on the Bayesian network structure G. Since is a constant and will not affect the ordering of the different models, the relative posterior probability is often used for model selection. This criterion has two components: the prior and the marginal likelihood. The prior can be specified by experts or just set uniformly to all possible structures. The marginal likelihood integrates out the parameters of the model.

The Bayesian score is originally discussed by Cooper and Herskovits [17] as BD metric and further developed by Heckerman et al [47] as BDe metric. Here only BDe metric is covered. The BDe metric (Bayesian metric with Dirichlet priors and equivalence) [47] evolves from the search for a network with the largest posterior probability given priors over network structures G and parameters. It was derived with the complete data under the following assumptions: multinomial sample assumption, parameter independence assumption, parameter modularity assumption, likelihood equivalence assumption, and structure possibility assumption. The BDe metric is based on the concept of sets of likelihood equivalent network structures, where all members in a set of equivalent network are given the same score.

The BDe metric also provides a simple way of identifying the virtual counts by having the user specify a prior Bayesian network Sp for variables X and an equivalent sample size:

The Bayesian score is a more accurate criterion, but it needs more computation in practice. BIC can be derived as a large sample approximation to the marginal likelihood, the second component of Bayesian score. In practice, the sample size does not need to be very large for the approximation to be good.

Except the criteria mentioned above, some other criteria, such as cross-validation criterion, entropy score, and minimum message length (MML) [101], are also used for Bayesian network structure selection, and are not covered here.

Constraint-based approach

Constraint-based methods view the structure learning problem differently. Since a Bayesian network structure encodes many dependencies and (conditional) independencies of the underlying model, the algorithms of this approach try to discover the dependencies and conditional independencies from the data, and then use these dependencies and conditional independencies to infer the Bayesian network structure.

The dependency and conditional independency relationships are measured by using some kind of Conditional Independence (CI) test. The conditional independence tests that are used in practice are statistical tests on the data set. In order to use the results to reconstruct the structure, several assumptions have to be made: Causal Sufficiency assumption, Causal Markov assumption, and Faithfulness assumption.

Causal sufficiency assumption: There exist no common unobserved (also known as hidden or latent) variables in the domain that are parent of one or more observed variables of the domain.

Causal Markov assumption: Given a Bayesian network model G, any variable is independent of all its non-descendants in G given its parents.

Faithfulness assumption: A Bayesian network structure G and a probability distribution P generated by G are faithful to one another if and only if every conditional independence relationship valid in P is entailed by the Causal Markov assumption in G.

With these assumptions in place, one can ascertain the existence of an edge between two variables, or the direction of that link, though the latter is only being possible in certain cases. The output of constraint-based methods will be a partial DAG (PDAG) to represent the whole Markov equivalence class. The representative algorithms in this category are SGS algorithm [94], CI algorithm [80], and PC algorithm [96].

Hybrid methods

In previous two sub-sections, we have talked about the score-and-search-based approach and constraint-based approach for Bayesian network structure learning. These two kinds of approaches can be combined together for Bayesian network structure learning: use the learned network from constraint-based methods as the start point for the search-and-score-based methods. Singh and Valtorta [92] proposed to generate the ordering of the variables with constraints-based approach and learn the Bayesian network structure with the score-and-search-based approach when the ordering is generated.

Bayes Nets Conferences

  1. UAI: Association for Uncertainty in Artificial Intelligence
  2. AAAI: National Conference on Artificial Intelligence
  3. IJCAI: the International Joint Conference on Artificial Intelligence

Bayes Nets Journals

  1. Artificial Intelligence
  2. Journal of Artificial Intelligence Research (JAIR)

Bayes Nets Researchers

Daphne Koller

David Heckerman's Homepage

Duncan Fyfe Gillies

Home Page for Steffen Lauritzen

Jon Williamson

Judea Pearl

Kathryn Blackmond Laskey HomePage

Kevin Patrick Murphy

Manfred Jaeger's Homepage

Marek J. Druzdzel's Home Page

Michael I. Jordan's Home Page

Moises Goldszmidt's web page

Nir Friedman

Pedro Domingos

Peter Haddawy's Homepage

William H. Hsu

Zoubin Ghahramani

Lise Getoor's Homepage

Bayes Nets Groups

Discovery Systems Laboratory website

Probably Informative Edinburgh Machine Learning Group Pages

Yahoo! Groups bndev

Yahoo! Groups BayesNetToolbox

Bayes Nets Software

Microsoft Bayesian Network Editor

Probabilistic Networks Library (PNL) - Intel Research

Bayes Net Toolbox for Matlab



Netica Bayesian Network Software from Norsys


The AutoClass Project

Bayes Nets database

Bayesian Network Repository

BIF-BNT converter

The Interchange Format for Bayesian Networks

XBN Editor

Norsys - Bayes Net Library

Bayes Nets Resources

selected list of research links for Bayesian Networks

Papers and Presentations

Bayes Nets References

[1] I.A. Beinlich, H.J. Suermondt, R.M. Chavez, G.F. Cooper, The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks, Proceedings of the Second European Conference on Artificial intelligence in Medical Care, Springer-Verlag, Berlin, 1989, pp. 247-256.

[2] J. Binder, D. Koller, S. Russell, K. Kanazawa, Adaptive probabilistic networks with hidden variables, Machine Learning 29 (1997) 213-244.

[3] C. Boutilier, N. Friedman, M. Goldszmidt, D. Koller, Context-specific independence in Bayesian Networks, In Proceeding of 12th Conf. on Uncertainity in Artificial Intelligence (UAI-96), 1996, p. 115–123.

[4] R.J. Brachman, T. Anand, The process of knowledge discovery in databases, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, USA, 1996, pp. 37-57.

[5] N. Cartwright, Against modularity, the causal Markov condition, and any link between the two: Comments on Hausman and Woodward, British Journal for the Philosophy of Science 53 (2002) 411-453.

[6] N. Cartwright, What is wrong with Bayes Nets? The Monist 84 (2001) 242-264.

[7] P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, R. Wirth, CRISP-DM 1.0 Step-by-step data mining guide, 2000, pp. 1-78.

[8] P. Cheeseman, J. Stutz, Bayesian classification (AutoClass): Theory and results, in: U.M. Fayyad, G. Diatetsky-Shapiro, P. Smyth, R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining, AAAI Press/The MIT Press, 1996, pp. 153-180.

[9] Q. Chen, G. Li, B. Han, C.K. Heng, T.-Y. Leong, Coronary Artery Disease Prediction with Bayesian Networks and Constraint Elicitation, Technical report TRC9/06, School of Computing, National University of Singapore, September 2006, 2006.

[10] Q. Chen, G. Li, T.-Y. Leong, C.-K. Heng, Predicting Coronary Artery Disease with Medical Profile and Gene Polymorphisms Data, World Congress on Health (Medical) Informatics (MedInfo), IOS Press, Brisbane, Australia, 2007, pp. 1219-1224.

[11] D.M. Chickering, Learning Bayesian networks is NP-complete, AI & STAT V, 1996.

[12] D.M. Chickering, Learning Equivalence Classes of Bayesian Network Structures, in: E. Horvitz, F.V. Jensen (Eds.), Proceedings of the Twelfth Annual Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, Reed College, Portland, Oregon, USA, 1996, pp. 150-157.

[13] D.M. Chickering, Optimal Structure Identification with Greedy Search, Journal of Machine Learning Research 3 (2002) 507-554.

[14] D.M. Chickering, D. Heckerman, Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables, Machine Learning 29 (1997) 181-212.

[15] G.F. Cooper, A Bayesian Method for Learning Belief Networks that Contain Hidden Variables, Journal of Intelligent Information Systems 4 (1) (1995) 71-88.

[16] G.F. Cooper, An overview of the representation and discovery of causal relationships using Bayesian networks, in: C. Glymour, G.F. Cooper (Eds.), Computation, Causation, and Discovery, AAAI Press and MIT Press, 1999, pp. 3-62.

[17] G.F. Cooper, E. Herskovits, A Bayesian method for the induction of probabilistic networks from data, Machine Learning 9 (1992) 309-347.

[18] G.F. Cooper, C. Yoo, Causal discovery from a mixture of experimental and observational data, Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, 1999, pp. 116-125.

[19] D.H. Dash, M.J. Druzdzel, A Hybrid Anytime Algorithm for the Construction of Causal Models From Sparse Data, in: K. Laskey, H. Prade (Eds.), Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI-99), Morgan Kaufmann Publishers, San Francisco, CA, Stockholm, Sweden, 1999, pp. 142-149.

[20] A.P. Dawid, Conditional independence in statistical theory (with discussion), Journal of the Royal Statistical Society, Series B 41 (1979) 1-31.

[21] S.K. Donoho, L.A. Rendell, Constructive Induction Using Fragmentary Knowledge, Proceedings of International Conference on Machine Learning (ICML'96), Morgan Kaufmann, Bari, Italy, 1996, pp. 113-121.

[22] M.J. Druzdzel, L.C. van der Gaag, Building Probabilistic Networks: Where Do the Numbers Come From? Guest Editors' Introduction, IEEE Transactions on Knowledge and Data Engineering 12 (4) (2000) 481-486.

[23] F. Eberhardt, C. Glymour, R. Scheines, N-1 Experiments Suffice to Determine the Causal Relations Among N Variables, In Department of Philosophy, Carnegie Mellon University, Technical Report CMU-PHIL-161, 2004.

[24] F. Eberhardt, C. Glymour, R. Scheines, On the Number of Experiments Sufficient and in the Worst Case Necessary to Identify All Causal Relations Among N Variables, Proceedings of the 21th Annual Conference on Uncertainty in Artificial Intelligence (UAI-05), AUAI Press, 2005, pp. 178-184.

[25] M.B. Eisen, P.T. Spellman, P.O. Browndagger, D. Botstein, Cluster analysis and display of genome-wide expression patterns, The Proceedings of the National Academy of Sciences (PNAS) USA 95 (25) (1998) 14863-14868.

[26] G. Elidan, N. Lotner, N. Friedman, D. Koller, Discovering Hidden Variables: A Structure-Based Approach, Neural Information Processing Systems (NIPS), 2000, pp. 479-485.

[27] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From data mining to knowledge discovery: An overview, in: U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (Eds.), Advances in knowledge discovery and data mining, AAAI Press, Menlo Park, CA, USA, 1996, pp. 1-30.

[28] M. Firlej, D. Hellens, Knowledge Elicitation: a practical guide, Prentice Hall, New York, 1991.

[29] N. Friedman, The Bayesian Structural EM Algorithm, in: G.F. Cooper, S. Moral (Eds.), Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, University of Wisconsin Business School, Madison, Wisconsin, USA, 1998, pp. 129-138.

[30] N. Friedman, Learning belief networks in the presence of missing values and hidden variables, in: D.H. Fisher (Ed.), Proceedings of the Fourteenth International Conference on Machine Learning (ICML 1997), Morgan Kaufmann, Nashville, Tennessee, USA, 1997, pp. 125-133.

[31] N. Friedman, M. Goldszmidt, A. Wyner, Data Analysis with Bayesian Networks: A Bootstrap Approach, Proceeding of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI), 1999, pp. 206-215.

[32] N. Friedman, D. Koller, Being Bayesian About Network Structure: A Bayesian Approach to Structure Discovery in Bayesian Networks, Machine Learning 50 (1-2) (2003) 95-125.

[33] N. Friedman, M. Linial, I. Nachman, D. Pe'er, Using Bayesian networks to analyze expression data, Journal of Computational Biology 7 (3-4) (2000) 601-620.

[34] A.P. Gasch, P.T. Spellman, C.M. Kao, O. Carmel-Harel, M.B. Eisen, G. Storz, D. Botstein, P.O. Brown, Genomic expression programs in the response of yeast cells to environmental changes, Molecular Biology of the Cell 11 (12) (2000) 4241-4257.

[35] D. Geiger, D. Heckerman, Knowledge representation and inference in similarity networks and Bayesian multinets, Artificial Intelligence 82 (1-2) (1996) 45-74.

[36] Z. Ghahramani, Learning Dynamic Bayesian Networks, in: C.L. Giles, M. Gori (Eds.), Adaptive Processing of Sequences and Data Structures. Lecture Notes in Artificial Intelligence, Springer-Verlag, Berlin, 1998, pp. 168-197.

[37] A. Ginsberg, S.M. Weiss, P. Politakis, Automatic knowledge base refinement for classification systems, Artificial Intelligence 35 (2) (1988) 197-226.

[38] A.L. Givan, Flow cytometry: first principles, 2nd ed., Wiley-Liss, New York, 2001.

[39] C. Glymour, G.F. Cooper (Eds.), Computation, Causation, and Discovery, MIT Press, Cambridge, MA, USA, 1999. ISBN 0262571242

[40] G. Gorry, G. Barnett, Experience with a model of sequential diagnosis, Computers and Biomedical Research 1 (1968) 490-507.

[41] T.L. Griffiths, E.R. Baraff, J.B. Tenenbaum, Using physical theories to infer hidden causal structure, Proceedings of the 26th Annual Conference of the Cognitive Science Society, 2004.

[42] J. Han, L.V.S. Lakshmanan, R.T. Ng, Constraint-Based, Multidimensional Data Mining, Computer 32 (8) (1999) 46-50.

[43] D. Heckerman, A Bayesian Approach to Learning Causal Networks, Proceedings of Eleventh Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, San Francisco, CA, 1995, pp. 285-295.

[44] D. Heckerman, A Tutorial on Learning with Bayesian Networks, in: M. Jordan (Ed.), Learning in Graphical Models, MIT Press, Cambridge, MA, 1999.

[45] D. Heckerman, A Tutorial on Learning with Bayesian Networks, in: M. Jordan (Ed.), Learning in Graphical Models, MIT Press, Cambridge, MA, 1998, pp. 301-354.

[46] D. Heckerman, J.S. Breese, K. Rommelse, Troubleshooting Under Uncertainty, DX-94-5th Int'l. Workshop on Principles of Diagnosis, American Association for Artificial Intelligence, 1994, pp. 121-130.

[47] D. Heckerman, D. Geiger, D.M. Chickering, Learning Bayesian networks: The combination of knowledge and statistical data, Machine Learning 20 (1995) 197-243.

[48] D. Heckerman, C. Meek, G.F. Cooper, A Bayesian approach to causal discovery, in: C. Glymour, G.F. Cooper (Eds.), Computation, Causation, Discovery, MIT Press, Cambridge, MA, USA, 1999, pp. 141-165.

[49] D.E. Heckerman, Probabilistic similarity networks, MIT Press, Cambridge, Mass., 1991.

[50] L. Herzenberg, D. Parks, B. Sahaf, O. Perez, M. Roederer, L. Herzenberg, The history and future of the fluorescence activated cell sorter and flow cytometry: a view from Stanford, Clinical chemistry 48 (10) (2002) 1819-1827.

[51] P. Humphreys, D. Freedman, The Grand Leap, The British Journal for the Philosophy of Science 47 (1) (1996) 113-123.

[52] M. Jaeger, On the complexity of inference about probabilistic relational models, Artificial Intelligence 117 (2) (2000) 297-308.

[53] M. Jaeger, Relational Bayesian Networks, Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence, 1997, pp. 266-273.

[54] R. Joshi, T.Y. Leong, Towards a Context-Sensitive Probabilistic Graphical Framework, AAAI-2006(In review), 2006.

[55] R. Joshi, G. Li, T.-Y. Leong, Context-aware Probabilistic Reasoning for Proactive Healthcare, Work Notes of 2nd Workshop on Artificial Intelligence Techniques for Ambient Intelligence (AITAmI 07), jointed with IJCAI2007, 2007.

[56] G. Karp, Cell and molecular biology: concepts and experiments, 3rd ed ed., John Wiley & Sons, New York; Singapore, 2002.

[57] K.B. Korb, A.E. Nicholson, Bayesian artificial intelligence, Chapman & Hall/CRC, 2003.

[58] K.B. Korb, C.S. Wallace, In search of the philosopher's stone: Remarks on Humphreys and Freedman's critique of causal discovery, British Journal for the Philosophy of Science 48 (1997) 543-553.

[59] L.A. Kurgan, P. Musilek, A survey of Knowledge Discovery and Data Mining process models, The Knowledge Engineering Review 21 (2006) 1-24.

[60] W. Lam, Bayesian Network Refinement Via Machine Learning Approach, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (3) (1998) 240-251.

[61] W. Lam, F. Bacchus, Learning Bayesian belief networks: An approach based on the MDL principle, Computational Intelligence 10 (4) (1994) 269-293.

[62] S.L. Lauritzen, D. Spiegelhalter, Local computations with prababilities on graphical structures and their application to expert systems, Journal of the Royal Statistical Society, Series B 50 (2) (1988) 157-224.

[63] G. Li, T.-Y. Leong, Biomedical Knowledge Discovery with Topological Constraints Modeling in Bayesian Networks: A Preliminary Report, World Congress on Health (Medical) Informatics (MedInfo), IOS Press, Brisbane, Australia, 2007, pp. 560-565.

[64] G. Li, T.-Y. Leong, A framework to learn Bayesian Networks from changing, multiple-source biomedical data, Proceedings of the 2005 AAAI Spring Symposium on Challenges to Decision Support in a Changing World, Stanford University, CA, USA, 2005, pp. 66-72.

[65] T.-C. Lu, M.J. Druzdzel, T.-Y. Leong, Causal Mechanism-based Model Constructions, in: C. Boutilier, M. Goldszmidt (Eds.), Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence, Morgan Kaufmann, Stanford University, Stanford, California, USA, 2000, pp. 353-362.

[66] J.D. Martin, K. VanLehn, Discrete factor analysis: learning hidden variables in Bayesian networks. Technical Report LRGC ONR-94-1, Department of Computer Science, University of Pittsburgh, 1994.

[67] V.R. McKim, S.P. Turner (Eds.), Causality in crisis? statistical methods and the search for causal knowledge in the social sciences, University of Notre Dame Press, Notre Dame, 1996.

[68] S. Meganck, P. Leray, B. Manderick, Learning Causal Bayesian Networks from Observations and Experiments: A Decision Theoretic Approach, Proceedings of Modelling Decisions in Artificial Intelligence (MDAI 2006), LNAI 3885, 2006, pp. 58-69.

[69] B. Milch, S. Russell, First-Order Probabilistic Languages: Into the Unknown, Proceedings of the 16th International Conference on Inductive Logic Programming (ILP 2006), 2006, pp. 10-24.

[70] K. Murphy, Active Learning of Causal Bayes Net Structure. Technical report, Computer Science Division, University of California, Berkeley, CA, 2001.

[71] K. Murphy, Bayes Net Toolbox for Matlab,, 2007.

[72] K. Murphy, Software Packages for Graphical Models / Bayesian Networks,, 2005.

[73] K.P. Murphy, A Dynamic Bayesian Networks:Representation, Inference and Learning., Doctoral dissertation, Computer Science, University of California, Berkeley, 2002.

[74] S. Nadkarni, P.P. Shenoy, A Causal Mapping Approach to Constructing Bayesian Networks, Decision Support Systems 38 (2) (2004) 259-281.

[75] R.E. Neapolitan, Learning Bayesian Networks, Prentice Hall, 2004.

[76] R.S. Niculescu, T.M. Mitchell, R.B. Rao, Bayesian Network Learning with Parameter Constraints, Journal of Machine Learning Research 7 (2006) 1357-1383.

[77] D.J. Oliver, B. Nikolau, E.S. Wurtele, Functional Genomics: High-Throughput mRNA, Protein, and Metabolite Analyses, Metabolic Engineering 4 (1) (2002) 98-106.

[78] J. Pearl, Causality: models, reasoning, and inference, Cambridge University Press, New York, 2000. ISBN-13: 978-0521773621

[79] J. Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, San Francisco, California, 1988. ISBN 0-934613-73-7

[80] J. Pearl, T. Verma, A theory of inferred causation, in: J. Allen, R. Fikes, E. Sandewall (Eds.), Principles of Knowledge Representation and Reasoning: Proceeding of the Second International Conference, Morgan Kaufmann, San Mateo, CA, 1991, pp. 441-452.

[81] M. Peot, R. Shachter, Learning From What You Don't Observe, Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI-98), Morgan Kaufmann, 1998, pp. 439-444.

[82] K.L. Poh, E.J. Horvitz, A graph-theoretic analysis of information value, in: E. Horvitz, F. Jensen (Eds.), Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 1996, pp. 427-435.

[83] K.L. Poh, E.J. Horvitz, Reasoning about the value of decision-model refinement: methods and application, Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 1993, pp. 174-182.

[84] K.R. Popper, Conjectures and Refutations: The Growth of Scientific Knowledge, Routledge & Kegan Paul, 1963.

[85] K.R. Popper, The Logic of Scientific Discovery, 1999., Routledge, 1934.

[86] H. Price, Agency and causal asymmetry, Mind 101 (403) (1992) 501-520.

[87] R.W. Robinson, Counting labeled acyclic digraphs, in: F. Harary (Ed.), New Directions in the Theory of Graphs, Academic Press, New York, 1973, pp. 239-273.

[88] R.W. Robinson, Counting unlabeled acyclic digraphs, in: C.H.C. Little (Ed.), Combinatorial Mathematics V, volume 622 of Lecture Notes in Mathematics, Springer-Verlag, Australia, 1977, pp. 28-43.

[89] K. Sachs, O. Perez, D. Pe'er, D.A. Lauffenburger, G.P. Nolan, Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data, Science 308 (5721) (2005) 523-529.

[90] G. Schwarz, Estimating the dimension of a model, The Annals of Statistics 6 (1978) 461-464.

[91] E. Segal, D. Pe'er, A. Regev, D. Koller, N. Friedman, Learning Module Networks, Proceedings of the 19th Conference in Uncertainty in Artificial Intelligence (UAI), Acapulco, Mexico, 2003, pp. 525-534.

[92] M. Singh, M. Valtorta, An Algorithm for the Construction of Bayesian Network Structures from Data, Proceedings of the 9th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, 1993, pp. 259-265.

[93] P.T. Spellman, G. Sherlock, M.Q. Zhang, V.R. Iyer, K. Anders, M.B. Eisen, P.O. Brown, D. Botstein, B. Futcher, Comprehensive identification of cell cycle-regulated genes of the yeast saccharomyces cerevisiae by microarray hybridization, Molecular Biology of the Cell 9 (12) (1998) 3273-3297.

[94] P. Spirtes, C. Glymour, R. Scheines, causality from probability, Proceedings of the Conference on Advanced Computing for the Social Sciences, Williamsburg, VA., 1990.

[95] P. Spirtes, C. Glymour, R. Scheines, Causation, Prediction, and Search (Second Edition), MIT Press, Cambridge, MA, USA, 2000.

[96] P. Spirtes, C. Glymour, R. Scheines, Causation, Prediction, and Search. Number 81 in Lecture Notes in Statistics, Springer Verlag, New York, 1993. ISBN 0-262-19440-6

[97] J.E. Tiles, G.T. McKee, G.C. Dean (Eds.), Evolving knowledge in natural science and artificial intelligence, Pitman, London, 1990.

[98] S. Tong, D. Koller, Active Learning for Structure in Bayesian Networks, in: B. Nebel (Ed.), Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, IJCAI 2001, Morgan Kaufmann, Seattle, Washington, USA, 2001, pp. 863-869.

[99] G.G. Towell, J.W. Shavlik, M.O. Noordewier, Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks, AAAI-90, Proceedings of the 8th National Conference on AI, 1990, pp. 861-866.

[100] M. Valtorta, Knowledge base refinement: A bibliography, Applied Intelligence 1 (1) (1990) 87-94.

[101] C.S. Wallace, K.B. Korb, H. Dai, Causal discovery via MML, Proceedings of the Thirteenth International Conference of Machine Learning (ICML'96), Morgan Kaufmann, San Francisco CA USA, 1996, pp. 516-524.

[102] G. Wiederhold, Foreword: On the Barriers and Future of Knowledge Discovery, in: U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining, AAAI Press/The MIT Press, 1996, pp. vii-xi.

[103] J. Woodward, Making things happen: a theory of causal explanation, Oxford University Press, 2003.

[104] C. Yoo, G.F. Cooper, Causal Discovery of Latent Variable Models from a Mixture of Experimental and Observational Data. CBMI Research Report CBMI-173, 2001.

[105] C. Yoo, G.F. Cooper, A Computer-Based Microarray Experiment Design-System for Gene-Regulation Pathway Discovery, Proceedings of AMIA Symposium, 2003, pp. 733-737.

[106] C. Yoo, V. Thorsson, G.F. Cooper, Discovery of Causal Relationships in a Gene-regulation Pathway from a Mixture of Experimental and Observational DNA Microarray Data, Proceedings of Pacific Symposium on Biocomputing, World Scientific, 2002, pp. 498-509.


Back to home    last updated: 09/18/2007   Copyright 2007