In the stud farm example, a Bayesian network
is used to calculate the probabilities of the horses in a stud farm
being carriers of a recessive gene causing a life-threatening
disease.
A Constructed Example from a Stud Farm
The stallion Alan sired Betsy with the mare Ann and Benny with the
mare Alice. Betsy and Bill are the parents of Carl, and Benny sired
Cecily with Bonnie. Both Bill and Bonnie were born to Ann, but their
fathers (A1 and A2) are in no way related. Carl and Cecily have just
had a colt, Dennis.
Figure 1: Dennis's genealogy
It turns out that Dennis suffers from a life-threatening hereditary
disease carried by a recessive gene a. The corresponding dominant
gene is A. The disease is so serious that Dennis is put down
instantly, and as the stud farm wants the gene out of production,
Carl and Cecily are taken out of breeding: since Dennis has genotype
aa, both of them must be carriers of the gene with genotype Aa.
Now the problem is: Which other horses should be taken out of
breeding? Bonnie is a very fine mare, whereas Alan can be replaced
more easily in the production. What would the stud farm be best off
doing? It would help to know the probability of each horse being a
carrier of the sick gene. Normally, the prior probability of a horse
being a carrier is known to be 0.01.
Bayesian Networks
The domain of the inheritance of genes
in the stud farm can easily be modeled by a Bayesian network (BN).
Actually, the genealogy in figure 1 only needs a conditional
probability table (CPT) on each node to be a BN. First we specify
the states of the nodes: All horses except Dennis are either
carriers (Aa) or not (AA) since none of them are sick. We give them
states "AA" and "Aa". Each of the nodes in the
upper layer in figure 1 has the CPT shown in table 1. The others
except for Dennis have the CPT shown in table 2. Dennis has the CPT
shown in table 3.
| Alan="AA" | Alan="Aa" |
| 0.99      | 0.01      |

Table 1: CPT of the nodes in the upper layer (Alan used as an
example).
|            | Alan="AA"           | Alan="Aa"           |
|            | Ann="AA" | Ann="Aa" | Ann="AA" | Ann="Aa" |
| Betsy="AA" | 1.00     | 0.50     | 0.50     | 0.33     |
| Betsy="Aa" | 0.00     | 0.50     | 0.50     | 0.67     |

Table 2: CPT of the nodes in the middle layers (Betsy used as an
example).
|             | Cecily="AA"           | Cecily="Aa"           |
|             | Carl="AA" | Carl="Aa" | Carl="AA" | Carl="Aa" |
| Dennis="AA" | 1.00      | 0.50      | 0.50      | 0.25      |
| Dennis="Aa" | 0.00      | 0.50      | 0.50      | 0.50      |
| Dennis="aa" | 0.00      | 0.00      | 0.00      | 0.25      |

Table 3: CPT of the node Dennis: P(Dennis | Carl, Cecily).
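The entries of tables 2 and 3 are consistent with ordinary Mendelian
inheritance: each parent passes on one of its two genes with equal
probability, and for the healthy horses the outcome aa is ruled out
and the remaining probabilities are renormalized. The sketch below
reproduces the numbers under that assumption. The function names are
made up for illustration and are not part of Hugin.

```python
from fractions import Fraction

def p_passes_a(genotype):
    """Probability that a healthy parent passes on the recessive gene a."""
    return {"AA": Fraction(0), "Aa": Fraction(1, 2)}[genotype]

def offspring_distribution(father, mother):
    """Unconditioned genotype distribution of a foal (the shape of table 3)."""
    pf, pm = p_passes_a(father), p_passes_a(mother)
    return {
        "AA": (1 - pf) * (1 - pm),
        "Aa": pf * (1 - pm) + (1 - pf) * pm,
        "aa": pf * pm,
    }

def healthy_offspring_distribution(father, mother):
    """Distribution for a horse known not to be sick (the shape of table 2):
    drop the aa outcome and renormalize over AA and Aa."""
    full = offspring_distribution(father, mother)
    p_healthy = 1 - full["aa"]
    return {g: full[g] / p_healthy for g in ("AA", "Aa")}

both_carriers = healthy_offspring_distribution("Aa", "Aa")
print(both_carriers["AA"], both_carriers["Aa"])
```

With two carrier parents the unconditioned distribution is 1/4, 1/2,
1/4, and removing the aa case leaves 1/3 and 2/3, which table 2
rounds to 0.33 and 0.67.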
This BN has been
implemented using the Hugin GUI in less than half an hour. Then,
the evidence that Dennis is aa is entered and sum propagation is
performed. The result is shown in figure 2.
Figure 2: The probabilities of the horses being carriers (Aa) of the
sick gene.
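For readers without the Hugin GUI at hand, the same posterior
probabilities can be approximated with a brute-force sketch in plain
Python. This is not Hugin's propagation algorithm: it simply
enumerates all 2^11 genotype assignments of the healthy horses and
weights each by the CPTs in tables 1 to 3 with the evidence
Dennis="aa". Two assumptions are made here: A1 is taken to be Bill's
father and A2 Bonnie's (the text does not say which is which, and the
model is symmetric in this choice), and the exact values 1/3 and 2/3
are used in place of the rounded 0.33 and 0.67.

```python
from itertools import product

STATES = ("AA", "Aa")
PRIOR = {"AA": 0.99, "Aa": 0.01}  # table 1

FOUNDERS = ["A1", "Ann", "Alan", "Alice", "A2"]
# child -> (father, mother); the A1/A2 assignment is an assumption
PARENTS = {
    "Bill": ("A1", "Ann"),
    "Bonnie": ("A2", "Ann"),
    "Betsy": ("Alan", "Ann"),
    "Benny": ("Alan", "Alice"),
    "Carl": ("Bill", "Betsy"),
    "Cecily": ("Benny", "Bonnie"),
}
HORSES = FOUNDERS + list(PARENTS)

def p_child(child, father, mother):
    """Table 2, with exact 1/3 and 2/3 instead of 0.33 and 0.67."""
    carrier_parents = (father == "Aa") + (mother == "Aa")
    p_carrier = (0.0, 0.5, 2.0 / 3.0)[carrier_parents]
    return p_carrier if child == "Aa" else 1.0 - p_carrier

def p_dennis_sick(carl, cecily):
    """Table 3, row Dennis="aa"."""
    return 0.25 if (carl, cecily) == ("Aa", "Aa") else 0.0

def carrier_posteriors():
    """P(horse = Aa | Dennis = aa) for every horse, by full enumeration."""
    total = 0.0
    carrier_mass = {h: 0.0 for h in HORSES}
    for values in product(STATES, repeat=len(HORSES)):
        world = dict(zip(HORSES, values))
        p = p_dennis_sick(world["Carl"], world["Cecily"])  # the evidence
        for h in FOUNDERS:
            p *= PRIOR[world[h]]
        for child, (father, mother) in PARENTS.items():
            p *= p_child(world[child], world[father], world[mother])
        total += p
        for h in HORSES:
            if world[h] == "Aa":
                carrier_mass[h] += p
    return {h: carrier_mass[h] / total for h in HORSES}

posteriors = carrier_posteriors()
for horse in HORSES:
    print(f"{horse:7s} P(Aa | Dennis=aa) = {posteriors[horse]:.4f}")
```

Enumeration is only practical because the network is tiny. For
larger pedigrees an inference engine such as the one in Hugin is the
sensible choice.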
In figure 2, we can see that it is very likely that Betsy is a
carrier of the sick gene. Both her parents (Ann and Alan) also have
a high probability of being carriers. However, a more thorough
investigation shows that it is very unlikely that both of them are
carriers at the same time. In figure 3 we see that if Alan is known
to be a carrier, it becomes most unlikely that Ann is also a
carrier. This is because a carrier inherits its single copy of the
sick gene from only one parent, so once Alan is known to be a
carrier he alone can explain the evidence, and there is little
reason left to suspect Ann. The figure shows the gene being passed
from Alan to Betsy and Benny, and from them to Carl and Cecily.
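The effect described above can be illustrated on a three-node
fragment of the network: Alan, Ann and their daughter Betsy. The
sketch below is a simplification, not the full model: it assumes
that Betsy is known for certain to be a carrier, and it uses the
priors of table 1 and the CPT of table 2 with 2/3 in place of the
rounded 0.67. The function names are hypothetical.

```python
PRIOR = {"AA": 0.99, "Aa": 0.01}  # table 1

def p_betsy_carrier(alan, ann):
    """P(Betsy = Aa | Alan, Ann) from table 2 (2/3 instead of 0.67)."""
    carrier_parents = (alan == "Aa") + (ann == "Aa")
    return (0.0, 0.5, 2.0 / 3.0)[carrier_parents]

def p_ann_carrier(alan_known=None):
    """P(Ann = Aa | Betsy = Aa), optionally also given Alan's genotype."""
    alan_states = [alan_known] if alan_known else ["AA", "Aa"]
    numerator = denominator = 0.0
    for alan in alan_states:
        for ann in ("AA", "Aa"):
            p = PRIOR[alan] * PRIOR[ann] * p_betsy_carrier(alan, ann)
            denominator += p
            if ann == "Aa":
                numerator += p
    return numerator / denominator

print(f"P(Ann=Aa | Betsy=Aa)          = {p_ann_carrier():.4f}")
print(f"P(Ann=Aa | Betsy=Aa, Alan=Aa) = {p_ann_carrier('Aa'):.4f}")
```

In this fragment Ann's probability of being a carrier is about 0.50
when only Betsy's status is known, and drops to about 0.013 once Alan
is known to be a carrier. The numbers in figure 3 differ because the
full network contains more horses and more evidence paths, but the
mechanism is the same.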
The conclusion to draw from these results depends very much on how
sure the farmer wants to be of getting the sick gene out of
production. He can never be absolutely sure that he gets rid of the
right horses, but he should at least get rid of Betsy, Ann and
Bonnie. If he also wants to get rid of Alan because he is easily
replaced, this will have no effect unless he also gets rid of Benny,
since Benny has probably inherited the sick gene if Alan has it.
Figure 3: If we assume that Alan carries the sick gene, this figure
shows that Ann is probably not a carrier.
This network is installed on your computer along with the Hugin
software. Open the network in the Hugin GUI. You can find it in the
directory where you installed Hugin
(e.g. C:/Program Files/Hugin/Hugin Lite/Samples).
Comments
A long list of areas have essential
characteristics in common with the above example, e.g. medical
diagnosis and treatment, credit valuation of customers, search for
minerals, monitoring of biological production plants, image
understanding, information retrieval and fault analysis.
The areas are characterized by a cause-effect
structure, where effects are not completely determined. Sometimes
an event has one effect and sometimes it has another. This
phenomenon is called causal uncertainty. A domain characterized by
causal uncertainty can be modeled by a BN.
Another characteristic of the areas is that a number of essential
properties cannot be observed directly. This is the diagnosis
problem: You know only the symptoms, and from them you must conclude
the causes. You must, so to speak, reason against the direction of
the arrows in the network.