The Reports submenu within the Network menu offers an array of information about the Bayesian network in the active Graph Window.
The Network Comments Report displays the information recorded in a network's Comment field.
And if available, the Network Comments Report also lists the associations of Node Names, Long Names, and Node Comments.
Select Main Menu > Network > Reports > Comments.
The Network Comments window opens, and a typical report resembles the following screenshot.
For a network that is in an early stage of development with little customization, the Network Comments Report may only feature default information:
The Network Report provides comprehensive documentation of the network in the active Graph Window.
It includes statistics about the network structure as a whole, plus details for each node, such as the Node States, the Conditional Probability Tables, and equations.
As such, it presents all qualitative and quantitative knowledge contained in the network as a long, tabular report.
To some extent, you could recreate the network from all these details.
Select Main Menu > Network > Reports > Network to create the Network Report.
The report can be quite substantial, depending on your network's size and complexity.
The following screenshot only shows the top portion of a much longer report:
For a thorough offline analysis, you may want to save the Network Report as an HTML file, which you can then open as a spreadsheet in Excel.
Occurrences refer to the number of observations in a cell of a Probability Table or a Conditional Probability Table.
The number of cells in a Conditional Probability Table is a function of the following parameters:
The number of Parent Nodes.
The number of Node States of the Parent Nodes.
The number of Node States of the Child Nodes.
Here, Age is discretized into 4 states and BMI into 6, for a total of 4 × 6 = 24 cells in the table associated with BMI.
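The cell count follows directly from the three parameters listed above: it is the number of states of the Child Node multiplied by the number of states of each Parent Node. The following is a minimal sketch of that arithmetic; the function name `cpt_cell_count` is a hypothetical helper for illustration and is not part of BayesiaLab.

```python
from math import prod


def cpt_cell_count(child_states: int, parent_states: list[int]) -> int:
    """Cells in a table: child states times the states of every parent.

    With no parents, prod([]) is 1, so the result is simply the size of
    the node's (unconditional) Probability Table.
    """
    return child_states * prod(parent_states)


# BMI (6 states) with the single Parent Node Age (4 states)
print(cpt_cell_count(6, [4]))     # 24

# Hypothetical: adding a second parent with 2 states doubles the table
print(cpt_cell_count(6, [4, 2]))  # 48
```

Because the count grows multiplicatively with every additional Parent Node, a fixed sample is spread over more and more cells, which is why sparse cells become harder to avoid in larger networks.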
The numbers in each cell are counts of observations or Occurrences. In our case, each Occurrence represents one person from the sample of 200 individuals.
For instance, the Occurrence table associated with BMI states that Count(BMI≤20 | Age≤30)=2. So, we have only two Occurrences of that particular condition, i.e., only two individuals who are 30 years old or younger have a BMI of 20 or lower.
To create a Bayesian network, BayesiaLab needs to translate the Occurrences in each cell into probabilities.
However, with a small number of Occurrences, that can become an issue.
We have repeatedly referenced a rule of thumb, which says that we should have a minimum of 5 Occurrences per cell to estimate a Probability Table or Conditional Probability Table reliably.
In our example, several cells fall below the recommended minimum.
Such deficiencies are easy to recognize in a small example, but in more complex networks, it can be difficult to spot such weaknesses.
That is the motivation for the Occurrences Report. It displays all tables in a network and visually highlights potentially problematic cells with low Occurrences.
Select the nodes you want to include in the Occurrences Report. If none are selected, the analysis will be performed on all nodes.
Select Main Menu > Network > Reports > Occurrences to create the Occurrences Report.
The Occurrences Report opens up and shows all Probability Tables and Conditional Probability Tables.
The fields in the report are color-coded to highlight potential issues:
Cells with 0 Occurrences are marked in red.
Cells with 5 Occurrences are marked in yellow. This is generally considered the minimum number of Occurrences.
Cells with 40 or more Occurrences are marked in green.
Furthermore, the Occurrences Report calculates the mean number of Occurrences for each row in all Probability Tables and Conditional Probability Tables.
If the mean value of any row in any of the nodes drops below the threshold of 5, the corresponding nodes are called out at the top of the report.
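The two checks described here, cells below the rule-of-thumb minimum and rows whose mean falls below 5, can be sketched in a few lines. This is only an illustration of the logic, not BayesiaLab's implementation; the helper names and the counts are made up for the example, and each row is assumed to correspond to one combination of Parent Node states.

```python
MIN_OCCURRENCES = 5  # rule-of-thumb threshold from the text

Table = dict[str, dict[str, int]]  # row label -> {child state -> count}


def low_cells(table: Table) -> list[tuple[str, str, int]]:
    """Return (row, child state, count) for every cell below the threshold."""
    return [
        (row, state, n)
        for row, cells in table.items()
        for state, n in cells.items()
        if n < MIN_OCCURRENCES
    ]


def low_mean_rows(table: Table) -> list[str]:
    """Return the rows whose mean number of Occurrences is below the threshold."""
    return [
        row
        for row, cells in table.items()
        if sum(cells.values()) / len(cells) < MIN_OCCURRENCES
    ]


# Illustrative counts only; these are NOT the values from the example network.
counts: Table = {
    "Age<=30": {"BMI<=20": 2, "BMI<=25": 9, "BMI>25": 1},   # mean 4.0
    "Age>30": {"BMI<=20": 0, "BMI<=25": 30, "BMI>25": 45},  # mean 25.0
}

print(low_cells(counts))
print(low_mean_rows(counts))
```

In this made-up table, three cells fall below the minimum, but only the first row has a mean below 5, so only that row would cause its node to be called out.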
The following example with one Parent Node (Age, measured in years) and one Child Node (BMI, i.e., Body Mass Index, measured in kg/m²) illustrates this with numbers:
The affected nodes in the Graph Panel are also marked with the information icon.
Whenever you learn a Bayesian network from a small dataset, you must consider whether the number of observations is sufficient for correctly estimating all Probability Tables and Conditional Probability Tables in the network.
For instance, using the Occurrences Report, you can evaluate whether all Conditional Probability Tables in your network meet the rule-of-thumb criterion of at least 5 observations per cell.
For a deeper analysis, BayesiaLab can produce the Confidence Intervals Report, which we discuss on this page.
To understand how Confidence Intervals can be computed, we first need to explain the estimation of probabilities in the Probability Tables and Conditional Probability Tables, the so-called parameters.
In BayesiaLab, these parameters are estimated using Maximum Likelihood, i.e., using the frequencies observed in the dataset:
$$\hat{P}(X = x_i) = \frac{N(X = x_i)}{\sum_j N(X = x_j)}$$
where:
$\hat{P}$ is the estimated probability,
$x_i$ is the $i$-th state of variable $X$,
$N(\cdot)$ represents the number of occurrences of the argument in the data set.
So, the Parameter Estimation is straightforward and happens entirely in the background in BayesiaLab.
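To make the frequency-based estimate concrete, here is a minimal sketch that turns one row of Occurrences into probabilities by dividing each count by the row total. The function name `mle_probabilities` and the counts are hypothetical and serve only as an illustration.

```python
def mle_probabilities(counts: dict[str, int]) -> dict[str, float]:
    """Maximum Likelihood estimate: each count divided by the row total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot estimate probabilities from zero observations")
    return {state: n / total for state, n in counts.items()}


# Illustrative counts for one row of a table (20 observations in total)
row = {"low": 2, "medium": 9, "high": 9}
probabilities = mle_probabilities(row)

for state, p in probabilities.items():
    print(f"{state}: {p:.2f}")
```

Note that the resulting probabilities no longer reveal the sample size: 2 out of 20 and 200 out of 2,000 both yield 0.10.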
As a result, we may not always be aware of what numbers gave rise to the probabilities we see in a Probability Table or Conditional Probability Table, as the following diagram illustrates:
So, BayesiaLab could have estimated a probability of 0.1 (or 10%) for $P(X = x_i)$ in numerous ways, e.g., based on a sample of 10 or 10,000: $\frac{1}{10} = \frac{1{,}000}{10{,}000} = 0.1$.
However, in terms of our confidence in the estimate, the two approaches are not the same. Our intuition tells us that we should have more confidence in the 0.1 value calculated based on the sample of 10,000.
From Frequentist Statistics, we know how to calculate a Confidence Interval for a proportion in a sample, which is exactly what the parameter represents.
BayesiaLab is using precisely the same approach for the Confidence Intervals Report.
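So, for a Confidence Level of 95%, the Confidence Interval is calculated as:
$$\hat{P}(X = x_i) \pm z \sqrt{\frac{\hat{P}(X = x_i)\,\bigl(1 - \hat{P}(X = x_i)\bigr)}{N}}$$
where $z \approx 1.96$ is the standard normal quantile for a 95% Confidence Level and $N = \sum_j N(X = x_j)$ is the number of observations on which the estimate is based.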
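The effect of the sample size on the interval can be checked numerically. The sketch below assumes the textbook normal-approximation (Wald) interval for a proportion, i.e., the estimate plus or minus z times sqrt(p(1 - p) / N); the function name `wald_interval` and the clamping of the bounds to [0, 1] are illustrative choices, not BayesiaLab's API.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(occurrences: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    p = occurrences / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    half_width = z * sqrt(p * (1 - p) / n)
    # Clamp to the valid probability range (an assumption of this sketch)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# The same estimate of 0.1, obtained from 10 and from 10,000 observations
for occurrences, n in [(1, 10), (1_000, 10_000)]:
    low, high = wald_interval(occurrences, n)
    print(f"{occurrences}/{n}: [{low:.4f}, {high:.4f}]")
```

With 10 observations, the interval spans roughly 0 to 0.29; with 10,000 observations, it narrows to roughly 0.094 to 0.106, which matches the intuition that the larger sample deserves more confidence.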
If zero observations were observed for a given state, e.g., $N(X = x_i) = 0$, the Rule of Three would have to be used instead to produce Confidence Intervals:
$$\left[0, \frac{3}{N}\right]$$
However, in BayesiaLab, you can avoid resorting to this heuristic by using Uniform Prior Samples.
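For reference, the Rule of Three mentioned above states that when none of N observations shows a given state, 3/N approximates the upper bound of the 95% Confidence Interval for its probability. The following sketch compares the heuristic with the exact bound it approximates, obtained by solving (1 - p)^N = 0.05 for p; both function names are hypothetical.

```python
def rule_of_three_upper(n: int) -> float:
    """Approximate 95% upper bound when 0 of n observations show the state."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 3 / n


def exact_upper(n: int, level: float = 0.95) -> float:
    """Exact upper bound: the p for which P(0 occurrences in n) = 1 - level."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1 - (1 - level) ** (1 / n)


for n in (30, 100, 1_000):
    print(f"N={n}: rule of three={rule_of_three_upper(n):.4f}, exact={exact_upper(n):.4f}")
```

The heuristic is slightly conservative, i.e., it always lies a little above the exact bound, and the gap shrinks as N grows.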
Within this network, focus on the three nodes BMI, Age, and Gender:
Go to Main Menu > Network > Reports > Confidence Intervals to start the Confidence Intervals Report.
The Confidence Intervals Report window opens up.
At the top of the report, the Confidence Level that serves as the basis for the reported Confidence Intervals is displayed.
Then, for each node, one table is shown.
For each cell containing a parameter estimate, an adjacent cell to the right displays the corresponding Confidence Interval in percentage points.
The color-coding scheme is identical to the one used in the Occurrences Report.
The fields in the report are color-coded to highlight potential issues:
Cells with 0 Occurrences are marked with a red background.
Cells with 5 Occurrences are highlighted with a yellow background. This is generally considered the minimum acceptable number of Occurrences.
Cells with 40 or more Occurrences are marked with a green background.
You can adjust the Confidence Level used for this report.
Go to Main Menu > Window > Preferences > Tools > Statistical Tools.
Select the desired value from the Confidence Level dropdown menu.
Note that your selection here also applies to all other statistical tools and tests used in BayesiaLab.
To illustrate the Confidence Intervals Report, we use the following network: NHANES_DEMO_BMX.xbl