Roberta de Souza, Hildete P. Pinheiro, Cibele Q. da Silva and Sérgio F. dos Reis, Analysis of Variance for Genomic Sequences in Unbalanced Designs
Abstract:
In studies of genetic divergence among organisms, the analysis is generally
performed directly on the DNA molecule. Each observation is therefore
categorical, taking one of four values at the nucleotide level (A, C, G or T).
Light & Margolin (1971) developed an analysis of variance for categorical data
(CATANOVA), and Pinheiro et al. (2000) employed a similar measure of variation
and extended the CATANOVA procedure to take several positions in the sequence
into account for balanced designs. Here we consider a variable number of
sequences in each group, that is, unbalanced samples. To test the null
hypothesis of homogeneity among groups, we derive the asymptotic distribution
of the test statistic and evaluate its power. An application of the test to
real data is illustrated, using resampling methods such as the bootstrap to
generate the empirical distribution of the test statistic.
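The abstract does not state the formulas, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes the Gini-type variation measure of Light & Margolin (1971) computed at each aligned site (total variation n/2 - Σ n_i²/(2n), and the same within each group), simply summed over sites, uses the hypothetical statistic BSS/TSS, and builds a bootstrap null by resampling the pooled sequences with replacement into groups of the original, unequal sizes. All function names are hypothetical; the published statistic, its weighting across sites and the resampling scheme may differ.

```python
import numpy as np


def to_array(seqs):
    # One row per sequence, one column per aligned site (equal lengths assumed).
    return np.array([list(s) for s in seqs])


def catanova_ss(groups):
    """Gini-type total, within and between sums of squares, summed over sites.

    Assumption: per-site quantities are simply added across sites.
    `groups` is a list of 2-D character arrays (sequences x sites); group
    sizes may differ (unbalanced design).
    """
    pooled = np.vstack(groups)
    n, n_sites = pooled.shape
    tss = wss = 0.0
    for site in range(n_sites):
        _, counts = np.unique(pooled[:, site], return_counts=True)
        tss += n / 2 - (counts ** 2).sum() / (2 * n)
        for g in groups:
            n_j = g.shape[0]
            _, c_j = np.unique(g[:, site], return_counts=True)
            wss += n_j / 2 - (c_j ** 2).sum() / (2 * n_j)
    return tss, wss, tss - wss


def ratio_stat(groups):
    # Hypothetical statistic BSS / TSS; defined as 0 when there is no variation.
    tss, _, bss = catanova_ss(groups)
    return bss / tss if tss > 0 else 0.0


def bootstrap_pvalue(groups, n_boot=999, seed=0):
    rng = np.random.default_rng(seed)
    sizes = [g.shape[0] for g in groups]
    pooled = np.vstack(groups)
    observed = ratio_stat(groups)
    exceed = 0
    for _ in range(n_boot):
        # Under H0 all groups share one distribution: resample the pooled
        # sequences with replacement into groups of the original sizes.
        idx = rng.integers(0, pooled.shape[0], size=sum(sizes))
        boot_groups = np.split(pooled[idx], np.cumsum(sizes)[:-1])
        if ratio_stat(boot_groups) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    # Toy, made-up sequences: two groups of unequal size (3 and 5).
    a = to_array(["AACG", "AACG", "AACT"])
    b = to_array(["CCGT", "CCGA", "CCGT", "CCTT", "CCGT"])
    obs, p = bootstrap_pvalue([a, b], n_boot=499)
    print(f"BSS/TSS = {obs:.3f}, bootstrap p = {p:.3f}")
```

Resampling with replacement from the pooled sample mirrors the bootstrap mentioned in the abstract; drawing groups of the observed sizes keeps the unbalanced design under the null hypothesis.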
KEYWORDS: Analysis of variance; Bootstrap; Categorical
data; Asymptotic distribution; Molecular data; Statistical genetics; Unbalanced
designs.
Copy of the file: rp20-04.ps (postscript), rp20-04.ps.gz (gzipped postscript)
April 19, 2004