Research Report 20/2004


Roberta de Souza, Hildete P. Pinheiro, Cibele Q. da Silva and Sérgio F. dos Reis, Analysis of Variance for Genomic Sequences in Unbalanced Designs



In studies of genetic divergence among organisms, the analysis is generally carried out directly on DNA sequences. At the nucleotide level, the outcome at each position is therefore categorical, taking one of four categories. Light & Margolin (1971) developed an analysis of variance for categorical data (CATANOVA), and Pinheiro et al. (2000) employed a similar measure of variation to extend the CATANOVA procedure to several positions in the sequence for balanced designs. Here we consider a variable number of sequences in each group, that is, unbalanced samples. To test the null hypothesis of homogeneity among groups, we derive the asymptotic distribution of the test statistic and evaluate its power. An application of the test to real data is illustrated, using resampling methods such as the bootstrap to generate the empirical distribution of the test statistic.
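
As a rough illustration of the kind of computation involved, the sketch below implements the single-position variation measure of Light & Margolin (1971), n/2 - (1/(2n)) * (sum of squared category counts), and its decomposition into within-group and between-group parts for groups of unequal size. It is not the multi-position statistic of this report. The function names, the toy data, and the pooled-resampling bootstrap scheme are assumptions made for illustration only.

```python
import random
from collections import Counter


def variation(labels):
    """Light & Margolin variation of a non-empty list of category labels."""
    n = len(labels)
    counts = Counter(labels)
    return n / 2 - sum(c * c for c in counts.values()) / (2 * n)


def catanova(groups):
    """Return (TSS, WSS, BSS) for a list of groups; group sizes may differ."""
    pooled = [x for g in groups for x in g]
    tss = variation(pooled)                  # total variation
    wss = sum(variation(g) for g in groups)  # within-group variation
    return tss, wss, tss - wss               # between-group = total - within


def bootstrap_pvalue(groups, n_boot=2000, seed=0):
    """Bootstrap p-value for BSS under homogeneity (hypothetical scheme).

    Assumption: under the null hypothesis all groups share one distribution,
    so each replicate draws every group, at its original size, with
    replacement from the pooled sample.
    """
    rng = random.Random(seed)
    pooled = [x for g in groups for x in g]
    observed = catanova(groups)[2]
    hits = 0
    for _ in range(n_boot):
        resampled = [[rng.choice(pooled) for _ in g] for g in groups]
        # small tolerance guards against floating-point ties
        if catanova(resampled)[2] >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_boot + 1)


# Toy data: nucleotides observed at one position in two groups of unequal size.
groups = [list("AAAA"), list("GG")]
tss, wss, bss = catanova(groups)
p_value = bootstrap_pvalue(groups, n_boot=500, seed=1)
```

In this toy example each group is internally homogeneous, so the within-group variation is zero and all of the total variation is attributed to differences between groups.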


KEYWORDS: Analysis of variance; Bootstrap; Categorical data; Asymptotic distribution; Molecular data; Statistical genetics; Unbalanced designs. 


April 19, 2004


