Research Reports

16/2013 Models Applied to DNA Sequences with Multinomial Correlated Responses
Beatriz Cuyabano, Hildete P. Pinheiro, Aluísio Pinheiro

Multinomial multivariate models are proposed to describe the codon frequencies in DNA sequences, as well as the order and frequency of the nucleotide bases in each codon, taking into account the dependence among the bases inside a codon. Logistic regressive models are used with different dependence structures on the three codon positions. Multinomial extensions of Bahadur’s representation are also proposed to model correlated multinomial data. An application of these models to the NADH4 gene from the human mitochondrial genome is presented. AIC, BIC and leave-one-out cross-validation are employed to compare the performance of the various models.

rp-2013-16.pdf
15/2013 Likelihood Based Inference for Quantile Regression Using the Asymmetric Laplace Distribution
Luis Benites Sánchez, Víctor H. Lachos, Filidor E. Vilca-Labra

To make inferences about the shape of a population distribution, the widely popular mean regression model, for example, is inadequate if the distribution is not approximately Gaussian (or symmetric). Compared to conventional mean regression (MR), quantile regression (QR) can characterize the entire conditional distribution of the outcome variable, and is more robust to outliers and misspecification of the error distribution. We present a likelihood-based approach to the estimation of the regression quantiles based on the asymmetric Laplace distribution (ALD), a choice that turns out to be natural in this context. The ALD has a convenient hierarchical representation which facilitates the implementation of the EM algorithm for maximum-likelihood estimation of the parameters at the pth level, with the observed information matrix as a byproduct. Inspired by the EM algorithm, we develop case-deletion diagnostic analysis for QR models, following the approach of Zhu et al. (2001). This is because the observed data log-likelihood function associated with the proposed model is somewhat complex (e.g., not differentiable at zero), so case-deletion measures can be very difficult to obtain using Cook’s well-known approach. The techniques are illustrated with both simulated and real data. In particular, in an empirical comparison, our approach outperformed other common classic estimators under a wide array of simulated data models and is flexible enough to easily accommodate changes in their assumed distribution. The proposed algorithm and methods are implemented in the R package ALDqr.
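
The reason the ALD is a natural choice can be sketched with its usual three-parameter density. The notation below (location μ, scale σ, skewness p, check function ρ_p, regression coefficients β_p) is an assumption of this note and is not taken from the report itself.

```latex
% Asymmetric Laplace density in its usual parameterization (assumed notation)
f(y \mid \mu, \sigma, p) = \frac{p(1-p)}{\sigma}
  \exp\left\{ -\rho_p\!\left( \frac{y-\mu}{\sigma} \right) \right\},
\qquad
\rho_p(u) = u \left( p - \mathbb{1}\{u < 0\} \right), \quad 0 < p < 1 .

% Log-likelihood for independent y_i with location \mu_i = x_i^\top \beta_p,
% using \rho_p(u/\sigma) = \rho_p(u)/\sigma for \sigma > 0
\ell(\beta_p, \sigma) = n \log \frac{p(1-p)}{\sigma}
  - \frac{1}{\sigma} \sum_{i=1}^{n} \rho_p\!\left( y_i - x_i^\top \beta_p \right).
```

For fixed σ, maximizing this log-likelihood over β_p is therefore the same as minimizing the check-loss sum that defines the pth regression quantile.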

rp-2013-15.pdf
14/2013 Bayesian Analysis of Censored Linear Regression Models with Scale Mixtures of Normal Distributions
Aldo M. Garay, Heleno Bolfarine, Víctor H. Lachos, Celso R. B. Cabral

As is the case in many studies, the data collected are limited and an exact value is recorded only if it falls within an interval range. Hence, the responses can be left, interval or right censored. Linear (and nonlinear) regression models are routinely used to analyze these types of data and are based on normality assumptions for the error terms. However, those analyses might not provide robust inference when the normality assumptions are questionable. In this article, we develop a Bayesian framework for censored linear regression models by replacing the Gaussian assumptions for the random errors with scale mixtures of normal (SMN) distributions. The SMN is an attractive class of symmetric heavy-tailed densities that includes the normal, Student-t, Pearson type VII, slash and contaminated normal distributions as special cases. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced to carry out posterior inference. A new hierarchical prior distribution is suggested for the degrees of freedom parameter in the Student-t distribution. The likelihood function is utilized not only to compute some Bayesian model selection measures but also to develop Bayesian case-deletion influence diagnostics based on the q-divergence measures. The proposed Bayesian methods are implemented in the R package BayesCR. The newly developed procedures are illustrated with an application and simulated data.
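
As a reminder of what the SMN class looks like, here is a sketch of its standard stochastic representation. The symbols (μ, σ², U, Z, ν, γ, δ) and the choice of mixing distributions follow the usual textbook parameterization and are assumptions of this note, not quoted from the report.

```latex
% Scale mixture of normals: stochastic representation (assumed notation)
Y = \mu + U^{-1/2} Z, \qquad Z \sim N(0, \sigma^2), \qquad U > 0, \quad U \perp Z .

% Mixing distributions for the special cases named in the abstract
\begin{array}{ll}
\text{normal:}              & U = 1 \ \text{with probability } 1 \\
\text{Student-}t:           & U \sim \mathrm{Gamma}(\nu/2,\ \nu/2) \\
\text{Pearson type VII:}    & U \sim \mathrm{Gamma}(\nu/2,\ \delta/2) \\
\text{slash:}               & U \sim \mathrm{Beta}(\nu,\ 1) \\
\text{contaminated normal:} & U = \gamma \ \text{w.p. } \nu, \quad U = 1 \ \text{w.p. } 1-\nu,
                              \quad 0 < \gamma, \nu < 1
\end{array}
```

Conditionally on U = u, the response is normal with variance σ²/u, which is what makes Gibbs-type and EM-type algorithms convenient for this class.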

rp-2013-14.pdf
13/2013 Linear Censored Regression Models with Scale Mixtures of Normal Distributions
Aldo M. Garay, Víctor H. Lachos, Heleno Bolfarine, Celso R. B. Cabral

In the framework of censored regression models the random errors are routinely assumed to have a normal distribution, mainly for mathematical convenience. However, this method has been criticized in the literature because of its sensitivity to deviations from the normality assumption. In practice, data such as income or viral load in AIDS studies often violate this assumption because of heavy tails. Here, we first establish a new link between the censored regression model and a recently studied class of symmetric distributions, which extend the normal one by the inclusion of kurtosis, called scale mixtures of normal (SMN) distributions. The Student-t, Pearson type VII, slash and contaminated normal distributions, among others, are contained in this class. Choosing a member of this class can be a good alternative for modeling this kind of data, because its flexibility has been demonstrated in several applications. In this work, we develop an analytically simple and efficient EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters, with standard errors as a by-product. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of certain truncated SMN distributions. The proposed algorithm is implemented in the R package SMNCensReg. Applications with simulated and real data are reported, illustrating the usefulness of the new methodology.

rp-2013-13.pdf
12/2013 On Sliding Periodic Solutions for Piecewise Continuous Systems Defined on the 2-Cylinder
Douglas D. Novaes, Mike R. Jeffrey, Marco A. Teixeira

This paper deals with discontinuous differential equations defined on the 2-dimensional cylinder. The main goal is to exhibit conditions for the existence of typical periodic solutions of such systems. An averaging method for computing sliding periodic solutions is developed, subject to convenient assumptions. We also apply the method to example problems. The main tools used are structural stability theory for discontinuous differential systems and Brouwer degree theory.

rp-2013-12.pdf
11/2013 Birth of limit cycles bifurcating from a nonsmooth center
Cláudio A. Buzzi, Tiago de Carvalho, Marco A. Teixeira

This paper is concerned with a codimension analysis of a two-fold singularity of piecewise smooth planar vector fields, when it behaves like a center of smooth vector fields (also called a nondegenerate Σ-center). We prove that any nondegenerate Σ-center is Σ-equivalent to a particular normal form Z_0. Given a positive integer k, we explicitly construct families of piecewise smooth vector fields emerging from Z_0 that have k hyperbolic limit cycles bifurcating from the nondegenerate Σ-center of Z_0 (the same holds for k = ∞). Moreover, we also exhibit families of piecewise smooth vector fields of codimension k emerging from Z_0. As a consequence, we prove that Z_0 has infinite codimension.

rp-2013-11.pdf
10/2013 Ruelle Operator Duality for Coupled Smooth Markov Maps of the Circle
Vincent Pit

Let T_L and T_R be two smooth surjective Markov maps of the circle, with T_R expansive, coupled in such a way that there exists an extension (C, T_C) whose first factor is T_L^{-1} and whose second factor is T_R. Let A_L (piecewise continuous) and A_R (piecewise absolutely continuous) be two respective potentials. We show that, when those potentials are in involution by a smooth kernel W on C, there is an explicit isomorphism between eigenfunctions of the Ruelle operator of (T_L, A_L) and eigendistributions of the Ruelle operator of (T_R, A_R) for the same eigenvalue. This gives a regularity result for eigendistributions of transfer operators associated with non-maximal eigenvalues.

rp-2013-10.pdf
9/2013 Academic performance of students from entrance to graduation via quasi U-statistics: a study at a Brazilian research university
Rafael Pimentel Maia, Hildete P. Pinheiro, Aluísio Pinheiro

We present a novel methodology to assess undergraduate students’ performance. The proposed methods are based on measures of diversity and on the decomposability of quasi U-statistics to define average distances between and within groups. They have been employed as an alternative to the classic analysis of variance, especially when the assumption of normality is not met. The quasi U-statistics nonparametric method can handle tests for interaction and uses the jackknife to obtain p-values for the tests. The nonparametric method also results in smaller error variances, illustrating its robustness against model misspecification.

rp-2013-9.pdf
8/2013 A Nonsmooth Two-Sex Population Model
Eduardo Garibaldi, Marcelo Sobottka

This paper considers a two-dimensional logistic model to study populations with two genders. The growth behavior of a population is guided by two coupled ordinary differential equations given by a non-differentiable vector field whose parameters are the secondary sex ratio (the ratio of males to females at time of birth), inter- and intra-gender competitions, fertility rates and a mating function. Using geometrical techniques, we analyze the singularities and the basin of attraction of the system, determining the relationships between the parameters for which the system presents an equilibrium point. In particular, we describe conditions on the secondary sex ratio and discuss the role of the median number of female sexual partners of each male for the conservation of a two-sex species.

rp-2013-8.pdf
7/2013 Application of prediction models using fuzzy sets: a Bayesian-inspired approach
Felipo Bacani, Laécio C. Barros

A fuzzy inference framework based on fuzzy relations is explored and applied to a real set of simulated forecasts and experimental data on temperature and humidity at specific coffee crop sites in Brazil. In short, the model used consists of fuzzy relations over possibility distributions, resulting in a fuzzy model analogous to a Bayesian inference process. The application of the fuzzy model to temperature and humidity data resulted in a set of revised forecasts, which were then compared to the corresponding set of experimental data using two different statistical measures of accuracy: MAPE (mean absolute percentage error) and Willmott D. These results were compared with the fit of the original simulated forecasts to the experimental data, showing that the methodology was, in most cases, able to improve the specialist’s forecasts under both measures.
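
The two accuracy measures named above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names are hypothetical, and the Willmott measure is assumed to be the original (1981) index of agreement, which the abstract does not spell out.

```python
from typing import Sequence


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Undefined when any observed value is zero (raises ZeroDivisionError),
    which matters for temperatures recorded in degrees Celsius.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    return 100.0 / n * sum(abs((o - p) / o) for o, p in zip(observed, predicted))


def willmott_d(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Willmott's index of agreement (assumed 1981 form), between 0 and 1.

    d = 1 - sum((P_i - O_i)^2) / sum((|P_i - mean(O)| + |O_i - mean(O)|)^2)
    A value of 1 indicates perfect agreement.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    numerator = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    if denominator == 0:
        raise ValueError("index undefined: all values equal the observed mean")
    return 1.0 - numerator / denominator
```

Applying both functions to the original forecasts and to the revised forecasts, against the same experimental data, gives the kind of before-and-after comparison the abstract describes: a lower MAPE and a higher d indicate an improved forecast.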

rp-2013-7.pdf