Research Reports

2/2016 Multivariate Measurement Error Models Based on Student-t Distribution under Censored Responses
Larissa A. Matos, Luis M. Castro, Celso R. B. Cabral, Víctor H. Lachos

Measurement error models constitute a wide class of models that includes linear and nonlinear regression models. They are very useful for modeling many real-life phenomena, particularly in the medical and biological areas. The great advantage of these models is that, in some sense, they can be represented as mixed effects models, allowing us to implement well-known techniques, such as the EM algorithm, for parameter estimation. In this paper, we consider a class of multivariate measurement error models where the observed response and/or covariate are not fully observed, i.e., the observations are subject to certain threshold values below or above which the measurements are not quantifiable. Consequently, these observations are considered censored. We assume a Student-t distribution for the unobserved true values of the mismeasured covariate and the error term of the model, providing a robust alternative for parameter estimation. Our approach relies on likelihood-based inference using the EM algorithm. The proposed method is illustrated through simulation studies and the analysis of a real dataset.

rp-2016-2.pdf
1/2016 Heavy-tailed longitudinal regression models for censored data: A likelihood based perspective
Larissa A. Matos, Víctor H. Lachos, Tsung-I Lin, Luis M. Castro

HIV RNA viral load measures are often subject to upper and lower detection limits depending on the quantification assays. Hence, the responses are either left or right censored. Moreover, it is quite common to observe viral load measurements collected irregularly over time. A complication arises when these continuous repeated measures have heavy-tailed behaviour. For such data structures, we propose a robust censored linear model based on the scale mixtures of normal distributions (SMN family). To take into account the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is considered. A stochastic approximation of the EM algorithm (SAEM algorithm) is developed to obtain the maximum likelihood estimates of the model parameters. The main advantage of this new procedure is that it allows us to estimate the parameters of interest and evaluate the log-likelihood function in an easy and fast way. Furthermore, the standard errors of the fixed effects and predictions of unobservable values of the response can be obtained as a by-product. The practical utility of the proposed methodology is exemplified using both simulated and real data.
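
As a rough illustration of the damped exponential correlation structure mentioned in this abstract, the sketch below builds the correlation matrix for irregularly spaced observation times, assuming the commonly used form corr(i, j) = phi1 ** (|t_i - t_j| ** phi2) with 0 <= phi1 < 1 and phi2 >= 0. The function name `dec_correlation` and the parameter names `phi1` and `phi2` are illustrative assumptions, not taken from the report.

```python
def dec_correlation(times, phi1, phi2):
    """Damped exponential correlation matrix (illustrative sketch).

    Assumed form: corr(i, j) = phi1 ** (|t_i - t_j| ** phi2) for i != j,
    and 1 on the diagonal. The names phi1 and phi2 are hypothetical.
    """
    if not 0.0 <= phi1 < 1.0:
        raise ValueError("phi1 must lie in [0, 1)")
    if phi2 < 0.0:
        raise ValueError("phi2 must be non-negative")
    n = len(times)
    return [
        [
            # Diagonal entries are set explicitly so that phi2 = 0 still gives 1.
            1.0 if i == j else phi1 ** (abs(times[i] - times[j]) ** phi2)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three visits at irregular times; phi2 = 1 gives a continuous-time AR(1) pattern.
R = dec_correlation([0.0, 1.0, 3.0], phi1=0.5, phi2=1.0)
```

Under this assumed form, phi2 = 0 makes every off-diagonal entry equal to phi1 (compound symmetry), while phi2 = 1 makes the correlation decay geometrically with the time lag.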

rp-2016-1.pdf
8/2015 Zero-temperature Phase Diagram for Double-Well Type Potentials in the Summable Variation Class
Rodrigo Bissacot, Eduardo Garibaldi, Philippe Thieullen

We study the zero-temperature limit of the Gibbs measures of a class of long-range potentials on a full shift of two symbols {0, 1}. These potentials were introduced by Walters as a natural space for the transfer operator. In our case, they are locally constant, Lipschitz continuous or, more generally, of summable variation. We assume there exist exactly two ground states: the fixed points 0^∞ and 1^∞. We fully characterize, in terms of the Peierls barrier between the two ground states, the zero-temperature phase diagram of such potentials, that is, the regions of convergence or divergence of the Gibbs measures as the temperature goes to zero.

rp-2015-8.pdf
7/2015 Likelihood Based Inference for Censored Linear Regression Models with Scale Mixtures of Skew-Normal Distributions
Thalita do Bem Mattos, Aldo M. Garay, Víctor H. Lachos

In many studies the data collected are subject to some upper and lower detection limits. Hence, the responses are either left or right censored. A complication arises when these continuous measures present heavy tails and asymmetrical behavior simultaneously. For such data structures, we propose a robust censored linear model based on the scale mixtures of skew-normal (SMSN) distributions. The SMSN is an attractive class of asymmetrical heavy-tailed densities that includes the skew-normal, skew-t, skew-slash, skew-contaminated normal and the entire family of scale mixtures of normal (SMN) distributions as special cases. We propose a fast estimation procedure to obtain the maximum likelihood (ML) estimates of the parameters, using a stochastic approximation of the EM (SAEM) algorithm. This approach allows us to estimate the parameters of interest easily and quickly, obtaining as a by-product the standard errors, predictions of unobservable values of the response and the log-likelihood function. The proposed methods are illustrated through a real data application and several simulation studies.

rp-2015-7.pdf
6/2015 Influence Diagnostics in Spatial Models with Censored Response
Thais S. Barbosa, Víctor H. Lachos, Dipak K. Dey

Environmental data are often spatially correlated and sometimes include observations below the detection limit (i.e., censored values reported as less than a level of detection). Existing work mainly concentrates on parameter estimation using Gibbs sampling, and work conducted from a frequentist perspective on spatial censored models is elusive. In this paper, we propose an exact estimation procedure to obtain the maximum likelihood estimates of the fixed effects and variance components, using a stochastic approximation of the EM (SAEM) algorithm (Delyon et al., 1999). This approach permits estimation of the parameters of spatial linear models when censoring is present in an easy and fast way. As a by-product, predictions of unobservable values of the response variable are possible. Motivated by this algorithm, we develop local influence measures on the basis of the conditional expectation of the complete-data log-likelihood function, which eliminates the complexity associated with the approach of Cook (1977, 1986) for spatial censored models. Some useful perturbation schemes are discussed. The newly developed methodology is illustrated using data from a dioxin-contaminated site in Missouri. In addition, a simulation study is presented, which explores the accuracy of the proposed measures in detecting influential observations under different perturbation schemes.

rp-2015-6.pdf
5/2015 Robust Regression Modeling for Censored Data Based on Mixtures of Student-t Distributions
Víctor H. Lachos, Luis Benites Sánchez, Celso R. B. Cabral

In the framework of censored regression models, the distribution of the error terms departs significantly from normality, for instance, in the presence of heavy tails, skewness and/or atypical observations. In this paper we extend the censored linear regression model with normal errors to the case where the random errors follow a finite mixture of Student-t distributions. This approach allows us to model data with great flexibility, accommodating multimodality, heavy tails and also skewness depending on the structure of the mixture components. We develop an analytically simple and efficient EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters, with standard errors as a by-product. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of the truncated Student-t distributions. The efficacy of the method is verified through the analysis of simulated datasets and by modeling a censored real dataset first analyzed under normal and Student-t errors. The proposed algorithm and methods are implemented in the R package CensMixReg.

rp-2015-5.pdf
4/2015 Calibrated configurations for Frenkel-Kontorova type models in almost-periodic environments
Eduardo Garibaldi, Samuel Petite, Philippe Thieullen

The Frenkel-Kontorova model describes how an infinite chain of atoms minimizes the total energy of the system when the energy takes into account the interaction of nearest neighbors as well as the interaction with an exterior environment. An almost-periodic environment leads us to consider a family of interaction energies which is stationary with respect to a minimal topological dynamical system. We introduce, in this context, the notion of calibrated configuration (stronger than the standard minimizing condition) and, for continuous superlinear interaction energies, we show the existence of these configurations for some environment of the dynamical system. Furthermore, in one dimension, we give sufficient conditions on the family of interaction energies to ensure, for any environment, the existence of calibrated configurations when the underlying dynamics is uniquely ergodic. The main mathematical tools for this study are developed in the frameworks of discrete weak KAM theory, Aubry-Mather theory and spaces of Delone sets.

rp-2015-4.pdf
3/2015 Bayesian Analysis of Censored Linear Regression Models with Scale Mixtures of Skew-Normal Distributions
Monique B. Massuia, Aldo M. Garay, Víctor H. Lachos, Celso R. B. Cabral

As is the case in many studies, the data collected are limited and an exact value is recorded only if it falls within an interval range. Hence, the responses can be either left, interval or right censored. Linear (and nonlinear) regression models are routinely used to analyze these types of data and are based on the normality assumption for the error terms. However, those analyses might not provide robust inference when the normality assumption (or symmetry) is questionable. In this article, we develop a Bayesian framework for censored linear regression models by replacing the Gaussian assumption for the random errors with the asymmetric class of scale mixtures of skew-normal (SMSN) distributions. The SMSN is an attractive class of asymmetrical heavy-tailed densities that includes the skew-normal, skew-t, skew-slash, the skew-contaminated normal and the entire family of scale mixtures of normal distributions as special cases. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced to carry out posterior inference. The likelihood function is utilized not only to compute some Bayesian model selection measures but also to develop Bayesian case-deletion influence diagnostics based on the q-divergence measures. The proposed Bayesian methods are implemented in the R package BayesCR, proposed by the authors. The newly developed procedures are illustrated with applications using real and simulated data.

rp-2015-3.pdf
2/2015 Quantile Regression for Linear Mixed Models: A Stochastic Approximation EM approach
Christian E. Galarza, Dipankar Bandyopadhyay, Víctor H. Lachos

This paper develops a likelihood-based approach to analyze quantile regression (QR) models for continuous longitudinal data via the asymmetric Laplace distribution (ALD). Compared to the conventional mean regression approach, QR can characterize the entire conditional distribution of the outcome variable and is more robust to the presence of outliers and misspecification of the error distribution. Exploiting the nice hierarchical representation of the ALD, our classical approach follows a Stochastic Approximation of the EM (SAEM) algorithm in deriving exact maximum likelihood estimates of the fixed effects and variance components. We evaluate the finite sample performance of the algorithm and the asymptotic properties of the ML estimates through empirical experiments and applications to two real-life datasets. Our empirical results clearly indicate that the SAEM estimates outperform the estimates obtained via the combination of Gaussian quadrature and non-smooth optimization routines of Geraci's (2014) approach in terms of standard errors and mean square error. The proposed SAEM algorithm is implemented in the R package qrLMM.
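
To make the link between quantile regression and the asymmetric Laplace distribution concrete, the following minimal sketch evaluates the standard check (pinball) loss and the ALD log-density in its usual parameterization, log f(y) = log(p(1 - p)/sigma) - rho_p((y - mu)/sigma). It is not the SAEM algorithm of the report; the function names `check_loss` and `ald_logpdf` are illustrative assumptions.

```python
import math


def check_loss(u, p):
    """Quantile check (pinball) loss: rho_p(u) = u * (p - 1{u < 0})."""
    return u * (p - (1.0 if u < 0 else 0.0))


def ald_logpdf(y, mu, sigma, p):
    """Log-density of the asymmetric Laplace distribution (illustrative sketch).

    mu is the location (the p-th quantile), sigma > 0 the scale and
    0 < p < 1 the skewness parameter, i.e. the quantile level.
    """
    if sigma <= 0.0 or not 0.0 < p < 1.0:
        raise ValueError("need sigma > 0 and 0 < p < 1")
    return math.log(p * (1.0 - p) / sigma) - check_loss((y - mu) / sigma, p)


# Log-density of an observation 2 units above the assumed first-quartile location.
value = ald_logpdf(2.0, mu=0.0, sigma=1.0, p=0.25)
```

Because the log-density is a constant minus the check loss, maximizing the ALD likelihood in mu for fixed sigma is equivalent to minimizing the summed check loss, which is why the ALD serves as a working likelihood for quantile regression.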

rp-2015-2.pdf
1/2015 Modelling Performance of Students with Generalized Linear Mixed Models
Hildete P. Pinheiro, Mariana R. Motta, Gabriel Franco

We propose generalized linear mixed models (GLMM) to evaluate the performance of undergraduate students from the State University of Campinas (Unicamp). For each student we have the final GPA score as well as the number of courses he/she failed during his/her Bachelor's degree. The courses are separated into three categories: Required (R), Elective (E) and Extracurricular (Ex). Therefore, for each response variable, each student may have at most three measures. In this model we need to take into account the within-student correlation between required, elective and extracurricular courses. The main focus of this study is the sector of high school education from which college students come: Private or Public. As some affirmative action programs are being implemented by the Brazilian government to include more students from public schools in the universities, there is great interest in studies of the performance of undergraduate students according to the high school sector from which they come. The data set comes from the State University of Campinas (Unicamp), a public institution in the State of São Paulo, Brazil, and one of the top universities in Brazil. The socioeconomic status and academic data of more than 10,000 students admitted to Unicamp from 2000 through 2005 form the study database.

rp-2015-1.pdf