Research Reports

5/2015 Robust Regression Modeling for Censored Data Based on Mixtures of Student-t Distributions
Víctor H. Lachos, Luis Benites Sánchez, Celso R. B. Cabral

In the framework of censored regression models, the distribution of the error terms can depart significantly from normality, for instance in the presence of heavy tails, skewness and/or atypical observations. In this paper we extend the censored linear regression model with normal errors to the case where the random errors follow a finite mixture of Student-t distributions. This approach allows us to model data with great flexibility, accommodating multimodality, heavy tails and also skewness, depending on the structure of the mixture components. We develop an analytically simple and efficient EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters, with standard errors as a by-product. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of truncated Student-t distributions. The efficacy of the method is verified through the analysis of simulated datasets and by modeling a censored real dataset first analyzed under normal and Student-t errors. The proposed algorithm and methods are implemented in the R package CensMixReg.
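
As a point of reference for the abstract above, the following minimal sketch evaluates the log-likelihood of the baseline censored linear regression model with normal errors, which the report generalizes. It is an illustration only and not the authors' algorithm: it assumes left-censoring at a known limit, the names `censored_loglik`, `beta` and `sigma` are hypothetical, and neither the Student-t mixture nor the EM steps of CensMixReg are implemented.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_loglik(y, x, censored, beta, sigma):
    """Log-likelihood of a left-censored normal linear regression.

    y[i] holds the observed response, or the censoring limit when
    censored[i] is True (assumption: left-censoring only).
    x[i] is the covariate vector of observation i.
    """
    total = 0.0
    for yi, xi, ci in zip(y, x, censored):
        mu = sum(b * v for b, v in zip(beta, xi))  # linear predictor
        z = (yi - mu) / sigma
        if ci:
            # Censored: only P(response <= limit) is known.
            total += math.log(norm_cdf(z))
        else:
            # Fully observed: usual normal density contribution.
            total += math.log(norm_pdf(z)) - math.log(sigma)
    return total
```

An uncensored observation contributes the normal density at its residual, while a censored one contributes the probability of falling at or below the limit. Replacing the normal density and distribution function with those of a mixture would give the kind of likelihood the report works with.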

rp-2015-5.pdf
4/2015 Calibrated configurations for Frenkel-Kontorova type models in almost-periodic environments
Eduardo Garibaldi, Samuel Petite, Philippe Thieullen

The Frenkel-Kontorova model describes how an infinite chain of atoms minimizes the total energy of the system when the energy takes into account the interaction of nearest neighbors as well as the interaction with an exterior environment. An almost-periodic environment leads one to consider a family of interaction energies which is stationary with respect to a minimal topological dynamical system. We introduce, in this context, the notion of calibrated configuration (stronger than the standard minimizing condition) and, for continuous superlinear interaction energies, we show the existence of these configurations for some environment of the dynamical system. Furthermore, in one dimension, we give sufficient conditions on the family of interaction energies to ensure, for any environment, the existence of calibrated configurations when the underlying dynamics is uniquely ergodic. The main mathematical tools for this study are developed in the frameworks of discrete weak KAM theory, Aubry-Mather theory and spaces of Delone sets.

rp-2015-4.pdf
3/2015 Bayesian Analysis of Censored Linear Regression Models with Scale Mixtures of Skew-Normal Distributions
Monique B. Massuia, Aldo M. Garay, Víctor H. Lachos, Celso R. B. Cabral

As is the case in many studies, the data collected are limited and an exact value is recorded only if it falls within an interval range. Hence, the responses can be either left, interval or right censored. Linear (and nonlinear) regression models are routinely used to analyze these types of data and are based on the normality assumption for the error terms. However, those analyses might not provide robust inference when the normality assumption (or symmetry) is questionable. In this article, we develop a Bayesian framework for censored linear regression models by replacing the Gaussian assumption for the random errors with the asymmetric class of scale mixtures of skew-normal (SMSN) distributions. The SMSN is an attractive class of asymmetrical heavy-tailed densities that includes the skew-normal, skew-t, skew-slash, the skew-contaminated normal and the entire family of scale mixtures of normal distributions as special cases. Using a Bayesian paradigm, an efficient Markov chain Monte Carlo (MCMC) algorithm is introduced to carry out posterior inference. The likelihood function is utilized to compute not only some Bayesian model selection measures but also to develop Bayesian case-deletion influence diagnostics based on the q-divergence measures. The proposed Bayesian methods are implemented in the R package BayesCR, proposed by the authors. The newly developed procedures are illustrated with applications using real and simulated data.

rp-2015-3.pdf
2/2015 Quantile Regression for Linear Mixed Models: A Stochastic Approximation EM approach
Christian E. Galarza, Dipankar Bandyopadhyay, Víctor H. Lachos

This paper develops a likelihood-based approach to analyze quantile regression (QR) models for continuous longitudinal data via the asymmetric Laplace distribution (ALD). Compared to the conventional mean regression approach, QR can characterize the entire conditional distribution of the outcome variable and is more robust to the presence of outliers and misspecification of the error distribution. Exploiting the convenient hierarchical representation of the ALD, our classical approach follows a Stochastic Approximation of the EM (SAEM) algorithm in deriving exact maximum likelihood estimates of the fixed effects and variance components. We evaluate the finite sample performance of the algorithm and the asymptotic properties of the ML estimates through empirical experiments and applications to two real-life datasets. Our empirical results clearly indicate that the SAEM estimates outperform the estimates obtained via the combination of Gaussian quadrature and non-smooth optimization routines of Geraci's (2014) approach in terms of standard errors and mean square error. The proposed SAEM algorithm is implemented in the R package qrLMM.
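
The link between the ALD and quantile regression mentioned above can be illustrated with a small sketch. Under the common parameterization with density \(\tau(1-\tau)/\sigma \cdot \exp\{-\rho_\tau((y-\mu)/\sigma)\}\), where \(\rho_\tau(u) = u(\tau - 1[u<0])\) is the check loss, maximizing the likelihood in the location \(\mu\) is the same as minimizing the total check loss, whose minimizer is a sample quantile. The code below is an assumption-laden illustration for i.i.d. data without covariates or random effects; the function names are hypothetical, and it is unrelated to the SAEM implementation in qrLMM.

```python
import math


def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_neg_loglik(y, mu, sigma, tau):
    """Negative log-likelihood of i.i.d. ALD(mu, sigma, tau) observations."""
    n = len(y)
    const = -n * math.log(tau * (1.0 - tau) / sigma)
    # rho_tau is positively homogeneous, so rho((y - mu) / sigma) = rho(y - mu) / sigma.
    return const + sum(check_loss(yi - mu, tau) for yi in y) / sigma


def best_location(y, tau, sigma=1.0):
    """Search the observed values for the location with the smallest
    ALD negative log-likelihood (a minimizer always exists among them)."""
    return min(sorted(y), key=lambda m: ald_neg_loglik(y, m, sigma, tau))


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6, 5]
    print(best_location(data, 0.5))   # location estimate for the median
    print(best_location(data, 0.25))  # location estimate for the lower quartile
```

Restricting the search to the observed values keeps the sketch short; a regression version would replace the scalar location with a linear predictor and minimize over its coefficients.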

rp-2015-2.pdf
1/2015 Modelling Performance of Students with Generalized Linear Mixed Models
Hildete P. Pinheiro, Mariana R. Motta, Gabriel Franco

We propose generalized linear mixed models (GLMM) to evaluate the performance of undergraduate students from the State University of Campinas (Unicamp). For each student we have the final GPA score as well as the number of courses he/she failed during his/her Bachelor's degree. The courses are separated into three categories: Required (R), Elective (E) and Extracurricular courses (Ex). Therefore, for each response variable, each student may have at most three measures. In this model we need to take into account the within-student correlation between required, elective and extracurricular courses. The main focus of this study is the sector of High School education from which college students come: Private or Public. As some affirmative action programs are being implemented by the Brazilian government to include more students from Public Schools in the Universities, there is great interest in studies of the performance of undergraduate students according to the sector of High School from which they come. The data set comes from the State University of Campinas (Unicamp), a public institution in the State of São Paulo, Brazil, and one of the top universities in Brazil. The socioeconomic status and academic data of more than 10,000 students admitted to Unicamp from 2000 through 2005 form the study database.

rp-2015-1.pdf
15/2014 Near weights on higher dimensional varieties
Cícero Carvalho, Rafael Peixoto, Fernando Torres

We generalize the concept of near weight stated in [2007, IEEE Trans. Inform. Theory 53(5), 1919–1924] in the sense that we consider maps to arbitrary well-ordered semigroups instead of the nonnegative integers. This concept can be used as a tool to study AG codes based on more than one point via elementary methods only.

rp-2014-15.pdf
14/2014 Introduction to expanding ergodic optimization
Eduardo Garibaldi

These lecture notes grew out of a graduate course on ergodic optimization given by the author at the University of Campinas. Obviously some background in ergodic theory is required to follow the text. Moreover, these notes are by no means meant to be exhaustive. As a matter of fact, we focus mostly on the interpretation of ergodic optimal problems as questions of variational dynamics (see, for instance, [30, 37, 38, 55]), in a comparable way to the Aubry-Mather theory for Lagrangian systems. The reader should be aware that other points of view are also useful in ergodic optimization, like the one based on properties of Sturmian measures and their generalizations (see, for example, [14, 21, 48]). Ergodic optimization is a theoretical branch primarily concerned with the study of the so-called optimizing probability measures. The goal of this introductory monograph is hence twofold. One objective is to present and

rp-2014-14.pdf
13/2014 Aubry set for Asymptotically Sub-Additive Potentials
Eduardo Garibaldi, João Tiago Assunção Gomes

Given a topological dynamical system \((X, T)\), consider a sequence of continuous potentials \(F := \{f_n: X \to \mathbb{R}\}_{n\geq 1}\) that is asymptotically approached by sub-additive families. In a generalized version of ergodic optimization theory, one is interested in describing the set \(M_{\rm max}(F)\) of \(T\)-invariant probabilities that attain the following maximum value \({\rm max} \{\lim_{ n\to\infty} \frac{1}{n} \int f_n d\mu : \mu\ {\rm is}\ T{\rm -invariant\ probability}\}\). For this purpose, we extend the notion of Aubry set, denoted by \(\Omega(F)\). Our main result provides a sufficient condition for the Aubry set to be a maximizing set, i.e., \(\mu\) belongs to \(M_{\rm max}(F)\) if, and only if, its support lies in \(\Omega(F)\). Furthermore, we apply this result to the study of the generalized spectral radius in order to show the existence of periodic matrix configurations approaching this value.
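
As a sanity check on the maximum value above, consider the simplest additive case, which is an assumption made here for illustration and is not taken from the report: each \(f_n\) is the Birkhoff sum of a single continuous potential \(f\). Invariance of \(\mu\) then removes the limit and recovers the classical ergodic optimization problem.

```latex
% Assumption (illustrative): f_n is the Birkhoff sum of one continuous potential f.
\[
f_n = \sum_{k=0}^{n-1} f \circ T^k
\quad\Longrightarrow\quad
\frac{1}{n} \int f_n \, d\mu
= \frac{1}{n} \sum_{k=0}^{n-1} \int f \circ T^k \, d\mu
= \int f \, d\mu
\quad \text{for every } T\text{-invariant probability } \mu .
\]
% Hence the generalized maximum reduces to the classical one:
\[
\max_{\mu} \, \lim_{n\to\infty} \frac{1}{n} \int f_n \, d\mu
= \max_{\mu} \int f \, d\mu ,
\]
% where both maxima are taken over T-invariant probabilities.
```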

12/2014 Censored Mixed-Effects Models for Irregularly Observed Repeated Measures with Applications to HIV Viral Loads
Larissa A. Matos, Luis M. Castro, Víctor H. Lachos

In some AIDS clinical trials, the HIV-1 RNA measurements are collected irregularly over time and are often subject to some upper and lower detection limits, depending on the quantification assays. Linear and nonlinear mixed-effects models, with modifications to accommodate censored observations (LMEC/NLMEC), are routinely used to analyze this type of data (Vaida & Liu, 2009; Matos et al., 2013a). This paper presents a framework for fitting LMEC/NLMEC with response variables recorded at irregular intervals. To address the serial correlation among the within-subject errors, a damped exponential correlation structure is considered in the random error and an EM-type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the likelihood value. The proposed methods are illustrated with simulations and the analysis of two real AIDS case studies.

rp-2014-12.pdf
11/2014 Robust Mixture Regression Modeling Based on Scale Mixtures of Skew-Normal Distributions
Camila Borelli Zeller, Celso R. B. Cabral, Víctor H. Lachos

The traditional estimation of mixture regression models is based on the assumption of normality (symmetry) of component errors and thus is sensitive to outliers, heavy-tailed errors and/or asymmetric errors. In this work we present a proposal to deal with these issues simultaneously in the context of mixture regression by extending the classic normal model, assuming that the random errors follow a scale mixture of skew-normal distributions. This approach allows us to model data with great flexibility, accommodating skewness and heavy tails. The main virtue of considering mixture regression models under the class of scale mixtures of skew-normal distributions is that they have a convenient hierarchical representation which allows easy implementation of inference. We develop a simple EM-type algorithm to perform maximum likelihood inference of the parameters of the proposed model. In order to examine the robustness of this flexible model against outlying observations, some simulation studies are also presented. Finally, a real data set is analyzed, illustrating the usefulness of the proposed method.

rp-2014-11.pdf