Research Reports

10/2016 Modelling the proportion of failed courses and GPA scores for engineering major students
Hildete P. Pinheiro, Rafael P. Maia, Eufrásio A. Lima Neto, Mariana R. Motta

There is special interest in the factors that may contribute to better academic performance among undergraduate students. In Brazil in particular, this issue has attracted great interest because of the recent quota system and affirmative action programs implemented by some universities and the Federal Government. We use zero-one inflated beta models with heteroscedasticity to model the proportion of failed courses taken by Engineering major students at the State University of Campinas, Brazil. We also model the grade point average score of those students with a heteroscedastic skew t distribution. The database consists of records of 3,549 Engineering major students who entered the university from 2000 to 2005. The entrance exam score in each subject, some academic variables and the students' socioeconomic status are considered as covariates in the models. A residual analysis based on randomized quantile residuals is also performed. Finally, we believe that the results of this study can help improve university policies for new students, since it was possible to identify student profiles with respect to their academic performance.

PDF: rp-2016-10.pdf
9/2016 Censored Regression Models with Autoregressive Errors: A Likelihood-Based Perspective
Fernanda L. Schumacher, Víctor H. Lachos, Dipak K. Dey

In many studies that involve time series variables, limited or censored data are naturally collected. This occurs, in several practical situations, for reasons such as limitations of measuring equipment or from experimental design. Hence, the exact true value is recorded only if it falls within an interval range, so the responses can be either left, interval or right censored. Practitioners commonly disregard censored data cases or replace these observations with some function of the limit of detection, which often results in biased estimates. In this paper, we propose an analytically tractable and efficient stochastic approximation of the EM (SAEM) algorithm to obtain the maximum likelihood estimates of the parameters of censored regression models with autoregressive errors of order p. This approach permits easy and fast estimation of the parameters of autoregressive models when censoring is present and, as a byproduct, enables predictions of unobservable values of the response variable. The observed information matrix is derived analytically to account for standard errors. We use simulations to investigate the asymptotic properties of the SAEM estimates and prediction accuracy. In this simulation study comparisons are also made between inferences based on the censored data and those based on complete data obtained by crude/ad-hoc imputation methods. Finally, the method is illustrated using a meteorological time series dataset on cloud ceiling height, where the measurements are subject to the detection limit of the recording device. The proposed algorithm and methods are implemented in the new R package ARCensReg.

PDF: rp-2016-9.pdf
8/2016 Quantile Regression for Nonlinear Mixed Effects Models: A Likelihood Based Perspective
Christian E. Galarza, Luis M. Castro, Francisco Louzada, Víctor H. Lachos

Longitudinal data are frequently analyzed using normal mixed effects models. Moreover, the traditional estimation methods are based on mean regression, which leads to non-robust parameter estimation for non-normal error distributions. Compared to the conventional mean regression approach, quantile regression (QR) can characterize the entire conditional distribution of the outcome variable and is more robust to the presence of outliers and misspecification of the error distribution. This paper develops a likelihood-based approach to analyzing QR models for correlated continuous longitudinal data via the asymmetric Laplace (AL) distribution. Exploiting the nice hierarchical representation of the AL distribution, our classical approach follows the Stochastic Approximation of the EM (SAEM) algorithm for deriving exact maximum likelihood estimates of the fixed-effects and variance components in nonlinear mixed effects models (NLMEMs). We evaluate the finite sample performance of the algorithm and the asymptotic properties of the ML estimates through empirical experiments and applications to two real life datasets. The proposed SAEM algorithm is implemented in the R package qrNLMM.
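
The link between quantile regression and the AL distribution can be illustrated with a small sketch. This is not the SAEM algorithm and not the qrNLMM code. It assumes a location-only model with fixed scale, toy Student-t data, and hypothetical helper names (`check_loss`, `al_neg_loglik`). Under those assumptions the AL negative log-likelihood differs from the summed check loss only by a constant, so both are minimized at the same location, which sits at the sample quantile.

```python
import numpy as np

def check_loss(u, p):
    """Quantile check function rho_p(u) = u * (p - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (p - (u < 0))

def al_neg_loglik(mu, y, p, sigma=1.0):
    """Negative log-likelihood of an asymmetric Laplace location model
    with density p(1-p)/sigma * exp(-rho_p((y - mu)/sigma))."""
    n = y.size
    return -n * np.log(p * (1 - p) / sigma) + check_loss((y - mu) / sigma, p).sum()

rng = np.random.default_rng(0)
y = rng.standard_t(df=3, size=2001)   # heavy-tailed toy data (an assumption)
p = 0.25                              # quantile level of interest

# Grid search over candidate locations (crude, but transparent)
grid = np.linspace(-3, 3, 6001)
total_loss = np.array([check_loss(y - m, p).sum() for m in grid])
nll = np.array([al_neg_loglik(m, y, p) for m in grid])

mu_loss = grid[total_loss.argmin()]   # minimizer of the check loss
mu_ml = grid[nll.argmin()]            # AL maximum likelihood location
sample_q = np.quantile(y, p)          # empirical p-th quantile

print(mu_loss, mu_ml, sample_q)
```

The grid search is used only to keep the example dependency-free; the mixed-effects setting of the paper requires the hierarchical representation and SAEM machinery described in the abstract.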

PDF: rp-2016-8.pdf
7/2016 Robust Quantile Regression using a Generalized Class of Skewed Distributions
Christian E. Galarza, Víctor H. Lachos, Celso R. B. Cabral, Luis M. Castro

It is well known that the widely popular mean regression model could be inadequate if the probability distribution of the observed responses does not follow a symmetric distribution. To deal with this situation, quantile regression turns out to be a more robust alternative for accommodating outliers and misspecification of the error distribution, since it characterizes the entire conditional distribution of the outcome variable. This paper presents a likelihood-based approach for the estimation of the regression quantiles based on a new family of skewed distributions introduced by Wichitaksorn et al. (2014). This family includes the skewed versions of the Normal, Student-t, Laplace, contaminated Normal and slash distributions, all with the zero quantile property for the error term, and with a convenient and novel stochastic representation which facilitates the implementation of the EM algorithm for maximum-likelihood estimation of the pth quantile regression parameters. We evaluate the performance of the proposed EM algorithm and the asymptotic properties of the maximum-likelihood estimates through empirical experiments and application to a real life dataset. The algorithm is implemented in the R package lqr, providing full estimation and inference for the parameters as well as simulation envelope plots useful for assessing the goodness-of-fit.

PDF: rp-2016-7.pdf
6/2016 Linear Regression Models with Finite Mixtures of Skew Heavy-Tailed Errors
Luis Benites Sánchez, Rocío Maehara, Víctor H. Lachos

We consider estimation of regression models whose error terms follow a finite mixture of scale mixtures of skew-normal (SMSN) distributions, a rich class of distributions that contains the skew-normal, skew-t, skew-slash and skew-contaminated normal distributions as proper elements. This approach allows us to model data with great flexibility, accommodating simultaneously multimodality, skewness and heavy tails. We develop a simple EM-type algorithm to perform maximum likelihood (ML) inference of the parameters of the proposed model with closed-form expressions at the E-step. Furthermore, the standard errors of the ML estimates can be obtained as a byproduct. The practical utility of the new method is illustrated with the analysis of a real dataset and several simulation studies. The proposed algorithm and methods are implemented in the R package FMsmsnReg.

PDF: rp-2016-6.pdf
5/2016 Influence Diagnostics for Censored Linear Regression Models with Skewed and Heavy-tailed Distributions
Thalita do Bem Mattos, Víctor H. Lachos, Aldo M. Garay

The scale mixtures of skew-normal (SMSN) distributions (Lachos et al., 2010) form an attractive class of asymmetrical heavy-tailed densities that includes the skew-normal, skew-t, skew-slash, skew-contaminated normal and the entire family of scale mixtures of normal (SMN) distributions as special cases. A robust censored linear model based on the SMSN distributions has been recently proposed by Mattos et al. (2015), where a stochastic approximation of the EM (SAEM) algorithm is presented for iteratively computing maximum likelihood estimates of the parameters. In this paper, to examine the performance of the proposed model, case-deletion and local influence techniques are developed to show its robustness against outlying and influential observations. This is done by analyzing the sensitivity of the SAEM estimates under some usual perturbation schemes in the model or data and by inspecting some proposed diagnostic graphs. The efficacy of the method is verified through the analysis of simulated datasets and by modeling a real dataset from stellar astronomy previously analyzed under normal errors.

PDF: rp-2016-5.pdf
4/2016 Estimation in Spatial Models with Censored Response
Thais S. Barbosa, Víctor H. Lachos, Larissa A. Matos, Marcos O. Prates

Spatial environmental data can be subject to some upper and lower limits of detection (LOD), below or above which the measures are not quantifiable. As a result, the responses are either left or right censored. Historically, the most common practice for analysis of such data has been to replace the censored observations with some function of the limit of detection (LOD/2, 2LOD), or through data augmentation, by using Markov chain Monte Carlo methods. In this paper, we propose an exact estimation procedure to obtain the maximum likelihood estimates of the fixed effects and variance components, using a stochastic approximation of the EM algorithm, the SAEM algorithm (Delyon et al., 1999). This approach permits easy and fast estimation of the parameters of spatial linear models when censoring is present. As a byproduct, predictions of unobservable values of the response variable are possible. The proposed algorithm is applied to a spatial dataset of depths of a geological horizon that contains both left- and right-censored data. We also use simulation to investigate the small sample properties of predictions and parameter estimates and the robustness of the SAEM algorithm. In this simulation study comparisons are made between inferences based on the censored data and inferences based on complete data obtained by a crude/ad hoc imputation method (LOD/2, 2LOD). The results show that differences in inference between the two approaches can be substantial.
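
To give a feel for why crude LOD/2 substitution can distort inference, here is a minimal simulation sketch. It is not the SAEM procedure and it ignores spatial correlation entirely. It assumes independent normal observations, a single lower detection limit, and made-up parameter values. In this setup the substituted data yield a mean that is pulled downward and a standard deviation that is inflated relative to the true values.

```python
import numpy as np

rng = np.random.default_rng(1)

true_mean, true_sd = 5.0, 1.0   # assumed toy parameters
lod = 4.5                       # assumed lower limit of detection
n = 200_000

y = rng.normal(true_mean, true_sd, size=n)
censored = y < lod              # observations below the LOD are left censored

# Crude ad hoc imputation: replace every censored value with LOD/2
y_sub = np.where(censored, lod / 2, y)

frac_censored = censored.mean()
bias_mean = y_sub.mean() - true_mean        # bias of the naive mean estimate
bias_sd = y_sub.std(ddof=1) - true_sd       # bias of the naive sd estimate

print(f"censored fraction: {frac_censored:.3f}")
print(f"bias in mean: {bias_mean:.3f}, bias in sd: {bias_sd:.3f}")
```

The size and direction of the distortion depend on where the LOD falls relative to the data distribution, so these toy numbers should not be read as typical of any real dataset.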

PDF: rp-2016-4.pdf
3/2016 Moments of truncated skew-normal/independent distributions
Víctor H. Lachos, Aldo M. Garay, Celso R. B. Cabral

In this work we consider the problem of finding the moments of a doubly truncated member of the class of skew-normal/independent (TSNI) distributions. We obtain a general result and then use it to derive the moments in the case of doubly truncated versions of skew-normal, skew-t, skew-slash and skew-contaminated normal distributions. Many properties of the TSNI family are studied, inference procedures are developed and a simulation study is performed to assess the procedures. An application in the context of censored regression models is also provided.

PDF: rp-2016-3.pdf
2/2016 Multivariate Measurement Error Models Based on Student-t Distribution under Censored Responses
Larissa A. Matos, Luis M. Castro, Celso R. B. Cabral, Víctor H. Lachos

Measurement error models constitute a wide class of models that includes linear and nonlinear regression models. They are very useful for modeling many real life phenomena, particularly in the medical and biological areas. The great advantage of these models is that, in some sense, they can be represented as mixed effects models, allowing us to implement well-known techniques, like the EM-algorithm, for parameter estimation. In this paper, we consider a class of multivariate measurement error models where the observed response and/or covariate are not fully observed, i.e., the observations are subject to certain threshold values below or above which the measurements are not quantifiable. Consequently, these observations are considered censored. We assume a Student-t distribution for the unobserved true values of the mismeasured covariate and the error term of the model, providing a robust alternative for parameter estimation. Our approach relies on likelihood-based inference using the EM-algorithm. The proposed method is illustrated through simulation studies and the analysis of a real dataset.

PDF: rp-2016-2.pdf
1/2016 Heavy-tailed longitudinal regression models for censored data: A likelihood based perspective
Larissa A. Matos, Víctor H. Lachos, Tsung-I Lin, Luis M. Castro

HIV RNA viral load measures are often subject to some upper and lower detection limits depending on the quantification assays. Hence, the responses are either left or right censored. Moreover, it is quite common to observe viral load measurements collected irregularly over time. A complication arises when these continuous repeated measures have a heavy-tailed behaviour. For such data structures, we propose a robust censored linear model based on the scale mixtures of normal distributions (SMN family). To take into account the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is considered. A stochastic approximation of the EM algorithm (SAEM algorithm) is developed to obtain the maximum likelihood estimates of the model parameters. The main advantage of this new procedure is that it allows us to estimate the parameters of interest and evaluate the log-likelihood function in an easy and fast way. Furthermore, the standard errors of the fixed effects and predictions of unobservable values of the response can be obtained as a by-product. The practical utility of the proposed methodology is exemplified using both simulated and real data.
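
As an illustration of the damped exponential correlation (DEC) structure mentioned above, the sketch below builds the correlation matrix for irregularly spaced measurement times. It assumes the commonly used parametrization corr(t_j, t_k) = phi1 ** (|t_j - t_k| ** phi2), with 0 <= phi1 < 1 and phi2 >= 0. The abstract does not state the exact form used in the report, and the function name `dec_correlation` and the visit times are hypothetical.

```python
import numpy as np

def dec_correlation(times, phi1, phi2):
    """Damped exponential correlation matrix (assumed parametrization):
    corr(t_j, t_k) = phi1 ** (|t_j - t_k| ** phi2)."""
    t = np.asarray(times, dtype=float)
    lag = np.abs(t[:, None] - t[None, :])   # pairwise time lags
    corr = phi1 ** (lag ** phi2)
    # Force a unit diagonal: when phi2 = 0, NumPy evaluates 0 ** 0 as 1,
    # which would otherwise put phi1 on the diagonal.
    np.fill_diagonal(corr, 1.0)
    return corr

times = [0.0, 0.5, 2.0, 3.5, 7.0]           # irregular visit times (made up)

R_ar = dec_correlation(times, phi1=0.6, phi2=1.0)   # continuous-time AR(1) case
R_cs = dec_correlation(times, phi1=0.6, phi2=0.0)   # compound symmetry case

print(np.round(R_ar, 3))
print(np.round(R_cs, 3))
```

Under this parametrization, phi2 = 1 gives a continuous-time AR(1) pattern and phi2 = 0 gives equal correlation between all pairs, so a single extra parameter spans a range of decay behaviours.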

PDF: rp-2016-1.pdf