Modified Estimators of Population Mean Using Robust Multiple Regression Methods
Ahmed Audua, Ishaq O. Olawoyinb, Jamiu O. Muilia
aDepartment of Mathematics, Usmanu Danfodiyo University, Sokoto, Nigeria.
e-mail: [email protected]
bDepartment of Statistics, University of Science and Technology, Wudil, Nigeria.
e-mail: [email protected]
In this paper, estimators of the finite population mean that use independent multi-auxiliary variables are proposed. The mean squared errors (MSEs) of the proposed estimators are derived up to a second-degree approximation. An empirical study was conducted, and the results revealed that the proposed estimators are more efficient than the existing estimators of [2].
Keywords: Estimators, Auxiliary variables, Multiple Regression, Outliers, Efficiency
Introduction
Auxiliary variables that are correlated with the study variable help improve the efficiency of ratio, product and regression estimators at both the planning and estimation stages. Nevertheless, the efficiency of these estimators can be degraded by the presence of outliers or leverage points in the data. Authors such as [1], [2] and [3] used Huber-M estimators in place of least-squares estimators to reduce the effect of outliers on the efficiency of the estimators of [4].
However, situations in which the study variable is correlated with several independent auxiliary variables (for example, expenditure with salary and teacher-pupil ratio; GDP with inflation, export and import rates; obesity with body weight, height and blood pressure) have received little or no attention in estimators that use robust regression methods. Therefore, in this study, modified estimators using robust multiple regression methods are suggested.
The estimators of population mean proposed by [2], which use robust regression coefficients, are:
$\bar{y}_{ZB1}=\left[\bar{y}+b_{rob}\left(\bar{X}-\bar{x}\right)\right]\dfrac{\bar{X}}{\bar{x}}$ (1.1)
$\bar{y}_{ZB2}=\left[\bar{y}+b_{rob}\left(\bar{X}-\bar{x}\right)\right]\dfrac{\bar{X}+C_x}{\bar{x}+C_x}$ (1.2)
$\bar{y}_{ZB3}=\left[\bar{y}+b_{rob}\left(\bar{X}-\bar{x}\right)\right]\dfrac{\bar{X}+\beta_2(x)}{\bar{x}+\beta_2(x)}$ (1.3)
$\bar{y}_{ZB4}=\left[\bar{y}+b_{rob}\left(\bar{X}-\bar{x}\right)\right]\dfrac{\bar{X}\beta_2(x)+C_x}{\bar{x}\beta_2(x)+C_x}$ (1.4)
$\bar{y}_{ZB5}=\left[\bar{y}+b_{rob}\left(\bar{X}-\bar{x}\right)\right]\dfrac{\bar{X}C_x+\beta_2(x)}{\bar{x}C_x+\beta_2(x)}$ (1.5)
where $C_x$, $\beta_2(x)$ and $b_{rob}$ are the population coefficient of variation, the population coefficient of kurtosis of the auxiliary variable, and the slope coefficient obtained from the robust regression methods, respectively.
$MSE\left(\bar{y}_{ZBi}\right)=\lambda\left(R_i^2 S_x^2+S_y^2+b_{rob}^2 S_x^2-2b_{rob}S_{yx}\right),\quad i=1,2,\ldots,5$ (1.6)
where $\lambda=\frac{1-f}{n}$, $f=\frac{n}{N}$, $n$ is the sample size, $N$ is the population size, $b_{rob}$ denotes the slope coefficients obtained from the Tukey-M [5], Hampel-M [6], Huber-M [7], LMS [8] and LAD [9] methods, $S_y^2$ and $S_x^2$ are the population variances of the study and auxiliary variables respectively, $S_{yx}$ is the population covariance between the auxiliary and the study variables, and $R_i$ is the ratio term associated with the $i$th estimator.
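The constants used in the estimators above, the population coefficient of variation $C_x$ and the coefficient of kurtosis $\beta_2(x)$, can be computed directly from the population values. The sketch below is illustrative: the function names and data are our own, not from the paper.

```python
import statistics

def coef_of_variation(x):
    """C_x = S_x / X-bar: population standard deviation over the population mean."""
    mean = statistics.fmean(x)
    sd = statistics.pstdev(x)  # population (divide-by-N) standard deviation
    return sd / mean

def kurtosis(x):
    """beta_2(x) = m4 / m2^2, the population moment-ratio coefficient of kurtosis."""
    mean = statistics.fmean(x)
    m2 = statistics.fmean([(xi - mean) ** 2 for xi in x])
    m4 = statistics.fmean([(xi - mean) ** 4 for xi in x])
    return m4 / m2 ** 2

# Illustrative auxiliary-variable values
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(coef_of_variation(x), kurtosis(x))
```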
Robust Regression Methods
Ordinary least squares (OLS) regression estimators do not give reliable results when the data contain outliers. To overcome this challenge, alternative methods, called robust regression methods, which are resistant to outliers, have been proposed in the literature. The robust regression methods with multiple predictors used in this paper are introduced in this section.
2.1 L-Estimators
L-estimators are scale equivariant and regression equivariant; however, their breakdown point is still 0%. For the 0.5-regression quantile, the regression quantile estimator reduces to the least absolute values estimator. The L-estimators adopted in this paper are LAD, LMS and LTS.
2.1.1 Least absolute deviation (LAD) method
LAD is a robust regression method that minimizes the sum of the absolute errors and is given as
$\hat{\beta}_{LAD}=\arg\min_{\beta}\sum_{i=1}^{n}\left|y_i-x_i^{T}\beta\right|$ (2.1)
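For simple linear regression, an optimal LAD line can be chosen to pass through at least two of the data points, so for small data sets the criterion in (2.1) can be minimized exactly by brute force over all point pairs. This is a sketch for illustration only; the function name and data are assumptions, not the paper's.

```python
from itertools import combinations

def lad_fit(x, y):
    """Return (intercept, slope) minimizing sum(|y_i - a - b*x_i|) by scoring
    every line that passes through two of the data points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # skip vertical candidate lines
        b = (y[j] - y[i]) / (x[j] - x[i])  # slope through points i and j
        a = y[i] - b * x[i]
        loss = sum(abs(yi - a - b * xi) for xi, yi in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]

# One gross outlier barely moves the LAD line.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 4.0, 6.1, 7.9, 10.0, 40.0]  # roughly y = 2x, last point an outlier
a, b = lad_fit(x, y)
print(a, b)
```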
2.1.2 Least trimmed squares (LTS) method
In LTS, the squared error terms are first arranged in ascending order; the sum of the first $h$ sorted error terms is then minimized as
$\hat{\beta}_{LTS}=\arg\min_{\beta}\sum_{i=1}^{h}e_{(i)}^2$ (2.2)
where $e_{(1)}^2\le e_{(2)}^2\le\cdots\le e_{(n)}^2$ are the ordered squared residuals.
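The LTS criterion in (2.2) is hard to minimize exactly; a common approximation for simple regression scores elemental lines through pairs of points and keeps the one whose $h$ smallest squared residuals have the smallest sum. This is a minimal sketch under that assumption, not the paper's algorithm.

```python
from itertools import combinations

def lts_fit(x, y, h):
    """Approximate (intercept, slope) minimizing the sum of the h smallest
    squared residuals, using elemental lines through point pairs."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        sq = sorted((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        loss = sum(sq[:h])  # trim away the n - h largest squared errors
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]

x = [1, 2, 3, 4, 5, 6]
y = [3.0, 5.0, 7.0, 9.0, 11.0, -50.0]  # y = 2x + 1 with one outlier
a, b = lts_fit(x, y, h=4)
print(a, b)  # the outlier is trimmed, so the clean line is recovered
```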
2.2 M-Estimators
M-estimators are based on the idea of replacing the squared residuals used in OLS estimation by another function of the residuals, given as
$\hat{\beta}_{M}=\arg\min_{\beta}\sum_{i=1}^{n}\rho(e_i)$ (2.3)
where $\rho$ is a symmetric function with a unique minimum at zero.
2.2.1 Huber-M estimation function
The Huber-M estimator is defined by the function
$\rho(u)=\begin{cases}\frac{u^2}{2}, & |u|\le k\\ k|u|-\frac{k^2}{2}, & |u|>k\end{cases}$ (2.4)
where $k$ is a tuning constant determining the degree of robustness.
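A Huber-M fit for simple linear regression can be sketched with iteratively reweighted least squares (IRLS). The function names, the MAD-based scale estimate, and the default $k=1.345$ are conventional choices assumed here, not settings taken from the paper.

```python
import statistics

def _wls(x, y, w):
    """Closed-form weighted least squares for the line y = a + b*x."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    return (sy - b * sx) / sw, b

def huber_m_fit(x, y, k=1.345, iters=100):
    """Huber-M regression via IRLS with a MAD-based scale estimate."""
    a, b = _wls(x, y, [1.0] * len(x))  # OLS starting values
    for _ in range(iters):
        r = [yi - a - b * xi for xi, yi in zip(x, y)]
        s = max(statistics.median([abs(ri) for ri in r]) / 0.6745, 1e-8)
        # Huber weights: 1 inside [-k*s, k*s], downweighted outside
        w = [1.0 if abs(ri) <= k * s else k * s / abs(ri) for ri in r]
        a, b = _wls(x, y, w)
    return a, b

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 60.0]  # y = 2x + 1 plus one outlier
print(huber_m_fit(x, y))
```

The scale floor of 1e-8 simply guards against division by zero when the fit interpolates the clean points exactly.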
2.2.2 Hampel-M estimation function
The Hampel-M estimator is defined by the function
$\rho(u)=\begin{cases}\frac{u^2}{2}, & |u|\le a\\ a|u|-\frac{a^2}{2}, & a<|u|\le b\\ ab-\frac{a^2}{2}+\frac{a\left[(c-b)^2-(c-|u|)^2\right]}{2(c-b)}, & b<|u|\le c\\ \frac{a(b+c-a)}{2}, & |u|>c\end{cases}$ (2.5)
where $0<a\le b<c$ are constants.
2.2.3 Tukey-M estimation function
The Tukey-M (biweight) estimator is defined by the function
$\rho(u)=\begin{cases}\frac{c^2}{6}\left[1-\left(1-\left(\frac{u}{c}\right)^2\right)^3\right], & |u|\le c\\ \frac{c^2}{6}, & |u|>c\end{cases}$ (2.6)
where $c$ is a tuning constant.
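The redescending behaviour of the Hampel and Tukey functions is easiest to see through their influence (psi) functions, the derivatives of the rho functions above: both return to zero for large residuals, so gross outliers get zero influence. The constants used below are the commonly quoted defaults, an assumption on our part.

```python
def tukey_psi(u, c=4.685):
    """psi(u) = u * (1 - (u/c)^2)^2 inside [-c, c], zero outside."""
    if abs(u) > c:
        return 0.0
    t = 1.0 - (u / c) ** 2
    return u * t * t

def hampel_psi(u, a=2.0, b=4.0, c=8.0):
    """Three-part redescending psi: linear, flat, linearly descending, then zero."""
    au = abs(u)
    sign = 1.0 if u >= 0 else -1.0
    if au <= a:
        return u                              # behaves like OLS near zero
    if au <= b:
        return a * sign                       # bounded influence
    if au <= c:
        return a * sign * (c - au) / (c - b)  # descending to zero
    return 0.0                                # outliers beyond c are rejected

print(tukey_psi(1.0), hampel_psi(3.0), hampel_psi(5.0), hampel_psi(10.0))
```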
Suggested estimators
Having studied the work of [2], the suggested estimators are presented in general form as
(3.1) where the two constants are either the population coefficients of variation or the coefficients of kurtosis of the independent auxiliary variables, but are not equal to each other.
To obtain the mean squared error of the suggested estimator, error terms for the study and auxiliary variables are defined such that their expectations are given as
(3.2)
The MSE of the suggested estimator, to a second-degree approximation using the Taylor series method, is obtained as (see Appendix A for details of the derivation):
(3.3)
If and , then the suggested estimator becomes;
(3.4)
The MSE of is equivalent to MSE of but is replaced by 1.
If and , then the suggested estimator becomes;
(3.5)
The MSE of is equivalent to MSE of but is replaced by .
If and , then the suggested estimator becomes;
(3.6)
The MSE of is equivalent to MSE of but is replaced by .
If and , then the suggested estimator becomes;
(3.7)
The MSE of is equivalent to MSE of but is replaced by .
If and , then the suggested estimator becomes;
(3.8)
The MSE of is equivalent to MSE of but is replaced by.
Numerical illustration
A simulation study is conducted to assess the performance of the suggested estimators with respect to the estimators of [2]. The steps of the simulation are as follows:
Step 1: A sample of size 30,000 from a normal population is drawn without replacement using a simple random sampling scheme as
and
Step 2: Construct the regression models as:
(3.9) where $b_{rob}$ are the regression coefficients of the Huber-M, Tukey-M, Hampel-M, LTS and LAD robust estimators.
Step 3: Calculate the MSE as given below:
(3.10) where $\hat{\bar{Y}}$ is the estimated mean for sample sizes $n$ = 20, 50 and 100, and $\bar{Y}$ is the population mean.
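The three steps above can be sketched as a small Monte-Carlo loop: generate a population, draw repeated simple random samples without replacement, apply an estimator, and average the squared deviation from the known population mean. The population parameters, replication count, and the use of the plain sample mean as a stand-in estimator are illustrative assumptions, not the paper's settings.

```python
import random
import statistics

random.seed(1)
N, n, reps = 2000, 50, 500
population = [random.gauss(10.0, 2.0) for _ in range(N)]
Y_bar = statistics.fmean(population)  # known population mean

sq_errors = []
for _ in range(reps):
    sample = random.sample(population, n)  # SRS without replacement
    estimate = statistics.fmean(sample)    # placeholder estimator
    sq_errors.append((estimate - Y_bar) ** 2)

mse = statistics.fmean(sq_errors)          # empirical MSE as in (3.10)
print(mse)
```

In the paper's study this loop would be repeated with each robust estimator in place of the sample mean, producing one MSE per method per sample size.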
Table 1: MSE of the [2] and proposed estimators when sample size n = 20
Huber-M	Tukey-M	Hampel-M	LTS	LAD
[2] estimators
0.05001653 0.05001781 0.05001696 0.05001721 0.05001656
0.05001653 0.05001781 0.05001696 0.05001721 0.05001656
0.05001497 0.05001614 0.05001536 0.05001559 0.05001500
0.05001094 0.05001177 0.05001122 0.05001138 0.05001096
0.05002385 0.05002554 0.05002442 0.05002475 0.05002389
Proposed estimators
0.05001337 0.05001503 0.05001389 0.05001424 0.05001341
0.05001337 0.05001503 0.05001389 0.05001424 0.05001341
0.05001238 0.05001411 0.05001291 0.05001328 0.05001241
0.05000743 0.0500084 0.05000773 0.05000793 0.05000745
0.05002095 0.0500223 0.05002143 0.05002166 0.05002099
Table 2: MSE of the [2] and proposed estimators when sample size n = 50
Huber-M	Tukey-M	Hampel-M	LTS	LAD
[2] estimators
0.01998659 0.0199871 0.01998677 0.01998686 0.01998661
0.01998659 0.0199871 0.01998677 0.01998686 0.01998661
0.01998597 0.01998643 0.01998613 0.01998622 0.01998598
0.01998436 0.01998469 0.01998447 0.01998453 0.01998437
0.01998952 0.01999019 0.01998975 0.01998988 0.01998953
Proposed estimators
0.01998533 0.01998599 0.01998554 0.01998568 0.01998535
0.01998533 0.01998599 0.01998554 0.01998568 0.01998535
0.01998493 0.01998562 0.01998514 0.01998529 0.01998495
0.01998296 0.01998334 0.01998308 0.01998315 0.01998296
0.01998836 0.0199889 0.01998855 0.01998864 0.01998837
Table 3: MSE of the [2] and proposed estimators when sample size n = 100
Huber-M	Tukey-M	Hampel-M	LTS	LAD
[2] estimators
0.009976613 0.009976867 0.0099767 0.009976749 0.009976619
0.009976613 0.009976867 0.0099767 0.009976749 0.009976619
0.009976301 0.009976534 0.00997638 0.009976425 0.009976307
0.009975498 0.009975662 0.009975553 0.009975585 0.009975502
0.009978072 0.009978409 0.00997819 0.009978253 0.00997808
Proposed estimators
0.009975983 0.009976314 0.00997609 0.009976156 0.009975991
0.009975983 0.009976314 0.00997609 0.009976156 0.009975991
0.009975785 0.00997613 0.00997589 0.009975965 0.009975792
0.009975498 0.009975662 0.009975553 0.009975585 0.009975502
0.009977495 0.009977763 0.00997759 0.009977636 0.009977502
Tables 1, 2 and 3 show the MSEs of the proposed and [2] estimators for sample sizes 20, 50 and 100, respectively. The results reveal that the proposed estimators have smaller MSEs than their counterparts in [2] under the Huber-M, Tukey-M, Hampel-M, LTS and LAD robust estimators.
Conclusion
The empirical results reveal that the proposed estimators are more efficient than the estimators suggested by [2].
Appendix A
(A1)
Expressing (A1) in terms of the error terms defined in (3.2), we have
(A2)
(A3)
where
Simplifying (A3) to a second-degree approximation, we have
(A4)
(A5)
Simplifying (A5), squaring and taking expectations, we have
(A6)
Applying the expectation results of (3.2), we obtain the MSE of the suggested estimator to a second-degree approximation as
(A7) where
References
[1] C. Kadılar, M. Candan and H. Çıngı. Ratio estimators using robust regression. Hacettepe Journal of Mathematics and Statistics 36 (2007): 181-188.
[2] T. Zaman, H. Bulut. Modified ratio estimators using robust regression methods. Commun. Stat. Theory Methods (2018).
[3] T. Zaman. Improved modified ratio estimators using robust regression methods. Appl. Math. Comput. 348 (2019): 627-631.
[4] C. Kadılar, H. Çıngı. Ratio estimators in simple random sampling. Appl. Math. Comput. 151 (2004): 893-902.
[5] J. W. Tukey. Exploratory Data Analysis. MA: Addison-Wesley (1977).
[6] F. R. Hampel. A general qualitative definition of robustness. The Annals of Mathematical Statistics 42 (1971): 1887-1896.
[7] V. J. Yohai. High breakdown-point and high efficiency robust estimates for regression. The Annals of Statistics 15 (1987): 642-656.
[8] P. J. Rousseeuw, A. M. Leroy. Robust Regression and Outlier Detection. Wiley Series in Probability and Mathematical Statistics. New York: Wiley (1987).
[9] H. Nadia and A. A. Mohammad. Model of robust regression with parametric and nonparametric methods. Mathematical Theory and Modeling 3 (2013): 27-39.