REGRESSION AS A TOOL IN SOCIAL SCIENCE RESEARCH

Regression is a broad class of statistical models that is the foundation of data analysis and inference in the social sciences. Moreover, many contemporary statistical methods derive from the linear regression model. At its heart, regression describes systematic relationships between one or more predictor variables and (typically) one outcome. The flexibility of regression and its many extensions make it the primary statistical tool that social scientists use to model their substantive hypotheses with empirical data.

The original application of regression was Sir Francis Galton's study of the heights of parents and children in the late 1800s. Galton noted that tall parents tended to have somewhat shorter children, and vice versa. He described the relationship between parents' and children's heights using a type of regression line and termed the phenomenon regression to mediocrity. Thus, the term regression described a specific finding (i.e., the relationship between parents' and children's heights) but quickly became attached to the statistical method.

The foundation of regression is the regression equation. For Galton's study of height, the equation might be: Childᵢ = β₀ + β₁(Parentᵢ) + εᵢ. Each family provides values for child's height (i.e., Childᵢ) and parent's height (i.e., Parentᵢ). The simple regression equation above is identical to the mathematical equation for a straight line, often expressed as y = mx + b. The two regression coefficients (i.e., β₀ and β₁) represent the y-intercept and slope. The y-intercept estimates the average value of children's height when parents' height equals 0, and the slope coefficient estimates the increase in average children's height for a 1-inch increase in parents' height, assuming height is measured in inches. The intercept and slope define the regression line, which describes a linear relationship between children's and parents' heights. Most data points (i.e., child and parent height pairs) will not lie directly on the simple regression line; the scatter of the data points around the regression line is captured by the residual error term εᵢ, which is the vertical displacement of each data point from the regression line. The regression line describes the conditional mean of the outcome at specific values of the predictor. As such, it is a summary of the relationship between the two variables, which leads directly to a definition of regression: describing "as far as possible with the available data how the conditional distribution of the response … varies across subpopulations determined by the possible values of the predictor or predictors" (Cook and Weisberg 1999). This definition makes no reference to estimation (i.e., how are the regression coefficients determined?) or statistical inference (i.e., how well do the sample coefficients reflect the population from which they were selected?). Historically, regression has used least-squares estimation (i.e., coefficient values are found that minimize the squared errors εᵢ) and frequentist inference (i.e., the variability of sample regression coefficients is examined within theoretical sampling distributions and summarized by p-values or confidence intervals). Although least-squares regression estimates and p-values based on frequentist inference are the most common default settings within statistical packages, they are not the only methods of estimation and inference available, nor are they inherently aspects of regression.

If regression only summarized associations between two continuous variables, it would be a very limited tool for social scientists. However, regression has been extended in numerous ways. An initial and important expansion of the model allowed for multiple predictors and multiple types of predictors, including continuous, binary, and categorical. With the inclusion of categorical predictors, statisticians noted that analysis of variance models with a single error term and similar models are special cases of regression, and the two methods (i.e., regression and analysis of variance) are seen as different facets of a general linear model. A second important expansion of regression allowed for different types of outcome variables, such as binary, ordinal, nominal, and count variables.
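The least-squares fit of a simple regression line can be illustrated concretely. A minimal sketch in Python using the closed-form formulas for the slope and intercept (the parent/child height pairs are made-up illustrative numbers, not Galton's actual data):

```python
import numpy as np

# Hypothetical parent/child height pairs in inches (illustrative only)
parent = np.array([64.0, 66.0, 68.0, 70.0, 72.0, 74.0])
child = np.array([66.5, 67.0, 67.5, 68.5, 69.0, 69.5])

# Least-squares estimates via the closed-form formulas:
#   slope b1 = cov(x, y) / var(x)
#   intercept b0 = mean(y) - b1 * mean(x)
b1 = np.cov(parent, child, ddof=1)[0, 1] / np.var(parent, ddof=1)
b0 = child.mean() - b1 * parent.mean()

# Fitted values (the conditional mean at each predictor value) and
# residuals (the vertical displacement of each point from the line)
fitted = b0 + b1 * parent
residuals = child - fitted

print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

In this made-up data the estimated slope comes out well below 1 (about 0.31), which is the pattern Galton labeled regression to mediocrity: children of unusually tall or short parents are predicted to lie closer to the average height. Note also that least-squares residuals sum to zero by construction.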