M E M O R A N D U M

To:        Members of CARE

From:     Sangy Panicker, APA Staff Liaison

                     Task Force on Statistical Inference (TFSI)

Date:      March 17, 1997

Subject:  Report of the first meeting of TFSI

     Please find attached a copy of the initial report of APA's Task Force on Statistical Inference. The task force is interested in collecting reactions to this initial report from a wide variety of interested parties, including liaison groups to the task force, journal editors, and other researchers and educators. This feedback from the various constituencies will be used to set the agenda for the next meeting of the task force, as well as to ascertain the nature and format in which the task force's final recommendations will be presented. Please forward your comments to me no later than May 30, 1997.
     If you have any further questions, please do not hesitate to contact me by phone at (202) 336-5940, by fax at (202) 336-5953, or by e-mail at sxp.apa@email.apa.org.

*************************************************

Initial Report

Task Force on Statistical Inference

Board of Scientific Affairs

American Psychological Association

_________________________________________________________

     This report is the result of the initial two-day meeting of the Task Force on Statistical Inference, held December 14-15, 1996, at Newark Airport. The task force welcomes written reactions to this initial report from all interested parties; this input will inform our future deliberations. The deadline for receiving reactions to this initial report is May 30, 1997. Written responses should be sent to: Sangeeta Panicker, Liaison to the Task Force on Statistical Inference, APA - Science Directorate, 750 First Street (NE), Washington, DC 20002, e-mail: sxp.apa@email.apa.org.

Members present:

Robert Rosenthal, PhD (Co-Chair)
Jacob Cohen, PhD (Co-Chair)
Leona S. Aiken, PhD
Mark Appelbaum, PhD
Gwyneth M. Boodoo, PhD
David A. Kenny, PhD
Helena C. Kraemer, PhD
Donald B. Rubin, PhD
Howard Wainer, PhD*
Leland Wilkinson, PhD

Members absent:

Robert Abelson, PhD (Co-Chair)
Bruce Thompson, PhD

APA Staff:

Christine R. Hartel, PhD*
Sangeeta Panicker, Liaison

* denotes partial attendance

     This report reflects the initial deliberations of the task force and the first of our recommendations to the Board of Scientific Affairs (BSA). We address two issues. First, we consider the issue that brought the task force into existence, namely the role of null hypothesis significance testing in psychological research. Second, we consider the modification of current practice in the quantitative treatment of data in the science of psychology.

Null Hypothesis Significance Testing

     Many have assumed the charge to this task force to be narrowly focused on the issue of null hypothesis significance testing and particularly the use of the p value. The charge this task force has accepted, however, is broader. It is the view of the task force that there are many ways of using statistical methods to help us understand the phenomena we are studying (e.g., Bayesian methods, graphical and exploratory data analysis methods, hypothesis testing strategies). We endorse a policy of inclusiveness that allows any procedure that appropriately sheds light on the phenomenon of interest to be included in the arsenal of the research scientist. In this spirit, the task force does not support any action that could be interpreted as banning the use of null hypothesis significance testing or p values in psychological research and publication.

The Broader Topics of Recommendations

     At this meeting the task force identified four broad topics in the quantitative treatment of research data in which it believes major improvements in current practice could and should be made. These topics are: (1) approaches to enhance the quality of data usage and to protect against potential misrepresentation of quantitative results, (2) the need for theory-generating studies, (3) the use of minimally sufficient designs and analytic strategies, and (4) issues with computerized data analysis.

(1) Approaches to enhance the quality of data usage and to protect against potential misrepresentation of quantitative results

     Of these four topics, the first has so far received the greatest attention from the task force. With respect to this topic, the task force has identified three issues that are particularly germane to current practice.

(a) we recommend that more extensive descriptions of the data be provided to reviewers and readers. These should include means, standard deviations, sample sizes, five-point summaries, box-and-whisker plots, other graphics, and descriptions related to missing data, as appropriate. (Illustrative sketches of items (a) through (c) follow item (c) below.)

(b) enhanced characterization of the results of analyses (beyond simple p value statements), to include both the direction and the size of the effect (e.g., mean difference, regression and correlation coefficients, odds ratios, more complex effect size indicators) and their confidence intervals, should be provided routinely as part of the presentation. These characterizations should be reported in the most interpretable metric (e.g., the expected unit change in the criterion for a unit change in the predictor, Cohen's d).

(c) the use of techniques to ensure that the reported results are not produced by anomalies in the data (e.g., outliers, points of high influence, non-random missing data, selection or attrition problems) should be a standard component of all analyses.
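
     To make items (a) and (b) concrete, the following is a minimal sketch of how such reporting might look in practice. It is illustrative only and is not part of the task force's recommendations; it uses the general-purpose numpy and scipy libraries, the data and variable names are hypothetical, and the confidence interval for Cohen's d uses a common large-sample approximation.

    import numpy as np
    from scipy import stats

    # Hypothetical data standing in for two groups of scores.
    rng = np.random.default_rng(0)
    treatment = rng.normal(loc=0.5, scale=1.0, size=40)
    control = rng.normal(loc=0.0, scale=1.0, size=40)

    # (a) Descriptive summaries: n, mean, SD, and a five-point summary.
    for name, x in [("treatment", treatment), ("control", control)]:
        five = np.percentile(x, [0, 25, 50, 75, 100])
        print(f"{name}: n={x.size}, mean={x.mean():.2f}, "
              f"sd={x.std(ddof=1):.2f}, five-point={np.round(five, 2)}")

    # (b) Direction and size of the effect, with a confidence interval,
    # reported alongside (not instead of) the p value.
    t, p = stats.ttest_ind(treatment, control)
    n1, n2 = treatment.size, control.size
    pooled_sd = np.sqrt(((n1 - 1) * treatment.var(ddof=1)
                         + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
    d = (treatment.mean() - control.mean()) / pooled_sd  # Cohen's d
    se_d = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    print(f"t = {t:.2f}, p = {p:.3f}, d = {d:.2f}, "
          f"approx. 95% CI = ({d - 1.96 * se_d:.2f}, {d + 1.96 * se_d:.2f})")

Note that the effect size and its interval are reported alongside, not instead of, the p value, consistent with the policy of inclusiveness stated above.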
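
     Item (c) can be pursued in many ways (influence statistics, missing-data diagnostics, and so on); one minimal, hypothetical sketch is a sensitivity re-analysis in which extreme observations are set aside to see whether the substantive conclusion depends on them. The three-standard-deviation cutoff below is an arbitrary illustration, not a recommended rule.

    import numpy as np
    from scipy import stats

    def sensitivity_check(a, b, z_cut=3.0):
        # Re-run a two-sample t test with |z| >= z_cut observations removed,
        # to see whether the conclusion rests on a few extreme points.
        a_trim = a[np.abs(stats.zscore(a)) < z_cut]
        b_trim = b[np.abs(stats.zscore(b)) < z_cut]
        t_full, p_full = stats.ttest_ind(a, b)
        t_trim, p_trim = stats.ttest_ind(a_trim, b_trim)
        removed = (a.size - a_trim.size) + (b.size - b_trim.size)
        print(f"full data:    t = {t_full:.2f}, p = {p_full:.3f}")
        print(f"trimmed data: t = {t_trim:.2f}, p = {p_trim:.3f} "
              f"({removed} observations removed)")

    # Same hypothetical data as in the previous sketch.
    rng = np.random.default_rng(0)
    treatment = rng.normal(0.5, 1.0, size=40)
    control = rng.normal(0.0, 1.0, size=40)
    sensitivity_check(treatment, control)

If the full and trimmed analyses disagree, the report should say so and discuss why, rather than presenting only the more favorable result.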

(2) The need for theory-generating studies

     In its recent history, psychology has been dominated by the hypothetico-deductive approach. It is the view of the task force that researchers have too often been forced into the premature formulation of theoretical models in order to have their work funded or published. The premature formulation of theoretical models has often led to the worst problems seen in the use of null hypothesis testing, such as the misrepresentation of exploratory results as confirmatory studies, or the poor design of confirmatory studies in the absence of necessary exploratory results. We propose that the field become more open to well-formulated and well-conducted exploratory studies with the appropriate quantitative treatment of their results, thereby enhancing the quality and utility of future theory generation and assessment.

(3) The use of minimally sufficient designs and analytic strategies

     The wide array of quantitative techniques and the vast number of designs available to address research questions leave the researcher with the non-trivial task of matching analysis and design to the research question. Many forces (including reviewers of grants and papers, journal editors, and dissertation advisors) compel researchers to select increasingly complex ("state-of-the-art," "cutting edge," etc.) analytic and design strategies. Such complex designs and analytic strategies are sometimes necessary to address research questions effectively, but simpler approaches can also provide elegant answers to important questions. It is the recommendation of the task force that the principle of parsimony be applied to the selection of designs and analyses. The minimally sufficient design and analysis is typically to be preferred because:
(a) it is often based on the fewest and least restrictive assumptions,
(b) its use is less prone to errors of application, and errors are more easily recognized, and
(c) its results are easier to communicate--to both the scientific and lay communities.

     This is not to say that new advances in both design and analysis are not needed, but simply that newer is not necessarily better and that more complex is not necessarily preferable.

(4) Issues with computerized data analysis

     Elegant and sophisticated computer programs allow us to analyze data in ways that were not possible only a short time ago. The ease of access to state-of-the-art statistical analysis packages, however, has not universally advanced our science. Common misuses of computerized data analysis include:
(a) reporting statistics without understanding how they are computed or what they mean,
(b) relying on results without regard to their reasonableness, or without verification by independent computation, and
(c) reporting results to greater precision than the data support, simply because they are printed by the program.

     The task force encourages efforts to avoid the sanctification of computerized data analysis. Computer programs have placed a much greater demand on researchers to understand and control their analysis and design choices. A brief sketch below illustrates points (b) and (c).
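
     As one hypothetical illustration of points (b) and (c), a package-reported statistic can be verified by recomputing it independently from its textbook definition, and then reported at a precision the sample size can actually support. The sketch below uses the numpy and scipy libraries; the data and variable names are invented for illustration.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = rng.normal(size=25)            # hypothetical paired measurements
    y = x + rng.normal(size=25)

    # Result as reported by the package.
    r_pkg, p = stats.pearsonr(x, y)

    # Independent computation from the definition of the correlation.
    xc, yc = x - x.mean(), y - y.mean()
    r_hand = (xc * yc).sum() / np.sqrt((xc**2).sum() * (yc**2).sum())
    assert np.isclose(r_pkg, r_hand), "package and hand computation disagree"

    # Report to a defensible precision (here two decimals for n = 25),
    # not the full machine precision the program happens to print.
    print(f"r = {r_pkg:.2f} (program printed {r_pkg!r})")

If the independent computation and the package disagree, the discrepancy should be resolved before any result is reported.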

