303 11 Siegel Nonparametric statistics for the behavioral sciences

Nonparametric Statistics for the Behavioral Sciences

SIDNEY SIEGEL
Associate Professor of Statistics and Social Psychology
The Pennsylvania State University

McGRAW-HILL BOOK COMPANY, INC.
New York    Toronto    London
1956

NONPARAMETRIC STATISTICS: FOR THE BEHAVIORAL SCIENCES

Copyright © 1956 by the McGraw-Hill Book Company, Inc. Printed in the United States of America. All rights reserved. This book, or parts thereof, may not be reproduced in any form without permission of the publishers.

Library of Congress Catalog Card Number 56-8185

THE MAPLE PRESS COMPANY, YORK, PA.

To Jay

PREFACE

I believe that the nonparametric techniques of hypothesis testing are uniquely suited to the data of the behavioral sciences. The two alternative names which are frequently given to these tests suggest two reasons for their suitability. The tests are often called "distribution-free," one of their primary merits being that they do not assume that the scores under analysis were drawn from a population distributed in a certain way, e.g., from a normally distributed population. Alternatively, many of these tests are identified as "ranking tests," and this title suggests their other principal merit: nonparametric techniques may be used with scores which are not exact in any numerical sense, but which in effect are simply ranks. A third advantage of these techniques, of course, is their computational simplicity. Many believe that researchers and students in the behavioral sciences need to spend more time and reflection in the careful formulation of their research problems and in collecting precise and relevant data. Perhaps they will turn more attention to these pursuits if they are relieved of the necessity of computing statistics which are complicated and time-consuming. A final advantage of the nonparametric tests is their usefulness with small samples, a feature which should be helpful to the researcher collecting pilot study data and to the researcher whose samples must be small because of their very nature (e.g., samples of persons with a rare form of mental illness, or samples of cultures).

To date, no source is available which presents the nonparametric techniques in usable form and in terms which are familiar to the behavioral scientist. The techniques are presented in various mathematics and statistics publications. Most behavioral scientists do not have the mathematical sophistication required for consulting these sources. In addition, certain writers have presented summaries of these techniques in articles addressed to social scientists. Notable among these are Blum and Fattu (1954), Moses (1952a), Mosteller and Bush (1954), and Smith (1953). Moreover, some of the newer texts on statistics for social scientists have contained chapters on nonparametric methods. These include the texts by Edwards (1954), McNemar (1955), and Walker and Lev (1953). Valuable as these sources are, they have typically either been highly selective in the techniques presented or have not included the tables of significance values which are used in the application of the various tests. Therefore I have felt that a text on the nonparametric methods would be a desirable addition to the literature formed by the sources mentioned.

In this book I have presented the tests according to the research design for which each is suited. In discussing each test, I have attempted to indicate its "function," i.e., to indicate the sort of data to which it is applicable, to convey some notion of the rationale or proof underlying the test, to explain its computation, to give examples of its application in behavioral scientific research, and to compare the test to its parametric equivalent, if any, and to any nonparametric tests of similar function.

The reader may be surprised at the amount of space given to examples of the use of these tests, and even astonished at the repetitiousness which these examples introduce. I may justify this allocation of space by pointing out that (a) the examples help to teach the computation of the test, (b) the examples illustrate the application of the test to research problems in the behavioral sciences, and (c) the use of the same six steps in every hypothesis test demonstrates that identical logic underlies each of the many statistical techniques, a fact which is not well understood by many researchers.

Since I have tried to present all the raw data for each of the examples, I was not able to draw these from a catholic group of sources. Research publications typically do not present raw data, and therefore I was compelled to draw upon a rather parochial group of sources for most examples: those sources from which raw data were readily available. The reader will understand that this is an apology for the frequency with which I have presented in the examples my own research and that of my immediate colleagues. Sometimes I have not found appropriate data to illustrate the use of a test and therefore have "concocted" data for the purpose.

In writing this book, I have become acutely aware of the important influence which various teachers and colleagues have exercised upon my thinking. Professor Quinn McNemar gave me fundamental training in statistical inference and first introduced me to the importance of the assumptions underlying various statistical tests. Professor Lincoln Moses has enriched my understanding of statistics, and it was he who first interested me in the literature of nonparametric statistics. My study with Professor George Polya yielded exciting insights in probability theory. Professors Kenneth J. Arrow, Albert H. Bowker, Douglas H. Lawrence, and the late J. C. C. McKinsey have each contributed significantly to my understanding of statistics and experimental design. My comprehension of measurement theory was deepened by my research collaboration with Professors Donald Davidson and Patrick Suppes.

This book has benefited enormously from the stimulating and detailed suggestions and criticisms which Professors James B. Bartoo, Quinn McNemar, and Lincoln Moses gave me after each had read the manuscript. I am greatly indebted to each of them for their valuable gifts of time and knowledge. I am also grateful to Professors John F. Hall and Robert E. Stover, who encouraged my undertaking to write this book and who contributed helpful critical comments on some of the chapters. Of course, none of these persons is in any way responsible for the faults which remain; these are entirely my responsibility, and I should be grateful if any readers who detect errors and obscurities would call my attention to them.

Much of the usefulness of this book is due to the generosity of the many authors and publishers who have kindly permitted me to adapt or reproduce tables and other material originally presented by them. I have mentioned each source where the materials appear, and I also wish to mention here my gratitude to Donovan Auble, Irvin L. Child, Frieda Swed Cohn, Churchill Eisenhart, D. J. Finney, Milton Friedman, Leo A. Goodman, M. G. Kendall, William Kruskal, Joseph Lev, Henry B. Mann, Frank J. Massey, Jr., Edwin G. Olds, George W. Snedecor, Helen M. Walker, W. Allen Wallis, John E. Walsh, John W. M. Whiting, D. R. Whitney, and Frank Wilcoxon, and to the Institute of Mathematical Statistics, the American Statistical Association, Biometrika, the American Psychological Association, Iowa State College Press, Yale University Press, the Institute of Educational Research at Indiana University, the American Cyanamid Company, Charles Griffin & Co., Ltd., John Wiley & Sons, Inc., and Henry Holt and Company, Inc. I am indebted to Professor Sir Ronald A. Fisher, Cambridge, to Dr. Frank Yates, Rothamsted, and to Messrs. Oliver and Boyd, Ltd., Edinburgh, for permission to reprint Tables No. III and IV from their book Statistical Tables for Biological, Agricultural, and Medical Research.

My greatest personal indebtedness is to my wife, Dr. Alberta Engvall Siegel, without whose help this book could not have been written. She has worked closely with me in every phase of its planning and writing. I know it has benefited not only from her knowledge of the behavioral sciences but also from her careful editing, which has greatly enhanced any expository merits the book may have.

SIDNEY SIEGEL

CONTENTS

PREFACE
GLOSSARY OF SYMBOLS

CHAPTER 1. INTRODUCTION

CHAPTER 2. THE USE OF STATISTICAL TESTS IN RESEARCH
  i. The Null Hypothesis
  ii. The Choice of the Statistical Test
  iii. The Level of Significance and the Sample Size
  iv. The Sampling Distribution
  v. The Region of Rejection
  vi. The Decision
  Example

CHAPTER 3. CHOOSING AN APPROPRIATE STATISTICAL TEST
  The Statistical Model
  Power-Efficiency
  Measurement
  Parametric and Nonparametric Statistical Tests

CHAPTER 4. THE ONE-SAMPLE CASE
  The Binomial Test
  The $\chi^2$ One-sample Test
  The Kolmogorov-Smirnov One-sample Test
  The One-sample Runs Test
  Discussion

CHAPTER 5. THE CASE OF TWO RELATED SAMPLES
  The McNemar Test for the Significance of Changes
  The Sign Test
  The Wilcoxon Matched-pairs Signed-ranks Test
  The Walsh Test
  The Randomization Test for Matched Pairs
  Discussion

CHAPTER 6. THE CASE OF TWO INDEPENDENT SAMPLES
  The Fisher Exact Probability Test
  The $\chi^2$ Test for Two Independent Samples
  The Median Test
  The Mann-Whitney U Test
  The Kolmogorov-Smirnov Two-sample Test
  The Wald-Wolfowitz Runs Test
  The Moses Test of Extreme Reactions
  The Randomization Test for Two Independent Samples
  Discussion

CHAPTER 7. THE CASE OF k RELATED SAMPLES
  The Cochran Q Test
  The Friedman Two-way Analysis of Variance by Ranks
  Discussion

CHAPTER 8. THE CASE OF k INDEPENDENT SAMPLES
  The $\chi^2$ Test for k Independent Samples
  The Extension of the Median Test
  The Kruskal-Wallis One-way Analysis of Variance by Ranks
  Discussion

CHAPTER 9. MEASURES OF CORRELATION AND THEIR TESTS OF SIGNIFICANCE
  The Contingency Coefficient: C
  The Spearman Rank Correlation Coefficient: $r_S$
  The Kendall Rank Correlation Coefficient: $\tau$
  The Kendall Partial Rank Correlation Coefficient: $\tau_{xy.z}$
  The Kendall Coefficient of Concordance: W
  Discussion

REFERENCES

APPENDIX
  A. Table of Probabilities Associated with Values as Extreme as Observed Values of z in the Normal Distribution
  B. Table of Critical Values of t
  C. Table of Critical Values of Chi Square
  D. Table of Probabilities Associated with Values as Small as Observed Values of x in the Binomial Test
  E. Table of Critical Values of D in the Kolmogorov-Smirnov One-sample Test
  F. Table of Critical Values of r in the Runs Test
  G. Table of Critical Values of T in the Wilcoxon Matched-pairs Signed-ranks Test
  H. Table of Critical Values for the Walsh Test
  I. Table of Critical Values of D (or C) in the Fisher Test
  J. Table of Probabilities Associated with Values as Small as Observed Values of U in the Mann-Whitney Test
  K. Table of Critical Values of U in the Mann-Whitney Test
  L. Table of Critical Values of $K_D$ in the Kolmogorov-Smirnov Two-sample Test (Small Samples)
  M. Table of Critical Values of D in the Kolmogorov-Smirnov Two-sample Test (Large Samples: Two-tailed Test)
  N. Table of Probabilities Associated with Values as Large as Observed Values of $\chi_r^2$ in the Friedman Two-way Analysis of Variance by Ranks
  O. Table of Probabilities Associated with Values as Large as Observed Values of H in the Kruskal-Wallis One-way Analysis of Variance by Ranks
  P. Table of Critical Values of $r_S$, the Spearman Rank Correlation Coefficient
  Q. Table of Probabilities Associated with Values as Large as Observed Values of S in the Kendall Rank Correlation Coefficient
  R. Table of Critical Values of s in the Kendall Coefficient of Concordance
  S. Table of Factorials
  T. Table of Binomial Coefficients
  U. Table of Squares and Square Roots

INDEX

GLOSSARY OF SYMBOLS

A: Upper left-hand cell in a 2 × 2 table; number of cases observed in that cell.
$\alpha$: Alpha. Level of significance = probability of a Type I error.
B: Upper right-hand cell in a 2 × 2 table; number of cases observed in that cell.
$\beta$: Beta. Probability of a Type II error; power of the test = $1 - \beta$.
C: Lower left-hand cell in a 2 × 2 table; number of cases observed in that cell.
C: Contingency coefficient.
$\chi^2$: Chi square. A random variable which follows the chi-square distribution, certain values of which are shown in Table C of the Appendix.
$\chi^2$: A statistic whose value is computed from observed data.
$\chi_r^2$: The statistic in the Friedman two-way analysis of variance by ranks.
d: A difference score, used in the case of matched pairs, obtained for any pair by subtracting the score of one member from that of the other.
df: Degrees of freedom.
D: Lower right-hand cell in a 2 × 2 table; number of cases observed in that cell.
D: The maximum difference between the two cumulative distributions in the Kolmogorov-Smirnov test.
$E_{ij}$: Under $H_0$, the expected number of cases in the ith row and the jth column in a $\chi^2$ test.
f: Frequency, i.e., number of cases.
F: The F test: the parametric analysis of variance.
$F_0(X)$: Under $H_0$, the proportion of cases in the population whose scores are equal to or less than X. This is a statistic in the Kolmogorov-Smirnov test.
g: In the Moses test, the amount by which an observed value of $s_h$ exceeds $n_C - 2h$, where $n_C - 2h$ is the minimum span of the ranks of the control cases.
$G_j$: In the Cochran Q test, the total number of "successes" in the jth column (sample).
h: In the Moses test, the predetermined number of extreme control ranks which are dropped from each end of the span of control ranks before $s_h$ is determined.
H: The statistic used in the Kruskal-Wallis one-way analysis of variance by ranks.
$H_0$: The null hypothesis.
$H_1$: The alternative hypothesis, the operational statement of the research hypothesis.
i: A variable subscript, usually denoting rows.

j: A variable subscript, usually denoting columns.
K: In the Kolmogorov-Smirnov test, the number of observations which are equal to or less than X.
$K_D$: In the Kolmogorov-Smirnov test, the numerator of D.
$L_i$: In the Cochran Q test, the total number of "successes" in the ith row.
$\mu$: Mu. The population mean.
$\mu_0$: The population mean under $H_0$.
$\mu_1$: The population mean under $H_1$.
n: The number of independently drawn cases in a single sample.
N: The total number of independently drawn cases used in a statistical test.
$O_{ij}$: The observed number of cases in the ith row and the jth column in a $\chi^2$ test.
p: Probability associated with the occurrence under $H_0$ of a value as extreme as or more extreme than the observed value.
P: In the binomial test, the proportion of "successes."
Q: In the binomial test, $1 - P$.
Q: The statistic used in the Cochran test.
r: The number of runs.
r: The Pearson product-moment correlation coefficient.
r: The number of rows in a k × r table.
$R_j$: The sum of the ranks in the jth column or sample.
$r_S$: The Spearman rank correlation coefficient.
$r_{S_{av}}$: The mean of several $r_S$'s.
s: In the Kendall W, the sum of the squares of the deviations of the $R_j$ from the mean value of $R_j$.
s': In the Moses test, the span or range of the ranks of the control cases.
$s_h$: In the Moses test, the span or range of the ranks of the control cases after h cases have been dropped from each extreme of that range.
S: A statistic in the Kendall $\tau$.
$S_N(X)$: In the Kolmogorov-Smirnov test, the observed cumulative step function of a random sample of N observations.
$\sigma$: Sigma. The standard deviation of the population. When a subscript is given, the standard error of a sampling distribution, for example, $\sigma_U$ = the standard error of the sampling distribution of U.
$\sigma^2$: The variance of the population.
$\Sigma$: Summation of.
t: Student's t test, a parametric test.
t: The number of observations in any tied group.
T: In the Wilcoxon test, the smaller of the sums of like-signed ranks.
T: A correction factor for ties.
$\tau$: Tau. The Kendall rank correlation coefficient.
$\tau_{xy.z}$: The Kendall partial rank correlation coefficient.
U: The statistic in the Mann-Whitney test.
U': $U = n_1 n_2 - U'$, a transformation in the Mann-Whitney test.
W: The Kendall coefficient of concordance.
x: In the binomial test, the number of cases in one of the groups.
X: Any observed score.
$\bar{X}$: The mean of a sample of observations.
z: Deviation of the observed value from $\mu_0$ when $\sigma = 1$. z is normally distributed. Probabilities associated under $H_0$ with the occurrence of values as extreme as various z's are given in Table A of the Appendix.

$\binom{a}{b}$: The binomial coefficient, $\binom{a}{b} = \frac{a!}{b!\,(a-b)!}$. Table T of the Appendix gives binomial coefficients for N from 1 to 20.
!: Factorial. $N! = N(N-1)(N-2)\cdots 1$. For example, $5! = (5)(4)(3)(2)(1) = 120$. $0! = 1$. Table S of the Appendix gives factorials for N from 1 to 20.
$|X - Y|$: The absolute value of the difference between X and Y. That is, the numerical value of the difference regardless of sign. For example, $|5 - 3| = |3 - 5| = 2$.
$X > Y$: X is greater than Y.
$X < Y$: X is less than Y.

[...] may be accepted. The alternative hypothesis is the operational statement of the experimenter's research hypothesis. The research hypothesis is the prediction derived from the theory under test. When we want to make a decision about differences, we test $H_0$ against $H_1$. $H_1$ constitutes the assertion that is accepted if $H_0$ is rejected.

Suppose a certain social scientific theory would lead us to predict that two specified groups of people differ in the amount of time they spend in reading newspapers. This prediction would be our research hypothesis. Confirmation of that prediction would lend support to the social scientific theory from which it was derived. To test this research hypothesis, we state it in operational form as the alternative hypothesis, $H_1$. $H_1$ would be that $\mu_1 \neq \mu_2$, that is, that the mean amount of time spent in reading newspapers by the members of the two populations is unequal. $H_0$ would be that $\mu_1 = \mu_2$, that is, that the mean amount of time spent in reading newspapers by the members of the two populations is the same. If the data permit us to reject $H_0$, then $H_1$ can be accepted, and this would support the research hypothesis and its underlying theory.

The nature of the research hypothesis determines how $H_1$ should be stated. If the research hypothesis simply states that two groups will differ with respect to means, then $H_1$ is that $\mu_1 \neq \mu_2$. But if the theory predicts the direction of the difference, i.e., that one specified group will have a larger mean than the other, then $H_1$ may be either that $\mu_1 > \mu_2$ or that $\mu_1 < \mu_2$ (where > means "greater than" and < means "less than").

ii. THE CHOICE OF THE STATISTICAL TEST

The field of statistics has developed to the extent that we now have, for almost all research designs, alternative statistical tests which might be used in order to come to a decision about a hypothesis. Having alternative tests, we need some rational basis for choosing among them. Since this book is concerned with nonparametric statistics, the choice among (parametric and nonparametric) statistics is one of its central topics. Therefore the discussion of this point is reserved for a separate chapter; Chap. 3 gives an extended discussion of the bases for choosing among the various tests applicable to a given research design.

iii. THE LEVEL OF SIGNIFICANCE AND THE SAMPLE SIZE

When the null hypothesis and the alternative hypothesis have been stated, and when the statistical test appropriate to the research has been selected, the next step is to specify a level of significance ($\alpha$) and to select a sample size ($N$).

In brief, this is our decision-making procedure: In advance of the data collection, we specify the set of all possible samples that could occur when $H_0$ is true. From these, we specify a subset of possible samples which are so extreme that the probability is very small, if $H_0$ is true, that the sample we actually observe will be among them. If in our research we then observe a sample which was included in that subset, we reject $H_0$.

Stated differently, our procedure is to reject $H_0$ in favor of $H_1$ if a statistical test yields a value whose associated probability of occurrence under $H_0$ is equal to or less than some small probability symbolized as $\alpha$. That small probability is called the level of significance. Common values of $\alpha$ are .05 and .01. To repeat: if the probability associated with the occurrence under $H_0$, i.e., when the null hypothesis is true, of the particular value yielded by a statistical test is equal to or less than $\alpha$, we reject $H_0$ and accept $H_1$, the operational statement of the research hypothesis.¹

¹ In contemporary statistical decision theory, the procedure of adhering rigidly to an arbitrary level of significance, say .05 or .01, has been rejected in favor of the procedure of making decisions in terms of loss functions, utilizing such principles as the minimax principle (the principle of minimizing the maximum loss). For a discussion of this approach, the reader may turn to Blackwell and Girshick (1954), Savage (1954), or Wald (1950). Although the desirability of such a technique for arriving at decisions is clear, its practicality in most research in the behavioral sciences at present is dubious, because we lack the information which would be basic to the use of loss functions.

A common practice, which reflects the notion that different investigators and readers may hold different views as to the "losses" or "gains" involved in implementing a social scientific finding, is for the researcher simply to report the probability level associated with his finding, indicating that the null hypothesis may be rejected at that level.

From the discussion of significance levels which is given in this book, the reader should not infer that the writer believes in a rigid or hard-and-fast approach to the setting of significance levels. Rather, it is for heuristic reasons that significance levels are emphasized; such an exposition seems the best method of clarifying the role which the information contained in the sampling distribution plays in the decision-making procedure.

It can be seen, then, that $\alpha$ gives the probability of mistakenly or falsely rejecting $H_0$. This interpretation of $\alpha$ will be amplified when the Type I error is discussed. Since the value of $\alpha$ enters into the determination of whether $H_0$ is or is not rejected, the requirement of objectivity demands that $\alpha$ be set in advance of the collection of the data.

The level at which the researcher chooses to set $\alpha$ should be determined by his estimate of the importance or possible practical significance of his findings. In a study of the possible therapeutic effects of brain surgery, for example, the researcher may well choose to set a rather stringent level of significance, for the dangers of rejecting the null hypothesis improperly (and therefore unjustifiably advocating or recommending a drastic clinical technique) are great indeed. In reporting his findings, the researcher should indicate the actual probability level associated with his findings, so that the reader may use his own judgment in deciding whether or not the null hypothesis should be rejected. A researcher may decide to work at the .05 level, but a reader may refuse to accept any finding not significant at the .01, .005, or .001 levels, while another reader may be interested in any finding which reaches, say, the .08 or .10 levels. The researcher should give his readers the information they require by reporting, if possible, the probability level actually associated with the finding.
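The decision rule described above can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the function names `decide` and `report` and the particular list of levels are hypothetical, chosen to mirror the levels mentioned in the paragraph.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 when the associated probability is <= alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be a probability between 0 and 1")
    return "reject H0" if p_value <= alpha else "do not reject H0"


def report(p_value: float, alphas=(0.10, 0.05, 0.01, 0.001)) -> dict:
    """Show how readers working at different levels would decide.

    The levels in `alphas` are illustrative assumptions only.
    """
    return {alpha: decide(p_value, alpha) for alpha in alphas}


# A finding with associated probability .03 is significant at the
# .10 and .05 levels but not at the .01 or .001 levels.
for alpha, decision in report(0.03).items():
    print(f"alpha = {alpha}: {decision}")
```

Reporting the associated probability itself, rather than only the decision at one level, lets each reader apply whichever level he considers appropriate.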

There are two types of errors which may be made in arriving at a decision about $H_0$. The first, the Type I error, is to reject $H_0$ when in fact it is true. The second, the Type II error, is to accept $H_0$ when in fact it is false.

The probability of committing a Type I error is given by $\alpha$. The larger is $\alpha$, the more likely it is that $H_0$ will be rejected falsely, i.e., the more likely it is that the Type I error will be committed. The Type II error is usually represented by $\beta$. $\alpha$ and $\beta$ will be used here to indicate both the type of error and the probability of making that error. That is,

$p(\text{Type I error}) = \alpha$
$p(\text{Type II error}) = \beta$

Ideally, the specific values of both $\alpha$ and $\beta$ would be specified by the experimenter before he began his research. These values would determine the size of the sample ($N$) he would have to draw for computing the statistical test he had chosen.

In practice, however, it is usual for $\alpha$ and $N$ to be specified in advance. Once $\alpha$ and $N$ have been specified, $\beta$ is determined. Inasmuch as there is an inverse relation between the likelihood of making the two types of errors, a decrease in $\alpha$ will increase $\beta$ for any given $N$. If we wish to reduce the possibility of both types of errors, we must increase $N$.
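The inverse relation between $\alpha$ and $\beta$ can be made concrete with a small calculation. The sketch below assumes a setting that the text does not specify: an upper-tailed z test of a mean with known standard deviation, and an arbitrary true difference of half a standard deviation. The function name `beta_upper_tailed` is hypothetical. It needs Python 3.8 or later for `statistics.NormalDist`.

```python
from statistics import NormalDist


def beta_upper_tailed(alpha: float, n: int, effect: float, sigma: float = 1.0) -> float:
    """Probability of a Type II error for an upper-tailed z test of a mean.

    effect is (true mean - mean under H0); sigma is assumed known.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # boundary of the rejection region
    shift = effect * n ** 0.5 / sigma        # mean of the z statistic when H1 is true
    return std_normal.cdf(z_crit - shift)    # chance the statistic falls short of z_crit


# Illustrative values only: N = 25 or 100, true difference of 0.5 sigma.
for n in (25, 100):
    for alpha in (0.05, 0.01):
        beta = beta_upper_tailed(alpha, n, effect=0.5)
        print(f"N = {n:3d}  alpha = {alpha:.2f}  beta = {beta:.3f}")
```

For a fixed N, lowering $\alpha$ from .05 to .01 raises $\beta$; raising N lowers $\beta$ at either level.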

It should be clear that in any statistical inference a danger exists of committing one of the two alternative types of errors, and that the experimenter should reach some compromise which optimizes the balance between the probabilities of making the two errors. The various statistical tests offer the possibility of different balances. It is in achieving this balance that the notion of the power function of a statistical test is relevant.

The power of a test is defined as the probability of rejecting $H_0$ when it is in fact false. That is,

$\text{Power} = 1 - \text{probability of Type II error} = 1 - \beta$

The curves in Fig. 1 show that the probability of committing a Type II error ($\beta$) decreases as the sample size ($N$) increases, and thus that power increases with the size of $N$. Figure 1 illustrates the increase in power of the two-tailed test of the mean which comes with increasing sample sizes: $N$ = 4, 10, 20, 50, and 100. These samples are taken from normal populations with variance $\sigma^2$. The mean under the null hypothesis is symbolized here as $\mu_0$.

FIG. 1. Power curves of the two-tailed test at $\alpha$ = .05 with varying sample sizes.

Figure 1 also shows that when $H_0$ is true, i.e., when the true mean = $\mu_0$, the probability of rejecting $H_0$ = .05. This is as it should be, inasmuch as $\alpha$ = .05, and $\alpha$ gives the probability of rejecting $H_0$ when it is in fact true.
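Since the figure itself is not reproduced here, the shape of such power curves can be recomputed. The following is a minimal sketch, assuming a two-tailed z test of the mean with known $\sigma$; the function name `power_two_tailed` and the true mean used in the loop are illustrative assumptions, not values taken from the original figure.

```python
from statistics import NormalDist


def power_two_tailed(true_mean: float, mu0: float, sigma: float, n: int,
                     alpha: float = 0.05) -> float:
    """Probability of rejecting H0: mean = mu0 with a two-tailed z test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Mean of the z statistic when the population mean is really true_mean.
    shift = (true_mean - mu0) * n ** 0.5 / sigma
    lower_tail = std_normal.cdf(-z_crit - shift)      # statistic below -z_crit
    upper_tail = 1 - std_normal.cdf(z_crit - shift)   # statistic above +z_crit
    return lower_tail + upper_tail


# The sample sizes are those named in the text; the true mean of
# mu0 + 0.5 sigma is an arbitrary illustrative choice.
for n in (4, 10, 20, 50, 100):
    print(n, round(power_two_tailed(true_mean=0.5, mu0=0.0, sigma=1.0, n=n), 3))
```

When `true_mean` equals `mu0` the function returns $\alpha$ for every N, which matches the observation that the probability of rejecting a true $H_0$ is .05.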

From this discussion it is important that the reader understand the following five points, which summarize what we have said about the selection of the level of significance and of the sample size:

1. The significance level $\alpha$ is the probability that a statistical test will yield a value under which the null hypothesis will be rejected when in fact it is true. That is, the significance level indicates the probability of committing the Type I error.

2. $\beta$ is the probability that a statistical test will yield a value under which the null hypothesis will be accepted when in fact it is false. That is, $\beta$ gives the probability of committing the Type II error.

3. The power of a test, $1 - \beta$, tells the probability of rejecting the null hypothesis when it is false (and thus should be rejected).

4. Power is related to the nature of the statistical test chosen.²

5. Generally the power of a statistical test increases with an increase in $N$.

iv. THE SAMPLING DISTRIBUTION

When an experimenter has chosen a certain statistical test to use with his data, he must next determine what is the sampling distribution of the test statistic.

The sampling distribution is a theoretical distribution. It is that distribution we would get if we took all possible samples of the same size from the same population, drawing each randomly. Another way of saying this is to say that the sampling distribution is the distribution, under $H_0$, of all possible values that some statistic (say the sample mean, $\bar{X}$) can take when that statistic is computed from randomly drawn samples of equal size.

The sampling distribution of a statistic shows the probabilities under $H_0$ associated with various possible numerical values of the statistic. The probability "associated with" the occurrence of a particular value of the statistic under $H_0$ is not the exact probability of just that value. Rather, "the probability associated with the occurrence under $H_0$" is here used to refer to the probability of a particular value plus the probabilities of all more extreme possible values. That is, the "associated probability" or "the probability associated with the occurrence under $H_0$" is the probability of the occurrence under $H_0$ of a value as extreme as or more extreme than the particular value of the test statistic. In this book we shall have frequent occasion to use the above phrases, and in each case they shall carry the meaning given above.

Suppose we were interested in the probability that three heads would land up when three "fair" coins were tossed simultaneously. The sampling distribution of the number of heads could be drawn from the list of all possible results of tossing three coins, which is given in Table 2.1. The total number of possible events (possible combinations of H's and T's: heads and tails) is eight, only one of which is the event in which we are interested: the simultaneous occurrence of three H's. Thus the probability of the occurrence under $H_0$ of three heads on the toss of three coins is 1/8. Here $H_0$ is the assertion that the coins are "fair," which means that for each coin the probability of a head occurring is equal to the probability of a tail occurring.

TABLE 2.1. POSSIBLE OUTCOMES OF THE TOSS OF THREE COINS

                    Outcomes
          1   2   3   4   5   6   7   8
Coin 1    H   H   H   T   H   T   T   T
Coin 2    H   H   T   H   T   H   T   T
Coin 3    H   T   H   H   T   T   H   T

Thus the sampling distribution of all possible events has shown us the probability of the occurrence under $H_0$ of the event with which we are concerned.

² Power is also related to the nature of $H_1$. If $H_1$ has direction, a one-tailed test is called for. A one-tailed test is more powerful than a two-tailed test. This should be clear from the definition of power.
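The enumeration used in the coin example can be reproduced mechanically. The sketch below lists every outcome of tossing fair coins, builds the sampling distribution of the number of heads, and computes an associated probability. The function names are hypothetical, and "more extreme" is taken here to mean "more heads," which is an assumption made for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_coins: int = 3) -> dict:
    """Sampling distribution of the number of heads when n_coins fair coins are tossed."""
    outcomes = list(product("HT", repeat=n_coins))          # all equally likely outcomes
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {k: Fraction(c, len(outcomes)) for k, c in sorted(counts.items())}


def associated_probability(dist: dict, observed: int) -> Fraction:
    """Probability under H0 of a value as large as or larger than `observed`."""
    return sum(p for k, p in dist.items() if k >= observed)


dist = heads_distribution(3)
for k, p in dist.items():
    print(f"{k} heads: {p}")
print("P(3 heads) =", dist[3])
print("P(2 or more heads) =", associated_probability(dist, 2))
```

With three coins there are eight outcomes, so the probability of three heads is 1/8, and the probability associated with two heads (two or more) is 4/8.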

It is obvious that it would be essentially impossible for us to use this method of imagining all possible results in order to write down the sampling distributions for even moderately large samples from large populations. This being the case, we rely on the authority of statements of "proved" mathematical theorems. These theorems invariably involve assumptions, and in applying the theorems we must keep the assumptions in mind. Usually these assumptions concern the distribution of the population and/or the size of the sample. An example of such a theorem is the central-limit theorem.

When a variable is normally distributed, its distribution is completely characterized by the mean and the standard deviation. This being the case, we know, for example, that the probability that an observed value of such a variable will differ from the mean by more than 1.96 standard deviations is less than .05. (The probabilities associated with any difference in standard deviations from the mean of a normally distributed variable are given in Table A of the Appendix.)

Suppose then we want to know, before the sample is drawn, the probability associated with the occurrence of a particular value of $\bar{X}$ (the arithmetic mean of the sample), i.e., the probability under $H_0$ of the occurrence of a value at least as large as a particular value of $\bar{X}$, when the sample is randomly drawn from some population whose mean $\mu$ and standard deviation $\sigma$ we know. One version of the central-limit theorem states that:

If a variable is distributed with mean = $\mu$ and standard deviation = $\sigma$, and if random samples of size $N$ are drawn, then the means of these samples, the $\bar{X}$'s, will be approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{N}$ for $N$ sufficiently large.

In other words, if $N$ is sufficiently large, we know that the sampling distribution of $\bar{X}$ (a) is approximately normal, (b) has a mean equal to the population mean $\mu$, and (c) has a standard deviation which is equal to the population standard deviation divided by the square root of the sample size, that is, $\sigma_{\bar{X}} = \sigma/\sqrt{N}$.
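These three properties can be checked empirically. The following is a rough simulation sketch, not part of the original text: it assumes a uniform population on (0, 1), which is deliberately non-normal, a sample size of 30, and 5,000 repeated samples. All of these values, and the function name, are arbitrary illustrative choices.

```python
import random
import statistics


def simulate_sample_means(n: int, n_samples: int = 5000, seed: int = 0) -> list:
    """Draw n_samples random samples of size n from Uniform(0, 1); return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean([rng.random() for _ in range(n)])
            for _ in range(n_samples)]


POP_MEAN = 0.5              # mean of the Uniform(0, 1) population
POP_SD = (1 / 12) ** 0.5    # standard deviation of the Uniform(0, 1) population
N = 30

means = simulate_sample_means(N)
observed_mean = statistics.fmean(means)
observed_se = statistics.stdev(means)
expected_se = POP_SD / N ** 0.5   # sigma / sqrt(N), as the theorem states

print(f"mean of sample means: {observed_mean:.4f} (population mean {POP_MEAN})")
print(f"sd of sample means:   {observed_se:.4f} (sigma/sqrt(N) = {expected_se:.4f})")
```

The simulated mean and standard deviation of the sample means should fall close to $\mu$ and $\sigma/\sqrt{N}$; a histogram of `means` would show the approximately normal shape.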

For example, suppose we know that in the population of American college students, some psychological attribute, as measured by some test, is distributed with $\mu = 100$ and $\sigma = 16$. Now we want to know the probability of drawing a random sample of 64 cases from this population and finding that the mean score in that sample, $\bar{X}$, is as large as 104. The central-limit theorem tells us that the sampling distribution of $\bar{X}$'s of all possible samples of size 64 will be approximately normally distributed and will have a mean equal to 100 ($\mu = 100$) and a standard deviation equal to $\sigma/\sqrt{N} = 16/\sqrt{64} = 2$. We can see that 104 differs from 100 by two standard errors.¹ Reference to Table A reveals that the probability associated with the occurrence under $H_0$ of a value as large as such an observed value of $\bar{X}$, that is, of an $\bar{X}$ which is at least two standard errors above the mean ($z \geq 2.0$), is $p < .023$.
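The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch: the function name `upper_tail_p` is hypothetical, and the normal tail area is computed from the complementary error function rather than read from Table A.

```python
import math

def upper_tail_p(x_bar, mu, sigma, n):
    """Return (z, p) for observing a sample mean at least as large as x_bar.

    Uses the central-limit approximation: the sampling distribution of the
    mean is treated as normal with mean mu and standard error sigma/sqrt(n).
    """
    standard_error = sigma / math.sqrt(n)
    z = (x_bar - mu) / standard_error
    # Upper-tail area of the standard normal distribution beyond z.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

z, p = upper_tail_p(x_bar=104, mu=100, sigma=16, n=64)
print(f"z = {z:.1f}, p = {p:.4f}")
```

The computed tail area is just under .023, in agreement with the figure quoted from Table A.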

It should be clear from this discussion and this example that by knowing the sampling distribution of some statistic we are able to make probability statements about the occurrence of certain numerical values of that statistic. The following sections will show how we use such a probability statement in making a decision about $H_0$.

v. THE REGION OF REJECTION

The region of rejection is a region of the sampling distribution. The sampling distribution includes all possible values a test statistic can take under $H_0$; the region of rejection consists of a subset of these possible values, and is defined so that the probability under $H_0$ of the occurrence of a test statistic having a value which is in that subset is $\alpha$. In other words, the region of rejection consists of a set of possible values which are so extreme that when $H_0$ is true the probability is very small (i.e., the probability is $\alpha$) that the sample we actually observe will yield a value which is among them. The probability associated with any value in the region of rejection is equal to or less than $\alpha$.

The location of the region of rejection is affected by the nature of $H_1$. If $H_1$ indicates the predicted direction of the difference, then a one-tailed test is called for. If $H_1$ does not indicate the direction of the predicted difference, then a two-tailed test is called for. One-tailed and two-tailed tests differ in the location (but not in the size) of the region of rejection. That is, in a one-tailed test the region of rejection is entirely at one end (or tail) of the sampling distribution. In a two-tailed test, the region of rejection is located at both ends of the sampling distribution.

The size of the region of rejection is expressed by $\alpha$, the level of significance. If $\alpha = .05$, then the size of the region of rejection is 5 per cent of the entire space included under the curve in the sampling distribution. One-tailed and two-tailed regions of rejection for $\alpha = .05$ are illustrated in Fig. 2. Observe that these two regions differ in location but not in total size.

FIG. 2. Regions of rejection for one-tailed and two-tailed tests. A. Darkened area shows one-tailed region of rejection when $\alpha = .05$. B. Darkened area shows two-tailed region of rejection when $\alpha = .05$.

¹ The standard deviation of a sampling distribution is usually called a standard error.

vi. THE DECISION

If the statistical test yields a value which is in the region of rejection, we reject $H_0$.

The reasoning behind this decision process is very simple. If the probability associated with the occurrence under the null hypothesis of a particular value in the sampling distribution is very small, we may explain the actual occurrence of that value in two ways: first, we may explain it by deciding that the null hypothesis is false, or second, we may explain it by deciding that a rare and unlikely event has occurred. In the decision process, we choose the first of these explanations. Occasionally, of course, the second may be the correct one. In fact, the probability that the second explanation is the correct one is given by $\alpha$, for rejecting $H_0$ when in fact it is true is the Type I error.
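The one-tailed and two-tailed regions of rejection, and the decision rule built on them, can be sketched for the special case in which the sampling distribution is the standard normal. The function names are hypothetical, and the one-tailed region is assumed here to lie in the upper tail (the direction would in practice be fixed by $H_1$).

```python
from statistics import NormalDist

def critical_values(alpha, tails):
    """Boundaries of the region of rejection on a standard normal curve.

    Assumption: a one-tailed region is placed in the upper tail.
    """
    std_normal = NormalDist()
    if tails == 1:
        return (std_normal.inv_cdf(1 - alpha),)
    # Two-tailed: alpha is split equally between the two ends.
    return (std_normal.inv_cdf(alpha / 2), std_normal.inv_cdf(1 - alpha / 2))

def in_rejection_region(z, alpha, tails):
    """True if the observed z falls in the region of rejection."""
    bounds = critical_values(alpha, tails)
    if tails == 1:
        return z >= bounds[0]
    return z <= bounds[0] or z >= bounds[1]

one_tailed = critical_values(0.05, tails=1)
two_tailed = critical_values(0.05, tails=2)
print("one-tailed boundary:", round(one_tailed[0], 3))
print("two-tailed boundaries:", [round(b, 3) for b in two_tailed])
```

Both regions have total area .05; only their location differs, so an observed $z$ of 1.8 would be rejected by the one-tailed test and retained by the two-tailed test.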

When the probability associated with an observed value of a statistical test is equal to or less than the previously determined value of $\alpha$, we conclude that $H_0$ is false. Such an observed value is called "significant." $H_0$, the hypothesis under test, is rejected whenever a "significant" result occurs. A "significant" value is one whose associated probability of occurrence under $H_0$ (as shown by the sampling distribution) is equal to or less than $\alpha$.

EXAMPLE

In the discussions of the various nonparametric statistical tests, many examples of statistical decisions will be given in this book. Here we shall give just one example of how a statistical decision is reached, to illustrate the points made in this chapter.

EXAMPLE

Suppose we suspect a particular coin of being biased. Our suspicion is that the coin is biased to land with head up. To test this suspicion (which we here may dignify by calling it a "research hypothesis"), we decide to toss the coin 12 times and to observe the frequency with which head occurs.

i. Null Hypothesis. $H_0$: $p(H) = p(T) = \frac{1}{2}$. That is, for this coin there is no difference between the probability of the occurrence of a head, that is, $p(H)$, and the probability of the occurrence of a tail, that is, $p(T)$; the coin is "fair." $H_1$: $p(H) > p(T)$.

ii. Statistical Test. The statistical test which is appropriate to test this hypothesis is the binomial test, which is based on the binomial expansion. (This test is presented fully in Chap. 4.)

iii. Significance Level. In advance we decide to use $\alpha = .01$ as our level of significance. $N = 12$ = the number of independent tosses.

iv. Sampling Distribution. The sampling distribution which gives the probability of obtaining $x$ heads and $N - x$ tails under the null hypothesis (the hypothesis that the coin is in fact fair) is given by the binomial distribution function:

$$\frac{N!}{x!\,(N - x)!}\, P^x Q^{N-x}; \qquad x = 0, 1, 2, \ldots, N$$

Table 2.2 shows the sampling distribution of $x$, the number of heads. This sampling distribution shows that the most likely outcome of tossing a coin 12 times is to obtain 6 heads and 6 tails. Obtaining 7 heads and 5 tails is somewhat less likely but still quite probable. But the occurrence of 12 heads on 12 tosses is very unlikely indeed. The occurrence of 0 heads (12 tails) is equally unlikely.

TABLE 2.2. SAMPLING DISTRIBUTION OF x (NUMBER OF HEADS) FOR 2^12 SAMPLES OF SIZE N = 12

                     Sampling distribution
                     (Expected frequency of occurrence if
Number of heads      2^12 samples of 12 tosses were taken)

      12                        1
      11                       12
      10                       66
       9                      220
       8                      495
       7                      792
       6                      924
       5                      792
       4                      495
       3                      220
       2                       66
       1                       12
       0                        1

    Total                2^12 = 4,096
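The expected frequencies in Table 2.2 are simply the binomial coefficients for $N = 12$, so the table can be regenerated directly. The sketch below uses only the standard library; the variable names are illustrative.

```python
from math import comb

N = 12  # number of independent tosses

# Expected frequency of each number of heads among 2**N equally likely
# sequences of tosses of a fair coin, listed from 12 heads down to 0.
frequencies = {x: comb(N, x) for x in range(N, -1, -1)}
total = sum(frequencies.values())

# Dividing by the total converts frequencies to probabilities under H0.
probabilities = {x: f / total for x, f in frequencies.items()}

for x, f in frequencies.items():
    print(f"{x:>2} heads: {f:>4}   p = {probabilities[x]:.5f}")
print("Total:", total)
```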

v. Rejection Region. Since $H_1$ has direction, a one-tailed test will be used, and thus the region of rejection is entirely at one end of the sampling distribution. The region of rejection consists of all values of $x$ (number of heads) so large that the probability associated with their occurrence under $H_0$ is equal to or less than $\alpha = .01$.

The probability of obtaining 12 heads is $\frac{1}{4{,}096} = .00024$. Since $p = .00024$ is smaller than $\alpha = .01$, clearly the occurrence of 12 heads would be in the region of rejection.

The probability of obtaining either 11 or 12 heads is

$$\frac{1}{4{,}096} + \frac{12}{4{,}096} = \frac{13}{4{,}096} = .0032$$

Since $p = .0032$ is smaller than $\alpha = .01$, the occurrence of 11 heads would also be in the region of rejection.

The probability of obtaining 10 heads (or a value more extreme: 11 or 12 heads) is

$$\frac{1}{4{,}096} + \frac{12}{4{,}096} + \frac{66}{4{,}096} = \frac{79}{4{,}096} = .019$$

Since $p = .019$ is larger than $\alpha = .01$, the occurrence of 10 heads would not be in the region of rejection. That is, if 10 or fewer heads turn up in our sample of 12 tosses we cannot reject $H_0$ at the $\alpha = .01$ level of significance.
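The search for the region of rejection carried out above can be expressed as a short computation. This is a sketch only; `upper_tail` and `rejection_region` are hypothetical helper names, and the region is assumed to lie in the upper tail because $H_1$ predicts an excess of heads.

```python
from math import comb

def upper_tail(n, x, p=0.5):
    """Probability under H0 of obtaining x or more successes in n trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))

def rejection_region(n, alpha, p=0.5):
    """All values of x whose upper-tail probability is alpha or less."""
    return [x for x in range(n + 1) if upper_tail(n, x, p) <= alpha]

for heads in (12, 11, 10):
    print(f"p({heads} or more heads) = {upper_tail(12, heads):.5f}")

region = rejection_region(12, alpha=0.01)
print("region of rejection:", region)
```

For the coin example the region contains only 11 and 12 heads, matching the hand calculation.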

vi. Decision. Suppose in our sample of tosses we obtain 11 heads. The probability associated with an occurrence as extreme as this one is $p = .0032$. Inasmuch as this $p$ is smaller than our previously set level of significance ($\alpha = .01$), our decision is to reject $H_0$ in favor of $H_1$. We conclude that the coin is biased to land head up.

This chapter has discussed the procedure for making a decision as to whether a particular hypothesis, as operationally defined, should be accepted or rejected in terms of the information yielded by the research. Chapter 3 completes the general discussion by going into the question of how one may choose the most appropriate statistical test for use with one's research data. (This choice is step ii in the procedure outlined above.) The discussion in Chap. 3 clarifies the conditions under which the parametric tests are optimum and indicates the conditions under which nonparametric tests are more appropriate.

The reader who wishes to gain a more comprehensive or fundamental understanding of the topics summarized in bare outline in the present chapter may refer to Dixon and Massey (1950, chap. 14) for an unusually clear introductory discussion of power functions and of the two types of errors, and to Anderson and Bancroft (1952, chap. 11) or Mood (1950, chap. 12) for more advanced discussions of the theory of testing hypotheses.

CHAPTER 3

CHOOSING AN APPROPRIATE STATISTICAL TEST

When alternative statistical tests are available for a given research design, as is very often the case, it is necessary to employ some rationale for choosing among them. In Chap. 2 we presented one criterion to use in choosing among alternative statistical tests: the criterion of power. In this chapter other criteria will be presented.

The reader will remember that the power of a statistical analysis is partly a function of the statistical test employed in the analysis. A statistical test is a good one if it has a small probability of rejecting $H_0$ when $H_0$ is true, but a large probability of rejecting $H_0$ when $H_0$ is false. Suppose we find two statistical tests, A and B, which have the same probability of rejecting $H_0$ when it is true. It might seem that we should simply select the one that has the larger probability of rejecting $H_0$ when it is false.

However, there are considerations other than power which enter into the choice of a statistical test. In this choice we must consider the manner in which the sample of scores was drawn, the nature of the population from which the sample was drawn, and the kind of measurement or scaling which was employed in the operational definitions of the variables involved, i.e., in the scores. All these matters enter into determining which statistical test is optimum or most appropriate for analyzing a particular set of research data.

THE STATISTICAL MODEL

When we have asserted the nature of the population and the manner of sampling, we have established a statistical model. Associated with every statistical test is a model and a measurement requirement; the test is valid under certain conditions, and the model and the measurement requirement specify those conditions. Sometimes we are able to test whether the conditions of a particular statistical model are met, but more often we have to assume that they are met. Thus the conditions of the statistical model of a test are often called the "assumptions" of the test. All decisions arrived at by the use of any statistical test must carry with them this qualification: "If the model used was correct, and if the measurement requirement was satisfied, then...."

It is obvious that the fewer or weaker are the assumptions that define a particular model, the less qualifying we need to do about our decision arrived at by the statistical test associated with that model. That is, the fewer or weaker are the assumptions, the more general are the conclusions.

However, the most powerful tests are those which have the strongest or most extensive assumptions. The parametric tests, for example, the t or F tests, have a variety of strong assumptions underlying their use. When those assumptions are valid, these tests are the most likely of all tests to reject $H_0$ when $H_0$ is false. That is, when research data may appropriately be analyzed by a parametric test, that test will be more powerful than any other in rejecting $H_0$ when it is false. Notice, however, the requirement that the research data must be appropriate for the test. What constitutes such appropriateness? What are the conditions that are associated with the statistical model and the measurement requirement underlying, say, the t test? The conditions which must be satisfied to make the t test the most powerful one, and in fact before any confidence can be placed in any probability statement obtained by the use of the t test, are at least these:

1. The observations must be independent. That is, the selection of any one case from the population for inclusion in the sample must not bias the chances of any other case for inclusion, and the score which is assigned to any case must not bias the score which is assigned to any other case.

2. The observations must be drawn from normally distributed populations.

3. These populations must have the same variance (or, in special cases, they must have a known ratio of variances).

4. The variables involved must have been measured in at least an interval scale, so that it is possible to use the operations of arithmetic (adding, dividing, finding means, etc.) on the scores.

In the case of the analysis of variance (the F test), another condition is added to those already given:

5. The means of these normal and homoscedastic populations must be linear combinations of effects due to columns and/or rows. That is, the effects must be additive.

All the above conditions [except (4), which states the measurement requirement] are elements of the parametric statistical model. With the possible exception of the assumption of homoscedasticity (equal variances), these conditions are ordinarily not tested in the course of the performance of a statistical analysis. Rather, they are presumptions which are accepted, and their truth or falsity determines the meaningfulness of the probability statement arrived at by the parametric test.

When we have reason to believe that these conditions are met in the data under analysis, then we should certainly choose a parametric statistical test, such as t or F, for analyzing those data. Such a choice is optimum because the parametric test will be most powerful for rejecting $H_0$ when it should be rejected.

But what if these conditions are not met? What happens when the population is not normally distributed? What happens when the measurement is not so strong as an interval scale? What happens when the populations are not equal in variance?

When the assumptions constituting the statistical model for a test are in fact not met, or when the measurement is not of the required strength, then it is difficult if not impossible to say what is really the power of the test. It is even difficult to estimate the extent to which a probability statement about the hypothesis in question is meaningful when that probability statement results from the unacceptable application of a test. Although some empirical evidence has been gathered to show that slight deviations in meeting the assumptions underlying parametric tests may not have radical effects on the obtained probability figure, there is as yet no general agreement as to what constitutes a "slight" deviation.

POWER-EFFICIENCY

We have already noticed that the fewer or weaker are the assumptions that constitute a particular model, the more general are the conclusions derived from the application of the statistical test associated with that model but the less powerful is the test of $H_0$. This assertion is generally true for any given sample size. But it may not be true in the comparison of two statistical tests which are applied to two samples of unequal size. That is, if $N = 30$ in both instances, test A may be more powerful than test B. But the same test B may be more powerful with $N = 30$ than is test A with $N = 20$. In other words, we can avoid the dilemma of having to choose between power and generality by selecting a statistical test which has broad generality and then increasing its power to that of the most powerful test available by enlarging the size of the sample.

The concept of power-efficiency is concerned with the amount of increase in sample size which is necessary to make test B as powerful as test A. If test A is the most powerful known test of its type (when used with data which meet its conditions), and if test B is another test for the same research design which is just as powerful with $N_b$ cases as is test A with $N_a$ cases, then

$$\text{Power-efficiency of test B} = (100)\,\frac{N_a}{N_b} \text{ per cent}$$

For example, if test B requires a sample of $N = 25$ cases to have the same power as test A has with $N = 20$ cases, then test B has power-efficiency of $(100)\frac{20}{25}$ per cent, i.e., its power-efficiency is 80 per cent. A power-efficiency of 80 per cent means that in order to equate the power of test A and test B (when all the conditions of both tests are met, and when test A is the more powerful) we need to draw 10 cases for test B for every 8 cases drawn for test A.
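The power-efficiency ratio is simple enough to state as a one-line function. The sketch below restates the worked example; both function names are hypothetical, and the second function merely inverts the ratio to show how many cases test B would need.

```python
def power_efficiency(n_a, n_b):
    """Power-efficiency of test B in per cent.

    n_a: cases test A needs; n_b: cases test B needs for the same power.
    """
    return 100 * n_a / n_b

def cases_needed_for_b(n_a, efficiency_percent):
    """Cases test B must draw to match test A's power with n_a cases."""
    return n_a * 100 / efficiency_percent

efficiency = power_efficiency(n_a=20, n_b=25)
print(f"power-efficiency of test B: {efficiency:.0f} per cent")
print("cases for B per 8 cases for A:", cases_needed_for_b(8, efficiency))
```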

Thus we can avoid having to meet some of the assumptions of the most powerful tests, the parametric tests, without losing power by simply choosing a different test and drawing a larger $N$. In other words, by choosing another statistical test with fewer assumptions in its model and thus with greater generality than the t and F tests, and by enlarging our $N$, we can avoid having to make assumptions 2, 3, and 5 above, and still retain equivalent power to reject $H_0$.

Two other conditions, 1 and 4 above, underlie parametric statistical tests. Assumption 1, that the scores are independently drawn from the population, is an assumption which underlies all statistical tests, parametric or nonparametric. But assumption 4, which concerns the strength of measurement required for parametric tests (measurement must be at least in an interval scale), is not shared by all statistical tests. Different tests require measurement of different strengths. In order to understand the measurement requirements of the various statistical tests, the reader should be conversant with some of the basic notions in the theory of measurement. The discussion of measurement which occupies the next few pages gives the required information.

MEASUREMENT

When a physical scientist talks about measurement, he usually means the assigning of numbers to observations in such a way that the numbers are amenable to analysis by manipulation or operation according to certain rules. This analysis by manipulation will reveal new information about the objects being measured. In other words, the relation between the things being observed and the numbers assigned to the observations is so direct that by manipulating the numbers the physical scientist obtains new information about the things. For example, he may determine how much a homogeneous mass of material would weigh if cut in half by simply dividing its weight by 2.

The social scientist, taking physics as his model, usually attempts to do likewise in his scoring or measurement of social variables. But in his scaling the social scientist very often overlooks a fundamental fact in measurement theory. He overlooks the fact that in order for him to be able to make certain operations with numbers that have been assigned to observations, the structure of his method of mapping numbers (assigning scores) to observations must be isomorphic to some numerical structure which includes these operations. If two systems are isomorphic, their structures are the same in the relations and operations they allow.

For example, if a researcher collects data made up of numerical scores and then manipulates these scores by, say, adding and dividing (which are necessary operations in finding means and standard deviations), he is assuming that the structure of his measurement is isomorphic to that numerical structure known as arithmetic. That is, he is assuming that he has attained a high level of measurement.

The theory of measurement consists of a set of separate or distinct theories, each concerning a distinct level of measurement. The operations allowable on a given set of scores are dependent on the level of measurement achieved. Here we will discuss four levels of measurement (nominal, ordinal, interval, and ratio) and will discuss the operations and thus the statistics and statistical tests that are permitted with each level.

The Nominal or Classificatory Scale

Definition. Measurement at its weakest level exists when numbers or other symbols are used simply to classify an object, person, or characteristic. When numbers or other symbols are used to identify the groups to which various objects belong, these numbers or symbols constitute a nominal or classificatory scale.

Examples. The psychiatric system of diagnostic groups constitutes a nominal scale. When a diagnostician identifies a person as "schizophrenic," "paranoid," "manic-depressive," or "psychoneurotic," he is using a symbol to represent the class of persons to which this person belongs, and thus he is using nominal scaling.

The numbers on automobile license plates constitute a nominal scale. If the assignment of plate numbers is purely arbitrary, then each plated car is a member of a unique subclass. But if, as is common in the United States, a certain number or letter on the license plate indicates the county in which the car owner resides, then each subclass in the nominal scale consists of a group of entities: all owners residing in the same county. Here the assignment of numbers must be such that the same number (or letter) is given to all persons residing in the same county and that different numbers (or letters) are given to people residing in different counties. That is, the number or letter on the license plate must clearly indicate to which of a set of mutually exclusive subclasses the owner belongs.

Numbers on football jerseys and social-security numbers are other examples of the use of numbers in nominal scaling.

Formal properties. All scales have certain formal properties. These properties provide fairly exact definitions of the scale's characteristics, more exact definitions than we can give in verbal terms. These properties may be formulated more abstractly than we have done here by a set of axioms which specify the operations of scaling and the relations among the objects that have been scaled.

In a nominal scale, the scaling operation is partitioning a given class into a set of mutually exclusive subclasses. The only relation involved is that of equivalence. That is, the members of any one subclass must be equivalent in the property being scaled. This relation is symbolized by the familiar sign: =. The equivalence relation is reflexive, symmetrical, and transitive.¹

Admissible operations. Since in any nominal scale the classification may be equally well represented by any set of symbols, the nominal scale is said to be "unique up to a one-to-one transformation." The symbols designating the various subclasses in the scale may be interchanged, if this is done consistently and completely. For example, when new license plates are issued, the license number which formerly stood for one county can be interchanged with that which had stood for another county. Nominal scaling would be preserved if this change-over were performed consistently and thoroughly in the issuing of all license plates. Such one-to-one transformations are sometimes called "the symmetric group of transformations."
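The idea that a nominal scale is unique up to a one-to-one transformation can be illustrated with a small relabeling exercise. The county codes and the mapping below are hypothetical; the point of the sketch is only that a consistent one-to-one relabeling leaves the sizes of the subclasses, and the identity of the most frequent subclass, intact.

```python
from collections import Counter

# Hypothetical county codes on six license plates.
plates = ["A", "B", "A", "C", "A", "B"]

# Hypothetical one-to-one relabeling issued with the new plates.
relabel = {"A": "C", "B": "A", "C": "B"}

def relabeled(labels, mapping):
    """Apply a one-to-one relabeling consistently to every label."""
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("mapping must be one-to-one")
    return [mapping[label] for label in labels]

old_counts = Counter(plates)
new_counts = Counter(relabeled(plates, relabel))

old_mode = old_counts.most_common(1)[0][0]
new_mode = new_counts.most_common(1)[0][0]

# The same subclass is still the largest; only its symbol has changed.
print("old mode:", old_mode, "-> new mode:", new_mode)
print("subclass sizes preserved:",
      sorted(old_counts.values()) == sorted(new_counts.values()))
```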

Since the symbols which designate the various groups on a nominal scale may be interchanged without altering the essential information in the scale, the only kinds of admissible descriptive statistics are those which would be unchanged by such a transformation: the mode, frequency counts, etc. Under certain conditions, we can test hypotheses regarding the distribution of cases among categories by using the nonparametric statistical test, $\chi^2$, or by using a test based on the binomial expansion. These tests are appropriate for nominal data because they focus on frequencies in categories, i.e., on enumerative data. The most common measure of association for nominal data is the contingency coefficient, $C$, a nonparametric statistic.

The Ordinal or Ranking Scale

Definition. It may happen that the objects in one category of a scale are not just different from the objects in other categories of that scale, but that they stand in some kind of relation to them. Typical relations among classes are: higher, more preferred, more difficult, more disturbed, more mature, etc. Such relations may be designated by the caret ($>$) which, in general, means "greater than." In reference to particular scales, $>$ may be used to designate is preferred to, is higher than, is more difficult than, etc. Its specific meaning depends on the nature of the relation that defines the scale.

¹ Reflexive: $x = x$ for all values of $x$. Symmetrical: if $x = y$, then $y = x$. Transitive: if $x = y$ and $y = z$, then $x = z$.

Given a group of equivalence classes (i.e., given a nominal scale), if the relation $>$ holds between some but not all pairs of classes, we have a partially ordered scale. If the relation $>$ holds for all pairs of classes so that a complete rank ordering of classes arises, we have an ordinal scale.

Examples. Socioeconomic status, as conceived by Warner and his associates,¹ constitutes an ordinal scale. In prestige or social acceptability, all members of the upper middle class are higher than ($>$) all members of the lower middle class. The lower middles, in turn, are higher than the upper lowers. The $=$ relation holds among members of the same class, and the $>$ relation holds between any pair of classes.

The system of grades in the military services is another example of an ordinal scale. Sergeant $>$ corporal $>$ private.

Many personality inventories and tests of ability or aptitude result in scores which have the strength of ranks. Although the scores may appear to be more precise than ranks, generally these scales do not meet the requirements of any higher level of measurement and may properly be viewed as ordinal.

Formal properties. Axiomatically, the fundamental difference between a nominal and an ordinal scale is that the ordinal scale incorporates not only the relation of equivalence ($=$) but also the relation "greater than" ($>$). The latter relation is irreflexive, asymmetrical, and transitive.²

Admissible operations. Since any order-preserving transformation does not change the information contained in an ordinal scale, the scale is said to be "unique up to a monotonic transformation." That is, it does not matter what numbers we give to a pair of classes or to members of those classes, just as long as we give a higher number to the members of the class which is "greater" or "more preferred." (Of course, one may use the lower numbers for the "more preferred" grades. Thus we usually refer to excellent performance as "first-class," and to progressively inferior performances as "second-class" and "third-class." So long as we are consistent, it does not matter whether higher or lower numbers are used to denote "greater" or "more preferred.")

¹ Warner, W. L., Meeker, M., and Eells, K. 1949. Social class in America. New York: Science Research Associates.

² Irreflexive: it is not true for any $x$ that $x > x$. Asymmetrical: if $x > y$, then $y \not> x$. Transitive: if $x > y$ and $y > z$, then $x > z$.


For example, a corporal in the army wears two stripes on his sleeve and a sergeant wears three. These insignia denote that sergeant $>$ corporal. This relation would be equally well expressed if the corporal wore four stripes and the sergeant wore seven. That is, a transformation which does not change the order of the classes is completely admissible because it does not involve any loss of information. Any or all the numbers applied to classes in an ordinal scale may be changed in any fashion which does not alter the ordering (ranking) of the objects.

The statistic most appropriate for describing the central tendency of scores in an ordinal scale is the median, since the median is not affected by changes of any scores which are above or below it as long as the number of scores above and below remains the same. With ordinal scaling, hypotheses can be tested by using that large group of nonparametric statistical tests which are sometimes called "order statistics" or "ranking statistics." Correlation coefficients based on rankings (e.g., the Spearman $r_S$ or the Kendall $\tau$) are appropriate.
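Why the median suits ordinal data while the mean does not can be seen by applying an order-preserving transformation to a set of scores. The scores and the cubing transformation below are arbitrary choices made for this sketch; an odd number of distinct scores is used so that the median is itself one of the observed scores.

```python
import statistics

scores = [2, 3, 5, 8, 13]  # hypothetical ordinal scores (odd count, no ties)

def transform(score):
    """An order-preserving (monotonic increasing) change of numbering."""
    return score ** 3

def ranks(values):
    """Rank of each value, 1 = smallest (assumes no tied values)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

transformed = [transform(s) for s in scores]

# The ranking of the cases is untouched by the transformation.
same_ranks = ranks(scores) == ranks(transformed)

# The median case stays the median case: transforming the old median
# gives the new median.
median_commutes = statistics.median(transformed) == transform(statistics.median(scores))

# The mean has no such property.
mean_commutes = statistics.mean(transformed) == transform(statistics.mean(scores))

print("ranks preserved:", same_ranks)
print("median preserved:", median_commutes)
print("mean preserved:", mean_commutes)
```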

The only assumption made by some ranking tests is that the scores we observe are drawn from an underlying continuous distribution. Parametric tests also make this assumption. An underlying continuous variate is one that is not restricted to having only isolated values. It may have any value in a certain interval. A discrete variate, on the other hand, is one which can take on only a finite number of values; a continuous variate is one which can (but may not) take on a continuous infinity of values.

For some nonparametric techniques which require ordinal measurement, the requirement is that there be a continuum underlying the observed scores. The actual scores we observe may fall into discrete categories. For example, the actual scores may be either "pass" or "fail" on a particular item. We may well assume that underlying such a dichotomy there is a continuum of possible results. That is, some individuals who were categorized as failing may have been closer to passing than were others who were categorized as failing. Similarly, some passed only minimally, whereas others passed with ease and dispatch. The assumption is that "pass" and "fail" represent a continuum dichotomized into two intervals.

Similarly, in matters of opinion those who are classified as ...

$$p(x) = \binom{N}{x} P^x Q^{N-x} \qquad (4.1)$$

where $P$ = proportion of cases expected in one of the categories
$Q = 1 - P$ = proportion of cases expected in the other category
$\binom{N}{x} = \dfrac{N!}{x!\,(N - x)!}$ ¹

A simple illustration will clarify formula (4.1). Suppose a fair die is rolled five times. What is the probability that exactly two of the rolls will show "six"? In this case, $N$ = the number of rolls = 5; $x$ = the number of sixes = 2; $P$ = the expected proportion of sixes = $\frac{1}{6}$ (since the die is fair and therefore each side may be expected to show equally often); and $Q = 1 - P = \frac{5}{6}$. The probability that exactly two of the five rolls will show six is given by formula (4.1):

$$p(x) = \binom{N}{x} P^x Q^{N-x} \qquad (4.1)$$

$$p(2) = \frac{5!}{2!\,3!} \left(\frac{1}{6}\right)^2 \left(\frac{5}{6}\right)^3 = .16$$

The application of the formula to the problem shows us that the probability of obtaining exactly two "sixes" when rolling a fair die five times is $p = .16$.
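Formula (4.1) translates directly into code. The sketch below uses exact fractions so that the intermediate value can be inspected before rounding; the function name `binomial_probability` is a hypothetical choice.

```python
from fractions import Fraction
from math import comb

def binomial_probability(x, n, p):
    """Formula (4.1): probability of exactly x cases in one category.

    x: observed number of cases, n: number of observations (N),
    p: proportion expected in that category (P); Q is 1 - P.
    """
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)

# Exactly two sixes in five rolls of a fair die.
p_two_sixes = binomial_probability(x=2, n=5, p=Fraction(1, 6))
print(p_two_sixes, "=", round(float(p_two_sixes), 2))
```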

Now when we do research our question is usually not "What is the probability of obtaining exactly the values which were observed?" Rather, we usually ask, "What is the probability of obtaining the observed values or values even more extreme?" To answer questions of this type, the sampling distribution of the binomial is

$$\sum_{i=0}^{x} \binom{N}{i} P^i Q^{N-i} \qquad (4.2)$$

In other words, we sum the probability of the observed value with the probabilities of values even more extreme.

¹ $N!$ is $N$ factorial, which means $N(N - 1)(N - 2) \cdots (2)(1)$. For example, $4! = (4)(3)(2)(1) = 24$. Table S of the Appendix gives factorials for values through $N = 20$. Table T of the Appendix gives binomial coefficients, $\binom{N}{x}$, for values of $N$ through 20.

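Suppose now that we want to know the probability of obtaining two or fewer "sixes" when a fair die is rolled five times. Here again $N = 5$, $x = 2$, $P = \frac{1}{6}$, and $Q = \frac{5}{6}$. Now the probability of obtaining 2 or fewer "sixes" is $p(x \leq 2)$. The probability of obtaining 0 "sixes" is $p(0)$. The probability of obtaining 1 "six" is $p(1)$. The probability of obtaining 2 "sixes" is $p(2)$. We know from formula (4.2) above that

$$p(x \leq 2) = p(0) + p(1) + p(2)$$

That is, the probability of obtaining two or fewer "sixes" is the sum of the three probabilities mentioned above. If we use formula (4.1) to determine each of these probabilities, we have:

$$p(0) = \frac{5!}{0!\,5!} \left(\frac{1}{6}\right)^0 \left(\frac{5}{6}\right)^5 = .40$$

$$p(1) = \frac{5!}{1!\,4!} \left(\frac{1}{6}\right)^1 \left(\frac{5}{6}\right)^4 = .40$$

$$p(2) = \frac{5!}{2!\,3!} \left(\frac{1}{6}\right)^2 \left(\frac{5}{6}\right)^3 = .16$$

and thus

$$p(x \leq 2) = p(0) + p(1) + p(2) = .40 + .40 + .16 = .96$$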
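The cumulative sum in formula (4.2) can be sketched in the same way as formula (4.1). The function name `binomial_cumulative` is hypothetical; exact fractions are used so that the unrounded total can be compared with the sum of the rounded terms.

```python
from fractions import Fraction
from math import comb

def binomial_cumulative(x, n, p):
    """Formula (4.2): probability of obtaining x or fewer cases."""
    q = 1 - p
    return sum(comb(n, i) * p**i * q ** (n - i) for i in range(x + 1))

# Two or fewer sixes in five rolls of a fair die.
p_two_or_fewer = binomial_cumulative(x=2, n=5, p=Fraction(1, 6))
print(p_two_or_fewer, "=", round(float(p_two_or_fewer), 2))
```

The exact value rounds to .96, the same figure obtained by adding the three rounded probabilities.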

i. Null Hypothesis. $H_0$: $p_1 = p_2 = \frac{1}{2}$. That is, there is no difference between the probability of using the first-learned method under stress ($p_1$) and the probability of using the second-learned method under stress ($p_2$); any difference between the frequencies which may be observed is of such a magnitude that it might be expected in a sample from the population of possible results under $H_0$. $H_1$: $p_1 > p_2$.

ii. Statistical Test. The binomial test is chosen because the data are in two discrete categories and the design is of the one-sample type. Since methods A and B were randomly assigned to being first-learned and second-learned, there is no reason to think that the first-learned method would be preferred to the second-learned under $H_0$, and thus $P = Q = \frac{1}{2}$.

iii. Significance Level. Let $\alpha = .01$. $N$ = the number of cases = 18.

iv. Sampling Distribution. The sampling distribution is given in formula (4.2) above. However, when $N$ is 25 or smaller, and when $P = Q = \frac{1}{2}$, Table D gives the probabilities associated with the occurrence under $H_0$ of observed values as small as $x$, and thus obviates the necessity for using the sampling distribution directly in the employment of this test.

v. Rejection Region. Theregionof rejectionconsistsof all values ~ Barthol,R. P., and Ku, Nani D. 1955. Specificregression undera nonrelated stresssituation. Amer.Psychologist, 10,482. (Abstract)

THE

ONE-ShMPLE

ChSE

of z (where s = the number of subjects who used the second-learned

methodunderstress)whicharesosmallthat the probabilityassociated with their occurrenceunder H pis equal to or lessthan a = .01.

Sincethe directionof the differencewaspredictedin advance,the region of rejection is one-tailed.

vi. Decision. In the experiment, all but two of the subjects used the first-learned method when asked to tie the knot under stress (late at night after a long final examination). These data are shown in Table 4.1.

TABLE 4.1. KNOT-TYING METHOD CHOSEN UNDER STRESS

                     Method chosen
             First-learned   Second-learned   Total
Frequency         16                2           18

In this case, N = the number of independent observations = 18. x = the smaller frequency = 2. Table D shows that for N = 18, the probability associated with x ≤ 2 is p = .001. Inasmuch as this p is smaller than α = .01, the decision is to reject H0 in favor of H1. We conclude that p1 > p2, that is, that persons under stress revert to the first-learned of two methods.
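The cumulative sum in formula (4.2) can be checked directly rather than read from Table D. The following is a minimal sketch, not part of the original text; the function name `binomial_tail` is a hypothetical helper chosen for illustration. It evaluates the lower tail for the die example (N = 5, P = 1/6) and for the knot-tying data (N = 18, P = 1/2).

```python
from math import comb


def binomial_tail(n, x, p=0.5):
    """Lower-tail probability P(X <= x) for a binomial with n trials,
    i.e. the sum in formula (4.2). The name is illustrative only."""
    q = 1.0 - p
    return sum(comb(n, i) * p**i * q**(n - i) for i in range(x + 1))


# Two or fewer "sixes" in five rolls of a fair die.
die = binomial_tail(5, 2, 1 / 6)

# Two or fewer subjects using the second-learned method, N = 18, P = Q = 1/2.
knot = binomial_tail(18, 2, 0.5)

print(round(die, 2), round(knot, 3))
```

Rounded to the precision used in the text, the two results agree with the values .96 and .001 quoted above.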

Large samples. Table D cannot be used when N is larger than 25. However, it can be shown that as N increases, the binomial distribution tends toward the normal distribution. This tendency is rapid when P is close to 1/2, but slow when P is near 0 or 1. That is, the greater the disparity between P and Q, the larger must be N before the approximation is usefully close. When P is near 1/2, the approximation may be used for a statistical test for N > 25. When P is near 0 or 1, a rule of thumb is that NPQ must equal at least 9 before the statistical test based on the normal approximation is applicable. Within these limitations, the sampling distribution of x is approximately normal, with mean = NP and standard deviation = $\sqrt{NPQ}$, and therefore H0 may be tested by

$$z = \frac{x - \mu_x}{\sigma_x} = \frac{x - NP}{\sqrt{NPQ}} \tag{4.3}$$

z is approximately normally distributed with zero mean and unit variance.

The approximation becomes an excellent one if a correction for continuity is incorporated. The correction is necessary because the normal distribution is for a continuous variable, whereas the binomial distribution involves a discrete variable. To correct for continuity, we regard the observed frequency x of formula (4.3) as occupying an interval, the lower limit of which is half a unit below the observed frequency while the upper limit is half a unit above the observed frequency. The correction for continuity consists of reducing, by .5, the difference between the observed value of x and the expected value, μx = NP. Therefore when x < μx we add .5 to x, and when x > μx we subtract .5 from x. That is, the observed difference is reduced by .5. Thus z becomes

$$z = \frac{(x \pm .5) - NP}{\sqrt{NPQ}} \tag{4.4}$$

where x + .5 is used when x < NP, and x − .5 is used when x > NP.

The value of z obtained by the application of formula (4.4) may be considered to be normally distributed with zero mean and unit variance. Therefore the significance of an obtained z may be determined by reference to Table A of the Appendix. That is, Table A gives the one-tailed probability associated with the occurrence under H0 of values as extreme as an observed z. (If a two-tailed test is required, the p yielded by Table A should be doubled.)

To show how good an approximation this is when P = 1/2 even for N < 25, we can apply it to the knot-tying data discussed earlier. In that case, N = 18, x = 2, and P = Q = 1/2. For these data, x < NP, that is, 2 < 9, and, by formula (4.4),

$$z = \frac{(2 + .5) - (18)(.5)}{\sqrt{(18)(.5)(.5)}} = -3.07$$

Table A shows that a z as extreme as −3.07 has a one-tailed probability associated with its occurrence under H0 of p = .0011. This is essentially the same probability we found by the other analysis, which used a table of exact probabilities.
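The continuity-corrected statistic of formula (4.4) is easy to reproduce. The sketch below is an illustration under stated assumptions: the helper names `binomial_z` and `normal_cdf` are hypothetical, and the normal probability is computed from the error function instead of being looked up in Table A.

```python
from math import erf, sqrt


def binomial_z(n, x, p=0.5):
    """z of formula (4.4): normal approximation to the binomial with a
    continuity correction of .5 toward the expected value NP."""
    q = 1.0 - p
    mean = n * p
    if x < mean:
        x_adj = x + 0.5
    elif x > mean:
        x_adj = x - 0.5
    else:
        x_adj = x  # no difference to reduce when x equals NP
    return (x_adj - mean) / sqrt(n * p * q)


def normal_cdf(z):
    """Standard normal cumulative distribution, used here in place of Table A."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


z = binomial_z(18, 2)                # knot-tying data: N = 18, x = 2
p_one_tailed = normal_cdf(-abs(z))   # probability of a z at least this extreme
print(z, p_one_tailed)
```

Computed without intermediate rounding, z comes out near −3.06 and the one-tailed probability near .0011, in line with the tabled result.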

Summary of procedure. In brief, these are the steps in the use of the binomial test:

1. Determine N = the total number of cases observed.

2. Determine the frequencies of the observed occurrences in each of the two categories.

3. The method of finding the probability of occurrence under H0 of the observed values, or values even more extreme, varies:

a. If N is 25 or smaller, and if P = Q = 1/2, Table D gives the one-tailed probabilities under H0 of various values as small as an observed x. A one-tailed test is used when the researcher has predicted which category will have the smaller frequency. For a two-tailed test, double the p shown in Table D.

b. If P ≠ Q, determine the probability of the occurrence under H0 of the observed value of x or of an even more extreme value by substituting the observed values in formula (4.2). Table T is helpful in this computation; it gives binomial coefficients, $\binom{N}{x}$, for N ≤ 20.

c. If N is larger than 25, and P close to 1/2, test H0 by using formula (4.4). Table A gives the probability associated with the occurrence under H0 of values as large as an observed z yielded by that formula. Table A gives one-tailed p's; for a two-tailed test, double the p it yields.

If the p associated with the observed value of x or an even more extreme value is equal to or less than α, reject H0.

Power-Efficiency

Inasmuch as there is no parametric technique applicable to data measured in a nominal scale, it would be meaningless to inquire about the power-efficiency of the binomial test when used with nominal data.

If a continuum is dichotomized and the binomial test used on the resulting data, that test may be wasteful of data. In such cases, the binomial test has power-efficiency (in the sense defined in Chap. 3) of 95 per cent for N = 6, decreasing to an eventual (asymptotic) efficiency of 2/π = 63 per cent (Mood, 1954). However, if the data are basically dichotomous, even though the variable has an underlying continuous distribution, the binomial test may have no more powerful alternative.

References

For other discussions of the binomial test, the reader may turn to Clopper and Pearson (1934), David (1949, chaps. 3, 4), McNemar (1955, pp. 42-49), and Mood (1950, pp. 54-58).

THE χ² ONE-SAMPLE TEST

Function

Frequently research is undertaken in which the researcher is interested in the number of subjects, objects, or responses which fall in various categories. For example, a group of patients may be classified according to their preponderant type of Rorschach response, and the investigator may predict that certain types will be more frequent than others. Or children may be categorized according to their most frequent modes of play, to test the hypothesis that these modes will differ in frequency. Or persons may be categorized according to whether they are "in favor of," "indifferent to," or "opposed to" some statement of opinion, to enable the researcher to test the hypothesis that these responses will differ in frequency.

The χ² test is suitable for analyzing data like these. The number of categories may be two or more. The technique is of the goodness-of-fit type in that it may be used to test whether a significant difference exists between an observed number of objects or responses falling in each category and an expected number based on the null hypothesis.

Method

In order to be able to compare an observed with an expected group of frequencies, we must of course be able to state what frequencies would be expected. The null hypothesis states the proportion of objects falling in each of the categories in the presumed population. That is, from the null hypothesis we may deduce what are the expected frequencies. The χ² technique tests whether the observed frequencies are sufficiently close to the expected ones to be likely to have occurred under H0. The null hypothesis may be tested by

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \tag{4.5}$$

where O_i = observed number of cases categorized in ith category
E_i = expected number of cases in ith category under H0
$\sum_{i=1}^{k}$ directs one to sum over all (k) categories

Thus formula (4.5) directs one to sum over k categories the squared differences between each observed and expected frequency divided by the corresponding expected frequency.

If the agreement between the observed and expected frequencies is close, the differences (O_i − E_i) will be small and consequently χ² will be small. If the divergence is large, however, the value of χ² as computed from formula (4.5) will also be large. Roughly speaking, the larger χ² is, the more likely it is that the observed frequencies did not come from the population on which the null hypothesis is based.

It can be shown that the sampling distribution of χ² under H0, as computed from formula (4.5), follows the chi-square* distribution with df = k − 1. (df refers to degrees of freedom; these are discussed below.)

* To avoid confusion, the symbol χ² will be used for the quantity which is calculated from the observed data [using formula (4.5)] when a χ² test is performed. The words "chi square" will refer to a random variable which follows the chi-square distribution, certain values of which are shown in Table C.

Table C of the Appendix is taken from the sampling distribution of chi square, and gives certain critical values. At the top of each column in Table C are given the associated probabilities of occurrence (two-tailed) under H0. The values in any column are the values of chi square which have the associated probability of occurrence under H0 given at the top of that column. There is a different value of chi square for each df.

There are a number of different sampling distributions for chi square, one for each value of df. The size of df reflects the number of observations that are free to vary after certain restrictions have been placed on the data. These restrictions are not arbitrary, but rather are inherent in the organization of the data. For example, if the data for 50 cases are classified in two categories, then as soon as we know that, say, 35 cases fall in one category, we also know that 15 must fall in the other. For this example, df = 1, because with two categories and any fixed value of N, as soon as the number of cases in one category is ascertained then the number of cases in the other category is determined.
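Formula (4.5) and the df = 1 case just described can be illustrated with the 35-and-15 split of 50 cases. This is a sketch under assumptions that are not in the text: the expected frequencies are taken to be 25 and 25 (a null hypothesis of equal proportions), and the function names are hypothetical. For df = 1 only, the upper-tail probability of chi square equals erfc(√(χ²/2)), so no table is needed.

```python
from math import erfc, sqrt


def chi_square(observed, expected):
    """Formula (4.5): sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail probability of chi square with df = 1 (valid for df = 1 only)."""
    return erfc(sqrt(chi2 / 2.0))


observed = [35, 15]        # the split of 50 cases used in the text
expected = [25.0, 25.0]    # ASSUMED: H0 of equal proportions, not stated in the text

chi2 = chi_square(observed, expected)
df = len(observed) - 1     # one category count is free to vary
p = p_value_df1(chi2)
print(chi2, df, p)
```

With these assumed expected frequencies each category contributes 100/25 = 4, so χ² = 8 with df = 1.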

In general, for the one-sample case, when H0 fully specifies the E_i's, df = k − 1.

When k > 2, if more than 20 per cent of the E_i's are smaller than 5, combine adjacent categories, where this is reasonable, thereby reducing the value of k and increasing the values of some of the E_i's. When k = 2, the χ² test for the one-sample case may be used appropriately only if each expected frequency is 5 or larger.

3. Using formula (4.5), compute the value of χ².

4. Determine the value of df. df = k − 1.

5. By reference to Table C, determine the probability associated with the occurrence under H0 of a value as large as the observed value of χ² for the observed value of df. If that p is equal to or less than α, reject H0.

Power

The literature does not contain much information about the power function of the χ² test. Inasmuch as this test is most commonly used when we do not have a clear alternative available, we are usually not in a position to compute the exact power of the test.

When nominal measurement is used or when the data consist of frequencies in inherently discrete categories, then the notion of power-efficiency of the χ² test is meaningless, for in such cases there is no parametric test that is suitable. If the data are such that a parametric test is available, then the χ² test may be wasteful of information.

It should be noted that when df > 1, χ² tests are insensitive to the effects of order, and thus when a hypothesis takes order into account, χ² may not be the best test. For methods that strengthen the common χ² tests when H0 is tested against specific alternatives, see Cochran (1954).

References

Useful discussions of this χ² test are contained in Cochran (1952; 1954), Dixon and Massey (1951, chap. 13), Lewis and Burke (1949), and McNemar (1955, chap. 13).

THE KOLMOGOROV-SMIRNOV ONE-SAMPLE TEST

Function and Rationale

The Kolmogorov-Smirnov one-sample test is a test of goodness of fit. That is, it is concerned with the degree of agreement between the distribution of a set of sample values (observed scores) and some specified theoretical distribution. It determines whether the scores in the sample can reasonably be thought to have come from a population having the theoretical distribution.

Briefly, the test involves specifying the cumulative frequency distribution which would occur under the theoretical distribution and comparing that with the observed cumulative frequency distribution. The theoretical distribution represents what would be expected under H0. The point at which these two distributions, theoretical and observed, show the greatest divergence is determined. Reference to the sampling distribution indicates whether such a large divergence is likely on the basis of chance. That is, the sampling distribution indicates whether a divergence of the observed magnitude would probably occur if the observations were really a random sample from the theoretical distribution.

Method

Let F0(X) = a completely specified cumulative frequency distribution function, the theoretical cumulative distribution under H0. That is, for any value of X, the value of F0(X) is the proportion of cases expected to have scores equal to or less than X.

And let S_N(X) = the observed cumulative frequency distribution of a random sample of N observations. Where X is any possible score, S_N(X) = k/N, where k = the number of observations equal to or less than X.

Now under the null hypothesis that the sample has been drawn from the specified theoretical distribution, it is expected that for every value of X, S_N(X) should be fairly close to F0(X). That is, under H0 we would expect the differences between S_N(X) and F0(X) to be small and within the limits of random errors. The Kolmogorov-Smirnov test focuses on the largest of the deviations. The largest value of |F0(X) − S_N(X)| is called the maximum deviation, D:

$$D = \text{maximum}\;|F_0(X) - S_N(X)| \tag{4.6}$$

The sampling distribution of D under H0 is known. Table E of the Appendix gives certain critical values from that sampling distribution. Notice that the significance of a given value of D depends on N.

For example, suppose one found by formula (4.6) that D = .325 when N = 15. Table E shows that D ≥ .325 has an associated probability of occurrence (two-tailed) between p = .10 and .05.

If N is over 35, one determines the critical values of D by the divisions indicated in Table E. For example, suppose a researcher uses N = 43 cases and sets α = .05. Table E shows that any D equal to or greater than 1.36/√N will be significant. That is, any D, as defined by formula (4.6), which is equal to or greater than 1.36/√43 = .207 will be significant at the .05 level (two-tailed test).
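The large-sample rule just quoted is a one-line computation. The sketch below assumes only what the text states: the constant 1.36 for α = .05 (two-tailed) and its use when N is over 35. The function name is hypothetical.

```python
from math import sqrt


def ks_critical_05(n):
    """Critical D at the .05 level (two-tailed) for N over 35, using the
    divisor rule 1.36 / sqrt(N) quoted from Table E."""
    if n <= 35:
        raise ValueError("use the tabled critical values when N is 35 or less")
    return 1.36 / sqrt(n)


critical = ks_critical_05(43)
print(round(critical, 3))
```

For N = 43 this reproduces the critical value of .207 given above.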

Critical values for one-tailed tests have not as yet been adequately tabled. For a method of finding associated probabilities for one-tailed tests, the reader may refer to Birnbaum and Tingey (1951) and Goodman (1954, p. 166).

Example

Suppose a researcher were interested in confirming by experimental means the sociological observation that American Negroes seem to have a hierarchy of preferences among shades of skin color.* To test how systematic Negroes' skin-color preferences are, our fictitious researcher arranges to have a photograph taken of each of ten Negro subjects. The photographer develops these in such a way that he obtains five copies of each photograph, each copy differing slightly in darkness from the others, so that the five copies can reliably be ranked from darkest to lightest skin color. The picture showing the darkest skin color for any subject is ranked as 1, the next darkest as 2, and so on, the lightest being ranked as 5. Each subject is then offered a choice among the five prints of his own photograph. If skin shade is unimportant to the subjects, the photographs of each rank should be chosen equally often except for random differences. If skin shade is important, as we hypothesize, then the subjects should consistently favor one of the extreme ranks.

i. Null Hypothesis. H0: there is no difference in the expected number of choices for each of the five ranks, and any observed differences are merely chance variations to be expected in a random sample from the rectangular population where f1 = f2 = · · · = f5. H1: the frequencies f1, f2, . . . , f5 are not all equal.

ii. Statistical Test. The Kolmogorov-Smirnov one-sample test is chosen because the researcher wishes to compare an observed distribution of scores on an ordinal scale with a theoretical distribution.

iii. Significance Level. Let α = .01. N = the number of Negroes who served as subjects in the study = 10.

iv. Sampling Distribution. Various critical values of D from the sampling distribution are presented in Table E, together with their associated probabilities of occurrence under H0.

v. Rejection Region. The region of rejection consists of all values of D [computed by formula (4.6)] which are so large that the probability associated with their occurrence under H0 is equal to or less than α = .01.

vi. Decision. In this hypothetical study, each Negro subject chooses one of five prints of the same photograph. Suppose one subject chooses print 2 (the next-to-darkest print), five subjects choose print 4 (the next-to-lightest print), and four choose print 5 (the lightest print). Table 4.3 shows these data and casts them in the form appropriate for applying the Kolmogorov-Smirnov one-sample test.

* Warner, W. L., Buford, H. J., and Walter, A. A. 1941. Color and human nature. Washington: American Council on Education.

TABLE 4.3. HYPOTHETICAL SKIN-COLOR PREFERENCES OF 10 NEGRO SUBJECTS

                                        Rank of photo chosen
                                      (1 is darkest skin color)
                                      1      2      3      4      5
f = number of subjects
    choosing that rank                0      1      0      5      4
F0(X) = theoretical cumulative
    distribution under H0            1/5    2/5    3/5    4/5    5/5
S10(X) = cumulative distribution
    of observed choices              0/10   1/10   1/10   6/10   10/10
|F0(X) − S10(X)|                     2/10   3/10   5/10   2/10    0

Notice that F0(X) is the theoretical cumulative distribution under H0, where H0 is that each of the 5 prints would receive 1/5 of the choices. S10(X) is the cumulative distribution of the observed choices of the 10 Negro subjects. The bottom row of Table 4.3 gives the absolute deviation of each sample value from its paired expected value. Thus the first absolute deviation is 1/5, which is obtained by subtracting 0 from 1/5.

Inspection of the bottom row of Table 4.3 quickly reveals that the D for these data is 5/10, which is .500. Table E shows that for N = 10, D ≥ .500 has an associated probability under H0 of p < .01. Inasmuch as the p associated with the observed value of D is smaller than α = .01, our decision in this fictitious study is to reject H0 in favor of H1. We conclude that our subjects show significant preferences among skin colors.
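The computation of D for these data can be sketched in a few lines. This is an illustration, not the author's procedure: the function name `ks_statistic` is hypothetical, and exact fractions are used so that the result can be compared with the hand calculation without rounding. The counts are those stated in the text (one choice of rank 2, five of rank 4, four of rank 5).

```python
from fractions import Fraction
from itertools import accumulate


def ks_statistic(observed_counts, expected_props):
    """Formula (4.6): largest absolute difference between the theoretical
    and the observed cumulative distributions over ordered categories."""
    n = sum(observed_counts)
    s_n = [Fraction(c, n) for c in accumulate(observed_counts)]  # S_N(X)
    f_0 = list(accumulate(expected_props))                       # F_0(X)
    return max(abs(f - s) for f, s in zip(f_0, s_n))


counts = [0, 1, 0, 5, 4]            # choices of ranks 1 through 5
uniform = [Fraction(1, 5)] * 5      # H0: each rank receives 1/5 of the choices

d = ks_statistic(counts, uniform)
print(d, float(d))
```

The largest deviation occurs at rank 3, where the theoretical cumulative proportion is 3/5 and the observed one is 1/10.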

Summary of procedure. In the computation of the Kolmogorov-Smirnov test, these are the steps:

1. Specify the theoretical cumulative step function, i.e., the cumulative distribution expected under H0.

2. Arrange the observed scores in a cumulative distribution, pairing each interval of S_N(X) with the comparable interval of F0(X).

3. For each step on the cumulative distributions, subtract S_N(X) from F0(X).

4. Using formula (4.6), find D.

5. Refer to Table E to find the probability (two-tailed) associated with the occurrence under H0 of values as large as the observed value of D. If that p is equal to or less than α, reject H0.

Power

The Kolmogorov-Smirnov one-sample test treats individual observations separately and thus, unlike the χ² test for one sample, need not lose information through the combining of categories. When samples are small, and therefore adjacent categories must be combined before χ² may properly be computed, the χ² test is definitely less powerful than the Kolmogorov-Smirnov test. Moreover, for very small samples the χ² test is not applicable at all, but the Kolmogorov-Smirnov test is. These facts suggest that the Kolmogorov-Smirnov test may in all cases be more powerful than its alternative, the χ² test.

A reanalysis by the χ² test of the data given in the example above will highlight the superior power of the Kolmogorov-Smirnov test. In the form in which the data are presented in Table 4.3, χ² could not be computed, because the expected frequencies are only 2 when N = 10 and k = 5. We must combine adjacent categories in order to increase the expected frequency per cell. By doing that we end up with the two-category breakdown shown in Table 4.4. Any subject's choice is simply classified as being for a light or a dark skin color; finer gradations must be ignored.

TABLE 4.4. HYPOTHETICAL SKIN-COLOR PREFERENCES OF 10 NEGRO SUBJECTS

For these data, χ² (uncorrected for continuity) = 3.75. Table C shows that the probability associated with the occurrence under H0 of such a value when df = k − 1 = 1 is between .10 and .05. That is, .10 > p > .05. This value of p does not enable us to reject H0 at the .01 level of significance.

Notice that the p we found by the Kolmogorov-Smirnov test is smaller than .01, while that found by the χ² test is larger than .05. This difference gives some indication of the superior power of the Kolmogorov-Smirnov test.
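The value χ² = 3.75 can be reproduced under one assumption about how the five ranks were collapsed. The body of Table 4.4 is not available here, so the grouping below (ranks 1 and 2 as "dark," ranks 3, 4, and 5 as "light") is an assumption made for illustration; it gives observed frequencies of 1 and 9 and, with each rank expected to receive 2 choices, expected frequencies of 4 and 6.

```python
def chi_square(observed, expected):
    """Formula (4.5): sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


counts = [0, 1, 0, 5, 4]                    # choices of ranks 1 through 5
expected_per_rank = sum(counts) / 5         # 2 expected choices per rank under H0

# ASSUMED grouping: ranks 1-2 versus ranks 3-5 (not confirmed by the text).
observed = [sum(counts[:2]), sum(counts[2:])]
expected = [2 * expected_per_rank, 3 * expected_per_rank]

chi2 = chi_square(observed, expected)
print(observed, expected, chi2)
```

Under this grouping the two cells contribute 9/4 and 9/6, which sum to the 3.75 reported in the text.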

References

The reader may find other discussions of the Kolmogorov-Smirnov test in Birnbaum (1952; 1953), Birnbaum and Tingey (1951), Goodman (1954), and Massey (1951a).

THE ONE-SAMPLE RUNS TEST

Function and Rationale

If an experimenter wishes to arrive at some conclusion about a population by using the information contained in a sample from that population, then his sample must be a random one. In recent years, several techniques have been developed to enable us to test the hypothesis that a sample is random. These techniques are based on the order or sequence in which the individual scores or observations originally were obtained.

The technique to be presented here is based on the number of runs which a sample exhibits. A run is defined as a succession of identical symbols which are followed and preceded by different symbols or by no symbols at all. For example, suppose a series of plus or minus scores occurred in this order:

+ + − − − + − − − − + + − +

This sample of scores begins with a run of 2 pluses. A run of 3 minuses follows. Then comes another run which consists of 1 plus. It is followed by a run of 4 minuses, after which comes a run of 2 pluses, etc. We can group these scores into runs by underlining and numbering each succession of identical symbols:

+ +   − − −   +   − − − −   + +   −   +
 1      2     3      4       5    6   7

We observe 7 runs in all: r = number of runs = 7.

The total number of runs in a sample of any given size gives an indication of whether or not the sample is random. If very few runs occur, a time trend or some bunching due to lack of independence is suggested. If a great many runs occur, systematic short-period cyclical fluctuations seem to be influencing the scores.

For example, suppose a coin were tossed 20 times and the following sequence of heads (H) and tails (T) was observed:

H H H H H H H H H H T T T T T T T T T T

Only two runs occurred in 20 tosses. This would seem to be too few for a "fair" coin (or a fair tosser!). Some lack of independence in the events is suggested. On the other hand, suppose the following sequence occurred:

H T H T H T H T H T H T H T H T H T H T

Here too many runs are observed. In this case, with r = 20 when N = 20, it would also seem reasonable to reject the hypothesis that the coin is "fair." Neither of the above sequences seems to be a random series of H's and T's.

Notice that our analysis, which is based on the order of the events, gives us information which is not indicated by the frequency of the events. In both of the above cases, 10 tails and 10 heads occurred. If the scores were analyzed according to their frequencies, e.g., by use of the χ² test or the binomial test, we would have no reason to suspect the "fairness" of the coin. It is only a runs test, focusing on the order of the events, which reveals the striking lack of randomness of the scores and thus the possible lack of "fairness" in the coin.
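Counting runs by eye is error-prone for long sequences, and the definition given above translates directly into code. The following is a minimal sketch; `count_runs` is a hypothetical helper name, and `itertools.groupby` is used because it groups consecutive identical symbols, which is exactly what a run is.

```python
from itertools import groupby


def count_runs(sequence):
    """Number of runs r: maximal successions of identical symbols."""
    return sum(1 for _symbol, _group in groupby(sequence))


few = count_runs("H" * 10 + "T" * 10)   # ten heads followed by ten tails
many = count_runs("HT" * 10)            # strict alternation over 20 tosses
signs = count_runs("++---+----++-+")    # a 14-score plus/minus series with 7 runs

print(few, many, signs)
```

Both coin sequences contain 10 heads and 10 tails, yet they yield the two extreme run counts, which is the point of the preceding paragraph.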

The sampling distribution of the values of r which we could expect from repeated random samples is known. Using this sampling distribution, we may decide whether a given observed sample has more or fewer runs than would probably occur in a random sample.

Method

Let n1 = the number of elements of one kind, and n2 = the number of elements of the other kind. That is, n1 might be the number of heads and n2 the number of tails; or n1 might be the number of pluses and n2 the number of minuses. N = the total number of observed events = n1 + n2.

To use the one-sample runs test, first observe the n1 and n2 events in the sequence in which they occurred and determine the value of r, the number of runs.

Small samples. If both n1 and n2 are equal to or less than 20, then Table F of the Appendix gives the critical values of r under H0 for α = .05. These are critical values from the sampling distribution of r under H0. If the observed value of r falls between the critical values, we accept H0. If the observed value of r is equal to or more extreme than one of the critical values, we reject H0.

Two tables are given: FI and FII. Table FI gives values of r which are so small that the probability associated with their occurrence under H0 is p = .025. Table FII gives values of r which are so large that the probability associated with their occurrence under H0 is p = .025.

Any observed value of r which is equal to or less than the value shown in Table FI or is equal to or larger than the value shown in Table FII is in the region of rejection for α = .05.

For example, in the first tossing of the coin discussed above, we observed two runs: one run of 10 heads followed by one run of 10 tails. Here n1 = 10, n2 = 10, and r = 2. Table F shows that for these values of n1 and n2, a random sample would be expected to contain more than 6 runs but less than 16. Any observed r of 6 or less or of 16 or more is in the region of rejection for α = .05. The observed r = 2 is smaller than 6, so at the .05 significance level we reject the null hypothesis that the coin is producing a random series of heads and tails.

If a one-tailed test is called for, i.e., if the direction of the deviation from randomness is predicted in advance, then only one of the two tables need be examined. If the prediction is that too few runs will be observed, Table FI gives the critical values of r. If the observed r under such a one-tailed test is equal to or smaller than that shown in Table FI, H0 may be rejected at α = .025. If the prediction is that too many runs will be observed, Table FII gives the critical values of r. In the second coin-tossing sequence discussed above, r = 20 for n1 = 10 and n2 = 10. Since our observed value of r is equal to or larger than that shown in Table FII, we may reject H0 at α = .025, and conclude that the coin is "unfair" in the predicted direction.

Example for Small Samples

In a study of the dynamics of aggression in young children, the experimenter observed pairs of children in a controlled play situation.* Most of the 24 children who served as subjects in the study came from the same nursery school and thus played together daily. Since the experimenter was able to arrange to observe but two children on any day, she was concerned that biases might be introduced into the study by discussions between those children who had already served as subjects and those who were to serve later. If such discussions had any effect on the level of aggression in the play sessions, this effect might show up as a lack of randomness in the aggression scores in the order in which they were collected. After the study was completed, the randomness of the sequence of scores was tested by converting each child's aggression score to a plus or minus, depending on whether it fell above or below the group median, and then applying the one-sample runs test to the observed sequence of pluses and minuses.

* Siegel, Alberta E. 1955. The effect of film-mediated fantasy aggression on strength of aggressive drive in young children. Unpublished doctor's dissertation, Stanford University.

i. Null Hypothesis. H0: the pluses and minuses occur in random order. H1: the order of the pluses and minuses deviates from randomness.

ii. Statistical Test. Since the hypothesis concerns the randomness of a single sequence of observations, the one-sample runs test is chosen.

iii. Significance Level. Let α = .05. N = the number of subjects = 24. Since the scores will be characterized as plus or minus depending on whether they fall above or below the middlemost score in the group, n1 = 12 and n2 = 12.

iv. Sampling Distribution. Table F gives the critical values of r from the sampling distribution.

v. Rejection Region. Since H1 does not predict the direction of the deviation from randomness, a two-tailed test is used. H0 will be rejected at the .05 level of significance if the observed r is either equal to or less than the appropriate value in Table FI or is equal to or larger than the appropriate value in Table FII. For n1 = 12 and n2 = 12, Table F shows that the region of rejection consists of all r's of 7 or less and all r's of 19 or more.

vi. Decision. Table 4.5 shows the aggression scores for each child in the order in which those scores occurred. The median of this set of scores is 24.5. All scores falling below that median are designated as minus in Table 4.5; all above that median are designated as plus. From the column showing the sequence of +'s and −'s the reader can readily observe that 10 runs occurred in this series, that is, r = 10.

TABLE 4.5. AGGRESSION SCORES IN ORDER OF OCCURRENCE

Reference to Table F reveals that r = 10 for n1 = 12 and n2 = 12 does not fall in the region of rejection, and thus our decision is that the null hypothesis that the sample of scores occurred in random order is acceptable.

Large samples. If either n1 or n2 is larger than 20, Table F cannot be used. For such large samples, a good approximation to the sampling distribution of r is the normal distribution, with

$$\text{Mean} = \mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$

$$\text{Standard deviation} = \sigma_r = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$$

Therefore, when either n1 or n2 is larger than 20, H0 may be tested by

$$z = \frac{r - \mu_r}{\sigma_r} = \frac{r - \left(\dfrac{2 n_1 n_2}{n_1 + n_2} + 1\right)}{\sqrt{\dfrac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}} \tag{4.7}$$

Since the values of z which are yielded by formula (4.7) under H0 are approximately normally distributed with zero mean and unit variance, the significance of any observed value of z computed from this formula may be determined by reference to the normal curve table, Table A of the Appendix. That is, Table A gives the one-tailed probabilities associated with the occurrence under H0 of values as extreme as an observed z.

The large-sample example which follows uses this normal curve approximation to the sampling distribution of r.

Example for Large Samples

The writer was interested in ascertaining whether the arrangement of men and women in the queue in front of the box office of a motion-picture theater was a random arrangement. The data were obtained by simply tallying the sex of each of a succession of 50 persons as they approached the box office.

i. Null Hypothesis. H0: the order of males and females in the queue was random. H1: the order of males and females was not random.

ii. Statistical Test. The one-sample runs test was chosen because the hypothesis concerns the randomness of a single group of events.

iii. Significance Level. Let α = .05. N = 50 = the number of persons observed. The values of n1 and n2 will be determined only after the data are collected.

iv. Sampling Distribution. For large samples, the values of z which are computed from formula (4.7) under H0 are approximately normally distributed. Table A gives the one-tailed probability associated with the occurrence under H0 of values as extreme as an observed z.

v. Rejection Region. Since H1 does not predict the direction of the deviation from randomness, a two-tailed region of rejection is used. It consists of all values of z, as computed from formula (4.7), which are so extreme that the probability associated with their occurrence under H0 is equal to or less than α = .05. Thus the region of rejection includes all values of z equal to or more extreme than ±1.96.

vi. Decision. Themales (M) andfemales (F) werequeued in

front of theboxoScein theordershownin Table4.6. Thereader Tmm4.6.ORDER oP30Mamas (M)hNn20Fmrhms (F)IN@URUS mph

Tax~vxa

Box arm

(Runsareindicatedby underlining) M F M F MMM FF M F M F M F MMMM F

F M F M F MM

FFF

M

M

MM

F MMMM

F

M F

F M

MM F

F MM

willobserve thatthere were 30males and20females inthissample.

Byinspection ofthedatainTable 4.6,hemayalsoreadily determine

that r = 35 = the number of runs.

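To determine whether r ≥ 35 might readily have occurred under H0, we compute the value of z as defined by formula (4.7). Let n1 = the number of males = 30, and n2 = the number of females = 20. Then

$$ z = \frac{r - \left(\dfrac{2 n_1 n_2}{n_1 + n_2} + 1\right)}{\sqrt{\dfrac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}} \tag{4.7} $$

$$ = \frac{35 - \left(\dfrac{2(30)(20)}{30 + 20} + 1\right)}{\sqrt{\dfrac{2(30)(20)[2(30)(20) - 30 - 20]}{(30 + 20)^2 (30 + 20 - 1)}}} = 2.98 $$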

Table A shows that the probability of occurrence under H0 of z ≥ 2.98 is p = 2(.0014) = .0028. (The probability is twice the p given in the table because a two-tailed test is called for.) Inasmuch as the probability associated with the observed occurrence, p = .0028, is less than the level of significance, α = .05, our decision is to reject the null hypothesis in favor of the alternative hypothesis. We conclude that in the queue the order of males and females was not random.
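The arithmetic of this example can be checked with a short sketch. The function names below (`count_runs`, `runs_test_z`, `two_tailed_p`) are illustrative choices rather than anything from the text, and the normal probability is computed directly instead of being read from Table A, so the p it yields may differ from the tabled .0028 in the last decimal place.

```python
import math

def count_runs(sequence):
    """Count runs: maximal blocks of identical adjacent symbols."""
    if not sequence:
        return 0
    # A new run starts wherever a symbol differs from its predecessor.
    return 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)

def runs_test_z(r, n1, n2):
    """Large-sample z for the one-sample runs test, following formula (4.7)."""
    n = n1 + n2
    mean_r = 2 * n1 * n2 / n + 1
    var_r = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    return (r - mean_r) / math.sqrt(var_r)

def two_tailed_p(z):
    """Two-tailed probability under the unit normal of a value as extreme as z."""
    return math.erfc(abs(z) / math.sqrt(2))

# Values taken from the queue example: r = 35 runs, 30 males, 20 females.
z = runs_test_z(r=35, n1=30, n2=20)
p = two_tailed_p(z)
print(round(z, 2), round(p, 4))
```

For raw data, `count_runs("MFMFMMM")` would supply r directly from the observed order of M's and F's.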

Summary of procedure. These are the steps in the use of the one-sample runs test:

1. Arrange the n1 and n2 observations in their order of occurrence.
2. Count the number of runs, r.
3. Determine the probability under H0 associated with a value as extreme as the observed value of r. If that probability is equal to or less than α, reject H0. The technique for determining the value of p depends on the size of the n1 and n2 groups:
a. If n1 and n2 are both 20 or less, refer to Table F. Table F ...

... 4.5 and df = 1, the probability of occurrence under H0 is p < ½(.05), which is p < .025. (The probability value given in Table C is halved because a one-tailed test is called for and the table gives two-tailed values.)

Inasmuch as the probability under H0 associated with the occurrence we observed is p < .025 and is less than α = .05, the observed value of χ² is in the region of rejection and thus our decision is to reject H0 in favor of H1. With these artificial data we conclude that children show a significant tendency to change their objects of initiation from adults to children after 30 days of nursery school experience.
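A minimal sketch of this computation follows, assuming the two change frequencies are 14 and 4. Those counts are inferred from the N = 18 and x = 4 quoted in the discussion of small expected frequencies; which of them is cell A and which is cell D does not affect the result. The function names are illustrative, and the chi-square probability is computed directly rather than read from Table C.

```python
import math

def mcnemar_chi2(a, d):
    """Chi-square with continuity correction, as in formula (5.2)."""
    return (abs(a - d) - 1) ** 2 / (a + d)

def chi2_df1_upper_tail(x):
    """P(chi-square >= x) for df = 1, via the unit normal distribution."""
    return math.erfc(math.sqrt(x / 2))

def binomial_lower_tail(x, n):
    """P(X <= x) for a binomial with P = Q = 1/2 (the exact alternative)."""
    return sum(math.comb(n, i) for i in range(x + 1)) / 2 ** n

a, d = 14, 4                      # assumed cell frequencies (see lead-in)
chi2 = mcnemar_chi2(a, d)         # 4.5
p_two_tailed = chi2_df1_upper_tail(chi2)
p_one_tailed = p_two_tailed / 2   # halved for a one-tailed test
p_exact = binomial_lower_tail(min(a, d), a + d)
print(chi2, round(p_one_tailed, 3), round(p_exact, 3))
```

The exact binomial value is included only to show how close the two approaches are for these counts.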

Small expected frequencies. If the expected frequency, that is, ½(A + D), is very small (less than 5), the binomial test (Chap. 4) should be used rather than the McNemar test. For the binomial test, N = A + D, and x = the smaller of the two observed frequencies, either A or D.

Notice that we could have tested the data in Table 5.3 with the binomial test. The null hypothesis would be that the sample of N = A + D cases came from a binomial population where P = Q = ½. For the above data, N = 18 and x = 4, the smaller of the two frequencies observed. Table D of the Appendix shows the probability under H0 associated with such a small value is p = .015, which is essentially the same p yielded by the McNemar test. The difference between the two p's is due mainly to the fact that the chi-square table does not include all values between p = .05 and p = .01.

Summary of procedure. These are the steps in the computation of the McNemar test:

1. Cast the observed frequencies in a fourfold table of the form illustrated in Table 5.1.
2. Determine the expected frequencies in cells A and D: E = ½(A + D). If the expected frequencies are less than 5, use the binomial test rather than the McNemar test.
3. If the expected frequencies are 5 or larger, compute the value of χ² using formula (5.2).
4. Determine the probability under H0 associated with a value as large as the observed value of χ² by referring to Table C. If a one-tailed test is called for, halve the probability shown in that table. If the p shown by Table C for the observed value of χ² with df = 1 is equal to or less than α, reject H0 in favor of H1.

Power-Efficiency

When the McNemar test is used with nominal measures, the concept of power-efficiency is meaningless inasmuch as there is no alternative with which to compare the test. However, when the measurement and other aspects of the data are such that it is possible to apply the parametric t test, the McNemar test, like the binomial test, has power-efficiency of about 95 per cent for A + D = 6, and the power-efficiency declines as the size of A + D increases to an eventual (asymptotic) efficiency of about 63 per cent.

References

Discussions of this test are presented by Bowker (1948) and McNemar (1947; 1955, pp. 228-231).

THE SIGN TEST

Function

The sign test gets its name from the fact that it uses plus and minus signs rather than quantitative measures as its data. It is particularly useful for research in which quantitative measurement is impossible or infeasible, but in which it is possible to rank with respect to each other the two members of each pair.

The sign test is applicable to the case of two related samples when the experimenter wishes to establish that two conditions are different. The only assumption underlying this test is that the variable under consideration has a continuous distribution. The test does not make any assumptions about the form of the distribution of differences, nor does it assume that all subjects are drawn from the same population. The different pairs may be from different populations with respect to age, sex, intelligence, etc.; the only requirement is that within each pair the experimenter has achieved matching with respect to the relevant extraneous variables. As was noted before, one way of accomplishing this is to use each subject as his own control.

Method

The null hypothesis tested by the sign test is that

$$ p(X_A > X_B) = p(X_A < X_B) = \tfrac{1}{2} $$

where X_A is the judgment or score under one of the conditions (or after the treatment) and X_B is the judgment or score under the other condition (or before the treatment). That is, X_A and X_B are the two "scores" for a matched pair. Another way of stating H0 is: the median difference is zero.

In applying the sign test, we focus on the direction of the differences between every X_Ai and X_Bi, noting whether the sign of the difference is plus or minus. Under H0, we would expect the number of pairs which have X_A > X_B to equal the number of pairs which have X_A < X_B.

That is, if the null hypothesis were true we would expect about half of the differences to be negative and half to be positive. H0 is rejected if too few differences of one sign occur.

Small samples. The probability associated with the occurrence of a particular number of +'s and −'s can be determined by reference to the binomial distribution with P = Q = ½, where N = the number of pairs. If a matched pair shows no difference (i.e., the difference, being zero, has no sign) it is dropped from the analysis and N is thereby reduced. Table D of the Appendix gives the probabilities associated with the occurrence under H0 of values as small as x for N ≤ 25. To use this table, let x = the number of fewer signs.

For example, suppose 20 pairs are observed. Sixteen show differences in one direction (+) and the other four show differences in the other (−). Here N = 20 and x = 4. Reference to Table D reveals that the probability of this distribution of +'s and −'s or an even more extreme one under H0 is p = .006 (one-tailed).

The sign test may be either one-tailed or two-tailed. In a one-tailed test, the advance prediction states which sign, + or −, will occur more frequently. In a two-tailed test, the prediction is simply that the frequencies with which the two signs occur will be significantly different. For a two-tailed test, double the values of p shown in Table D.

Example for Small Samples

In a study of the effects of father-absence upon the development of children,¹ 17 married couples who had been separated by war and whose first child was born during the father's absence were interviewed, husbands and wives separately. Each was asked to discuss various topics concerning the child whose first year had been spent in a fatherless home. Each parent was asked to discuss the father's disciplinary relations with the child in the years after his return from war. These statements were extracted from the recorded interviews, and a psychologist who knew each family was asked to rate the statements on the degree of insight which each parent showed in discussing paternal discipline. The prediction was that the mother, because of her longer and closer association with the child and because of a variety of other circumstances typically associated with father-separation because of war, would have greater insight into her husband's disciplinary relations with their child than he would have.

i. Null Hypothesis. H0: the median of the differences is zero. That is, there are as many husbands whose insight into their own disciplinary relations with their children is greater than their wives' as there are wives whose insight into paternal discipline is greater than their husbands'. H1: the median of the differences is positive.

ii. Statistical Test. The rating scale used in this study constitutes at best a partially ordered scale. The information contained in the ratings is preserved if the difference between each couple's two ratings is expressed by a sign. Each married couple in this study constitutes a matched pair; they are matched in the sense that each discussed the same child and the same family situation in the material rated. The sign test is appropriate for measures of the strength indicated, and of course is appropriate for a case of two related samples.

¹ Engvall, Alberta. 1954. Comparison of mother and father attitudes toward war-separated children. In Lois M. Stolz et al., Father relations of war-born children. Stanford, Calif.: Stanford Univer. Press. Pp. 149-180.

iii. Significance Level. Let α = .05. N = 17, the number of war-separated couples. (N may be reduced if ties occur.)

iv. Sampling Distribution. The associated probability of occurrence of values as small as x is given by the binomial distribution for P = Q = ½. The associated probabilities are given in Table D.

v. Rejection Region. Since H1 predicts the direction of the differences, the region of rejection is one-tailed. It consists of all values of x (where x = the number of minuses, since the prediction is that pluses will predominate and x = the number of fewer signs) whose one-tailed associated probability of occurrence under H0 is equal to or less than α = .05.

vi. Decision. The statements of each parent were rated on a five-point rating scale. On this scale, a rating of 1 represents high insight.

TABLE 5.4. WAR-SEPARATED PARENTS' INSIGHT INTO PATERNAL DISCIPLINE

[Table columns: Couple (pseudonym); Rating on insight* into paternal discipline; Direction of difference; Sign]

* A rating of 1 represents great insight; a rating of 5 represents little or no insight.

Table 5.4 shows the ratings assigned to each mother (M) and father (F) among the 17 war-separated couples. The signs of the differences between each couple are shown in the final column. Observe that 3 couples (the Holmans, Mathewses, and Soules) showed differences in the opposite direction from that predicted, i.e., in each case X_F < X_M, and thus each of these 3 received a minus. For 3 couples (the Harlows, Marstons, and Wagners), there was no difference between the two ratings, that is, X_F = X_M, and thus these couples received no sign. The remaining 11 couples showed differences in the predicted direction.

For the data in Table 5.4, x = the number of fewer signs = 3, and N = the number of matched pairs who showed differences = 14. Table D shows that for N = 14, an x ≤ 3 has a one-tailed probability of occurrence under H0 of p = .029. This value is in the region of rejection for α = .05; thus our decision is to reject H0 in favor of H1. We conclude that war-separated wives show greater insight into their husbands' disciplinary relations with their war-born children than do the husbands themselves.
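The Table D lookup used in this example can be reproduced from the binomial distribution directly. The sketch below is only an illustration: the function names are hypothetical, and the list of differences is a stand-in that reproduces the counts reported above (11 pluses, 3 minuses, 3 ties), not the actual ratings from Table 5.4.

```python
from math import comb

def sign_counts(differences):
    """Count plus and minus signs; zero differences carry no sign and are dropped."""
    plus = sum(1 for d in differences if d > 0)
    minus = sum(1 for d in differences if d < 0)
    return plus, minus

def sign_test_p(x, n):
    """One-tailed probability under H0 (P = Q = 1/2) of x or fewer of the
    less frequent sign among n untied pairs."""
    return sum(comb(n, i) for i in range(x + 1)) / 2 ** n

# Stand-in differences matching the reported counts, not the original ratings.
differences = [1] * 11 + [-1] * 3 + [0] * 3
plus, minus = sign_counts(differences)
n = plus + minus            # 14 pairs showed a difference
x = min(plus, minus)        # 3 = the number of fewer signs
p_one_tailed = sign_test_p(x, n)
print(n, x, round(p_one_tailed, 3))
```

For a two-tailed test the value returned by `sign_test_p` would be doubled, as the text directs for Table D.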

Ties. For the sign test, a "tie" occurs when it is not possible to discriminate between a matched pair on the variable under study, or when the two scores earned by any pair are equal. In the case of the war-separated couples, three ties occurred: the psychologist rated three couples as having equal insight into paternal discipline.

All tied cases are dropped from the analysis for the sign test, and the N is correspondingly reduced. Thus N = the number of matched pairs whose difference score has a sign. In the example, 14 of the 17 couples had difference scores with a sign, so for that case N = 14.

Relation to the binomial expansion. In the study just discussed, we should expect under H0 that the frequency of pluses and minuses would be the same as the frequency of heads and tails in a toss of 14 unbiased coins. (More exactly, the analogy is to the toss of 17 unbiased coins, 3 of which rolled out of sight and thus could not be included in the analysis.) The probability of getting as extreme an occurrence as 3 heads and 11 tails in a toss of 14 coins is given by the binomial distribution as

$$ \sum_{i=0}^{x} \binom{N}{i} P^{i} Q^{N-i} $$

where N = total number of coins, x = observed number of heads, and

$$ \binom{N}{i} = \frac{N!}{i!\,(N - i)!} $$

In the case of 3 or fewer heads when 14 coins are tossed, this is

$$ \frac{\binom{14}{0} + \binom{14}{1} + \binom{14}{2} + \binom{14}{3}}{2^{14}} = \frac{1 + 14 + 91 + 364}{16{,}384} = .029 $$

The probability value found by this method is of course identical to that found by the method used in the example: p = .029.

Large samples. If N is larger than 25, the normal approximation to the binomial distribution can be used. This distribution has

$$ \text{Mean} = \mu_x = NP = \tfrac{1}{2}N $$

$$ \text{Standard deviation} = \sigma_x = \sqrt{NPQ} = \tfrac{1}{2}\sqrt{N} $$

That is, the value of z is given by

$$ z = \frac{x - \mu_x}{\sigma_x} = \frac{x - \tfrac{1}{2}N}{\tfrac{1}{2}\sqrt{N}} \tag{5.3} $$

This expression is approximately normally distributed with zero mean and unit variance.

The approximation becomes an excellent one when a correction for continuity is employed. The correction is effected by reducing the difference between the observed number of pluses (or minuses) and the expected number, i.e., the mean under H0, by .5. (See pages 40 to 41 for a more complete discussion of this point.) That is, with the correction for continuity

$$ z = \frac{(x \pm .5) - \tfrac{1}{2}N}{\tfrac{1}{2}\sqrt{N}} \tag{5.4} $$

where x + .5 is used when x < ½N, and x − .5 is used when x > ½N. The value of z obtained by the application of formula (5.4) may be considered to be normally distributed with zero mean and unit variance. Therefore the significance of an obtained z may be determined by reference to Table A in the Appendix. That is, Table A gives the one-tailed probability associated with the occurrence under H0 of values as extreme as an observed z. (If a two-tailed test is required, the p yielded by Table A should be doubled.)

Example for Large Samples

Suppose an experimenter were interested in determining whether a certain film about juvenile delinquency would change the opinions of the members of a particular community about how severely juvenile delinquents should be punished. He draws a random sample of 100 adults from the community, and conducts a "before and after" study, having each subject serve as his own control. He asks each subject to take a position on whether more or less punitive action against juvenile delinquents should be taken than is taken at present. He then shows the film to the 100 adults, after which he repeats the question.

i. Null Hypothesis. H0: the film has no systematic effect. That is, of those whose opinions change after seeing the film, just as many change from more to less as change from less to more, and any difference observed is of a magnitude which might be expected in a random sample from a population on which the film would have no systematic effect. H1: the film has a systematic effect.

ii. Statistical Test. The sign test is chosen for this study of two related groups because the study uses ordinal measures within matched pairs, and therefore the differences may appropriately be represented by plus and minus signs.

iii. Significance Level. Let α = .01. N = the number of subjects (out of 100) who show an opinion change in either direction.

iv. Sampling Distribution. Under H0, z as computed from formula (5.4) is approximately normally distributed for N > 25. Table A gives the probability associated with the occurrence of values as extreme as an obtained z.

v. Rejection Region. Since H1 does not state the direction of the predicted differences, the region of rejection is two-tailed. It consists of all values of z which are so extreme that their associated probability of occurrence under H0 is equal to or less than α = .01.

vi. Decision. The results of this hypothetical study of the effects of a film upon opinion are shown in Table 5.5.

TABLE 5.5. ADULT OPINIONS CONCERNING WHAT SEVERITY OF PUNISHMENT IS DESIRABLE FOR JUVENILE DELINQUENCY (Artificial data)

[Fourfold table: rows = amount of punishment favored before film (More, Less); columns = amount of punishment favored after film (Less, More)]

Did the film have any effect? The data show that there were 15 adults (8 + 7) who were unaffected and 85 who were. The hypothesis of the study applies only to those 85. If the film had no systematic effect, we would expect about half of those whose opinions changed from before to after to have changed from more to less, and about half to have changed from less to more. That is, we would expect about 42.5 subjects to show each of the two kinds of change.

Now we observe that 59 changed from more to less, while 26 changed from less to more. We may determine the associated probability under H0 of such an extreme split by using formula (5.4). For these data, x > ½N, that is, 59 > 42.5.

$$ z = \frac{(x \pm .5) - \tfrac{1}{2}N}{\tfrac{1}{2}\sqrt{N}} \tag{5.4} $$

$$ = \frac{(59 - .5) - \tfrac{1}{2}(85)}{\tfrac{1}{2}\sqrt{85}} = 3.47 $$

Reference to Table A reveals that the probability under H0 of z ≥ 3.47 is p = 2(.0003) = .0006. (The p shown in the table is doubled because the tabled values are for a one-tailed test, whereas the region of rejection in this case is two-tailed.) Inasmuch as p = .0006 is smaller than α = .01, the decision is to reject the null hypothesis in favor of the alternative hypothesis. We conclude from these fictitious data that the film had a significant systematic effect on adults' opinions regarding the severity of punishment which is desirable for juvenile delinquents.
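The large-sample computation can be sketched as follows. The function name `sign_test_z` is an illustrative choice, and the handling of the case x = ½N (where no correction is applied and z is zero) is an assumption of this sketch, since the text does not address it. The probability is computed directly, so it may differ slightly from the tabled p = .0006.

```python
import math

def sign_test_z(x, n):
    """z for the sign test with correction for continuity, as in formula (5.4)."""
    half_n = n / 2
    if x > half_n:
        corrected = x - 0.5     # move x toward the mean from above
    elif x < half_n:
        corrected = x + 0.5     # move x toward the mean from below
    else:
        corrected = x           # assumption: no correction when x equals N/2
    return (corrected - half_n) / (0.5 * math.sqrt(n))

def two_tailed_p(z):
    """Two-tailed probability under the unit normal of a value as extreme as z."""
    return math.erfc(abs(z) / math.sqrt(2))

# 59 of the 85 subjects who changed moved from "more" to "less".
z = sign_test_z(x=59, n=85)
p = two_tailed_p(z)
print(round(z, 2), p < 0.01)
```

Entering the smaller frequency instead, `sign_test_z(26, 85)`, gives the same z with the opposite sign, so either count may be used for a two-tailed test.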

This example was included not only because it demonstrates a useful application of the sign test, but also because data of this sort are often analyzed incorrectly. It is not too uncommon for researchers to analyze such data by using the row and column totals as if they represented independent samples. This is not the case; the row and column totals are separate but not independent representations of the same data.

This example could also have been analyzed by the McNemar test for the significance of changes (discussed on pages 63 to 67). With the data shown in Table 5.5,

$$ \chi^2 = \frac{(|A - D| - 1)^2}{A + D} \tag{5.2} $$

$$ = \frac{(|59 - 26| - 1)^2}{59 + 26} = 12.05 $$

Table C shows that χ² ≥ 12.05 with df = 1 has a probability of occurrence under H0 of p < .001. This finding is not in conflict with that yielded by the sign test. The difference between the two findings is due to the limitations of the chi-square table used.
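The agreement between the two tests on these data is algebraic rather than coincidental. A short derivation, under the labeling (assumed here only for illustration) that x = A is the larger change frequency and N = A + D, so that x > ½N and the correction subtracts .5:

$$ z = \frac{(x - .5) - \tfrac{1}{2}N}{\tfrac{1}{2}\sqrt{N}} = \frac{2x - N - 1}{\sqrt{N}} = \frac{(A - D) - 1}{\sqrt{A + D}} $$

$$ z^2 = \frac{(|A - D| - 1)^2}{A + D} = \chi^2 $$

With A = 59 and D = 26 this gives $z^2 = 32^2 / 85 \approx 12.05$ and $z \approx 3.47$, matching both computations above.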


Summary of procedure. These are the steps in the use of the sign test:

1. Determine the sign of the difference between the two members of each pair.
2. By counting, determine the value of N = the number of pairs whose differences show a sign.
3. The method for determining the probability associated with the occurrence under H0 of a value as extreme as the observed value of x depends on the size of N:
a. If N is 25 or smaller, Table D shows the one-tailed p associated with a value as small as the observed value of x = the number of fewer signs. For a two-tailed test, double the value of p shown in Table D.
b. If N is larger than 25, compute the value of z, using formula (5.4). Table A gives one-tailed p's associated with values as extreme as various values of z. For a two-tailed test, double the value of p shown in Table A.

If the p yielded by the test is equal to or less than α, reject H0.

Power-Efficiency

The power-efficiency of the sign test is about 95 per cent for N = 6, but it declines as the size of the sample increases to an eventual (asymptotic) efficiency of 63 per cent. For discussions of the power-efficiency of the sign test for large samples, see Mood (1954) and Walsh (1946).

References

For other discussions of the sign test, the reader is directed to Dixon and Massey (1951, chap. 17), Dixon and Mood (1946), McNemar (1955, pp. 357-358), Moses (1952a), and Walsh (1946).

THE WILCOXON MATCHED-PAIRS SIGNED-RANKS TEST

Function

The test we have just discussed, the sign test, utilizes information simply about the direction of the differences within pairs. If the relative magnitude as well as the direction of the differences is considered, a more powerful test can be made. The Wilcoxon matched-pairs signed-ranks test does just that: it gives more weight to a pair which shows a large difference between the two conditions than to a pair which shows a small difference.

The Wilcoxon test is a most useful test for the behavioral scientist. With behavioral data, it is not uncommon that the researcher can (a) tell which member of a pair is "greater than" which, i.e., tell the sign of the difference between any pair, and (b) rank the differences in order of absolute size. That is, he can make the judgment of "greater than" between any pair's two performances, and also can make that judgment between any two difference scores arising from any two pairs.¹ With such information, the experimenter may use the Wilcoxon test.

Rationale and Method

Let d_i = the difference score for any matched pair, representing the difference between the pair's scores under the two treatments. Each pair has one d_i. To use the Wilcoxon test, rank all the d's without regard to sign: give the rank of 1 to the smallest d_i, the rank of 2 to the next smallest, etc. When one ranks scores without respect to sign, a d_i of −1 is given a lower rank than a d_i of either −2 or +2.

Then to each rank affix the sign of the difference. That is, indicate which ranks arose from negative d's and which ranks arose from positive d's.

Now if treatments A and B are equivalent, that is, if H0 is true, we should expect to find some of the larger d's favoring treatment A and some favoring treatment B. That is, some of the larger ranks would come from positive d's while others would come from negative d's. Thus, if we summed the ranks having a plus sign and summed the ranks having a minus sign, we would expect the two sums to be about equal under H0. But if the sum of the positive ranks is very much different from the sum of the negative ranks, we would infer that treatment A differs from treatment B, and thus we would reject H0. That is, we reject H0 if either the sum of the ranks for the negative d's or the sum of the ranks for the positive d's is too small.
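The ranking procedure just described can be sketched in a few lines. The function name and the difference scores below are made up for illustration and are not data from the text. The sketch also applies two conventions that the text takes up next under "Ties": pairs with d = 0 are dropped, and d's of equal absolute size share the average of the ranks they would otherwise occupy.

```python
def signed_rank_sums(differences):
    """Return (sum of positive ranks, sum of negative ranks, N) for difference scores."""
    nonzero = [d for d in differences if d != 0]      # drop pairs with d = 0
    ordered = sorted(nonzero, key=abs)                # rank without regard to sign
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Positions i..j-1 would hold ranks i+1..j; tied d's share their average.
        rank_of[abs(ordered[i])] = (i + 1 + j) / 2
        i = j
    positive = sum(rank_of[abs(d)] for d in nonzero if d > 0)
    negative = sum(rank_of[abs(d)] for d in nonzero if d < 0)
    return positive, negative, len(nonzero)

# Hypothetical difference scores for 8 matched pairs (one pair shows no difference).
d_scores = [5, -1, 8, 0, -3, 3, 10, 2]
positive, negative, n = signed_rank_sums(d_scores)
t = min(positive, negative)    # the smaller sum of like-signed ranks
print(positive, negative, n, t)
```

As a check on the bookkeeping, the two sums always add to N(N + 1)/2, the total of the ranks 1 through N.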

Ties. Occasionally the two scores of any pair are equal. That is, no difference between the two treatments is observed for that pair, so that d = 0. Such pairs are dropped from the analysis. This is the same practice that we follow with the sign test. N = the number of matched pairs minus the number of pairs whose d = 0.

Another sort of tie can occur. Two or more d's can be of the same size. We assign such tied cases the same rank. The rank assigned is the average of the ranks which would have been assigned if the d's had differed slightly. Thus three pairs might yield d's of −1, −1, and +1. Each pair would be assigned the rank of 2, for (1 + 2 + 3)/3 = 2. Then the next d in order would receive the rank of 4, because ranks 1, 2, and 3 have already been used. If two pairs had yielded d's of 1, both would receive the rank of 1.5, and the next largest d would receive the rank of 3. The practice of giving tied observations the average of the ranks they would otherwise have gotten has a negligible effect on T, the statistic on which the Wilcoxon test is based.

¹ To require that the researcher have ordinal information not only within pairs but also concerning the differences between pairs seems to be tantamount to requiring measurement in the strength of an ordered metric scale. In strength, an ordered metric scale lies between an ordinal scale and an interval scale. For a discussion of ordered metric scaling, see Coombs (1950) and Siegel (1956).

For applications of these principles for the handling of ties, see the example for large samples, later in this section.

Small samples. Let T = the smaller sum of like-signed ranks. That is, T is either the sum of the positive ranks or the sum of the negative ranks, whichever sum is smaller. Table G of the Appendix gives various values of T and their associated levels of significance. That is, if an observed T is equal to or less than the value given in the body of Table G under a particular significance level for the observed value of N, the null hypothesis may then be rejected at that level of significance.

Table G is adapted for use with both one-tailed and two-tailed tests. A one-tailed test may be used if in advance of examining the data the experimenter predicts the sign of the smaller sum of ranks. That is, as is the case with all one-tailed tests, he must predict in advance the direction of the differences.

For example, if T = 3 were the sum of the negative ranks when N = 9, one could reject H0 at the α = .02 level if H1 had been that the two groups would differ, and one could reject H0 at the α = .01 level if H1 had been that the sum of negative ranks would be the smaller sum.

Example for Small Samples

Suppose a child psychologist wished to test whether nursery school attendance has any effect on children's social perceptiveness. He scores social perceptiveness by rating children's responses to a group of pictures which depict a variety of social situations, asking a standard group of questions about each picture. Thus he obtains a score between 0 and 100 for each child.

Although the experimenter is confident that a higher score represents higher social perceptiveness than a lower score, he is not sure that the scores are sufficiently exact to be treated numerically. That is, he is not willing to say that a child whose score is 60 is twice as socially perceptive as a child whose score is 30, nor is he willing to say that the difference between scores of 60 and 40 is exactly twice as large as the difference between scores of 40 and 30. However, he is confident that the difference between a score of, say, 60 and one of 40 is greater than the difference between a score of 40 and one of 30. That is, he cannot assert that the differences are numerically exact, but he does maintain that they are sufficiently meaningful that they may appropriately be ranked in order of absolute size.


To test the effect of nursery school attendance on children's social perceptiveness scores, he obtains 8 pairs of identical twins to serve as subjects. At random, 1 twin from each pair is assigned to attend nursery school for a term. The other twin in each pair is to remain out of school. At the end of the term, the 16 children are each given the test of social perceptiveness.

i. Null Hypothesis. H0: the social perceptiveness of "home" and "nursery school" children does not differ. In terms of the Wilcoxon test, the sum of the positive ranks = the sum of the negative ranks. H1: the social perceptiveness of the two groups of children differs, i.e., the sum of the positive ranks ≠ the sum of the negative ranks.

ii. Statistical Test. The Wilcoxon matched-pairs signed-ranks test is chosen because the study employs two related samples and it yields difference scores which may be ranked in order of absolute magnitude.

iii. Significance Level. Let α = .05. N = the number of pairs (8) minus any pairs whose d is zero.

iv. Sampling Distribution. Table G gives critical values from the sampling distribution of T, for N ≤ 25.

v. Rejection Region. Since the direction of the difference is not predicted, a two-tailed region of rejection is appropriate. The region of rejection consists of all values of T which are so small that the probability associated with their occurrence under H0 is equal to or less than α = .05 for a two-tailed test.

vi. Decision. In this fictitious study, the 8 pairs of "home" and "nursery school" children are given the test in social perceptiveness after the latter have been in school for one term. Their scores are given in Table 5.6. The table shows that only 2 pairs of twins, c and g, showed differences in the direction of greater social perceptiveness in the "home" twin. And these difference scores are among the smallest: their ranks are 1 and 3.

The smaller of the sums of the like-signed ranks = 1 + 3 = 4 = T. Table G shows that for N = 8, a T of 4 allows us to reject the null hypothesis at α = .05 for a two-tailed test. Therefore we reject H0 in favor of H1 in this fictitious study, concluding that nursery school experience does affect the social perceptiveness of children.

TABLE 5.6. SOCIAL PERCEPTIVENESS SCORES OF NURSERY SCHOOL AND HOME CHILDREN (Artificial data)

It is worth noting that the data in Table 5.6 are amenable to treatment by the sign test (pages 68 to 75), a less powerful test. For that test, x = 2 and N = 8. Table D gives the probability associated with such an occurrence under H0 as p = 2(.145) = .290 for a two-tailed test. With the sign test, therefore, our decision would be to accept H0 when α = .05, whereas the Wilcoxon test enabled us to reject H0 at that level. This difference is not surprising, for the Wilcoxon test utilizes more of the information in the data. Notice that the Wilcoxon test takes into consideration the fact that the 2 minus d's are among the smallest d's observed, whereas the sign test is unaffected by the relative magnitude of the d's.

Large samples. When N is larger than 25, Table G cannot be used. However, it can be shown that in such cases the sum of the ranks, T, is

practically normally distributed, with

$$ \text{Mean} = \mu_T = \frac{N(N + 1)}{4} $$

and

$$ \text{Standard deviation} = \sigma_T = \sqrt{\frac{N(N + 1)(2N + 1)}{24}} $$

Therefore

$$ z = \frac{T - \mu_T}{\sigma_T} = \frac{T - \dfrac{N(N + 1)}{4}}{\sqrt{\dfrac{N(N + 1)(2N + 1)}{24}}} \tag{5.5} $$

is approximately normally distributed with zero mean and unit variance. Thus Table A of the Appendix gives the probabilities associated with the occurrence under H0 of various values as extreme as an observed z computed from formula (5.5).

To show what an excellent approximation this is, even for small samples, we shall treat the data given in Table 5.6, where N = 8, by this large-sample approximation. In that case, T = 4. Inserting the values in formula (5.5), we have

$$ z = \frac{4 - \dfrac{(8)(9)}{4}}{\sqrt{\dfrac{(8)(9)(17)}{24}}} = -1.96 $$

Reference to Table A reveals that the probability associated with the occurrence under H0 of a z as extreme as −1.96 is p = 2(.025) = .05, for a two-tailed test. This is the same p we found by using Table G for the same data.
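The check just made can be reproduced with a brief sketch of formula (5.5). The function name `wilcoxon_z` is an illustrative choice, and the normal probability is computed directly rather than read from Table A.

```python
import math

def wilcoxon_z(t, n):
    """Large-sample z for the Wilcoxon signed-ranks statistic T, as in formula (5.5)."""
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean_t) / sd_t

def two_tailed_p(z):
    """Two-tailed probability under the unit normal of a value as extreme as z."""
    return math.erfc(abs(z) / math.sqrt(2))

# The twin data: T = 4 with N = 8 untied pairs.
z = wilcoxon_z(t=4, n=8)
p = two_tailed_p(z)
print(round(z, 2), round(p, 2))
```

Because T is defined as the smaller sum of ranks, the z obtained this way is negative or zero; only its magnitude matters when entering the table.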

Example for Large Samples

Inmates in a federal prison served as subjects in a decision-making study.¹ First the prisoners' utility (subjective value) for cigarettes was measured individually, cigarettes being negotiable in prison society. Using each subject's utility function, the experimenter then attempted to predict the decisions the man would make in a game in which he repeatedly had to choose between two different (varying) gambles, and in which cigarettes might be won or lost.

The first hypothesis tested was that the experimenter could predict the subjects' decisions by means of their utility functions better than he could by assuming that their utility for cigarettes was equal to the cigarettes' objective value and therefore predicting the "rational" choice in terms of objective value. This hypothesis was confirmed. However, as was expected, some responses were not predicted successfully by this hypothesis of maximization of expected utility. Anticipating this outcome, the experimenter had hypothesized that such errors in prediction would be due to the indifference of the subjects between the two gambles offered. That is, a prisoner might find two gambles either equally attractive or equally unattractive, and therefore be indifferent in the choice between them. Such choices would be difficult to predict. But in such choices, it was reasoned that the subject might vacillate considerably before stating a decision. That is, the latency time between the offer of the gamble and his statement of a decision would be high. The second hypothesis, then, was that the latency times for those choices which would not be predicted successfully by maximization of expected utility would be longer than the latency times for those choices which would be successfully predicted.

i. Null Hypothesis. H0: there is no difference between the latency times of incorrectly predicted and correctly predicted decisions. H1: the latency times of incorrectly predicted decisions are longer than the latency times of correctly predicted decisions.

ii. Statistical Test. The Wilcoxon matched-pairs signed-ranks test is selected because the data are difference scores from two related samples (correctly predicted choices and incorrectly predicted choices made by the same prisoners), where each subject is used as his own control.

¹ Hurst, P. M., and Siegel, S. 1956. Prediction of decisions from a higher ordered metric scale of utility. J. exp. Psychol., 52, 138-144.

iii. Significance Level. Let α = .01. N = 30 = the number of prisoners who served as subjects. (This N will be reduced if any prisoner's d is zero.)

iv. Sampling Distribution. Under H0, the values of z as computed from formula (5.5) are normally distributed with zero mean and unit variance. Thus Table A gives the probability associated with the occurrence under H0 of values as extreme as an obtained z.

v. Rejection Region. Since the direction of the difference is predicted, the region of rejection is one-tailed. If the difference is in the predicted direction, T, the smaller of the sums of the like-signed ranks, will be the sum of the ranks of those prisoners whose d's are in the opposite direction from that predicted. The region of rejection consists of all z's (obtained from data with such T's) which are so extreme that the probability associated with their occurrence under H0 is equal to or less than α = .01.

vi. Decision. A difference score (d) was obtained for each subject by subtracting his median time in coming to correctly predicted decisions from his median time in coming to incorrectly predicted decisions. Table 5.7 gives these values of d for the 30 prisoners, and gives the other information necessary for computing the Wilcoxon test. A minus d indicates that the prisoner's median time in coming to correctly predicted decisions was longer than his median time in coming to incorrectly predicted decisions.

For the data in Table 5.7, T = 53.0, the smaller of the sums of the like-signed ranks. We apply formula (5.5):

$$z = \frac{T - \dfrac{N(N+1)}{4}}{\sqrt{\dfrac{N(N+1)(2N+1)}{24}}} = \frac{53 - \dfrac{(26)(27)}{4}}{\sqrt{\dfrac{(26)(27)(53)}{24}}} = -3.11$$


TABLE 5.7. DIFFERENCE IN MEDIAN TIME BETWEEN PRISONERS' CORRECTLY AND INCORRECTLY PREDICTED DECISIONS

Prisoner      d      Rank of d    Rank with less frequent sign
    1        −2       −11.5              11.5
    2         0
    3         0
    4         1         4.5
    5         0
    6         0
    7         4        20.0
    8         4        20.0
    9         1         4.5
   10         1         4.5
   11         5        23.0
   12         3        16.5
   13         5        23.0
   14         3        16.5
   15        −1        −4.5               4.5
   16         1         4.5
   17        −1        −4.5               4.5
   18         5        23.0
   19         8        25.5
   20         2        11.5
   21         2        11.5
   22         2        11.5
   23        −3       −16.5              16.5
   24        −2       −11.5              11.5
   25         1         4.5
   26         4        20.0
   27         8        25.5
   28         2        11.5
   29         3        16.5
   30        −1        −4.5               4.5
                                     T = 53.0

Notice that we have N = 26, for 4 of the prisoners' median times were the same for both correctly and incorrectly predicted decisions and thus their d's were 0. Notice also that our T is the sum of the ranks of those prisoners whose d's are in the opposite direction from predicted, and therefore we are justified in proceeding with a one-tailed test. Table A shows that z as extreme as −3.11 has a one-tailed probability associated with its occurrence under H0 of p = .0009. Inasmuch as this p is less than α = .01 and thus the value of z is in the region of rejection, our decision is to reject H0 in favor of H1. We conclude that the prisoners' latency times for incorrectly predicted decisions were significantly longer than their latency times for correctly predicted decisions. This conclusion lends some support to the idea that the incorrectly predicted decisions concerned gambles which were equal, or approximately equal, in expected utility to the subjects.
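The computation in this example can be retraced with a short sketch. It is only an illustration: the function names are hypothetical, the difference scores are those read from Table 5.7, and the one-tailed probability comes from the normal curve rather than from Table A.

```python
import math

def wilcoxon_signed_rank(d):
    """Return (N, T, z) for a list of signed difference scores d."""
    # Drop zero differences: they carry no sign, so they reduce N.
    nonzero = [x for x in d if x != 0]
    n = len(nonzero)

    # Rank the absolute differences, giving tied values the average rank.
    ordered = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[ordered[j + 1]]) == abs(nonzero[ordered[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1

    # T is the smaller of the two sums of like-signed ranks.
    positive_sum = sum(r for r, x in zip(ranks, nonzero) if x > 0)
    negative_sum = sum(r for r, x in zip(ranks, nonzero) if x < 0)
    t = min(positive_sum, negative_sum)

    # Large-sample normal approximation, formula (5.5).
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mean_t) / sd_t
    return n, t, z

def normal_lower_tail(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Difference scores for the 30 prisoners, as read from Table 5.7.
d_scores = [-2, 0, 0, 1, 0, 0, 4, 4, 1, 1, 5, 3, 5, 3, -1,
            1, -1, 5, 8, 2, 2, 2, -3, -2, 1, 4, 8, 2, 3, -1]

n, t, z = wilcoxon_signed_rank(d_scores)
p_one_tailed = normal_lower_tail(z)
print(n, t, round(z, 2), round(p_one_tailed, 4))
```

Because T is defined as the smaller rank sum, z is negative whenever the two sums differ, so the lower tail of the normal curve is the relevant one here.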

Summary of procedure. These are the steps in the use of the Wilcoxon matched-pairs signed-ranks test:

1. For each matched pair, determine the signed difference (d_i) between the two scores.
2. Rank these d's without respect to sign. With tied d's, assign the average of the tied ranks.
3. Affix to each rank the sign (+ or −) of the d which it represents.
4. Determine T = the smaller of the sums of the like-signed ranks.
5. By counting, determine N = the total number of d's having a sign.
6. The procedure for determining the significance of the observed value of T depends on the size of N:
   a. If N is 25 or less, Table G shows critical values of T for various sizes of N. If the observed value of T is equal to or less than that given in the table for a particular significance level and a particular N, H0 may be rejected at that level of significance.
   b. If N is larger than 25, compute the value of z as defined by formula (5.5). Determine its associated probability under H0 by referring to Table A. For a two-tailed test, double the p shown. If the p thus obtained is equal to or less than α, reject H0.

Power-Efficiency

When the assumptions of the parametric t test (see page 19) are in fact met, the asymptotic efficiency near H0 of the Wilcoxon matched-pairs signed-ranks test compared with the t test is 3/π = 95.5 per cent (Mood, 1954). This means that 3/π is the limiting ratio of sample sizes necessary for the Wilcoxon test and the t test to attain the same power. For small samples, the efficiency is near 95 per cent.

References

The reader may find other discussions of the Wilcoxon matched-pairs signed-ranks test in Mood (1954), Moses (1952a), and Wilcoxon (1947; 1949).

THE WALSH TEST

Function

If the experimenter can assume that the difference scores he observes in two related samples are drawn from symmetrical populations, he may use the very powerful test developed by Walsh. Notice that the assumption is not that the d's are from normal populations (which is the assumption of the parametric t test), and notice that the d's do not even have to be from the same population. What the test does assume is that the populations are symmetrical, so that the mean is an accurate representation of central tendency, and is equal to the median. The Walsh test requires measurement in at least an interval scale.

Method

To use the Walsh test, one first obtains difference scores (d's) for each of the N pairs. These d's are then arranged in order of size, with the sign of each d taken into consideration in this arrangement. Let d1 = the lowest difference score (this may well be a negative d), d2 = the next lowest difference score, etc. Thus d1 ≤ d2 ≤ d3 ≤ d4 ≤ ... ≤ dN.

... is that μ1 ≠ 0. For a one-tailed test, H1 may be either that μ1 > 0 or that μ1 < 0.

... is that μ1 > 0, H0 will be rejected if any of the values given in the right-hand column of the table for N = 15 should occur, since the levels of significance for all of the values tabled for N = 15 are less than α = .05.

vi. Decision. The number of shock and nonshock syllables recalled by each subject after 48 hours is given in Table 5.8, which also gives the d for each. Thus subject a recalled 5 of the nonshock syllables but only 2 of the shock syllables; his d = 5 − 2 = 3.

TABLE 5.8. NUMBER OF SHOCK AND NONSHOCK SYLLABLES RECALLED AFTER 48 HOURS
(Columns: Subject; Number of nonshock syllables recalled; Number of shock syllables recalled; d. The cell entries are not legible in this copy.)

Notice that the smallest d is −1. Thus d1, the lowest d, taking sign into consideration, = −1. Five of the d's are −1's; therefore d1 = −1, d2 = −1, d3 = −1, d4 = −1, and d5 = −1.

The next smallest d's are 1's. Three subjects (h, j, and o) have d's of 1. Therefore d6 = 1, d7 = 1, and d8 = 1. Three of the d's are 2's. Thus d9 = 2, d10 = 2, and d11 = 2.

The largest d's are 3's. There are four of them. Thus d12 = 3, d13 = 3, d14 = 3, and d15 = 3.

Now Table H shows that for N = 15, the one-tailed test for the H1 that μ1 > 0 at α = .047 is

Minimum [½(d1 + d12), ½(d2 + d11)] > 0

The "minimum" means that we should choose the smaller of the two values given, in terms of our observed values of d. That is, if ½(d1 + d12) or ½(d2 + d11), whichever is smaller, is larger than zero, then we may reject H0 at α = .047.

As we have shown, d1 = −1, d12 = 3, d2 = −1, and d11 = 2. Substituting these values, we have

Minimum [½(−1 + 3), ½(−1 + 2)] = minimum [½(2), ½(1)] = ½(1)
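The substitution above can be checked mechanically. The following is a minimal sketch, assuming the fifteen difference scores are as enumerated in the text (five −1's, three 1's, three 2's, four 3's); the helper `d(i)` is a hypothetical convenience that mirrors the text's 1-based subscripts, and the rejection rule is simply the Table H criterion quoted above for N = 15.

```python
# Difference scores from Table 5.8, as enumerated in the text:
# five -1's, three 1's, three 2's, and four 3's (N = 15).
d_sorted = sorted([-1] * 5 + [1] * 3 + [2] * 3 + [3] * 4)

def d(i):
    """Return d_i using the text's 1-based subscripts."""
    return d_sorted[i - 1]

# One-tailed criterion quoted from Table H for N = 15 at alpha = .047:
# reject H0 if min[(d1 + d12)/2, (d2 + d11)/2] > 0.
statistic = min((d(1) + d(12)) / 2, (d(2) + d(11)) / 2)
reject_h0 = statistic > 0
print(statistic, reject_h0)
```

Other sample sizes and significance levels call for different pairs of subscripts, which must be taken from Table H itself; the sketch does not generate them.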

We see that for our data the smaller of these two values is ½(1) = ½. Since this value is larger than zero, we can reject H0 at α = .047.

Since the probability under H0 associated with the values which occurred is less than α = .05, we decide to reject H0 in favor of H1.* We conclude that the number of nonshock syllables remembered was significantly larger than the number of shock syllables remembered, a conclusion which supports the theory that negative affect induces repression.

Summary of procedure. These are the steps in the use of the Walsh test:

1. Determine the signed difference score (d_i) for each matched pair.
2. Determine N, the number of matched pairs.
3. Arrange the d's in order of increasing size, from d1 to dN. Take the sign of the d into account in this ordering. Thus d1 is the largest negative d, and dN is the largest positive d.
4. Consult Table H to determine whether H0 may be rejected in favor of H1 with the observed values of d1, d2, d3, ..., dN. The technique of using Table H is explained above at some length.

Power-Efficiency

When compared with the most powerful test, the parametric t test, the Walsh test has power-efficiency (in the sense defined in Chap. 3) of 95 per cent for most values of N and α. Its power-efficiency is as high as 99 per cent (for N = 9 and α = .01, one-tailed test) and is nowhere lower than 87.5 per cent (for N = 10 and α = .06, one-tailed test). For information on its power-efficiency, see Walsh (1949b).

References

For other discussions of the Walsh test, the reader is referred to Dixon and Massey (1951, chap. 17) and to Walsh (1949a; 1949b).

* Using the nonparametric Wilcoxon matched-pairs signed-ranks test, Lowenfeld came to the same decision.

THE RANDOMIZATION TEST FOR MATCHED PAIRS

Function

Randomization tests are nonparametric tests which not only have practical value in the analysis of research data but also have heuristic value in that they help expose the underlying nature of nonparametric tests in general. With a randomization test, we can obtain the exact probability under H0 associated with the occurrence of our observed data, and we can do this without making any assumptions about normality or homogeneity of variance. Randomization tests, under certain conditions, are the most powerful of the nonparametric techniques, and may be used whenever measurement is so precise that the values of the scores have numerical meaning.

Rationale and Method

Consider the small sample example to which we earlier applied the Wilcoxon matched-pairs signed-ranks test (discussed on pages 77 to 78). In that study, we had 8 matched pairs, and one member of each pair was randomly assigned to each condition: one twin attended nursery school while the other stayed at home. The research hypothesis predicted differences between these two groups in "social perceptiveness" because of the different treatment conditions. The null hypothesis was that the two conditions produced no difference in social perceptiveness. It will be remembered that the two members of any matched pair were assigned to the conditions by some random method, say by tossing a coin. For this discussion, let us assume that in the fictitious research under discussion measurement was achieved in the sense of an interval scale.

Now if the null hypothesis that there is no treatment effect were really true, we would have obtained the same social perceptiveness scores if both groups had attended the nursery school or if both groups had stayed at home. That is, under H0 these children would have scored as they did regardless of the conditions. We may not know why the children differ among themselves in social perceptiveness, but under H0 we do know how the signs of the difference scores arose: they resulted from the random assignment of the children to the two conditions. For example, for the two twins in pair a we observed a difference of 19 points between their two scores in social perceptiveness. Under H0, we presume that this d was +19 rather than −19 simply because we happened to assign to the nursery school group that twin who would have been higher in social perceptiveness anyway. The d was +19 rather than −19 simply because when we were assigning the twins to treatments our coin fell on head rather than on tail. By this reasoning, under H0 every difference score we observed could equally likely have had the opposite sign.


The difference scores that we observed in our sample in that study happened to be

+19   +27   −1   +6   +7   +13   −4   +3

Under H0, if our coin tosses had been different, they might just as probably have been

−19   −27   +1   −6   −7   −13   +4   −3

or if the coins had fallen still another way they would have been

+19   −27   +1   −6   −7   −13   −4   +3

As a matter of fact, if the null hypothesis is true, then there are 2^N = 2^8 equally likely outcomes, and the one which we observe depends entirely on how the coin landed for each of the 8 tosses when we assigned the twins to the two groups. This means that associated with the sample of scores we observed there are many other possible ones, the total possible combinations being 2^8 = 256. Under H0, any one of these 256 possible outcomes was just as likely to occur as the one which did occur.

For each of the possible outcomes there is a sum of the differences: Σd_i. Now many of the 256 Σd_i are near zero, about what we should expect if H0 were true. A few Σd_i are far from zero. These are for those combinations in which nearly all of the signs are plus or are minus. It is such combinations which we should expect if the population mean under one of the treatments exceeds that under the other, that is, if H0 is false.

If we wish to test H0 against some H1, we set up a region of rejection consisting of the combinations whose Σd_i is largest. Suppose α = .05. Then the region of rejection consists of that 5 per cent of the possible combinations which contains the most extreme values of Σd_i.

In the example under discussion, 256 possible outcomes are equally likely under H0. The region of rejection consists of the 12 most extreme possible outcomes, for (.05)(256) = 12.8. Under the null hypothesis, the probability that we will observe one of these 12 extreme outcomes is 12/256 = .047. If we actually observe one of those extreme outcomes which is included in the region of rejection, we reject H0 in favor of H1.

When a one-tailed test is called for, the region of rejection consists of the same number of samples. However, it consists of that number of the most extreme possible outcomes in one direction, either positive or negative, depending on the direction of the prediction in H1.

When a two-tailed test is called for, as is the case in the example under discussion, the region of rejection consists of the most extreme possible outcomes at both the positive and the negative ends of the distribution of Σd_i. That is, in the example, the 12 outcomes in the region of rejection would include the 6 yielding the largest positive Σd_i and the 6 yielding the largest negative Σd_i.


Example

i. Null Hypothesis. H0: the two treatments are equivalent. That is, there is no difference in social perceptiveness under the two conditions (attendance at nursery school or staying at home). In social perceptiveness, all 16 observations (8 pairs) are from a common population. H1: the two treatments are not equivalent.

ii. Statistical Test. The randomization test for matched pairs is chosen because of its appropriateness to this design (two related samples, N not cumbersomely large) and because for these (artificial) data we are willing to consider that its requirement of measurement in at least an interval scale is met.

iii. Significance Level. Let α = .05. N = the number of pairs = 8.

iv. Sampling Distribution. The sampling distribution consists of the permutation of the signs of the differences to include all possible (2^N) occurrences of Σd_i. In this case, 2^N = 2^8 = 256.

v. Rejection Region. Since H1 does not predict the direction of the differences, a two-tailed test is used. The region of rejection consists of those 12 outcomes which have the most extreme Σd_i's, 6 being the most extreme positive Σd_i's and 6 being the most extreme negative Σd_i's.

vi. Decision. The data of this study are shown in Table 5.6. The d's observed were:

+19   +27   −1   +6   +7   +13   −4   +3

For these d's, Σd_i = +70.

Table 5.9 shows the 6 possible outcomes with the most extreme Σd_i's at the positive end of the sampling distribution. These 6 outcomes constitute one tail of the two-tailed region of rejection for N = 8.

TABLE 5.9. THE SIX MOST EXTREME POSSIBLE POSITIVE OUTCOMES FOR THE d's SHOWN IN TABLE 5.6
(These constitute one tail of the rejection region for the randomization test when α = .05)

Outcome                                                Σd_i
(1)      +19   +27   +1   +6   +7   +13   +4   +3       80
(2)      +19   +27   −1   +6   +7   +13   +4   +3       78
(3)      +19   +27   +1   +6   +7   +13   +4   −3       74
(4)      +19   +27   +1   +6   +7   +13   −4   +3       72
(5)      +19   +27   −1   +6   +7   +13   +4   −3       72
(6)*     +19   +27   −1   +6   +7   +13   −4   +3       70

Outcome 6 (with an asterisk) is the outcome we actually observed. The probability of its occurrence or a set more extreme under H0 is p = .047. Since this p is less than α = .05, our decision in this fictitious study is to reject the null hypothesis of no condition differences.
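For an N this small the whole sampling distribution can be enumerated directly. The sketch below is an illustration under stated assumptions rather than a prescribed procedure: the variable names are hypothetical, and the observed d's are those of Table 5.6. It lists the sum of the d's for every one of the 2^N sign assignments, counts those at least as extreme as the observed sum in either direction, and picks out the six largest positive sums.

```python
from itertools import product

# Observed difference scores for the 8 matched pairs (Table 5.6).
d_observed = [19, 27, -1, 6, 7, 13, -4, 3]
observed_sum = sum(d_observed)

# Under H0 each d could equally well have had either sign, so enumerate
# the sum of the d's for all 2**N assignments of signs.
magnitudes = [abs(x) for x in d_observed]
all_sums = [
    sum(sign * m for sign, m in zip(signs, magnitudes))
    for signs in product((1, -1), repeat=len(magnitudes))
]

# Two-tailed: count outcomes at least as extreme as the observed one
# in either direction.
as_extreme = sum(1 for s in all_sums if abs(s) >= abs(observed_sum))
p_two_tailed = as_extreme / len(all_sums)

# The six largest positive sums form the upper tail of the distribution.
upper_tail = sorted(all_sums, reverse=True)[:6]
print(len(all_sums), as_extreme, round(p_two_tailed, 3), upper_tail)
```

For a one-tailed test the same list of sums would be used, but only outcomes in the predicted direction would be counted.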

Large samples. If the number of pairs exceeds, say, N = 12, the randomization test becomes unwieldy. For example, if N = 13, the number of possible outcomes is 2^13 = 8,192. Thus the region of rejection for α = .05 would consist of (.05)(8,192) = 409.6 possible extreme outcomes. The computation necessary to specify the region of rejection would therefore be quite tedious.

Because of the computational cumbersomeness of the randomization test when N is at all large, it is suggested that the Wilcoxon matched-pairs signed-ranks test be used in such cases. In the Wilcoxon test, ranks are substituted for numbers. It provides a very efficient alternative to the randomization test because it is in fact a randomization test on the ranks.¹ Even if we did not have the use of Table G, it would not be too tedious to compute the test by permuting the signs (+ and −) on the set of ranks

in all possible ways and then tabulating the upper and lower significance points for a given sample size.

¹ In a randomization test on ranks, all 2^N permutations of the signs of the ranks are considered, and the most extreme possible constitute the region of rejection. For the data shown in Table 5.6, there are 2^8 = 256 possible and equally likely combinations of signed ranks under H0. The curious reader can satisfy himself that the sample of signed ranks observed is among the 12 most extreme possible outcomes and thus leads us to reject H0 at α = .05, which was our decision which we based on Table G. By this randomization method, Table G, the table of significant values of T, can be reconstructed.

If N is larger than 25, and if the differences show little variability, another alternative is available. If the d_i be all about the same size, so that

$$\frac{d_{\max}^2}{\sum d_i^2} \le \frac{5}{2N}$$

where d²max is the square of the largest observed difference, then the central-limit theorem (see Chap. 2) may be expected to hold (Moses, 1952a). Under these conditions, we can expect Σd_i to be approximately normally distributed with

Mean = 0     and     Standard deviation = √(Σd_i²)

and therefore

$$z = \frac{\sum d_i}{\sqrt{\sum d_i^2}} \tag{5.6}$$

is approximately normally distributed with zero mean and unit variance. Table A of the Appendix gives the probability associated with the occurrence under H0 of values as extreme as any z obtained through the application of formula (5.6).
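A brief sketch may make the approximation concrete. It assumes the variability condition takes the form d²max/Σd_i² ≤ 5/(2N); the function name and the second data set are hypothetical, invented only to show a case in which the condition holds. The first data set is that of Table 5.6, used purely to show the arithmetic, since its N of 8 is far too small for the approximation.

```python
import math

def randomization_normal_approx(d):
    """Return (condition_met, z) for the large-sample approximation."""
    n = len(d)
    sum_d = sum(d)
    sum_d_squared = sum(x * x for x in d)
    # Variability condition: d_max^2 / sum(d_i^2) <= 5 / (2N).
    ratio = max(x * x for x in d) / sum_d_squared
    condition_met = ratio <= 5 / (2 * n)
    # Formula (5.6): z = sum(d_i) / sqrt(sum(d_i^2)).
    z = sum_d / math.sqrt(sum_d_squared)
    return condition_met, z

# Table 5.6 data: one large d dominates, so the condition fails.
table_5_6 = [19, 27, -1, 6, 7, 13, -4, 3]
met_small, z_small = randomization_normal_approx(table_5_6)

# Hypothetical data: 30 differences of nearly equal size.
d_hypothetical = [5, 4, 5, -4, 4, 5, 4, -5, 5, 4] * 3
met_large, z_large = randomization_normal_approx(d_hypothetical)

print(met_small, round(z_small, 2), met_large, round(z_large, 2))
```

The function reports the condition separately from z so that a z computed on unsuitable data is not mistaken for a usable result.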

However, the requirement that the d's show little variability, i.e., that

$$\frac{d_{\max}^2}{\sum d_i^2} \le \frac{5}{2N}$$

is not too commonly met. For this reason, and also because the efficiency of the Wilcoxon test (approximately 95 per cent for large samples) is very likely to be superior to that of this large sample approximation to the randomization test when nonnormal populations are involved, it would seem that the Wilcoxon test is the better alternative when N's are cumbersomely large.

Summary of procedure. When N is small and when measurement is in at least an interval scale, the randomization test for matched pairs may be used. These are the steps:

1. Observe the values of the various d's and their signs.
2. Determine the number of possible outcomes under H0 for these values: 2^N.
3. Determine the number of possible outcomes in the region of rejection: (α)(2^N).
4. Identify those possible outcomes which are in the region of rejection by choosing from the possible outcomes those with the largest Σd_i's. For a one-tailed test, the outcomes in the region of rejection are all in one direction (either positive or negative). For a two-tailed test, half of the outcomes in the region of rejection are those with the largest positive Σd_i's and half are those with the largest negative Σd_i's.
5. Determine whether the observed outcome is one of those in the region of rejection. If it is, reject H0 in favor of H1.

When N is large, the Wilcoxon matched-pairs signed-ranks test is recommended for use rather than the randomization test. When N is 25 or larger and when the data meet certain specified conditions, an approximation [formula (5.6)] may also be used.

Power-Efficiency

The randomization test for matched pairs, because it uses all of the information in the sample, has power-efficiency of 100 per cent.

References

Discussions of the randomization method are contained in Fisher (1935), Moses (1952a), Pitman (1937a; 1937b; 1937c), Scheffé (1943), and Welch (1937).

DISCUSSION

In this chapter we have presented five nonparametric statistical tests for the case of two related samples (the design in which matched pairs are used). The comparison and contrast of these tests which are presented below may aid the reader in choosing from among these tests that one which will be most appropriate to the data of a particular experiment.

All the tests but the McNemar test for the significance of changes assume that the variable under consideration has a continuous distribution underlying the scores. Notice that there is no requirement that the measurement itself be continuous; the requirement concerns the variable of which the measurement gives some gross or approximate representation.

The McNemar test for the significance of changes may be used when one or both of the conditions under study has been measured only in the sense of a nominal scale. For the case of two related samples, the McNemar test is unique in its suitability for such data. That is, this test should be used when the data are in frequencies which can only be classified by separate categories which have no relation to each other of the "greater than" type. No assumption of a continuous variable need be made, because this test is equivalent to a test by the binomial distribution with P = Q = ½, where N = the number of changes.

If ordinal measurement within pairs is possible (i.e., if the score of one member of a pair can be ranked as "greater than" the score of the other

member of the same pair), then the sign test is applicable. That is, this test is useful for data on a variable which has underlying continuity but which can be measured in only a very gross way. When the sign test is applied to data which meet the conditions of the parametric alternative (the t test), it has power-efficiency of about 95 per cent for N = 6, but its power-efficiency declines as N increases to about 63 per cent for very large samples.

When the measurement is in an ordinal scale both within and between pairs, the Wilcoxon test should be used. That is, it is applicable when the researcher can meaningfully rank the differences observed for the various matched pairs. It is not uncommon for behavioral scientists to be able to rank difference scores in the order of their absolute size without being able to give truly numerical scores to the observations in each pair. When the Wilcoxon test is used for data which in fact meet the conditions of the t test, its power-efficiency is about 95 per cent for large samples and not much less than that for smaller samples.

If the experimenter can assume that the populations from which he has sampled are both symmetrical and continuous, then the Walsh test is applicable when N is 15 or less. This test requires measurement in at least an interval scale. It has power-efficiency (in the sense previously defined) of about 95 per cent for most values of N and α.

The randomization test should be used whenever N is sufficiently small to make it computationally feasible and when the measurement of the variable is at least in an interval scale. The randomization test uses all the information in the sample and thus is 100 per cent efficient on data which may properly be analyzed by the t test.

Of course none of these nonparametric tests makes the assumption of normality which is made by the comparable parametric test, the t test.

In summary, we conclude that the McNemar test for the significance of changes should be used for both large and small samples when the measurement of at least one of the variables is merely nominal. For the crudest of ordinal measurement, the sign test should be used. For more refined measurement, the Wilcoxon matched-pairs signed-ranks test may be used in all cases. For N's of 15 or fewer, the Walsh test may be used. If interval measurement is achieved, the randomization test should be used when the N is not so large as to make its computation cumbersome.

CHAPTER 6

THE CASE OF TWO INDEPENDENT SAMPLES

In studying differences between two groups, we may use either related or independent groups. Chapter 5 offered statistical tests for use in a design having two related groups. The present chapter presents statistical tests for use in a design having two independent groups. Like those presented in Chap. 5, the tests presented here determine whether differences in the samples constitute convincing evidence of a difference in the processes applied to them.

Although the merits of using two related samples in a research design are great, to do so is frequently impractical. Frequently the nature of the dependent variable precludes using the subjects as their own controls, as is the case when the dependent variable is length of time in solving a particular unfamiliar problem. A problem can be unfamiliar only once. It may also be impossible to design a study which uses matched pairs, perhaps because of the researcher's ignorance of useful matching variables, or because of his inability to obtain adequate measures (to use in selecting matched pairs) of some variable known to be relevant, or finally because good "matches" are simply unavailable.

When the use of two related samples is impractical or inappropriate, one may use two independent samples. In this design the two samples may be obtained by either of two methods: (a) they may each be drawn at random from two populations, or (b) they may arise from the assignment at random of two treatments to the members of some sample whose origins are arbitrary. In either case it is not necessary that the two samples be of the same size.

An example of random sampling from two populations would be the drawing of every tenth Democrat and every tenth Republican from an alphabetical list of registered voters. This would result in a random sample of registered Democrats and Republicans from the voting area covered by the list, and the number of Democrats would equal the number of Republicans only if the registration of the two parties happened to be substantially equal in that area. Another example would be the drawing of every eighth upperclassman and every twelfth lowerclassman from a list of students in a college.


An example of the random assignment method might occur in a study of the effectiveness of two instructors in teaching the same course. A registration card might be collected from every student enrolled in the course, and at random one half of these cards would be assigned to one instructor and one half to the other.
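The random assignment of cards described here can be sketched in a few lines. This is only one way of doing it: the function name, the roster of 40 numbered students, and the seed are all hypothetical, chosen so that the illustration is reproducible.

```python
import random

def assign_at_random(cards, seed=None):
    """Split the registration cards at random into two halves."""
    rng = random.Random(seed)
    shuffled = list(cards)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical roster of 40 enrolled students, identified by number.
students = list(range(1, 41))
instructor_a, instructor_b = assign_at_random(students, seed=1956)
print(len(instructor_a), len(instructor_b))
```

With an odd number of cards the second group receives the extra one, which is consistent with the point that the two samples need not be of the same size.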

The usual parametric technique for analyzing data from two independent samples is to apply a t test to the means of the two groups. The t test assumes that the scores (which are summed in the computing of the means) are independent observations from normally distributed populations with equal variances. This test, because it uses means and other statistics arrived at by arithmetical computation, requires that the observations be measured on at least an interval scale.

For a given research, the t test may be inapplicable for a variety of reasons. The researcher may find that (a) the assumptions of the t test are unrealistic for his data, (b) he prefers to avoid making the assumptions and thus to give his conclusions greater generality, or (c) his "scores" may not be truly numerical and therefore fail to meet the measurement requirement of the t test. In instances like these, the researcher may choose to analyze his data with one of the nonparametric statistical tests for two independent samples which are presented in this chapter. The comparison and contrast of these tests in the discussion at the conclusion of the chapter may aid him in choosing from among the tests presented that one which is best suited for the data of his study.

THE FISHER EXACT PROBABILITY TEST

Function

The Fisher exact probability test is an extremely useful nonparametric technique for analyzing discrete data (either nominal or ordinal) when the two independent samples are small in size. It is used when the scores from two independent random samples all fall into one or the other of two mutually exclusive classes. In other words, every subject in both groups obtains one of two possible scores. The scores are represented by frequencies in a 2 × 2 contingency table, like Table 6.1. Groups I and II might be any two independent groups, such as experimentals and controls, males and females, employed and unemployed, Democrats and Republicans, fathers and mothers, etc. The column headings, here arbitrarily indicated as plus and minus, may be any two classifications: above and below the median, passed and failed, science majors and arts majors, agree and disagree, etc. The test determines whether the two groups differ in the proportion with which they fall into the two classifications. For the data in Table 6.1 (where A, B, C, and D stand for frequencies) it would determine whether Group I and Group II differ significantly in the proportion of pluses and minuses attributed to them.

TABLE 6.1. 2 × 2 CONTINGENCY TABLE

               −         +        Total
Group I        A         B        A + B
Group II       C         D        C + D
Total        A + C     B + D        N

Method

The exact probability of observing a particular set of frequencies in a 2 × 2 table, when the marginal totals are regarded as fixed, is given by the hypergeometric distribution

$$p = \frac{\dbinom{A+C}{A}\dbinom{B+D}{B}}{\dbinom{N}{A+B}} = \frac{\dfrac{(A+C)!}{A!\,C!}\cdot\dfrac{(B+D)!}{B!\,D!}}{\dfrac{N!}{(A+B)!\,(C+D)!}}$$

and thus

$$p = \frac{(A+B)!\,(C+D)!\,(A+C)!\,(B+D)!}{N!\,A!\,B!\,C!\,D!} \tag{6.1}$$

That is, the exact probability of the observed occurrence is found by taking the ratio of the product of the factorials of the four marginal totals to the product of the factorials of the cell frequencies multiplied by N factorial. (Table S of the Appendix may be helpful in these computations.)

To illustrate the use of formula (6.1): suppose we observe the data

shown in Table 6.2. In that table, A = 10, B = 0, C = 4, and D = 5. The marginal totals are A + B = 10, C + D = 9, A + C = 14, and B + D = 5. N, the total number of independent observations, is 19.

TABLE 6.2

               −       +      Total
Group I       10       0       10
Group II       4       5        9
Total         14       5       19

The exact probability that these 19 cases should fall in the four cells as they did may be determined by substituting the observed values in formula (6.1):

$$p = \frac{10!\,9!\,14!\,5!}{19!\,10!\,0!\,4!\,5!} = .0108$$
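The substitution into formula (6.1) can be checked with a short sketch. The function names are hypothetical. The second function is an assumption about how the test is applied when no cell is zero: it adds to the observed table every more extreme table with the same marginal totals, taking "more extreme" to mean that the smallest cell is reduced one case at a time toward zero.

```python
from math import factorial

def fisher_point_probability(a, b, c, d):
    """Formula (6.1): probability of one 2 x 2 table with fixed margins."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator

def fisher_tail_probability(a, b, c, d):
    """Sum formula (6.1) over the observed table and every more extreme
    table obtained by shrinking the smallest cell toward zero while the
    marginal totals stay fixed."""
    total = 0.0
    while min(a, b, c, d) >= 0:
        total += fisher_point_probability(a, b, c, d)
        # Move one case out of the smallest cell, keeping all margins fixed.
        if min(a, b, c, d) in (a, d):
            a, b, c, d = a - 1, b + 1, c + 1, d - 1
        else:
            a, b, c, d = a + 1, b - 1, c - 1, d + 1
    return total

# Cell frequencies of Table 6.2: A = 10, B = 0, C = 4, D = 5.
p_table_6_2 = fisher_point_probability(10, 0, 4, 5)
print(round(p_table_6_2, 4))
```

When one cell is already zero, as in Table 6.2, the tail sum and the single-table probability coincide, because no more extreme table exists with the same margins.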

We determine that the probability of such a distribution of frequencies under H₀ is p = .0108.

Now the above example was a comparatively simple one to compute because one of the cells (cell B) had a frequency of 0. But if none of the cell frequencies is zero, we must remember that more extreme deviations from the distribution under H₀ could occur with the same marginal totals, and we must take into consideration these possible "more extreme" deviations, for a statistical test of the null hypothesis asks: What is the probability under H₀ of such an occurrence or of one even more extreme?

For example, suppose the data from a particular study were those given in Table 6.3. With the marginal totals unchanged, a more extreme occurrence would be that shown in Table 6.4.

TABLE 6.3

               −      +    Total
Group I        1      6       7
Group II       4      1       5
Total          5      7      12

TABLE 6.4

               −      +    Total
Group I        0      7       7
Group II       5      0       5
Total          5      7      12

Thus, if we wish to apply a statistical test of the null hypothesis to the data given in Table 6.3, we must sum the probability of that occurrence with the probability of the more extreme possible one (shown in Table 6.4). We compute each p by using formula (6.1). Thus we have

$$p = \frac{7!\,5!\,5!\,7!}{12!\,1!\,6!\,4!\,1!} = .04399$$

and

$$p = \frac{7!\,5!\,5!\,7!}{12!\,0!\,7!\,5!\,0!} = .00126$$


Thus the probability of the occurrence in Table 6.3 or of an even more extreme occurrence (shown in Table 6.4) is p = .04399 + .00126 = .04525
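The summation over the observed table and every more extreme table can be sketched as a short loop. The function names below are assumptions of this sketch, and it assumes the observed table already lies in the tail of interest, so that "more extreme" means stepping the smallest cell down to zero while the marginal totals stay fixed. Carried out in exact arithmetic, the two terms for Table 6.3 are 35/792 and 1/792, so the sum comes to about .0455, a shade above the rounded figures printed above.

```python
from math import comb


def table_probability(a, b, c, d):
    """Probability of one 2 x 2 table with fixed margins (equivalent to formula 6.1)."""
    return comb(a + c, a) * comb(b + d, b) / comb(a + b + c + d, a + b)


def fisher_one_tailed(a, b, c, d):
    """Sum the observed table and all more extreme tables in the same direction."""
    total = 0.0
    # The diagonal holding the smallest cell is the one that shrinks.
    shrink_ad = min(a, d) <= min(b, c)
    while min(a, b, c, d) >= 0:
        total += table_probability(a, b, c, d)
        if shrink_ad:
            a, d, b, c = a - 1, d - 1, b + 1, c + 1
        else:
            a, d, b, c = a + 1, d + 1, b - 1, c - 1
    return total


# Table 6.3: A = 1, B = 6, C = 4, D = 1
print(round(fisher_one_tailed(1, 6, 4, 1), 4))
```

When one cell is already zero, as in Table 6.2, the loop runs once and the result is simply the probability of the observed table.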

That is, p = .04525 is the value of p which we use in deciding whether the data in Table 6.3 permit us to reject H₀.

The reader can readily see that if the smallest cell value in the contingency table is even moderately large, the Fisher test becomes computationally very tedious. For example, if the smallest cell value is 2, then three exact probabilities must be determined by formula (6.1) and then summed; if the smallest cell value is 3, then four exact probabilities must be found and summed, etc.

If the researcher is content to use significance levels rather than exact values of p, Table I of the Appendix may be used. It eliminates the necessity for the tedious computations illustrated above. Using it, the researcher may determine directly the significance of an observed set of values in a 2 × 2 contingency table. Table I is applicable to data where N is 30 or smaller, and where neither of the totals in the right-hand margin is larger than 15. That is, neither A + B nor C + D may be larger than 15. (The researcher may find that the bottom marginal totals in his data meet this requirement but the right-hand totals do not. Obviously, in that case he may meet the requirement by simply recasting the data, i.e., by shifting the labels at the top of the contingency table to the left margin, and vice versa.)

Because of its very size, Table I is somewhat more difficult to use than are most tables of significance values. Therefore we include detailed directions for its use. These are the steps in the use of Table I:

1. Determine the values of A + B and C + D in the data.

2. Find the observed value of A + B in Table I under the heading "Totals in Right Margin."

3. In that section of the table, locate the observed value of C + D under the same heading.

4. For the observed value of C + D, several possible values of B* are listed in the table. Find the observed value of B among these possibilities.

5. Now observe your value of D. If the observed value of D is equal to or less than the value given in the table under your level of significance, then the observed data are significant at that level.

* If the observed value of B is not included among them, use the observed value of A instead. If A is used in place of B, then C is used in place of D in step 5.

It should be noted that the significance levels given in Table I are approximate, and that they err on the conservative side. Thus the exact probability of some data may be p = .007, but Table I will show them to be significant at α = .01. If the reader requires exact probabilities rather than significance levels, he may find these in Finney (1948, pp. 145-156) or he may compute them by using formula (6.1) in the manner described earlier.

Notice also that the levels of significance given in Table I are for one-tailed regions of rejection. If a two-tailed rejection region is called for, double the significance level given in Table I.

The reader's understanding of the use of Table I may be aided by an example. We return to the data given in Table 6.3, for which we have already determined the exact probability by using formula (6.1). For Table 6.3, A + B = 7 and C + D = 5. The reader may find the appropriate section in Table I for such right marginal totals. In that section he will find that three alternative values of B (7, 6, and 5) are tabled. Now in Table 6.3, B = 6. Therefore the reader should use the middle of the three lines of values, that in which B = 6. Now observe the value of D in our data: D = 1 in Table 6.3. Table I shows that D = 1 is significant at the .05 level (one-tailed). This agrees with the exact probability we computed: p = .045.

For a two-tailed test we would double the observed significance level, and conclude that the data in Table 6.3 permit us to reject H₀ at the α = 2(.05) = .10 level.

Example

In a study of the personal and social backgrounds of the leaders of the Nazi movement, Lerner and his collaborators¹ compared the Nazi elite with the established and respected elite of the older German society. One such comparison concerned the career histories of the 15 men who constituted the German Cabinet at the end of 1934. These men were categorized in two groups: Nazis and non-Nazis. To test the hypothesis that Nazi leaders had taken political party work as their careers while non-Nazis had come from other, more stable and conventional, occupations, each man was categorized according to his first job in his career. The first job of each was classified as either "stable occupation" or as "party administration and communication." The hypothesis was that the two groups would differ in the proportion with which they were assigned to these two categories.

¹ Lerner, D., Pool, I. de S., and Schueller, G. K. 1951. The Nazi elite. Stanford, Calif.: Stanford Univer. Press. The data cited in this example are given on p. 101.

i. Null Hypothesis. H₀: Nazis and non-Nazis show equal proportions in the kind of "first jobs" they had. H₁: a greater proportion of Nazis' "first jobs" were in party administration and communication than were the "first jobs" of non-Nazi politicians.

ii. Statistical Test. This study calls for a test to determine the significance of the difference between two independent samples. Since the measures are both dichotomous and since N is small, the Fisher test is selected.

iii. Significance Level. Let α = .05. N = 15.

iv. Sampling Distribution. The probability of the occurrence under H₀ of an observed set of values in a 2 × 2 table may be found by the use of formula (6.1). However, for N ≤ 30 (which is the case with these data), Table I may be used. It gives critical values of D for various levels of significance.

v. Rejection Region. Since H₁ predicts the direction of the difference, the region of rejection is one-tailed. H₀ will be rejected if the observed cell values differ in the predicted direction and if they are of such magnitude that the probability associated with their occurrence under H₀ is equal to or less than α = .05.

vi. Decision. The information concerning the "first jobs" of each member of the German Cabinet late in 1934 is given in Table 6.5.

TABLE 6.5. FIELD OF FIRST JOB OF 1934 MEMBERS OF GERMAN CABINET

              Stable occupations         Party administration
              (law and civil service)    and communication       Total
Nazis                  1                          8                 9
Non-Nazis              6                          0                 6
Total                  7                          8                15

For this table, A + B = 9 and C + D = 6. Reference to Table I reveals that with these marginal totals, and with B = 8, the observed D = 0 has a one-tailed probability of occurrence under H₀ of p < .005. Since this p is smaller than our level of significance, α = .05, our decision is to reject H₀ in favor of H₁. We conclude that Nazi and non-Nazi political leaders did differ in the fields of their first jobs.²
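As a cross-check on the tabled value, formula (6.1) can be applied directly to these data. With A + B = 9, C + D = 6, B = 8, and D = 0, the remaining cells must be A = 1 and C = 6. Because D = 0, no more extreme table exists with these marginal totals, so a single term gives the one-tailed probability:

$$p = \frac{9!\,6!\,7!\,8!}{15!\,1!\,8!\,6!\,0!} = \frac{9!\,7!}{15!} = \frac{7}{5005} \approx .0014$$

This value is smaller than .005, in agreement with the reading from Table I.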

² Lerner et al. come to the same conclusion, although they do not report any statistical test of these data.

Tocher's modification. In the literature of statistics there has been considerable discussion of the applicability of the Fisher test to various kinds of data, inasmuch as there seems to be something arbitrary or improper about considering the marginal totals fixed, for the marginal totals might easily vary if we actually drew repeated samples of the same size by the same method from the same population. Fisher (1934) recommends the test for all types of dichotomous data, but this recommendation has been questioned by others.


However, Tocher (1950) has proved that a slight modification of the Fisher test provides the most powerful one-tailed test for data in a 2 × 2 table. We will illustrate this modification by giving Tocher's example. Table 6.6 shows some observed frequencies (in a) and shows the two more extreme distributions of frequencies which could occur with the same marginal totals (b and c).

TABLE 6.6. TOCHER'S EXAMPLE

  Observed data        More extreme outcomes with same marginal totals
       (a)                     (b)                     (c)
   2    5 |  7             1    6 |  7             0    7 |  7
   3    2 |  5             4    1 |  5             5    0 |  5
   5    7 | 12             5    7 | 12             5    7 | 12

Given the observed data (a), we wish to test H₀ at α = .05. Applying formula (6.1) to the data in each of the three tables, we have

$$p_a = \frac{7!\,5!\,5!\,7!}{12!\,2!\,5!\,3!\,2!} = .26515$$

$$p_b = \frac{7!\,5!\,5!\,7!}{12!\,1!\,6!\,4!\,1!} = .04399$$

$$p_c = \frac{7!\,5!\,5!\,7!}{12!\,0!\,7!\,5!\,0!} = .00126$$

The probability associated with the occurrence of values as extreme as the observed scores (a) under H₀ is given by adding these three p's:

.26515 + .04399 + .00126 = .31040

Thus p = .31040 is the probability we would find by the Fisher test.

Tocher's modification first determines the probability of all the cases more extreme than the observed one, and not including the observed one. Thus in this case one would sum only p_b and p_c:

.04399 + .00126 = .04525

Now if this probability of the more extreme outcomes is larger than α, we cannot reject H₀. But if this probability is less than α while the probability yielded by the Fisher test is greater than α (as is the case with these data), then Tocher recommends computing this ratio:

$$\frac{\alpha - p_{\text{more extreme cases}}}{p_{\text{observed case taken alone}}} \qquad (6.2)$$


For the data shown in Table 6.6, this would be

$$\frac{.05 - (p_b + p_c)}{p_a}$$

which is

$$\frac{.05 - .04525}{.26515} = .01791$$

Now we go to a table of random numbers and at random draw a number between 0 and 1. If this random number is smaller than our ratio above (i.e., if it is smaller than .01791), we reject H₀. If it is larger, we cannot reject H₀. Of course in this case it is highly unlikely that the randomly drawn number will be sufficiently small to permit us to reject H₀. But this added small probability of rejecting H₀ makes the Fisher test slightly less conservative.

Perhaps the reader will gain an intuitive understanding of the logic and power of Tocher's modification by considering what a one-tailed test at α = .05 really is for the data given in Table 6.6. Suppose we reject H₀ only when cases b or c occur. Then we are actually working at α = .04525. In order to move to exactly the α = .05 level, we also declare as significant (by Tocher's modification) a proportion (.01791) of the cases when a occurs in the sampling distribution. Whether we may consider our observed case as one of those in the proportion is determined by a table of random numbers.
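The randomized decision rule described above can be sketched as follows. The function names and the use of Python's `random` module in place of a printed table of random numbers are assumptions of this sketch. Worked in exact arithmetic, the ratio for Table 6.6 comes to about .0171 rather than the printed .01791, because the printed value rests on rounded (and slightly low) figures for the more extreme cases.

```python
import random
from math import comb


def table_probability(a, b, c, d):
    """Probability of one 2 x 2 table with fixed margins (equivalent to formula 6.1)."""
    return comb(a + c, a) * comb(b + d, b) / comb(a + b + c + d, a + b)


def tocher_ratio(p_observed, p_more_extreme, alpha):
    """Formula (6.2): proportion of 'observed' outcomes declared significant."""
    return (alpha - p_more_extreme) / p_observed


def tocher_decision(p_observed, p_more_extreme, alpha, rng=random):
    """Return True when H0 is rejected under the randomized rule."""
    if p_more_extreme > alpha:
        return False  # even the more extreme cases are not significant
    if p_observed + p_more_extreme <= alpha:
        return True  # the ordinary Fisher test already rejects
    # Otherwise draw a number between 0 and 1 and compare it with the ratio.
    return rng.random() < tocher_ratio(p_observed, p_more_extreme, alpha)


# Table 6.6: observed table (a) and the two more extreme tables (b) and (c)
p_a = table_probability(2, 5, 3, 2)
p_b = table_probability(1, 6, 4, 1)
p_c = table_probability(0, 7, 5, 0)
ratio = tocher_ratio(p_a, p_b + p_c, alpha=0.05)
print(round(p_a, 5), round(ratio, 5))
```

Because the rule draws a random number, repeated calls to `tocher_decision` on the same data need not agree; that is the intended behavior of the modification, not a defect of the sketch.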

Summary of procedure. These are the steps in the use of the Fisher test:

1. Cast the observed frequencies in a 2 × 2 table.

2. Determine the marginal totals. Each set of marginal totals sums to N, the number of independent cases observed.

3. The method of deciding whether or not to reject H₀ depends on whether or not exact probabilities are required:
a. For a test of significance, refer to Table I.
b. For an exact probability, the recursive use of formula (6.1) is required.
In either case, the value yielded will be for a one-tailed test. For a two-tailed test, the significance level shown by Table I or the p yielded by the use of formula (6.1) must be doubled.

4. If the significance level shown by Table I or the p yielded by the use of formula (6.1) is equal to or less than α, reject H₀.

5. If the observed frequencies are insignificant but all more extreme possible outcomes with the same marginal totals would be significant, use Tocher's modification to determine whether or not to reject H₀ for a one-tailed test.


Power

With Tocher's modification, the Fisher test is the most powerful of one-tailed tests (in the sense of Neyman and Pearson) for data of the kind for which the test is appropriate (Cochran, 1952).

References

Other discussions of the Fisher test may be found in Barnard (1947), Cochran (1952), Finney (1948), Fisher (1934, sec. 21.02), McNemar (1955, pp. 240-242), and Tocher (1950).

THE χ² TEST FOR TWO INDEPENDENT SAMPLES

Function

When the data of research consist of frequencies in discrete categories, the χ² test may be used to determine the significance of differences between two independent groups. The measurement involved may be as weak as nominal scaling.

The hypothesis under test is usually that the two groups differ with respect to some characteristic and therefore with respect to the relative frequency with which group members fall in several categories. To test this hypothesis, we count the number of cases from each group which fall in the various categories, and compare the proportion of cases from one group in the various categories with the proportion of cases from the other group. For example, we might test whether two political groups differ in their agreement or disagreement with some opinion, or we might test whether the sexes differ in the frequency with which they choose certain leisure time activities, etc.

Method

The null hypothesis may be tested by

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{k}\frac{(O_{ij} - E_{ij})^2}{E_{ij}} \qquad (6.3)$$

where O_ij = observed number of cases categorized in ith row of jth column
E_ij = number of cases expected under H₀ to be categorized in ith row of jth column
and the double summation directs one to sum over all (r) rows and all (k) columns, i.e., to sum over all cells.


The values of χ² yielded by formula (6.3) are distributed approximately as chi square with df = (r − 1)(k − 1), where r = the number of rows and k = the number of columns in the contingency table.

To find the expected frequency for each cell (E_ij), multiply the two marginal totals common to a particular cell, and then divide this product by the total number of cases, N.

We may illustrate the method of finding expected values by a simple example, using artificial data. Suppose we wished to test whether tall and short persons differ with respect to leadership qualities. Table 6.7 shows the frequencies with which 43 short people and 52 tall people are categorized as "leaders," "followers," and as "unclassifiable."

TABLE 6.7. HEIGHT AND LEADERSHIP (Artificial data)

                   Short    Tall    Total
Leader               12      32       44
Follower             22      14       36
Unclassifiable        9       6       15
Total                43      52       95

Now the null hypothesis would be that height is independent of leader-follower position, i.e., that the proportion of tall people who are leaders is the same as the proportion of short people who are leaders, that the proportion of tall people who are followers is the same as the proportion of short people who are followers, etc. With such a hypothesis, we may determine the expected frequency for each cell by the method indicated. In each case we multiply the two marginal totals common to a particular cell, and then divide this product by N to obtain the expected frequency.

Thus, for example, the expected frequency for the lower right-hand cell in Table 6.7 is

$$E_{32} = \frac{(52)(15)}{95} = 8.2$$

Table 6.8 shows the expected frequencies for each of the six cells for the data shown in Table 6.7. In each case the expected frequencies are shown in parentheses in Table 6.8, which also shows the various observed frequencies.

TABLE 6.8. HEIGHT AND LEADERSHIP: OBSERVED AND EXPECTED FREQUENCIES (Artificial data)

                    Short         Tall        Total
Leader            12 (19.9)    32 (24.1)       44
Follower          22 (16.3)    14 (19.7)       36
Unclassifiable     9 (6.8)      6 (8.2)        15
Total                43           52           95
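The rule for expected frequencies (row total times column total, divided by N) is easy to sketch in code. The function name `expected_frequencies` and the list-of-rows layout are assumptions of this sketch; the data are those of Table 6.7.

```python
def expected_frequencies(observed):
    """Expected cell counts under H0: (row total x column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Table 6.7 (rows: leader, follower, unclassifiable; columns: short, tall)
observed = [[12, 32], [22, 14], [9, 6]]
for row in expected_frequencies(observed):
    print([round(e, 1) for e in row])
```

Rounded to one decimal place, the results reproduce the expected frequencies of Table 6.8.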

Now if the observed frequencies are in close agreement with the expected frequencies, the differences (O_ij − E_ij) will of course be small, and consequently the value of χ² will be small. With a small value of χ² we may not reject the null hypothesis that the two sets of characteristics are independent of each other. However, if some or many of the differences are large, then the value of χ² will also be large. The larger is χ², the more likely it is that the two groups differ with respect to the classifications.

The sampling distribution of χ² as defined by formula (6.3) can be shown to be approximated by a chi-square¹ distribution with df = (r − 1)(k − 1). The probabilities associated with various values of chi square are given in Table C of the Appendix. If an observed value of χ² is equal to or greater than the value given in Table C for a particular level of significance, at a particular df, then H₀ may be rejected at that level of significance.

Notice that there is a different sampling distribution for every value of df. That is, the significance of any particular value of χ² depends on the number of degrees of freedom in the data from which it was computed. The size of df reflects the number of observations that are free to vary after certain restrictions have been placed on the data. (Degrees of freedom are discussed in Chap. 4.)

The degrees of freedom for an r × k contingency table may be found by

df = (r − 1)(k − 1)

where r = number of classifications (rows)
k = number of groups (columns)

For the data in Table 6.8, r = 3 and k = 2, for we have 3 classifications (leader, follower, and unclassifiable) and 2 groups (tall and short). Thus the df = (3 − 1)(2 − 1) = 2.

¹ To avoid confusion, the symbol χ² is used for the quantity in formula (6.3) which is computed from the observed data when a χ² test is performed. The words "chi square" refer to a random variable which follows the chi-square distribution, tabled in Table C.


The computation of χ² for the data in Table 6.8 is straightforward:

$$\begin{aligned}
\chi^2 &= \sum_{i=1}^{r}\sum_{j=1}^{k}\frac{(O_{ij} - E_{ij})^2}{E_{ij}} \\
&= \frac{(12 - 19.9)^2}{19.9} + \frac{(32 - 24.1)^2}{24.1} + \frac{(22 - 16.3)^2}{16.3} + \frac{(14 - 19.7)^2}{19.7} + \frac{(9 - 6.8)^2}{6.8} + \frac{(6 - 8.2)^2}{8.2} \\
&= 3.14 + 2.59 + 1.99 + 1.65 + .71 + .59 \\
&= 10.67
\end{aligned}$$

To determine the significance of χ² = 10.67 when df = 2, we turn to Table C. The table shows that this value of χ² is significant beyond the .01 level. Therefore we could reject the null hypothesis of no differences at α = .01.
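The whole computation of formula (6.3), together with its degrees of freedom, can be sketched as one function. The name `chi_square` is an assumption of this sketch. Because the code carries the expected frequencies at full precision instead of rounding them to one decimal place, it yields about 10.71 for the data of Table 6.8 rather than the hand-computed 10.67; the conclusion at α = .01 is unaffected.

```python
def chi_square(observed):
    """Return (chi-square statistic, df) for an r x k table, per formula (6.3)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency E_ij
            statistic += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Table 6.8 (rows: leader, follower, unclassifiable; columns: short, tall)
statistic, df = chi_square([[12, 32], [22, 14], [9, 6]])
print(round(statistic, 2), df)
```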

2 × 2 contingency tables. Perhaps the most common of all uses of the χ² test is the test of whether an observed breakdown of frequencies in a 2 × 2 contingency table could have occurred under H₀. We are familiar with the form of such a table; an example is Table 6.1. When applying the χ² test to data where both r and k equal 2, formula (6.4) should be used:

$$\chi^2 = \frac{N\left(|AD - BC| - \dfrac{N}{2}\right)^2}{(A+B)(C+D)(A+C)(B+D)} \qquad (6.4)$$

This formula is somewhat easier to apply than formula (6.3), inasmuch as only one division is necessary in the computation. Moreover, it lends itself readily to machine computation. It has the additional advantage of incorporating a correction for continuity which markedly improves the approximation of the distribution of the computed χ² by the chi-square distribution.

Example

Adams studied the relation of vocational interests and curriculum choice to rate of withdrawal from college by bright students.¹ Her subjects were students who scored at or above the 90th percentile in college entrance tests of intelligence, and who changed their majors following matriculation. She compared those bright students whose curriculum choice was in the direction indicated as desirable by their scores on the Strong Vocational Interest Test (such a change was called "positive") with those bright students whose curriculum change was in a direction contrary to that suggested by their tested interests. Her hypothesis was that those who made positive curricular changes would more frequently remain in school.

¹ Adams, Lois. 1955. A study of intellectually gifted students who withdrew from the Pennsylvania State University. Unpublished master's thesis, Pennsylvania State University.

i. Null Hypothesis. H₀: there is no difference between the two groups (positive curriculum changers and negative curriculum changers) in the proportion of members who remain in college. H₁: a greater proportion of students who make positive curriculum changes remain in college than is the case with those who make negative curriculum changes.

ii. Statistical Test. The χ² test for two independent samples is chosen because the two groups (positive and negative curriculum changers) are independent, and because the "scores" under study are frequencies in discrete categories (withdrew and remained).

iii. Significance Level. Let α = .05. N = the number of students in the sample = 80.

iv. Sampling Distribution. χ² as computed from formula (6.4) has a sampling distribution which is approximated by the chi-square distribution with df = 1. Critical values of chi square are given in Table C.

v. Rejection Region. The region of rejection consists of all values of χ² which are so large that the probability associated with their occurrence is equal to or less than α = .05. Since H₁ predicts the direction of the difference between the two groups, the region of rejection is one-tailed. Table C shows that for a one-tailed test, when df = 1, a χ² of 2.71 or larger has probability of occurrence under H₀ of p = ½(.10) = .05. Therefore the region of rejection consists of all χ² ≥ 2.71 if the direction of the results is that predicted by H₁.

vi. Decision. Adams' findings are presented in Table 6.9. This table shows that of the 56 bright students who made positive curriculum changes, 10 withdrew and 46 remained in college. Of the 24 who made negative changes, 11 withdrew from college and 13 remained.

TABLE 6.9. CURRICULUM CHANGE AND WITHDRAWAL FROM COLLEGE AMONG BRIGHT STUDENTS

              Direction of curriculum change
              Positive     Negative     Total
Withdrew         10           11          21
Remained         46           13          59
Total            56           24          80

The value of χ² for these data is

$$\begin{aligned}
\chi^2 &= \frac{N\left(|AD - BC| - \dfrac{N}{2}\right)^2}{(A+B)(C+D)(A+C)(B+D)} \qquad (6.4) \\
&= \frac{80\left(|(10)(13) - (11)(46)| - \dfrac{80}{2}\right)^2}{(21)(59)(56)(24)} \\
&= \frac{80(336)^2}{1{,}665{,}216} \\
&= 5.42
\end{aligned}$$

The probability of occurrence under H₀ for χ² ≥ 5.42 with df = 1 is p < ½(.02) = .01. Inasmuch as this p is less than α = .05, the decision is to reject H₀ in favor of H₁. We conclude that bright students who make "positive" curriculum changes remain in college more frequently than do bright students who make "negative" curriculum changes.
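Formula (6.4) reduces to a single expression, which the following sketch evaluates for the data of this example. The function name `chi_square_2x2` is an assumption of this sketch; the cell labels follow Table 6.1, with A and B in the first row and C and D in the second.

```python
def chi_square_2x2(a, b, c, d):
    """Formula (6.4): chi-square for a 2 x 2 table, corrected for continuity."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Withdrew: 10 positive, 11 negative; remained: 46 positive, 13 negative
print(round(chi_square_2x2(10, 11, 46, 13), 2))
```

The result agrees with the hand computation, 5.42, and is referred to the chi-square distribution with df = 1.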

Small expected frequencies. The χ² test is applicable to data in a contingency table only if the expected frequencies are sufficiently large. The size requirements for expected frequencies are discussed below. When the expected frequencies do not meet these requirements, one may increase their values by combining cells, i.e., by combining adjacent classifications and thereby reducing the number of cells. This may be properly done only if such combining does not rob the data of their meaning. In our fictitious "study" of height and leadership, of course, any combining of categories would have rendered the data useless for testing our hypothesis. The researcher may usually avoid this problem by planning in advance to collect a fairly large number of cases relative to the number of classifications he wishes to use in his analysis.

Summary of procedure. These are the steps in the use of the χ² test for two independent samples:

1. Cast the observed frequencies in a k × r contingency table, using the k columns for the groups and the r rows for the conditions. For this test k = 2.

2. Determine the expected frequency for each cell by finding the product of the marginal totals common to it and dividing this by N. (N is the sum of each group of marginal totals. It represents the total number of independent observations. Inflated N's invalidate the test.) Step 2 is unnecessary if the data are in a 2 × 2 table and thus formula (6.4) is to be used.

3. For a 2 × 2 table, compute χ² by formula (6.4). When r is larger than 2, compute χ² by formula (6.3).

4. Determine the significance of the observed χ² by reference to Table C. For a one-tailed test, halve the significance level shown. If the probability given by Table C is equal to or smaller than α, reject H₀ in favor of H₁.

When to Use the χ² Test

As we have already noted, the χ² test requires that the expected frequencies (E_ij) in each cell should not be too small. When they are smaller than minimal, the test may not be properly or meaningfully used. Cochran (1954) makes these recommendations:

The 2 × 2 case. If the frequencies are in a 2 × 2 contingency table, the decision concerning the use of χ² should be guided by these considerations:

1. When N > 40, use χ² corrected for continuity, i.e., use formula (6.4).

2. When N is between 20 and 40, the χ² test [formula (6.4)] may be used if all expected frequencies are 5 or more. If the smallest expected frequency is less than 5, use the Fisher test (pages 96 to 104).

3. When N < 20, use the Fisher test in all cases.
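The three considerations above amount to a small decision rule, sketched here for a 2 × 2 table. The function name `recommended_test` and the returned labels are assumptions of this sketch, as is the treatment of the boundary values N = 20 and N = 40 as falling "between 20 and 40."

```python
def recommended_test(a, b, c, d):
    """Suggest a test for a 2 x 2 table, following the considerations listed above."""
    n = a + b + c + d
    if n > 40:
        return "chi-square"
    if n >= 20:  # boundary handling is an assumption of this sketch
        row_totals = (a + b, c + d)
        col_totals = (a + c, b + d)
        smallest_expected = min(r * k / n for r in row_totals for k in col_totals)
        return "chi-square" if smallest_expected >= 5 else "Fisher"
    return "Fisher"


print(recommended_test(10, 11, 46, 13))  # N = 80
print(recommended_test(1, 8, 6, 0))      # N = 15
```

For the curriculum-change data (N = 80) the rule points to χ², and for the German Cabinet data (N = 15) it points to the Fisher test, matching the choices made in the two examples.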

Contingency tables with df larger than 1. When r is larger than 2 (and thus df > 1), the χ² test may be used if fewer than 20 per cent of the cells have an expected frequency of less than 5 and if no cell has an expected frequency of less than 1. If these requirements are not met by the data in the form in which they were originally collected, the researcher must combine adjacent categories in order to increase the expected frequencies in the various cells. Only after he has combined categories to meet the above requirements may he meaningfully apply the χ² test.

When df > 1, χ² tests are insensitive to the effects of order, and thus when a hypothesis takes order into account, χ² may not be the best test. The reader may consult Cochran (1954) for methods that strengthen the common χ² tests when H₀ is tested against specific alternatives.

Power

When the χ² test is used there is usually no clear alternative and thus the exact power of the test is difficult to compute. However, Cochran (1952) has shown that the limiting power distribution of χ² tends to 1 as N becomes large.

References

For other discussions of the χ² test, the reader may refer to Cochran (1952; 1954), Dixon and Massey (1951, chap. 13), Edwards (1954, chap. 18), Lewis and Burke (1949), McNemar (1955, chap. 13), and Walker and Lev (1953, chap. 4).

THE MEDIAN TEST

Function

The median test is a procedure for testing whether two independent groups differ in central tendencies. More precisely, the median test will give information as to whether it is likely that two independent groups (not necessarily of the same size) have been drawn from populations with the same median. The null hypothesis is that the two groups are from populations with the same median; the alternative hypothesis may be that the median of one population is different from that of the other (two-tailed test) or that the median of one population is higher than that of the other (one-tailed test). The test may be used whenever the scores for the two groups are in at least an ordinal scale.

Rationale and Method

To perform the median test, we first determine the median score for the combined group (i.e., the median for all scores in both samples). Then we dichotomize both sets of scores at that combined median, and cast these data in a 2 × 2 table like Table 6.10.

TABLE 6.10. MEDIAN TEST: FORM FOR DATA

                                         Group I    Group II    Total
No. of scores above combined median         A           B        A + B
No. of scores below combined median         C           D        C + D
Total                                     A + C       B + D      N = n₁ + n₂

Now if both group I and group II are samples from populations whose median is the same, we would expect about half of each group's scores to be above the combined median and about half to be below. That is, we would expect frequencies A and C to be about equal, and frequencies B and D to be about equal.

It can be shown (Mood, 1950, pp. 394-395) that if A is the number of cases in group I which fall above the combined median, and if B is the number of cases in group II which fall above the combined median, then the sampling distribution of A and B under the null hypothesis (H₀ is that A = ½n₁ and B = ½n₂) is the hypergeometric distribution

$$p(A, B) = \frac{\binom{A+C}{A}\binom{B+D}{B}}{\binom{n_1 + n_2}{A+B}}$$


Therefore if the total number of cases in both groups (n₁ + n₂) is small, one may use the Fisher test (pages 96 to 104) to test H₀. If the total number of cases is sufficiently large, the χ² test with df = 1 (page 107) may be used to test H₀.

When analyzing data split at the median, the researcher should be guided by these considerations in choosing between the Fisher test and the χ² test:

1. When n₁ + n₂ is larger than 40, use χ² corrected for continuity, i.e., use formula (6.4).

2. When n₁ + n₂ is between 20 and 40 and when no cell has an expected frequency¹ of less than 5, use χ² corrected for continuity [formula (6.4)]. If the smallest expected frequency is less than 5, use the Fisher test.

3. When n₁ + n₂ is less than 20, use the Fisher test.

One difficulty may arise in the computation of the median test: several scores may fall right at the combined median. If this happens, the researcher has two alternatives: (a) if n₁ + n₂ is large, and if only a few cases fall at the combined median, those few cases may be dropped from the analysis, or (b) the groups may be dichotomized as those scores which exceed the median and those which do not. In this case, the troublesome scores would be included in the second category.
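The dichotomization at the combined median can be sketched as follows. The function name `median_test_table` is an assumption of this sketch, and the two sets of ratings are hypothetical values invented purely for illustration. Scores equal to the combined median are placed in the "does not exceed" row, i.e., alternative (b) above.

```python
from statistics import median


def median_test_table(group1, group2):
    """Return (A, B, C, D) in the layout of Table 6.10.

    A and B count the scores that exceed the combined median in groups I
    and II; C and D count the scores that do not exceed it.
    """
    combined = median(list(group1) + list(group2))
    a = sum(score > combined for score in group1)
    b = sum(score > combined for score in group2)
    c = len(group1) - a
    d = len(group2) - b
    return a, b, c, d


# Hypothetical ratings, invented for illustration only
group1 = [6, 7, 7, 8, 9, 10]
group2 = [9, 11, 12, 12, 13, 14]
print(median_test_table(group1, group2))
```

The resulting four frequencies are then tested with the Fisher test or with formula (6.4), according to the considerations listed above.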

¹ The method for computing expected frequencies is given on pages 105 and 106.

Example

In a cross-cultural test of some behavior theory hypotheses adapted from psychoanalytic theory,² Whiting and Child studied the relation between child-rearing practices and customs related to illness in various nonliterate cultures. One hypothesis of their study, derived from the notion of negative fixation, was that oral explanations of illness would be used in societies in which the socialization of oral drives is such as to produce anxiety. Typical oral explanations of illness are these: illness results from eating poison, illness results from drinking certain liquids, illness results from verbal spells and incantations performed by others. Judgments of the typical oral socialization anxiety in any society were based on the rapidity of oral socialization, the severity of oral socialization, the frequency of punishment typical in oral socialization, and the severity of emotional conflict typically evidenced by the children during the period of oral socialization.

Excerpts from ethnological reports of nonliterate cultures were used in the collection of the data. Using only excerpts concerning customs relating to illness, judges classified the societies into two groups: those with oral explanations of illness present and those with oral explanations absent. Other judges, using the excerpts concerning child-rearing practices, rated each society on the degree of oral socialization anxiety typical in its children. For the 39 societies for which judgments of the presence or absence of oral explanations were possible, these ratings ranged from 6 to 17.

² Whiting, J. W. M., and Child, I. L. 1953. Child training and personality. New Haven: Yale Univer. Press.

i. Null Hypothesis. H₀: there is no difference between the median oral socialization anxiety in societies which give oral explanations of illness and the median oral socialization anxiety in societies which do not give oral explanations of illness. H₁: the median oral socialization anxiety in societies with oral explanations present is higher than the median in societies with oral explanations absent.

ii. Statistical Test. The ratings constitute ordinal measures at best; thus a nonparametric test is appropriate. This choice also eliminates the necessity of assuming that oral socialization anxiety is normally distributed among the cultures sampled, as well as eliminating the necessity of assuming that the variances of the two groups sampled are equal. For the data from the two independent groups of societies, the median test may be used to test H₀.

iii. Significance Level. Let α = .01. N = 39 = the number of societies for which ethnological information on both variables was available. n₁ = 16 = the number of societies with oral explanations absent; n₂ = 23 = the number of societies with oral explanations present.

iv. Sampling Distribution. Because n₁ + n₂ = 39 is between 20 and 40, the choice between the Fisher test and the χ² test for the scores split at the median must be determined by the size of the smallest expected frequency. We therefore cannot at this time state the sampling distribution.

v. Rejection Region. Since H₁ predicts the direction of the difference, the region of rejection is one-tailed. It consists of all outcomes in a median-split table which are in the predicted direction and which are so extreme that the probability associated with their occurrence under H₀ (as determined by the appropriate test) is equal to or less than α = .01.

vi. Decision. Table 6.11 shows the ratings assigned to each of the 39 societies. These are divided at the combined median for the n₁ + n₂ ratings. (We have followed Whiting and Child in calling 10.5 the median of the 39 ratings.) Table 6.12 shows these data cast in the form for the median test. Since none of the expected frequencies is less than 5, and since n₁ + n₂ > 20, we may use the χ² test to test H₀:

$$\chi^2 = \frac{N\left(|AD - BC| - \frac{N}{2}\right)^2}{(A + B)(C + D)(A + C)(B + D)} \tag{6.4}$$

$$= \frac{39\left(|(3)(6) - (17)(13)| - \frac{39}{2}\right)^2}{(20)(19)(16)(23)} = 9.39$$

TABLE 6.11. ORAL SOCIALIZATION ANXIETY AND ORAL EXPLANATIONS OF ILLNESS*
(The name of each society is preceded by its rating on oral socialization anxiety)

Column headings: Societies with oral explanations absent; Societies with oral explanations present
Row headings: Societies above median on oral socialization anxiety; Societies below median on oral socialization anxiety

* Reproduced from Table 4 of Whiting, J. W. M., and Child, I. L. 1953. Child training and personality. New Haven: Yale Univer. Press, p. 1M, with the kind permission of the authors and the publisher.


TABLE 6.12. ORAL SOCIALIZATION ANXIETY AND ORAL EXPLANATIONS OF ILLNESS

                                  Societies with oral    Societies with oral
                                  explanations absent    explanations present    Total
Societies above median on
  oral socialization anxiety               3                      17               20
Societies below median on
  oral socialization anxiety              13                       6               19
Total                                     16                      23               39

Reference to Table C shows that χ² ≥ 9.39 with df = 1 has probability of occurrence under H0 of p < ½(.01) = p < .005 for a one-tailed test. Thus our decision is to reject H0 for α = .01.* We conclude that the median oral socialization anxiety is higher in societies with oral explanations of illness present than is the median in societies with oral explanations absent.
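The arithmetic of formula (6.4) for this example can be checked with a short script. This is only a sketch and not part of the original text: the function name is hypothetical, the cell frequencies A = 3, B = 17, C = 13, D = 6 are assumed as the values consistent with the marginal totals 20, 19, 16, and 23 in the denominator above, and the probability is obtained from the fact that a χ² variate with df = 1 is the square of a standard normal deviate, rather than from Table C.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Chi-square for a 2 x 2 table with the continuity correction of formula (6.4)."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Assumed cell frequencies: above/absent, above/present, below/absent, below/present
chi2 = chi_square_2x2(3, 17, 13, 6)

# With df = 1, P(chi-square >= x) equals P(|Z| >= sqrt(x)) for a standard normal Z
two_tailed_p = math.erfc(math.sqrt(chi2 / 2))
one_tailed_p = two_tailed_p / 2

print(round(chi2, 2))        # 9.39
print(one_tailed_p < 0.005)  # True
```

Halving the tabled probability is appropriate here only because H1 predicted the direction of the difference and the observed frequencies fall in that direction.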

Summary of procedure. These are the steps in the use of the median test:

1. Determine the combined median of the n1 + n2 scores.

2. Split each group's scores at that combined median. Enter the resultant frequencies in a table like Table 6.10. If many scores fall at the combined median, split the scores into these categories: those which exceed the median and those which do not.

3. Find the probability of the observed values by either the Fisher test or the χ² test, choosing between these according to the criteria given above.

4. If the p yielded by that test is equal to or smaller than α, reject H0.

Power-Efficiency

Mood (1954) has shown that when the median test is applied to data measured in at least an interval scale from normal distributions with common variance (i.e., data that might properly be analyzed by the parametric t test), it has the same power-efficiency as the sign test. That is, its power-efficiency is about 95 per cent for n1 + n2 as low as 6. This power-efficiency decreases as the sample sizes increase, reaching an eventual asymptotic efficiency of 2/π = 63 per cent.

* This decision agrees with that reached by Whiting and Child. Using the parametric t test on these data, they found that t = 4.05, p < .0005.


References

Discussions of the median test are contained in Brown and Mood (1951), Mood (1950, pp. 394-395), and Moses (1952a).

THE MANN-WHITNEY U TEST

Function

When at least ordinal measurement has been achieved, the Mann-Whitney U test may be used to test whether two independent groups have been drawn from the same population. This is one of the most powerful of the nonparametric tests, and it is a most useful alternative to the parametric t test when the researcher wishes to avoid the t test's assumptions, or when the measurement in the research is weaker than interval scaling.

Suppose we have samples from two populations, population A and population B. The null hypothesis is that A and B have the same distribution. The alternative hypothesis, H1, against which we test H0, is that A is stochastically larger than B, a directional hypothesis. We may accept H1 if the probability that a score from A is larger than a score from B is greater than one-half. That is, if a is one observation from population A, and b is one observation from population B, then H1 is that p(a > b) > 1/2. If the evidence supports H1, this implies that the "bulk" of population A is higher than the bulk of population B.

Of course, we might predict instead that B is stochastically larger than A. Then H1 would be that p(a > b) < 1/2. Confirmation of this assertion would imply that the bulk of B is higher than the bulk of A. For a two-tailed test, i.e., for a prediction of differences which does not state direction, H1 would be that p(a > b) ≠ 1/2.

Method

Let n1 = the number of cases in the smaller of two independent groups, and n2 = the number of cases in the larger. To apply the U test, we first combine the observations or scores from both groups, and rank these in order of increasing size. In this ranking, algebraic size is considered, i.e., the lowest ranks are assigned to the largest negative numbers, if any.

Now focus on one of the groups, say the group with n1 cases. The value of U (the statistic used in this test) is given by the number of times that a score in the group with n2 cases precedes a score in the group with n1 cases in the ranking.

For example, suppose we had an experimental group of 3 cases and a control group of 4 cases. Here n1 = 3 and n2 = 4. Suppose these were the scores:

E scores    9   11   15
C scores    6    8   10   13

To find U, we first rank these scores in order of increasing size, being careful to retain each score's identity as either an E or C score:

 6    8    9   10   11   13   15
 C    C    E    C    E    C    E

Now consider the control group, and count the number of E scores that precede each score in the control group. For the C score of 6, no E score precedes. This is also true for the C score of 8. For the next C score (10), one E score precedes. And for the final C score (13), two E scores precede. Thus U = 0 + 0 + 1 + 2 = 3. The number of times that an E score precedes a C score is 3 = U.

The sampling distribution of U under H0 is known, and with this knowledge we can determine the probability associated with the occurrence under H0 of any U as extreme as an observed value of U.
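The counting method just described can be sketched in a few lines. The function name `count_preceding` is hypothetical, and the sketch assumes there are no tied scores between the two groups (ties are discussed separately later in this section).

```python
def count_preceding(preceding_group, focal_group):
    """Number of times a score in preceding_group is smaller than (precedes) a score in focal_group."""
    return sum(1 for focal in focal_group
                 for other in preceding_group
                 if other < focal)

e_scores = [9, 11, 15]        # experimental group
c_scores = [6, 8, 10, 13]     # control group

# Count the E scores that precede each C score: 0 + 0 + 1 + 2
u = count_preceding(e_scores, c_scores)
print(u)  # 3
```

Counting pairs directly avoids building the combined ranking by hand, although it performs n1 × n2 comparisons and so becomes tedious for large groups.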

Very small samples. When neither n1 nor n2 is larger than 8, Table J of the Appendix may be used to determine the exact probability associated with the occurrence under H0 of any U as extreme as an observed value of U. The reader will observe that Table J is made up of six separate subtables, one for each value of n2, from n2 = 3 to n2 = 8.

To determine the probability under H0 associated with his data, the researcher need know only n1 (the size of the smaller group), n2, and U. With this information he may read the value of p from the subtable appropriate to his value of n2.

In our example, n1 = 3, n2 = 4, and U = 3. The subtable for n2 = 4 in Table J shows that U ≤ 3 has probability of occurrence under H0 of p = .200.

The probabilities given in Table J are one-tailed. For a two-tailed test, the value of p given in the table should be doubled.

Now it may happen that the observed value of U is so large that it does not appear in the subtable for the observed value of n2. Such a value arises when the researcher focuses on the "wrong" group in determining U. We shall call such a too-large value U'. For example, suppose that in the above case we had counted the number of C scores preceding each E score rather than counting the number of E scores preceding each C score. We would have found that U = 2 + 3 + 4 = 9. The subtable for n2 = 4 does not go up to U = 9. We therefore denote our observed value as U' = 9. We can transform any U' to U by*

$$U = n_1 n_2 - U' \tag{6.6}$$

* p(U ≥ U') = p(U ≤ n1n2 - U').

In our example, by this transformation U = (3)(4) - 9 = 3. Of course this is the U we found directly when we counted the number of E scores preceding each C score.
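The exact probability read from Table J can be reproduced for this small case by listing every possible ordering of the two groups, each of which is equally likely under H0. The sketch below is an illustration only: the function name `exact_lower_tail` is hypothetical, and the enumeration assumes no ties and is practical only for very small n1 and n2.

```python
from fractions import Fraction
from itertools import combinations

def exact_lower_tail(n1, n2, u_observed):
    """P(U <= u_observed) under H0, by listing every ordering of the two groups."""
    positions = range(n1 + n2)
    favorable = total = 0
    for first_group in combinations(positions, n1):
        members = set(first_group)
        second_group = [p for p in positions if p not in members]
        # U = number of times a first-group score precedes a second-group score
        u = sum(1 for a in first_group for b in second_group if a < b)
        favorable += u <= u_observed
        total += 1
    return Fraction(favorable, total)

n1, n2 = 3, 4
u_prime = 9                    # the "too-large" value from the wrong group
u = n1 * n2 - u_prime          # formula (6.6)
p = exact_lower_tail(n1, n2, u)
print(u, float(p))             # 3 0.2
```

Of the 35 possible orderings, 7 give a U of 3 or less, which agrees with the tabled p = .200.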

Example for Very Small Samples

Solomon and Coles* studied whether rats would generalize learned imitation when placed under a new drive and in a new situation. Five rats were trained to imitate leader rats in a T maze. They were trained to follow the leaders when hungry, in order to attain a food incentive. Then the 5 rats were each transferred to a shock-avoidance situation, where imitation of leader rats would have enabled them to avoid electric shock. Their behavior in the shock-avoidance situation was compared to that of 4 controls who had had no previous training to follow leaders. The hypothesis was that the 5 rats who had already been trained to imitate would transfer this training to the new situation, and thus would reach the learning criterion in the shock-avoidance situation sooner than would the 4 control rats. The comparison is in terms of how many trials each rat took to reach a criterion of 10 correct responses in 10 trials.

* Solomon, R. L., and Coles, M. R. 1954. A case of failure of generalization of imitation across drives and across situations. J. Abnorm. Soc. Psychol., 49, 7-13. Only two of the groups studied by these investigators are included in this example.

i. Null Hypothesis. H0: the number of trials to the criterion in the shock-avoidance situation is the same for rats previously trained to follow a leader to a food incentive as for rats not previously trained. H1: rats previously trained to follow a leader to a food incentive will reach the criterion in the shock-avoidance situation in fewer trials than will rats not previously trained.

ii. Statistical Test. The Mann-Whitney U test is chosen because this study employs two independent samples, uses small samples, and uses measurement (number of trials to criterion as an index to speed of learning) which is probably at most in an ordinal scale.

iii. Significance Level. Let α = .05. n1 = 4 control rats, and n2 = 5 experimental rats.

iv. Sampling Distribution. The probabilities associated with the occurrence under H0 of values as small as an observed U for n1, n2 ≤ 8 are given in Table J.

v. Rejection Region. Since H1 states the direction of the predicted difference, the region of rejection is one-tailed. It consists of all values of U which are so small that the probability associated with their occurrence under H0 is equal to or less than α = .05.

vi. Decision. The number of trials to criterion required by the E and C rats were:

E rats    78   64   75   45   82
C rats   110   70   53   51

We arrange these scores in the order of their size, retaining the identity of each:

 45   51   53   64   70   75   78   82   110
  E    C    C    E    C    E    E    E     C

We obtain U by counting the number of E scores preceding each C score: U = 1 + 1 + 2 + 5 = 9.

In Table J, we locate the subtable for n2 = 5. We see that U ≤ 9 when n1 = 4 has a probability of occurrence under H0 of p = .452. Our decision is that the data do not give evidence which justifies rejecting H0 at the previously set level of significance. The conclusion is that these data do not support the hypothesis that previous training to imitate will generalize across situations and across drives.†
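For readers without Table J at hand, the tabled probability can be recomputed from a simple counting recurrence: every ordering either ends with a score from the first group (which is then preceded by nothing from the second group but follows all n2 of them, removing n2 from the count) or ends with a score from the second group. The sketch below assumes no ties; the function names are hypothetical and the recurrence is offered as one standard way of obtaining the null distribution of U, not as the method by which Table J was prepared.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def arrangements(u, n1, n2):
    """Number of orderings of n1 and n2 scores that give exactly U = u."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    return arrangements(u - n2, n1 - 1, n2) + arrangements(u, n1, n2 - 1)

def lower_tail(u_observed, n1, n2):
    """P(U <= u_observed) under H0, all orderings being equally likely."""
    count = sum(arrangements(u, n1, n2) for u in range(u_observed + 1))
    return count / comb(n1 + n2, n1)

p = lower_tail(9, 4, 5)   # n1 = 4 control rats, n2 = 5 experimental rats
print(round(p, 3))        # 0.452
```

Since p = .452 is far larger than α = .05, the computation leads to the same decision as the table: H0 is not rejected.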

n2 between 9 and 20. If n2 (the size of the larger of the two independent samples) is larger than 8, Table J may not be used. When n2 is between 9 and 20, significance tests may be made with the Mann-Whitney test by using Table K of the Appendix, which gives critical values of U for significance levels .001, .01, .025, and .05 for a one-tailed test. For a two-tailed test, the significance levels given are .002, .02, .05, and .10.

Notice that this set of tables gives critical values of U, and does not give exact probabilities (as does Table J). That is, if an observed U for a particular n1 ≤ 20 and n2 between 9 and 20 is equal to or less than that value given in the table, H0 may be rejected at the level of significance indicated at the head of that table.

For example, if n1 = 6 and n2 = 13, a U of 12 enables us to reject H0 at α = .01 for a one-tailed test, and to reject H0 at α = .02 for a two-tailed test.

† Solomon and Coles report the same conclusion. The statistical test which they utilized is not disclosed.

Computing the value of U. For fairly large values of n1 and n2, the counting method of determining the value of U may be rather tedious. An alternative method, which gives identical results, is to assign the rank of 1 to the lowest score in the combined (n1 + n2) group of scores, assign rank 2 to the next lowest score, etc. Then

$$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 \tag{6.7a}$$

or, equivalently,

$$U = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2 \tag{6.7b}$$

where R1 = sum of the ranks assigned to group whose sample size is n1
      R2 = sum of the ranks assigned to group whose sample size is n2

For example, we might have used this method in finding the value of U for the data given in the example for small samples above. The E and C scores for that example are given again in Table 6.13, with their ranks.

TABLE 6.13. TRIALS TO CRITERION OF E AND C RATS

E Score    Rank        C Score    Rank
   78        7           110        9
   64        4            70        5
   75        6            53        3
   45        1            51        2
   82        8
        R2 = 26                R1 = 19

For those data, R1 = 19 and R2 = 26, and it will be remembered that n1 = 4 and n2 = 5. By applying formula (6.7b), we have

$$U = (4)(5) + \frac{5(5 + 1)}{2} - 26 = 9$$

U = 9 is of course exactly the value we found earlier by counting.

Formulas (6.7a) and (6.7b) yield different U's. It is the smaller of these that we want. The larger value is U'. The investigator should check whether he has found U' rather than U by applying the transformation

$$U = n_1 n_2 - U' \tag{6.6}$$

The smaller of the two values, U, is the one whose sampling distribution is the basis for Table K. Although this value can be found by computing both formulas (6.7a) and (6.7b) and choosing the smaller of the two results, a simpler method is to use only one of those formulas and then find the other value by formula (6.6).
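The rank-sum route to U can be sketched as follows for the rat data. The variable names are illustrative, and the simple dictionary ranking used here is valid only because these nine scores contain no ties; tied scores would need averaged ranks.

```python
e_scores = [78, 64, 75, 45, 82]   # n2 = 5 experimental rats
c_scores = [110, 70, 53, 51]      # n1 = 4 control rats

# Rank 1 goes to the lowest score in the combined group (no ties assumed)
pooled = sorted(e_scores + c_scores)
rank = {score: position + 1 for position, score in enumerate(pooled)}

n1, n2 = len(c_scores), len(e_scores)
r1 = sum(rank[s] for s in c_scores)   # sum of ranks of the group of size n1
r2 = sum(rank[s] for s in e_scores)   # sum of ranks of the group of size n2

u_a = n1 * n2 + n1 * (n1 + 1) // 2 - r1   # formula (6.7a)
u_b = n1 * n2 + n2 * (n2 + 1) // 2 - r2   # formula (6.7b)
u = min(u_a, u_b)                         # the smaller value is U, the larger is U'

print(r1, r2, u_a, u_b, u)  # 19 26 11 9 9
```

The two results always sum to n1n2 (here 11 + 9 = 20), which is simply formula (6.6) restated and makes a convenient arithmetic check.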

Large samples (n2 larger than 20). Neither Table J nor Table K is usable when n2 > 20. However, it has been shown (Mann and Whitney, 1947) that as n1, n2 increase in size, the sampling distribution of U rapidly approaches the normal distribution, with

$$\text{Mean} = \mu_U = \frac{n_1 n_2}{2}$$

and

$$\text{Standard deviation} = \sigma_U = \sqrt{\frac{(n_1)(n_2)(n_1 + n_2 + 1)}{12}}$$

That is, when n2 > 20 we may determine the significance of an observed value of U by

$$z = \frac{U - \mu_U}{\sigma_U} = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{(n_1)(n_2)(n_1 + n_2 + 1)}{12}}} \tag{6.8}$$

which is practically normally distributed with zero mean and unit variance. That is, the probability associated with the occurrence under H0 of values as extreme as an observed z may be determined by reference to Table A of the Appendix.

When the normal approximation to the sampling distribution of U is used in a test of H0, it does not matter whether formula (6.7a) or (6.7b) is used in the computation of U, for the absolute value of z yielded by formula (6.8) will be the same if either is used. The sign of the z depends on whether U or U' was used, but the value does not.

Example for Large Samples

For our example, we will reexamine the Whiting and Child data which we have already analyzed by the median test (on pages 112 to 115).

i. Null Hypothesis. H0: oral socialization anxiety is equally severe in both societies with oral explanations of illness present and societies with oral explanations absent. H1: societies with oral explanations of illness present are (stochastically) higher in oral socialization anxiety than societies which do not have oral explanations of illness.

ii. Statistical Test. The two groups of societies constitute two independent groups, and the measure of oral socialization anxiety (rating scale) constitutes an ordinal measure at best. For these reasons the Mann-Whitney U test is appropriate for analyzing these data.

iii. Significance Level. Let α = .01. n1 = 16 = the number of societies with oral explanations absent; n2 = 23 = the number of societies with oral explanations present.


iv. Sampling Distribution. For n2 > 20, formula (6.8) yields values of z. The probability associated with the occurrence under H0 of values as extreme as an observed z may be determined by reference to Table A.

v. Rejection Region. Since H1 predicts the direction of the difference, the region of rejection is one-tailed. It consists of all values of z (from data in which the difference is in the predicted direction) which are so extreme that their associated probability under H0 is equal to or less than α = .01.

vi. Decision. The ratings assigned to each of the 39 societies are shown in Table 6.14, together with the rank of each in the combined group. Notice that tied ratings are assigned the average of the tied ranks.

TABLE 6.14. ORAL SOCIALIZATION ANXIETY AND ORAL EXPLANATIONS OF ILLNESS

Societies with       Rating on oral             Societies with       Rating on oral
oral explanations    socialization              oral explanations    socialization
absent               anxiety          Rank      present              anxiety          Rank

Lapp                    13            29.5      Marquesans              17            39
Chamorro                12            24.5      Dobuans                 16            38
Samoans                 12            24.5      Baiga                   15            36
Arapesh                 10            16        Kwoma                   15            36
Balinese                10            16        Thonga                  15            36
Hopi                    10            16        Alorese                 14            33
Tanala                  10            16        Chagga                  14            33
Paiute                   9            12        Navaho                  14            33
Chenchu                  8             9.5      Dahomeans               13            29.5
Teton                    8             9.5      Lesu                    13            29.5
Flathead                 7             5        Masai                   13            29.5
Papago                   7             5        Lepcha                  12            24.5
Venda                    7             5        Maori                   12            24.5
Warrau                   7             5        Pukapukans              12            24.5
Wogeo                    7             5        Trobrianders            12            24.5
Ontong-Javanese          6             1.5      Kwakiutl                11            20.5
                                                Manus                   11            20.5
                                                Chiricahua              10            16
                                                Comanche                10            16
                                                Siriono                 10            16
                                                Bena                     8             9.5
                                                Slave                    8             9.5
                                                Kurtatchi                6             1.5

                              R1 = 200.0                                        R2 = 580.0

For these data, R1 = 200.0 and R2 = 580.0. The value of U may be found by substituting the observed values in formula (6.7a):

$$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 \tag{6.7a}$$

$$= (16)(23) + \frac{16(16 + 1)}{2} - 200 = 304$$

Knowing that U = 304, we may find the value of z by substituting in formula (6.8):

$$z = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{(n_1)(n_2)(n_1 + n_2 + 1)}{12}}} \tag{6.8}$$

$$= \frac{304 - \frac{(16)(23)}{2}}{\sqrt{\frac{(16)(23)(16 + 23 + 1)}{12}}} = 3.43$$

Reference to Table A reveals that z ≥ 3.43 has a one-tailed probability under H0 of p < .0003. Since this p is smaller than α = .01, our decision is to reject H0 in favor of H1.* We conclude that societies with oral explanations of illness present are (stochastically) higher in oral socialization anxiety than societies with oral explanations absent.
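The whole computation for this example, from ratings to z, can be sketched as below. The two rating lists are assumed transcriptions of the ratings for the 16 and 23 societies, the helper `average_ranks` is a hypothetical name, and the normal tail probability is computed from the complementary error function in place of Table A.

```python
import math

# Assumed ratings on oral socialization anxiety for the two groups of societies
absent = [13, 12, 12, 10, 10, 10, 10, 9, 8, 8, 7, 7, 7, 7, 7, 6]
present = [17, 16, 15, 15, 15, 14, 14, 14, 13, 13, 13, 12, 12, 12, 12,
           11, 11, 10, 10, 10, 8, 8, 6]

def average_ranks(scores):
    """Map each distinct score to the average of the ranks its ties occupy."""
    ordered = sorted(scores)
    ranks = {}
    for value in set(ordered):
        first = ordered.index(value) + 1          # lowest rank occupied by this score
        last = first + ordered.count(value) - 1   # highest rank occupied by this score
        ranks[value] = (first + last) / 2
    return ranks

ranks = average_ranks(absent + present)
n1, n2 = len(absent), len(present)
r1 = sum(ranks[s] for s in absent)

u = n1 * n2 + n1 * (n1 + 1) / 2 - r1                              # formula (6.7a)
z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # formula (6.8)
one_tailed_p = 0.5 * math.erfc(abs(z) / math.sqrt(2))

print(r1, u, round(z, 2))  # 200.0 304.0 3.43
```

The computed one-tailed probability is roughly .0003, the order of magnitude read from Table A, and in any case far below α = .01.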

It is important to notice that for these data the Mann-Whitney U test exhibits greater power to reject H0 than the median test. Testing a similar hypothesis about these data, the median test yielded a value which permitted rejection of H0 at the p < .005 level (one-tailed test), whereas the Mann-Whitney test yielded a value which permitted rejection of H0 at the p < .0003 level (one-tailed test). The fact that the Mann-Whitney test is more powerful than the median test is not surprising inasmuch as it considers the rank value of each observation rather than simply its location with respect to the combined median, and thus uses more of the information in the data.

Ties. The Mann-Whitney test assumes that the scores represent a distribution which has underlying continuity. With very precise measurement of a variable which has underlying continuity, the probability of a tie is zero. However, with the relatively crude measures which we typically employ in behavioral scientific research, ties may well occur. We assume that the two observations which obtain tied scores are really different, but that this difference is simply too refined or minute for detection by our crude measures.

* As we have already noted, Whiting and Child reached the same decision on the basis of the parametric t test. They found that t = 4.05, p < .0005.

When tied scores occur, we give each of the tied observations the average of the ranks they would have had if no ties had occurred.

If the ties occur between two or more observations in the same group, the value of U is not affected. But if ties occur between two or more observations involving both groups, the value of U is affected. Although the effect is usually negligible, a correction for ties is available for use with the normal curve approximation which we employ for large samples.

The effect of tied ranks is to change the variability of the set of ranks. Thus the correction for ties must be applied to the standard deviation of the sampling distribution of U. Corrected for ties, the standard deviation becomes

$$\sigma_U = \sqrt{\left(\frac{n_1 n_2}{N(N - 1)}\right)\left(\frac{N^3 - N}{12} - \Sigma T\right)}$$

where N = n1 + n2
      T = (t³ - t)/12 (where t is the number of observations tied for a given rank)
      ΣT is found by summing the T's over all groups of tied observations

With the correction for ties, we find z by

$$z = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\left(\frac{n_1 n_2}{N(N - 1)}\right)\left(\frac{N^3 - N}{12} - \Sigma T\right)}} \tag{6.9}$$

It may be seen that if there are no ties, the above expression reduces directly to that given originally for z [formula (6.8)].

The use of the correction for ties may be illustrated by applying that correction to the data in Table 6.14. For those data,

n1 + n2 = 16 + 23 = 39 = N

We observe these tied groups:

2 scores of 6
5 scores of 7
4 scores of 8
7 scores of 10
2 scores of 11
6 scores of 12
4 scores of 13
3 scores of 14
3 scores of 15

Thus we have t's of 2, 5, 4, 7, 2, 6, 4, 3, and 3. To find ΣT, we sum the values of (t³ - t)/12 for each of these tied groups:

$$\Sigma T = \frac{2^3 - 2}{12} + \frac{5^3 - 5}{12} + \frac{4^3 - 4}{12} + \frac{7^3 - 7}{12} + \frac{2^3 - 2}{12} + \frac{6^3 - 6}{12} + \frac{4^3 - 4}{12} + \frac{3^3 - 3}{12} + \frac{3^3 - 3}{12}$$

$$= .5 + 10.0 + 5.0 + 28.0 + .5 + 17.5 + 5.0 + 2.0 + 2.0 = 70.5$$

Thus for the data in Table 6.14, n1 = 16, n2 = 23, N = 39, U = 304, and ΣT = 70.5. Substituting these values in formula (6.9), we have

$$z = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\left(\frac{n_1 n_2}{N(N - 1)}\right)\left(\frac{N^3 - N}{12} - \Sigma T\right)}} \tag{6.9}$$

$$= \frac{304 - \frac{(16)(23)}{2}}{\sqrt{\left(\frac{(16)(23)}{39(39 - 1)}\right)\left(\frac{(39)^3 - 39}{12} - 70.5\right)}} = 3.45$$

The value of z when corrected for ties is a little larger than that found earlier when the correction was not incorporated. The difference between z ≥ 3.43 and z ≥ 3.45, however, is negligible in so far as the probability given by Table A is concerned. Both z's are read as having an associated probability of p < .0003 (one-tailed test).
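The tie correction of formula (6.9) is easy to verify numerically. In the sketch below the function name `z_corrected_for_ties` is hypothetical; it takes the observed U, the two sample sizes, and the list of tie lengths t, and an empty list of ties reproduces the uncorrected z of formula (6.8).

```python
import math

def z_corrected_for_ties(u, n1, n2, tie_lengths):
    """Return (z, sum_T) using the tie-corrected standard deviation of formula (6.9)."""
    big_n = n1 + n2
    sum_t = sum((t ** 3 - t) / 12 for t in tie_lengths)
    variance = (n1 * n2 / (big_n * (big_n - 1))) * ((big_n ** 3 - big_n) / 12 - sum_t)
    return (u - n1 * n2 / 2) / math.sqrt(variance), sum_t

ties = [2, 5, 4, 7, 2, 6, 4, 3, 3]            # lengths of the nine tied groups
z_ties, sum_t = z_corrected_for_ties(304, 16, 23, ties)
z_plain, _ = z_corrected_for_ties(304, 16, 23, [])   # no ties: same as formula (6.8)

print(sum_t, round(z_ties, 2), round(z_plain, 2))  # 70.5 3.45 3.43
```

Because ΣT is subtracted inside the square root, the correction can only shrink the standard deviation and therefore can only move z away from zero.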

As this example demonstrates, ties have only a slight effect. Even when a large proportion of the scores are tied (this example had over 90 per cent of its observations involved in ties) the effect is practically negligible. Observe, however, that the magnitude of the correction factor, ΣT, depends importantly on the length of the various ties, i.e., on the size of the various t's. Thus a tie of length 4 contributes 5.0 to ΣT in this example, whereas two ties of length 2 contribute together only 1.0 (that is, .5 + .5) to ΣT. And a tie of length 6 contributes 17.5, whereas two of length 3 contribute together only 2.0 + 2.0 = 4.0.

When the correction is employed, it tends to increase the value of z slightly, making it more significant. Therefore when we do not correct for ties our test is "conservative" in that the value of p will be slightly inflated. That is, the value of the probability associated with the observed data under H0 will be slightly larger than that which would be found were the correction employed. The writer's recommendation is that one should correct for ties only if the proportion of ties is quite large, if some of the t's are large, or if the p which is obtained without the correction is very close to one's previously set value of α.

Summary of procedure. These are the steps in the use of the Mann-Whitney U test:

1. Determine the values of n1 and n2. n1 = the number of cases in the smaller group; n2 = the number of cases in the larger group.

2. Rank together the scores for both groups, assigning the rank of 1 to the score which is algebraically lowest. Ranks range from 1 to N = n1 + n2. Assign tied observations the average of the tied ranks.

3. Determine the value of U either by the counting method or by applying formula (6.7a) or (6.7b).

4. The method for determining the significance of the observed value of U depends on the size of n2.

a. If n2 is 8 or less, the exact probability associated with a value as small as the observed value of U is shown in Table J. For a two-tailed test, double the value of p shown in that table. If your observed U is not shown in Table J, it is U' and should be transformed to U by formula (6.6).

b. If n2 is between 9 and 20, the significance of any observed value of U may be determined by reference to Table K. If your observed value of U is larger than n1n2/2, it is U'; apply formula (6.6) for a transformation.

c. If n2 is larger than 20, the probability associated with a value as extreme as the observed value of U may be determined by computing the value of z as given by formula (6.8), and testing this value by referring to Table A. For a two-tailed test, double the p shown in that table. If the proportion of ties is very large or if the obtained p is very close to α, apply the correction for ties, i.e., use formula (6.9) rather than (6.8).

5. If the observed value of U has an associated probability equal to or less than α, reject H0 in favor of H1.

Power-Efficiency

If the Mann-Whitney test is applied to data which might properly be analyzed by the most powerful parametric test, the t test, its power-efficiency approaches 3/π = 95.5 per cent as N increases (Mood, 1954), and is close to 95 per cent even for moderate-sized samples. It is therefore an excellent alternative to the t test, and of course it does not have the restrictive assumptions and requirements associated with the t test.

Whitney (1948, pp. 51-56) gives examples of distributions for which the U test is superior to its parametric alternative, i.e., for which the U test has greater power to reject H0.

References

For discussions of the Mann-Whitney test,* the reader may refer to Auble (1953), Mann and Whitney (1947), Whitney (1948), and Wilcoxon (1945).

THE KOLMOGOROV-SMIRNOV TWO-SAMPLE TEST

Function and Rationale

The Kolmogorov-Smirnov two-sample test is a test of whether two independent samples have been drawn from the same population (or from populations with the same distribution). The two-tailed test is sensitive to any kind of difference in the distributions from which the two samples were drawn: differences in location (central tendency), in dispersion, in skewness, etc. The one-tailed test is used to decide whether or not the values of the population from which one of the samples was drawn are stochastically larger than the values of the population from which the other sample was drawn, e.g., to test the prediction that the scores of an experimental group will be "better" than those of the control group.

Like the Kolmogorov-Smirnov one-sample test (pages 47 to 52), the two-sample test is concerned with the agreement between two cumulative distributions. The one-sample test is concerned with the agreement between the distribution of a set of sample values and some specified theoretical distribution. The two-sample test is concerned with the agreement between two sets of sample values.

If the two samples have in fact been drawn from the same population distribution, then the cumulative distributions of both samples may be expected to be fairly close to each other, inasmuch as they both should show only random deviations from the population distribution. If the two sample cumulative distributions are "too far apart" at any point, this suggests that the samples come from different populations. Thus a large enough deviation between the two sample cumulative distributions is evidence for rejecting H0.

* Two nonparametric statistical tests which are essentially equivalent to the Mann-Whitney U test have been reported in the literature and should be mentioned here. The first of these is due to Festinger (1946). He gives a method for calculating exact probabilities and gives a two-tailed table for the .05 and .01 levels of significance for n1 + n2 ≤ 40 when n1 ≤ 12. In addition, for n1 from 13 to 15, values are given up to n1 + n2 = 30.

The second test is due to White (1952), who gives a method essentially the same as the Mann-Whitney test except that rather than U it employs R (the sum of the ranks of one of the groups) as its statistic. White offers two-tailed tables for the .05, .01, and .001 levels of significance for n1 + n2 ≤ 30.

Inasmuch as these tests are linearly related to the Mann-Whitney test (and therefore will yield the same results in the test of H0 for any given batch of data), it was felt that inclusion of complete discussions of them in this text would introduce unnecessary redundancy.

Method

To apply the Kolmogorov-Smirnov two-sample test, we make a cumulative frequency distribution for each sample of observations, using the same intervals for both distributions. For each interval, then, we subtract one step function from the other. The test focuses on the largest of these observed deviations.

Let S_n1(X) = the observed cumulative step function of one of the samples, that is, S_n1(X) = K/n1, where K = the number of scores equal to or less than X. And let S_n2(X) = the observed cumulative step function of the other sample, that is, S_n2(X) = K/n2. Now the Kolmogorov-Smirnov two-sample test focuses on

$$D = \text{maximum}\,[S_{n_1}(X) - S_{n_2}(X)] \tag{6.10a}$$

for a one-tailed test, and on

$$D = \text{maximum}\,|S_{n_1}(X) - S_{n_2}(X)| \tag{6.10b}$$

for a two-tailed test. The sampling distribution of D is known (Smirnov, 1948; Massey, 1951) and the probabilities associated with the occurrence of values as large as an observed D under the null hypothesis (that the two samples have come from the same distribution) have been tabled.

Notice that for a one-tailed test we find the maximum value of D in the predicted direction [by formula (6.10a)] and that for a two-tailed test we find the maximum absolute value of D [by formula (6.10b)], i.e., we find the maximum deviation irrespective of direction. This is because in the one-tailed test, H1 is that the population values from which one of the samples was drawn are stochastically larger than the population values from which the other sample was drawn, whereas in the two-tailed test, H1 is simply that the two samples are from different populations.

In the use of the Kolmogorov-Smirnov test on data for which the size and number of the intervals are arbitrary, it is well to use as many intervals as are feasible. When too few intervals are used, information may be wasted. That is, the maximum vertical deviation D of the two cumulative step functions may be obscured by casting the data into too few intervals.

For instance, in the example presented below for the case of small samples, only 8 intervals were used, in order to simplify the exposition. As it happens, 8 intervals were sufficient, in this case, to yield a D which enabled us to reject H0 at the predetermined level of significance. If it had happened that with these 8 intervals the observed D had not been large enough to permit us to reject H0, before we could accept H0 it would be necessary for us to increase the number of intervals, in order to ascertain whether the maximum deviation D had been obscured by the use of too few intervals. It is well then to use as many intervals as are feasible to start with, so as not to waste the information inherent in the data.
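Formulas (6.10a) and (6.10b) can be illustrated with a short sketch that evaluates both step functions at every observed score, which amounts to using the finest intervals the data allow. The function names and the two small samples are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def step_function(sample, x):
    """S(X) = K/n, where K is the number of scores equal to or less than X."""
    return Fraction(sum(1 for value in sample if value <= x), len(sample))

def ks_two_sample(first, second):
    """Return (one-tailed D, two-tailed D) for the direction first minus second."""
    points = sorted(set(first) | set(second))
    deviations = [step_function(first, x) - step_function(second, x) for x in points]
    d_one_tailed = max(deviations)                  # formula (6.10a)
    d_two_tailed = max(abs(d) for d in deviations)  # formula (6.10b)
    return d_one_tailed, d_two_tailed

# Hypothetical scores: the first sample tends to be lower than the second
low = [1, 2, 3]
high = [2, 3, 4]
print(ks_two_sample(low, high))   # (Fraction(1, 3), Fraction(1, 3))
print(ks_two_sample(high, low))   # (Fraction(0, 1), Fraction(1, 3))
```

Reversing the order of the samples leaves the two-tailed D unchanged but reduces the one-tailed D to zero, which is why the one-tailed value must be taken in the direction predicted by H1.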

Small samples. When n1 = n2, and when both n1 and n2 are 40 or less, Table L of the Appendix may be used in the test of the null hypothesis. The body of this table gives various values of K_D, which is defined as the numerator of the largest difference between the two cumulative distributions, i.e., the numerator of D. To read Table L, one must know the value of N (which in this case is the value of n1 = n2) and the value of K_D. Observe also whether H1 calls for a one-tailed or a two-tailed test. With this information, one may determine the significance of the observed data.

For example, in a one-tailed test where N = 14, a K_D ≥ 8 is sufficient to reject the null hypothesis at the α = .01 level.

Esamyle for Seal Samples

Lepley' comParedthe serial learning of 10 seventh-gradepu il with the seriallearningof 10eleventh-grade pupils. His hypothesis

wasthat the primacyeffectshouldbelessprominentin the learning of the youngersubjects. The primacyeffectis the tendencyfor the

materiallearnedearlyin a seriesto be remembered moreefBciently thanthemateriallearnedlaterin theseries. Hetestedthis hypothesisby comparingthe percentageof errorsmadeby the two groupsin the first half of the seriesof learnedmaterial, predicting that the

older group (the eleventhgraders)would make relatively fewer errorsin repeatingthe first half of theseriesthanwouldthe younger group,

i. Null Hypothesis.Ho: thereis no differencein the proportion of errors made in recalling the first half of a learned seriesbetween

eleventh-gradesubjectsand seventh-gradesubjects. H~. eleventhgraders make proportionally fewer errors than seventh-gradersin recalling the first half of a learned series.

ii. Statiatica/Test. Sincetwo smallindependent samples of equal sizeare beingcompared, the Kolmogorov-Smirnov two-sample test may be appliedto the data. iii. Signgcance Level. Let a = .01. ei = n~ = N = the number of subjects in each group = 10. > Lepley,%'. M. 1934. Serial reactionsconsideredas conditionedreactions. p~chot.Monagr.,46,No. 205.

iv. Sampling Distribution. Table L gives critical values of K_D for n1 = n2 when n1 and n2 are less than 40.

v. Region of Rejection. Since H1 predicts the direction of the difference, the region of rejection is one-tailed. H0 will be rejected if the value of K_D for the largest deviation in the predicted direction is so large that the probability associated with its occurrence under H0 is equal to or less than α = .01.

vi. Decision. Table 6.15 gives the percentage of each subject's errors which were committed in the recall of the first half of the serially learned material. For analysis by the Kolmogorov-Smirnov test, these data were cast in two cumulative frequency distributions, shown in Table 6.16. Here n1 = 10 eleventh-graders, and n2 = 10 seventh-graders.

TABLE 6.15. PERCENTAGE OF TOTAL ERRORS IN FIRST HALF OF SERIES

Eleventh-grade subjects    Seventh-grade subjects
         35.2                       39.1
         39.2                       41.2
         40.9                       45.2
         38.1                       46.2
         34.4                       48.4
         29.1                       48.7
         41.8                       55.0
         24.3                       40.6
         32.4                       52.1
         32.6                       47.2

TABLE 6.16. DATA IN TABLE 6.15 CAST FOR KOLMOGOROV-SMIRNOV TEST

Per cent of total errors in first half of series (class intervals): 24-27, 28-31, 32-35, 36-39, 40-43, 44-47, 48-51, 52-55
Rows: S_n1(X), S_n2(X), and S_n1(X) - S_n2(X), each entry a fraction with denominator 10 [individual entries garbled in the source]

Observe that the largest discrepancy between the two series is 7/10. K_D = 7, the numerator of this largest difference. Reference to Table L reveals that when N = 10, a value of K_D = 7 is significant at the α = .01 level for a one-tailed test.
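As a check on the arithmetic, K_D can also be counted directly from the individual percentages of Table 6.15. The following is only a sketch under stated assumptions: the function name is hypothetical, the assignment of scores to the two grades follows the direction of the hypothesis (the eleventh-graders are the lower-scoring group), and it works from the ungrouped scores rather than the class intervals of Table 6.16, so the intermediate differences need not match that table.

```python
def kd_one_tailed(first, second):
    """Largest value of #(first <= x) - #(second <= x) over all observed x.

    With equal sample sizes N this is K_D, the numerator of D = K_D / N,
    taken in the direction "the first group scores lower".
    """
    if len(first) != len(second):
        raise ValueError("K_D as read from Table L assumes n1 = n2")
    best = 0
    for x in sorted(set(first) | set(second)):
        at_or_below_first = sum(1 for v in first if v <= x)
        at_or_below_second = sum(1 for v in second if v <= x)
        best = max(best, at_or_below_first - at_or_below_second)
    return best


# Percentages from Table 6.15 (column assignment is an assumption, see above).
eleventh = [35.2, 39.2, 40.9, 38.1, 34.4, 29.1, 41.8, 24.3, 32.4, 32.6]
seventh = [39.1, 41.2, 45.2, 46.2, 48.4, 48.7, 55.0, 40.6, 52.1, 47.2]

k_d = kd_one_tailed(eleventh, seventh)
print(k_d, k_d / len(eleventh))  # K_D and D
```

For these scores the count agrees with the value K_D = 7 used in the decision.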

Inasmuch as the probability associated with the occurrence of a value as large as the observed value of K_D under H0 is at most equal to the previously set level of significance, our decision is to reject H0 in favor of H1.¹ We conclude that eleventh-graders make proportionally fewer errors than seventh-graders in recalling the first half of a learned series.

¹ Using a parametric technique, Lepley reached the same decision. He used the critical ratio technique, and rejected H0 at α = .01.

Large samples: two-tailed test. When both n1 and n2 are larger than 40, Table M of the Appendix may be used for the Kolmogorov-Smirnov two-sample test. When this table is used, it is not necessary that n1 = n2. To use this table, determine the value of D for the observed data, using formula (6.10b). Then compare that observed value with the critical one which is obtained by entering the observed values of n1 and n2 in the expression given in Table M. If the observed D is equal to or larger than that computed from the expression in the table, H0 may be rejected at the level of significance (two-tailed) associated with that expression.

For example, suppose n1 = 55 and n2 = 60, and that a researcher wishes to make a two-tailed test at α = .05. In the row in Table M for α = .05, he finds the value of D which his observation must equal or exceed in order for him to reject H0. By computation, he finds that his D must be .254 or larger for H0 to be rejected, for

$$1.36\sqrt{\frac{n_1 + n_2}{n_1 n_2}} = 1.36\sqrt{\frac{55 + 60}{(55)(60)}} = .254$$
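The critical value just computed is easy to reproduce. The sketch below is illustrative only: the function names are hypothetical, and the coefficient 1.36 is the one quoted above for α = .05; coefficients for other levels must be taken from Table M itself.

```python
from math import sqrt


def critical_d(n1, n2, coefficient=1.36):
    """Critical D for the two-tailed large-sample test.

    Evaluates coefficient * sqrt((n1 + n2) / (n1 * n2)); the default
    coefficient 1.36 is the value quoted in the text for alpha = .05.
    """
    return coefficient * sqrt((n1 + n2) / (n1 * n2))


def reject_h0(observed_d, n1, n2, coefficient=1.36):
    """True when the observed D equals or exceeds the critical value."""
    return observed_d >= critical_d(n1, n2, coefficient)


print(round(critical_d(55, 60), 3))  # 0.254
```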

Large samples: one-tailed test. When n1 and n2 are large, and regardless of whether or not n1 = n2, we may make a one-tailed test by using

$$D = \text{maximum}\,[S_{n_1}(X) - S_{n_2}(X)] \qquad (6.10a)$$

We test the null hypothesis that the two samples have been drawn from the same population against the alternative hypothesis that the values of the population from which one of the samples was drawn are stochastically larger than the values of the population from which the other sample was drawn. For example, we may wish to test not simply whether an experimental group is different from a control group but whether the experimental group is "higher" than the control group.

It has been shown (Goodman, 1954) that

$$\chi^2 = 4D^2\,\frac{n_1 n_2}{n_1 + n_2} \qquad (6.11)$$

has a sampling distribution which is approximated by the chi-square distribution with df = 2. That is, we may determine the significance of an observed value of D, as computed from formula (6.10a), by solving formula (6.11) for the observed values of D, n1, and n2, and referring to the chi-square distribution with df = 2 (Table C of the Appendix).
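Formula (6.11) can be evaluated in a few lines. This is a minimal sketch with hypothetical function names. In place of Table C it uses the fact that the upper-tail probability of a chi-square variable with df = 2 is exp(-χ²/2). The illustration reuses D = 7/10 with n1 = n2 = 10 from the small-sample example.

```python
from math import exp


def ks_chi_square(d, n1, n2):
    """Formula (6.11): chi-square = 4 * D^2 * n1 * n2 / (n1 + n2)."""
    return 4 * d ** 2 * (n1 * n2) / (n1 + n2)


def chi_square_df2_tail(chi2):
    """P(X >= chi2) for a chi-square variable with df = 2."""
    return exp(-chi2 / 2)


chi2 = ks_chi_square(0.7, 10, 10)
p = chi_square_df2_tail(chi2)
print(round(chi2, 1), p < 0.01)  # 9.8 True
```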


Example for Large Samples: One-tailed Test

In a study of correlates of authoritarian personality structure,¹ one hypothesis was that persons high in authoritarianism would show a greater tendency to possess stereotypes about members of various national and ethnic groups than would those low in authoritarianism. This hypothesis was tested with a group of 98 randomly selected college women. Each subject was given 20 photographs and asked to "identify" those whose nationality she recognized, by matching the appropriate photograph with the name of the national group. Subjects were free to "identify" (by matching) as many or as few photographs as they wished. Since, unknown to the subjects, all photographs were of Mexican nationals (either candidates for the Mexican legislature or winners in a Mexican beauty contest), and since the matching list of 20 different national and ethnic groups did not include "Mexican," the number of photographs which any subject "identified" constituted an index of that subject's tendency to stereotype.

Authoritarianism was measured by the well-known F scale of authoritarianism,² and the subjects were grouped as "high" and "low" scorers. "High" scorers were those who scored at or above the median on the F scale; "low" scorers were those who scored below the median. The prediction was that these two groups would differ in the number of photographs they "identified."

i. Null Hypothesis. H0: women at this university who score low in authoritarianism stereotype as much ("identify" as many photographs) as women who score high in authoritarianism. H1: women who score high in authoritarianism stereotype more ("identify" more photographs) than women who score low in authoritarianism.

ii. Statistical Test. Since the low scorers and the high scorers constitute two independent groups, a test for two independent samples was chosen. Because the number of photographs "identified" by a subject cannot be considered more than an ordinal measure of that subject's tendency to stereotype, a nonparametric test is desirable. The Kolmogorov-Smirnov two-sample test compares the two sample cumulative frequency distributions and determines whether the observed D indicates that they have been drawn from two populations, one of which is stochastically larger than the other.

¹ Siegel, S. 1954. Certain determinants and correlates of authoritarianism. Genet. Psychol. Monogr., 40, 187-229.
² Presented in Adorno, T. W., Frenkel-Brunswik, Else, Levinson, D. J., and Sanford, R. N. The authoritarian personality. New York: Harper, 1950.

iii. Significance Level. Let α = .01. The sizes of n1 and n2 may be determined only after the data are collected, for subjects will be grouped according to whether they score at or above the median on the F scale or score below the median on the F scale.

iv. Sampling Distribution. The sampling distribution of

$$\chi^2 = 4D^2\,\frac{n_1 n_2}{n_1 + n_2}$$

[i.e., formula (6.11)], where D is computed from formula (6.10a), is approximated by the chi-square distribution with df = 2. The probability associated with an observed value of D may be determined by computing χ² from formula (6.11) and referring to Table C.

v. Rejection Region. Since H1 predicts the direction of the difference between the low and high F scorers, a one-tailed test is used. The region of rejection consists of all values of χ², as computed from formula (6.11), which are so large that the probability associated with their occurrence under H0 for df = 2 is equal to or less than α = .01.

vi. Decision. Of the 98 college women, 44 obtained F scores below the median. Thus n1 = 44. The remaining 54 women obtained scores at or above the median: n2 = 54. The number of photographs "identified" by each of the subjects in the two groups is given in Table 6.17. To apply the Kolmogorov-Smirnov test, we recast these data into two cumulative frequency distributions, as in Table 6.18. For ease of computation, the fractions shown in Table 6.18 may be converted to decimal values; these values are shown in Table 6.19. By simple subtraction, we find the differences between the two sample distributions at the various intervals. The largest of these differences in the predicted direction is .406.

TABLE 6.17. NUMBER OF LOW AND HIGH AUTHORITARIANS IDENTIFYING VARIOUS NUMBERS OF PHOTOGRAPHS
TABLE 6.18. DATA IN TABLE 6.17 CAST FOR KOLMOGOROV-SMIRNOV TEST
TABLE 6.19. DECIMAL EQUIVALENTS OF DATA IN TABLE 6.18

That is,

$$D = \text{maximum}\,[S_{n_1}(X) - S_{n_2}(X)] = \text{maximum}\,[S_{44}(X) - S_{54}(X)] = .406 \qquad (6.10a)$$

With D = .406, we compute the value of χ² as defined by formula (6.11):

$$\chi^2 = 4D^2\,\frac{n_1 n_2}{n_1 + n_2} = 4(.406)^2\,\frac{(44)(54)}{44 + 54} = 15.97 \qquad (6.11)$$

Reference to Table C reveals that the probability associated with χ² = 15.97 for df = 2 is p [...] and n2 be equal.

To show how well the chi-square approximation works even for small samples, let us use it on the data presented in the example for small samples (above). In that case, n1 = n2 = 10, and D, as computed from formula (6.10a), was 7/10. The chi-square approximation:

$$\chi^2 = 4D^2\,\frac{n_1 n_2}{n_1 + n_2} = 4\left(\frac{7}{10}\right)^2\frac{(10)(10)}{10 + 10} = 9.8 \qquad (6.11)$$

Table C shows that χ² = 9.8 with df = 2 is significant at the .01 level. This is the same result as that which was obtained for these data by the use of Table L, which is based on exact computations.

Summary of procedure. These are the steps in the use of the Kolmogorov-Smirnov two-sample test:

1. Arrange each of the two groups of scores in a cumulative frequency distribution, using the same intervals (or classifications) for both distributions. Use as many intervals as are feasible.
2. By subtraction, determine the difference between the two sample cumulative distributions at each listed point.
3. By inspection, determine the largest of these differences; this is D. For a one-tailed test, D is the largest difference in the predicted direction.
4. The method for determining the significance of the observed D depends on the size of the samples and the nature of H1:
a. When n1 = n2 = N, and when N ≤ 40, Table L is used. It gives critical values of K_D (the numerator of D) for various levels of significance, for both one-tailed and two-tailed tests.
b. For a two-tailed test, when n1 and n2 are both larger than 40, Table M is used. In such cases it is not necessary that n1 = n2. Critical values of D for any given large values of n1 and n2 may be computed from the expressions given in the body of Table M.
c. For a one-tailed test where n1 and n2 are large, the value of χ² with df = 2 which is associated with the observed D is computed from formula (6.11). The significance of the resulting value of χ² with df = 2 may be determined by reference to Table C. This chi-square approximation is also useful for small samples with n1 ≠ n2, but in that application the test is conservative.

If the observed value is equal to or larger than that given in the appropriate table for a particular level of significance, H0 may be rejected at that level of significance.

Power-Efficiency

When compared with the t test, the Kolmogorov-Smirnov test has high power-efficiency (about 96 per cent) for small samples (Dixon, 1954). It would seem that as the sample size increases the power-efficiency would tend to decrease slightly.

The Kolmogorov-Smirnov test seems to be more powerful in all cases than either the χ² test or the median test.

The evidence seems to indicate that whereas for very small samples the Kolmogorov-Smirnov test is slightly more efficient than the Mann-Whitney test, for large samples the converse holds.

References

For other discussions of the Kolmogorov-Smirnov two-sample test, the reader may consult Birnbaum (1952; 1953), Dixon (1954), Goodman (1954), Kolmogorov (1941), Massey (1951a; 1951b), and Smirnov (1948).

THE WALD-WOLFOWITZ RUNS TEST

Function

The Wald-Wolfowitz runs test is applicable when we wish to test the null hypothesis that two independent samples have been drawn from the same population against the alternative hypothesis that the two groups differ in any respect whatsoever. That is, with sufficiently large samples the Wald-Wolfowitz test can reject H0 if the two populations differ in any way: in central tendency, in variability, in skewness, or whatever. Thus it may be used to test a large class of alternative hypotheses. Whereas many other tests are addressed to particular sorts of differences between two groups (e.g., the median test determines whether the two samples have been drawn from populations with the same median), the Wald-Wolfowitz test is addressed to any sort of difference.

Rationale and Method

The Wald-Wolfowitz test assumes that the variable under consideration has an underlying distribution which is continuous. It requires that the measurement of that variable be in at least an ordinal scale.

To apply the test to data from two independent samples of size n1 and n2, we rank the n1 + n2 scores in order of increasing size. That is, we cast the scores of all subjects in both groups into one ordering. Then we determine the number of runs in this ordered series. A run is defined as any sequence of scores from the same group (either group 1 or group 2).

For example, suppose we observed scores from group A (consisting of 3 cases, n1 = 3) and group B (consisting of 4 cases, n2 = 4). When these 7 scores are cast in one ordered series, the signs of the groups fall in this order:

B B B A B A A

Notice that we retain the identity of each score by accompanying that score with the sign of the group to which it belongs. We then observe the order of the occurrence of these signs (A's and B's) to determine the number of runs. Four runs occurred in this series: the 3 lowest scores were all from group B and thus constituted 1 run of B's; the next highest score is a run of a single A; another run constituted by 1 B follows; and the two highest scores are both from group A and constitute the final run.
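Counting runs is mechanical once the scores are ordered. The following sketch (function names are hypothetical) counts the runs in a series of group signs and, for untied scores, builds that series from two lists of scores.

```python
def count_runs(labels):
    """Number of runs: maximal stretches of identical adjacent labels."""
    runs = 0
    previous = None
    for label in labels:
        if label != previous:
            runs += 1
            previous = label
    return runs


def runs_from_scores(group_a, group_b):
    """Order the pooled scores and count runs of A's and B's.

    Assumes no score in group_a is tied with a score in group_b.
    """
    tagged = [(score, "A") for score in group_a]
    tagged += [(score, "B") for score in group_b]
    tagged.sort(key=lambda pair: pair[0])
    return count_runs([label for _, label in tagged])


print(count_runs("BBBABAA"))  # 4
```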

Now we may reason that if the two samples are from the same population (that is, if H0 is true), then the scores of the A's and the B's will be well mixed. In that case r, the number of runs, will be relatively large. It is when H0 is false that r is small.

For example, r will be small if the two samples were drawn from populations having different medians. Suppose the population from which the A cases were drawn had a higher median than the population from which the B cases were drawn. In the ordered series of scores from the two samples, we would expect a long run of B's at the lower end of the series and a long run of A's at the upper end, and consequently an r which is relatively small.

Again, suppose the samples were drawn from populations which differed in variability. If the population from which the A cases were drawn was highly dispersed, whereas the population from which the B cases were drawn was homogeneous or compact, we would expect a long run of A's at each end of the ordered series and thus a relatively small value of r.

Similar arguments can be presented to show that when the populations from which the n1 and n2 cases were drawn differ in skewness or kurtosis, then the size of r will also be "too small," i.e., small relative to the sizes of n1 and n2.

In general, then, we reject H0 if r = the number of runs is "too small." The sampling distribution of r arises from the fact that when two different kinds of objects (say n1 and n2) are arranged in a single line, the total number of different possible arrangements is

$$\binom{n_1 + n_2}{n_1} = \binom{n_1 + n_2}{n_2}$$

From this it can be shown (Stevens, 1939; Mood, 1950, pp. 392-393) that the probability of getting an observed value of r or an even smaller value is given by formula (6.12a) when r is an even number. When r is an odd number, that probability is given by formula (6.12b), where r = 2k - 1.
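These probabilities can be computed exactly. The sketch below uses the standard combinatorial form of the null distribution of the number of runs, which is an assumption about how formulas (6.12a) and (6.12b) are written, since the formulas themselves are not reproduced here. The function names are hypothetical. For an even r = 2k there are 2·C(n1-1, k-1)·C(n2-1, k-1) arrangements; for an odd r = 2k - 1 there are C(n1-1, k-1)·C(n2-1, k-2) + C(n1-1, k-2)·C(n2-1, k-1).

```python
from fractions import Fraction
from math import comb


def runs_pmf(r, n1, n2):
    """P(exactly r runs) under H0, for n1 >= 1 and n2 >= 1."""
    if r < 2:
        return Fraction(0)
    total = comb(n1 + n2, n1)  # number of distinct arrangements
    if r % 2 == 0:
        k = r // 2
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:
        k = (r + 1) // 2
        ways = (comb(n1 - 1, k - 1) * comb(n2 - 1, k - 2)
                + comb(n1 - 1, k - 2) * comb(n2 - 1, k - 1))
    return Fraction(ways, total)


def runs_cdf(r, n1, n2):
    """P(r or fewer runs) under H0."""
    return sum((runs_pmf(j, n1, n2) for j in range(2, r + 1)), Fraction(0))


# The A/B illustration above: n1 = 3, n2 = 4, observed r = 4.
print(runs_cdf(4, 3, 4))  # 19/35
```

With samples this small, 4 or fewer runs occur in more than half of all arrangements, so the illustration would not lead to rejection.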

Small samples. Tables of critical values of r, based on formulas (6.12a) and (6.12b), have been constructed. Table F_I of the Appendix presents critical values of r for n1, n2 ≤ 20. These values are significant at the .05 level. That is, if an observed value of r is equal to or less than the value tabled for the observed values of n1 and n2, H0 may be rejected at the .05 level of significance. If the observed value of r is larger than that shown in Table F_I, we can only conclude that in terms of the total number of runs observed, the null hypothesis cannot be rejected at α = .05.

Example for Small Samples

Twelve four-year-old boys and twelve four-year-old girls were observed during two 15-minute play sessions, and each child's play during both periods was scored for incidence of and degree of aggression.¹ With these scores, it is possible to test the hypothesis that there are sex differences in the amount of aggression shown.

i. Null Hypothesis. H0: incidence and degree of aggression are the same in four-year-olds of both sexes. H1: four-year-old boys and four-year-old girls display differences in incidence and degree of aggression.

ii. Statistical Test. Since the data are in an ordinal scale, and since the hypothesis concerns differences of any kind between the aggression scores of two independent groups (boys and girls), the Wald-Wolfowitz runs test is chosen.

iii. Significance Level. Let α = .05. n1 = 12 = the number of boys, and n2 = 12 = the number of girls.

iv. Sampling Distribution. From the sampling distribution of r, critical values have been tabled in Table F_I for n1, n2 ≤ 20. (Although n1 = n2 in this example, this is not necessary for the use of the runs test.)

v. Rejection Region. The region of rejection consists of all values of r which (for n1 = 12 and n2 = 12) are so small that the probability associated with their occurrence under H0 is equal to or less than α = .05.

¹ Siegel, Alberta E. 1956. Film-mediated fantasy aggression and strength of aggressive drive. Child Develpm., 27, 365-378.

vi. Decision. Each child's score for his total aggression in both sessions was obtained. These scores are given in Table 6.20.

TABLE 6.20. AGGRESSION SCORES OF BOYS AND GIRLS IN FREE PLAY

Boys    Girls
  86      55
  69      40
  72      22
  65      58
 113      16
  65       7
 118       9
  45      16
 141      26
 104      36
  41      20
  50      15

Now if we combine the scores of the boys (B's) and girls (G's) in a single ordered series, we may determine the number of runs of G's and B's. This ordered series is shown in Table 6.21. Each run is marked, and we observe that r = 4.

Reference to Table F_I reveals that for n1 = 12 and n2 = 12, an r of 7 is significant at the .05 level. Since our value of r is smaller than that tabled, we may reject H0 at α = .05.² We conclude that boys and girls display differences in aggression in the free play situation.

² Using the nonparametric Mann-Whitney U test for these data, the investigator rejected H0 at the p < .0002 level.

TABLE 6.21. DATA IN TABLE 6.20 CAST FOR RUNS TEST

Score   Group   Run
   7      G      1
   9      G      1
  15      G      1
  16      G      1
  16      G      1
  20      G      1
  22      G      1
  26      G      1
  36      G      1
  40      G      1
  41      B      2
  45      B      2
  50      B      2
  55      G      3
  58      G      3
  65      B      4
  65      B      4
  69      B      4
  72      B      4
  86      B      4
 104      B      4
 113      B      4
 118      B      4
 141      B      4

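Large samples. When either n1 or n2 is larger than 20, Table F_I cannot be used. However, for such large samples the sampling distribution under H0 for r is approximately normal, with

$$\text{Mean} = \mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$

and

$$\text{Standard deviation} = \sigma_r = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$$

That is, the expression

$$z = \frac{r - \mu_r}{\sigma_r} = \frac{r - \left(\dfrac{2 n_1 n_2}{n_1 + n_2} + 1\right)}{\sqrt{\dfrac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}} \qquad (6.13)$$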

2.92 has probability of occurrence under H0 of p = .0018. Since this value of p is smaller than α = .01, our decision is to reject H0 in favor of H1.¹ We conclude that the two groups of animals differ significantly in their rate of learning (relearning).
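The large-sample computation of formula (6.13) can be sketched as follows. The function names are hypothetical, the sample sizes and run count in the usage lines are made-up illustrative values rather than data from the text, and the continuity-corrected formula (6.14) is not included. The lower-tail normal probability is obtained from the complementary error function in place of Table A.

```python
from math import erfc, sqrt


def runs_z(r, n1, n2):
    """Formula (6.13): z = (r - mean) / standard deviation of r under H0."""
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return (r - mean) / sqrt(variance)


def lower_tail_p(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * erfc(-z / sqrt(2))


# Hypothetical values for illustration only: n1 = 21, n2 = 25, r = 12.
z = runs_z(12, 21, 25)
print(z, lower_tail_p(z))
```

Too few runs give a negative z, so it is the lower tail that is compared with α.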

Ties. Ideally no ties should occur in the scores used for a runs test, inasmuch as the populations from which the samples were drawn are assumed to be continuous distributions. In practice, however, inaccurate or insensitive measurement results in the occasional occurrence of ties.

When ties occur between members of the different groups, then the sequence of scores is not unique. That is, suppose three subjects obtain tied scores. Two of these are A's and one is a B. In making the ordered series of scores, how should we group these three? If we group them as A B A, then we will have a different number of runs than if we group them as A A B or (alternatively) as B A A.

If all ties are within the same sample, then the number of runs (r) is unaffected and therefore the obtained significance level is unaffected. But if observations from one sample are tied with observations from the other sample, we cannot obtain a unique ordered series and therefore usually cannot obtain a unique value of r, as we have just shown.

This problem occurred in the example just presented. Three rats required 24 trials to learn to the criterion. In Table 6.23 we ordered these cases as C E C. We might just as well have ranked them E C C. As it happens, no matter what order we had used, in this case, r would have been 6 or smaller, and thus our decision would have been to reject H0 in any case. For this reason ties presented no major problem in reaching a statistical decision concerning those data.

¹ Using a parametric test, Ghiselli reached the same decision. He reported a critical ratio of 3.95, which would allow him to reject H0 at α < .00005.

In other sets of data, they might. Our procedure with ties is to break the ties in all possible ways and observe the resulting values of r. If all these values are significant with respect to the previously set value of α, then ties present no major problem, although they do increase the tedium of computation.

If the various possible ways of breaking up ties lead to some values of r which are significant and some which are not, the decision is more difficult. In this case, we suggest that the researcher determine the probability of occurrence associated with each possible value of r and take the average of these p's as his obtained probability for use in deciding whether to accept or reject H0.

If the number of ties between scores in the two different samples is large, r is essentially indeterminate. In such cases, the Wald-Wolfowitz test is inapplicable.
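Breaking ties "in all possible ways" can be done by enumeration when the ties are few. The sketch below is one way to do it, with hypothetical function names and made-up scores: within each block of tied scores it tries every distinct ordering of the group signs and collects the resulting values of r.

```python
from itertools import groupby, permutations, product


def count_runs(labels):
    """Number of runs in a sequence of group signs."""
    runs = 0
    previous = None
    for label in labels:
        if label != previous:
            runs += 1
            previous = label
    return runs


def possible_run_counts(scored):
    """All values of r obtainable by breaking ties in every possible way.

    `scored` is a list of (score, group sign) pairs.
    """
    ordered = sorted(scored, key=lambda pair: pair[0])
    blocks = []
    for _, tied in groupby(ordered, key=lambda pair: pair[0]):
        signs = [sign for _, sign in tied]
        # Distinct orderings of the signs within one block of tied scores.
        blocks.append(sorted(set(permutations(signs))))
    counts = set()
    for choice in product(*blocks):
        series = [sign for block in choice for sign in block]
        counts.add(count_runs(series))
    return sorted(counts)


# Hypothetical scores: three cases tied at 2, two A's and one B.
data = [(1, "A"), (2, "A"), (2, "A"), (2, "B"), (3, "B")]
print(possible_run_counts(data))  # [2, 4]
```

If every value returned is significant, the ties do not affect the decision; otherwise the probabilities for the separate values must be obtained and averaged as described above.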

Summary of procedure. These are the steps in the use of the Wald-Wolfowitz runs test:

1. Arrange the n1 + n2 scores in a single ordered series.
2. Determine r = the number of runs.
3. The method for determining the significance of the observed value of r depends on the size of n1 and n2:
a. If both n1 and n2 are 20 or smaller, Table F_I gives critical values of r at the .05 level of significance. If the observed value of r is equal to or smaller than that tabled for the observed values of n1 and n2, then H0 may be rejected at α = .05.
b. If either n1 or n2 is larger than 20, formula (6.13) or (6.14) may be used to compute the value of z whose associated probability under H0 may be determined by reading the p associated with that z, as given in Table A. Choose formula (6.14) if n1 + n2 is not very large and thus a correction for continuity is desirable. If the p is equal to or less than α, reject H0.
4. If ties occur between scores from the two different samples, follow the procedure suggested above in the discussion of ties.

Power-Efficiency

Little is known about the power-efficiency of the Wald-Wolfowitz test. Moses (1952a) points out that statistical tests which test H0 against many alternatives simultaneously (and the runs test is such a test) are not very good at guarding against accepting H0 erroneously with respect to any one particular alternative.

For instance, if we were interested simply in testing whether two samples come from populations with the same location, the Mann-Whitney U test would be a more powerful test than the runs test because it is specifically designed to disclose differences of this type, whereas the runs test is designed to disclose differences of any type and is thus less powerful in disclosing any particular kind. This difference was illustrated in the example for small samples shown above. The investigator was interested in sex differences in location of aggression scores, and therefore used the U test. We tested the data for differences of any sort, using the runs test. Both tests rejected H0, but the Mann-Whitney U test did so at a much more extreme level of significance.

Mood (1954) points out that when the Wald-Wolfowitz test is used to test H0 against specific alternatives regarding location or variability, it has theoretic asymptotic efficiency of zero. However, Lehmann (1953) discusses whether it is proper to apply the notion of asymptotic normality to the runs test.

Smith (1953) states that empirical evidence indicates that the power-efficiency of the Wald-Wolfowitz test is about 75 per cent for sample sizes near 20.

References

The reader may find discussions of the runs test in Lehmann (1953), Moses (1952a), Smith (1953), Stevens (1939), and Swed and Eisenhart (1943).

THE MOSES TEST OF EXTREME REACTIONS

Function and Rationale

In the behavioral sciences, we sometimes expect that an experimental condition will cause some subjects to show extreme behavior in one direction while it causes others to show extreme behavior in the opposite direction. Thus we may think that economic depression and political instability will cause some people to become extremely reactionary and others to become extremely "left wing" in their political opinions. Or we may expect environmental unrest to create extreme excitement in some mentally ill people while it creates extreme withdrawal in others. In psychological research utilizing the perception-centered approach to personality, there are theoretical reasons to predict that "perceptual defense" may manifest itself in either an extremely rapid "vigilant" perceptual response or an extremely slow "repressive" perceptual response.

The Moses test is specifically designed for use with data (measured in at least an ordinal scale) collected to test such hypotheses. It should be used when it is expected that the experimental condition will affect some subjects in one way and others in the opposite way. In studies of perceptual defense, for example, we expect the control subjects to evince "medium" or "normal" responses, while we expect the experimental subjects to give either "vigilant" or "repressive" responses, thus getting either high or low scores in comparison to those of the controls.

In such studies, statistical tests addressed to differences in central tendency will shield rather than reveal group differences. They lead to acceptance of the null hypothesis when it should be rejected, because when some of the experimental subjects show "vigilant" responses and thus obtain very low latency scores while others show "repressive" responses and thus obtain very high latency scores, the average of the scores of the experimental group may be quite close to the average score of controls (all of whom may have obtained scores which are "medium").

Although the Moses test is specifically designed for the sort of data mentioned above, it is also applicable when the experimenter expects that one group will score low and the other group will score high. However, Moses (1952b) points out that in such cases a test based on medians or on mean ranks, e.g., the Mann-Whitney U test, is more efficient and is therefore to be preferred to the Moses test. The latter test is uniquely valuable when there exist a priori grounds for believing that the experimental condition will lead to extreme scores in either direction.

The Moses test focuses on the span or spread of the control cases. That is, if there are n_C control cases and n_E experimental cases, and the n_E + n_C scores are arranged in order of increasing size, and if the null hypothesis (that the E's and C's come from the same population) is true, then we should expect that the E's and C's will be well mixed in the ordered series. We should expect under H0 that some of the extremely high scores will be E's and some C's, that some of the extremely low scores will be E's and some C's, and that the middle range of scores would include a mixture of E's and C's. However, if the alternative hypothesis (that the E scores represent defensive responses) is true, then we would expect that (a) most of the E scores will be low, i.e., "vigilant," or (b) most of the E scores will be high, i.e., "repressive," or (c) a considerable proportion of the E's will score low and another considerable proportion will score high, i.e., some E responses will be "vigilant" while others are "repressive." In any of these three cases, the scores of the C's will be unduly congested and consequently their span will be relatively small. If situation (a) holds, then the C's will be congested at the high end of the series, if (b) holds the C's will be congested at the low end of the series, and if (c) holds the C's will be congested in the middle of the ordered series. The Moses test determines whether the C scores are so closely compacted or congested relative to the n_E + n_C scores as to call for rejecting the null hypothesis that both E's and C's come from the same population.

Method

To compute the Moses test, combine the scores from the E and C groups, and arrange these scores in a single ordered series, retaining the group identity of each score.

Then determine the span of the C scores by noting the lowest and the highest C scores and counting the number of cases between them, including both extremes. That is, the span, symbolized as s', is defined as the smallest number of consecutive scores in an ordered series necessary to include all the C scores. For ease of computation, we may rank each score and determine s' from the ordered series of the ranks assigned to the n_E + n_C cases.

For example, suppose scores are obtained for n_C = 6 and n_E = 7. When these 13 cases are ranked together, we have this series:

Rank:    1   2   3   4   5   6   7   8   9  10  11  12  13
Group:   E   E   C   E   C   E   C   C   C   E   C   E   E

The span of the C scores in this case extends over 9 ranks (from 3 to 11 inclusive) and thus s' = 9. Notice that in general s' is equal to the difference between the extreme C ranks plus 1. In the present case, s' = 11 - 3 + 1 = 9.

The Moses test determines whether the observed value of s' is too small a value to be thought to have reasonably arisen by chance if the E's and C's are from the same population. That is, the sampling distribution of s' under the null hypothesis is known (Moses, 1952b) and may be used for tests of significance.
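The span s' can be read off the ranked series of group signs. The sketch below uses a hypothetical function name; the series is the 13-case illustration given above, with C at ranks 3, 5, 7, 8, 9, and 11.

```python
def span(groups, control="C"):
    """s': highest control rank minus lowest control rank, plus 1.

    `groups` lists the group sign of each case in rank order (rank 1 first).
    """
    ranks = [rank for rank, sign in enumerate(groups, start=1)
             if sign == control]
    if not ranks:
        raise ValueError("no control cases in the series")
    return ranks[-1] - ranks[0] + 1


series = "EECECECCCECEE"  # ranks 1 through 13
print(span(series))  # 9
```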

The reader will have observed that s' is essentially the range of the C scores, and he may object that the well-known instability of the range makes s' an unreliable index to the actual spread or compactness of the C scores. Moses points out that it is usually necessary to modify s' in order to take care of just this problem. The modification is especially important when n_C is large, because especially in this case is the range (span) of C's an inefficient index to the spread of the group, due to possible sampling fluctuations.

The modification suggested by Moses is that the researcher, in advance of collecting his data, arbitrarily select some small number, h. After the data are collected, he may subtract h control scores from both extremes of the range of control scores. The span is found for those scores which remain. That is, the span is found after h control scores have been dropped from each extreme of the series.

Fprexample, in thedatagivenearlier,theexperimenter mighthave

148

THE CASE OF TWO INDEPENDENT SAMPLES

Decided in advance that A = 1. Thenhe wouldhavedropped ranks3

and11fromtheC scores before determining thespan. In thatcase, the "truncated span," symbolizedas s~, would be el,= 9 5+

1 = 5.

This is given as: e~ 5, A = 1. Thus sais definedas the smallestnum-

berof consecutive ranksnecessary to includeall thecontrolscores except the A leastand the A greatestof them. Notice that s~can never be smallerthan nc

largerthan nc + ng

2A and can never be

2A. Thesamplingdistribution,then,shouldtell

us the probability under Ho of observingan s~which exceedsthe minimum value (nc 2A) by any specified amount.

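If we use $g$ to represent the amount by which an observed value of $s_h$ exceeds $n_C - 2h$, we may determine the probability under $H_0$ of observing a particular value of $s_h$ or less as

$$p(s_h \le n_C - 2h + g) = \frac{\displaystyle\sum_{i=0}^{g} \binom{i + n_C - 2h - 2}{i}\binom{n_E + 2h + 1 - i}{n_E - i}}{\displaystyle\binom{n_C + n_E}{n_C}} \tag{6.15}$$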

Thus for any observed values of $n_C$ and $n_E$ and a given previously set value of $h$, one first finds the minimum possible truncated span: $n_C - 2h$. Then one finds the value of $g$ = the amount that the observed $s_h$ exceeds the value of $(n_C - 2h)$. The probability of the occurrence of the observed value of $s_h$ or less under $H_0$ is found by cumulating the terms in the numerator of formula (6.15). If $g = 1$, then one must sum the numerator terms for $i = 0$ and $i = 1$. If $g = 2$, then one must sum three numerator terms: for $i = 0$, $i = 1$, and $i = 2$. The computations called for by formula (6.15) are illustrated in the following example of the use of the Moses test.
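The cumulation called for by formula (6.15) can be sketched in a few lines of Python. This is a minimal sketch and not part of the original text: the function name `moses_probability` and its argument names are illustrative assumptions, and the only library call is `math.comb` from the standard library.

```python
from math import comb


def moses_probability(n_c: int, n_e: int, h: int, g: int) -> float:
    """Cumulative probability p(s_h <= n_c - 2h + g) under H0, per formula (6.15).

    n_c, n_e : sizes of the control and experimental groups
    h        : number of control scores dropped from each extreme
    g        : amount by which the observed truncated span exceeds n_c - 2h
    """
    if n_c - 2 * h < 2:
        raise ValueError("n_c - 2h must be at least 2")
    if not 0 <= g <= n_e:
        raise ValueError("g must lie between 0 and n_e")
    # Sum the numerator terms for i = 0, 1, ..., g.
    numerator = sum(
        comb(i + n_c - 2 * h - 2, i) * comb(n_e + 2 * h + 1 - i, n_e - i)
        for i in range(g + 1)
    )
    # Divide by the number of ways of choosing which n_c ranks are controls.
    return numerator / comb(n_c + n_e, n_c)


if __name__ == "__main__":
    print(round(moses_probability(9, 9, 1, 2), 3))
```

When $g$ equals $n_E$, the truncated span is at its largest possible value, and the cumulative probability returned is 1, which is a convenient check on the arithmetic.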

Example

In a pilot study of the perception of interpersonal hostility in film dramas, the experimenter¹ compared the amount of hostility perceived by two groups of female subjects. The E group were women whose personality test data revealed that they had difficulty in handling their own aggressive impulses. The C group were women whose personality tests revealed that they had little or no disturbance in the area of aggression and hostility. Each of the 9 E subjects and the 9 C subjects was shown a filmed drama and asked to rate the amount of aggression and hostility shown by the characters in the drama.

¹ This example cites unpublished pilot study data made available to the author through the courtesy of the experimenter, Dr. Ellen Tessman.

The hypothesis was that the E subjects would either underattribute or overattribute hostility to the film characters. Underattribution is indicated by a low score, whereas overattribution is indicated by a high score. It was predicted that the C subjects' scores would be more moderate than those of the E subjects, i.e., that the C's would evince less distortion in their perception of interpersonal hostility.

i. Null Hypothesis. $H_0$: women who have personal difficulty in handling aggressive impulses do not differ from women with relatively little disturbance in this area in the amount of hostility that they attribute to the film characters. $H_1$: women who have personal difficulty in handling aggressive impulses are more extreme than others in their judgments of hostility in film characters: some underattribute and others overattribute.

ii. Statistical Test. Since defensive (extreme) reactions are being predicted, and since the study employs two independent groups, the Moses test is appropriate for an analysis of the research data. In advance of collecting the data, the researcher set $h$ at 1.

iii. Significance Level. Let $\alpha = .05$. $n_E = 9$ and $n_C = 9$.

iv. Sampling Distribution. The probability associated with the occurrence under $H_0$ of any value as small as an observed $s_h$ is given by formula (6.15).

v. Rejection Region. The region of rejection consists of all values of $s_h$ which are so small that the probability associated with their occurrence under $H_0$ is equal to or less than $\alpha = .05$.

vi. Decision. The scores for attribution of aggression by the E and C subjects are given in Table 6.24, which also shows the rank of each. When these ranks are ordered in a single series, we have the data shown in Table 6.25.

TABLE 6.24. ATTRIBUTION OF AGGRESSION TO CHARACTERS IN FILM

* When ties occur between two members of the same group, the value of $s_h$ is unaffected and thus the use of tied ranks is unnecessary. For a discussion of the problem of ties in the Moses test, see the section following this example.

TABLE 6.25. DATA IN TABLE 6.24 CAST FOR MOSES TEST

Rank    1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18
Group   E  C  E  C  E  E  C  C  C  C   C   C   E   E   C   E   E   E

Since $h = 1$, the most extreme rank at each end of the C range is dropped; these are ranks 2 and 15. Without these two ranks, the truncated span of the C scores is 9. That is, $s_h = 9$, $h = 1$.
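The truncated span can be read off mechanically once the group labels are listed in rank order. The sketch below is illustrative only: the function name `truncated_span` is an assumption, ties are ignored, and the label sequence is the ordering of Table 6.25 as far as it can be read, which agrees with the figures quoted in the text (extreme C ranks 2 and 15, truncated span 9).

```python
def truncated_span(groups, h=0, control="C"):
    """Smallest number of consecutive ranks covering all control scores
    except the h lowest and the h highest.

    groups : sequence of group labels ordered by rank (rank 1 first)
    """
    control_ranks = [rank for rank, label in enumerate(groups, start=1)
                     if label == control]
    if len(control_ranks) <= 2 * h:
        raise ValueError("too few control scores for this value of h")
    # Drop h ranks from each end of the ordered control ranks.
    kept = control_ranks[h:len(control_ranks) - h]
    return kept[-1] - kept[0] + 1


# Group labels in rank order 1..18, as in Table 6.25.
table_6_25 = "ECECEECCCCCCEECEEE"

if __name__ == "__main__":
    print(truncated_span(table_6_25, h=0))  # untruncated span s'
    print(truncated_span(table_6_25, h=1))  # truncated span s_h
```

Setting `h=0` returns the untruncated span $s'$, so the same function covers both definitions.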

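Now the minimum possible $s_h$ would be $(n_C - 2h) = 9 - 2 = 7$. Thus the amount by which the observed $s_h$ exceeds the minimum possible is $9 - 7 = 2$. Thus $g = 2$. To determine the probability of occurrence under $H_0$ of $s_h \le 9$ when $n_C = 9$, $n_E = 9$, and $g = 2$, we substitute these values into formula (6.15):

$$p(s_h \le n_C - 2h + g) = \frac{\displaystyle\sum_{i=0}^{g} \binom{i + n_C - 2h - 2}{i}\binom{n_E + 2h + 1 - i}{n_E - i}}{\displaystyle\binom{n_C + n_E}{n_C}} \tag{6.15}$$

$$p(s_h \le 9) = \frac{\displaystyle\sum_{i=0}^{2} \binom{i + 9 - 2 - 2}{i}\binom{9 + 2 + 1 - i}{9 - i}}{\displaystyle\binom{18}{9}}$$

$$= \frac{\binom{5}{0}\binom{12}{9} + \binom{6}{1}\binom{11}{8} + \binom{7}{2}\binom{10}{7}}{\binom{18}{9}}$$

$$= \frac{(1)(220) + (6)(165) + (21)(120)}{48{,}620} = .077$$

Because this probability is larger than $\alpha = .05$, the observed $s_h$ does not fall in the region of rejection, and the data do not permit us to reject $H_0$.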

* For any positive integers, say $a$ and $b$, $\binom{a}{b} = \dfrac{a!}{b!\,(a - b)!}$ if $a \ge b$, and $\binom{a}{b} = 0$ if $a < b$.

THE RANDOMIZATION TEST FOR TWO INDEPENDENT SAMPLES

and $n_2$ observations are from the same population. That is, it is merely a matter of chance that certain scores are labeled A and others are labeled B. The assignment of the labels A and B to the scores in the particular way observed may be conceived as one of many equally likely accidents if $H_0$ is true. Under $H_0$, the labels could have been assigned to the scores in any of 126 equally likely ways:

$$\binom{n_1 + n_2}{n_1} = \binom{4 + 5}{4} = 126$$

Under $H_0$, only once in 126 trials would it happen that the four smallest scores of the nine would all acquire the label A, while the five largest acquired the label B.

¹ This example is taken from Pitman, E. J. G. 1937a. Significance tests which may be applied to samples from any populations. Supplement to J. Royal Statist. Soc., 4, 122.

Now if just such a result should occur in an actual single-trial experiment, we could reject $H_0$ at the $p = \frac{1}{126} = .008$ level of significance, applying the reasoning that if the two groups were really from a common population, i.e., if $H_0$ were really true, there is no good reason to think that the most extreme of 126 possible outcomes should occur on just the trial that constitutes our experiment. That is, we would decide that there is little likelihood that the observed event could occur under $H_0$, and therefore we would reject $H_0$ when the event did occur. This is part of the familiar logic of statistical inference.
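The count of 126 equally likely labelings, and the probability attached to the single most extreme one, can be checked by direct enumeration. The following is a small sketch under the sample sizes stated in the text (four A scores and five B scores); the variable names are illustrative.

```python
from itertools import combinations
from math import comb

n_1, n_2 = 4, 5  # sizes of groups A and B in the example

# Each labeling is determined by which 4 of the 9 score positions are called A.
labelings = list(combinations(range(n_1 + n_2), n_1))
n_outcomes = len(labelings)

# Only one labeling gives A the four smallest scores.
p_most_extreme = 1 / n_outcomes

if __name__ == "__main__":
    print(n_outcomes, round(p_most_extreme, 3))
```

Enumerating positions rather than scores keeps the count valid whatever the nine scores happen to be.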

The randomization test specifies a number of the most extreme possible outcomes which could occur with $n_1 + n_2$ scores, and designates these as the region of rejection. When we have $\binom{n_1 + n_2}{n_1}$ equally likely occurrences under $H_0$, for some of these the difference between $\Sigma A$ (the sum of group A's scores) and $\Sigma B$ (the sum of group B's scores) will be extreme. The cases for which these differences are largest constitute the region of rejection.

If $\alpha$ is the significance level, then the region of rejection consists of the $\alpha\binom{n_1 + n_2}{n_1}$ most extreme of the possible occurrences. That is, the number of possible outcomes constituting the region of rejection is $\alpha\binom{n_1 + n_2}{n_1}$. The particular outcomes chosen to constitute that number are those outcomes for which the difference between the mean of the A's and the mean of the B's is largest. These are the occurrences in which the difference between $\Sigma A$ and $\Sigma B$ is greatest. Now if the sample we obtain is among those cases listed in the region of rejection, we reject $H_0$ at significance level $\alpha$.

In the example of 9 scores given above, there are $\binom{n_1 + n_2}{n_1} = 126$ possible differences between $\Sigma A$ and $\Sigma B$. If $\alpha = .05$, then the region of rejection consists of $\alpha\binom{n_1 + n_2}{n_1} = .05(126) = 6.3$ extreme outcomes. Since the alternative hypothesis is directional, the region of rejection consists of the 6 most extreme possible outcomes in the specified direction.

Under the alternative hypothesis that $\mu_A < \mu_B$, the 6 most extreme possible outcomes constituting the region of rejection of $\alpha = .05$ (one-tailed test) are those given in Table 6.26.

TABLE 6.26. THE SIX MOST EXTREME POSSIBLE OUTCOMES IN THE PREDICTED DIRECTION
(These constitute the region of rejection for the randomization test when $\alpha = .05$)

* The sample obtained.

The third of these possible extreme outcomes, the one with an asterisk, is the sample we obtained. Since our observed scores are in the region of rejection, we may reject $H_0$ at $\alpha = .05$. The exact probability (one-tailed) of the occurrence of the observed scores or a set more extreme under $H_0$ is $p = \frac{3}{126} = .024$.
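The one-tailed reasoning above can be sketched as an exhaustive enumeration. The code below is an illustration, not the author's procedure verbatim: the function name is an assumption, and the nine scores are hypothetical values chosen so that the observed sample is the third most extreme of the 126 outcomes, as in the text. Because $\Sigma A + \Sigma B$ is fixed, ordering the outcomes by $\Sigma A$ is equivalent to ordering them by the difference between the sums.

```python
from itertools import combinations


def randomization_one_tailed(a_scores, b_scores):
    """Count relabelings whose sum for group A is as small as, or smaller
    than, the observed sum (alternative: A tends to score lower than B).

    Returns (number_as_extreme, number_of_outcomes, exact_p).
    """
    pooled = list(a_scores) + list(b_scores)
    observed = sum(a_scores)
    as_extreme = 0
    total = 0
    # Every choice of len(a_scores) positions is one equally likely labeling.
    for idx in combinations(range(len(pooled)), len(a_scores)):
        total += 1
        if sum(pooled[i] for i in idx) <= observed:
            as_extreme += 1
    return as_extreme, total, as_extreme / total


# Hypothetical scores, assumed for illustration only.
group_a = [0, 11, 12, 20]
group_b = [16, 19, 22, 24, 29]

if __name__ == "__main__":
    count, total, p = randomization_one_tailed(group_a, group_b)
    print(count, total, round(p, 3))
```

Full enumeration is practical only for small samples, since the number of labelings grows rapidly with $n_1 + n_2$.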

Now if the alternative hypothesis had not predicted the direction of the difference, then of course a two-tailed test of $H_0$ would have been in order. In that case, the 6 sets of possible outcomes in the region of rejection would consist of the 3 most extreme possible outcomes in one direction and the 3 most extreme possible outcomes in the other direction. It would include the 6 possible outcomes whose difference between $\Sigma A$ and $\Sigma B$ was greatest in absolute value. For illustrative purposes, the 6 most extreme possible outcomes for a two-tailed test at $\alpha = .05$ for the 9 scores presented earlier are given in Table 6.27. With our observed scores $H_0$ would have been rejected in favor of the alternative hypothesis that $\mu_A \ne \mu_B$, because the obtained sample (shown with asterisk in Table 6.27) is one of the 6 most extreme of the possible outcomes in either direction. The exact probability (two-tailed) associated with the occurrence under $H_0$ of a set as extreme as the one observed is $p = \frac{6}{126} = .048$.

TABLE 6.27. THE SIX MOST EXTREME POSSIBLE OUTCOMES IN EITHER DIRECTION
(These constitute the two-tailed region of rejection for the randomization test when $\alpha = .05$)

* The sample obtained.

Large samples. When $n_1$ and $n_2$ are large, the computations necessary for the randomization test may be extremely tedious. However, they may be avoided, for Pitman has shown (1937a) that for $n_1$ and $n_2$ large, if the kurtosis of the combined samples is small and if the ratio of $n_1$ to $n_2$ lies between $\frac{1}{5}$ and 5, that is, if the larger sample is not more than five times larger than the smaller sample, then the randomization distribution of the $\binom{n_1 + n_2}{n_1}$ possible outcomes is closely approximated by the $t$ distribution. That is, if the above-mentioned two conditions (small kurtosis and $\frac{1}{5} \le \frac{n_1}{n_2} \le 5$) are satisfied, then

$$t = \frac{\bar{B} - \bar{A}}{\sqrt{\dfrac{\Sigma(B - \bar{B})^2 + \Sigma(A - \bar{A})^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \tag{6.16}$$

has approximately the Student $t$ distribution with $df = n_1 + n_2 - 2$. Therefore the probability associated with the occurrence under $H_0$ of any value as extreme as an observed $t$ may be determined by reference to Table B of the Appendix.
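For large samples the statistic can be computed directly. The sketch below assumes that formula (6.16) is the ordinary pooled-variance two-sample $t$ with $df = n_1 + n_2 - 2$; the function names and the small data set are illustrative assumptions, and the kurtosis condition is not checked here.

```python
from math import sqrt


def size_ratio_ok(n_1: int, n_2: int) -> bool:
    """True when the larger sample is at most five times the smaller."""
    return max(n_1, n_2) <= 5 * min(n_1, n_2)


def randomization_t(a_scores, b_scores):
    """Return (t, df) for the pooled-variance two-sample t statistic."""
    n_1, n_2 = len(a_scores), len(b_scores)
    mean_a = sum(a_scores) / n_1
    mean_b = sum(b_scores) / n_2
    # Sums of squared deviations about each group's own mean.
    ss_a = sum((x - mean_a) ** 2 for x in a_scores)
    ss_b = sum((x - mean_b) ** 2 for x in b_scores)
    df = n_1 + n_2 - 2
    pooled_variance = (ss_a + ss_b) / df
    t = (mean_b - mean_a) / sqrt(pooled_variance * (1 / n_1 + 1 / n_2))
    return t, df


if __name__ == "__main__":
    # Tiny hypothetical data, used only to show the arithmetic.
    t, df = randomization_t([1, 2, 3], [4, 5, 6])
    print(round(t, 3), df)
```

The resulting $t$ is then referred to a table of the Student $t$ distribution with the returned degrees of freedom.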

The reader should note that even though formula (6.16) is the ordinary $t$ test, the test is not used in this case as a parametric statistical test, for the assumption that the populations are normally distributed with common variance is not necessary. However, its use requires not only that the two conditions mentioned above be met, but also that the scores represent measurement in at least an interval scale.

When $n_1$ and $n_2$ are large, another alternative to the randomization test is the Mann-Whitney U test, which may be regarded as a randomization test applied to the ranks of the observations and which thus constitutes a good approximation to the randomization test. It can be shown (Whitney, 1948) that there are situations under which the Mann-Whitney U test is more powerful than the $t$ test and thus is the better alternative.

* The barred symbols, for example, $\bar{B}$, stand for means.

Summary of procedure. These are the steps in the use of the randomization test for two independent samples:

1. Determine the number of possible outcomes in the region of rejection: $\alpha\binom{n_1 + n_2}{n_1}$.

2. Specify as belonging to the region of rejection that number of the most extreme possible outcomes. The extremes are those which have the largest difference between $\Sigma A$ and $\Sigma B$. For a one-tailed test, all of these are in the predicted direction. For a two-tailed test, half of the number are the most extreme possible outcomes in one direction and half are the most extreme possible outcomes in the other direction.

3. If the observed scores are one of the outcomes listed in the region of rejection, reject $H_0$ at the $\alpha$ level of significance.

For samples which are so large that the enumeration of the possible outcomes in the region of rejection is too tedious, formula (6.16) may be used as an approximation if the conditions for its use are met by the data. An alternative, which need not meet such conditions and thus may be more satisfactory, is the Mann-Whitney U test.

Power-Efficiency

Because it uses all the information in the samples, the randomization test for two independent samples has power-efficiency, in the sense defined, of 100 per cent.

References

The reader may find discussions of the randomization test for two independent samples in Moses (1952a), Pitman (1937a; 1937b; 1937c), Scheffé (1943), Smith (1953), and Welch (1937).

DISCUSSION

In this chapter we have presented eight statistical tests which are useful in testing for the "significance of the difference" between two independent samples. In his choice among these tests, the researcher may be aided by the discussion which follows, in which any unique advantages of the tests are pointed out and the contrasts among them are noted.

All the nonparametric tests for two independent samples test whether it is likely that the two independent samples came from the same population. But the various tests we have presented are more or less sensitive to different kinds of differences between samples. For example, if one wishes to test whether two samples represent populations which differ in location (central tendency), these are the tests which are most sensitive to such a difference and therefore should be chosen: the median test (or the Fisher test when N is small), the Mann-Whitney U test, the Kolmogorov-Smirnov two-sample test (for one-tailed tests), and the randomization test. On the other hand, if the researcher is interested in determining whether his two samples are from populations which differ in any respect at all, i.e., in location or dispersion or skewness, etc., he should choose one of these tests: the $\chi^2$ test, the Kolmogorov-Smirnov test (two-tailed), or the Wald-Wolfowitz runs test. The remaining technique, the Moses test, is uniquely suitable for testing whether an experimental group is exhibiting extremist or defensive reactions in comparison to the reactions exhibited by an independent control group.

The choice among the tests which are sensitive to differences in location is determined by the kind of measurement achieved in the research and by the size of the samples. The most powerful test of location is the randomization test. However, this test can be used only when the sample sizes are small and when we have some confidence in the numerical nature of the measurement obtained. With larger samples or weaker measurement (ordinal measurement), the suggested alternative is the Mann-Whitney U test, which is almost as powerful as the randomization test. If the samples are very small, the Kolmogorov-Smirnov test is slightly more efficient than the U test. If the measurement is such that it is meaningful only to dichotomize the observations as above or below the combined median, then the median test is applicable. This test is not as powerful as the Mann-Whitney U test in guarding against differences in location, but it is more appropriate than the U test when the data are observations which cannot be completely ranked. If the combined sample sizes are very small, when applying the median test the researcher should make the analysis by the Fisher test.

The choice among the tests which are sensitive to all kinds of differences (the second group listed above) is predicated on the strength of the measurement obtained, the size of the two samples, and the relative power of the available tests. The $\chi^2$ test is suitable for data which are in nominal or stronger scales. When the N is small and the data are in a 2 × 2 contingency table, the Fisher test should be used rather than $\chi^2$. In many cases the $\chi^2$ test may not make efficient use of all the information in the data. If the populations of scores are continuously distributed, we may choose either the Kolmogorov-Smirnov (two-tailed) test or the Wald-Wolfowitz runs test in preference to the $\chi^2$ test. Of all tests for any kind of difference, the Kolmogorov-Smirnov test is the most powerful. If it is used with data which do not meet the assumption of continuity, it is still suitable but it operates more conservatively (Goodman, 1954), i.e., the obtained value of $p$ in such cases will be slightly higher than it should be, and thus the probability of a Type II error will be slightly increased. If $H_0$ is rejected with such data, we can surely have confidence in the decision. The runs test also guards against all kinds of differences, but it is not as powerful as the Kolmogorov-Smirnov test.

Two points should be emphasized about the use of the second group of tests. First, if one is interested in testing the alternative hypothesis that the groups differ in central tendency, e.g., that one population has a larger median than the other, then one should use a test specifically designed to guard against differences in location, namely one of the tests in the first group listed above. Second, when one rejects $H_0$ on the basis of a test which guards against any kind of difference (one of the tests in the second group), one can then assert that the two groups are from different populations but one cannot say in what specific way(s) the populations differ.

CHAPTER 7

THE CASE OF k RELATED SAMPLES

In previous chapters we have presented statistical tests for (a) testing for significant differences between a single sample and some specified population, and (b) testing for significant differences between two samples, either related or independent. In this and the following chapters, procedures will be presented for testing for the significance of differences among three or more groups. That is, statistical tests will be presented for testing the null hypothesis that $k$ (3 or more) samples have been drawn from the same population or from identical populations. This chapter will present tests for the case of $k$ related samples; the following chapter will present tests for the case of $k$ independent samples.

Circumstances sometimes require that we design an experiment so that more than two samples or conditions can be studied simultaneously. When three or more samples or conditions are to be compared in an experiment, it is necessary to use a statistical test which will indicate whether there is an over-all difference among the $k$ samples or conditions before one picks out any pair of samples in order to test the significance of the difference between them.

If we wished to use a two-sample statistical test to test for differences among, say, 5 groups, we would need to compute, in order to compare each pair of samples, 10 statistical tests. (Five things taken 2 at a time $= \binom{5}{2} = 10$.) Such a procedure is not only tedious, but it may lead to fallacious conclusions as well because it capitalizes on chance. That is, suppose we wish to use a significance level of, say, $\alpha = .05$. Our hypothesis is that there is a difference among $k = 5$ samples. If we test that hypothesis by comparing each of the 5 samples with every other sample, using a two-sample test (which would require 10 comparisons in all), we are giving ourselves 10 chances rather than 1 chance to reject $H_0$. Now when we set .05 as our level of significance, we are taking the risk of rejecting $H_0$ erroneously (making the Type I error) 5 per cent of the time. But if we make 10 statistical tests of the same hypothesis, we increase the probability of the Type I error. It can be shown that, for 5 samples, the probability that a two-sample statistical test will find one or more "significant" differences, when $\alpha = .05$, is $p = .40$. That is, the actual significance level in such a procedure becomes $\alpha = .40$.

Cases have been reported in the research literature (McNemar, 1955, p. 234) in which an over-all test of five samples yields insignificant results (leads to the acceptance of $H_0$) but two-sample tests of the larger differences among the five samples yield significant findings. Such a posteriori selection tends to capitalize on chance, and therefore we can have no confidence in a decision involving $k$ samples in which the analysis consisted only of testing two samples at a time.
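The figure of .40 can be reproduced with a short calculation. The sketch below treats the 10 pairwise comparisons as if they were independent, which is a simplifying assumption (comparisons that share a sample are not strictly independent); the function name is illustrative.

```python
from math import comb


def familywise_error(k: int, alpha: float = 0.05):
    """Return (number of pairwise tests, probability of at least one
    'significant' result under H0), assuming independent comparisons."""
    n_tests = comb(k, 2)  # k things taken 2 at a time
    # Probability that none of the tests rejects is (1 - alpha) ** n_tests.
    p_at_least_one = 1 - (1 - alpha) ** n_tests
    return n_tests, p_at_least_one


if __name__ == "__main__":
    n_tests, p = familywise_error(5, 0.05)
    print(n_tests, round(p, 2))
```

Under the same assumption, the inflation grows quickly with $k$, since the number of pairs increases roughly as the square of the number of samples.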

It is only when an over-all test (a $k$-sample test) allows us to reject the null hypothesis that we are justified in employing a procedure for testing for significant differences between any two of the $k$ samples. (For such a procedure, see Cochran, 1954; and Tukey, 1949.)

The parametric technique for testing whether several samples have come from identical populations is the analysis of variance or $F$ test. The assumptions associated with the statistical model underlying the $F$ test are these: that the scores or observations are independently drawn from normally distributed populations; that the populations all have the same variance; and that the means in the normally distributed populations are linear combinations of "effects" due to rows and columns, i.e., that the effects are additive. In addition, the $F$ test requires at least interval measurement of the variables involved.

If a researcher finds such assumptions unrealistic for his data, if he finds that his scores do not meet the measurement requirement, or if he wishes to avoid making the assumptions in order to increase the generality of his findings, he may use one of the nonparametric statistical tests presented in this and the following chapter. In addition to avoiding the assumptions and requirements mentioned, these nonparametric $k$-sample tests have the further advantage of enabling data which are inherently only classificatory or in ranks to be examined for significance.

There are two basic designs for comparing $k$ groups. In the first design, the $k$ samples of equal size are matched according to some criterion or criteria which may affect the values of the observations. In some cases, the matching is achieved by comparing the same individuals or cases under all $k$ conditions. Or each of N groups may be measured under all $k$ conditions. For such designs, the statistical tests for $k$ related samples (presented in this chapter) should be used. The second design involves $k$ independent random samples, not necessarily of the same size, one sample from each population. For that design, the statistical tests for $k$ independent samples (presented in Chap. 8) should be employed.

The above distinction is, of course, exactly that made in the parametric case. The first design is known as the two-way analysis of variance, sometimes called "the randomized blocks design." The second design is known as the one-way analysis of variance. The distinction is similar to that we made between the case of two related samples (discussed in Chap. 5) and the case of two independent samples (discussed in Chap. 6).

This chapter will present nonparametric statistical tests which parallel the two-way analysis of variance. We will present a test suitable for use with data measured in a nominal scale and another suitable for use with data measured in at least an ordinal scale. At the conclusion of this chapter we shall compare and contrast these tests for $k$ related samples, offering further guidance to the researcher in his selection of the test suitable for his data.

THE COCHRAN Q TEST

Function

The McNemar test for two related samples, presented in Chap. 5, can be extended for use in research having more than two samples. This extension, the Cochran Q test for $k$ related samples, provides a method for testing whether three or more matched sets of frequencies or proportions differ significantly among themselves. The matching may be based on relevant characteristics of the different subjects, or on the fact that the same subjects are used under different conditions. The Cochran test is particularly suitable when the data are in a nominal scale or are dichotomized ordinal information.

One may imagine a wide variety of research hypotheses for which the data might be analyzed by the Cochran test. For example, one might test whether the various items on a test differ in difficulty by analyzing data consisting of pass-fail information on $k$ items for N individuals. In this design, the $k$ groups are considered "matched" because each person answers all $k$ items.

On the other hand, we might have only one item to be analyzed, and wish to compare the responses of N subjects under $k$ different conditions. Here again the "matching" is achieved by having the same subjects in every group, but now the groups differ in that each is under a different

), and (vi) the decision consists of determining the observed value of the measure of association and then determining the probability under $H_0$ associated with such an extreme value; if and only if that probability is equal to or less than $\alpha$, the decision is to reject $H_0$ in favor of $H_1$.

Because the same sets of data are repeatedly used for illustrative material in the discussions of the various measures of association, in order to highlight the differences and similarities among these measures, the constant repetition of the six steps of statistical inference in the examples would lead to unnecessary redundancy. Therefore we have chosen not to include these six steps in the presentation of the examples in this chapter. We mention here that they might well have been included in order to point out to the reader that the decision-making procedure used in testing the significance of a measure of association is identical to the decision-making procedure used in other sorts of statistical tests.

CORRELATION AND TESTS OF SIGNIFICANCE

We have also seen that the relation between social status strivings and amount of yielding is $r_S = .62$ in our group of 12 subjects. By referring to Table P, we can determine that $r_S \ge .62$ has probability of occurrence under $H_0$ between $p = .05$ and $p = .01$ (one-tailed). Thus we could conclude, at the $\alpha = .05$ level, that these two variables are associated in the population from which the sample was drawn.

Large samples. When N is 10 or larger, the significance of an obtained $r_S$ under the null hypothesis may be tested by [...] 10, the significance of a value as large as the observed value of $r_S$ may be determined by computing the $t$ associated with that value [using formula (9.8)] and then determining the significance of that value of $t$ by referring to Table B.

Power-Efficiency

The efficiency of the Spearman rank correlation when compared with the most powerful parametric correlation, the Pearson $r$, is about 91 per cent (Hotelling and Pabst, 1936). That is, when $r_S$ is used with a sample to test for the existence of an association in the population, and when the assumptions and requirements underlying the proper use of the Pearson $r$ are met, that is, when the population has a bivariate normal distribution and measurement is in the sense of at least an interval scale, then $r_S$ is 91 per cent as efficient as $r$ in rejecting $H_0$. If a correlation between X and Y exists in that population, with 100 cases $r_S$ will reveal that correlation at the same level of significance which $r$ attains with 91 cases.

References

For other discussions of the Spearman rank-order correlation, the reader may turn to Hotelling and Pabst (1936), Kendall (1948a; 1948b, chap. 16), and Olds (1949).

THE KENDALL RANK CORRELATION COEFFICIENT: $\tau$

Function

The Kendall rank correlation coefficient, $\tau$ (tau), is suitable as a measure of correlation with the same sort of data for which $r_S$ is useful. That is, if at least ordinal measurement of both the X and Y variables has been achieved, so that every subject can be assigned a rank on both X and Y, then $\tau$ will give a measure of the degree of association or correlation between the two sets of ranks. The sampling distribution of $\tau$ under the null hypothesis is known, and therefore $\tau$, like $r_S$, is subject to tests of significance.

One advantage of $\tau$ over $r_S$ is that $\tau$ can be generalized to a partial correlation coefficient. This partial coefficient will be presented in the section following this one.

Rationale

Suppose we ask judge X and judge Y to rank four objects. For example, we might ask them to rank four essays in order of quality of expository style. We represent the four papers as a, b, c, and d. The obtained rankings are these:

Essay      a  b  c  d
Judge X    3  4  2  1
Judge Y    3  1  4  2

If we rearrange the order of the essays so that judge X's ranks appear in natural order (i.e., 1, 2, ..., N), we get

Essay      d  c  a  b
Judge X    1  2  3  4
Judge Y    2  4  3  1

We are now in a position to determine the degree of correspondence between the judgments of X and Y. Judge X's rankings being in their natural order, we proceed to determine how many pairs of ranks in judge Y's set are in their correct (natural) order with respect to each other.

Consider first all possible pairs of ranks in which judge Y's rank 2, the rank farthest to the left in his set, is one member. The first pair, 2 and 4, has the correct order: 2 precedes 4. Since the order is "natural," we assign a score of $+1$ to this pair. Ranks 2 and 3 constitute the second pair. This pair is also in the correct order, so it also earns a score of $+1$. Now the third pair consists of ranks 2 and 1. These ranks are not in "natural" order; 2 precedes 1. Therefore we assign this pair a score of $-1$. For all pairs which include the rank 2, we total the scores:

$$(+1) + (+1) + (-1) = +1$$

Now we consider all possible pairs of ranks which include rank 4 (which is the rank second from the left in judge Y's set) and one succeeding rank. One pair is 4 and 3; the two members of the pair are not in the natural order, so the score for that pair is $-1$. Another pair is 4 and 1; again a score of $-1$ is assigned. The total of these scores is

$$(-1) + (-1) = -2$$

When we consider rank 3 and succeeding ranks, we get only this pair: 3 and 1. The two members of this pair are in the wrong order; therefore this pair receives a score of $-1$.

The total of all the scores we have assigned is

$$(+1) + (-2) + (-1) = -2$$
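The pair-by-pair scoring just described can be written out as a short sketch. It is offered as an illustration only: the function name `pair_scores` is an assumption, and the code assumes that judge X's ranks are already in natural order and that there are no tied ranks.

```python
from itertools import combinations


def pair_scores(y_ranks):
    """Score every pair of judge Y's ranks, taken left to right:
    +1 if the pair is in natural order, -1 if it is not."""
    return {(first, second): (1 if first < second else -1)
            for first, second in combinations(y_ranks, 2)}


# Judge Y's ranks when judge X's ranks are in natural order.
scores = pair_scores([2, 4, 3, 1])
total = sum(scores.values())

if __name__ == "__main__":
    for pair, score in scores.items():
        print(pair, score)
    print("total:", total)
```

`itertools.combinations` yields the pairs in the same left-to-right order used in the text, so the printed scores can be compared line by line with the discussion above.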

Now what is the maximum possible total we could have obtained for the scores assigned all the pairs in judge Y's ranking? The maximum possible total would have been yielded if the rankings of judges X and Y had agreed perfectly, for then, when the rankings of judge X were arranged in their natural order, every pair of judge Y's ranks would also be in the correct order and thus every pair would receive a score of $+1$. The maximum possible total then, the one which would occur in the case of perfect agreement between X and Y, would be four things taken two at a time, or $\binom{4}{2} = 6$.

The degree of relation between the two sets of ranks is indicated by the ratio of the actual total of $+1$'s and $-1$'s to the possible maximum total. The Kendall rank correlation coefficient is that ratio:

$$\tau = \frac{\text{actual total}}{\text{maximum possible total}} = \frac{-2}{6} = -.33$$

That is, $\tau = -.33$ is a measure of the agreement between the ranks assigned to the essays by judge X and those assigned by judge Y.

One may think of $\tau$ as a function of the minimum number of inversions or interchanges between neighbors which is required to transform one ranking into another. That is, $\tau$ is a sort of coefficient of disarray.

Method

We have seen that

$$\tau = \frac{\text{actual score}}{\text{maximum possible score}}$$

In general, the maximum possible score will be $\binom{N}{2}$, which can be expressed as $\frac{1}{2}N(N - 1)$. Thus this last expression may be the denominator of the formula for $\tau$. For the numerator, let us denote the observed sum of the $+1$ and $-1$ scores for all pairs as $S$. Then

$$\tau = \frac{S}{\frac{1}{2}N(N - 1)} \tag{9.9}$$

where N = the number of objects or individuals ranked on both X and Y.

The calculation of $S$ may be shortened considerably from the method shown above in the discussion of the logic of the measure.

When the ranks of judge X were in the natural order, the corresponding ranks of judge Y were in this order:

Judge Y:    2  4  3  1

We can determine the value of $S$ by starting with the first number on the left and counting the number of ranks to its right which are larger. We then subtract from this the number of ranks to its right which are smaller. If we do this for all ranks and then sum the results, we obtain $S$. Thus, for the above set of ranks, to the right of rank 2 are ranks 3 and 4 which are larger and rank 1 which is smaller. Rank 2 thus contributes $(+2 - 1) = +1$ to $S$. For rank 4, no ranks to its right are larger but two (ranks 3 and 1) are smaller. Rank 4 thus contributes $(0 - 2) = -2$ to $S$. For rank 3, no rank to its right is larger but one (rank 1) is smaller, so rank 3 contributes $(0 - 1) = -1$ to $S$. These contributions total

$$(+1) + (-2) + (-1) = -2 = S$$

Knowing $S$, we may use formula (9.9) to compute the value of $\tau$ for the ranks assigned by the two judges:

$$\tau = \frac{S}{\frac{1}{2}N(N - 1)} = \frac{-2}{\frac{1}{2}(4)(4 - 1)} = -.33 \tag{9.9}$$
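The shortened method of finding $S$, followed by formula (9.9), can be sketched as follows. The function names are illustrative assumptions, and the sketch applies only when there are no tied ranks and the X ranks have already been placed in natural order.

```python
def s_contributions(y_ranks):
    """For each rank, count the ranks to its right that are larger
    minus the ranks to its right that are smaller."""
    contributions = []
    for position, rank in enumerate(y_ranks):
        to_the_right = y_ranks[position + 1:]
        larger = sum(1 for other in to_the_right if other > rank)
        smaller = sum(1 for other in to_the_right if other < rank)
        contributions.append(larger - smaller)
    return contributions


def kendall_tau(y_ranks):
    """tau = S / (N(N - 1) / 2), formula (9.9), with no ties."""
    n = len(y_ranks)
    s = sum(s_contributions(y_ranks))
    return s / (n * (n - 1) / 2)


if __name__ == "__main__":
    judge_y = [2, 4, 3, 1]
    print(s_contributions(judge_y))
    print(round(kendall_tau(judge_y), 2))
```

The last rank in the list always contributes 0, since no ranks lie to its right; the text simply omits that term.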

Example

We have already computed the Spearman $r_S$ for 12 students' scores on authoritarianism and on social status strivings. The scores of the 12 students are presented in Table 9.3, and the ranks of these scores are presented in Table 9.4. We may compute the value of $\tau$ for the same data.


The two sets of ranks to be correlated (shown in Table 9.4) are these:

Subject                   A   B   C   D   E   F   G   H   I   J   K   L
Status strivings rank     3   4   2   1   8  11  10   6   7  12   5   9
Authoritarianism rank     2   6   5   1  10   9   8   3   4  12   7  11

To compute $\tau$, we shall rearrange the order of the subjects so that the rankings on social status strivings occur in the natural order:

Subject                   D   C   A   B   K   H   I   E   L   G   F   J
Status strivings rank     1   2   3   4   5   6   7   8   9  10  11  12
Authoritarianism rank     1   5   2   6   7   3   4  10  11   8   9  12

Having arranged the ranks on variable X in their natural order, we determine the value of S for the corresponding order of ranks on variable Y:

$$S = (11-0) + (7-3) + (9-0) + (6-2) + (5-2) + (6-0) + (5-0) + (2-2) + (1-2) + (2-0) + (1-0) = 44$$

The authoritarianism rank which is farthest to the left is 1. This rank has 11 ranks which are larger to its right, and 0 ranks which are smaller, so its contribution to S is $(11 - 0)$. The next rank is 5. It has 7 ranks to its right which are larger and 3 to its right which are smaller, so that its contribution to S is $(7 - 3)$. By proceeding in this way, we obtain the various values shown above, which we have summed to yield S = 44. Knowing that S = 44 and N = 12, we may use formula (9.9) to compute $\tau$:

$$\tau = \frac{S}{\frac{1}{2}N(N-1)} = \frac{44}{\frac{1}{2}(12)(12-1)} = .67 \qquad (9.9)$$
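The same value of S can be reached without rearranging the subjects by hand, by scoring every pair of subjects +1 when the two variables order the pair the same way and -1 when they order it oppositely. The sketch below assumes the ranks of the 12 students as listed for this example and no ties; the dictionary layout and the helper `sign` are illustrative choices of ours, not part of the text.

```python
from itertools import combinations

# Ranks of the 12 students on the two variables (no ties in either set).
status = {"A": 3, "B": 4, "C": 2, "D": 1, "E": 8, "F": 11,
          "G": 10, "H": 6, "I": 7, "J": 12, "K": 5, "L": 9}
authoritarianism = {"A": 2, "B": 6, "C": 5, "D": 1, "E": 10, "F": 9,
                    "G": 8, "H": 3, "I": 4, "J": 12, "K": 7, "L": 11}


def sign(value):
    """Return +1, 0, or -1 according to the sign of value."""
    return (value > 0) - (value < 0)


# Each pair scores +1 if X and Y agree on its order and -1 if they disagree.
S = sum(
    sign(status[a] - status[b]) * sign(authoritarianism[a] - authoritarianism[b])
    for a, b in combinations(status, 2)
)
N = len(status)
tau = S / (0.5 * N * (N - 1))
print(S, round(tau, 2))
```

Because the pair scores do not depend on the order in which the subjects are listed, this route and the left-to-right counting route give the same S.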

$\tau = .67$ represents the degree of relation between authoritarianism and social status strivings shown by the 12 students.

Tied observations. When two or more observations on either the X or the Y variable are tied, we turn to our usual procedure in ranking tied scores: the tied observations are given the average of the ranks they would have received if there were no ties.


The effect of ties is to change the denominator of our formula for $\tau$. In the case of ties, $\tau$ becomes

$$\tau = \frac{S}{\sqrt{\frac{1}{2}N(N-1) - T_X}\;\sqrt{\frac{1}{2}N(N-1) - T_Y}} \qquad (9.10)$$

where $T_X = \frac{1}{2}\sum t(t-1)$, t being the number of tied observations in each group of ties on the X variable
$T_Y = \frac{1}{2}\sum t(t-1)$, t being the number of tied observations in each group of ties on the Y variable

The computations required by formula (9.10) are illustrated in the example which follows.

Example with Ties

Again we shall repeat an example which was first presented in the discussion of the Spearman $r_S$. We correlated the scores of 12 subjects on a scale measuring social status strivings with the number of times that each yielded to group pressures in judging the length of lines. The data for this pilot study are presented in Table 9.5. These scores are converted to ranks in Table 9.6.

The two sets of ranks to be correlated (first presented in Table 9.6) are these:

Subject                   A     B     C     D     E     F     G     H     I     J     K     L
Status strivings rank     3     4     2     1     8    11    10     6     7    12     5     9
Yielding rank           1.5   1.5   3.5   3.5     5     6     7     8     9  10.5  10.5    12

As usual, we first rearrange the order of the subjects, so that the ranks on the X variable occur in natural order:

Subject                   D     C     A     B     K     H     I     E     L     G     F     J
Status strivings rank     1     2     3     4     5     6     7     8     9    10    11    12
Yielding rank           3.5   3.5   1.5   1.5  10.5     8     9     5    12     7     6  10.5

Then we compute the value of S in the usual way:

$$S = (8-2) + (8-2) + (8-0) + (8-0) + (1-5) + (3-3) + (2-3) + (4-0) + (0-3) + (1-1) + (1-0) = 25$$

Having determined that S = 25, we now determine the values of $T_X$ and $T_Y$. There are no ties among the scores on social status strivings, i.e., in the X ranks, and thus $T_X = 0$.


On the Y variable (yielding), there are three sets of tied ranks. Two subjects are tied at rank 1.5, two are tied at 3.5, and two are tied at 10.5. In each of these cases, t = 2, the number of tied observations. Thus $T_Y$ may be computed:

$$T_Y = \tfrac{1}{2}\sum t(t-1) = \tfrac{1}{2}\,[2(2-1) + 2(2-1) + 2(2-1)] = 3$$

With $T_X = 0$, $T_Y = 3$, S = 25, and N = 12, we may determine the value of $\tau$ by using formula (9.10):

$$\tau = \frac{S}{\sqrt{\frac{1}{2}N(N-1) - T_X}\;\sqrt{\frac{1}{2}N(N-1) - T_Y}} = \frac{25}{\sqrt{\frac{1}{2}(12)(12-1) - 0}\;\sqrt{\frac{1}{2}(12)(12-1) - 3}} = .39 \qquad (9.10)$$
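A short sketch of formula (9.10) may help to show where the tie terms enter. It assumes that tied pairs contribute zero to S and that tie groups can be found by counting repeated rank values; the helper names `tie_term`, `kendall_s`, and `kendall_tau_ties` are our own.

```python
from collections import Counter
from math import sqrt


def tie_term(ranks):
    """T = (1/2) * sum of t(t - 1) over each group of t tied ranks."""
    return 0.5 * sum(t * (t - 1) for t in Counter(ranks).values())


def kendall_s(x, y):
    """Sum of +1 / -1 pair scores; a pair tied on either variable scores 0."""
    s = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[j] - x[i], y[j] - y[i]
            s += ((dx > 0) - (dx < 0)) * ((dy > 0) - (dy < 0))
    return s


def kendall_tau_ties(x, y):
    """Formula (9.10): S over the product of the two tie-adjusted square roots."""
    pairs = 0.5 * len(x) * (len(x) - 1)
    return kendall_s(x, y) / (sqrt(pairs - tie_term(x)) * sqrt(pairs - tie_term(y)))


# Subjects A through L: status strivings ranks and yielding ranks.
status = [3, 4, 2, 1, 8, 11, 10, 6, 7, 12, 5, 9]
yielding = [1.5, 1.5, 3.5, 3.5, 5, 6, 7, 8, 9, 10.5, 10.5, 12]

S = kendall_s(status, yielding)
corrected = kendall_tau_ties(status, yielding)
uncorrected = S / (0.5 * len(status) * (len(status) - 1))
print(S, round(corrected, 2), round(uncorrected, 2))
```

With no ties in either variable both tie terms are zero and the expression reduces to formula (9.9).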

If we had not corrected the above coefficient for ties, i.e., if we had used formula (9.9) in computing $\tau$, we would have found $\tau = .38$. Observe that the effect of correcting for ties is relatively small.

Comparison of $\tau$ and $r_S$

In two cases we have computed both $\tau$ and $r_S$ for the same data. The reader will have noted that the numerical values of $\tau$ and $r_S$ are not identical when both are computed from the same pair of rankings. For the relation between authoritarianism and social status strivings, $r_S = .82$ whereas $\tau = .67$. For the relation between social status strivings and number of yieldings to group pressures, $r_S = .62$ and $\tau = .39$.

These examples illustrate the fact that $\tau$ and $r_S$ have different underlying scales, and numerically they are not directly comparable to each other. That is, if we measure the degree of correlation between the variables A and B by using $r_S$, and then do the same for A and C by using $\tau$, we cannot then say whether A is more closely related to B or to C, for we shall be using two noncomparable measures of correlation.

However, both coefficients utilize the same amount of information in the data, and thus both have the same power to detect the existence of association in the population. That is, the sampling distributions of $\tau$ and $r_S$ are such that with a given set of data both will reject the null hypothesis (that the variables are unrelated in the population) at the same level of significance. This should become clearer after the following discussion on testing the significance of $\tau$.


Testing the Significance of $\tau$

If a random sample is drawn from some population in which X and Y are unrelated, and the members of the sample are ranked on X and Y, then for any given order of the X ranks all possible orders of the Y ranks are equally likely. That is, for a given order of the X ranks, any one possible order of the Y ranks is just as likely to occur as any other possible order of the Y ranks. Suppose we order the X ranks in natural order, i.e., 1, 2, 3, ..., N. For that order of the X ranks, all the N! possible orders of the Y ranks are equally probable under $H_0$. Therefore any particular order of the Y ranks has probability of occurrence under $H_0$ of 1/N!.

For each of the N! possible rankings of Y, there will be associated a value of $\tau$. These possible values of $\tau$ will range from +1 to -1, and they can be cast in a frequency distribution. For instance, for N = 4 there are 4! = 24 possible arrangements of the Y ranks, and each has an associated value of $\tau$. Their frequency of occurrence under $H_0$ is shown in Table 9.7.

TABLE 9.7. PROBABILITIES OF $\tau$ UNDER $H_0$ FOR N = 4
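The enumeration behind such a table is small enough to carry out directly. The following sketch lists every permutation of the Y ranks for N = 4, computes S for each, and tallies the results; it assumes the X ranks are in natural order, and the function name `kendall_s` is ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def kendall_s(y_ranks):
    """Larger-minus-smaller count to the right of each rank, summed over all ranks."""
    s = 0
    for i, current in enumerate(y_ranks):
        for later in y_ranks[i + 1:]:
            s += (later > current) - (later < current)
    return s


n = 4
counts = Counter(kendall_s(order) for order in permutations(range(1, n + 1)))
total = sum(counts.values())      # N! equally likely orders under H0
max_score = n * (n - 1) // 2      # denominator of formula (9.9)

for s in sorted(counts):
    tau = s / max_score
    print(f"tau = {tau:+.2f}   frequency = {counts[s]:2d}   p = {Fraction(counts[s], total)}")
```

The tally is symmetric about zero, and the same loop works for any small N, although the number of permutations grows as N!.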

We could compute similar tables of probabilities for other values of N, but of course as N increases this method becomes increasingly tedious.

Fortunately, for N ≥ 8, the sampling distribution of $\tau$ is practically indistinguishable from the normal distribution (Kendall, 1948a, pp. 38-39). Therefore, for N large, we may use the normal curve table (Table A) for determining the probability associated with the occurrence under $H_0$ of any value as extreme as an observed value of $\tau$.

However, when N is 10 or less, Table Q of the Appendix may be used to determine the exact probability associated with the occurrence (one-tailed) under $H_0$ of any value as extreme as an observed S. (The sampling distributions of S and $\tau$ are identical, in a probability sense. Inasmuch as $\tau$ is a function of S, either might be tabled. It is more convenient to tabulate S.) For such small samples, the significance of an observed relation between two samples of ranks may be determined by simply finding the value of S and then referring to Table Q to determine the probability (one-tailed) associated with that value. If that $p \le \alpha$, $H_0$ may be rejected. For example, suppose N = 8 and S = 10. Table Q shows that an S ≥ 10 for N = 8 has probability of occurrence under $H_0$ of p = .138.

When N is larger than 10, $\tau$ may be considered to be normally distributed with

$$\text{Mean} = \mu_\tau = 0$$

and

$$\text{Standard deviation} = \sigma_\tau = \sqrt{\frac{2(2N+5)}{9N(N-1)}}$$

That is,

$$z = \frac{\tau - \mu_\tau}{\sigma_\tau} = \frac{\tau}{\sqrt{\dfrac{2(2N+5)}{9N(N-1)}}} \qquad (9.11)$$

is approximately normally distributed with zero mean and unit variance. Thus the probability associated with the occurrence under $H_0$ of any value as extreme as an observed $\tau$ may be determined by computing the value of z as defined by formula (9.11) and then determining the significance of that z by reference to Table A of the Appendix.

Example for N > 10*

We have already determined that among 12 students the correlation between authoritarianism and social status strivings is $\tau = .67$. If we consider these 12 students to be a random sample from some population, we may test whether these two variables are associated in that population by using formula (9.11):

$$z = \frac{\tau}{\sqrt{\dfrac{2(2N+5)}{9N(N-1)}}} = \frac{.67}{\sqrt{\dfrac{2[(2)(12)+5]}{(9)(12)(12-1)}}} = 3.03 \qquad (9.11)$$

By referring to Table A, we see that z ≥ 3.03 has probability of occurrence under $H_0$ of p = .0012. Thus we could reject $H_0$ at level of significance α = .0012, and conclude that the two variables are associated in the population from which this sample was drawn.

* See footnote, page 211.

We have already mentioned that $\tau$ and $r_S$ have identical power to reject $H_0$. That is, even though $\tau$ and $r_S$ are numerically different for the same set of data, their sampling distributions are such that with the same data $H_0$ would be rejected at the same level of significance by the significance tests associated with both measures.
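The arithmetic of formula (9.11) is easily checked by machine. In the sketch below the one-tailed normal probability is obtained from the complementary error function in Python's standard `math` module rather than from Table A; the function names `tau_z` and `upper_tail_p` are illustrative.

```python
from math import erfc, sqrt


def tau_z(tau, n):
    """Formula (9.11): tau divided by its standard deviation under H0."""
    sigma = sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
    return tau / sigma


def upper_tail_p(z):
    """One-tailed probability that a unit normal deviate is at least z."""
    return 0.5 * erfc(z / sqrt(2))


z = tau_z(0.67, 12)
p = upper_tail_p(z)
print(round(z, 2), round(p, 4))
```

The printed values agree with z = 3.03 and p = .0012 as read from the table.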

In the present case, $\tau = .67$. Associated with this value is z = 3.03, which permits us to reject $H_0$ at α = .0012. When the Spearman coefficient was computed from the same data, we found $r_S = .82$. When we apply to that value the significance test for $r_S$ [formula (9.8)], we arrive at t = 4.53 with df = 10. Table B shows that t ≥ 4.53 with df = 10 has probability of occurrence under $H_0$ of slightly higher than .001. Thus $\tau$ and $r_S$ for the same set of data have significance tests which reject $H_0$ at essentially the same level of significance.

Summary of Procedure

These are the steps in the use of the Kendall rank correlation coefficient:

1. Rank the observations on the X variable from 1 to N. Rank the observations on the Y variable from 1 to N.

2. Arrange the list of N subjects so that the X ranks of the subjects are in their natural order, that is, 1, 2, 3, ..., N.

3. Observe the Y ranks in the order in which they occur when the X ranks are in natural order. Determine the value of S for this order of the Y ranks.

4. If there are no ties among either the X or the Y observations, use formula (9.9) in computing the value of $\tau$. If there are ties, use formula (9.10).

5. If the N subjects constitute a random sample from some population, one may test whether the observed value of $\tau$ indicates the existence of an association between the X and Y variables in that population. The method for doing so depends on the size of N:

a. For N ≤ 10, Table Q shows the associated probability (one-tailed) of a value as large as an observed S.

b. For N > 10, one may compute the value of z associated with $\tau$ by using formula (9.11). Table A shows the associated probability of a value as large as an observed z.

If the p yielded by the appropriate method is equal to or less than α, $H_0$ may be rejected in favor of $H_1$.


Power-Efficiency

The Spearman $r_S$ and the Kendall $\tau$ are equally powerful in rejecting $H_0$, inasmuch as they make equivalent use of the information in the data.

When used on data to which the Pearson r is properly applicable, both $\tau$ and $r_S$ have efficiency of 91 per cent. That is, $\tau$ is approximately as sensitive a test of the existence of association between two variables in a bivariate normal population with a sample of 100 cases as is the Pearson r with 91 cases (Hotelling and Pabst, 1936; Moran, 1951).

References

The reader will find other discussions of the Kendall $\tau$ in Kendall (1938; 1945; 1947; 1948a; 1948b; 1949).

THE KENDALL PARTIAL RANK CORRELATION COEFFICIENT: $\tau_{xy.z}$

Function

When correlation is observed between two variables, there is always the possibility that this correlation is due to the association between each of the two variables and a third variable. For example, among a group of school children of diverse ages, one might find a high correlation between size of vocabulary and height. This correlation may not reflect any genuine or direct relation between these two variables, but rather may result from the fact that both vocabulary size and height are associated with a third variable, age.

Statistically, this problem may be attacked by methods of partial correlation. In partial correlation, the effects of variation by a third variable upon the relation between the X and Y variables are eliminated. In other words, the correlation between X and Y is found with the third variable Z kept constant.

In designing an experiment, one has the alternative of either introducing experimental controls in order to eliminate the influence of the third variable or using statistical methods to eliminate its influence.

For example, one may wish to study the relation between memorization ability and ability to solve certain sorts of problems. Both of these skills may be related to intelligence; therefore in order to determine their direct relation to each other the influence of differences in intelligence must be controlled. To effect experimental control, we might choose subjects with equal intelligence. But if experimental controls are not feasible, then statistical controls can be applied. By the technique of partial correlation we could hold constant the effect of intelligence on the relation between memorization ability and ability to solve problems, and thereby determine the extent of the direct or uncontaminated relation between these two skills.

In this section we shall present a method of statistical control which may be used with the Kendall rank correlation $\tau$. To use this nonparametric method of partial correlation, we must have data which are measured in at least an ordinal scale. No assumptions about the shape of the population of scores need be made.

Rationale

Suppose we obtain ranks of 4 subjects on 3 variables: X, Y, and Z. We wish to determine the correlation between X and Y when Z is partialled out (held constant). The ranks are

Subject      a   b   c   d
Rank on Z    1   2   3   4
Rank on X    3   1   2   4
Rank on Y    2   1   3   4

Now if we consider the possible pairs of ranks on any variable, we know that there are $\binom{4}{2} = 6$ possible pairs: four things taken two at a time.

Having arranged the ranks on Z in natural order, let us observe every possible pair in the X ranks, the Y ranks, and the Z ranks. We shall assign a + to each of those pairs in which the lower rank precedes the higher, and a - to each pair in which the higher rank precedes the lower:

Pair   (a,b)  (a,c)  (a,d)  (b,c)  (b,d)  (c,d)
Z        +      +      +      +      +      +
X        -      -      +      +      +      +
Y        -      +      +      +      +      +

That is, for variable X the score for the pair (a,b) is a - because the ranks for a and b, 3 and 1, occur in the "wrong" order: the higher rank precedes the lower. For variable X, the score for the pair (a,c) is also a - because the a rank, 3, is higher than the c rank, 2. For variable Y, the pair (a,c) receives a + because the a rank, 2, is lower than the c rank, 3.

Now we may summarize the information we have obtained by casting it in a 2 × 2 table, Table 9.8. Consider first the three signs under (a,b) above. For that set of paired ranks, both X and Y are assigned a -, whereas Z is assigned a +. Thus we say that both X and Y "disagree" with Z. We summarize that information by casting pair (a,b) in cell D of Table 9.8. Consider next the pair (a,c). Here Y's sign agrees with Z's sign, but X's sign disagrees with Z's sign. Therefore pair (a,c) is assigned to cell C in Table 9.8. In each case of the remaining pairs, both Y's sign and X's sign agree with Z's sign; thus these 4 pairs are cast in cell A of Table 9.8.

TABLE 9.8

                                              Y pairs whose sign      Y pairs whose sign        Total
                                              agrees with Z's sign    disagrees with Z's sign
X pairs whose sign agrees with Z's sign              A = 4                   B = 0                4
X pairs whose sign disagrees with Z's sign           C = 1                   D = 1                2
Total                                                  5                       1                  6

TABLE 9.9. FORM FOR CASTING DATA FOR COMPUTATION BY FORMULA (9.12)

                                              Y pairs whose sign      Y pairs whose sign        Total
                                              agrees with Z's sign    disagrees with Z's sign
X pairs whose sign agrees with Z's sign                A                       B                A + B
X pairs whose sign disagrees with Z's sign             C                       D                C + D
Total                                                A + C                   B + D           $\binom{N}{2}$

In general, for three sets of rankings of N objects, we can use the method illustrated above to derive the sort of table for which Table 9.9 is a model. The Kendall partial rank correlation coefficient, $\tau_{xy.z}$ (read: the correlation between X and Y with Z held constant) is computed from such a table. It is defined as

$$\tau_{xy.z} = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}} \qquad (9.12)$$


In the case of the 4 objects we have been considering, i.e., in the case of the data shown in Table 9.8,

$$\tau_{xy.z} = \frac{(4)(1) - (0)(1)}{\sqrt{(4)(2)(5)(1)}} = .63$$
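The casting of pairs into the four cells and the use of formula (9.12) can be sketched as follows. The sketch assumes untied ranks on all three variables; the function name `partial_tau_from_table` and the use of Boolean comparisons to stand for the + and - signs are our own devices.

```python
from itertools import combinations
from math import sqrt


def partial_tau_from_table(x, y, z):
    """Cast every pair into cells A, B, C, D by agreement with Z, then apply formula (9.12)."""
    a = b = c = d = 0
    for i, j in combinations(range(len(z)), 2):
        z_plus = z[j] > z[i]                 # sign of the pair on Z
        x_agrees = (x[j] > x[i]) == z_plus   # does X's sign match Z's?
        y_agrees = (y[j] > y[i]) == z_plus   # does Y's sign match Z's?
        if x_agrees and y_agrees:
            a += 1
        elif x_agrees:
            b += 1
        elif y_agrees:
            c += 1
        else:
            d += 1
    tau = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a, b, c, d), tau


# Ranks of subjects a, b, c, d on the three variables.
z_ranks = [1, 2, 3, 4]
x_ranks = [3, 1, 2, 4]
y_ranks = [2, 1, 3, 4]

cells, tau_xy_z = partial_tau_from_table(x_ranks, y_ranks, z_ranks)
print(cells, round(tau_xy_z, 2))
```

Note that the formula is undefined when any marginal total is zero, a case this sketch does not guard against.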

The correlation between X and Y with the effect of Z held constant is expressed by $\tau_{xy.z} = .63$. If we had computed the correlation between X and Y without considering the effect of Z, we would have found $\tau_{xy} = .67$. This suggests that the relations between X and Z and between Y and Z are only slightly influencing the observed relation between X and Y. This kind of inference, however, must be made with reservations unless there are relevant prior grounds for expecting whatever effect is observed.

Formula (9.12) is sometimes called the "phi coefficient," and it can be shown that

$$\tau_{xy.z} = \sqrt{\frac{\chi^2}{N}}$$

where $\chi^2$ is computed from the 2 × 2 table and N in this expression stands for the total of the entries in that table. The presence of $\chi^2$ in the expression suggests that $\tau_{xy.z}$ measures the extent to which X and Y agree independently of their agreement with Z.

Method

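Although the method which we have shown for computing $\tau_{xy.z}$ is useful in revealing the nature of the partial correlation, as N gets larger this method rapidly becomes more tedious because of the rapid increase of the value of $\binom{N}{2}$. Kendall (1948a, p. 108) has shown that

$$\tau_{xy.z} = \frac{\tau_{xy} - \tau_{zy}\,\tau_{zx}}{\sqrt{(1 - \tau_{zy}^2)(1 - \tau_{zx}^2)}} \qquad (9.13)*$$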

Formula (9.13) is computationally easier than formula (9.12). To use it, one first must find the correlations ($\tau$'s) between X and Y, X and Z, and Y and Z. Having these values, one may use formula (9.13) to find $\tau_{xy.z}$.

For the X, Y, and Z ranks we have been considering, $\tau_{xy} = .67$, $\tau_{zy} = .67$, and $\tau_{zx} = .33$. Inserting these values in formula (9.13), we have

$$\tau_{xy.z} = \frac{.67 - (.67)(.33)}{\sqrt{[1 - (.67)^2][1 - (.33)^2]}} = .63$$

* This formula is directly comparable to that used in finding the parametric partial product moment correlation. Kendall (1948a, p. 103) states that the similarity seems to be merely coincidental.


Using formula (9.13), we arrive at the same value of $\tau_{xy.z}$ we have already arrived at by using formula (9.12).

Example

We have already seen that in the data collected by Siegel and Fagan, the correlation between scores on authoritarianism and scores on social status strivings is $\tau = .67$. However, we have also observed that there is a correlation between social status strivings and amount of conformity (yielding) to group pressures: $\tau = .39$. This may make us wonder whether the first-mentioned correlation simply represents the operation of a third variable: conformity to group pressures. That is, it may be that the subjects' need to conform affects their responses to both the authoritarianism scale and the social status strivings scale, and thus the correlation between the scores on these two scales may be due to an association between each variable and need to conform. We may check whether this is true by computing a partial correlation between authoritarianism and social status strivings, partialling out the effect of need to conform, as indicated by amount of yielding in the Asch situation.

The scores for the 12 subjects on each of the three variables are shown in Tables 9.3 and 9.5. The three sets of ranks are shown in Table 9.10. Observe that the variable whose effect we wish to partial out, conformity, is the Z variable.

TABLE 9.10. RANKS ON AUTHORITARIANISM, SOCIAL STATUS STRIVINGS, AND CONFORMITY

                                        Rank
Subject   Social status strivings   Authoritarianism   Conformity (yielding)
A                    3                      2                   1.5
B                    4                      6                   1.5
C                    2                      5                   3.5
D                    1                      1                   3.5
E                    8                     10                   5.0
F                   11                      9                   6.0
G                   10                      8                   7.0
H                    6                      3                   8.0
I                    7                      4                   9.0
J                   12                     12                  10.5
K                    5                      7                  10.5
L                    9                     11                  12.0


We have already determined that the correlation between social status strivings (the X variable) and authoritarianism (the Y variable) is $\tau_{xy} = .67$. We have also already determined that the correlation between social status strivings and conformity is $\tau_{zx} = .39$ (this value is corrected for ties). From the data presented in Table 9.10, we may readily determine, using formula (9.10), that the correlation between conformity and authoritarianism is $\tau_{zy} = .36$ (this value is corrected for ties). With that information, we may determine the value of $\tau_{xy.z}$ by using formula (9.13):

$$\tau_{xy.z} = \frac{\tau_{xy} - \tau_{zy}\,\tau_{zx}}{\sqrt{(1 - \tau_{zy}^2)(1 - \tau_{zx}^2)}} = \frac{.67 - (.36)(.39)}{\sqrt{[1 - (.36)^2][1 - (.39)^2]}} = .62 \qquad (9.13)$$
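Formula (9.13) reduces to a one-line computation once the three coefficients are in hand. The sketch below applies it to the rounded values quoted in this example and, as a check, to the exact fractions of the earlier 4-subject illustration; the function name `partial_tau` is an assumption of ours.

```python
from math import sqrt


def partial_tau(tau_xy, tau_zy, tau_zx):
    """Formula (9.13): correlation of X and Y with Z held constant."""
    return (tau_xy - tau_zy * tau_zx) / sqrt((1 - tau_zy ** 2) * (1 - tau_zx ** 2))


# Status strivings (X), authoritarianism (Y), conformity (Z), 12 subjects.
twelve_subjects = partial_tau(0.67, 0.36, 0.39)

# The 4-subject illustration, using the unrounded coefficients 4/6, 4/6, and 2/6.
four_subjects = partial_tau(2 / 3, 2 / 3, 1 / 3)

print(round(twelve_subjects, 2), round(four_subjects, 2))
```

Working from unrounded coefficients, as in the second call, avoids the small discrepancies that arise when two-decimal values are substituted.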

We have determined that when conformity is partialled out, the correlation between social status strivings and authoritarianism is $\tau_{xy.z} = .62$. Since this value is not much smaller than $\tau_{xy} = .67$, we might conclude that the relation between social status strivings and authoritarianism (as measured by these scales) is relatively independent of the influence of conformity (as measured in terms of amount of yielding to group pressures).

Summary of Procedure

These are the steps in the use of the Kendall partial rank correlation coefficient:

1. Let X and Y be the two variables whose relation is to be determined, and let Z be the variable whose effect on X and Y is to be partialled out or held constant.

2. Rank the observations on the X variable from 1 to N. Do the same for the observations on the Y and Z variables.

3. Using either formula (9.9) or formula (9.10) (the latter is to be used when ties have occurred in either of the variables being correlated), determine the observed values of $\tau_{xy}$, $\tau_{zy}$, and $\tau_{zx}$.

4. With those values, compute the value of $\tau_{xy.z}$ using formula (9.13).

Test of Significance

Unfortunately, the sampling distribution of the Kendall partial rank correlation is not as yet known, and therefore no tests of the significance of an observed $\tau_{xy.z}$ are now possible. It might be thought that, since $\tau_{xy.z}$ can be expressed in terms of $\chi^2$, a $\chi^2$ test could be used. This is not so because the entries in cells A, B, C, and D of a table like Table 9.9 are not independent (their sum is $\binom{N}{2}$ rather than N) and a $\chi^2$ test may properly and meaningfully be made only on independent observations.

References

The reader may find other discussions of this statistic in Kendall (1948a, chap. 8) and in Moran (1951).

THE KENDALL COEFFICIENT OF CONCORDANCE: W

Function

In the previous sections of this chapter, we have been concerned with measures of the correlation between two sets of rankings of N objects or individuals. Now we shall consider a measure of the relation among several rankings of N objects or individuals.

When we have k sets of rankings, we may determine the association among them by using the Kendall coefficient of concordance W. Whereas $r_S$ and $\tau$ express the degree of association between two variables measured in, or transformed to, ranks, W expresses the degree of association among k such variables. Such a measure may be particularly useful in studies of interjudge or intertest reliability, and also has applications in studies of clusters of variables.

Rationale

As a solution to the problem of ascertaining the over-all agreement among k sets of rankings, it might seem reasonable to find the $r_S$'s (or $\tau$'s) between all possible pairs of the rankings and then compute the average of these coefficients to determine the over-all association. In following such a procedure, we would need to compute $\binom{k}{2}$ rank correlation coefficients. Unless k were very small, such a procedure would be extremely tedious.

The computation of W is much simpler, and W bears a linear relation to the average $r_S$ taken over all groups. If we denote the average value of the Spearman rank correlation coefficients between the $\binom{k}{2}$ possible pairs of rankings as $r_{S_{av}}$, then it has been shown (Kendall, 1948a, p. 81) that

$$r_{S_{av}} = \frac{kW - 1}{k - 1} \qquad (9.14)$$

Another approach would be to imagine how our data would look if there were no agreement among the several sets of rankings, and then to imagine how it would look if there were perfect agreement among the several sets. The coefficient of concordance would then be an index of the divergence of the actual agreement shown in the data from the maximum possible (perfect) agreement. Very roughly speaking, W is just such a coefficient.

Suppose three company executives are asked to interview six job applicants and to rank them separately in their order of suitability for a job opening. The three independent sets of ranks given by executives X, Y, and Z to applicants a through f might be those shown in Table 9.11.

TABLE 9.11. RANKS ASSIGNED TO SIX JOB APPLICANTS BY THREE COMPANY EXECUTIVES
(Artificial data)

The bottom row of Table 9.11, labeled $R_j$, gives the sums of the ranks assigned to each applicant.

Now if the three executives had been in perfect agreement about the applicants, i.e., if they had each ranked the six applicants in the same order, then one applicant would have received three ranks of 1 and thus his sum of ranks, $R_j$, would be 1 + 1 + 1 = 3 = k. The applicant whom all executives designated as the runner-up would have

$$R_j = 2 + 2 + 2 = 6 = 2k$$

The least promising applicant would have

$$R_j = 6 + 6 + 6 = 18 = Nk$$

In fact, with perfect agreement among the executives, the various sums of ranks, $R_j$, would be these: 3, 6, 9, 12, 15, 18, though not necessarily in that order. In general, when there is perfect agreement among k sets of rankings, we get, for the $R_j$, the series: k, 2k, 3k, ..., Nk.

On the other hand, if there had been no agreement among the three executives, then the various $R_j$'s would be approximately equal.

From this example, it should be clear that the degree of agreement among the k judges is reflected by the degree of variance among the N sums of ranks. W, the coefficient of concordance, is a function of that degree of variance.

Method

To compute W, we first find the sum of ranks, $R_j$, in each column of a k × N table. Then we sum the $R_j$ and divide that sum by N to obtain the mean value of the $R_j$. Each of the $R_j$ may then be expressed as a deviation from the mean value. (We have shown above that the larger are these deviations, the greater is the degree of association among the k sets of ranks.) Finally, s, the sum of squares of these deviations, is found. Knowing these values, we may compute the value of W:

$$W = \frac{s}{\frac{1}{12}k^2(N^3 - N)} \qquad (9.15)$$

where s = sum of squares of the observed deviations from the mean of $R_j$, that is, $s = \sum \left( R_j - \frac{\sum R_j}{N} \right)^2$
k = number of sets of rankings, e.g., the number of judges
N = number of entities (objects or individuals) ranked
$\frac{1}{12}k^2(N^3 - N)$ = maximum possible sum of the squared deviations, i.e., the sum s which would occur with perfect agreement among k rankings

For the data shown in Table 9.11, the rank totals were 8, 14, 11, 11, 11, and 8. The mean of these values is 10.5. To obtain s, we square the deviation of each rank total from that mean value, and then sum those squares:

$$s = (8 - 10.5)^2 + (14 - 10.5)^2 + (11 - 10.5)^2 + (11 - 10.5)^2 + (11 - 10.5)^2 + (8 - 10.5)^2 = 25.5$$

Knowing the observed value of s, we may find the value of W for the data in Table 9.11 by using formula (9.15):

$$W = \frac{25.5}{\frac{1}{12}(3)^2(6^3 - 6)} = .16$$

W = .16 expresses the degree of agreement among the three fictitious executives in ranking the six job applicants.
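The steps of formula (9.15) may be sketched as below. The three rows of ranks are an illustrative assumption, chosen only so that their column totals reproduce the rank totals 8, 14, 11, 11, 11, and 8 quoted above; they should not be read as the entries of Table 9.11. The function name `concordance_w` is likewise ours.

```python
def concordance_w(rankings):
    """Return the rank sums, s, and W of formula (9.15) for a k x N table of untied ranks."""
    k = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(judge[j] for judge in rankings) for j in range(n)]
    mean = sum(rank_sums) / n
    s = sum((r - mean) ** 2 for r in rank_sums)
    max_s = k ** 2 * (n ** 3 - n) / 12   # value of s under perfect agreement
    return rank_sums, s, s / max_s


# Hypothetical ranks for three judges and six applicants (assumed for illustration).
rankings = [
    [1, 6, 3, 2, 5, 4],
    [1, 5, 6, 4, 2, 3],
    [6, 3, 2, 5, 4, 1],
]

rank_sums, s, w = concordance_w(rankings)
print(rank_sums, s, round(w, 2))
```

If every judge gave the same ranking, s would equal its maximum and W would be 1.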

With the same data, we might have found $r_{S_{av}}$ by either of two methods. One way would be first to find the values of $r_{S_{xy}}$, $r_{S_{yz}}$, and $r_{S_{xz}}$. Then these three values could be averaged. For the data in Table 9.11, $r_{S_{xy}} = .31$, $r_{S_{yz}} = -.54$, and $r_{S_{xz}} = -.54$. The average of these values is

$$r_{S_{av}} = \frac{.31 + (-.54) + (-.54)}{3} = -.26$$

Another way to find $r_{S_{av}}$ would be to use formula (9.14):

$$r_{S_{av}} = \frac{kW - 1}{k - 1} = \frac{3(.16) - 1}{3 - 1} = -.26 \qquad (9.14)$$

Both methods yield the same value: $r_{S_{av}} = -.26$. As is shown above, this value bears a linear relation to the value of W.
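The agreement of the two routes can be verified numerically. The sketch below averages the pairwise Spearman coefficients and compares the result with formula (9.14). As before, the three rankings are hypothetical values assumed for illustration (they reproduce the rank totals quoted in the text), and the Spearman coefficient is computed from the usual untied formula $1 - 6\sum d^2/(N^3 - N)$; the function names are ours.

```python
from itertools import combinations


def spearman_rs(a, b):
    """Spearman rank correlation for two untied rankings of the same N objects."""
    n = len(a)
    d_squared = sum((p - q) ** 2 for p, q in zip(a, b))
    return 1 - 6 * d_squared / (n ** 3 - n)


def concordance_w(rankings):
    """Formula (9.15) for a k x N table of untied ranks."""
    k, n = len(rankings), len(rankings[0])
    rank_sums = [sum(judge[j] for judge in rankings) for j in range(n)]
    mean = sum(rank_sums) / n
    s = sum((r - mean) ** 2 for r in rank_sums)
    return s / (k ** 2 * (n ** 3 - n) / 12)


# Hypothetical ranks for judges X, Y, and Z (assumed for illustration).
rankings = [
    [1, 6, 3, 2, 5, 4],
    [1, 5, 6, 4, 2, 3],
    [6, 3, 2, 5, 4, 1],
]

pairwise = [spearman_rs(a, b) for a, b in combinations(rankings, 2)]
average_rs = sum(pairwise) / len(pairwise)

k = len(rankings)
from_w = (k * concordance_w(rankings) - 1) / (k - 1)   # formula (9.14)

print([round(r, 2) for r in pairwise], round(average_rs, 2), round(from_w, 2))
```

The two results coincide apart from floating-point rounding, which illustrates the linear relation between the average coefficient and W.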

One difference between the W and the $r_{S_{av}}$ methods of expressing agreement among k rankings is that $r_{S_{av}}$ may take values between -1 and +1, whereas W may take values only between 0 and +1. The reason that W cannot be negative is that when more than two sets of ranks are involved, the rankings cannot all disagree completely. For example, if judge X and judge Y are in disagreement, and judge X is also in disagreement with judge Z, then judges Y and Z must agree. That is, when more than two judges are involved, agreement and disagreement are not symmetrical opposites. k judges may all agree, but they cannot all disagree completely. Therefore W must be zero or positive.

The reader should notice that W bears a linear relation to $r_S$ but seems to bear no orderly relation to $\tau$. This reveals one of the advantages which $r_S$ has over $\tau$.

Example

Twenty mothers and their deaf preschool children attended a summer camp designed to give introductory training in the treatment and handling of deaf children. A staff of 13 psychologists and speech correctionists worked with the mothers and children during the 2-week camp session. At the end of that period, the 13 staff members were asked to rank the 20 mothers on how likely it was that each mother would rear her child in such a way that the child would suffer personal maladjustment.¹ These rankings are shown in Table 9.12.

¹ This example cites unpublished data from research conducted at the 1955 Camp Easter Seal Speech and Hearing Program, Laurel Hill State Park, Pa. The data were made available to the author through the courtesy of the researcher, Dr. J. E. Gordon.

TABLE 9.12. RANKS ASSIGNED TO 20 MOTHERS BY 13 STAFF MEMBERS

A coefficient of concordance was computed to determine the agreement among the staff members. The mean of the various $R_j$ is 136.5. The deviation of every $R_j$ from that mean, and the square of that deviation, are shown in Table 9.12. The sum of these squares = 64,899 = s. k = 13 = the number of judges. N = 20 = the number of mothers who were ranked. With this information, we may compute W:

$$W = \frac{s}{\frac{1}{12}k^2(N^3 - N)} = \frac{64{,}899}{\frac{1}{12}(13)^2[(20)^3 - 20]} = .577 \qquad (9.15)$$

The agreement among the 13 staff members is expressed by W = .577.

Tied observations. When tied observations occur, the observations are each assigned the average of the ranks they would have been assigned had no ties occurred, our usual procedure in ranking tied scores.

The effect of tied ranks is to depress the value of W as found by formula (9.15). If the proportion of ties is small, that effect is negligible, and thus formula (9.15) may still be used. If the proportion of ties is large, a correction may be introduced which will increase slightly the value of W over what it would have been if uncorrected. That correction factor is the same one used with the Spearman $r_S$:

$$T = \frac{\sum (t^3 - t)}{12}$$

where t = number of observations in a group tied for a given rank
$\sum$ directs one to sum over all groups of ties within any one of the k rankings

With the correction for ties incorporated, the Kendall coefficient of concordance is

$$W = \frac{s}{\frac{1}{12}k^2(N^3 - N) - k\sum_T T} \qquad (9.16)$$

where $\sum_T T$ directs one to sum the values of T for all the k rankings.

Example with Ties

Kendall (1948a, p. 83) has given an example in which 10 objects are each ranked on 3 different variables: X, Y, and Z. The ranks are shown in Table 9.13, which also shows the values of $R_j$.

TABLE 9.13. RANKS RECEIVED BY TEN ENTITIES ON THREE VARIABLES

The mean of the $R_j$ is 16.5. To obtain s, we sum the squared deviations of each $R_j$ from this mean:

$$s = (5.5 - 16.5)^2 + (6.5 - 16.5)^2 + (9 - 16.5)^2 + (13.5 - 16.5)^2 + (12 - 16.5)^2 + (20 - 16.5)^2 + (23 - 16.5)^2 + (23.5 - 16.5)^2 + (25.5 - 16.5)^2 + (26.5 - 16.5)^2 = 591$$


Since the proportion of ties in the ranks is large, we should correct for ties in computing the value of W.

In the X rankings, there are two sets of ties: 2 objects are tied at 4.5 and 2 are tied at 7.5. For both groups, t = the number of observations tied for a given rank = 2. Thus

$$T_X = \frac{\sum (t^3 - t)}{12} = \frac{(2^3 - 2) + (2^3 - 2)}{12} = 1$$

In the Y rankings, there are three sets of ties, and each set contains two observations. Here t = 2 in each case, and

$$T_Y = \frac{\sum (t^3 - t)}{12} = \frac{(2^3 - 2) + (2^3 - 2) + (2^3 - 2)}{12} = 1.5$$

In the Z rankings, there are two sets of ties. One set, tied at 4.5, consists of 4 observations: here t = 4. The other set, tied at rank 8, consists of 3 observations: t = 3. Thus

$$T_Z = \frac{\sum (t^3 - t)}{12} = \frac{(4^3 - 4) + (3^3 - 3)}{12} = 7$$

Knowing the values of T for the X, Y, and Z rankings, we may find their sum:

$$\sum_T T = 1 + 1.5 + 7 = 9.5$$

With the above information, we may compute W corrected for ties:

$$W = \frac{s}{\frac{1}{12}k^2(N^3 - N) - k\sum_T T} \tag{9.16}$$

$$= \frac{591}{\frac{1}{12}(3)^2[(10)^3 - 10] - 3(9.5)} = .828$$
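As a check on the tie correction, the following sketch recomputes T for each ranking and then W by formula (9.16). The helper names are illustrative assumptions; the tie-group sizes and s = 591 are the ones given in the example.

```python
def tie_correction(tie_sizes):
    """T for one ranking: sum of (t^3 - t) over its tie groups, divided by 12."""
    return sum(t ** 3 - t for t in tie_sizes) / 12.0


def kendall_w_corrected(s, k, n, tie_sizes_per_ranking):
    """Kendall W by formula (9.16), with the correction for ties."""
    total_t = sum(tie_correction(sizes) for sizes in tie_sizes_per_ranking)
    return s / (k ** 2 * (n ** 3 - n) / 12.0 - k * total_t)


# Sizes of the tie groups in the X, Y, and Z rankings of the example.
tie_groups = [[2, 2], [2, 2, 2], [4, 3]]

print([tie_correction(g) for g in tie_groups])               # [1.0, 1.5, 7.0]
print(round(kendall_w_corrected(591, 3, 10, tie_groups), 3))  # 0.828
```

A ranking with no ties contributes an empty list, so its T is zero and the formula reduces to (9.15).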

If we had disregarded the ties, i.e., if we had used formula (9.15) in computing W, we would have found W = .796 rather than W = .828. This difference illustrates the slightly depressing effect which ties, when uncorrected, exert on the value of W.

Testing the Significance of W

Small samples. We may test the significance of any observed value of W by determining the probability associated with the occurrence under $H_0$ of a value as large as the s with which it is associated. If we obtain the sampling distribution of s for all permutations in the N ranks in all possible ways in the k rankings, we will have $(N!)^k$ sets of possible ranks.


Using these, we may test the null hypothesis that the k sets of rankings are independent by taking from this distribution the probability associated with the occurrence under $H_0$ of a value as large as an observed s. By this method, the distribution of s under $H_0$ has been worked out and certain critical values have been tabled. Table R of the Appendix gives values of s for W's significant at the .05 and .01 levels. This table is applicable for k from 3 to 20, and for N from 3 to 7. If an observed s is equal to or greater than that shown in Table R for a particular level of significance, then $H_0$ may be rejected at that level of significance.

For example, we saw that when k = 3 fictitious executives ranked N = 6 job applicants, their agreement was W = .16. Reference to Table R reveals that the s associated with that value of W (s = 25.5) is not significant. For the association to have been significant at the .05 level, s would have had to be 103.9 or larger.

Large samples. When N is larger than 7, the expression given in formula (9.17) is approximately distributed as chi square with df = N − 1.

$$\chi^2 = \frac{s}{\frac{1}{12}kN(N + 1)} \tag{9.17}$$

That is, the probability associated with the occurrence under $H_0$ of any value as large as an observed W may be determined by finding $\chi^2$ by formula (9.17) and then determining the probability associated with so large a value of $\chi^2$ by referring to Table C of the Appendix. Observe that

$$\frac{s}{\frac{1}{12}kN(N + 1)} = k(N - 1)W$$

and therefore

$$\chi^2 = k(N - 1)W \tag{9.18}$$

Thus one may use formula (9.18), which is computationally simpler than formula (9.17), with df = N − 1, to determine the probability associated with the occurrence under $H_0$ of any value as large as an observed W.

If the value of $\chi^2$ as computed from formula (9.18) [or, equivalently, from formula (9.17)] equals or exceeds that shown in Table C for a particular level of significance and a particular value of df = N − 1, then the null hypothesis that the k rankings are unrelated may be rejected at that level of significance.

Example¹

In the study of ratings by staff persons of the mother-child relations of 20 mothers with their deaf young children, k = 13, N = 20, and we found that W = .577. We may determine the significance of this relation by applying formula (9.18):

$$\chi^2 = k(N - 1)W \tag{9.18}$$

$$= 13(20 - 1)(.577) = 142.5$$

Referring to Table C, we find that $\chi^2 \geq 142.5$ with df = N − 1 = 20 − 1 = 19 has probability of occurrence under $H_0$ of p < .001. We can conclude with considerable assurance that the agreement among the 13 judges is higher than it would be by chance. The very low probability under $H_0$ associated with the observed value of W enables us to reject the null hypothesis that the judges' ratings are unrelated to each other.

¹ See footnote, page 211.
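The equivalence of formulas (9.17) and (9.18) can be illustrated numerically. This is a sketch only: the function names are assumptions, and the critical value 43.82 is the Table C entry for df = 19 at the .001 level.

```python
def chi_square_from_s(s, k, n):
    """Formula (9.17): chi square computed directly from s."""
    return s / (k * n * (n + 1) / 12.0)


def chi_square_from_w(w, k, n):
    """Formula (9.18): chi square computed from W."""
    return k * (n - 1) * w


k, n, s = 13, 20, 64899
w = s / (k ** 2 * (n ** 3 - n) / 12.0)  # unrounded W from formula (9.15)

# With the unrounded W the two formulas give the same value.
print(abs(chi_square_from_s(s, k, n) - chi_square_from_w(w, k, n)) < 1e-9)  # True

# The text's 142.5 uses W rounded to .577.
print(round(chi_square_from_w(0.577, k, n), 1))  # 142.5

CRITICAL_001_DF19 = 43.82  # Table C, df = 19, .001 level
print(chi_square_from_w(0.577, k, n) >= CRITICAL_001_DF19)  # True: reject H0
```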

Summary of Procedure

These are the steps in the use of W, the Kendall coefficient of concordance:

1. Let N = the number of entities to be ranked, and let k = the number of judges assigning ranks. Cast the observed ranks in a k × N table.

2. For each entity, determine $R_j$, the sum of the ranks assigned to that entity by the k judges.

3. Determine the mean of the $R_j$. Express each $R_j$ as a deviation from that mean. Square these deviations, and sum the squares to obtain s.

4. If the proportion of ties in the k sets of ranks is large, use formula (9.16) in computing the value of W. Otherwise use formula (9.15).

5. The method for determining whether the observed value of W is significantly different from zero depends on the size of N:

a. If N is 7 or smaller, Table R gives critical values of s associated with W's significant at the .05 and .01 levels.

b. If N is larger than 7, either formula (9.17) or formula (9.18) (the latter is easier) may be used to compute a value of $\chi^2$ whose significance, for df = N − 1, may be tested by reference to Table C.
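The steps above can be sketched as one function working from a k × N table of ranks. Everything here is illustrative: the function name, the returned keys, and the small rank table are hypothetical and are not data from the text. The sketch always applies formula (9.16), which gives the same result as (9.15) when there are no ties, and it reports the chi-square of formula (9.18) regardless of N, although that approximation is intended only for N larger than 7.

```python
from collections import Counter


def concordance(rank_table):
    """Kendall W from a k x N table: one row of N ranks per judge.

    Tied entities are assumed to carry average ranks already.
    """
    k = len(rank_table)
    n = len(rank_table[0])
    # Step 2: sum of ranks for each entity.
    rank_sums = [sum(row[j] for row in rank_table) for j in range(n)]
    # Step 3: sum of squared deviations of the rank sums from their mean.
    mean_sum = sum(rank_sums) / n
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Step 4: tie correction; an untied rank (t = 1) contributes zero.
    total_t = sum((t ** 3 - t) / 12.0
                  for row in rank_table
                  for t in Counter(row).values())
    w = s / (k ** 2 * (n ** 3 - n) / 12.0 - k * total_t)
    # Step 5b: large-sample statistic, df = N - 1.
    chi_square = k * (n - 1) * w
    # Entities ordered by rank sum, lowest first (the pooled ordering).
    pooled_order = sorted(range(n), key=lambda j: rank_sums[j])
    return {"R": rank_sums, "s": s, "W": w,
            "chi_square": chi_square, "df": n - 1,
            "pooled_order": pooled_order}


# Hypothetical ranks: 3 judges ranking 4 entities.
ranks = [[1, 2, 3, 4],
         [1, 3, 2, 4],
         [2, 1, 3, 4]]
result = concordance(ranks)
print(result["R"], result["s"], round(result["W"], 3))
```

For N of 7 or smaller, the value of s returned here would be compared with Table R rather than using the chi-square value.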

Interpretation of W

A high or significant value of W may be interpreted as meaning that the observers or judges are applying essentially the same standard in ranking the N objects under study. Often their pooled ordering may serve as a "standard," especially when there is no relevant external criterion for ordering the objects.


It should be emphasized that a high or significant value of W does not mean that the orderings observed are correct. In fact, they may all be incorrect with respect to some external criterion. For example, the 13 staff members of the camp agreed well in judging which mothers and their children were headed for difficulty, but only time can show whether their judgments were sound. It is possible that a variety of judges can agree in ordering objects because all employ the "wrong" criterion. In this case, a high or significant W would simply show that all more or less agree in their use of a "wrong" criterion. To state the point another way, a high degree of agreement about an order does not necessarily mean that the order which was agreed upon is the "objective" one. In the behavioral sciences, especially in psychology, "objective" orderings and "consensual" orderings are often incorrectly thought to be synonymous.

Kendall (1948a, p. 87) suggests that the best estimate of the "true" ranking of the N objects is provided, when W is significant, by the order of the various sums of ranks, $R_j$. If one accepts the criterion which the various judges have agreed upon (as evidenced by the magnitude and significance of W) in ranking the N entities, then the best estimate of the "true" ranking of those entities according to that criterion is provided by the order of the sums of ranks. This "best estimate" is associated, in a certain sense, with least squares. Thus our best estimate would be that either applicant a or f (see Table 9.11) should be hired for the job opening, for in both of their cases $R_j$ = 8, the lowest value observed. And our best estimate would be that, of the 20 mothers of the deaf children, mother 6 (see Table 9.12), whose R = 57 is the smallest of the $R_j$, is the mother who is most likely to rear a well-adjusted child. Mother 2 is the next most likely, and mother 20 is the mother who, by consensus, is the one most likely to rear a maladjusted child.

References

Discussions of the Kendall coefficient of concordance are contained in Friedman (1940), Kendall (1948a, chap. 6), and Willerman (1955).

DISCUSSION

In this chapter we have presented five nonparametric techniques for measuring the degree of correlation between variables in a sample. For each of these, except the Kendall partial correlation coefficient, tests of the significance of the observed association were presented.

One of these techniques, the coefficient of contingency, is uniquely applicable when the data are in a nominal scale. That is, if the measurement is so crude that the classifications involved are unrelated within any set and thus cannot be meaningfully ordered, then the contingency coefficient is a meaningful measure of the degree of association in the data. For other suitable measures, see Kruskal and Goodman (1954).

If the variables under study have been measured in at least an ordinal scale, the contingency coefficient may still be used, but an appropriate method of rank correlation will utilize more of the information in the data and therefore is preferable.

For the bivariate case two rank correlation coefficients, the Spearman $r_S$ and the Kendall $\tau$, were presented. The Spearman $r_S$ is somewhat easier to compute, and has the further advantage of being linearly related to the coefficient of concordance W. However, the Kendall $\tau$ has the advantages of being generalizable to a partial correlation coefficient and of having a sampling distribution which is practically indistinguishable from a normal distribution for sample sizes as small as 9.

Both $r_S$ and $\tau$ have the same power-efficiency (91 per cent) in testing for the existence of a relation in the population. That is, with data which meet the assumptions of the Pearson r, both $r_S$ and $\tau$ are as powerful as r for rejecting the null hypothesis when $r_S$ and $\tau$ are based on 10 observations for every 9 observations used in computing r.

The Kendall partial rank correlation coefficient measures the degree of relation between two variables, X and Y, when a third variable, Z (on which the association between X and Y might logically depend), is held constant. $\tau_{xy.z}$ is the nonparametric equivalent of the partial product moment r. However, no test of the significance of the partial $\tau$ is as yet available.

The Kendall coefficient of concordance W measures the extent of association among several (k) sets of rankings of N entities. It is useful in determining the agreement among several judges or the association among three or more variables. It has special applications in providing a standard method of ordering entities according to consensus when there is available no objective order of the entities.

REFERENCES

Anderson, R. L., and Bancroft, T. A. 1952. Statistical theory in research. New York: McGraw-Hill.
Andrews, F. C. 1954. Asymptotic behavior of some rank tests for analysis of variance. Ann. Math. Statist., 25, 724-736.
Auble, D. 1953. Extended tables for the Mann-Whitney statistic. Bull. Inst. Educ. Res. Indiana Univer., 1, No. 2.
Barnard, G. A. 1947. Significance tests for 2 × 2 tables. Biometrika, 34, 123-138.
Bergman, G., and Spence, K. W. 1944. The logic of psychological measurement. Psychol. Rev., 51, 1-24.
Birnbaum, Z. W. 1952. Numerical tabulation of the distribution of Kolmogorov's statistic for finite sample values. J. Amer. Statist. Ass., 47, 425-441.
Birnbaum, Z. W. 1953. Distribution-free tests of fit for continuous distribution functions. Ann. Math. Statist., 24, 1-8.
Birnbaum, Z. W., and Tingey, F. H. 1951. One-sided confidence contours for probability distribution functions. Ann. Math. Statist., 22, 592-596.
Blackwell, D., and Girshick, M. A. 1954. Theory of games and statistical decisions. New York: Wiley.
Blum, J. R., and Fattu, N. A. 1954. Nonparametric methods. Rev. Educ. Res., 24, 467-487.
Bowker, A. H. 1948. A test for symmetry in contingency tables. J. Amer. Statist. Ass., 43, 572-574.
Brown, G. W., and Mood, A. M. 1951. On median tests for linear hypotheses. Proceedings of the second Berkeley symposium on mathematical statistics and probability. Berkeley, Calif.: Univer. of Calif. Press. Pp. 159-166.
Clopper, C. J., and Pearson, E. S. 1934. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26, 404-413.
Cochran, W. G. 1950. The comparison of percentages in matched samples. Biometrika, 37, 256-266.
Cochran, W. G. 1952. The χ² test of goodness of fit. Ann. Math. Statist., 23, 315-345.
Cochran, W. G. 1954. Some methods for strengthening the common χ² tests. Biometrics, 10, 417-451.
Coombs, C. H. 1950. Psychological scaling without a unit of measurement. Psychol. Rev., 57, 145-158.
Coombs, C. H. 1952. A theory of psychological scaling. Bull. Engng. Res. Inst. Univer. Mich., No. 34.
David, F. N. 1949. Probability theory for statistical methods. New York: Cambridge Univer. Press.
Davidson, D., Siegel, S., and Suppes, P. 1955. Some experiments and related theory on the measurement of utility and subjective probability. Rep. 4, Stanford Value Theory Project.

Dixon, W. J. 1954. Power under normality of several non-parametric tests. Ann. Math. Statist., 25, 610-614.
Dixon, W. J., and Massey, F. J. 1951. Introduction to statistical analysis. New York: McGraw-Hill.
Dixon, W. J., and Mood, A. M. 1946. The statistical sign test. J. Amer. Statist. Ass., 41, 557-566.
Edwards, A. L. 1954. Statistical methods for the behavioral sciences. New York: Rinehart.
Festinger, L. 1946. The significance of differences between means without reference to the frequency distribution function. Psychometrika, 11, 97-105.
Finney, D. J. 1948. The Fisher-Yates test of significance in 2 × 2 contingency tables. Biometrika, 35, 145-156.
Fisher, R. A. 1934. Statistical methods for research workers. (5th Ed.) Edinburgh: Oliver & Boyd.
Fisher, R. A. 1935. The design of experiments. Edinburgh: Oliver & Boyd.
Freund, J. E. 1952. Modern elementary statistics. New York: Prentice-Hall.
Friedman, M. 1937. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Amer. Statist. Ass., 32, 675-701.
Friedman, M. 1940. A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Statist., 11, 86-92.
Goodman, L. A. 1954. Kolmogorov-Smirnov tests for psychological research. Psychol. Bull., 51, 160-168.
Goodman, L. A., and Kruskal, W. H. 1954. Measures of association for cross classifications. J. Amer. Statist. Ass., 49, 732-764.
Hempel, C. G. 1952. Fundamentals of concept formation in empirical science. Int. Encycl. Unif. Sci., 2, No. 7. (Univer. of Chicago Press.)
Hotelling, H., and Pabst, Margaret R. 1936. Rank correlation and tests of significance involving no assumption of normality. Ann. Math. Statist., 7, 29-43.
Jonckheere, A. R. 1954. A distribution-free k-sample test against ordered alternatives. Biometrika, 41, 133-145.
Kendall, M. G. 1938. A new measure of rank correlation. Biometrika, 30, 81-93.
Kendall, M. G. 1945. The treatment of ties in ranking problems. Biometrika, 33, 239-251.
Kendall, M. G. 1947. The variance of τ when both rankings contain ties. Biometrika, 34, 297-298.
Kendall, M. G. 1948a. Rank correlation methods. London: Griffin.
Kendall, M. G. 1948b. The advanced theory of statistics. Vol. 1. (4th Ed.) London: Griffin.
Kendall, M. G. 1949. Rank and product-moment correlation. Biometrika, 36, 177-193.
Kendall, M. G., and Smith, B. B. 1939. The problem of m rankings. Ann. Math. Statist., 10, 275-287.
Kolmogorov, A. 1941. Confidence limits for an unknown distribution function. Ann. Math. Statist., 12, 461-463.
Kruskal, W. H. 1952. A nonparametric test for the several sample problem. Ann. Math. Statist., 23, 525-540.
Kruskal, W. H., and Wallis, W. A. 1952. Use of ranks in one-criterion variance analysis. J. Amer. Statist. Ass., 47, 583-621.
Latscha, R. 1953. Tests of significance in a 2 × 2 contingency table: Extension of Finney's table. Biometrika, 40, 74-86.

Lehmann, E. L. 1953. The power of rank tests. Ann. Math. Statist., 24, 23-43.
Lewis, D., and Burke, C. J. 1949. The use and misuse of the chi-square test. Psychol. Bull., 46, 433-489.
McNemar, Q. 1946. Opinion-attitude methodology. Psychol. Bull., 43, 289-374.
McNemar, Q. 1947. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12, 153-157.
McNemar, Q. 1955. Psychological statistics. (2nd Ed.) New York: Wiley.
Mann, H. B., and Whitney, D. R. 1947. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Statist., 18, 50-60.
Massey, F. J., Jr. 1951a. The Kolmogorov-Smirnov test for goodness of fit. J. Amer. Statist. Ass., 46, 68-78.
Massey, F. J., Jr. 1951b. The distribution of the maximum deviation between two sample cumulative step functions. Ann. Math. Statist., 22, 125-128.
Mood, A. M. 1940. The distribution theory of runs. Ann. Math. Statist., 11, 367-392.
Mood, A. M. 1950. Introduction to the theory of statistics. New York: McGraw-Hill.
Mood, A. M. 1954. On the asymptotic efficiency of certain non-parametric two-sample tests. Ann. Math. Statist., 25, 514-522.
Moore, G. H., and Wallis, W. A. 1943. Time series significance tests based on signs of differences. J. Amer. Statist. Ass., 38, 153-164.
Moran, P. A. P. 1951. Partial and multiple rank correlation. Biometrika, 38, 26-32.
Moses, L. E. 1952a. Non-parametric statistics for psychological research. Psychol. Bull., 49, 122-143.
Moses, L. E. 1952b. A two-sample test. Psychometrika, 17, 239-247.
Mosteller, F. 1948. A k-sample slippage test for an extreme population. Ann. Math. Statist., 19, 58-65.
Mosteller, F., and Bush, R. R. 1954. Selected quantitative techniques. In G. Lindzey (Ed.), Handbook of social psychology. Vol. 1. Theory and method. Cambridge, Mass.: Addison-Wesley. Pp. 289-334.
Mosteller, F., and Tukey, J. W. 1950. Significance levels for a k-sample slippage test. Ann. Math. Statist., 21, 120-123.
Olds, E. G. 1949. The 5% significance levels for sums of squares of rank differences and a correction. Ann. Math. Statist., 20, 117-118.
Pitman, E. J. G. 1937a. Significance tests which may be applied to samples from any populations. Supplement to J. R. Statist. Soc., 4, 119-130.
Pitman, E. J. G. 1937b. Significance tests which may be applied to samples from any populations. II. The correlation coefficient test. Supplement to J. R. Statist. Soc., 4, 225-232.
Pitman, E. J. G. 1937c. Significance tests which may be applied to samples from any populations. III. The analysis of variance test. Biometrika, 29, 322-335.
Savage, I. R. 1953. Bibliography of nonparametric statistics and related topics. J. Amer. Statist. Ass., 48, 844-906.
Savage, L. J. 1954. The foundations of statistics. New York: Wiley.
Scheffé, H. 1943. Statistical inference in the non-parametric case. Ann. Math. Statist., 14, 305-332.
Siegel, S. 1956. A method for obtaining an ordered metric scale. Psychometrika, 21, 207-216.
Smirnov, N. V. 1948. Table for estimating the goodness of fit of empirical distributions. Ann. Math. Statist., 19, 279-281.

Smith, K. 1953. Distribution-free statistical methods and the concept of power efficiency. In L. Festinger and D. Katz (Eds.), Research methods in the behavioral sciences. New York: Dryden. Pp. 536-577.
Snedecor, G. W. 1946. Statistical methods. (4th Ed.) Ames, Iowa: Iowa State College Press.
Stevens, S. S. 1946. On the theory of scales of measurement. Science, 103, 677-680.
Stevens, S. S. 1951. Mathematics, measurement, and psychophysics. In S. S. Stevens (Ed.), Handbook of experimental psychology. New York: Wiley. Pp. 1-49.
Stevens, W. L. 1939. Distribution of groups in a sequence of alternatives. Ann. Eugenics, 9, 10-17.
Swed, Frieda S., and Eisenhart, C. 1943. Tables for testing randomness of grouping in a sequence of alternatives. Ann. Math. Statist., 14, 66-87.
Tocher, K. D. 1950. Extension of the Neyman-Pearson theory of tests to discontinuous variates. Biometrika, 37, 130-144.
Tukey, J. W. 1949. Comparing individual means in the analysis of variance. Biometrics, 5, 99-114.
Wald, A. 1950. Statistical decision functions. New York: Wiley.
Walker, Helen M., and Lev, J. 1953. Statistical inference. New York: Holt.
Walsh, J. E. 1946. On the power function of the sign test for slippage of means. Ann. Math. Statist., 17, 358-362.
Walsh, J. E. 1949a. Some significance tests for the median which are valid under very general conditions. Ann. Math. Statist., 20, 64-81.
Walsh, J. E. 1949b. Applications of some significance tests for the median which are valid under very general conditions. J. Amer. Statist. Ass., 44, 342-356.
Welch, B. L. 1937. On the z-test in randomized blocks and Latin squares. Biometrika, 29, 21-52.
White, C. 1952. The use of ranks in a test of significance for comparing two treatments. Biometrics, 8, 33-41.
Whitney, D. R. 1948. A comparison of the power of non-parametric tests and tests based on the normal distribution under non-normal alternatives. Unpublished doctor's dissertation, Ohio State Univer.
Whitney, D. R. 1951. A bivariate extension of the U statistic. Ann. Math. Statist., 22, 274-282.
Wilcoxon, F. 1945. Individual comparisons by ranking methods. Biometrics Bull., 1, 80-83.
Wilcoxon, F. 1947. Probability tables for individual comparisons by ranking methods. Biometrics, 3, 119-122.
Wilcoxon, F. 1949. Some rapid approximate statistical procedures. Stamford, Conn.: American Cyanamid Co.
Wilks, S. S. 1948. Order statistics. Bull. Amer. Math. Soc., 54, 6-50.
Willerman, B. 1955. The adaptation and use of Kendall's coefficient of concordance (W) to sociometric-type rankings. Psychol. Bull., 52, 132-133.
Yates, F. 1934. Contingency tables involving small numbers and the χ² test. Supplement to J. R. Statist. Soc., 1, 217-236.

LIST OF TABLES

A. Table of Probabilities Associated with Values as Extreme as Observed Values of z in the Normal Distribution
B. Table of Critical Values of t
C. Table of Critical Values of Chi Square
D. Table of Probabilities Associated with Values as Small as Observed Values of x in the Binomial Test
E. Table of Critical Values of D in the Kolmogorov-Smirnov One-sample Test
F. Table of Critical Values of r in the Runs Test
G. Table of Critical Values of T in the Wilcoxon Matched-pairs Signed-ranks Test
H. Table of Critical Values for the Walsh Test
I. Table of Critical Values of D (or C) in the Fisher Test
J. Table of Probabilities Associated with Values as Small as Observed Values of U in the Mann-Whitney Test
K. Table of Critical Values of U in the Mann-Whitney Test
L. Table of Critical Values of K_D in the Kolmogorov-Smirnov Two-sample Test (Small Samples)
M. Table of Critical Values of D in the Kolmogorov

TABLE A. TABLE OF PROBABILITIES ASSOCIATED WITH VALUES AS EXTREME AS OBSERVED VALUES OF z IN THE NORMAL DISTRIBUTION

... z ≥ .11 or z ≤ −.11 is p = .4562.

  z     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
 .0   .5000   .4960   .4920   .4880   .4840   .4801   .4761   .4721   .4681   .4641
 .1   .4602   .4562   .4522   .4483   .4443   .4404   .4364   .4325   .4286   .4247
 .2   .4207   .4168   .4129   .4090   .4052   .4013   .3974   .3936   .3897   .3859
 .3   .3821   .3783   .3745   .3707   .3669   .3632   .3594   .3557   .3520   .3483
 .4   .3446   .3409   .3372   .3336   .3300   .3264   .3228   .3192   .3156   .3121
 .5   .3085   .3050   .3015   .2981   .2946   .2912   .2877   .2843   .2810   .2776
 .6   .2743   .2709   .2676   .2643   .2611   .2578   .2546   .2514   .2483   .2451
 .7   .2420   .2389   .2358   .2327   .2296   .2266   .2236   .2206   .2177   .2148
 .8   .2119   .2090   .2061   .2033   .2005   .1977   .1949   .1922   .1894   .1867
 .9   .1841   .1814   .1788   .1762   .1736   .1711   .1685   .1660   .1635   .1611
1.0   .1587   .1562   .1539   .1515   .1492   .1469   .1446   .1423   .1401   .1379
1.1   .1357   .1335   .1314   .1292   .1271   .1251   .1230   .1210   .1190   .1170
1.2   .1151   .1131   .1112   .1093   .1075   .1056   .1038   .1020   .1003   .0985
1.3   .0968   .0951   .0934   .0918   .0901   .0885   .0869   .0853   .0838   .0823
1.4   .0808   .0793   .0778   .0764   .0749   .0735   .0721   .0708   .0694   .0681
1.5   .0668   .0655   .0643   .0630   .0618   .0606   .0594   .0582   .0571   .0559
1.6   .0548   .0537   .0526   .0516   .0505   .0495   .0485   .0475   .0465   .0455
1.7   .0446   .0436   .0427   .0418   .0409   .0401   .0392   .0384   .0375   .0367
1.8   .0359   .0351   .0344   .0336   .0329   .0322   .0314   .0307   .0301   .0294
1.9   .0287   .0281   .0274   .0268   .0262   .0256   .0250   .0244   .0239   .0233
2.0   .0228   .0222   .0217   .0212   .0207   .0202   .0197   .0192   .0188   .0183
2.1   .0179   .0174   .0170   .0166   .0162   .0158   .0154   .0150   .0146   .0143
2.2   .0139   .0136   .0132   .0129   .0125   .0122   .0119   .0116   .0113   .0110
2.3   .0107   .0104   .0102   .0099   .0096   .0094   .0091   .0089   .0087   .0084
2.4   .0082   .0080   .0078   .0075   .0073   .0071   .0069   .0068   .0066   .0064
2.5   .0062   .0060   .0059   .0057   .0055   .0054   .0052   .0051   .0049   .0048
2.6   .0047   .0045   .0044   .0043   .0041   .0040   .0039   .0038   .0037   .0036
2.7   .0035   .0034   .0033   .0032   .0031   .0030   .0029   .0028   .0027   .0026
2.8   .0026   .0025   .0024   .0023   .0023   .0022   .0021   .0021   .0020   .0019
2.9   .0019   .0018   .0018   .0017   .0016   .0016   .0015   .0015   .0014   .0014
3.0   .0013   .0013   .0013   .0012   .0012   .0011   .0011   .0011   .0010   .0010
3.1   .0010   .0009   .0009   .0009   .0008   .0008   .0008   .0008   .0007   .0007
3.2   .0007
3.3   .0005
3.4   .0003
3.5   .00023
3.6   .00016
3.7   .00011
3.8   .00007
3.9   .00005
4.0   .00003
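Entries of Table A can be spot-checked numerically. The sketch below uses the complementary error function from Python's standard library; the function name `upper_tail` is an illustrative assumption.

```python
import math


def upper_tail(z):
    """One-tailed probability that a standard normal deviate is >= z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


print(round(upper_tail(0.11), 4))  # 0.4562, the example given with Table A
print(round(upper_tail(1.96), 4))  # 0.025
```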

TABLE B. TABLE OF CRITICAL VALUES OF t*

            Level of significance for one-tailed test
 df       .10       .05      .025       .01      .005     .0005
            Level of significance for two-tailed test
          .20       .10       .05       .02       .01      .001
  1     3.078     6.314    12.706    31.821    63.657   636.619
  2     1.886     2.920     4.303     6.965     9.925    31.598
  3     1.638     2.353     3.182     4.541     5.841    12.941
  4     1.533     2.132     2.776     3.747     4.604     8.610
  5     1.476     2.015     2.571     3.365     4.032     6.859
  6     1.440     1.943     2.447     3.143     3.707     5.959
  7     1.415     1.895     2.365     2.998     3.499     5.405
  8     1.397     1.860     2.306     2.896     3.355     5.041
  9     1.383     1.833     2.262     2.821     3.250     4.781
 10     1.372     1.812     2.228     2.764     3.169     4.587
 11     1.363     1.796     2.201     2.718     3.106     4.437
 12     1.356     1.782     2.179     2.681     3.055     4.318
 13     1.350     1.771     2.160     2.650     3.012     4.221
 14     1.345     1.761     2.145     2.624     2.977     4.140
 15     1.341     1.753     2.131     2.602     2.947     4.073
 16     1.337     1.746     2.120     2.583     2.921     4.015
 17     1.333     1.740     2.110     2.567     2.898     3.965
 18     1.330     1.734     2.101     2.552     2.878     3.922
 19     1.328     1.729     2.093     2.539     2.861     3.883
 20     1.325     1.725     2.086     2.528     2.845     3.850
 21     1.323     1.721     2.080     2.518     2.831     3.819
 22     1.321     1.717     2.074     2.508     2.819     3.792
 23     1.319     1.714     2.069     2.500     2.807     3.767
 24     1.318     1.711     2.064     2.492     2.797     3.745
 25     1.316     1.708     2.060     2.485     2.787     3.725
 26     1.315     1.706     2.056     2.479     2.779     3.707
 27     1.314     1.703     2.052     2.473     2.771     3.690
 28     1.313     1.701     2.048     2.467     2.763     3.674
 29     1.311     1.699     2.045     2.462     2.756     3.659
 30     1.310     1.697     2.042     2.457     2.750     3.646
 40     1.303     1.684     2.021     2.423     2.704     3.551
 60     1.296     1.671     2.000     2.390     2.660     3.460
120     1.289     1.658     1.980     2.358     2.617     3.373
  ∞     1.282     1.645     1.960     2.326     2.576     3.291

* Table B is abridged from Table III of Fisher and Yates: Statistical tables for biological, agricultural, and medical research, published by Oliver and Boyd Ltd., Edinburgh, by permission of the authors and publishers.

TABLE C. TABLE OF CRITICAL VALUES OF CHI SQUARE*

Probability under H0 that χ² ≥ chi square

df    .99    .98    .95    .90    .80    .70    .50    .30    .20    .10    .05    .02    .01   .001
 1 .00016 .00063  .0039   .016   .064    .15    .46   1.07   1.64   2.71   3.84   5.41   6.64  10.83
 2    .02    .04    .10    .21    .45    .71   1.39   2.41   3.22   4.60   5.99   7.82   9.21  13.82
 3    .12    .18    .35    .58   1.00   1.42   2.37   3.66   4.64   6.25   7.82   9.84  11.34  16.27
 4    .30    .43    .71   1.06   1.65   2.20   3.36   4.88   5.99   7.78   9.49  11.67  13.28  18.46
 5    .55    .75   1.14   1.61   2.34   3.00   4.35   6.06   7.29   9.24  11.07  13.39  15.09  20.52
 6    .87   1.13   1.64   2.20   3.07   3.83   5.35   7.23   8.56  10.64  12.59  15.03  16.81  22.46
 7   1.24   1.56   2.17   2.83   3.82   4.67   6.35   8.38   9.80  12.02  14.07  16.62  18.48  24.32
 8   1.65   2.03   2.73   3.49   4.59   5.53   7.34   9.52  11.03  13.36  15.51  18.17  20.09  26.12
 9   2.09   2.53   3.32   4.17   5.38   6.39   8.34  10.66  12.24  14.68  16.92  19.68  21.67  27.88
10   2.56   3.06   3.94   4.86   6.18   7.27   9.34  11.78  13.44  15.99  18.31  21.16  23.21  29.59
11   3.05   3.61   4.58   5.58   6.99   8.15  10.34  12.90  14.63  17.28  19.68  22.62  24.72  31.26
12   3.57   4.18   5.23   6.30   7.81   9.03  11.34  14.01  15.81  18.55  21.03  24.05  26.22  32.91
13   4.11   4.76   5.89   7.04   8.63   9.93  12.34  15.12  16.98  19.81  22.36  25.47  27.69  34.53
14   4.66   5.37   6.57   7.79   9.47  10.82  13.34  16.22  18.15  21.06  23.68  26.87  29.14  36.12
15   5.23   5.98   7.26   8.55  10.31  11.72  14.34  17.32  19.31  22.31  25.00  28.26  30.58  37.70
16   5.81   6.61   7.96   9.31  11.15  12.62  15.34  18.42  20.46  23.54  26.30  29.63  32.00  39.29
17   6.41   7.26   8.67  10.08  12.00  13.53  16.34  19.51  21.62  24.77  27.59  31.00  33.41  40.75
18   7.02   7.91   9.39  10.86  12.86  14.44  17.34  20.60  22.76  25.99  28.87  32.35  34.80  42.31
19   7.63   8.57  10.12  11.65  13.72  15.35  18.34  21.69  23.90  27.20  30.14  33.69  36.19  43.82
20   8.26   9.24  10.85  12.44  14.58  16.27  19.34  22.78  25.04  28.41  31.41  35.02  37.57  45.32
21   8.90   9.92  11.59  13.24  15.44  17.18  20.34  23.86  26.17  29.62  32.67  36.34  38.93  46.80
22   9.54  10.60  12.34  14.04  16.31  18.10  21.34  24.94  27.30  30.81  33.92  37.66  40.29  48.27
23  10.20  11.29  13.09  14.85  17.19  19.02  22.34  26.02  28.43  32.01  35.17  38.97  41.64  49.73
24  10.86  11.99  13.85  15.66  18.06  19.94  23.34  27.10  29.55  33.20  36.42  40.27  42.98  51.18
25  11.52  12.70  14.61  16.47  18.94  20.87  24.34  28.17  30.68  34.38  37.65  41.57  44.31  52.62
26  12.20  13.41  15.38  17.29  19.82  21.79  25.34  29.25  31.80  35.56  38.88  42.86  45.64  54.05
27  12.88  14.12  16.15  18.11  20.70  22.72  26.34  30.32  32.91  36.74  40.11  44.14  46.96  55.48
28  13.56  14.85  16.93  18.94  21.59  23.65  27.34  31.39  34.03  37.92  41.34  45.42  48.28  56.89
29  14.26  15.57  17.71  19.77  22.48  24.58  28.34  32.46  35.14  39.09  42.56  46.69  49.59  58.30
30  14.95  16.31  18.49  20.60  23.36  25.51  29.34  33.53  36.25  40.26  43.77  47.96  50.89  59.70

* Table C is abridged from Table IV of Fisher and Yates: Statistical tables for biological, agricultural, and medical research, published by Oliver and Boyd Ltd., Edinburgh, by permission of the authors and publishers.

TABLE D. TABLE OF PROBABILITIES ASSOCIATED WITH VALUES AS SMALL AS OBSERVED VALUES OF x IN THE BINOMIAL TEST*

Given in the body of this table are one-tailed probabilities under H0 for the binomial test when P = Q = 1/2. To save space, decimal points are omitted in the p's. Rows give N; columns give x.

 N    0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 5  031 188 500 812 969   †
 6  016 109 344 656 891 984   †
 7  008 062 227 500 773 938 992   †
 8  004 035 145 363 637 855 965 996   †
 9  002 020 090 254 500 746 910 980 998   †
10  001 011 055 172 377 623 828 945 989 999   †
11      006 033 113 274 500 726 887 967 994   †   †
12      003 019 073 194 387 613 806 927 981 997   †   †
13      002 011 046 133 291 500 709 867 954 989 998   †   †
14      001 006 029 090 212 395 605 788 910 971 994 999   †   †
15          004 018 059 151 304 500 696 849 941 982 996   †   †   †
16          002 011 038 105 227 402 598 773 895 962 989 998   †   †
17          001 006 025 072 166 315 500 685 834 928 975 994 999   †
18          001 004 015 048 119 240 407 593 760 881 952 985 996 999
19              002 010 032 084 180 324 500 676 820 916 968 990 998
20              001 006 021 058 132 252 412 588 748 868 942 979 994
21              001 004 013 039 095 192 332 500 668 808 905 961 987
22                  002 008 026 067 143 262 416 584 738 857 933 974
23                  001 005 017 047 105 202 339 500 661 798 895 953
24                  001 003 011 032 076 154 271 419 581 729 846 924
25                      002 007 022 054 115 212 345 500 655 788 885

* Adapted from Table IV, B, of Walker, Helen, and Lev, J. 1953. Statistical inference. New York: Holt, p. 458, with the kind permission of the authors and publisher.

† 1.0 or approximately 1.0.

hPPENDIX

251

TABLE E. TABLE OF CRITICAL VALUES OF D IN THE KOLMOGOROV-SMIRNOV ONE-SAMPLE TEST*

Sample size     Level of significance for D = maximum |F0(X) - SN(X)|
    (N)          .20        .15        .10        .05        .01

     1          .900       .925       .950       .975       .995
     2          .684       .726       .776       .842       .929
     3          .565       .597       .642       .708       .828
     4          .494       .525       .564       .624       .733
     5          .446       .474       .510       .565       .669
     6          .410       .436       .470       .521       .618
     7          .381       .405       .438       .486       .577
     8          .358       .381       .411       .457       .543
     9          .339       .360       .388       .432       .514
    10          .322       .342       .368       .410       .490
    11          .307       .326       .352       .391       .468
    12          .295       .313       .338       .375       .450
    13          .284       .302       .325       .361       .433
    14          .274       .292       .314       .349       .418
    15          .266       .283       .304       .338       .404
    16          .258       .274       .295       .328       .392
    17          .250       .266       .286       .318       .381
    18          .244       .259       .278       .309       .371
    19          .237       .252       .272       .301       .363
    20          .231       .246       .264       .294       .356
    25          .21        .22        .24        .27        .32
    30          .19        .20        .22        .24        .29
    35          .18        .19        .21        .23        .27
  Over 35     1.07/√N    1.14/√N    1.22/√N    1.36/√N    1.63/√N

* Adapted from Massey, F. J., Jr. 1951. The Kolmogorov-Smirnov test for goodness of fit. J. Amer. Statist. Ass., 46, 70, with the kind permission of the author and publisher.
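As a rough illustration of how Table E is used, the Python sketch below computes D = maximum |F0(X) - SN(X)| for a sample and the large-sample critical value for N over 35. The function names are hypothetical and not part of the original text. The coefficients are the usual Massey values for the "Over 35" row; treat the .05 entry (1.36) as an assumption if it cannot be read in the table above.

```python
import math

# Large-sample coefficients c such that the critical D is c / sqrt(N).
# Assumed to be the standard Massey (1951) values for the "Over 35" row.
LARGE_SAMPLE_COEFF = {0.20: 1.07, 0.15: 1.14, 0.10: 1.22, 0.05: 1.36, 0.01: 1.63}


def ks_one_sample_d(sample, cdf):
    """Return D, the largest gap between the theoretical cumulative
    distribution F0 (given as `cdf`) and the sample step function SN."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # SN jumps from (i - 1)/n to i/n at x, so check both sides of the step.
        d = max(d, abs(i / n - f0), abs(f0 - (i - 1) / n))
    return d


def large_sample_critical_d(n, alpha):
    """Critical D for N over 35 at one of the tabled significance levels."""
    return LARGE_SAMPLE_COEFF[alpha] / math.sqrt(n)


if __name__ == "__main__":
    # Four scores tested against a uniform distribution on [0, 1].
    print(ks_one_sample_d([0.1, 0.2, 0.3, 0.4], lambda x: x))
    print(large_sample_critical_d(100, 0.01))
```

For N of 35 or fewer the tabled values should be used directly; the formula is only the large-sample approximation.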

TABLE F. TABLE OF CRITICAL VALUES OF r IN THE RUNS TEST*

Given in the bodies of Table FI and Table FII are various critical values of r for various values of n1 and n2. For the one-sample runs test, any value of r which is equal to or smaller than that shown in Table FI or equal to or larger than that shown in Table FII is significant at the .05 level. For the Wald-Wolfowitz two-sample runs test, any value of r which is equal to or smaller than that shown in Table FI is significant at the .05 level.

Table FI

23

45

2 43 56 7

67

89

10 11

12

13

22

22

14

15

16

22

17

18

19

20

22

22

2

22

22

22

22

22

3

33

33

3

23

33

33

33

3

44

44

4

33

33

44

44

4

44

44 55 56

55 55 66

5 6 6

55 67

55 56 B6 77

5 6 6 7

7

77

88

8

7

88

88

9

8

89

99

9

3

22

33

33

44

22

33

34

45

8

23

33

44

55

9

23

34

45

55

66

67

10

23

34

55

56

B7

7

11

23

44

55

66

77

78

7

6B

12

22

34

4

56

67

77

88

8

99

13

22

34

5

5B

67

?8

89

9

9 10

14

22

34

5

5B

77

88 88

99 99

9 10

10

89

9 10

10 11

ll

9 10

10

10

10

10

10

10

11

11

10

11

ll

11

12

11

11

11

12

12

11

12

12

13

15

23

34

5

66

77

16

23

44

5

17

23

44

5

10

23

45

5

7 8 78 88

99

18

66 67 67

99

10

10

11

ll

12

12

13

13

19

23

45

6

67

88

9 10

10

11

11

12

12

13

13

13

20

23

45

6

B7

89

9 10

10

11

12

12

13

13

13

14

10

* Adapted from Swed, Frieda S., and Eisenhart, C. 1943. Tables for testing randomness of grouping in a sequence of alternatives. Ann. Math. Statist., 14, 83-86, with the kind permission of the authors and publisher.

TABLE F. TABLE OF CRITICAL VALUES OF r IN THE RUNS TEST* (Continued)

Table FII

23

45

67

89

10 11 12

13

14

15

16

17

18

19

20

17

17

17

17

18

2 3 99

4

78 6 5 9 9 10

10 11 11

9 10

11 12 12 13 13 13 13

11 12 13 13 14 14 14

14

15

15

15

11 1'2 13 14 14 15 15

16

16

16

16

17

18

13 14 14 15 16 16

1B

17

17

18

18

18

18

10

13 14 15 16 16 17

17

18

18

18

19

19

19

11

13 14 15 16 17 17

18

19

19

19

20

20

21

21

12

13 14 16 16 17 18

19

19

20

20

21

21

21

22

22

13

15 16 17 18 19

19

21

14

15 1B 17 18 19

20

15

15 16 18 18 19 21

25

25

17

17 18 19 20

18

2B 2e

26 27

19

22 23 24 25 25 2B 2e

23 24 25

17 18 19 20

22 23 23 24 25 25 2e

23 23 24

16

21 22 23 23 24 25 25

27

27

25

26

21

22

21

22

22

21

22

23

21

22

23

23

17 18 19 20

21

22

23

24

17 18 20 21

22

23

23

24

17 18 20 21

22

23

24

20

28

* Adapted from Swed, Frieda S., and Eisenhart, C. 1943. Tables for testing randomness of grouping in a sequence of alternatives. Ann. Math. Statist., 14, 83-86, with the kind permission of the authors and publisher.
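The critical values of r in Table F come from the exact sampling distribution of the number of runs. The following Python sketch, with hypothetical function names, counts the runs in a sequence and evaluates that distribution from the standard combinatorial formula, so that a tail probability can be set beside a tabled critical value. It assumes n1 and n2 are both at least 1 and is meant only as an illustration.

```python
from math import comb


def count_runs(sequence):
    """Number of runs r: maximal blocks of identical adjacent symbols."""
    seq = list(sequence)
    if not seq:
        return 0
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def runs_pmf(r, n1, n2):
    """Exact probability of exactly r runs under randomness (n1, n2 >= 1)."""
    if r < 2 or r > n1 + n2:
        return 0.0
    total = comb(n1 + n2, n1)
    if r % 2 == 0:
        k = r // 2
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:
        k = (r - 1) // 2
        ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
    return ways / total


def runs_lower_tail(r, n1, n2):
    """Probability of r or fewer runs."""
    return sum(runs_pmf(k, n1, n2) for k in range(2, r + 1))


if __name__ == "__main__":
    print(count_runs("AABBA"))        # three runs: AA, BB, A
    print(runs_lower_tail(2, 2, 2))   # two of the six arrangements
```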

TABLE G. TABLE OF CRITICAL VALUES OF T IN THE WILCOXON MATCHED-PAIRS SIGNED-RANKS TEST*

* Adapted from Table I of Wilcoxon, F. 1949. Some rapid approximate statistical procedures. New York: American Cyanamid Company, p. 13, with the kind permission of the author and publisher.

TABLE H. TABLE OF CRITICAL VALUES FOR THE WALSH TEST*

Tests

Significance level of tests

Two-tailed: accept μ1 ≠ 0 if either

10

16

One-tailed

Two-tailed

.062

.125

de 0

.Q62 ,Q31

.125 .062

j(de+ da) < 0 ds 0 ds >0

.047 .031 .016

.094 .062 . 031

max [ds. j(de + de)] < 0 j(de+ de) < 0 de 0 j(ds + ds) > 0 ds >0

.055 .023 .016

. 109 . 047 .031 .016

msx (da, j(de + dr)] < 0 max [de. j(de+ dr)] < 0 j(de + dr) < 0 dr 0 min [dk, j (ds + ds)] > 0 j(ds + do) > 0 ds >0

.043 .027 .012

. 086 . 055 .023 .016 .008

max [de, j(de + de)] > 0 msz [do, j(da + d )] < 0 max [dr. j(de + ds)] < 0 j(dr + ds) < 0 ds 0 min [do, j(ds + de)] > 0 [ds, j(A+ ds)] >0 j(de+ do) > 0 ds >0

.061 .022 .010 .006

.102 .043 .020 .012 .008

msx max max max

(de, j(de + do)] (dr, j(da + do)] (do, j(de + do)] [do, j(dr + do)] j(do + do) < 0

min min min min

.056 .026 .011 .006

.111 .051 .021 .010

msx max max maz

[ds, j(de+ dse)] < 0 (dr, j(ds+ dso)] < 0 [ds, j(e4+ dso)] < 0 [do, j(de + dse)] < 0

.048 .028 .011

.097 . 056 .021 .011

maz maz msx max

(dr, j(de + dss)] < 0 (dr, j(ds + As)] < 0 [j(de + dss), j(do + do)] < 0 [do, j(dr + dss)] < 0

.p47 .p24 .pip .pp6

.094 .048 .020 .011

max msx max maz

[j(A + At). j(ds + As)] < 0 [do, j(da + Ao)] < o (ds. j(ds + dss)] < 0 [j(dr + dso), j(de + dso)] < 0

min [j(ds + de), j(dk + ds)] > 0 min [ds,j(ds + ds)] > 0

.047 .023 .010

. 094 .047 .020 .010

max max max maz

[j(de + dss), j(do+ As)) < 0 (j(ds + dss), j(de+ dss)] < 0 (j(ds+ dss), j(de+ dse)] < 0 [dso,j(dr + dso)] < 0

min [j(ds + dse), j(ds + do)] > 0 min [j(ds + de), j(do + ds)] > 0 min [j(ds + de), j(de + da)] > 0 min [de,j(ds+ dr)] >0

. 047 .023 .010 pp5

094 047 020 .010

maz [j(de + dse), j(de + dss)] < 0

.047 .023 .010 .006

. 094 .047 .020 .010

max [j(de + dss), j(da + dse)] < 0

One-tailed: accept μ1 < 0 if

0 >0

min [ds, j(ds + dr)] > 0 min (A, j(ds + ds)] > 0

min [ds,j(ds + ds)] > 0 min [ds,j(ds + da)] > 0 min [ds,j(A + ds)] > 0 min [ds. j(ds + dr)]

>0

min (j(ds + do), j(do + A)] > 0 min [do,j(ds + da)] > 0

min [de, j(ds + dr)]

>0

(j(ds + de), j(ds + de)l > 0

min[j(ds + dss),j(ds + dso)]> 0 min [j(dr+ dso), j(dh+ de)] > 0 min [ds, j(ds + de)] > 0 min [j(ds + ds), j(de + ds)] > 0

min [j(ds + dss),j(dh+ dss)]> 0 min [j(ds + dss),j(ds + Ao)] > 0 msn(j(ds + Ao). j(ds + ds)] > 0 min [ds,j(dr+

ds)] > 0

* Adapted from Walsh, J. E. 1949. Applications of some significance tests for the median which are valid under very general conditions. J. Amer. Statist. Ass., 44, 343, with the kind permission of the author and the publisher.

TABLE I. TABLE OF CRITICAL VALUES OF D (OR C) IN THE FISHER TEST*,†

* Adapted from Finney, D. J. 1948. The Fisher-Yates test of significance in 2 X 2 contingency tables. Biometrika, 35, 149-154, with the kind permission of the author and the publisher.

† When B is entered in the middle column, the significance levels are for D. When A is used in place of B, the significance levels are for C.


90

0

80

C+D

3

12 11

C+D

2

10 i0

0 00

10 9

00

12

00

11

0

0


TABLE I. TABLE OF CRITICAL VALUES OF D (OR C) IN THE FISHER TEST*,† (Continued)

Totals in right margin A+B

13

C+D

B (or A)f 11

13

4

11

4

13

10 9

87 65 13 12 ll

10 9

8

5

0 5 2 1 0 3 1 0 6 3 1 0 4 2 1 5 3 14 2 0 50 0 4 0 4 1 0 2 1 5 3 1 0 4 2 1 31 2 2 0 40 1

00 6

3 12

0

1

0

4

3 12

0 00 0 00

87 65

C+D

.005

6 45 3 12 3 1 2 3 311 2 32 6

ll

=9

.01

12

12

C+D

.025

5

87 65 - 10

.05 7

10 9

C+D

Level of significance

00 5 3 12 3 12 2 10 0 3 1 1 10 9 78 6 13

4

12 ll

C+D

7

13 12 11

0 20 0 1 0 3 4 13 2 12

10

00 8 70 0 6

9

* Adapted from Finney, D. J. 1948. The Fisher-Yates test of significance in 2 X 2 contingency tables. Biometrika, 35, 149-154, with the kind permission of the author and the publisher.

† When B is entered in the middle column, the significance levels are for D. When A is used in place of B, the significance levels are for C.
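The significance levels in Table I rest on the exact (hypergeometric) probability of a 2 X 2 table with fixed marginal totals. As a hedged illustration, the Python sketch below computes that probability for cell frequencies A, B, C, D and sums it over the observed table and all more extreme tables in the same direction. The function names are hypothetical, and the rule used to pick the direction (the sign of AD - BC) is an assumption of this sketch rather than something stated in the table.

```python
from fractions import Fraction
from math import comb


def table_probability(a, b, c, d):
    """Exact probability of one 2 x 2 table, given its marginal totals."""
    n = a + b + c + d
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(n, a + c))


def fisher_one_tailed(a, b, c, d):
    """Probability of the observed table or one more extreme in the same
    direction, holding all four marginal totals fixed."""
    step = 1 if a * d >= b * c else -1  # assumed direction rule
    p = Fraction(0)
    while min(a, b, c, d) >= 0:
        p += table_probability(a, b, c, d)
        a, d = a + step, d + step
        b, c = b - step, c - step
    return p


if __name__ == "__main__":
    # The most extreme table with all margins equal to 3.
    print(fisher_one_tailed(3, 0, 0, 3))  # 1/20
```

Exact fractions are used so that the result can be compared with a significance level without rounding error.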

TABLE J. TABLE OF PROBABILITIES ASSOCIATED WITH VALUES AS SMALL AS OBSERVED VALUES OF U IN THE MANN-WHITNEY TEST*

n2 = 3

nt

4

12

0 1 .200 .400

34

.067 .133 .267 .400 .600

2 43 5 .600

67 8

.028 .014 .057 .029 .114 .057 .200 .100 .314 .171 .429

.243

.571

.343 .557

n2 = 5

n2 = 6

* Reproduced from Mann, H. B., and Whitney, D. R. 1947. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Statist., 18, 52-54, with the kind permission of the authors and the publisher.

TABLE J. TABLE OF PROBABILITIES ASSOCIATED WITH VALUES AS SMALL AS OBSERVED VALUES OF U IN THE MANN-WHITNEY TEST* (Continued)

n2 = 7

01

.028 .056 .111 .167 .250

.008 .017 .033 .058 .092

5

.333

.133

67

.444 .656

.192 .258

2

34

.125 .250 .375 .500 .625

.003

.001

.001

.000

.006

.003

.001

.001

.012

.005

.002

.001

.021

.009

.004

.002

.036

.015

.007

.003

.055

.024

.011

.006

.082

.037

.017

.009

.116

.053

.026

.013 .019

8

.333

.168

.074

.037

9

.417

.206

.101

.061

.027

.264

.134

.069

,036 .049

10 .583

.324

.172

.090

12

.394

.216

.117

.064

13

.464

.265

.147

.082

14

.538

.319

.183

.104

15

.378

.223

.130

16

.438

.267

.169

17

.500

.314

.191

18

.662

11

.365

.228

19

.418

.267

20

.473 .627

.310

21

.366

22

.402

23

.451

24

.600

25

.549

* Reproduced from Mann, H. B., and Whitney, D. R. 1947. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Statist., 18, 52-54, with the kind permission of the authors and the publisher.

TABLE J. TABLE OF PROBABILITIES ASSOCIATED WITH VALUES AS SMALL AS OBSERVED VALUES OF U IN THE MANN-WHITNEY TEST* (Continued)

n2 = 8

12

34

8

56

78

t

Normal

01

. 111

.022

.006

.002

.001

.000

.000

.000

3.308

. 001

.222

.044

.012

.004

.002

.001

.000

. 000

3.203

.001

2

.333

.089

.024

.008

.003

.001

.001

.000

3.098

.001

3

.444

.133

.042

.014

.005

.002

.001

.001

2. 993

.001

4

.556

.200

.067

.024

.009

.004

.002

.001

2.888

.002

5

.267

.097

.036

.001

2. 783

.003

.35B

.139

.055

.444 .656

.188 .248 .315 .387 .461 .539

.077 .107 .141 .184 .230 .285

.006 .010 .015 .021 .030 .041

.003

67

.015 .023 .033 .047 .064 .085 .111

8 9 10 11 12 13

.341

14

.005

.002

2.678

.004

.007

.003

2.573

.005

.010

.005

2.468

.007

.014

.007

2.363

.009

.020

.010

2.258

.012

.054

.027

.071 .091 .114

.036

2.153 2.048

.016

.142 .177 .217

.014 .019

.047

.025

1.943

.026

.032

1.838

.033

.020

15

.467

.262

.141

.076

.041

1.733

.041

1B

.533

.311 .362 .416 .472 .528

.172 .207 .245 .286 .331

.095

.377

.232

22

.426

.268

23

.475

.306

24

.526

.347

1.628 1.523 1.418 1.313 1.208 1.102 .998 .893 .788 .683 .678 .473 .368 .2B3 .158 ,062

.062

21

.052 .065 .080 .097 .117 .139 .164 .191 .221 .253 .287 .323 .360 .399 .439 .480 .520

17 18 19 20

25 26 27 28 29 30 31 32

.116 .140 .168 .198

.389 .433

.478 ~ 622

.064 .078 .094 .113 .135 .159 .185 .215

.247 .282

,318 .356 .39B

.437 .481

* Reproduced from Mann, H. B., and Whitney, D. R. 1947. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Statist., 18, 52-54, with the kind permission of the authors and the publisher.
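The probabilities in Table J can be obtained by counting, among all equally likely orderings of the n1 + n2 scores, those that give a value of U no larger than the one observed. The Python sketch below uses the usual recursion for that count; the function names are hypothetical and the code is offered only as an illustration. For n1 = n2 = 4 it yields .014, .057, .100, and .171 for U = 0, 2, 3, and 4, which appear to agree with the n2 = 4 entries above.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_u(u, n1, n2):
    """Number of orderings of n1 x's and n2 y's whose statistic equals u,
    where the statistic counts the (x, y) pairs in which y precedes x."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # The last score is either an x (preceded by all n2 y's) or a y.
    return count_u(u - n2, n1 - 1, n2) + count_u(u, n1, n2 - 1)


def prob_u_at_most(u, n1, n2):
    """Probability under H0 of a value as small as u."""
    favorable = sum(count_u(k, n1, n2) for k in range(u + 1))
    return favorable / comb(n1 + n2, n1)


if __name__ == "__main__":
    for u in range(5):
        print(u, round(prob_u_at_most(u, 4, 4), 3))
```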

TABLE K. TABLE OF CRITICAL VALUES OF U IN THE MANN-WHITNEY TEST*

Table KI. Critical Values of U for a One-tailed Test at α = .001 or for a Two-tailed Test at α = .002

9 10 11 12 13 14 15 16 17 18 19 20

1 2 3 4 5

00

00

00

0

11

12

23

33

12

2

33

45

56

34

4

56

78

9 10

56

7

89

68

9

11

12

1

67 23 89 5?

10

11

10

12

12

12

14

13

14

17

14

15

19

12

10

11

13

14

15

16

14

15

17

18

20

21

12

14

15

17

19

21

23

25

26

14

17

19

27

29

32

15

17

20

22

29

32

34

37

20

23

25

34

37

40

42

20

23

26

29

22

25

29

32

23 27 31 36 39

25

17

21 24 28 32 36

8 10

8

10

77 11

12

38

42

46

48

43

46

50

64

47 52

51 56

55

69

60

65

15

17

21

24

28

32

36

40

43

16

19

23

27

31

35

39

43

48

17

21

25

29

34

38

43

47

52

57

61

66

70

18

23

27

32

37

42

46

51

56

61

66

71 77 82

76 82 88

19

25

29

34

40

45

50

55

60

66

71

20

26

32

37

42

48

54

59

65

70

76

* Adapted and abridged from Tables 1, 3, 5, and 7 of Auble, D. 1953. Extended tables for the Mann-Whitney statistic. Bulletin of the Institute of Educational Research at Indiana University, 1, No. 2, with the kind permission of the author and the publisher.

TABLE K. TABLE OF CRITICAL VALUES OF U IN THE MANN-WHITNEY TEST* (Continued)

Table KII. Critical Values of U for a One-tailed Test at α = .01 or for a Two-tailed Test at α = .02

9 10 11 12 13 14 15 16 17 18 19 20

1

3 2

00

00

11

12

22

38

4

33

45

56

56

78

9 10

78

9 11

5 76

9 11

89 10

11 14 16

13 16 19

11

00 4

77

45 9 10

89

11

12

13

14

16

16

12

13

16

16

18

19

20

22

12

14

16

17

19

21

23

24

26

28

15 18 22

17 21 24

20

22

24

26

28

30

32

34

23

26

28

31

33

36

38

40

27

30

33

36

38

41

44

47

11

18

22

25

28

31

34

37

41

44

47

50

53

12

24 27

28 31

31 35

35

88

42

46

49

53

56

60

13

21 23

39

43

47

51

55

59

63

67

14

26

30

34

38

43

47

51 ' 56

60

65

69

73

15

33 36 38 41 44

37 41 44 47 50

42 46 49 53 56

47

51

56

61

66

70

76

80

51

56

61

66

71

76

82

87

55

60

66

71

65

100

69

76 82

94

63

70 75

82 88 94

93

59

77 82 88

88

19

28 31 33 36 38

101

107

20

40

47

53

60

67

73

80

87

93

100

107

114

16 17 18

* Adapted and abridged from Tables 1, 3, 5, and 7 of Auble, D. 1953. Extended tables for the Mann-Whitney statistic. Bulletin of the Institute of Educational Research at Indiana University, 1, No. 2, with the kind permission of the author and the publisher.

TABLE K. TABLE OF CRITICAL VALUES OF U IN THE MANN-WHITNEY TEST* (Continued)

Table KIII. Critical Values of U for a One-tailed Test at α = .025 or for a Two-tailed Test at α = .05

9 10 11 12 13 14 15 16 17 18 19 20

nI 1 2

00

01

1

11

1

22

34

23

34

4

65

6

67

45

67

8

9 10

5

78

9 11

12

13

14

22

78

ll

11

12

13

13

16

17

18

19

20

67

10

11

13

14

16

17

19

21

22

24

25

27

12

14

16

18

20

22

24

26

28

30

32

34

89

15 17 20 23 26 28

17 20 23 26 29 33

19 23 26 30 33 37

22 26 29 33 37 41

24 28 33 37 41 45

26

29

31

34

36

38

41

31

34

37

39

42

45

48

36

39

42

46

48

52

56

40

44

47

51

56

58

62

45 50

49 54

63 59

57

61

65

69

63

67

76 83

10 11 12 14

31

36

40

46

50

66

59

64

67

74

72 78

15

34

39

44

49

64

69

64

70

75

80

85

90

16

37

42

47

53

59

64

70

75

81

86

92

98

17

39

45

51

57

63

67

76

81

87

93

99

106

18

42

48

65

61

67

74

80

86

93

99

106

112

19

45

52

58

65

72

78

86

92

99

106

113

119

20

48

65

62

69

76

83

90

98

105

112

119

127

13

* Adapted and abridged from Tables 1, 3, 5, and 7 of Auble, D. 1953. Extended tables for the Mann-Whitney statistic. Bulletin of the Institute of Educational Research at Indiana University, 1, No. 2, with the kind permission of the author and the publisher.

TABLE K. TABLE OF CRITICAL VALUES OF U IN THE MANN-WHITNEY TEST* (Continued)

Table KIV. Critical Values of U for a One-tailed Test at α = .05 or for a Two-tailed Test at α = .10

10

11

12

13

14

15

16

17

18

1

20

00 11

22

23

3

45

56

7

4

78

9 10

11

2

19

1

96 3 56 7

3

34

7

8

99

12

14

15

44 10

11

16

17

18

22

23

25

11

12

13

15

16

18

19

20

12

14

16

17

19

21

23

25

26

28

30

32

15

17

19

21

24

26

28

30

33

35

37

39

18

20

23

26

28

31

33

36

39

41

44

47

21

24

27

30

33

36

39

42

45

48

51

54

10

24

27

31

34

37

41

44

48

51

55

58

62

11

27

31

34

38

42

46

50

54

57

Bl

B5

69

12

30

34

38

42

47

51

55

60

64

68

13

33

37

42

47

51

5B

B1

65

70

75

14

36

41

46

51

56

61

66

71

77

82

15

39

44

50

55

61

66

72

77

83

88

72 &0 87 94

77 84 92 100

16

42

48

54

60

65

71

77

83

89

95

101

17

45

51

57

64

70

77

83

89

115

48

55

61

68

75

82

88

95

]02 109

109

18

96 102

116

123

19

51

58

65

72

80

87

94

101

109

116

123

130

20

54

62

69

77

84

92

100

107

115

123

130

138

89

107

* Adapted and abridged from Tables 1, 3, 5, and 7 of Auble, D. 1953. Extended tables for the Mann-Whitney statistic. Bulletin of the Institute of Educational Research at Indiana University, 1, No. 2, with the kind permission of the author and the publisher.

TABLE L. TABLE OF CRITICAL VALUES OF KD IN THE KOLMOGOROV-SMIRNOV TWO-SAMPLE TEST

(Small samples)

One-tailed test*: α = .05, α = .01

3

3

4

4

Two-tailed test†: α = .05, α = .01

45 5

58 7 65 5 6 7 76 66 6 6 56 6 7 77 8 7 87 7 8 8 8 8 98 8 9 5

4

9 10 11 12 13 14

15

68 7 6 7

18 19 20 21

9

9 9

10

16 17

5

9

88 9

10

99

10

10

10

9

11

10

99

11

10

10

ll

22

ll

23

ll

10

ll 12

9

24

9

ll

10

25

9

11

10

12

26

99

11

10

12

12

10

12

28

10

12

11

13

29

10

12

11

13

30

10

12

11

13

35

11

13

12

40

11

14

13

27

* Abridged from Goodman, L. A. 1954. Kolmogorov-Smirnov tests for psychological research. Psychol. Bull., 51, 167, with the kind permission of the author and the American Psychological Association.

† Derived from Table 1 of Massey, F. J., Jr. 1951. The distribution of the maximum deviation between two sample cumulative step functions. Ann. Math. Statist., 22, 126-127, with the kind permission of the author and the publisher.

TABLE M. TABLE OF CRITICAL VALUES OF D IN THE KOLMOGOROV-SMIRNOV TWO-SAMPLE TEST

(Large samples: two-tailed test)*

Level of significance    Value of D so large as to call for rejection of H0 at the indicated level of significance, where D = maximum |Sn1(X) - Sn2(X)|

.10      $1.22\sqrt{\frac{n_1+n_2}{n_1 n_2}}$

.05      $1.36\sqrt{\frac{n_1+n_2}{n_1 n_2}}$

.025     $1.48\sqrt{\frac{n_1+n_2}{n_1 n_2}}$

.01      $1.63\sqrt{\frac{n_1+n_2}{n_1 n_2}}$

.005     $1.73\sqrt{\frac{n_1+n_2}{n_1 n_2}}$

.001     $1.95\sqrt{\frac{n_1+n_2}{n_1 n_2}}$

* Adapted from Smirnov, N. 1948. Tables for estimating the goodness of fit of empirical distributions. Ann. Math. Statist., 19, 280-281, with the kind permission of the publisher.
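For large samples the critical value of D is a constant times the square root of (n1 + n2)/(n1 n2). The Python sketch below evaluates that expression and also computes the two-sample D itself. The function names are hypothetical, and the coefficients are the standard Smirnov values (1.22, 1.36, 1.48, 1.63, 1.73, 1.95), which should be treated as an assumption wherever the printed table cannot be read.

```python
import math
from bisect import bisect_right

# Assumed standard large-sample coefficients for the two-tailed test.
COEFF = {0.10: 1.22, 0.05: 1.36, 0.025: 1.48, 0.01: 1.63, 0.005: 1.73, 0.001: 1.95}


def ks_two_sample_d(x, y):
    """D = maximum absolute difference between the two sample
    cumulative step functions."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        sx = bisect_right(xs, v) / len(xs)  # proportion of x scores <= v
        sy = bisect_right(ys, v) / len(ys)  # proportion of y scores <= v
        d = max(d, abs(sx - sy))
    return d


def critical_d(n1, n2, alpha):
    """Large-sample two-tailed critical value of D."""
    return COEFF[alpha] * math.sqrt((n1 + n2) / (n1 * n2))


if __name__ == "__main__":
    print(ks_two_sample_d([1, 2, 3], [4, 5, 6]))  # samples do not overlap
    print(critical_d(50, 50, 0.05))
```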

TABLE N. TABLE OF PROBABILITIES ASSOCIATED WITH VALUES AS LARGE AS OBSERVED VALUES OF χr² IN THE FRIEDMAN TWO-WAY ANALYSIS OF VARIANCE BY RANKS*

Table NI. k = 3

N

N~7

N~6

x.'

1.00 1.33 2.38 3.00 4.00 4.33 5.38 6.33 7.00 8.33 9.00 9.33 10.33

12.00

1.000 .956 .740 .570 .430 .252 .184 .142 .072 .052 .029 .012 .0081 .0055 .0017 .00013

. 000 . 286 .857 1.143 2.000 2.571 3.429 3.714 4.571 5.429 6.000 7.148 7.714 8.000 8.857 10.286 10.571 11.148 12.286 14.000

8

.237 .192 .112 .085 .052 .027 .021 .016 .0036 .0027 .0012 .00032 .000021

.25 ,75 1.00 1.75 2.25 3.00 3.25 4.00 4.75 5.25 6.25 6.75 7.00 7.75 9.00 9.25 9.75 10.75 12.00 12.25 13.00 14.25 16.00

9

x.'

Xt 1. 000 .964 .768 .620 .486

N

1.000 . 967 .794 .654 .531 .355 .285 .236 .149 .120 .079 .047 .038 .030 .018 .0080

.0048 .0024 .0011 .00086 .00026 . 000061 .0000036

.000 . 222 . 667 . 889 1.556 2.000 2.667 2.889 3.556 4.222 4.667 5.556 6.000 6.222 6.889 8.000 8.222 8,667 9.556 10.667 10.889 11.556 12.667 13.556 14.000 14.222 14.889 16.222 18.000

1.000 .971 .814 .865 .569 .398 .328 .278 .187 .154 .107 .069 .057 .048 .031 .019 .016 .010 .0035 .0029 .0018 .00066 .00035 .00020 .000097

* Adapted from Friedman, M. 1937. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Amer. Statist. Ass., 32, 688-689, with the kind permission of the author and the publisher.

TABLE N. TABLE OF PROBABILITIES ASSOCIATED WITH VALUES AS LARGE AS OBSERVED VALUES OF χr² IN THE FRIEDMAN TWO-WAY ANALYSIS OF VARIANCE BY RANKS* (Continued)

Table NII. k = 4

2

N

x.'

Xr .0

N3

1. 000

xe 1.000

.0 .3

4 x.'

1.000

5.7

.141

6.0

.105

6.3

.094

6.6

.077

.958

.6

.958

1.2

,834

1.0

.910

.6

1.8

.792

1.8

.727

.9

2.4

.625

2.2

.608

1.2

B.9

.524

.928

3.0

.542

2.6

1.5

.754

7.2

.054

3.6

.458 .375 .208 .167 .042

3.4

1.8

.677

7.5

.062

3.8

2.1

.649

7.8

.036

.524

8.1

.033

8.4

.019

4.2 4.8 54 6.0

4.2

.300

2.4

6.0

.207

5.4

.176

2.7 3.0

.432

8.7

.014

5.8

.148

3.3

.389

9.3

.012

6.6

.075

3.B

.355

9.6

.0069

7.0

.054

3.9

.324

9.9

.0062

7.4

.033

4.5

10.2

.0027

8.2

.017

4.8

10.8

.0016

9.0

.0017

5.1

.190

11.1

.00094

5.4

.158

12.0

.000072

* Adapted from Friedman, M. 1937. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Amer. Statist. Ass., 32, 688-689, with the kind permission of the author and the publisher.
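For small N and k the probabilities in Table N can be checked by brute force. The Python sketch below, with hypothetical function names, computes χr² from an N by k table of within-row ranks using the usual formula 12/(Nk(k+1)) ΣRj² - 3N(k+1), and then enumerates every possible set of rankings to find the exact probability of a value at least as large as the one observed. It is practical only for very small tables.

```python
from itertools import permutations, product


def friedman_chi_r2(rank_rows):
    """Friedman statistic for N rows, each a ranking of the k conditions."""
    n = len(rank_rows)
    k = len(rank_rows[0])
    col_sums = [sum(row[j] for row in rank_rows) for j in range(k)]
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in col_sums) - 3.0 * n * (k + 1)


def friedman_exact_p(observed, n, k):
    """Exact probability under H0 of a statistic >= observed, obtained by
    enumerating all (k!)**n equally likely sets of rankings."""
    rankings = list(permutations(range(1, k + 1)))
    hits = total = 0
    for rows in product(rankings, repeat=n):
        total += 1
        if friedman_chi_r2(rows) >= observed - 1e-9:  # tolerance for rounding
            hits += 1
    return hits / total


if __name__ == "__main__":
    print(friedman_chi_r2([(1, 2, 3), (1, 2, 3)]))
    print(round(friedman_exact_p(6.0, 2, 4), 3))
```

With N = 2 and k = 4 the largest attainable value, 6.0, has probability 1/24, or about .042.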

TABLE O. TABLE OF PROBABILITIES ASSOCIATED WITH VALUES AS LARGE AS OBSERVED VALUES OF H IN THE KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BY RANKS*

Sample sizes

Samplesizes n<

ns

21

ns 1

22

32

32

ns

ns 6.4444

.008

6.3000

.011

5.4444

.046

5.4000

.051

4.5111

.098

4.4444

.102

6.7455

.010

. 100

6.7091

.013

3. 8571

.133

5.7909

.046

5.7273

.050

5.3572

.029

4.7091

.092

4.7143

.048

4.7000

.101

4.5000

.067

4.4643

.105

6.6667

.010

5. 1429

.043

4.9667

.048

4.5714

.100

4.8667

.054

4.0000

.129

4.1667

.082

4.0667

.102

2. 7000

.500

43

2

1 22

31

ns

4.5714

.067

3.7143

.200 .300

1

1

2

4.2857

44

1

6.1667 33

33

33

41 42

42

43

1

2

3

1 1

2

1

6. 2500

,011

5.3611

.032

5.1389

.061

4.5556

.100

4.2500

.121

44

2

7.0364 6.8727

.011

5.4545 5.2364

.052

.004

4.5545

.098

6. 4889

.011

4.4455

.103

5.6889

.029

5.6000

.050

7.1439

.010

5.0667

.086

7.1364

.011

4.6222

.100

5.5985

.049

5.5758

.051

4.5455

.099

4.4773

.102

7.6538

.008

7.5385

.011

5.6923

.049

5.6538

.054

4.6539

.097

4.5001

.104

7.2000

44

3

3.5714 4.8214

.057

4.5000

.076

4.0179

.114

6. 0000

.014

5. 3333

.033

5. 1250

.052

4. 4M3

.100

4.1667

.105

51

1

3. 8571

.143

5.8333

.021

52

1

5. 2500

.036

5.2083

.050

5.0000

.048

5.0000

.057

4. 4500

.071

4.0556

.093

4. 2000

.095

3.8889

.129

4.0500

.119

44

4

TABLE O. TABLE OF PROBABILITIES ASSOCIATED WITH VALUES AS LARGE AS OBSERVED VALUES OF H IN THE KRUSKAL-WALLIS ONE-WAY ANALYSIS OF VARIANCE BY RANKS* (Continued)

Sample sizes

ns

52

n3

2

53

Sample sizes n'

5 ' 6308

~050

6.1333

~013

4 ' 5487

~099

5 ' 1600

~034

4.5231

~103

5 ~0400

~056

4 ' 3733

~090

~009

4.2933

~122

7 ' 7604 7 ' 7440 5 ' 6571

~049

~012

5 ~6176

~050

~048

4 ' 6187

~100

4 ' 8711

~052

4.5527

~102

4 ~9600

2

63

3

64

1

42

43

ne

~008

1

53

ni

6.5333

4.0178

~095

3 ' 8400

~123

6 ~9091

~009

6.8218

~010

5.2509

~049

5 ' 1055

~052

4 ' 6509

~091

4.4945

~101

7 ' 0788

~009

6.9818

~011

5.6485

~049

5.5152

~051

4 ' 5333

~097

4 ' 4121

~109

6 ~9545

~008

6 ' 8400

~011

4 ' 9855

~044

4 ' 8600

~056

3 ' 9873

~098

3.9600

~102

55

55

55

1

2

3

~011

7 ' 3091

~009

6.8364

~011

5 ' 1273

~046

4 ' 9091

~053

4 ' 1091

.086

4.0364

~105

7.3385

~010

7 ' 2692

~010

5 ' 3385

~047

5.2462

F051

4 ' 6231

~097

4 ' 5077

~100

7 ' 5780 7 ' 5429 5 ~7055

~010

5 ' 6264

~051

4 ' 5451

~100

4.5363

~102

7.8229 7 ' 7914

~010

5 ~6657

,049

~010 ~046

~010

7 ' 2045

~009

5 ' 6429

~050

7 ' 1182

~010

4 ' 5229

~099

5 ~2727

~049

4 ' 5200

~101

5 ' 2682

~050

4 ' 5409

~098 ~101

8 ~0000 7 ~9800

~009

4 ' 5182

5. 7&00

~049

55

5

.010

7 ' 4449

~010

5 ~6600

~051

7 ' 3949

~011

4 ~5600

~100

5 ' 6564

~049

4 ~5000

~102

* Adapted and abridged from Kruskal, W. H., and Wallis, W. A. 1952. Use of ranks in one-criterion variance analysis. J. Amer. Statist. Ass., 47, 614-617, with the kind permission of the authors and the publisher. (The corrections to this table given by the authors in Errata, J. Amer. Statist. Ass., 48, 910, have been incorporated.)
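To look up a probability in Table O one first needs H. The Python sketch below computes H = 12/(N(N+1)) Σ(Rj²/nj) - 3(N+1) from raw scores, assigning average ranks to tied scores. The function name is hypothetical and, as a simplifying assumption, no correction for ties is applied to H itself. For three samples of size 2 with no overlap it gives 4.5714, a value that appears in the table with probability .067.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H for any number of independent samples
    (average ranks for ties, no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Map each distinct score to the average of the rank positions it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of positions i+1 .. j
        i = j

    total = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


if __name__ == "__main__":
    print(round(kruskal_wallis_h([1, 2], [3, 4], [5, 6]), 4))
```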

TABLE P. TABLE OF CRITICAL VALUES OF rs, THE SPEARMAN RANK CORRELATION COEFFICIENT*

* Adapted from Olds, E. G. 1938. Distributions of sums of squares of rank differences for small numbers of individuals. Ann. Math. Statist., 9, 133-148, and from Olds, E. G. 1949. The 5% significance levels for sums of squares of rank differences and a correction. Ann. Math. Statist., 20, 117-118, with the kind permission of the author and the publisher.
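The statistic referred to Table P is computed from the differences between paired ranks. As a minimal sketch, assuming no tied ranks and using a hypothetical function name, the usual formula rs = 1 - 6Σd²/(N³ - N) can be written in Python as follows.

```python
def spearman_rs(x_ranks, y_ranks):
    """Spearman rank correlation from two sets of ranks (no ties assumed)."""
    n = len(x_ranks)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


if __name__ == "__main__":
    print(spearman_rs([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```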

TABLE Q. TABLE OF PROBABILITIES ASSOCIATED WITH VALUES AS LARGE AS OBSERVED VALUES OF S IN THE KENDALL RANK CORRELATION COEFFICIENT*

Values of N

10

0 42 .625

.592

. 548

.375

.408

.452

.167

.242

.360

.381

.117

.274

.306

.042

.199

.238

10

.138

.179

12

68

.042

1

. 500

.500

. 360

.386

.481

. 285

.281

.364

.136

191

.300 .242

39 7 5

.500

.068

.119

11

.028

.068

.190

.089

13

.0088

.035

.146

14

.054

15

.0014

.015

.108

16

.081

17

.0054

.078

19

.0014

.054

21

.00020

.036

18

.016

20

.0071

22

.0028

24

.00087

26

.00019

28

.000025

.012

28

.023

25

.014

27

.0088

.0012

29

.0046

31

.0028

32

.00012

33

.0011

34

.000025

85

.00047

36

.0000028

37

.00018

39

.000058

41

.000015

43

.0000028

45

.00000028

30

* Adapted by permission from Kendall, M. G., Rank correlation methods, Charles Griffin & Company, Ltd., London, 1948, Appendix Table 1, p. 141.
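The entries of Table Q are exact probabilities that S is as large as an observed value when all N! orderings are equally likely. The Python sketch below, with hypothetical function names, computes S for two sets of scores and obtains the probability by enumerating every ordering, which is feasible only for small N. For N = 4 it gives .625, .167, and .042 for S = 0, 4, and 6, values that appear in the table above.

```python
from itertools import permutations


def kendall_s(x, y):
    """S = (number of concordant pairs) - (number of discordant pairs)."""
    s = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            s += (product > 0) - (product < 0)
    return s


def kendall_exact_p(s_observed, n):
    """Probability under H0 of S >= s_observed, by full enumeration."""
    base = list(range(n))
    orderings = list(permutations(base))
    hits = sum(1 for p in orderings if kendall_s(base, p) >= s_observed)
    return hits / len(orderings)


if __name__ == "__main__":
    for s in (0, 2, 4, 6):
        print(s, round(kendall_exact_p(s, 4), 3))
```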

TABLE R. TABLE OF CRITICAL VALUES OF s IN THE KENDALL COEFFICIENT OF CONCORDANCE*

Values at the .05 level of significance

34

56 8 10

54. 0

64. 4

103.9

157.3

9

49.5

88.4

143.3

217.0

12

71. 9

62.6

112.3

182.4

276.2

14

83. 8

75.7

136.1

221.4

335.2

16

95.8

453.1

18

107.7

571.0

48. 1

101.7

183. 7

299.0

60.0

127.8

231.2

376.7

15

89.8

192.9

349.8

570.5

20

119.7

258.0

468.5

764.4

864.9

1,158.7

Values at the .01 level of significance

* Adapted from Friedman, M. 1940. A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Statist., 11, 86-92, with the kind permission of the author and the publisher.

† Notice that additional critical values of s for N = 3 are given in the right-hand column of this table.

TABLE S. TABLE OF FACTORIALS

 N     N!

 0     1
 1     1
 2     2
 3     6
 4     24
 5     120
 6     720
 7     5040
 8     40320
 9     362880
10     3628800
11     39916800
12     479001600
13     6227020800
14     87178291200
15     1307674368000
16     20922789888000
17     355687428096000
18     6402373705728000
19     121645100408832000
20     2432902008176640000
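The entries of Table S are easy to regenerate when a value is in doubt. A minimal Python sketch, using the standard library and a hypothetical helper name:

```python
import math


def factorial_table(max_n=20):
    """Return {N: N!} for N = 0 .. max_n."""
    return {n: math.factorial(n) for n in range(max_n + 1)}


if __name__ == "__main__":
    for n, value in factorial_table().items():
        print(n, value)
```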

TABLE T. TABLE OF BINOMIAL COEFFICIENTS

Columns give (N over x) for x = 0, 1, ..., 10.

 N       0       1       2       3       4       5       6       7       8       9      10

 0       1
 1       1       1
 2       1       2       1
 3       1       3       3       1
 4       1       4       6       4       1
 5       1       5      10      10       5       1
 6       1       6      15      20      15       6       1
 7       1       7      21      35      35      21       7       1
 8       1       8      28      56      70      56      28       8       1
 9       1       9      36      84     126     126      84      36       9       1
10       1      10      45     120     210     252     210     120      45      10       1
11       1      11      55     165     330     462     462     330     165      55      11
12       1      12      66     220     495     792     924     792     495     220      66
13       1      13      78     286     715    1287    1716    1716    1287     715     286
14       1      14      91     364    1001    2002    3003    3432    3003    2002    1001
15       1      15     105     455    1365    3003    5005    6435    6435    5005    3003
16       1      16     120     560    1820    4368    8008   11440   12870   11440    8008
17       1      17     136     680    2380    6188   12376   19448   24310   24310   19448
18       1      18     153     816    3060    8568   18564   31824   43758   48620   43758
19       1      19     171     969    3876   11628   27132   50388   75582   92378   92378
20       1      20     190    1140    4845   15504   38760   77520  125970  167960  184756
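Each row of Table T follows from the row before it, since every interior entry is the sum of the two entries above it. The Python sketch below builds a row in that way; the function name is hypothetical.

```python
def binomial_row(n):
    """Return the binomial coefficients (n over x) for x = 0 .. n."""
    row = [1]
    for _ in range(n):
        # Each interior entry is the sum of the two adjacent entries above.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row


if __name__ == "__main__":
    print(binomial_row(5))
    print(binomial_row(20)[:11])  # the eleven columns printed for N = 20
```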

TABLE U. TABLE OF SQUARES AND SQUARE ROOTS*

Number    Square    Square root

1

1.0000

16 81

6.4031

4 9

1.4142

17 64

6.4807

1.7321

18 49

6.5574

16

2.0000

19 36

6.6332

25

2.2361

20 25

6.7082

36

2.4495

21 16

6.7823

49

2.6458

22 09

6.8557

64

2.8284

23 04

6.9282

81

3.0000

24 01

7.0000

1

23 5 6 87 9

Square root

10

100

3.1623

25 00

7.0711

11

121

3.3166

26 01

12

144

3.4641

27 04

13

169

28 09

7.1414 7.2111 7.2801 7.3485 7.4162 7.4833 7.5498 7.6158

14

196

3.6056 3.7417

15

225

3.8730

16 17

256

4.0000

289

4.1231

30 25 31 36 32 49

18

324

4.2426

33 64

29 16

19

361

4.3589

34 81

7.6811

20

400

4.4721

36 00

7.7460

21

441

4. 5826

37 21

7.8102

22

484

4.6904

38 44

7. 8740

23

4.7958

39 69

7.9373

40 96

8.0000

25

529 576 625

5.0000

42 25

8.0623

26

676

5.0990

43 56

27

729

5.1962

44 89

8.1240 8.1854

24

28

784

5.2915

46 24

8.2462

29

841

5.3852

47 61

8.3066

30

900

5.4772

49 00

8.3666 8.4261 8.4853 8.5440 8.6023 8.6603 8.7178 8.7750 8.8318 8.8882 8.9443

31

961

5.5678

32

10 24

5.6569

33

10 89

5.7446

34

11 56

5.8310

35 36 37 38 39 40

12 25 12 96

5.9161 6.0000

13 69

6.0828

1444

6.1644

50 41 51 84 53 29 54 76 56 25 57 76 59 29 60 84

15 21

6.2450

62 41

1600

6.3246

64 00

* By permission from Statistics for students of psychology and education, by H. Sorenson. Copyright 1936, McGraw-Hill Book Company, Inc.

TABLE U. TABLE OF SQUARES AND SQUARE ROOTS* (Continued)

81

Square

Square root

9.0000

146 41

11.0000 11.0454

Square root

Number

65 61

82

67 24

9.0554

148 84

83

68 89

9.1104

1 51 29

11.0905

84

70 56

9.1652

1 53 76

11.1355

85

72 25

9.2195

1 5625

11.1803

86

73 96

9.2736

1 58 76

11.2250

87

75 69

9.3274

1 61 29

11.2694

88

77 44

9.3808

1 63 84

11.3137

89

79 21

9.4340

16641

11.3578 11.4018

90

81 00

9.4868

1 6900

91

82 81

9.5394

1 71 61

11.4455

92

84 64

9.5917

1 74 24

11.4891

93

86 49

9.6437

17689

11.5326

94

88 36

9.6954

17956

11.5758

95

90 25

9.7468

1 82 25

11.6190

96

92 16

9.7980

18496

11.6619

97

94 09

9.8489

1 87 69

11.7047

98

96 04

9.8995

1 9044

11.7473

9.9499

1 93 21

11.7898

19600

11.8322

99

98 01

100

10000

10.0000

101

10201

10.0499

19881

11.8743

102

10404

10.0995

20164

11.9164

103

10609

10.1489

20449

11.9583

104

1 08 16

10.1980

20736

12.0000

105

11025

10.2470

21025

12.0416

106

1 12 36

10.2956

2 13 16

12.0830

107

11449

10.3441

2 1609

12.1244

108

11664

10.3923

2 1904

12.1655

109

11881

10.4403

22201

12.2066

110

1 2100

10.4881

22500

12.2474

111

1 23 21

10.5357

2 2801

12.2882 12.3288 12.3693 12.4097 12.4499 12.4900 12.5300 12.5698 12.6095 12.6491

112

12544

10.5830

2 3104

113

12769

10.6301

2 3409

114

12996

10.6771

23716

115

10.7238

24025

10.7703

243 36

117

13225 13456 13689

10.8167

24649

118

1 3924

10.8628

24964

25281 2 5600

116

119

14161

10.9087

120

14400

10.9545


161

2 5921

12.6886

201

40401

14.1774

162

26244

12.7279

40804

14.2127

163

26569

12.7671

202 203

41209

14.2478

164

12.8062

204

41616

14.2829

165

26896 27225

12.8452

205

14.3178

166

275 56

12.8841

206

4 2025 42436

167

27889

12.9228

207

42849

14.3875

168

28224

12.9615

208

43264

14.4222

169

28561

13.0000

209

43681

14.4568

170

2 8900

13.0384

210

441 00

14.4914

171

29241

13.0767

211

445 21

14.5258

172

295 84

13.1149

212

44944

14.5602

173

29929

13.1529

213

453 69

14.5945

174

302 76

13.1909

214

45796

14.6287

175 176

30625 30976

13.2288

462 25

13.2665

215 216

177

3 13 29

13.3041

217

47089

178

3 16 84

13.3417

218

475 24

179

3 2041

13.3791

219

4 7961

180

3 2400

13.4164

220

4 8400

14.6629 14.6969 14.7309 14.7648 14.7986 14.8324

181

3 2761

13.4536

221

4 8841

14.8661

182

3 31 24

13.4907

222

4 92 84

14.8997

183

3 3489

13.5277

223

497 29

184

3 3856

13.5647

224

501 76 50625 5 1076 51529 5 1984 5 2441 5 2900

14.9332 14.9666 15.0000 15.0333

185

34225

13.6015

225

186

34596

13.6382

226

187

3 4969

13.6748

227

188

3 53 44

13.7113

228

189 190

3 5721 36100

13.7477 13.7840

229

191

3 6481

13.8203

231

192

3 6864

13.8564

232

193 194 195

3 7249

13.8924

233

3 7636 3 8025 3 8416 3 8809 3 9204 3 9601 40000

13.9284

234

13.9642 14.0357

235 236 237

14.0712

238

14.1067

239

14.1421

240

196 197 198 199

200

14.0000

230

46656

5 33 61 5 38 24 5 42 89 5 47 56 5 52 25 5 56 96 5 61 69 5 66 44 5 71 21 5 76 00

14.3527

15.0665 15.0997

15.1327 15.1658 15.1987 15.2315 15.2643 15.2971 15.3297

15.3623 15.3948 15.4272 15.4596 15.4919


241

5 8081

15.5242

281

78961

16.7631

242

58564

15.5563

282

795 24

16.7929

243

59049

283

800 89

16.8226

244

595 36

284

806 56

16.8523

245

60025

285

81225

16.8819

246

605

15.5885 15.6205 15.6525 15.6844

286

81796

16.9115

247

6 1009

15.7162

287

823 69

16.9411

248

61504

15.7480

288

8 2944

16.9706

249

6 2001

15.7797

289

835 21

17.0000

250

625 00

15.8114

290

84100

17.0294

251

6 30 01

15.8430

291

84681

252

635 04

292

85264

253

64009

15.8745 15.9060

293

85849

254

645

15.9374

294

864 36

255 256 257

650 25 655 36

15.9687

295

87025

16.0000

296

87616

66049

16.0312

297

88209

17.0587 17.0880 17.1172 17.1464 17.1756 17.2047 17.2337

258

665 64

16.0624

298

88804

17.2627

259

67081

16.0935

299

89401

17.2916

260

6 7600

16.1245

300

90000

17.3205

261

681

16.1555

90601

17.3494

16

16

262

68644

16.1864

301 302

91204

17.3781

263

69169

16.2173

303

91809

17.4069

264

69696

16.2481

304

924

17.4356

265

702 25

16.2788

305

93025

17.4642

266

707 56

16.3095

306

93636

267

712 89

16.3401

307

942 49

268

71824

16.3707

308

94864

269

7 23 61 7 29 00

16.4012

309

16.4317

310

95481 961 00

17.4929 17.5214 17.5499 17.5784 17.6068

16.4621

311

16.4924

312

16.5227

313

275

73441 73984 745 29 75076 75625

276 277

270 271 272

273 274

278 279 280

21

16.5529

314

16.5831

315

761 76

16.6132

316

7 67 29 7 72 84

16.6433

317

16.6733

318

16.7033

319

16.7332

320

77841 7 8400

16

9 67 21 9 73 44 9 79 69 9 85 96 9 92 25 9 98 56 10 04 89 10 11 24 10 17 61 10 24 00

17.6352 17.6635 17.6918 17.7200 17.7482 17.7764

17.8045 17.8326 17.8606 17.8885


19.0000

321

10 30 41

17.9165

361

13 03 21

322

10 36 84

17.9444

362

13 1044

19.0263

323

1043 29

17.9722

363

13 1769

19.0526

324

10 49 76

18.0000

364

13 2496

19.0788

325

10 56 25

18.0278

365

13 3225

326

10 62 76

18.0555

366

13 3956

19.1050 19.1311

327

10 69 29

18.0831

367

13 46 89

19.1572

328

10 75 84

18.1108

368

13 54 24

19.1833

329

10 82 41

18.1384

369

330

10 89 00

18.1659

370

13 61 61 13 6900

19.2354

331

1095 61

18.1934

371

13 7641

19.2614

332

11 02 24

18.2209

372

13 83 84

19.2873

333

1108 89

18.2483

373

13 91 29

19.3132

334

11 15 56

13 98 76

11 22 25

375

14 06 25

336

11 28 96

18.2757 18.3030 18.3303

374

335

376

337

11 35 69

18.3576

377

1413 76 1421 29

338

11 42 44

18.3848

378

1428

339

11 49 21

18.4120

379

14 36 41

340

11 5600

18.4391

380

14 44 00

19.3391 19.3649 19.3907 19.4165 19.4422 19.4679 19.4936

341

11 62 81

18.4662

381

14 51 61

342

11 69 64

18.4932

382

14 59 24

343

11 76 49

18.5203

383

1466

344

11 83 36

18.5472

384

14 74 56 14 82 25

84

19.2094

345

11 90 25

18.5742

385

346

11 97 16

18.6011

386

14 89 96

347

12 04 09

18.6279

387

348

12 11

14 97 69 15 05 44 15 13 21 15 2100

19.5192 19.5448 19.5704 19.5959 19.6214 19.6469 19.6723 19.6977 19.7231 19.7484

15 28 81 15 36 64 15 44 49 15 52 36 15 60 25 15 68 16 15 7609 15 8404 15 92 01 1600 00

19.7737 19.7990 19.8242 19.8494 19.8746 19.8997 19.9249 19.9499 19.9750 20.0000

04

18.6548

388

18.6815 18.7083

389

349

12 18 01

350

12 25 00

351

12 3201 12 3904

18.7350 18.7617

391

352 353

12 46 09

18.7883

393

354

12 53 16

18.8149

394

355

12 60 25

18.8414 18.8680

395

12 74 49

18.8944

12 81 64

18.9209 18.9473 18.9737

397 398

356 357 358 359 360

12 67 36

12 88 81

12 96 00

390

392

396

399 400

89


401

16 08 01

20.0250

441

19 44 81

21.0000

402

16 16 04

20.0499

442

19 53 64

21.0238

403

16 24 09

20.0749

443

19 62 49

21.0476

404

16 32 16

20.0998

444

19 71 36

21.0713

405

1640 25

20.1246

445

19 80 25

21.0950

406

1648 36

20.1494

446

19 89 16

21.1187

16 56 49

20.1742

447

19 98 09

21.1424

2007 04

21.1660

407 408

16 64 64

20.1990

448

409

1672 81

20.2237

449

20 16 01

21.1896

410

16 81 00

20.2485

450

20 25 00

21.2132

411

16 89 21

20.2731

451

20 34 01

21.2368

412

16 97 44

20.2978

452

20 43 04

413

1705 69

20.3224

453

20 52 09

21.2603 21.2838

414

17 13 96

20.3470

454

20 61 16

21.3073

415

17 22 25

20.3715

455

20 70 25

21.3307

416

17 30 56

20.3961

456

20 79 36

417

17 38 89

20.4206

457

20 88 49

21.3542 21.3776

418

17 47 24

20.4450

458

20 97 64

21.4009

419

1755 61

20.4695

459

21 06 81

21.4243

420

17 64 00

20.4939

460

21 1600

21.4476

421

17 72 41

20.5183

461

21 25 21

21.4709

422

17 80 84

20.5426

462

21 3444

21.4942

423

17 89 29

20.5670

463

21 43 69

21.5174

424

17 97 76

20.5913

464

21 52 96

21.5407

18 06 25

20.6155

465

21 62 25

21.5639

426

18 14 76

20.6398

466

21 71 56

21.5870

427

1823 29

20.6640

467

21 80 89

21.6102

428

18 31 84

20.6882

468

21 90 24

21.6333

429

18 40 41

20.7123

469

21 99 61

21.6564

430

18 49 00

20.7364

470

22 09 00

21.6795

425

431

18 57 61

20.7605

471

22 1841

432

18 66 24

20.7846

472

22 27 84

433

18 74 89

20.8087

473

22 37 29

434

1883 56

20.8327

474

22 46 76

20.8567

475

22 5625

21.7025 21.7256 21.7486 21.7715 21.7945

435

1892 25

436

19 00 96

20.8806

476

22 65 76

21.8174

437

1909 69

20.9045

477

22 75 29

21.8403

438

19 18 44

20.9284

478

22 84 84

21.8632

439

19 27 21

20.9523

479

22 9441

21.8861

440

19 36 00

20.9762

480

23 0400

21.9089


481

23 13 61

482

23 23 24

21.9317

521

522

27 1441 27 24 84

22.8254

21.9545

483

23 32 89

21.9773

523

27 35 29

22.8692

484

23 42 56

22.0000

524

22.8910

485

23 5225

22.0227

525

486

23 61 96

22.0454

526

27 45 76 27 56 25 27 66 76

22.9347

487

23 71 69

22.0681

527

27 77 29

22.9565

488

23 81 44

22.0907

528

27 87 84

22.9783

489

23 91 21

22.1133

529

490

24 01 00

22.1359

530

27 98 41 28 09 00

23.0000 23.0217 23.0434 23.0651

491

24 10 81

22.1585

531

492

24 20 64

22.1811

532

493

24 30 49

22.2036

533

494

24 40 36

22.2261

534

495

24 50 25

22.2486

535

22.8473

22.9129

496

24 60 16

22.2711

536

28 19 61 28 30 24 2840 89 28 51 56 28 62 25 28 72 96

497

24 7009

22.2935

537

28 83 69

498

24 80 04

22.3159

538

28 94 44

499

24 90 01

22.3383

539

2905

500

25 0000

22.3607

540

29 16 00

501

25 10 01

22.3830

541

29 26 81

502

25 2004

22.4054

542

29 37 64

503

25 3009

22.4277

543

504

25 40 16

22.4499

544

29 48 49 29 59 36

505

25 5025

22.4722

545

29 70 25

506

25 60 36

22.4944

546 547

29 81 16 29 92 09 3003 04 30 14 01 30 25 00

23.2594 23.2809 23.3024 23.3238 23.3452 23.3666 23.3880 23.4094 23.4307 23.4521

30 3601 30 47 04 30 58 09 30 69 16 30 80 25 3091 36 31 02 49 31 13 64 31 24 81 31 3600

23.4734 23.4947 23.5160 23.5372 23.5584 23.5797 23.6008 23.6220 23.6432 23.6643

507

25 7049

22.5167

508

25 80 64

22.5389

548

509 510

25 90 81

22.5610

549

26 01 00

22.5832

550

511

26 11 21

22.6053

512

26 21 44

22.6274

551 552

513 514 515 516 517 518 519 520

26 31 69

22.6495

553

26 41 96 2652 25

22.6716

554

26 72 89

22.6936 22.7156 22.7376

26 83 24

22.7596

2693 61 27 0400

22.8035

555 556 557 558 559 560

26 62 56

22.7816

21


23.0868

23.1084 23.1301 23.1517 23.1733 23.1948 23.2164 23.2379

561

31 47 21

23.6854

601

36 12 01

24.5153

36 2404

24.5357

562

31 5844

23.7065

602

563

31 69 69

23.7276

603

36 3609

24.5561

564

31 8096

23.7487

604

36 48 16

24.5764

565

31 92 25

23.7697

605

36 60 25

24.5967

566

32 03 56

23.7908

606

36 72 36

24.6171

567

32 14 89

23.8118

607

36 84 49

24.6374

32 26 24

23.8328

608

36 96 64

24.6577

23.8537

609

37 08 81

24.6779

37 2100

24.6982

568 569

32 37 61

570

32 4900

23.8747

610

571 572 573

32 60 41

23.8956

611

3733 21

24.7184

32 71 84

23.9165

612

37 45 44

24.7385

32 83 29

23.9374

613

37 5769

24.7588

3294

23.9583

614

37 69 96

24.7790

33 0625

23.9792

615

37 82 25

24.7992

576

33 1776

24.0000

616

3794

24.8193

577

33 2929

24.0208

617

3806 89

24.8395

578

33 40 84

24.0416

618

38 19 24

24.8596

579

33 5241

24.0624

619

38 31 61

24.8797

580

33 6400

24.0832

620

38 44 00

24.8998

574 575

76

56

581

33 75 61

24.1039

621

38 5641

24.9199

582

33 8724

24.1247

622

38 68 84

24.9399

583

33 98 89

24.1454

623

38 81 29

24.9600

584

34 10 56

24.1661

624

3893 76

24.9800

585

34 22 25

24.1868

625

3906 25

25.0000

626

3918 76

25.0200

586

34 33 96

24.2074

587

34 45 69

24.2281

627

39 31 29

25.0400

588

34 57 44

24.2487

628

3943 84

25.0599

589

34 69 21

24.2693

629

39 5641

25.0799

590

34 81 00

24.2899

630

39 69 00

25.0998

591

3492

24.3105

631

39 81 61

25.1197

592

35 0464

24.3311

632

3994 24

593

35 1649

24.3516

633

40 06 89

25.1396 25.1595

81

594

35 2836

24.3721

634

40 19 56

25.1794

595

35 4025

24.3926

635

4032

25.1992

35 52 16

24.4131

636

40 44 96

25.2190

637

40 57 69

25.2389

596

25

597

35 6409

24.4336

598

35 7604

24.4540

638

40 70 44

25.2587

599

35 8801

24.4745

639

40 83 21

600

36 00 00

24.4949

640

40 96 00

25.2784 25.2982


641

41 08 81

25.3180

642

41 21 64

643


681

46 37 61

26.0960

25.3377

682

46 51 24

26.1151

41 34 49

25.3574

683

46 64 89

26.1343

644

41 47 36

25.3772

684

46 78 56

26.1534

645

41 6025

25.3969

685

46 92 25

26.1725

646

41 73 16

25.4165

686

47 05 96

26.1916

647

41 86 09

25.4362

687

47 19 69

26.2107

648

41 99 04

25.4558

688

26.2298

649

42 12 01

25.4755

650

42 2500

25.4951

690

47 33 44 47 47 21 47 61 00

26.2679

651

42 3801

25.5147

691

47 74 81

26.2869

652

42 51 04

25.5343

692

26.3059

653

42 64 09

25.5539

693

47 88 64 48 02 49

654

42 77 16

25.5734

694

48 16 36

26.3439

655

42 90 25

25.5930

695

48 30 25

26.3629

656

43 03 36

25.6125

696

48 44 16

26.3818

657

43 16 49

25.6320

697

48 58 09

26.4008

658

43 2964

25.6515

698

48 72 04

26.4197

659

43 42 81

699

48 8601

26.4386

660

43 5600

25.6710 25.6905

700

49 00 00

26.4575

661

43 69 21

25.7099

701

49 14 01

26.4764

662

43 8244

25.7294

702

49 28 04

26.4953

663

43 95 69

25.7488 25.7682

703

49 42 09

26.5141

704

26.5330 26.5518 26.5707 26.5895 26.6083 26.6271


664

44 08 96

665

25.7876

705

666

44 22 25 4435 56

25.8070

706

49 56 16 49 70 25 49 84 36

667

44 48 89

25.8263

707

49 98 49

668

44 62 24

25.8457

708

669

44 75 61

25.8650

670

44 8900

25.8844

710

50 12 64 50 26 81 50 41 00

671

45 02 41

25.9037

711

672

45 15 84

25.9230

712

673

45 29 29

25.9422

713

674

45 45 45 45 45 46 46

25.9615

714

25.9808 26.0000

715 716

26.0192

717

26.0384

718

26.0576

719

26.0768

720

675 676 677

678 679

680

42 76 56 25 69 76 83 29 96 84 10 41 24 00


709

50 55 21 50 69 44 50 83 69 50 97 96 51 12 25 51 26 56 51 40 89 51 55 24 51 69 61 51 8400

26.2488

26.3249

26.6458

26.6646 26.6833 26.7021 26.7208 26.7395 26.7582 26.7769 26.7955 26.8142 26.8328


721

51 98 41

26.8514

761

5791 21

27.5862

722

52 12 84

26.8701

762

58 06 44

27.6043

723

52 2729

26.8887

763

58 21 69

27.6225

724

52 41 76

26.9072

764

58 3696

27.6405

725

52 56 25

26.9258

765

58 52 25

27.6586

726

52 70 76

26.9444

766

5867

27.6767

727

52 85 29

26.9629

767

58 82 89

27.6948

728

5299

26.9815

768

58 98 24

27.7128

729

53 1441

27.0000

769

59 13 61

27.7308

730

53 2900

27.0185

770

59 29 00

27.7489

731

53 43 61

27.0370

771

59 44 41

27.7669

732

53 5824

27.0555

772

59 59 84

27.7849

53 72 89

27.0740

773

59 75 29

27.8029

734

53 87 56

27.0924

774

59 90 76

27.8209

735

5402 25

27.1109

775

6006

736

54 16 96

27.1293

776

60 21 76

27.8568

737

54 31 69

27.1477

777

60 37 29

27.8747

738

54 46 44

27.1662

778

60 52 84

27.8927

739

54 61 21

27.1846

779

60 68 41

27.9106

60 84 00

27.9285


733

84

56

25

27.8388

740

54 76 00

27.2029

780

741

54 90 81

27.2213

781

60 99 61

27.9464

742

5505 64

27.2397

782

61 15 24

27.9643

743

55 2049

27.2580

783

61 30 89

27.9821

744

55 35 36

27.2764

61 46 56

28.0000

745

55 5025

27.2947

784 785

61 62 25

28.0179

746

55 65 16

27.3130

786

61 77 96

28.0357

747

55 8009

27.3313

787

61 93 69

28.0535

748

55 95 04

27.3496

788

62 09 44

28.0713

749

56 10 01

27.3679

789

62 25 21

750

56 25 00

27.3861

790

62 41 00

28.0891 28.1069

751

56 40 01

27.4044

791

62 56 81

28.1247

752

56 55 04

27.4226

792

62 72 64

28.1425

753

56 70 09

27.4408

793

62 88 49

28.1603

754

56 85 16

27.4591

794

63 04 36

755

5700 25 57 15 36

27.4773

795

63 20 25

27.4955

796

63 36 16

28.1780 28.1957 28.2135

757 758

57 3049

27.5136

797

63 5209

28.2312

5745 64

27.5318

798

63 6804

28.2489

759

57 60 81

27.5500

799

63 8401

28.2666

760

57 7600

27.5681

800

64 0000

28.2843

756


801

64 16 01

28.3019

802

64 32 04

28.3196

803

64 48 09

28.3373

804

64 64 16

28.3549

805

64 80 25

28.3725

806

64 96 36

807

65 12 49

808

65 28 64

28.4253

65 44 81

28.4429

810

65 61 00

28.4605

811

65 77 21

28.4781

812

65 93 44

28.4956

813

66 09 69

28.5132

814

6625 96

28.5307

815

66 42 25

28.5482

70 72 81 70 89 64 71 0649 7123 36 71 40 25

29.0000 29.0172 29.0345

28.3901

71 57 16

29.0861

28.4077

71 71 72 72

29.1033

7409 9104 0801 25 00

29.0517 29.0689

29.1204 29.1376 29.1548

816

66 58 56

28.5657

72 42 01 72 5904 72 7609 72 93 16 73 1025 73 27 36

817

66 74 89

28.5832

73 44 49

818

66 91 24

28.6007

73 61 64

819

67 07 61

28.6182

820

67 2400

28.6356

73 7881 73 9600

29.2404 29.2575 29.2746 29.2916 29.3087 29.3258

821

67 40 41

28.6531

7413 21

29.3428

822

675684

28.6705

74 30 44

29.3598

823

67 73 29

28.6880

29.3769

824

67 89 76

28.7054

825

68 06 25

28.7228

826

68 22 76

28.7402

827

68 3929

28.7576

828 830

68 55 84 68 72 41 68 8900

28.7750 28.7924 28.8097

74 47 69 74 64 96 74 82 25 74 99 56 75 16 89 75 34 24 75 51 61 75 69 00

29.4958

831

69 05 61

832

69 22 24

28.8271 28.8444 28.8617

75 86 41 76 03 84 76 21 29 76 38 76 76 56 25 76 73 76 76 91 29 77 08 84 77 26 41 77 44 00

29.5127 29.5296 29.5466 29.5635 29.5804 29.5973 29.6142 29.6311 29.6479 29.6648

829

833

69 38 89

834

28.8791

835

6955 56 69 72 25

836

69 8896

28.9137

837

7005 69

28.9310

838

70 22 44

28.9482

839

70 39 21

28.9655

840

70 56 00

28.9828

28.8964

29.1719 29.1890 29.2062 29.2233

29.3939 29.4109

29.4279 29.4449 29.4618 29.4788


881

77 61 61

29.6816

921

84 82 41

30.3480

882

77 79 24

29.6985

922

85 00 84

30.3645

883

77 96 89

29.7153

923

85 1929

30.3809

884

78 14 56

29.7321

924

85 3776

30.3974

885

78 32 25

29.7489

925

85 5625

30.4138

78 49 96

29.7658

926

85 74 76

30.4302

85 93 29

30.4467 30.4631

886 887

78 67 69

29.7825

927

888

78 85 44

29.7993

928

86 11 84

889

7903 21

29.8161

929

86 30 41

30.4795

890

79 21 00

29.8329

930

86 49 00

30.4959

891

79 38 81

29.8496

931

86 67 61

30.5123

892

79 56 64

29.8664

932

86 86 24

30.5287

893

79 74 49

29.8831

933

87 04 89

30.5450

894

79 92 36

29.8998

934

8723 56

30.5614

895

80 10 25

29.9166

935

8742 25

30.5778

896

8028 16

29.9333

936

87 60 96

30.5941

897

80 46 09

29.9500

937

87 7969

30.6105

898

80 64 04

29.9666

938

87 98 44

30.6268

899

80 82 01

29.9833

939

88 17 21

30.6431

900

81 00 00

30.0000

940

88 3600

30.6594

901

81 18 01

30.0167

941

88 54 81

30.6757

902

81 3604

30.0333

942

88 73 64

30.6920

903

81 54 09

30.0500

943

88 92 49

30.7083

904

81 72 16

30.0666

944

89 11 36

30.7246

905

81 90 25

30.0832

945

89 30 25

30.7409

82 08 36

30.0998

946

89 49 16

30.7571

907

82 2649

30.1164

947

89 68 09

30.7734

908

82 44 64

30.1330

948

89 87 04

30.7896

909

82 62 81

30.1496

949

90 06 01

30.8058

910

82 81 00

30.1662

950

90 25 00

30.8221

911

82 99 21

30.1828

951

90 44 01

30.8383

912

83 1744

30.1993

952

9063

913

83 35 69

30.2159

953

90 82 09

30.8707

914

83 53 96 83 72 25 83 90 56

30.2324

954

91 01 16

30.8869

30.2490

955

91 2025

30.9031

30.2655

956

91 39 36

30.9192 30.9354

915 916

04

30.8545

917

84 08 89

30.2820

957

91 5849

918

84 27 24

30.2985

958

91 77 64

30.9516

919

84 45 61

959

91 96 81

30.9677

920

84 64 00

30.3150 30.3315

960

92 16 00

30.9839


961

92 35 21

31.0000

981

96 23 61

31.3209

962

92 54 44

31.0161

982

96 43 24

31.3369

963

92 73 69

31.0322

983

96 62 89

31.3528

964

92 92 96

31.0483

984

96 82 56

31.3688

965

93 1225

31.0644

985

97 02 25

31.3847

966

93 31 56

31.0805

986

97 21 96

31.4006

967

93 5089

31.0966

987

97 41 69

31.4166

968

93 70 24

31.1127

988

97 61 44

31.4325

969

93 8961

31.1288

989

97 81 21

31.4484

970

94 09 00

31.1448

990

98 0100

31.4643

971

94 28 41

31.1609

991

98 20 81

31.4802

972

9447 84

31.1769

992

98 40 64

31.4960

973

94 67 29

31.1929

993

98 60 49

31.5119

974

94 86 76

31.2090

994

98 80 36

31.5278

975

95 06 25

31.2250

995

976

95 25 76

31.2410

996

9900 25 99 20 16

31.5436 31.5595

977

95 45 29

31.2570

997

99 4009

31.5753

978

95 64 84

31.2730

998

99 6004

31.5911

979

95 8441

31.2890

999

99 8001

31.6070

980

96 04 00

31.3050

1000


1000000

31.6228


Binomial distribution, table of associated probabilities, 250

Adams, L., 107
Adorno, T. W., 132n., 186n., 205n.
Alpha (α), definition of, 9
use of (see Binomial test; Sign test)
Binomial test, 36-42
and Type I error, 9
(See also Significance level)
Alternative hypothesis (H1), definition of, 7
compared with other one-sample tests, 59
function and rationale, 36
McNemar test, relation to, 65n., 66-67
and location of rejection region, 13-14
(See also One-tailed test; Two-tailed test)
Analysis of variance, nature of, 159-161, 174-175
nonparametric, 159-194
interactions in, 33
parametric (see F test)
Anderson, R. L., 17, 31n.
Andrews, F. C., 193
Arithmetic mean (see Mean, arithmetic)
Asch, S. E., 205n., 207
Associated probability, definition of, 11
and rejection region, 13
and sampling distribution, 11
and significance level, 8

Assumptions, additivity, 133
as conditions of statistical model, 18-20
method, 37-42
for large samples
correction for continuity, 40-41
normal distribution approximation, 40
one-tailed and two-tailed tests, 41
for small samples, 38-40
example, 15-16, 39-40
one-tailed test, 38-39
two-tailed test, 39
power-efficiency, 42
table of associated probabilities, 250
Birnbaum, Z. W., 49, 52, 136
Blackwell, D., 8n.
Bowker, A. H., 67
Brown, G. W., 116
Buford, H. J., 49n.
Burke, C. J., 47, 111, 179
in measurement, 27-28
of parametric statistical tests, 2-3, 19-20, 25, 30-31
as qualifiers of research conclusions, 19
and sampling distribution, 12
(See also Statistical model)

C (see Contingency coefficient)
Central-limit theorem, 12-18
χ² test, contingency coefficient, use in significance test, 197-200
Auble, D., 127, 274n.-277n.
"inflated N" in, 44, 109, 228-229
Average (see Mean; Median)
for k independent samples, 175-179
compared with other tests for k independent samples, 193-194
Bancroft, T. A., 17, 31n.
Barthol, R. P., 3W
Bergman, G., 80
function, 175
method, 175-178
Bernard, G. A., 104
ordered hypothesis, test of, 179
example, 176-178
Beta (β), definition of, 9
(See also Type II error)
Binomial coefficients, table of, 288
Binomial distribution, 15, 36-38
power, 179
requirements for use, 178-179
small expected frequencies, 178-179
normal approximation to, 40-41
(See also Median test, extension of)

χ² test, nominal data, use with, 23
one-sample test, 42-47
compared with other one-sample tests, 59
function, 42-43
method, 43-47
degrees of freedom, 44
example, 44-46
small expected frequencies, 46
ordered hypothesis, test of, 45n., 47
power, 47
compared with Kolmogorov-Smirnov one-sample test, 51
table of critical values, 249
for two independent samples, 104-111
compared with other tests for two independent samples, 156-158
function, 104
as median test, 112
power and power-efficiency, 165-166
Coefficient, of concordance (see Kendall coefficient of concordance)
of contingency (see Contingency coefficient)
of variation, use with data in ratio scale, 29, 30
Coles, M. R., 118, 119n.
Consensual ordering, use of Kendall W to obtain, 237-238
Contingency coefficient (C), 196-202
compared with other measures of association, 238-240
function, 196
limitations of, 200-201
method, 196-198, 200
example, 197-198
method, 104-110
degrees of freedom, 106
expected frequencies, 105-106, 109
2 X 2 contingency tables, 107-109
correction for continuity in, 107
ordered hypothesis, test of, 110
power, 110
compared with Kolmogorov-Smirnov two-sample test, 136
requirements for use, 110
χr² (see Friedman two-way analysis of variance by ranks)
Chi-square distribution, 43n., 106n.
approximation to, in Cochran Q test, 162-163
nominal data, use with, 23, 30
power, 201-202
significance test, 198-200
example, 200
Continuity, correction for (see Correction for continuity)
Continuous variable, assumption in statistical tests, 25
and tied scores, 25-26
Coombs, C. H., 30, 76n.
Correction for continuity, in binomial test, 40-41
in χ² test of 2 X 2 table, 107
in McNemar test, 64
in sign test, 72
in Friedman two-way analysis of variance by ranks, 168
in Kendall coefficient of concordance, 236
Cochran Q test, method, example, 163-165
in Kendall partial rank correlation coefficient, 226, 228-229
in Kolmogorov-Smirnov two-sample test, 131-135
in Kruskal-Wallis one-way analysis of variance by ranks, 185
in McNemar test, 64
table of critical values, 249
(See also χ² test)
Child, I. L., 112-115, 121, 123n.
Classificatory scale (see Nominal scale)
Clopper, C. J., 42
Cochran, W. G., 46, 47, 104, 110, 160, 162, 166, 179, 184, 202
Cochran Q test, 161-166
compared with other tests for k related samples, 173
function, 161-162
method, 162-165
in Wald-Wolfowitz runs test, 140-141
Cumulative frequency distribution, in Kolmogorov-Smirnov one-sample test, 47-52
in Kolmogorov-Smirnov two-sample test, 127-136
Cyclical fluctuations and one-sample runs test, 52

David, F. N., 42
Davidson, D., 80
Decision, statistical, theory, 8n.
in statistical inference, 6-7, 14
Degrees of freedom, 44
Design of research, before and after, 63
correlational, 195-196
k independent samples, 174-175
k related samples, 159-161
single sample, 35-36
two independent samples, 95-96
two related samples, 159-161
Disarray, τ as coefficient of, 215
Discrete variate, 25
Distribution-free statistical tests, 3
(See also Nonparametric statistical tests)
Dixon, W. J., 17, 31n., 47, 75, 87, 110, 136, 179
Fisher exact probability test, method, associated probability of data, one-tailed test, 99-100
two-tailed test, 100
exact probability of data, 98-100
Tocher's modification, 101-103
power, 104
table of critical values, 256-270
Edwards, A. L., 31n., 110, 179
Eells, K., 24n.
Eisenhart, C., 58, 145, 252n., 253n.
Engvall, A., 69n.
Equated groups, and analysis of variance, 160-161
sensitivity of, 62
and two-sample tests, 61-62
Equivalence classes, in interval scale, 28, 30
in nominal scale, 23, 30
in ordinal scale, 24, 30
in ratio scale, 29-30
Estimation, 1
Expected frequencies, in χ² one-sample test, 43-44
in χ² two-sample test, 105-106
in contingency coefficient, 196-197
Frenkel-Brunswik, E., 132n., 186n., 205n.
Frequency counts, use with nominal data, 23, 30
Freund, J. E., 58
Friedman, M., 168, 171n., 172, 238, 280n., 281n., 286n.
Friedman two-way analysis of variance by ranks, 166-172
compared with other tests for k related samples, 173
function, 166
method, 166-172
example, 169-172
small samples, 168-169
power, 172
rationale, 166-168
table of associated probabilities, 280-281
small, 46, 109, 178-179
Extension of median test (see Median test)
Extreme reactions, test for (see Moses test of extreme reactions)
F test, assumptions, 19-20, 160
interval scale, use with data in, 28
for k independent samples, 174
for k related samples, 160
power, compared with Friedman two-way analysis of variance, 172
Tukey's procedure, 160
Factorials, table of, 287
Fagan, J., 205n., 207, 227
Festinger, L., 127n.
Finney, D. J., 104, 256n.-270n.
Fisher, R. A., 31n., 92, 101, 104, 248n., 249n.
Fisher exact probability test, 96-104
χ² test, use of Fisher test as alternative, 110
compared with other tests for two independent samples, 156-158
function, 96-97
as median test, 112
method, 97-103
associated probability of data, 98-100
example, 100-101
Generalization from parametric and nonparametric tests, 18-20

Geometric mean, 29, 30
Ghiselli, E. E., 141, 143n.
Girshick, M. A., 8n.
Goodman, L. A., 49, 52, 59, 131, 135, 136, 158, 239, 278n.
Goodness of fit, binomial test of, 36-42
χ² test of, 42-47
Kolmogorov-Smirnov test of, 47-52
and one-sample case, 35, 59-60
Gordon, J. E., 232n.
Grosslight, J. H., 169n.
H test (see Kruskal-Wallis one-way analysis of variance by ranks)
Hempel, C. G., 30
Hollingshead, A. B., 176, 177n., 197, 198, 200
Homoscedasticity, assumption of, in t and F tests, 19
definition of, 19
Hotelling, H., 213, 223
Hurst, P. M., 80n.
Hypotheses, derived from theory, 6
errors in testing, 8-11
operational statements of, 7
procedure in testing, 6-7
Hypotheses, tests of, 1
(See also Alternative hypothesis; Null hypothesis; Research hypothesis; Statistical tests)

Kendall rank correlationcoefficient(r), comparisonof r and rq, 219 in numerical values, 219 in power, 219, 222, 228, 239 in uses, 213-214

"Infiated N" in x' test, 44, 109, 22S-229 Interactions in analysis of variance, 33 Interval scale, 26-28, 30 admissible operations, 2S, 30 definition of, 26-27 examples of, 27-28 formal properties of, 28, 30 unit of measurement, 27 zero point, 26-28 Isomorphism, 22 Jonckheere, A. R., 194

k samples(seeDesignof research) Kendall, M. G., 172, 202, 203, 212, 213, 220 223 226' 229' 234) 238) 285n Kendall coefficient of concordance (W), 229-238

comparedwith other measuresof association, 238-239 function, 229

interpretationof, 237 238 method, 231 235, 237 example, 232 233 tied observations, 233 235 assignment of ranks to, 233 correction for, 234 235 effect of, 234 example with, 234 235 ordinal data, use with, 30 rationale, 229 231

significancetest, 235 237 large samples, 236-237 chi-square approximat ion, 236 example, 236-237 small samples, 235 236 table of critical values, 286

Kendall partial rank correlationcoefficient (~,.,), 223-229

comparedwith other measuresof association, 238-289 function, 223 224 method, 226 228

example,227 228 rationale, 224-226

significancetest, 22S-229 Kendall rank correlation coefficient (r), 213-223

compared with othermeasures of association, 238-239

function, 213 214 method, 215-219, 222 example, 216-217 tied observations, 217 219

assignmentof ranks to, 217 correction for, 218-219 effect of, 219 example with, 218-219

ordinal data, usewith, 25, 30 power-efficiency, 223 rationale, 214 215 significancetest, 220-222 large samples, 220 222 example, 221-222

normaldistributionapproxima tion, 221-222 smallsamples,220 221 table of associatedprobabilities,285 Kolmogorov,A., 136 Kolmogorov-Smirnov test,,for one sample, 47 52

compared with other one-sample tests, 59-60 cumulative frequency distribution in, 47-52 function and rationale, 47-48 method, 48-51 example, 49-50 one-tailed test, 49 two-tailed test, 48 power, 51 compared with χ² test, 51 table of critical values, 251 when parameter values are estimated from sample, 60 for two samples, 127-136

compared with other tests for two independent samples, 156-158 cumulative frequency distribution in, 127-136 function, 127 method, 128-136 large samples, 131-135 one-tailed test, 131-135 two-tailed test, 131 small samples, 129-131 example, 129-131 power-efficiency, 136 rationale, 127-128 tables for, 278-279

Kruskal, W. H., 188, 189n., 193, 239, 283n.


Kruskal-Wallis one-way analysis of variance by ranks, 184-193

compared with other tests for k independent samples, 193-194 function, 184-185 method, 185-192 large samples, 185 example, 189-192 small samples, 185-188 example, 186-188 tied observations, 188-192 assignment of ranks to, 188 correction for, 188-192 effect of, 188-189 example with, 189-192

power-efficiency, 192-193 rationale, 185 table of associated probabilities, 282-283

McNemar test for significance of changes, 63-67 in behavioral science, 21-22, 26-28

compared with other tests for two related samples, 92-94 function, 63 method, 63-67 binomial test, relation to, 65n., 66-67

correction for continuity, 64 example, 65-66 one-tailed test, 67

small expected frequencies, M-67 two-tailed test, 67

power-efficiency, 67 rationale, 63-64 sign test, relation to, 74 Mann, H. B., 120, 127, 271n.-273n. Mann-Whitney U test, 116-127

compared with other tests for two independent samples, 156-158 function, 116 method, 116-126

as criterion in choice of statistical test, 31

formal properties of scales, 23 isomorphism, 22 levels of, 22 interval scale, 26-28 nominal scale, 22-23

ordered metric scale, 76n. ordinal scale, 23-26 ratio scale, 28-29

and nonparametric statistical tests, 3 parametric statistical model, requirement associated with, 19-20 in physical science, 21 and statistics, 30

Median, use with ordinal data, 25, 30 Median test, extension of, for k independent samples, 179-184

compared with other tests for k independent samples, 193-194

function, 179


Median test, extension of, for k independent samples, method, 179-184 example, 180-184
Nonparametric statistical tests, in behavioral science, 31

power-efficiency, 184 compared with Kruskal-Wallis test, 193

for two independent samples, 111-116 compared with other tests for two independent samples, 156-158 function, 111

method, 111-115 example, 112-115

power, compared with Kolmogorov-Smirnov two-sample test, 136

compared with Mann-Whitney U test, 123

power-efficiency, 115 rationale, 111-112 Meeker, M., 24n.

Minimax principle, 8n.

Mode, use with nominal data, 28, 30

Model, statistical (see Statistical model) Mood, A. M., 17, 31n., 42, 75, 88, 115, 116, 126, 145, 184, 194 Moore, G. H., 58 Moran, P. A. P., 223, 229

Moses, L. E., 75, 88, 92, 116, 144-147, 152, 156 Moses test of extreme reactions, 145-152

compared with other tests for two independent samples, 156-158 function and rationale, 145-146 method, 147-151 example, 148-151 tied observations, 151 effect of, 151

procedure with, 151 power, 151 range in, 146 Mosteller, F., 194

Multiple product-moment correlation, use with data in interval scale, 30

conclusions from, generality of, 3

interval scale, use with data in, 28 measurement requirements, 3, 80, 83 parametric statistical tests, comparison with, 30-34 (See also Contents for list of nonparametric tests)

Normal distribution, approximation to, in binomial test, 40-41

in Mann-Whitney U test, 120-128 in one-sample runs test, 56-58 in randomization test for two related samples, 91-92 in sign test, 72-74

in significance test for Kendall τ, 221-222

in Wald-Wolfowitz runs test, 140-143

in Wilcoxon matched-pairs signed-ranks test, 79-88 assumption of, in interval scaling, 27

in t and F tests, 19 table of, 247 Null hypothesis (H0), definition of, 7 statement of, in steps of hypothesis testing, 6

Olds, E. G., 213, 284n. One-tailed test, and nature of H1, 7

power of, 11 rejection region of, 13-14 Order tests, 3 use with ordinal data, 25 Ordered metric scale, 76n. Ordinal scale, 23-26, 30 admissible operations, 24-25 definition of, 23-24 examples of, 24

formal properties of, 24 statistics and tests appropriate to, 25-26, 30

New York Post, 45n. Neyman, J., 104 Nominal scale, 22-23, 30

ties, occurrence in, 25-26

definition of, 22

Pabst, M. R., 218, 223 Parameters, assumed in parametric tests,

examples of, 22-23 formal properties of, 23 statistics and tests appropriate to, 23,

definition of, 2 Parametric statistical

admissible operations, 23

30

Nonnormality and parametric tests, 20, 126

Nonparametric statistical tests, assumptions underlying, 25, 31, 32

30-32

tests, 2-3

interval scale, use with data in, 28 measurement requirements, 19-20, 30 nonparametric statistical tests, comparison with, 30-34 ordinal scale, use with data in, 26


Parametric statistical tests, ratio scale, use with data in, 29 statistical model, 19-20

underlying continuity, assumption of, 25

for various research designs, correlational, 195-196

k independent samples, 174-175 k related samples, 160-161 one sample, 35-36 two independent samples, 96 two related samples, 62 Partial rank correlation, 223-229

Partially ordered scale, 24

Power-efficiency, of nonparametric tests, randomization test, for matched pairs, 92

for two independent samples, 156 sign test, 75

Spearman r_s, 213, 219 Wald-Wolfowitz runs test, 144-145 Walsh test, 87 Wilcoxon matched-pairs signed-ranks test, 83

and sample size, 20-21 Q test (see Cochran Q test)

Pearson, E. S., 42, 104

Pearson product-moment correlation coefficient (r), interval scale, use with data in, 28, 30 measure of association, 195-196

power, compared with r_s, 213 compared with τ, 223 Percentile, use with ordinal data, 30 Phi coefficient, 226

Pitman, E. J. G., 92, 152n., 154, 156 Pool, I. de S., 100n. Power of statistical test, 10-11 curves, test of the mean, 10 and one-tailed and two-tailed tests, 11n.

sample size, relation to, 10-11, 20-21 and statistical test, choice of, 18, 31 and Type II error, 10-11

Power-efficiency, 20-21 as criterion in choice of statistical test, 31

definition of, 20-21, 33

of nonparametric tests, 33 binomial test, 42

χ² tests, 47, 110, 179 Cochran Q test, 165-166

contingency coefficient, 201-202 extension of median test, 184 Fisher test, 104

Friedman two-way analysis of variance by ranks, 172 Kendall τ, 219, 223 Kolmogorov-Smirnov one-sample test, 51

Kolmogorov-Smirnov two-sample test, 136 Kruskal-Wallis one-way analysis of variance by ranks, 192-193

r (see Pearson product-moment correlation coefficient)

r_s (see Spearman rank correlation coefficient)

Radlow, R., 169n.

Randomization test, for matched pairs, 88-92

compared with other tests for two related samples, 92-94 function, 88 method, 88-92 small samples, 88-91 example, 90-91

large samples, 91-92 normal distribution approximation, 91-92

Wilcoxon test as alternative, 91-92

power-efficiency, 92

rationale, 88-89

for two independent samples, 152-156 compared with other tests for two independent samples, 156-158 function, 152 method, 152-156 large samples, 154-156 Mann-Whitney test as alternative, 155-156 t distribution approximation, 154-156

small samples, 152-154 power-efficiency, 156 rationale, 152-154 Randomized blocks, 161 Randomness, test for, 52-58 Range in Moses test, 146

McNemar test, 67

Ranking scale (see Ordinal scale)

Mann-Whitney U test, 126

Ranking tests, 3 use with ordinal data, 25 Ratio scale, 2&40

median test, 115 Moses test, 151

one-sample runs test, 58

admissible operations, SMO


Ratio scale, definition of, 28-29 example of, 29 formal properties of, 29, 30 zero point in, 28-29

Sample size (N), and power, 9-11

Region of rejection (see Rejection region) Reflexive, defined, 23n.

Sampling distribution, 11-13

and power-efficiency, 20-21 specification of, in steps in hypothesis testing, 6

Rejection region, 13-14 definition of, 13 illustration of, 14 (Fig. 2)

location of, and alternative hypothesis, 13 size, and significance level, 14

specification of, in steps of hypothesis

Schueller, G. K., 100n.

testing, 6

Research, design of (see Design of research)

and statistics, 1-2, 6 and theory, 6

Research hypothesis, definition of, 7 operational statement of, 7 Rho (see Spearman rank correlation coefficient)

Run, definition of, 52, 13? Runs test, k-sample, 194 one-sample, 52-58

compared with other one-sample tests, 60

function and rationale, 52-53 method, 53-58 large samples, 56-58 example, 56-58 normal distribution approximation, 56 small samples, 53-56 example, 54-56

power-efficiency, 58 two-sample (Wald-Wolfowitz), 136-145 compared with other tests for two independent samples, 156-158

140-

example, 141-143 normal distribution approximation, 140 small samples, 138-140 example, 138-140 tied observations, 143-144 effect of, 143

power-efficiency, 144-145 rationale, 136-138

McNemar test, relation to, 74 method, 68-75 large samples, 72-74 correction for continuity, 72 example, 72-74 normal distribution approximation, 72 one-tailed and two-tailed tests, 72 small samples, 68-71 example, 69-71 one-tailed and two-tailed tests, 69

tied observations, procedure with, 71 power, compared to Wilcoxon test, power-efficiency, 75 Significance level (α), 8-11 definition of, 8 and rejection region, 14

specification of, in steps in hypothesis

141

procedure with, 143-144 power, compared with Mann-Whitney U test, 144-145

Scientific method, objectivity in, 6 Siegel, A. E