Pearson New International Edition
Psychological Testing
Principles and Applications
Kevin R. Murphy Charles O. Davidshofer
Sixth Edition
ISBN 10: 1-292-04002-5
ISBN 13: 978-1-292-04002-8
Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England
and Associated Companies throughout the world
Visit us on the World Wide Web at: www.pearsoned.co.uk
© Pearson Education Limited 2014
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the
prior written permission of the publisher or a licence permitting restricted copying in the United Kingdom
issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.
All trademarks used herein are the property of their respective owners. The use of any trademark
in this text does not vest in the author or publisher any trademark ownership rights in such
trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this
book by such owners.
ISBN 10: 1-269-37450-8
ISBN 13: 978-1-269-37450-7
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
Printed in the United States of America
PEARSON CUSTOM LIBRARY
Table of Contents
1. Tests and Measurements
2. Defining and Measuring Psychological Attributes: Ability, Interests, and Personality
3. Testing and Society
4. Basic Concepts in Measurement and Statistics
5. Scales, Transformations, and Norms
6. Reliability: The Consistency of Test Scores
7. Using and Interpreting Information About Test Reliability
8. Validity of Measurement: Content and Construct-Oriented Validation Strategies
9. Validity for Decisions: Criterion-Related Validity
10. Item Analysis
11. The Process of Test Development
12. Computerized Test Administration and Interpretation
13. Ability Testing: Individual Tests
14. Ability Testing: Group Tests
15. Issues in Ability Testing
16. Interest Testing
17. Personality Testing
Appendix: Forty Representative Tests
Appendix: Ethical Principles of Psychologists and Code of Conduct
References
Index
Tests and Measurements
The term psychological test brings to mind a number of conflicting images. On the one
hand, the term might make one think of the type of test so often described in television,
movies, and the popular literature, wherein a patient answers questions like, “How
long have you hated your mother?” and in doing so reveals hidden facets of his or her
personality to the clinician. On the other hand, the psychological test might refer to a
long series of multiple-choice questions such as those answered by hundreds of high
school students taking college entrance examinations. Another type of “psychological
test” is the self-scored type published in the Reader’s Digest, which purports to tell you
whether your marriage is on the rocks, whether you are as anxious as the next fellow,
or whether you should change your job or your lifestyle.
In general, psychological tests are neither mysterious, as our first example might
suggest, nor frivolous, as our last example might suggest. Rather, psychological tests
represent systematic applications of a few relatively simple principles in an attempt to
measure personal attributes thought to be important in describing or understanding
individual behavior. The aim of this text is to describe the basic principles of psycho-
logical measurement and to describe the major types of tests and their applications. We
will not present test theory in all its technical detail, nor will we describe (or even men-
tion) all the different psychological tests currently available. Rather, our goal is to pro-
vide the information needed to make sensible evaluations of psychological tests and
their uses within education, industry, and clinical practice.
The first question that should be addressed in a psychological testing text is,
“Why is psychological testing important?” There are several possible answers to this
question, but we believe that the best answer lies in the simple statement that forms
the central theme of this text: Tests are used to make important decisions about in-
dividuals. College admissions officers consult test scores before deciding whether
to admit or reject applicants. Clinical psychologists use a variety of objective and
projective tests in the process of choosing a course of treatment for individual clients.
The military uses test scores as aids in deciding which jobs an individual soldier might
be qualified to fill. Tests are used in the world of work, both in personnel selection and
in professional certification and licensure. Almost everyone reading this text has taken
at least one standardized psychological test. Scores on such a test may have had some
impact on an important decision that has affected your life. The area of psychological
testing is therefore one of considerable practical importance.
From Chapter 1 of Psychological Testing: Principles and Applications, Sixth Edition. Kevin R. Murphy,
Charles O. Davidshofer. Copyright © 2005 by Pearson Education, Inc. All rights reserved.
Psychological tests are used to measure a wide variety of attributes—intelligence,
motivation, mastery of seventh-grade mathematics, vocational preferences, spatial
ability, anxiety, form perception, and countless others. Unfortunately, one feature that
all psychological tests share is their limited precision. They rarely, if ever,
provide exact, definitive measures of variables that are believed to have important ef-
fects on human behavior. Thus, psychological tests do not provide a basis for making
completely accurate decisions about individuals. In reality, no method guarantees
complete accuracy. Thus, although psychological tests are known to be imperfect mea-
sures, a special panel of the National Academy of Sciences concluded that psychologi-
cal tests generally represent the best, fairest, and most economical method of obtaining
the information necessary to make sensible decisions about individuals (Wigdor &
Garner, 1982a, 1982b). The conclusions reached by the National Academy panel form
another important theme that runs through this text. Although psychological tests are
far from perfect, they represent the best, fairest, and most accurate technology avail-
able for making many important decisions about individuals.
Psychological testing is highly controversial. Public debate over the use of tests,
particularly standardized tests of intelligence, has raged since at least the 1920s (Cronbach,
1975; Haney, 1981; Scarr, 1989).1 An extensive literature, both popular and technical,
deals with issues such as test bias and test fairness. Federal and state laws have
been passed calling for minimum competency testing and for truth in testing, terms
that refer to a variety of efforts to regulate testing and to increase public access to infor-
mation on test development and use. Tests and testing programs have been challenged
in the courts, often successfully.
Psychological testing is not only important and controversial, but it is also a
highly specialized and somewhat technical enterprise. In many of the natural sciences,
measurement is a relatively straightforward process that involves assessing the physi-
cal properties of objects, such as height, weight, or velocity.2 However, for the most
part, psychological attributes, such as intelligence and creativity, cannot be measured
by the same sorts of methods as those used to measure physical attributes. Psychologi-
cal attributes are not manifest in any simple, physical way; they are manifest only in
the behavior of individuals. Furthermore, behavior rarely reflects any one psychological
attribute, but rather a variety of physical, psychological, and social forces. Hence,
psychological measurement is rarely as simple or direct as physical measurement. To
sensibly evaluate psychological tests, therefore, it is necessary to become familiar with
the specialized methods of psychological measurement.
1 Special issues of American Psychologist in November 1965 and October 1981 provide excellent
summaries of many of the issues in this debate.
2 Note, however, that physical measurement is neither static nor simple. Proposals to redefine
the basic unit of length, the meter, in terms of the time that light takes to travel from point to
point (Robinson, 1983) provide an example of continuing progress in redefining the bases of
physical measurement.
This chapter provides a general introduction to psychological measurement.
First, we define the term test and discuss several of the implications of that definition.
We then briefly describe the types of tests available and discuss the ways in which tests
are used to make decisions in educational, industrial, and clinical settings. We also dis-
cuss sources of information about tests and the standards, ethics, and laws that govern
testing.
PSYCHOLOGICAL TESTS—A DEFINITION
The diversity of psychological tests is staggering. Thousands of different psychological
tests are available commercially in English-speaking countries, and doubtlessly hun-
dreds of others are published in other parts of the world. These tests range from per-
sonality inventories to self-scored IQ tests, from scholastic examinations to perceptual
tests. Yet, despite this diversity, several features are common to all psychological tests
and, taken together, serve to define the term test.
A psychological test is a measurement instrument that has three defining charac-
teristics:
1. A psychological test is a sample of behavior.
2. The sample is obtained under standardized conditions.
3. There are established rules for scoring or for obtaining quantitative (numeric) information
from the behavior sample.
Behavior Sampling
Every psychological test requires the respondent to do something. The subject’s
behavior is used to measure some specific attribute (e.g., introversion) or to predict
some specific outcome (e.g., success in a job training program). Therefore, a variety of
measures that do not require the respondent to engage in any overt behavior (e.g., an
X-ray) or that require behavior on the part of the subject that is clearly incidental to
whatever is being measured (e.g., a stress electrocardiogram) fall outside the domain
of psychological tests.
The use of behavior samples in psychological measurement has several implica-
tions. First, a psychological test is not an exhaustive measurement of all possible be-
haviors that could be used in measuring or defining a particular attribute. Suppose, for
example, that you wished to develop a test to measure a person’s writing ability. One
strategy would be to collect and evaluate everything that person had ever written,
from term papers to laundry lists. Such a procedure would be highly accurate, but im-
practical. A psychological test attempts to approximate this exhaustive procedure by
collecting a systematic sample of behavior. In this case, a writing test might include a
series of short essays, sample letters, memos, and the like.
The second implication of using behavior samples to measure psychological vari-
ables is that the quality of a test is largely determined by the representativeness of this
sample. For example, one could construct a driving test in which each examinee was
required to drive the circuit of a race track. This test would certainly sample some as-
pects of driving but would omit others, such as parking, following signals, or negotiat-
ing in traffic. It would therefore not represent a very good driving test. The behavior
elicited by the test also must somehow be representative of behaviors that would be
observed outside the testing situation. For example, if a scholastic aptitude test were
administered in a burning building, it is unlikely that students’ responses to that test
would tell us much about their scholastic aptitude. Similarly, a test that required
highly unusual or novel types of responses might not be as useful as a test that re-
quired responses to questions or situations that were similar in some way to those ob-
served in everyday life.
Standardization
A psychological test is a sample of behavior collected under standardized condi-
tions. The Scholastic Assessment Tests (SAT), which are administered to thousands of
high school juniors and seniors, provide a good example of standardization. The test
supervisor reads detailed instructions to all examinees before starting, and each por-
tion of the test is carefully timed. In addition, the test manual includes exhaustive in-
structions dealing with the appropriate seating patterns, lighting, provisions for
interruptions and emergencies, and answers to common procedural questions. The test
manual is written in sufficient detail to ensure that the conditions under which the
SAT is given are substantially the same at all test locations.
The conditions under which a test is administered are certain to affect the behav-
ior of the person or persons taking the test. You would probably give different answers
to questions on an intelligence test or a personality inventory administered in a quiet,
well-lit room than you would if the same test were administered at a baseball stadium
during extra innings of a play-off game. A student is likely to do better on a test that is
given in a regular classroom environment than he or she would if the same test were
given in a hot, noisy auditorium. Standardization of the conditions under which a test
is given is therefore an important feature of psychological testing.
It is not possible to achieve the same degree of standardization with all psycho-
logical tests. A high degree of standardization might be possible with many written
tests, although even within this class of tests the conditions of testing might be difficult
to control precisely. For example, tests that are given relatively few times a year in a
limited number of locations by a single testing agency (e.g., the Graduate Record Ex-
amination Subject Tests) probably are administered under more standard conditions
than are written employment tests, which are administered in hundreds of personnel
offices by a variety of psychologists, personnel managers, and clerks. The greatest difficulty
in standardization, however, probably lies in the broad class of tests that are administered
verbally on an individual basis. For example, the Wechsler Adult Intelligence
Scale (WAIS-III), which represents one of the best individual tests of intelligence,
is administered verbally by a psychologist. It is likely that an examinee will respond
differently to a friendly, calm examiner than to one who is threatening or surly.
Individually administered tests are difficult to standardize because the examiner
is an integral part of the test. The same test given to the same subject by two different
examiners is certain to elicit a somewhat different set of behaviors. Nevertheless,
through specialized training, a good deal of standardization in the essential features of
testing can be achieved. Strict adherence to standard procedures for administering var-
ious psychological tests helps to minimize the effects of extraneous variables, such as
the physical conditions of testing, the characteristics of the examiner, or the subject’s
confusion regarding the demands of the test.
Scoring Rules
The immediate aim of testing is to measure or to describe in a quantitative way
some attribute or set of attributes of the person taking the test. The final, defining char-
acteristic of a psychological test is that there must be some set of rules or procedures
for describing in quantitative or numeric terms the subject’s behavior in response to
the test. These rules must be sufficiently comprehensive and well defined that different
examiners will assign scores that are at least similar, if not identical, when scoring the
same set of responses. For a classroom test, these rules may be simple and well de-
fined; the student earns a certain number of points for each item answered correctly,
and the total score is determined by adding up the points. For other types of tests, the
scoring rules may not be so simple or definite.
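As a minimal sketch of the classroom scoring rule just described, the Python function below awards a fixed number of points for each item answered correctly and adds them up. The function name, the answer key, and the sample responses are hypothetical and serve only as illustration; they are not taken from any published test.

```python
from typing import Mapping


def score_objective_test(answer_key: Mapping[str, str],
                         responses: Mapping[str, str],
                         points_per_item: int = 1) -> int:
    """Return the total score: fixed points for each correctly answered item."""
    total = 0
    for item, correct_answer in answer_key.items():
        # An omitted item never matches the key, so it earns no points.
        if responses.get(item) == correct_answer:
            total += points_per_item
    return total


# Hypothetical three-item multiple-choice test (illustrative data only).
answer_key = {"q1": "b", "q2": "d", "q3": "a"}
responses = {"q1": "b", "q2": "c", "q3": "a"}

total = score_objective_test(answer_key, responses)
print(total)
```

Because the rule is stated completely in advance, any two scorers who apply it to the same set of responses obtain the same total.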
Most mass-produced standardized tests are characterized by objective scoring
rules. In this case, the term objective should be taken to indicate that two people, each
applying the same set of scoring rules to an individual’s responses, will always arrive
at the same score for that individual. Thus, two teachers who score the same multiple-
choice test will always arrive at the same total score. On the other hand, many psycho-
logical tests are characterized by subjective scoring rules. Subjective scoring rules
typically rely on the judgment of the examiner and thus cannot be described with suffi-
cient precision to allow for their automatic application. The procedures a teacher fol-
lows in grading an essay exam provide an example of subjective scoring rules. It is
important to note that the term subjective does not necessarily imply inaccurate or unreliable
methods of scoring responses to tests, but simply that human judgment is an
integral part of the scoring of a test.
Tests vary considerably in the precision and detail of their scoring rules. For multiple-
choice tests, it is possible to state beforehand the exact score that will be assigned to
every possible combination of answers. For an unstructured test, such as the Rorschach
inkblot test, in which the subject describes his or her interpretation of an ambiguous
abstract figure, general principles for scoring can be described, but it may be impossi-
ble to arrive at exact, objective scoring rules. The same is true of essay tests in the class-
room; although general scoring guidelines can be established, in most cases, it is