Barry Bricklin, Ph.D.
Adjunct Associate Professor
The Institute for Graduate Clinical Psychology, Widener University
Michael H. Halbert
Consultant to Management
Bala Cynwyd, PA
The following summary of research with the PORT and BPS was published as two separate articles. One is called “Can Child Custody Data be Generated Scientifically?” The second is called “Perception-of-Relationships Test and Bricklin Perceptual Scales: Validity and Reliability Issues.” They appeared in the American Journal of Family Therapy. Both came out in 2004, Volume 32, the first on pages 119-138, and the second on pages 189-203 (two separate journal issues).
Abstract
Existing and new validity data on 3,880 cases from the Bricklin Perceptual Scales and Perception-of-Relationships Test address the assertion that custody data cannot be generated scientifically. Reliability (93 percent stability over 8 months) and validity data (90 percent agreement with multiple independent criteria) are presented, but without a fully explicated chain linking evidence to conclusions, one ends up with unresolvable, typically all-or-none, disputations about the adequacy of one’s evidence. This chain includes, among others, the (confusing) role of values in science, how system complexities profoundly affect measurement choices, and the value of information to a decision-maker. Psychometric indices, the usual source of such arguments, cannot alone address any of these areas. New research addresses differentiating test-retest changes that are due to errors of measurement from true changes in measured variables.
Can Child Custody Data Be Generated Scientifically?
The article begins by looking at whether it is not only practically but even theoretically possible to create child custody data scientifically. Doubts have been expressed in both areas (Krauss & Sales, 2000, pp. 859-870; O’Donohue & Bradley, 1999, pp. 314-315). It is argued that many, if not all, of these doubts stem from too narrow a definition of “science,” the complexities of creating system-specific measurements, and confusion about the unavoidable roles in science of value-driven choices that cannot, by themselves, be right or wrong. System complexities impact the creation of suitable measurement units (ordinal and/or interval) as well as the choice of one’s reference standard (normative, criterion, single-participant). Further, the challenges of validating system-specific data may be quite different from those that arise with non-system-specific targets. Confusion also exists in regard to the value of using test data in real life. Several points are made on the need, in evaluating scientific merit, to present a highly explicated chain of reasoning that links evidence to conclusions. Presenting such a chain is all one can do to demonstrate the degree to which an approach deserves to be called “scientific.” For one thing, it should be noted that custody assessment tools are particularly difficult to assess for merit, since a decision-maker cannot weigh their value by checking their data against those derived from a widely accepted model. None exists (Krauss & Sales, 2000, pp. 845-870; 1999, pp. 88-90). Another reason for the detailed chain is to avoid the kind of arguments exemplified by proponents and enemies of the Rorschach test (Exner, 2001, pp. 386-388; 2002, pp. 391-404; Ganellan, 2001; Garb, Wood, Lilienfeld, & Nezworski, 2002, pp. 455-457; Meyer, 2001, pp. 389-396; Weiner, Spielberger, & Abeles, 2002, pp. 7-12; Wood, Nezworski, Garb, & Lilienfeld, 2001, pp. 350-373) and those that involve basic disagreements about what the word “evidence” means in the phrase “evidence-based” (formerly, “empirically-validated”) practice (Anthony, Rogers & Farkas, 2003; Gonzales, Ringeisen & Chambers, 2002). What one side offers as evidence is not viewed as evidence by the other side.
Our chain, spelled out by Piotrowski (1957, pp. 14-22), provides a simple but thorough description of a scientific model, similar to the one endorsed by Albert Einstein (1936). There are four tiers. The first consists of concepts. “Intelligence,” “depression” and, it is later argued, a “tree” are concepts, that is, not completely definable by external sensory data. The second is principles, which state the relations among concepts. The third is empirical equivalents, most frequently the hidden cause of unresolvable arguments; these define what one looks for in the world of sensory experience to exemplify a concept. The fourth is validation, which refers to the degree to which the relations among the empirical equivalents of the concepts correspond to the relations among the concepts as stated in the principles. There is no a priori way to determine whether empirical equivalents are well chosen, except by how well the four tiers work together to achieve some specific predictive goal.
Many believe values (personal or group) somehow contaminate the “objectivity” of the scientific process; for example, that the value-driven nature of a best-interests determination makes it intrinsically impossible to approach it scientifically. But all scientific endeavors are value-driven, and not simply as constructivists or postmodernists believe, but in more basic ways (Berger & Luckmann; Gonzales, Ringeisen, & Chambers, 2002, pp. 204-209). Value-driven decisions are needed because “science” is not a closed system, that is, one that is logically complete and internally consistent: one that possesses all the propositions and theorems needed to deduce all of its conclusions, and that can prove any statement in it true or false.
Most evaluators think of a system as an interactional model in which stable traits interact. This can be seen in the way they conceptualize and write about their evaluations. There are sections called “Mr. Jones,” “Mrs. Jones,” child “Mary Jones,” child “Sam Jones,” as though one can assess each element in a system as a separate entity and then somehow add up the parts. In systems-based decisions, the elements of the system cannot be evaluated apart from the interactions of those elements within the system. As people move in and out of systems, the relevant measurement reference standard can shift. There are aspects of a custody evaluation in which it is helpful to know how Child 1 assigns value to his or her parents, which requires a single-participant reference (a child’s scores are compared with his or her other scores), in addition to how value would be assigned to this parent by comparing him or her to other parents, which requires a group reference. Note also that systems complexities can have profound effects on the choice of validating empirical equivalents. The parent from whom a child seeks emotional closeness and/or active help can change dramatically depending on the family systems in which the child-parent interactions take place (Bricklin & Elliot, 2002[a]; 2002[b]).
PORT Normative Data
n=1,581
Sex: 797 females; 784 males
Age: Mean age 7.76; SD=0.17
SES: Low-Middle to High-Middle
Race: 98 percent Caucasian; 2 percent all other
BPS Normative Data
(1964-1997), n=2,389
Sex: 1,202 females; 1,187 males
Age: Mean age 8.94; SD=2.40
SES: Low-Middle to High-Middle
Race: 98 percent Caucasian; 2 percent all others
PORT Validity Data
(1961-1997), n=1,381
The percent-of-agreement rate is listed following the sample size. Structured task problem-solving by children with access to both parents, observed from behind a one-way screen by three psychologists (1961), n=30, 90 percent; courtroom judges (1964-1981), based on all data available, n=45, 89 percent; agreement with BPS choices (1964-1981), n=23, 83 percent; courtroom judges (1981-1985), based on all data available, n=42, 95 percent; agreement with BPS choices (1981-1983), n=30, 84 percent; two psychologists, based on family therapy notes plus consultation with relevant therapists with families seen over two- to five-year intervals (1980-1985), n=30, 93 percent; courtroom judges (1986-1990), based on all data available, n=76, 93 percent; independent psychologists based on all clinical (except for PORT and BPS scores) and life-history data available (1995-1997), n=1,038, 89 percent.
BPS Validity Data
(1964-1997), n=2,279
Agreement with PORT choices (1964-1981), n=23, 83 percent; two psychologists, based on family therapy notes plus consultation with relevant therapists with families seen over two- to seven-year intervals (1980-1983), n=21, 100 percent; courtroom judges (1980-1983), n=30, 90 percent; “Would” questionnaire choices (a “disguised” semi-projective test, asking what Mommy/Daddy would do in certain situations, e.g., “You get a bad mark on a test”) (1980-1983), n=23, 87 percent; PORT choices (1981-1983), n=30, 84 percent; courtroom judges based on all available information (1984-1990), n=179, 96 percent; independent psychologists based on all clinical and life-history data available (1988), n=141, 97 percent; independent psychologists based on all clinical and life-history data available (1992-1995), n=1,765, 88 percent; independent psychologists based on all clinical and life-history data available (1995-1997), n=67, 87 percent.
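As an illustration only, the BPS figures above can be pooled into a single sample-size-weighted agreement rate. The short Python sketch below is not the authors' procedure; the names BPS_SAMPLES and pooled_agreement are hypothetical, and pooling samples validated against different criteria is a simplifying assumption made here for the sake of the arithmetic.

```python
# (sample size, percent agreement) for each BPS validity sample listed above.
BPS_SAMPLES = [
    (23, 83), (21, 100), (30, 90), (23, 87), (30, 84),
    (179, 96), (141, 97), (1765, 88), (67, 87),
]


def pooled_agreement(samples):
    """Return (total n, sample-size-weighted percent agreement)."""
    total_n = sum(n for n, _ in samples)
    weighted = sum(n * pct for n, pct in samples) / total_n
    return total_n, weighted


total_n, pooled = pooled_agreement(BPS_SAMPLES)
print(total_n, round(pooled, 1))  # prints: 2279 89.2
```

The total of 2,279 matches the stated BPS validity sample size, and the pooled rate of roughly 89 percent is in line with the agreement figure given in the abstract.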
PORT/BPS Interrater Reliability
Interrater reliability of PORT scoring was obtained from two samples of seminar attendees (n=36; n=41) in which more than half of the scorers had no prior experience with the PORT. Four different percent-of-agreement scores were obtained: (1) the points scored on Task I (the most complex task); (2) the POC on Task I; (3) the overall TDS score for all seven tasks; (4) the overall POC based on all seven tasks. The percent-of-agreement rates, respectively, were: 74; 90; 82; 92. No interrater data for the BPS were gathered, since scoring it is mechanical and requires only the ability to read Arabic numerals and to recognize when one is larger than another. It is also assumed that an evaluator can add and subtract numbers between zero and 32.
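The summary does not state how the percent-of-agreement rates were computed. One simple possibility, sketched below in Python purely as an assumption, is the percentage of scorers whose result matches the most common (modal) result for a given protocol. The function name percent_agreement and the ratings shown are hypothetical and are not data from the study.

```python
from collections import Counter


def percent_agreement(ratings):
    """Percent of scorers whose rating equals the most common (modal) rating."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    modal_count = Counter(ratings).most_common(1)[0][1]
    return 100.0 * modal_count / len(ratings)


# Hypothetical POC designations from ten scorers of one protocol
# ("M" = mother, "F" = father); illustrative values only.
hypothetical_poc = ["M", "M", "M", "F", "M", "M", "M", "M", "M", "F"]
print(percent_agreement(hypothetical_poc))  # prints: 80.0
```

A categorical result such as the POC fits this definition directly; a numerical score such as the points on Task I would need a stated tolerance before two scorers could be said to agree.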
One purpose of the new research was to gather test-retest data with larger samples than had been used before. Another was to formulate clinical hypotheses that could detect patterns that would red-flag test changes in the parent-of-choice (POC) over time and to investigate whether such changes should be considered errors of measurement or true changes in the measured variables.
Mental health professionals (MHPs) were recruited from among those who had written or phoned the Professional Academy of Custody Evaluators (PACE) with a custody-related question between 1995 and 2002. They were also recruited at seminars given by PACE. Each MHP who made a validity criterion designation had to have either continual contact with the families of the tested children and opportunities to observe each child with his or her parents, or continual exchanges of information with an MHP who had such contact. Each MHP who made an independent validity criterion designation was instructed to use all of the test, documentary, data-based observation protocol and other clinical/life-history information available (except for PORT or BPS scores). This included numerous consultations with the MHP who had ongoing contact with each child and his or her family over the time-spans involved.
One hundred and twenty-seven children took the PORT at least two times, where the time span between Test One and Test Two was at least six months. Ninety-three children also took the BPS. The mean interval between Test One and Test Two turned out to be eight months, with none less than six months. One group consisted of children who came from intact families. There were 57 children in this group. Fifty-four of the 57 were in some form of psychotherapy. A second group was composed of two children whose parents were about to divorce, although the parents were still living together. One of the two children was in psychotherapy. A third group was made up of children of pre-divorce parents who were living separately. Five children came from this group. One of the five was in therapy. The fourth group consisted of children whose parents had already divorced. There were 63 children in this group, 16 of whom were in some form of psychotherapy. The relative proportions of the numbers regarding the children in the BPS group were essentially the same as for the PORT.
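The group counts above can be checked against the stated PORT sample size with a few lines of arithmetic. The Python sketch below uses only the figures reported in the preceding paragraph; the variable names and group labels are hypothetical shorthand.

```python
# (group label, children in group, children in psychotherapy),
# as reported for the PORT test-retest sample.
GROUPS = [
    ("intact family", 57, 54),
    ("pre-divorce, parents living together", 2, 1),
    ("pre-divorce, parents living separately", 5, 1),
    ("divorced", 63, 16),
]

total_children = sum(n for _, n, _ in GROUPS)
total_in_therapy = sum(t for _, _, t in GROUPS)

for label, n, in_therapy in GROUPS:
    print(f"{label}: n={n}, in therapy={100 * in_therapy / n:.0f}%")
print(total_children, total_in_therapy)  # prints: 127 72
```

The four groups sum to the 127 children who took the PORT twice, and the proportion in psychotherapy differs markedly between the intact-family group and the divorced group.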
PORT and BPS Normative Data, 1997-2002
PORT Normative Data
(1997-2002), n=127
Sex: 61 females; 66 males
Age: Mean age 7.87; SD=2.101
SES: Low-Middle to Upper-Middle
Race: 92 percent Caucasian; 8 percent all other
BPS Normative Data
(1997-2002), n=93
Sex: 47 females; 46 males
Age: Mean age 7.88; SD=1.473
SES: Low-Middle to Upper-Middle
Race: 92 percent Caucasian; 8 percent all other
Our paper addresses whether PORT concurrent validity data are similar to future validity data. They are.
The paper also addresses the stability of test-retest data on the PORT and BPS over an eight-month interval. The data are quite stable. (Since the paper on which this descriptive summary is based has been submitted for publication, we cannot present the actual tables. Test-retest stability for both tests over an eight-month interval is 97 percent. Instability increases sharply as TDS and IDS values approach zero and one on the PORT, and zero, one, two or three on the BPS.)
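The stability figure and the instability pattern described above can be expressed as a short Python sketch. This is an illustration under stated assumptions, not the authors' scoring procedure: stability is taken to mean the percentage of children whose POC is the same at Test One and Test Two, the cutoffs are read as inclusive limits on the absolute score, and the names LOW_MARGIN_CUTOFF, is_low_margin and stability_rate, along with the test-retest pairs, are hypothetical.

```python
# Score magnitudes at or below which the summary reports sharply increased
# instability. Treating them as inclusive cutoffs on the absolute score is
# an assumption made for this sketch.
LOW_MARGIN_CUTOFF = {"PORT": 1, "BPS": 3}


def is_low_margin(test, score):
    """Flag a score that falls in the range where POC changes are more likely."""
    return abs(score) <= LOW_MARGIN_CUTOFF[test]


def stability_rate(pairs):
    """Percent of children whose POC is the same at Test One and Test Two."""
    same = sum(1 for first, second in pairs if first == second)
    return 100.0 * same / len(pairs)


# Hypothetical (Test One POC, Test Two POC) pairs; illustrative values only.
hypothetical_pairs = [("M", "M"), ("F", "F"), ("M", "F"), ("M", "M")]
print(stability_rate(hypothetical_pairs))  # prints: 75.0
print(is_low_margin("PORT", 1), is_low_margin("BPS", 4))  # prints: True False
```

A flag of this kind marks a retest change as one to examine before deciding whether it reflects an error of measurement or a true change in the measured variable.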
One could probably offer a good argument for any (or all) of these positions. We have presented information on the importance of including in the decision process, along with the others, data from a single-participant reference standard, our attempt to measure specific parental value for a particular child in different family systems.