Comments on IL Simultaneous v. Sequential Lineup Field Test
Ebbe B. Ebbesen
May 2, 2006
The following are some comments, additional data analyses, and theoretical results concerning the IL Pilot Project on Sequential Double Blind Eyewitness Procedures. These comments are in addition to the data analyses and conclusions in the report and its appendix, both of which are already in the public domain (see http://www.chicagopolice.org/IL%20Pilot%20on%20Eyewitness%20ID.pdf for the main report and http://www.chicagopolice.org/Apndx%20to%20IL%20Pilot%20on%20Eyewitness%20ID.pdf for the appendix to the report; the latter includes the main data analyses that our laboratory conducted).
Is the experiment flawed?
Some have suggested (e.g., Wells) that the experimental design used in the pilot project is flawed because it compared eyewitness performance from simultaneous lineups without blind administrators with performance from sequential lineups with blind administrators. There are a number of reasons why this criticism is irrelevant and should not undermine the conclusions from the study.
i. This is because those who are promoting the new approach often claim that every aspect of their new package is critical.
1. In the present case, advocates of the sequential procedure were suggesting that an important component of the improvements that the new procedure would bring comes from the fact that investigators running the lineups would not know who the suspect was.
2. A corollary of this suggestion is that the standard practice suffers both because it presents items simultaneously (and therefore allows relative decisions rather than absolute ones) and because lineup administrators can influence (mostly by displaying unconscious behavioral cues) witnesses to choose the suspect.
3. Notice that the recommendation for the new procedure assumes that BOTH blinding of the lineup administrator AND the sequential presentation are necessary to see significant improvement in eyewitness performance.
ii. As a result, we can’t make the new package simpler so that the old and new differ in only one respect (e.g., sequential without blind to compare to simultaneous without blind). Had we done this, critics would have complained the package was incomplete and didn’t provide a fair test of the suggested reform.
iii. Similarly, we cannot change the “standard” method by adding features that are claimed to be an essential part of the new process so that the experiment examines just the effect of one aspect of the new procedure, e.g., blind simultaneous with blind sequential. Were we to do this, we would not know how well the old compares to the new because the old, standard procedure was never assessed.
i. Notice that a critic of the results could always say, but the reason the results turned out the way they did is because “a” was included.
ii. That is, the change from old to new was due to “a” and not to a+b+c. We wouldn’t know after the results were obtained.
iii. What we do know is that the package of a+b+c produces a different or the same result as the old procedure.
i. It is worth noting that the failure of the field results to confirm a generalization based on laboratory research is consistent with a position that I and a few colleagues (e.g., Konecni, Yuille, Egeth) have been making since the mid 1980s.
ii. In Ebbesen and Konecni, 1989, we argued that many of the conclusions reached by those testifying in court about eyewitness identification could not be fairly generalized because the research methods and results were incomplete. In particular, the studies were typically conducted in laboratory settings in which the motivations of the subjects, the viewing conditions, the selection of subjects, and the methods used to measure memory might not simulate the conditions typical of actual crimes. Appropriate field trials were not done and, as a result, key processes (e.g., the fact that witnesses, prosecutors, and investigators select witnesses and cases in the legal system) were not being duplicated in the laboratory.
i. The blinding procedure itself might have been different for simultaneous and sequential. It is likely that a different number of investigators are required for the two procedures and as a result, the protocols would necessarily have been different.
ii. The sequential procedure could have been varied in a host of different ways, and a critic could have “explained away” results by saying that we should have run the sequential procedure differently than the way we chose to do it.
1. Critics could argue that we should have given different instructions to witnesses and/or to the investigators. They could have claimed that our instructions were not what they should have been (despite the fact that there are no specifics regarding the precise wording of instructions that should be used, e.g., whether the witness will be allowed to see all items even after they select one, whether they will be allowed to look at all of them together after looking at each one alone, whether they should withhold their decision until they have seen all of them as in the procedure that was used in the Steblay study in MN – a procedure that has never been studied in laboratory simulation research), and so on.
2. Critics could argue that we ran our sequential protocol incorrectly because we did not tell witnesses that they would be looking at a large number of alternatives rather than just the number in the lineup. This was originally thought to be an essential component of the sequential procedure because it would prevent witnesses from thinking that the last item in the lineup was their last chance to pick someone.
3. Critics could argue that the results were due to the method used to select fillers.
4. Etc.
i. In research on feedback effects, witnesses who correctly ID the suspect are never provided with accurate feedback. Only witnesses who have viewed blank lineups are studied.
ii. In addition, the videotapes of simulated crimes shown to college students in these studies generally produce no or very weak memory for the culprit (witnesses view a grainy videotape of a culprit in which they barely see his face for a few seconds while he is running around). This research never examines results from witnesses who may have gotten a good look at the suspect and formed a strong memory of his appearance.
i. When asked why nine during discussion after a presentation to the committee, Wells’s response was that nine is a good number. It is not too large and not too small. He said that the law of large numbers, for which there is considerable support both in psychology and economics, suggests that differences between large numbers are psychologically smaller than differences between smaller numbers. Therefore, 3-4 would be too little and 15 would be too many. So nine is about right.
Administrator influence as an explanation for the findings
One alternative explanation for the findings, favored by critics of the IL experiment, is that the failure to use blind administrators in the simultaneous lineup allowed the administrators to “influence” which item the witnesses chose from the lineups. Such influence would cause the witnesses to choose more suspects and fewer foils. The administrator would (mostly unconsciously) “steer” the witness to what the administrators believed was the correct choice, namely, the suspect. This position assumes that administrators “leak” behavioral cues (or if the behavior is conscious, that the administrator simply decides to do or say something informative) that are diagnostic of their knowledge about who is and is not the suspect. That is, the cues contain information that points to the correct suspect. This explanation also assumes that the witnesses (consciously or unconsciously) attend to these cues in a way that allows them to detect the information about who is and is not the suspect. Finally, it assumes that once they notice the relevant cues, the witnesses will tend to alter their decisions in the direction of the suspect and away from fillers.
A corollary of this explanation assumes that the extra information (behavioral cues provided by the administrator) about who the suspect is will also supply some type of consensual validation for the witnesses’ choices. If witnesses pick the same person that they believe the administrator “knows” is the suspect, the witnesses should feel more confident in their decisions. After all, they are agreeing with a person who knows who committed the crime. If, on the other hand, witnesses choose a filler, the fact that they are disagreeing with someone who knows who the guilty person is should lower their confidence.
My position is not that critics are wrong in their explanation, but that they have no idea whether it is right. In addition, the critics have presented no evidence to support this explanation over other equally, if not more, reasonable explanations. The following are tests of the validity of this explanation using results from the IL study. These tests are designed to determine whether implications of the “administrator influence hypothesis” are consistent with results from the IL experiment.
If the “administrator influence” explanation for the higher filler choice rate in sequential than in simultaneous lineups is correct, one might expect investigator influence to increase as witnesses’ memories of the culprit become weaker. The weaker the memory of the culprit, the more the witness might look to other sources of information about who to pick. Alternatively, the stronger the memory, the less the witness might be swayed by someone else. While we don’t have a direct measure of memory strength, we can infer that various conditions of exposure and testing might increase or decrease the witnesses’ memory of the culprit. For example, we could predict that as the duration of exposure to the culprit increases, memory for the culprit also increases (in general). Alternatively, memory for culprits who are the same race as the witness might be stronger than memory for culprits who are of a different race than the witness. While there was not enough information in the files to assess duration of exposure, we were able to code the race of the suspect and the race of the witness for almost all of the ID attempts. If it is true that administrators influence witnesses by leaking cues, we might expect this influence to be strongest when the witnesses are less sure about which person is actually the suspect, namely, when other-race identifications are being made (and they “all look alike”). Table 1 shows the rate of suspect choices and foil (filler) choices (one can compute the no choice percentage by subtracting these from 100%) as a function of the type of lineup (photo v. physical or live), lineup procedure (simultaneous v. sequential) and whether the witness and suspect (and therefore fillers) were of the same or a different race.
If this form of the administrator hypothesis is correct, we should expect to see more suspect and fewer filler choices for other-race IDs than for same-race IDs in simultaneous but not in sequential lineups (because the administrators were blind in the latter procedure). The results in Table 1 are inconsistent with this prediction. First, we can see that the suspect choice rates were much lower for other-race IDs than same-race IDs in all but the simultaneous, physical lineups. Second, foil choice rates were not higher when the witness and suspect were in different racial categories.[1] Third, the difference between same- and other-race suspect choice rates should be smaller for simultaneous than sequential lineups (because more witnesses in the simultaneous lineup who would otherwise not choose the other-race suspect were induced to do so). For photo lineups the same-other race difference was (53.7-28 =) 25.7% for sequential lineups and it was (64-26.5 =) 37.5% for simultaneous lineups. Thus, the difference was actually larger, not smaller, in simultaneous lineups. For physical lineups, on the other hand, the sequential same-other race difference was (50-21.2 =) 28.8% and the simultaneous difference was (80.6-83.3 =) -2.7%. Thus, the data for live lineups appear inconsistent with the results from photo lineups. The opposite prediction should apply to foil choices. That is, the rate of foil choices should be that much lower in simultaneous lineups when other-race IDs are being made than when same-race IDs are being made. For photo lineups the same-other foil choice rate difference was (9.8-8 =) 1.8% for sequential lineups and it was (0-2.9 =) -2.9% for simultaneous lineups. But for live lineups, the difference was 13.2% for sequential lineups and 0% for simultaneous lineups. Again, the results seem inconsistent. At this point, we do not know why these inconsistencies exist.
Nevertheless, it is clear that the pattern predicted from the administrator leakage/influence hypothesis does not appear to be supported.
Table 1. Frequency of Suspect and Filler Choices as a Function of Lineup Type and Lineup Procedure and Racial Similarity of the Witness and Suspect.
| Lineup Type | Lineup Procedure | Racial Similarity (a) | % Suspect Choices | % Foil Choices |
|---|---|---|---|---|
| Photo | Simultaneous | Other | 26.5 | 2.9 |
| Photo | Simultaneous | Same | 64.0 | 0.0 |
| Photo | Sequential | Other | 28.0 | 8.0 |
| Photo | Sequential | Same | 53.7 | 9.8 |
| Physical | Simultaneous | Other | 83.3 | 0.0 |
| Physical | Simultaneous | Same | 80.6 | 0.0 |
| Physical | Sequential | Other | 21.2 | 0.0 |
| Physical | Sequential | Same | 50.0 | 13.2 |
a. This refers to whether the witness and the suspect (and therefore all of the fillers) were in the same or different racial categories.
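The same-minus-other comparisons discussed for Table 1 are simple subtractions, and they can be recomputed directly from the table. The following is a minimal sketch, not part of the original analysis: the dictionary layout and the function names `same_minus_other` and `no_choice_percent` are illustrative assumptions, and the only inputs are the percentages printed in Table 1.

```python
# Percentages transcribed from Table 1:
# (lineup type, procedure, racial similarity) -> (% suspect choices, % foil choices).
TABLE_1 = {
    ("Photo", "Simultaneous", "Other"): (26.5, 2.9),
    ("Photo", "Simultaneous", "Same"): (64.0, 0.0),
    ("Photo", "Sequential", "Other"): (28.0, 8.0),
    ("Photo", "Sequential", "Same"): (53.7, 9.8),
    ("Physical", "Simultaneous", "Other"): (83.3, 0.0),
    ("Physical", "Simultaneous", "Same"): (80.6, 0.0),
    ("Physical", "Sequential", "Other"): (21.2, 0.0),
    ("Physical", "Sequential", "Same"): (50.0, 13.2),
}


def same_minus_other(lineup_type, procedure):
    """Return (suspect difference, foil difference): same-race minus other-race."""
    same = TABLE_1[(lineup_type, procedure, "Same")]
    other = TABLE_1[(lineup_type, procedure, "Other")]
    return (round(same[0] - other[0], 1), round(same[1] - other[1], 1))


def no_choice_percent(lineup_type, procedure, similarity):
    """No-choice rate implied by the table: 100 minus the suspect and foil rates."""
    suspect, foil = TABLE_1[(lineup_type, procedure, similarity)]
    return round(100.0 - suspect - foil, 1)


if __name__ == "__main__":
    for lineup_type in ("Photo", "Physical"):
        for procedure in ("Simultaneous", "Sequential"):
            print(lineup_type, procedure, same_minus_other(lineup_type, procedure))
```

Under the administrator influence hypothesis, the suspect difference would be expected to be smaller for simultaneous than for sequential lineups within each lineup type; the printed pairs allow that comparison to be read off row by row.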
If the administrator influence explanation is correct, one might expect such influence to be less when the surrounding context makes such influence more difficult or less likely because of the presence of others, e.g., with a physical lineup rather than with a photo lineup. It seems reasonable to assume that multiple investigators and prosecutors are more likely to be present at live than photo lineups. As a result, investigators might not be in as good physical positions for their behavioral cues to be monitored by witnesses (assuming that they give off such cues on a regular basis). In addition, the layout of the rooms used in live lineups will generally not place the administrator directly in front of the witnesses, as seems more likely in the case of photo lineups. Results are inconsistent with this view, however. Table 2 presents the suspect and foil choice rates as a function of lineup type and lineup procedure. Examination of the results in Table 2 shows that, if anything, suspect choice rates were higher in physical lineups than in photo lineups. This means that no choice rates were highest when they should have been lowest (simultaneous photo lineups).
Table 2. Frequency of Suspect and Filler Choices as a Function of Lineup Type and Lineup Procedure.
| Lineup Procedure | Lineup Type | % Suspect Choices | % Foil Choices | % No Choices |
|---|---|---|---|---|
| Simultaneous | Photo | 52.6 | 1.3 | 46.1 |
| Sequential | Photo | 43.8 | 9.4 | 46.8 |
| Simultaneous | Physical | 81.8 | 0.0 | 18.2 |
| Sequential | Physical | 46.4 | 5.4 | 41.0 |
If, as critics suggest, investigators in the simultaneous procedure were “consciously or unconsciously” suggesting which of the alternatives to pick (because they were not blind), we might expect the witnesses to be much more confident in their choices in the simultaneous lineup procedure than in the sequential procedure. After all, in the simultaneous procedure, the witnesses’ choices would be “reinforced” by the pre- and/or post-choice responses of the investigators who knew which alternative was the suspect. “Good, you picked our suspect” might be a response provided by those who were not blind, or suggestions might be made prior to the choice as to who the suspect was (“We all know it is number 3.”).
Table 4a shows the number of witnesses who expressed “high”, “moderate”, and “low” confidence broken down by lineup procedure, and Table 4b shows the same result as percentages within each lineup procedure.
Table 4a. Number of Witnesses Viewing Simultaneous and Sequential Lineups Who Expressed High, Moderate, and Low Confidence in Their Responses (a)
| Lineup Procedure | High Confidence | Moderate Confidence | Low Confidence | Total |
|---|---|---|---|---|
| Simultaneous (Not Blind) | 101 | 18 | 10 | 129 |
| Sequential (Blind) | 158 | 27 | 20 | 205 |
| Total | 259 | 45 | 30 | 334 |
a. These results are for the subset of witness ID attempts for which confidence estimates were available.
Table 4b. Percent of Witnesses Viewing Simultaneous and Sequential Lineups Who Expressed High, Moderate, and Low Confidence in Their Responses
| Lineup Procedure | % High Confidence | % Moderate Confidence | % Low Confidence | Total (N) |
|---|---|---|---|---|
| Simultaneous (Not Blind) | 78.29 | 13.95 | 7.75 | 129 |
| Sequential (Blind) | 77.07 | 13.17 | 9.76 | 205 |
| Total (N) | 259 | 45 | 30 | 334 |
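The percentages in Table 4b follow from the counts in Table 4a by dividing each cell by its row total. The sketch below recomputes them and, as an illustration only, also computes a Pearson chi-square statistic for the 2 x 3 table. The chi-square step is an assumption added here as one conventional way to check whether two confidence distributions differ; it is not an analysis reported in the original comments, and the function names are hypothetical.

```python
# Counts transcribed from Table 4a: procedure -> (high, moderate, low).
COUNTS = {
    "Simultaneous (Not Blind)": (101, 18, 10),
    "Sequential (Blind)": (158, 27, 20),
}


def row_percents(counts):
    """Convert one row of counts to percentages of that row's total."""
    total = sum(counts)
    return tuple(round(100.0 * c / total, 2) for c in counts)


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows.

    Illustrative addition: expected counts are (row total * column total) / N.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    for procedure, counts in COUNTS.items():
        print(procedure, row_percents(counts))
    print("chi-square:", round(chi_square(list(COUNTS.values())), 3))
```

With two rows and three columns the statistic has 2 degrees of freedom, for which the conventional .05 critical value is about 5.99; a statistic well below that value would be in line with the observation that the two confidence distributions are nearly identical.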
The results in Tables 4a and 4b are inconsistent with the idea that investigators influenced the witnesses’ choices to an extent that made them feel more confident in those choices. We can see that the percentage of highly confident witnesses was virtually identical for the two lineup procedures despite the fact that those who administered the simultaneous lineups knew who the suspect was.
We can make the above argument even stronger by noting that if the administrator was leaking cues to pick the suspect (and not the fillers) during the simultaneous lineups, only those witnesses who picked the suspect would have consensual validation of their choices. Those who picked the fillers would actually be disagreeing with the administrator’s influence attempt. This reasoning predicts that the witnesses viewing the simultaneous lineup who chose the suspect should be more confident in those choices than witnesses who chose the suspect from a sequential lineup. In contrast, those who chose the fillers from a simultaneous lineup should be less confident than those who chose fillers from a sequential lineup. We analyzed the percent of witnesses who expressed high confidence (see next two paragraphs) for those who chose the suspect and for those who chose fillers. For the simultaneous lineup, 69 out of 87 (or 79.3% of the) witnesses who chose the suspect did so with high confidence. For sequential lineups, 118 out of 140 (or 84.3% of the) witnesses who chose the suspect did so with high confidence. Thus, if anything, witnesses were more likely to be confident in their suspect choices in sequential/blind lineups than in simultaneous lineups.
When the filler choices were examined, 66.7% of the filler choices made to simultaneous lineups and 21.4% of those made to sequential lineups were made with high confidence. While the Ns are small, if anything, the trend is opposite to what the investigator influence explanation predicts. Those who chose a filler from a simultaneous lineup were more confident even though their choices should have disagreed with the influence attempts of the administrator (assuming they existed).
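The high-confidence shares quoted above can be recomputed from the counts reported in Table 5 for the witnesses whose confidence was coded. This is a minimal sketch under that assumption; the names `CHOICES` and `high_confidence_share` are illustrative and do not come from the original analysis.

```python
# Counts transcribed from Table 5, restricted to rows with a coded confidence
# level: procedure -> confidence -> (suspect choices, filler choices).
CHOICES = {
    "Simultaneous": {"High": (69, 4), "Moderate": (14, 2), "Low": (4, 0)},
    "Sequential": {"High": (118, 3), "Moderate": (19, 7), "Low": (3, 4)},
}
SUSPECT, FILLER = 0, 1


def high_confidence_share(procedure, choice_index):
    """Percent of a given kind of choice that was made with high confidence."""
    by_level = CHOICES[procedure]
    total = sum(counts[choice_index] for counts in by_level.values())
    return round(100.0 * by_level["High"][choice_index] / total, 1)


if __name__ == "__main__":
    for procedure in CHOICES:
        print(
            procedure,
            "suspect:", high_confidence_share(procedure, SUSPECT),
            "filler:", high_confidence_share(procedure, FILLER),
        )
```

The filler denominators are small (6 simultaneous and 14 sequential filler choices with coded confidence), so the filler percentages should be read as a trend rather than a precise estimate.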
One might argue that this analysis is flawed because the coding of confidence is invalid. Garbage in, garbage out. To determine whether the results from this study support this view, we examined the extent to which self-reported confidence predicted which witnesses would make mistakes. Although many psychologists who have testified for the defense in criminal cases argue that witness confidence is a poor predictor of witness accuracy, I have argued that this claim is false when the relationship is correctly measured. If I am correct that there is some relationship between confidence and accuracy among actual witnesses to real crimes and the confidence coding that we did was valid, we might expect less confident witnesses to be more likely to make mistakes.[2]
Table 5 shows the choice results for each procedure (for all witness/suspect lineups) broken down into low (“I think that’s him, but I can’t be positive.”, “He looks like the guy, but I’m not positive.”, “#1 could have been the passenger.”, “Only 45% sure.”), moderate (“Yes, that looks like the guy.”, “Looks like him, he was husky like that.”, “80-90% sure”), and high (“That’s him. I’m certain.”, “100% sure.”, “100% absolutely positive.”, “I’m positive that’s the one that shot
Table 5.
| Procedure | Confidence | Number of Suspect Choices | Number of Filler Choices | Number of No Choices | % Suspect | % Filler |
|---|---|---|---|---|---|---|
| Simultaneous | NA | 177 | 2 | 109 | 61.46 | 0.69 |
| Sequential | NA | 29 | 10 | 76 | 25.22 | 8.70 |
| Simultaneous | High | 69 | 4 | 28 | 68.32 | 3.96 |
| Simultaneous | Moderate | 14 | 2 | 1 | 82.35 | 11.76 |
| Simultaneous | Low | 4 | 0 | 6 | 40.00 | 0.00 |
| Sequential | High | 118 | 3 | 37 | 74.68 | 1.90 |
| Sequential | Moderate | 19 | 7 | 1 | 70.37 | 25.93 |
| Sequential | Low | 3 | 4 | 13 | 15.00 | 20.00 |
The results in Table 5 show that (with the exception of the lowest confidence choices in the simultaneous procedure with an N = 4), the likelihood that a positive ID of someone would be a filler increased as the confidence that witnesses expressed in their choices decreased. Thus, filler choices accompanied by expressions of high confidence occurred in about 4% of the simultaneous and 2% of the sequential lineups. On the other hand, filler choices accompanied by expressions of less than high confidence (moderate plus low) occurred in about 7.4% of the simultaneous and about 23.4% of the sequential lineups.
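The pooled figures in the preceding paragraph combine the moderate and low confidence rows of Table 5 before dividing. A minimal sketch of that pooling follows; the names `ROWS` and `filler_rate` are illustrative assumptions, and the inputs are only the counts printed in Table 5.

```python
# Counts transcribed from Table 5 for rows with a coded confidence level:
# (procedure, confidence) -> (suspect choices, filler choices, no choices).
ROWS = {
    ("Simultaneous", "High"): (69, 4, 28),
    ("Simultaneous", "Moderate"): (14, 2, 1),
    ("Simultaneous", "Low"): (4, 0, 6),
    ("Sequential", "High"): (118, 3, 37),
    ("Sequential", "Moderate"): (19, 7, 1),
    ("Sequential", "Low"): (3, 4, 13),
}


def filler_rate(procedure, levels):
    """Percent of all lineups at the given confidence levels that ended in a filler choice."""
    selected = [ROWS[(procedure, level)] for level in levels]
    fillers = sum(row[1] for row in selected)
    lineups = sum(sum(row) for row in selected)
    return round(100.0 * fillers / lineups, 1)


if __name__ == "__main__":
    for procedure in ("Simultaneous", "Sequential"):
        print(
            procedure,
            "high:", filler_rate(procedure, ["High"]),
            "moderate+low:", filler_rate(procedure, ["Moderate", "Low"]),
        )
```

Pooling the counts first, rather than averaging the two row percentages, weights each confidence level by the number of lineups it contains.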
We can analyze the data in Table 5 slightly differently. We can ask what percent of the positive identifications made by the witnesses were of suspects as opposed to the fillers at each confidence level. Table 6 shows these results. We can see that, in general, the higher the confidence level, the more witnesses tended to identify suspects rather than known innocents (fillers). These results tend to validate the confidence analyses presented earlier and strengthen the conclusion that administrators were not influencing witnesses more in simultaneous than sequential lineups.
Table 6. Percent of Positive IDs that were Filler Choices as a Function of Lineup Procedure and Category of the Witness’s Expressed Confidence in the ID
| Procedure | Confidence | % of Positive IDs that were a (Known) Error |
|---|---|---|
| Simultaneous (N = 178) | NA | 1.12 |
| Sequential (N = 39) | NA | 25.64 |
| Simultaneous (N = 73) | High | 5.48 |
| Simultaneous (N = 16) | Moderate | 12.5 |
| Simultaneous (N = 4) | Low | 0.00 |
| Sequential (N = 121) | High | 2.48 |
| Sequential (N = 26) | Moderate | 26.92 |
| Sequential (N = 7) | Low | 57.14 |
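Each entry in Table 6 is the number of filler choices divided by the number of positive IDs (suspect plus filler choices) at that confidence level. The sketch below recomputes the rows with a coded confidence level from the Table 5 counts; the names `POSITIVE_IDS` and `known_error_rate` are illustrative assumptions rather than anything taken from the original analysis.

```python
# Suspect and filler counts transcribed from Table 5 (coded-confidence rows only):
# (procedure, confidence) -> (suspect choices, filler choices).
POSITIVE_IDS = {
    ("Simultaneous", "High"): (69, 4),
    ("Simultaneous", "Moderate"): (14, 2),
    ("Simultaneous", "Low"): (4, 0),
    ("Sequential", "High"): (118, 3),
    ("Sequential", "Moderate"): (19, 7),
    ("Sequential", "Low"): (3, 4),
}


def known_error_rate(procedure, confidence):
    """Return (N positive IDs, percent of those IDs that were filler choices)."""
    suspects, fillers = POSITIVE_IDS[(procedure, confidence)]
    n_positive = suspects + fillers
    return (n_positive, round(100.0 * fillers / n_positive, 2))


if __name__ == "__main__":
    for key in POSITIVE_IDS:
        print(key, known_error_rate(*key))
```

No-choice responses are excluded from the denominator here, which is what distinguishes this measure from the % Filler column of Table 5.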
The person making the identification could have been a victim of the criminal act or simply a witness to the action. We analyzed whether the status of the witness had any effect on choice rates. Table 7 shows the choice rates for all lineups as a function of the status of the witness. We can see that when all of the lineups for which the information was available are examined, there was no effect of witness status on choice rates.
Table 7. Number and Percent of Witnesses Choosing Suspects, Fillers, or No One as a Function of Witness Status: Witness to Crime or Victim of Crime
| Status | # Suspects | # Fillers | # No Choice | % Suspect | % Filler |
|---|---|---|---|---|---|
| NA | 6 | 0 | 14 | 30.00 | 0 |
| Victim | 242 | 20 | 152 | 58.45 | 4.83 |
| Witness | 186 | 12 | 105 | 61.39 | 3.96 |
We can also examine whether witness status played a different role in simultaneous than sequential lineups. Because the consequences of making a choice are different for the two types of witnesses, we might expect those who simply witnessed the crime to be less likely to “want” to convict someone, just anyone. This tendency might be something that the investigators can take advantage of when they know who the suspect is in the lineup. If so, we might expect victims to be less likely to pick foils and more likely to pick suspects, but only when presented with the simultaneous procedure in which the investigators knew who the suspect was.
The results in Table 8 are inconsistent with this view. As can be seen, victim and witness choice rates were identical for both lineup procedures. Stated differently, the effect of lineup procedure on choice rates was the same for witnesses and victims.
Table 8. Number and Percent of Witnesses Choosing Suspects, Fillers, or No One as a Function of Lineup Procedure and Witness Status: Witness to Crime or Victim of Crime
|
Procedure |
Status |
# Suspects |
# Fillers |