In the mid-1960s, Hall and Van de Castle carried out a major sleep laboratory study involving 15 participants and 560 planned awakenings in order to understand the many potential problems that might constrain the usefulness of dream reports collected in the laboratory. How many nights does it take for participants to adapt to the laboratory setting? Does more than one awakening per night change dream content? Does dream content change in any way throughout the night? Even more importantly for purposes of this paper, the study also tried to resolve another important question, the possibility of major differences between dreams collected at home and in the laboratory from the same person.
The results of the investigation were privately published as a small monograph that was only sent to the handful of members in the Association for the Psychophysiological Study of Sleep, the organization for sleep and dream researchers at the time (Hall, 1966). Most of the results therefore received very little subsequent attention. Unfortunately, all that most investigators know about the study is that dreams collected from everyday recall at home supposedly were more "dramatic" than those collected in the laboratory. But were the home and laboratory dreams in this study really that different? Or did the statistical convention that dominates psychology, the emphasis on statistical "significance," lead everyone astray?
To answer these questions, this paper presents three new analyses of Hall and Van de Castle's original data by examining the magnitude of the differences -- the "effect size" -- when their codings of dreams collected at home and in the laboratory from the same participants are compared. The new findings on effect sizes presented here reinforce the old adage that some differences don't make much of a difference even if they are statistically significant, thereby lending support to those who argue that significance testing has been a "disaster" for psychology and should be abandoned (Hunter, 1997; Schmidt, 1996).
The serendipitous discovery of the REM/NREM sleep cycle and its many physiological correlates triggered a new era of sleep and dream research (Aserinsky & Kleitman, 1953; Dement, 1955; Dement & Kleitman, 1957a, 1957b). By the early 1960s it was clear that the sleep laboratory was an excellent place to learn more about the process of dreaming, but it was not certain that the laboratory was a good place to study dream content. For one thing, the adjustment to a new sleep setting might have a larger effect on dream content than it did on the physiological correlates of the sleep cycle or the process of dreaming. For another, the imposing nature of the laboratory setting -- including the large EEG machine, the attachment of electrodes to the participant's head, and the numerous awakenings throughout the night -- might have an inhibitory effect on dream content. Moreover, there might be differences in dream content from early in the night to late in the sleep period, which would have important implications for the representative sampling of dream content.
In fact, some findings from a small study of 12 male participants by Domhoff and Kamiya (1964a) comparing 120 home and 120 laboratory dream reports were claimed to be evidence for such an inhibitory effect. First, laboratory reports contained fewer aggressive actions and less sexuality. Second, the fact that the laboratory setting appeared in as many as 20% to 30% of laboratory dreams in that study as well as two other studies also was interpreted as evidence for a possible inhibitory effect (Dement, Kahn, & Roffwarg, 1965; Domhoff & Kamiya, 1964b; Whitman, Pierce, Maas, & Baldridge, 1961).
Such were the problems and issues that Hall set out to resolve when he received a large National Institute of Mental Health grant and hired Van de Castle as the project director. The study was conducted in a large house in a quiet residential neighborhood in Miami where participants could have their own sleep quarters for a month and report dreams in the least threatening atmosphere possible, thereby minimizing any inhibitory effects from the usual laboratory setting in a science or medical building.
To gain familiarity with the equipment and a subjective sense of what it was like to be a subject in the study, Hall and Van de Castle began with a study of the sleep and dreams of the four psychologists and graduate students working on the project, including themselves. They then turned to a more systematic study of the dreams of 11 young male participants between the ages of 19 and 25.
Since the aim of the study was to determine the conditions that led to the most representative sample of dream reports possible in a laboratory setting, seven adjustment nights were provided for the young male participants before the formal collection of reports began. Time to fall asleep was noted. In a carefully balanced design, only one awakening occurred on some nights, multiple awakenings on others. Dreams were collected by means of tape-recorded reports from the first four REM periods of the night under both the single-awakening and multiple-awakening schedules. Participants were asked after each awakening to estimate how long they had been dreaming. They also were asked to rate the clarity of their recall, the vividness of the dream, the emotional intensity of the dream, and whether the dream took place in the past or present. Unplanned "spontaneous" awakenings were noted and any dream reports from them were transcribed for comparison with dream reports from scheduled awakenings. Finally, participants wrote down any dreams they remembered at home for a two-week period. Some wrote their dream reports before their stay in the laboratory, some before and during their stay, and some during and after their stay. The goal was to have at least 15 dreams written down at home by each person, but one young adult participant wrote down only 11 and another didn't write down any.
The results from the study were generally reassuring and useful for investigators who want to conduct studies of dream content in a laboratory setting. First, participants had very little difficulty adjusting to the laboratory situation. Beginning with the first night the EEG machine was turned on, which was the third adjustment night, it took them no longer to fall asleep than it did on later nights (Hall, 1966: 38). In addition, the small percentage of dream reports including allusions to the experimental situation, ranging from 7.2% to 13.5% with the young adult males, did not vary from the fourth adjustment night, when they were awakened to report a dream for the first time, to the end of their laboratory visits (Hall, 1966: 32); this range is much lower and narrower than the 20% to 30% reported in previous studies (Dement, Kahn, & Roffwarg, 1965; Domhoff & Kamiya, 1964b; Whitman, Pierce, Maas, & Baldridge, 1961). Contrary to expectations based on a study by Dement and Kleitman (1957b), which reported that five participants could correctly distinguish between awakenings after five or 15 minutes of REM dreaming, there was no overall correlation between the amount of REM time before an awakening and participants' estimates of how long they had been dreaming (Hall, 1966: 10-11, 38-39). The idea that the amount of preceding REM time correlates highly with the subjective sense of how long a dream has been going on is repeated as a solid finding in many articles and textbooks, but it is in fact a tentative finding from a small sample that Hall and Van de Castle could not replicate.
As far as dream content is concerned, the most important result was the general similarity of the dream reports from single and multiple awakenings, and from all REM periods of the night, on the several Hall/Van de Castle categories that were used. The rating scales completed by the dreamer at the time of awakening likewise showed no differences, except that participants reported better recall and greater clarity for each successive awakening on nights when there were multiple awakenings, which could be due to a practice effect. Verdone's (1965) finding, based on five participants, of more references to the past in later REM periods was not replicated (Hall, 1966: 27).
Moreover, the 57 dream reports from 11 participants who had spontaneous awakenings did not differ from those collected from their single awakenings, the only laboratory dream reports with which they were compared; nor did the 37 dream reports collected from 12 participants during awakenings on adjustment nights differ from those collected through single awakenings (Hall, 1966: 25-26). Generally speaking, then, these laboratory results suggest that only a short adjustment period is needed if people are sleeping in the laboratory on a regular basis, and that it is possible to collect a representative sample of dream life at any hour of the sleep period. These findings make it possible to rely on awakenings late in the sleep period when REM periods are longer and people are more easily awakened, and more cooperative once awakened.
Of all the comparisons made in the study, the only ones showing several statistically significant differences were those concerning reports written down at home compared to those from the laboratory. These results became controversial in the late 1960s and created what is in retrospect unnecessary friction.
Differences Between Laboratory and Home Dream Reports
Perhaps due to the pervasive influence of Freud's theory of dreams in the 1950s and 1960s, many investigators expected the dreams collected in the sleep laboratory to be highly dramatic and riven with impulse, so there was surprise when there seemed to be a blandness and mundanity to the dream reports they were collecting (e.g., Snyder, 1970). The suspicion therefore arose that some sort of "inhibition" or "defense" must be working to produce bland dreams in the laboratory, as seen in the interpretation that Domhoff and Kamiya (1964a, 1964b) gave to their results. A systematic comparison of home and laboratory collected dream reports from the same participants was therefore a very important component of the Hall and Van de Castle investigation.
Hall and Van de Castle's findings supported impressionistic observations and the smaller study by Domhoff and Kamiya (1964a) in reporting that dreams written down at home differed from those collected in the laboratory in a number of ways. They made 26 comparisons for each of 12 individuals using the nonparametric Wilcoxon matched-pairs signed-ranks test (Siegel, 1956: 75-83). Most of the statistically significant differences concerned aggressions and misfortunes. At the same time, home and laboratory dreams had few differences in types of characters and no differences in the percentage of dreams with at least one "unusual" (bizarre) element. The figure on unusual elements was a mere 10% for both samples, which does not support the finding of more unusual elements in laboratory dream reports by Domhoff and Kamiya (1964a). A later study by Hunt, Ogilvie, Belicki, Belicki, and Atalick (1982) also reported no differences in bizarreness between home and laboratory dream reports.
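For readers unfamiliar with the matched-pairs test named above, the following is a minimal sketch of how its rank sums are computed for one content category. The function name and the scores in the usage note are hypothetical and are not taken from Hall's monograph. The sketch stops at the rank sums and does not look up critical values, which would come from tables such as those in Siegel (1956).

```python
def wilcoxon_signed_rank(home, lab):
    """Return (W_plus, W_minus, T) for paired scores.

    home and lab are equal-length lists, one pair per participant.
    Pairs with no difference are dropped, as is conventional for this test.
    """
    diffs = [h - l for h, l in zip(home, lab) if h != l]

    # Order the indices by absolute difference, smallest first.
    ordered = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))

    # Assign 1-based ranks; tied absolute differences share their average rank.
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(ordered):
        j = i
        while (j + 1 < len(ordered)
               and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]])):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg_rank
        i = j + 1

    # Sum the ranks separately for positive and negative differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)
```

With hypothetical scores such as `wilcoxon_signed_rank([5, 3, 8, 6], [4, 5, 5, 6])`, the tied pair is dropped and the three remaining differences are ranked by size. The smaller of the two rank sums is the statistic that would be compared against a table of critical values.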
The nature of the differences between dream reports under the two conditions is summarized by findings with what Hall termed the "dramatic intensity" index, which he calculated by adding together all aggressions, friendly interactions, sexual activities, successes, failures, good fortunes, and misfortunes and then dividing the total by the average words per dream for each person. This index showed consistent differences between reports written at home and laboratory dreams collected through either single or multiple awakenings for both early and late REM periods (Hall, 1966: 19-21, 23-25).
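The index as described above can be sketched in a few lines. This is an illustration under assumptions rather than Hall's own procedure: the category keys, the function name, and the example numbers are hypothetical, and only the rule itself (sum the seven categories, divide by the participant's average words per dream) comes from the text.

```python
# The seven categories that the text says make up the index.
CATEGORIES = (
    "aggression", "friendliness", "sexuality",
    "success", "failure", "good_fortune", "misfortune",
)


def dramatic_intensity(counts, word_counts):
    """Compute the index for one participant.

    counts: dict mapping each category to its total coded frequency.
    word_counts: list with the length in words of each dream report.
    """
    total_elements = sum(counts[c] for c in CATEGORIES)
    average_words = sum(word_counts) / len(word_counts)
    return total_elements / average_words


# Hypothetical participant: 14 coded elements, reports averaging 140 words.
example_counts = {c: 2 for c in CATEGORIES}
example_index = dramatic_intensity(example_counts, [100, 180])
```

Dividing by average report length is what keeps a participant who simply writes longer reports from scoring as more "dramatic."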
Given these apparent differences, Hall turned his attention to two likely causes, inhibition in the laboratory even under the best of conditions, and selective recall for more dramatic dreams at home. His discussion was informed by additional findings with dreams from one of the psychologists working on the project whose home dreams had been collected under two different conditions, normal everyday recall of dreams that were written down, and dreams spoken into a tape recorder after awakenings by an alarm clock set randomly each night by another person. When these two types of home dream reports turned out to be similar to each other, and different from the same person's dream reports in the laboratory, it tipped the scale in Hall's mind in favor of the laboratory inhibition hypothesis as a more important cause for the differences than selective recall (Hall, 1966: 46).
Unfortunately, Domhoff (1969) amplified Hall's conclusion by claiming that home dreams are "better" if the goal is to learn more about personal concerns and personality through the analysis of dream content. As might be expected, this conclusion did not sit well with those who wanted to understand both the cognitive process of dreaming and the general nature of dream content because it minimized the importance of laboratory studies through the use of the term "better." Since the notion of "better" rested primarily on the claim that the laboratory setting was inevitably inhibitory, it was undercut when Weisz and Foulkes (1970) and Foulkes (1979) provided evidence that any differences were most likely due to selective recall at home. It also was undercut by Zepelin (1972) when he found no differences between 55 home and 55 laboratory dream reports from 12 men between the ages of 27 and 64, and by Heynick and de Jong (1985) when they concluded that home reports from telephone awakenings were more like dreams collected in the laboratory than those written down at home.
But just how large are the statistically significant differences that Hall found? Are they big enough to make any difference in understanding either dreaming as a cognitive process or the emotional preoccupations of those who provide dreams? That question was not an issue in the methodological climate of the 1960s, but the more recent emphasis on "effect sizes" by some statistical experts within psychology makes it one well worth raising, especially because the answer can now be found very easily by entering the original codings into the DreamSAT spreadsheet for Hall/Van de Castle content studies (Schneider & Domhoff, 1999).
Effect Sizes When Home and Laboratory Reports Are Compared
For reasons that are explained by Cohen (1977: Chap. 6) and summarized by Domhoff (1996: Appendix D), Cohen's h statistic is the most useful measure of effect sizes with the percentage data that is used in the Hall and Van de Castle system due to the system's nominal level of measurement and the need to control for the differing length of dream reports. The h statistic employs an arcsine transformation of the percentages to correct for the fact that the parameters of a distribution of percentages cannot be known, which makes the determination of a standard deviation impossible. The transformations necessary to determine h from percentages are easily accomplished through tables in Domhoff (1996: Appendix D) and are built into the DreamSAT, making any computations unnecessary.
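For readers who want to see the transformation rather than rely on tables, Cohen's h is the difference between two arcsine-transformed proportions, where each proportion p is transformed as 2 * arcsin(sqrt(p)). The sketch below assumes proportions expressed on a 0-to-1 scale; the function name is illustrative and is not part of DreamSAT.

```python
import math


def cohens_h(p1, p2):
    """Return Cohen's h for two proportions given on a 0-to-1 scale.

    A positive value means p1 is the larger proportion.
    """
    phi1 = 2 * math.asin(math.sqrt(p1))  # arcsine transformation of p1
    phi2 = 2 * math.asin(math.sqrt(p2))  # arcsine transformation of p2
    return phi1 - phi2
```

A percentage such as 50% would be entered as 0.50. Because of the transformation, the same difference in percentage points yields a larger h near 0% or 100% than it does near 50%.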
According to Cohen (1977: 184-185), a rough rule of thumb would start with the assumption that h=.20 is a "small" effect size corresponding to a correlation coefficient of .10; that h=.50 is a "medium" effect size equivalent to an r of .25; and that h=.80 is a "large" effect size implying an r of .37-.39. However, he also stresses that judgments about the relative importance of effect sizes must be determined by experience within each area of investigation. Based on work to date applying the h statistic to dream content, it seems likely that effect sizes between zero and .20 should be considered "small," those between .21 and .40 "medium," and those over .40 "large" (Domhoff, 1996, 1999).
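The working thresholds for dream content given above can be written as a small helper. This is only a sketch: the function name is hypothetical, and rounding h to two decimals before comparing is an assumption made here so that the ".20" and ".21" boundaries in the text do not leave a gap.

```python
def classify_dream_h(h):
    """Label an effect size using the dream-content thresholds in the text.

    0 to .20 is "small", .21 to .40 is "medium", and over .40 is "large".
    The sign of h is ignored; only its magnitude is classified.
    """
    size = round(abs(h), 2)  # assumption: compare at two-decimal precision
    if size <= 0.20:
        return "small"
    if size <= 0.40:
        return "medium"
    return "large"
```

Note that these cut points are deliberately lower than Cohen's general rule of thumb, so an h of .43 counts as "large" here although it would fall short of "medium" by Cohen's .50 benchmark.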
The first reanalysis of the Hall and Van de Castle data using effect sizes is based on eight young adult males who provided at least 15 home dreams and at least 34 laboratory dreams. When there were more than 15 home dream reports, the first 15 were used. When there were more than 34 laboratory dream reports, reports from adjustment nights and spontaneous awakenings were eliminated first, and then an equal number of single-awakening and multiple-awakening reports were removed from the sample if more reports had to be discarded. The result was a group sample of 120 dream reports written down at home and 272 dream reports transcribed from tape-recorded reports in the laboratory.
The findings from this reanalysis are presented in Table 1, which includes statistical significance levels and effect sizes. As can be seen, the effect sizes are generally small or medium by the standard of previous Hall/Van de Castle content studies even for the statistically significant differences, except in the case of the physical aggression percent, where the h of .43 is large. It is also noteworthy that three of the four statistically significant differences involve aggression. In addition to the higher physical aggression percent, there is also a higher rate of aggressions per character (A/C index) and a higher percentage of dream reports with at least one aggression. Since the number of dreams involved in this analysis is very large, especially in comparison to most published dream studies, it is likely that these overall findings on effect size are accurate.
As can be seen for the seven "at least one" categories at the bottom of Table 1, laboratory dream reports are slightly lower on four of the seven categories and much lower on aggression. Only dream reports with at least one friendliness or at least one failure are more frequent in the laboratory sample. This finding in effect reveals the magnitude of differences on Hall's dramatic intensity index. Although he divided the total number of codeable elements in the seven categories by the average number of words per dream for each participant, it is sufficient to divide by the number of dream reports, as is done here, because there were no differences in length between laboratory and home reports.
Inspection of the seven "at least one" categories suggests that the difference Hall found on the dramatic intensity index is created by the large difference in aggression in conjunction with several small differences that are of relatively minor magnitude when they are looked at individually. More exactly, when aggression is put aside there are .36 h "points" to one side of the norms vs. .16 h points on the other. This means that only .20 h points of difference are weighted toward home dream reports after six of the seven categories have been considered, whereas aggression is contributing .28 h points to create an overall difference on the dramatic intensity index of .48.
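The arithmetic behind these h "points" can be laid out step by step, using only the figures given in the paragraph above. The subscript labels are introduced here for illustration and do not appear in the original analysis.

```latex
\begin{align*}
h_{\text{six categories, net}} &= 0.36 - 0.16 = 0.20 \\
h_{\text{aggression}} &= 0.28 \\
h_{\text{dramatic intensity}} &= 0.20 + 0.28 = 0.48
\end{align*}
```

On this accounting, aggression alone supplies more than half of the overall difference of .48.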
Moreover, there is variation from participant to participant in the number of categories in which home dream reports have higher scores. This point is demonstrated in Table 2, which uses all of the home and laboratory dreams from each of the eight participants to determine whether the home or laboratory reports are "higher," "lower," or the "same" for each participant for each of 11 selected categories, with "same" defined as within five percentage points of each other. A summary for each category is provided on the right-hand side of the table and a summary for each participant at the bottom of the table. These summaries show, first, that there is variation from person to person in the degree to which home dreams contain more codeable elements in the various categories than do laboratory dreams. Second, there is also variation from category to category; physical aggression percent, for example, is almost always higher in home dream reports. Conversely, laboratory dream reports generally have the same amount of or more friendliness for most participants.
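The classification rule used for Table 2 can be sketched as follows. The function names and example percentages are hypothetical, and treating a difference of exactly five percentage points as "same" is an assumption, since the text says only "within five percentage points."

```python
from collections import Counter


def compare_home_to_lab(home_pct, lab_pct, tolerance=5.0):
    """Say whether the home percentage is higher, lower, or the same.

    Differences of `tolerance` percentage points or less count as "same"
    (inclusive boundary is an assumption of this sketch).
    """
    difference = home_pct - lab_pct
    if abs(difference) <= tolerance:
        return "same"
    return "higher" if difference > 0 else "lower"


def summarize(pairs):
    """Tally the labels for a list of (home_pct, lab_pct) pairs."""
    return Counter(compare_home_to_lab(home, lab) for home, lab in pairs)


# Hypothetical percentages for one category across three participants.
example_summary = summarize([(40, 30), (22, 25), (10, 31)])
```

Applied across participants for one category, the tally gives the row summaries described above; applied across categories for one participant, it gives the column summaries.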
The second reanalysis concerns the dream reports of the older person who helped tip the scale for Hall in favor of the inhibitory effect of the laboratory in explaining the differences he found. This reanalysis concerns 35 home dream reports and 66 reports collected in the laboratory. It does not include the tape-recorded dreams from alarm-clock awakenings because the codings for them could not be found in the files. Even without the tape-recorded home dream reports, which were reported by Hall (1966: 46) to be similar to other reports of dreams collected at home, the differences for this participant are generally smaller than for the eight young adult males (see Table 3).
There is only one statistically significant difference in coding indicators for this person. Furthermore, every index relating to social interactions shows a very small effect size; it is also worth noting that -- although the difference was not statistically significant -- a slightly higher percentage of the dreams collected in the laboratory had at least one aggression than those written down at home, contrary to the usual expectation. The largest difference -- and the only statistically significant one -- concerns the far greater number of home dream reports with at least one misfortune, with an h of .65. If there were a significant difference between home and laboratory reports on the dramatic intensity index, this large difference on misfortune might suffice in itself to tip the index in favor of dreams written down at home.
The third and final reanalysis concerns one of the other adult males whose dream reports merit a closer look because he had tape-recorded 28 home dreams before he contributed 38 tape-recorded reports from his stay in the laboratory. With the method of reporting held constant across home and laboratory, statistically significant differences are rare, and most of the effect sizes are in the low to medium range even when the differences are significant (see Table 4). This finding supports similar findings by Weisz and Foulkes (1970) and Foulkes (1979) on the importance of holding the method of reporting constant in comparing home and laboratory dream reports.
Another interesting finding for this participant is that there is a higher percentage of dream reports in the laboratory sample with at least one aggression. This finding contradicts the pattern for most of the eight young male participants. However, the difference on physical aggression percent is similar to that for most other participants in being higher in dream reports collected at home, and the h of .36 is at the upper end of the medium range.
It is noteworthy that the magnitude of the effect sizes in the two individual cases is similar to what was found in the group comparison reported in Table 1. This similarity is further evidence that the effect sizes reported in this paper are reliable findings.
The small effect sizes for most categories used in the three reanalyses of the original codings from the Hall and Van de Castle study in 1963-1964 make the concern about any statistically significant differences between home and laboratory collected dreams for Hall/Van de Castle coding categories seem like an unnecessary expenditure of energy that could have been deployed more productively on new empirical studies. Controversy might have been avoided if effect sizes had been part of the statistical armamentarium in the 1960s and 1970s. In fact, this argument seems to be a good example of what Hunter (1997: 3) means when he says that significance testing has been a "disaster" for psychology (cf. Cohen, 1990, 1994; Scarr, 1997; Schmidt, 1996).
The one fairly consistent difference between the two types of samples, in terms of both statistical significance and effect sizes larger than .30, concerns one or another indicator of aggression, especially the physical aggression percent. The fact that the most consistent differences between dreams written down at home and laboratory dream reports relate to aggression is in keeping with Weisz and Foulkes's (1970) finding that only the aggression category distinguished home and laboratory reports in their carefully controlled study. More generally, the differences on aggression fit with findings on the variability of aggression in conjunction with several other factors. First, the variability in aggression between the early teens and young adulthood is the largest difference between the two age groups (Avila-White, Schneider, & Domhoff, 1999). Second, there may be a decline in aggression in old age (Hall & Domhoff, 1963; Zepelin, 1980-1981, 1981), but some results from longitudinal studies make this cross-sectional finding less certain (Domhoff, 1996: Chap. 7). Third, there are large variations from culture to culture in aggression (Domhoff, 1996: Chap. 6). Fourth, there are large individual and gender differences on some measures of aggression in dreams (Domhoff, 1996; Hall & Domhoff, 1963; Paolino, 1964). When the time comes for developing a new theory of dream meaning based on a wide range of systematic studies of dream content, these variations in aggression may prove useful.
However, the differences between home and laboratory dream reports on some aggression indicators are not so important that they constitute an argument for or against using one of the two types of samples. Instead, this finding can be used simply to say that less aggression should be expected in laboratory studies and more aggression in studies using dreams collected from everyday recall. More generally, then, there is no reason to believe that home dreams are "better" than laboratory dreams for content studies using the Hall/Van de Castle coding system, or that home dreams are so different from laboratory dream reports that they cannot be used in systematic studies of dream content. Both types of samples are useful for research purposes. The main problem in most dream content studies is not the source of the dream reports, but the small sample sizes (Domhoff, 1996).
The results of this study very likely generalize to all aspects of dream content, but it nonetheless needs to be emphasized in closing this discussion that the findings refer to categories of the Hall/Van de Castle coding system. There may be aspects of dream content not encompassed by their system that do differ between home and laboratory dreams. Although the lack of differences reported by Foulkes (1979) and Hunt, Ogilvie, Belicki, Belicki, and Atalick (1982) using different coding systems militates against such a conclusion, it does remain a possibility. Moreover, the present study does not use the Hall/Van de Castle coding categories for emotions, so it may be that there are differences in emotions in home and laboratory dreams.
The emphasis on statistically significant differences without regard to effect sizes slowed progress in the study of dream content by creating unnecessary polarities and focusing energy on methodological arguments. The introduction of effect sizes into the study of dream content makes it possible to suggest that the controversy over home and laboratory collected dream reports never should have happened. The emphasis in dream content studies henceforth should be on effect sizes and large samples. Then future dream researchers could focus on testing new ideas using dream reports collected either at home or in the sleep laboratory.
Addendum: A Hall/Van de Castle Comparison of REM and NREM Dream Reports
When Hall and Van de Castle did their large descriptive study of dream content in the laboratory, they did not collect any dream reports from NREM awakenings because NREM dreams were considered minor, and perhaps even memories from REM dreaming, when Hall wrote his grant proposal in 1961. By the mid-1960s, however, there was solid evidence for the existence of NREM dreaming (Foulkes, 1962; Goodenough, Lewis, Shapiro, Jaret, & Sleser, 1965; Kamiya, 1961; Rechtschaffen, Verdone, & Wheaton, 1963; Rechtschaffen & Foulkes, 1964). Following the publication of these studies, the issue became the degree to which REM reports were different from NREM reports in their content. Based on ratings along five-point scales by judges and by the subjects themselves, NREM reports were found to more often contain "thinking" rather than perceptual and physical activities, and to be more realistic, abstract, and focused on daily events than REM reports.
Since none of the REM/NREM comparisons used the Hall/Van de Castle coding system, Hall did his own analysis based on reports generously shared with him by Foulkes and Rechtschaffen (1964). What he found is consistent with later analyses by Antrobus (1983) and Foulkes and Schmidt (1983): most of the differences between reports from the two types of sleep disappear when there are controls for length. For example, 13% of REM reports had at least one emotion vs. 5% of NREM reports, and 15% of REM reports had a misfortune vs. 9% of NREM reports, but such differences disappeared when there was a control for length. Similarly, in the domain of social interactions, 18% of the REM reports had at least one aggression vs. 7% for NREM reports, and 17% of the REM reports had at least one friendliness vs. 9% for NREM reports, but both differences disappeared when a correction for length was made by dividing the total number of aggressions or friendly interactions by the total number of characters, which is the best type of correction in the case of social interactions (Hall, 1969a, 1969b).
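The effect of the length correction described above can be illustrated with a small sketch. The data are invented for illustration and are not Hall's figures, and the function and field names are hypothetical. The sketch shows how two sets of reports can differ on an "at least one" percentage while having the same rate of aggressions per character.

```python
def at_least_one_percent(reports, key):
    """Percentage of reports containing at least one coded element of `key`."""
    hits = sum(1 for report in reports if report[key] > 0)
    return 100 * hits / len(reports)


def per_character_rate(reports, key):
    """Total coded elements of `key` divided by total characters."""
    total_characters = sum(report["characters"] for report in reports)
    return sum(report[key] for report in reports) / total_characters


# Hypothetical short reports: one character each, aggression in half of them.
short_reports = [
    {"characters": 1, "aggressions": 1},
    {"characters": 1, "aggressions": 0},
    {"characters": 1, "aggressions": 1},
    {"characters": 1, "aggressions": 0},
]

# Hypothetical long reports: two characters each, one aggression in each.
long_reports = [{"characters": 2, "aggressions": 1} for _ in range(4)]
```

In this invented example the longer reports show aggression in every report and the shorter ones in only half, yet both sets have one aggression for every two characters, which is the sense in which a difference can "disappear" once length is controlled.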
One of the main findings in other studies, the greater amount of "thinking" in NREM reports, is reinforced by Hall's findings. Although 87% of REM reports and 89% of NREM reports had at least one activity in them, the percentage of all activities that were cognitive was 20% in NREM reports, but only 11% in REM reports. Conversely, there were more visual and verbal activities in REM than in NREM reports, 12% vs. 6% for visual, 37% vs. 22% for verbal.
The new and most important finding from Hall's unpublished content analysis was that the NREM reports from late in the sleep period, after there have been three REM periods, were different from early NREM reports and more like REM reports. For example, whereas 46% of the early NREM reports had no codeable elements, only 30% of the later ones had none. Conversely, the percentage of elements falling into one of the seven categories making up what Hall named the "dramatic intensity" index -- aggression, friendliness, sexuality, misfortune, good fortune, success, and failure -- went from 7% in early NREM reports to 14% later, which was greater than the 10% for such elements in the third REM period of the night; this finding is based on a correction for the length of the reports.
When these findings on NREM reports from late in the sleep period are combined with those from Hall and Van de Castle's laboratory study, it seems likely that a representative sample of dream life can be collected by awakening subjects at any time after the third REM period of the night, when they have been asleep for four or five hours and are primarily in either REM or Stage II sleep. Studies showing high dream recall from NREM awakenings at the end of the sleep period by Weisz and Foulkes (1970), Herman, Ellman, and Roffwarg (1978), and Foulkes (1979) support this conclusion, which is a highly useful one because it makes the collection of dream reports in the laboratory more efficient and practical.
Our thanks to David Foulkes, Richard Zweigenhaft, and the external reviewers for their helpful comments on this paper.