The purpose of this paper is to explain the methods and statistics used in the seven empirical papers that follow in this special issue on new directions in the study of dream content. As these papers demonstrate, recent methodological and statistical advances have laid the foundation for new findings using the comprehensive coding system developed over the space of nearly 50 years by the late Calvin S. Hall and his collaborators (e.g., Hall, 1951, 1969a, 1969b, 1984; Hall & Lind, 1970; Hall & Van de Castle, 1966; Van de Castle, 1969).
The methodological and statistical innovations alluded to in the first paragraph include (1) new Hall/Van de Castle content indicators such as "friends percent" and "aggressor percent" that are revealing for a wide range of individuals and groups; (2) new methods of statistical analysis that take into account effect sizes; (3) an Excel 5 spreadsheet called DreamSAT that does computations for over 20 Hall/Van de Castle indicators and then produces tables and graphs, along with significance levels, confidence intervals, and effect sizes; (4) a new approach to collecting representative samples of dream reports in an efficient and inexpensive manner; and (5) information on necessary sample sizes for reliable results (Domhoff, 1996; Domhoff & Schneider, 1999; Schneider & Domhoff, 1999).
Since the main emphasis in this special issue is on demonstrating the usefulness of new methods through the presentation of original empirical findings, none of the seven empirical papers is concerned with hypothesis testing or theory building. However, the concluding paper attempts to demonstrate the theoretical implications of these and related descriptive empirical findings (Domhoff, 1999). It argues that several systematic findings from sleep laboratory studies and content analysis challenge Freudian, Jungian or activation-synthesis theory to one degree or another, but that all of the established findings are compatible with a cognitive orientation that stresses the similarities between waking and dreaming cognition (Foulkes, 1985, 1999; Lakoff, 1993).
The present paper begins with an overview of the main features of the Hall/Van de Castle coding system and then compares it to alternative content analysis systems that use rating scales. It next turns to several methodological and statistical problems that call into question the usefulness of many past dream content studies. It then introduces the Most Recent Dream method for collecting representative dream samples and shows its usefulness by exploring the problems in studies using dreams from one-week or two-week dream diaries kept by volunteers. Finally, to demonstrate the strengths and weaknesses of various methodological and data-gathering strategies, I will conclude with a critical discussion of the literature on gender similarities and differences in dream content.
The Hall/Van de Castle Coding System
The Hall/Van de Castle system is an application of the general methodological strategy called "content analysis." Content analysis is an attempt to use carefully defined categories and quantitative methods to extract meaning from a "text," whether it be a newspaper article, transcribed conversation, short story, or dream report. One of the earliest proponents of content analysis (Cartwright, 1953, p. 466) stated that the "fundamental objective" of this method is to convert the "symbolic behavior" of people into "scientific data," by which he meant (1) objective and reproducible; (2) susceptible to measurement and quantification; (3) significant for either pure or applied theory; and (4) generalizable. Hall (1969a, p. 175) defined content analysis as "the categorization of units of qualitative material in order to obtain frequencies which can be subjected to statistical operations and tests of significance."
The most difficult task in carrying out a content analysis is to develop categories that lead to reliable and valid findings. Unfortunately, there are no general rules for developing such categories. Nor has it been found that categories created for one type of text can be readily utilized with texts of another kind. For the most part, content categories have been developed through trial and error after full immersion in the texts to be analyzed. They usually go through several versions before they are ready for regular use. This certainly was the case with the Hall/Van de Castle system, which Hall first formulated in the 1940s and then revised with the help of Van de Castle during two strenuous years of work in the 1960s (Hall & Van de Castle, 1966).
There are two main issues in developing content categories. First, should they be "nominal" or "hierarchical" in their level of measurement? That is, should the categories be "discrete" ones that stand by themselves and have no implication of degrees of difference, such as "indoor" and "outdoor," or "female" and "male," where there is a simple tabulation of frequencies for each category? Or should the categories be points along a continuum, suggesting that there are degrees of difference along a dimension that can be ranked or weighted? "Vividness," "activity level," and "emotional intensity" are examples of hierarchical rating scales.
Second, should the categories be empirical in nature, that is, categories that seem to be natural groupings, based on our experience of the world without regard to any particular theory? Or should they be "theoretical," meaning that seemingly disparate elements in a dream report may be put into a category derived from a careful rendering of concepts and examples from a body of theory? "Characters," "social interactions," "settings and objects," and "emotions" are examples of empirical categories that seem to fit with our experience; these categories work as well with plays, for example, as they do with dreams (Hall & Van de Castle, 1966). Categories for the study of the "anima," "castration anxiety," and "ego synthesis" in dreams are examples of theoretical categories, derived from Jungian, Freudian, and Eriksonian theory respectively (Hall, 1969a, 1969b; Jones, 1962, 1969; Sheppard, 1969).
The nominal/hierarchical and empirical/theoretical dichotomies lead to the possibility of four different types of scales, and in fact all four types have been employed in dream research. However, some of the types have been more frequently employed than others. Generally speaking, most of the coding systems for the study of dreams have been hierarchical and empirical. A factor analysis of codings from several different empirical rating scales (Hauri, 1975) showed that they boiled down to five basic ratings: (1) degree of vividness and distortion; (2) degree of hostility and anxiety; (3) degree of initiating and striving; (4) amount of sexuality; and (5) amount of activity (i.e., activity level). It also can be said that some types of scales have been more useful than others. In particular, empirical scales, whether nominal or hierarchical in measurement, have proven to be more useful than either type of theoretical scale.
The Hall/Van de Castle coding categories are atypical in the area of dream research in that they are nominal in nature. The system contains both empirical and theoretical categories, but the theoretical categories did not prove to have any validity or usefulness, and have long since been abandoned (Domhoff, 1996). It should be noted, however, that the problems with the theoretical categories are not unique to the Hall/Van de Castle system.
The original Hall/Van de Castle system consists of eight general categories, most of which are divided into two or more subcategories. Those eight general categories are as follows:
There are also two categories that were devised with theoretical concepts in mind, "orality" and "regression," but they are in fact empirical in focus and better thought of as categories revealing the frequency of "food and eating" elements and "elements from the past."
Because the categories in a nominal coding system can be clearly defined, there is very high intercoder reliability in the use of the Hall/Van de Castle system. This high reliability is determined by the method of perfect agreement, which simply means that all the similar codings by two independent coders are divided by the number of agreements plus the number of disagreements. For example, if coder A makes 51 codings for characters and coder B makes 49 codings, and they make the same coding 48 times, then the intercoder reliability is 48 divided by 52 (48 agreements plus four disagreements), which equals .92. Hall and Van de Castle arrived at their decision to use this approach by comparing the results with what is found with every other conceivable approach. In fact, they show that the outcomes from various methods of determining interjudge reliability can range from zero to 100% (Hall & Van de Castle, 1966, pp. 145-147). It is therefore meaningless to report a "reliability" finding without stating the method used, and it makes little or no sense to use the other methods with this particular coding system (Domhoff, 1996; Hall, 1969b; Hall & Van de Castle, 1966; Van de Castle, 1969).
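The arithmetic of the perfect-agreement method can be sketched in a few lines of Python. The function and argument names are illustrative only and are not part of the Hall/Van de Castle system. A disagreement is assumed here to be any coding made by one coder that the other coder did not match, which is the reading that reproduces the worked example above.

```python
def perfect_agreement(codings_a, codings_b, agreements):
    """Agreements divided by agreements plus disagreements.

    Assumption: a disagreement is any coding made by one coder
    that is not matched by the other coder.
    """
    disagreements = (codings_a - agreements) + (codings_b - agreements)
    return agreements / (agreements + disagreements)


# Worked example from the text: 51 and 49 codings, 48 of them identical.
reliability = perfect_agreement(51, 49, 48)
print(round(reliability, 2))  # 0.92
```

Because the denominator counts every unmatched coding from either coder, the figure drops quickly when one coder records elements that the other overlooks.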
The findings with the empirical categories of the Hall/Van de Castle system are most readily understood and useful when they are conveyed in an array of percentages and ratios that are called "indicators." Such indicators are also the best way to control for the differing lengths of dream reports. In addition, they lend themselves to the form of statistical analysis that is most appropriate for data from nominal categories.
Table 1 presents the main indicators in the Hall/Van de Castle system and how they are calculated. One or another subset of these indicators is used in the seven empirical papers in this special issue. The percentage indicators reveal what parts of an overall category are contained in specific subcategories. For example, the number of characters in a series or set of dreams that are animals is divided by the total number of characters to provide the "animal percent." A calculation of the human characters that are male and female yields the "male/female percent."
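As a minimal sketch, the two percentage indicators just mentioned can be computed as follows. The function names and the counts are hypothetical. The male/female percent is assumed here to be the share of male characters among those coded as either male or female; Table 1 remains the authority on the exact formulas.

```python
def animal_percent(animals, total_characters):
    """Animal characters as a percentage of all characters."""
    return 100.0 * animals / total_characters


def male_female_percent(males, females):
    """Assumed form: males as a percentage of males plus females."""
    return 100.0 * males / (males + females)


# Hypothetical counts from a set of dream reports.
print(animal_percent(12, 200))                 # 6.0
print(round(male_female_percent(80, 40), 1))   # 66.7
```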
The social interaction ratios, on the other hand, provide "rates" of social interactions per character, not percentages. The "friendliness per character" ratio, called the "F/C ratio" or "F/C index," is typically .22 for women and .21 for men, which means there is roughly one friendly interaction for every five characters who appear in the overall set of dreams.
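The F/C ratio is a simple rate, as the following sketch shows. The function name and the counts are hypothetical and were chosen only to reproduce a ratio of .22.

```python
def per_character_ratio(interactions, characters):
    """Rate of a type of social interaction per character."""
    return interactions / characters


# Hypothetical counts: 22 friendly interactions among 100 characters.
fc_ratio = per_character_ratio(22, 100)
print(fc_ratio)                # 0.22
print(round(1 / fc_ratio, 1))  # 4.5 characters per friendly interaction
```

The reciprocal makes the verbal summary concrete: a ratio of .22 corresponds to one friendly interaction for about every four to five characters.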
Some of the indicators are created by combining various categories. For example, the "aggressor percent" and the "befriender percent" can be combined to create an "assertiveness percent" by dividing the sum of initiated aggressive and friendly interactions by the total number of aggressive and friendly interactions. The newest such indicator is the "self-negativity percent," which might prove useful in predicting a highly critical attitude toward the self or some forms of psychopathology. The way in which it uses failures by the dreamer, misfortunes that happen to the dreamer, and victim status in aggressive interactions is explained in Table 1. Although such indicators have an empirical foundation, they can be considered "quasi-theoretical" in nature because they involve a grouping of categories based on middle-range conceptualizations (Van de Castle, 1969).
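The assertiveness percent described above can be sketched as follows. The function name and the counts are hypothetical, and the calculation follows the verbal definition in this paragraph rather than the exact wording of Table 1.

```python
def assertiveness_percent(initiated_aggressive, initiated_friendly,
                          total_aggressive, total_friendly):
    """Initiated aggressive plus friendly interactions as a
    percentage of all aggressive plus friendly interactions."""
    initiated = initiated_aggressive + initiated_friendly
    total = total_aggressive + total_friendly
    return 100.0 * initiated / total


# Hypothetical counts: the dreamer initiates 8 of 20 aggressive
# interactions and 12 of 20 friendly interactions.
print(assertiveness_percent(8, 12, 20, 20))  # 50.0
```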
For a quick overview or for studying large samples, it is very useful to determine what percentage of the dream reports have "at least one" instance of a coding category. With this approach, coders move on to the next dream report as soon as they have recorded the first instance of the category or categories being utilized. For example, misfortunes occur in about 33% of the dreams of college men and women, and instances of food or eating in about 17%.
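An "at least one" tabulation only needs to know whether each report contains the category, not how often. A minimal sketch, with a hypothetical function name and invented counts:

```python
def at_least_one_percent(counts_per_dream):
    """Percentage of dream reports with one or more instances
    of a coding category."""
    with_instance = sum(1 for count in counts_per_dream if count >= 1)
    return 100.0 * with_instance / len(counts_per_dream)


# Hypothetical misfortune counts for six dream reports.
misfortunes = [0, 2, 0, 1, 0, 0]
print(round(at_least_one_percent(misfortunes), 1))  # 33.3
```

The second report contributes no more to the result than the fourth, which is why coders can stop at the first instance they record.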
As the findings mentioned in this section imply, Hall and Van de Castle (1966, chap. 14) made the system more useful by developing normative findings for young men and women based on five dreams from each of 100 male and 100 female students at Case Western Reserve University and Baldwin-Wallace College in Cleveland, Ohio, in the late 1940s. All or parts of these norms have been replicated with dreams collected on different university campuses from the 1960s to 1990s (Domhoff, 1996; Dudley & Fungaroli, 1987; Dudley & Swank, 1990; Hall, Domhoff, Blick, & Weesner, 1982; Reichers, Kramer, & Trinder, 1970; Tonay, 1990-1991).
To study individual differences and unique population groups, such as types of mental patients, Hall and Van de Castle (1966) make the assumption that the frequency of occurrence of a dream element reveals the intensity of concern, interest, or emotional preoccupation. The findings from individual or group studies are then compared with the norms in a search for statistically significant differences. For example, in an early study by Hall (1966a) that was further analyzed by Schneider and Domhoff (1998) with new indicators, there were a number of differences between male schizophrenics and the male norms. The schizophrenics were high on aggressor percent, and low on friends percent, the F/C ratio, and the percentage of dreams with at least one friendly interaction. They showed very little successful or unsuccessful striving, and they were high on the recently developed self-negativity percent.
Two of the empirical papers in this special issue make use of a subset of the Hall/Van de Castle indicators with atypical individuals or groups. The report by Kirschner (1999) is the first case study that looks at the dreams of an individual before and after treatment by psychotherapy or medication. While there are limitations with any case study, Kirschner's study is highly suggestive. It is published here with the hope that it will encourage researchers to do similar pre/post studies to see if the large changes found in her study can be replicated. The paper by Hurovitz, Dunn, Domhoff and Fiss (1999) compares dreams from blind participants with the norms, showing differences that "make sense" in terms of the lives that blind people lead. In particular, dreams in which they have difficulties traveling from one place to another are notable.
The employment of the Hall/Van de Castle categories, indicators, and norms has led to many interesting and plausible descriptive empirical findings, several of them quite unexpected. They are summarized in the final paper in this special issue because they pose challenges for Freudian, Jungian, and activation-synthesis theories (Domhoff, 1999). One of the more surprising of these findings is that the dream content of older adults seems to differ very little from that of the young adults on whom the norms are based (e.g., Hall & Domhoff, 1963a, 1964; Kramer, Winget, & Whitman, 1971; Lortie-Lussier, 1995; Zepelin, 1980-1981, 1981). The one exception may be a decline in aggression and negative emotions, on which the findings are mixed (Brenneis, 1975; Domhoff, 1996; Howe & Blick, 1983).
There are relatively few good studies using the Hall/Van de Castle system with children. In fact, it has been shown that the earlier work on very young children by Hall and Domhoff (1963a, 1964) was based on atypical dreams (Domhoff, 1996, chap. 5; Foulkes, 1979, 1982). Three papers in this special issue begin to fill this gap for youngsters between the ages of 8 and 15, and make suggestions for future studies as well (Avila-White, Schneider, & Domhoff, 1999; Saline, 1999; Strauch & Lederbogen, 1999). While the first two of these papers focus on narrow age ranges because they are primarily methodological in nature, the longitudinal study by Strauch and Lederbogen (1999) compares the dreams and waking fantasies of 12 boys and 12 girls in Switzerland at three different points between the ages of 9 and 15.
Rating Scales for Dream Content
Rating scales, as already noted, are based on the assumption that a characteristic can be ranked or weighted. A rating scale is called "ordinal" if it is only possible to rank elements from high to low, "equal interval" if all points on the scale are equally distant from each other, and "ratio" if the scale has an exact zero point, as in the case of weight or age. Almost all rating scales in dream research have been ordinal ones, resting on the assumption that "more" or "less" is the most that can be judged in a dream report.
Ordinal rating scales have been employed with great benefit in a wide variety of useful studies (Winget & Kramer, 1979), perhaps the most important of which are the longitudinal and cross-sectional studies by Foulkes (1982, 1993) and his co-workers (Foulkes, Hollifield, Sullivan, Bradley, & Terry, 1990). The scales used in their work made it possible to demonstrate systematic changes in dream content, from mainly static images without the dreamer present in children under age 6 to action-oriented plots with dreamer involvement by age 8. The implications of these findings for various theories are discussed in the final paper (Domhoff, 1999).
Rating scales seem to be most useful for characteristics of dream reports that have degrees of intensity in waking life, such as activity level or emotionality, or that are without specific content, such as clarity of visual imagery or vividness. Sometimes useful ratings are made by the dreamers themselves. For example, Foulkes (1966) employed ratings by both judges and participants on such dimensions as activity level, dramatic quality, clarity, and unpleasantness to compare dream reports collected in the sleep laboratory from the first three REM periods of the night. He found that any differences were small and that the dream reports in general were not as emotional or unpleasant as dreams are often claimed to be. In similar fashion, Howe and Blick (1983) had women rate their dream reports on several emotionality dimensions, finding that the emotions in the dreams of older women were rated as more benign.
Despite their many useful applications in the past, there are nonetheless drawbacks to rating scales. First, it is difficult to establish reliability with some scales, which is one of several reasons why Hall and Van de Castle (1966) chose to avoid them. This lack of reliability is especially noticeable when researchers from outside the original investigative team try to use them (e.g., Winget & Kramer, 1979, p. 117). Perhaps partly for this reason, each new investigator tends to create her or his own rating scales, leading to a situation where results across studies cannot be directly compared.
Second, much specific information can be lost or unused with rating scales. An overall "bizarreness" scale, for instance, does not include the fact that in one set of dream reports the high degree of bizarreness may be due to metamorphoses, in another to magical action by specific dream characters, and in still another to impossible settings or objects. Similarly, the highest rating on a hostility scale may be due to either a murder or a fatal illness, but the difference between the two may be as informative as the extremity of the situation. In the Hall/Van de Castle system, the murder would fall into an aggression category reserved for murders, and the fatal illness would be classified as one type of "bodily misfortune." If the researcher later wanted to determine the general extent of deadly calamities, it would be a simple matter to combine the two categories and compare them with normative figures, which are readily created with the DreamSAT spreadsheet for the Hall/Van de Castle system (Schneider & Domhoff, 1999).
Third, many rating scales rest on assumptions that are psychologically untenable when they are examined critically. For example, in a "dependency" rating scale created by Whitman, Pierce, Maas, and Baldridge (1961), a score of 6 is assigned if the person eats food, whereas a score of 1 is assigned if the person seeks the help of others. Since the ratings for each dream are added together to create an overall score, this rating system implies that "mentioning a ham sandwich shows six times as much dependency as accepting a helping hand from another" (Van de Castle, 1969, p. 193). Thus, the authors' claim that the medicinal drugs imipramine, prochlorperazine, and phenobarbital increased dependency in the laboratory-collected dream reports of the subjects must be treated with great caution (Whitman, Pierce, Maas, & Baldridge, 1961).
This same type of psychologically untenable assumption is prevalent in rating scales for hostility. With most of these scales, the highest rating is given for murders, medium scores for injury or damage to personal possessions, and low scores for insults or expressions of hostility. The ratings for each dream are added together and an average hostility score is calculated for each individual or group. Such a procedure implies that several angry thoughts or a few damaged possessions are psychologically equivalent to one murder, a weighting that seems indefensible once it is made explicit (Hall, 1969a, 1969b).
The main reason Hall and Van de Castle (1966) used nominal categories was to avoid such untenable psychological assumptions. Instead of creating a hostility scale, for example, they created eight separate nominal categories for types of aggression that range from (1) angry thoughts to (2) critical remarks to (3) rejections and refusals to (4) dire verbal threats to (5) stealing or destruction of possessions to (6) being chased to (7) being confined or attacked to (8) murder. Codings can be done reliably, and no information is lost when eight different frequencies can be noted and compared. For purposes of analysis, the percentage of all aggressions that are in one of the four physical categories (5-8) can be calculated (the physical aggression percent). For some purposes, the overall number of aggressions of all types can be found simply by adding together the results from all eight categories. Comparisons can be made with the norms for each category or for the sum total of all types of aggressions. An A/C ratio can be calculated for each type of aggression or for overall aggression scores.
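The calculations described in this paragraph can be sketched from a table of frequencies for the eight aggression categories. The function names and all of the counts are hypothetical; only the numbering of the categories and the definition of physical aggression as categories 5 through 8 come from the text.

```python
def physical_aggression_percent(frequencies):
    """frequencies maps aggression category (1-8) to its count.
    Categories 5-8 are the physical ones."""
    total = sum(frequencies.values())
    physical = sum(n for category, n in frequencies.items() if category >= 5)
    return 100.0 * physical / total


def ac_ratio(frequencies, characters):
    """Aggressions of all types per character."""
    return sum(frequencies.values()) / characters


# Hypothetical frequencies for categories 1 through 8.
aggressions = {1: 5, 2: 10, 3: 5, 4: 5, 5: 5, 6: 10, 7: 8, 8: 2}
print(physical_aggression_percent(aggressions))  # 50.0
print(ac_ratio(aggressions, 200))                # 0.25
```

Because each category keeps its own frequency, no weighting has to be chosen in advance, and the same table supports comparisons by category or in total.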
Fourth, theoretically based rating scales are more problematic than empirical rating scales because it is difficult to translate complex theoretical concepts into matters of degree. For example, in Sheppard's (1963, 1969) scale for "ego integration" derived from psychoanalytic theory, the "body image" portion calls for a coding of "8" if there is a "bizarre deformity," a "4" if there is a mutilation or critical injury, a "2" for a mild illness, and a "1" if there is no mention of ill health. But distinctions between bizarre deformities and mutilations may not be easily made. Moreover, there is no rationale for why a deformity receives twice the weighting of a mutilation, or for any of the other weightings.
Implicit value judgments sometimes call theoretical scales into question. For example, Polster's theoretical scale for "ego strength" makes an implicit value judgment when it labels an aggressive response to an aggression as "appropriate," but a nonaggressive response as "inappropriate" (Hall & Van de Castle, 1966, p. 208). Thus, if a dreamer escapes from an aggressive character or convinces him or her not to be aggressive, that is a sign of low ego strength.
Hall and Van de Castle's nominal empirical categories can be used instead of unreliable or complicated theoretical rating scales by combining relevant categories. This point can be demonstrated by looking at studies using Beck and Hurvich's (1959) mislabeled and unvalidated "masochism" scale, used in a study of divorced men and women by Cartwright (1992), and Krohn and Mayman's (1974) object relations scale, which has been used in a study of maturity in adolescents by Winegar and Levin (1997). The Beck and Hurvich scale consists of a wide range of negative experiences from physical discomfort to rejection to failure to being punished, lost, or victimized. Using this scale, Cartwright (1992) came to the conclusion that divorced women who are not depressed are more masochistic than divorced men who are depressed, a surprising result that seems to raise more questions than it answers.
In a very perceptive study, Clark, Trinder, Kramer, Roth, and Day (1972) hypothesized that the items on the masochism scale were encompassed by three aspects of the Hall/Van de Castle system: failures, misfortunes, and victim status in aggressive interactions. They then demonstrated this point by coding two different samples with both the masochism scale and the three Hall/Van de Castle categories. They found that the masochism findings were encompassed by the Hall/Van de Castle categories, which picked up several elements missed by the masochism scale as well. Since women are slightly more likely to fail when they strive in dreams, and to be victims in aggressive interactions, the "masochism" that Cartwright reports in her women participants is really a combination of failures and victimizations, which are not obvious manifestations of clinical masochism. Moreover, the Hall/Van de Castle misfortune categories, which might seem to be more related to masochism, do not show any gender differences. These three categories are now part of the self-negativity index explained in Table 1.
Krohn and Mayman (1974) developed a very complicated rating system for determining the level of maturity in "object relations" in dreams, a term that is roughly equivalent to "interpersonal interactions." It calls for subtle judgments such as assigning an "8" if there is "a sense of rapport with people and a well-developed understanding of their thoughts, feelings and conflicts" (Krohn & Mayman, 1974, p. 454). A "5" is assigned if the people in the dream have "no real identity," a "3" if people are experienced as "insubstantial, fluid, more or less interchangeable," and a "1" if there are no other people and the "subject's world seems to be completely lifeless, vacant, alien, strange" (Krohn & Mayman, 1974, pp. 452-454).
Winegar and Levin (1997) applied the Krohn/Mayman scale to 389 dream reports recorded by 115 adolescents between the ages of 15 and 18. They found that the girls showed more maturity in object relations than the boys and that the differences were greater at the older ages. Winegar and Levin were kind enough to provide this author with copies of the dream reports used in their study. A sample of 12 reports coded "low" and 10 reports coded "high" on the Krohn/Mayman scale was coded with Hall/Van de Castle categories by two coders who had no knowledge of the Krohn/Mayman scale or of the fact that there were two sets of dreams. The results are simply stated. Dreams with at least one friendly interaction always were in the "high" group, as were dreams with nonphysical activities like talking. Dreams with physical aggressions and physical activities were in the "low" group with one exception, where a friendly interaction also took place. Dreams with no social interactions were in the "low" group. The results suggest that this theoretical rating scale can be encompassed by employing three easily coded nominal categories--physical aggression, friendliness, and activities.
When the relative advantages and disadvantages of rating systems and the Hall/Van de Castle nominal categories are weighed, it seems fair to say that the nominal categories are more useful except for dimensions that are also plausible in waking life (e.g., emotional intensity, activity level) or have no specific content (e.g., clarity, vividness). However, there are also instances where the two types of approaches complement each other. For example, Foulkes's (1966) finding of only small differences in dream content from the first three REM periods with rating scales is similar to findings with the Hall/Van de Castle categories in three different studies comparing REM dream content (Domhoff & Kamiya, 1964a; Hall, 1966b; Strauch & Meier, 1996).
Determining a Unit of Analysis
Every dream researcher, no matter what system of content analysis is employed, has to decide on the "unit of analysis" to be used in making standardized comparisons from dream to dream or group to group. In many studies, the unit of analysis is simply the dream report as a whole. The sum total of the frequencies in nominal categories or of the ratings for each dream is divided by the total number of dream reports. But there are two major problems in using the dream report as the unit of analysis.
First, and most crucially, there are wide individual differences in report length, and women's dreams are often found to be longer than those of men (e.g., Bursik, 1998; Hall & Van de Castle, 1966; Winegar & Levin, 1997). Varying lengths are a problem because longer reports are likely to have more of most things in them, although one study showed that the relationship was not monotonic for all categories in the Hall/Van de Castle system (Trinder, Kramer, Reichers, Fishbein, & Roth, 1970).
The failure to correct for dream length may be even more serious in studies using rating scales. For example, a frequently used theoretical rating scale for "primary process thinking" in dream content, which requires judgments of differing degrees of distortion and improbability, correlates .60 with the length of the dream report (Auld, Goldenberg, & Weiss, 1968). Wood, Sebba, and Domino (1989-90) and Livingston and Levin (1991) found that previously reported positive relationships between this scale and creativity measures disappear when there is a control for length.
Second, dream reports can vary from group to group in the frequency with which certain elements appear even when report length is held constant. This difference in "density" seems to be especially the case for the frequency of characters, which means that there is more likelihood of social interactions in some dream reports than others. Once again, there is a gender difference. There are more characters in women's dream reports, an interesting finding in and of itself, but one that should be taken into account in analyzing social interactions (Hall, 1969a, 1969b).
Several approaches have been used to correct for the length problem. For example, only the first 100 words in a report may be used, but that solution throws away information and does not take into account that some people may take more words to report the same interactions and actions. It also eliminates the endings of many dreams. Another approach is to use the mean number of lines or words per dream report as the unit of analysis, but that does not deal with the differing "wordiness" of participants and leads to cumbersome findings such as "there were 2.3 human characters per every ten lines (or 100 words) in the dream narratives." Still another possibility is to establish minimum and maximum lengths for the reports to be analyzed, thereby making it possible to use the dream report as a whole as the unit of analysis. None of these solutions is ideal, however, and none of them deals simultaneously with the issue of differing character densities.
There is, however, a good solution with a nominal set of categories such as those in the Hall/Van de Castle system, which is to use the kinds of percentages and ratios presented in Table 1. Such an approach is independent of report length or character density within very broad limits, effectively dealing with both problems at the same time. And, as already noted, the findings with these indicators are also readily communicated and understood. For example, people immediately understand if it is reported that the "animal percent" in dreams declines from 30-40% in young children to 4-6% in adulthood, and is higher in small traditional societies than it is in modern nations (Domhoff, 1996).
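The contrast between the two units of analysis can be illustrated with a small sketch. All counts are invented, and the two samples are assumed to differ only in that the second has reports twice as long, with twice as many characters of every kind.

```python
def per_report_mean(count, reports):
    """Frequency per dream report (report as unit of analysis)."""
    return count / reports


def percent(part, whole):
    """Subcategory as a percentage of its overall category."""
    return 100.0 * part / whole


# Hypothetical samples of 25 reports each.
short_sample = {"animals": 3, "characters": 50, "reports": 25}
long_sample = {"animals": 6, "characters": 100, "reports": 25}

for sample in (short_sample, long_sample):
    print(per_report_mean(sample["animals"], sample["reports"]),
          percent(sample["animals"], sample["characters"]))
# 0.12 6.0
# 0.24 6.0
```

The per-report mean doubles with report length, whereas the animal percent is unchanged, which is the sense in which the indicators correct for length.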
The likely dependence of social interactions on the frequency of characters can be handled in a similar fashion using the ratios introduced in an earlier section. Thus, dividing all friendly interactions by the total number of characters produces the F/C ratio, which also can be figured separately for each of the seven categories of friendliness, or for the dreamer's interactions with specific characters in the dream reports, such as "sister," "brother," or "stranger," without having to deal with a more cumbersome unit of analysis.
However, the use of percentages and ratios of the kind just described does not work for an "at least one" analysis because the dream report is by definition the unit of analysis. For an "at least one" analysis in which comparisons with the Hall/Van de Castle norms are to be made, it is therefore necessary to use dream reports between 50 and 300 words in length because that is the range used to establish the normative frequencies. If the comparison is between two groups, then the researcher can pick any range of lengths. If the comparison is among subsets from a dream journal from one person, then it is unlikely that any limits would need to be set unless it is determined that the length of the person's reports varies significantly over time.
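The length restriction for an "at least one" analysis can be sketched as follows. This is only an illustration under stated assumptions: the record layout (a "text" field and a "codes" dictionary), the function name, and the use of whitespace splitting to count words are all hypothetical choices, and the sample reports are placeholders rather than real dreams.

```python
def at_least_one_percent(reports, category, min_words=50, max_words=300):
    """Percent of length-eligible reports with one or more codings in `category`."""
    eligible = [r for r in reports
                if min_words <= len(r["text"].split()) <= max_words]
    if not eligible:
        raise ValueError("no reports fall inside the word-length range")
    hits = sum(1 for r in eligible if r["codes"].get(category, 0) >= 1)
    return 100.0 * hits / len(eligible)


# Hypothetical coded reports: placeholder text of a given length plus coding counts.
reports = [
    {"text": "word " * 40, "codes": {"aggression": 2}},   # too short, excluded
    {"text": "word " * 120, "codes": {"aggression": 1}},  # eligible, counted
    {"text": "word " * 200, "codes": {"aggression": 0}},  # eligible, not counted
    {"text": "word " * 350, "codes": {"aggression": 3}},  # too long, excluded
]
print(at_least_one_percent(reports, "aggression"))
```

The default limits of 50 and 300 words mirror the range used for the norms; for a two-group comparison the same function could be called with any other range, provided it is applied to both groups.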
Testing for Statistical Significance
The usefulness of tests of statistical significance has been vastly overrated in psychology (e.g., Cohen, 1990, 1994; Rosenthal, 1990; Rosnow & Rosenthal, 1989). They can be so misleading that some psychologists have advocated their banishment from scientific journals (Hunter, 1997; Schmidt, 1996). The real test is being able to replicate results again and again (Cohen, 1990). Nevertheless, tests of statistical significance are likely to persist for some time to come.
Statistics textbooks generally agree that parametric statistics like the t-test and analysis of variance should not be used with the nominal data of the Hall/Van de Castle system or the ordinal data from rating systems (e.g., Siegel & Castellan, 1988). This is because parametric statistics rest on several assumptions that are not met by nominal and ordinal levels of measurement, such as the need for the points along the scale to reflect an underlying continuous distribution with equal intervals between them. Parametric tests also assume a normal distribution, which is found less frequently in the social world than is generally realized, and is seldom tested for by those who use parametric tests.
This does not mean, of course, that it is impossible to add, subtract, multiply, and divide the frequencies derived from nominal scales or the numbers assigned to ordinal scales. It is just that such an analysis is misleading or in error if assumptions are violated. As Siegel and Castellan (1988, p. 33) stress:
It should be obvious that a mean and standard deviation may be computed for any set of numbers. However, statistics computed from these numbers only "make sense" if the original assignment procedure imparted "arithmetical" interpretations to the assignments. This is a subtle and critical point.
Given these important strictures, statistics textbooks advocate the use of proportions and chi square with nominal data. Happily, the percentages and ratios used in the Hall/Van de Castle system are ideal for these two types of statistics (e.g., Reynolds, 1984). Moreover, the test for the significance of differences between two independent proportions and chi square provide exactly the same result for 2 X 2 categorical tables, which are the most frequent type of comparison made in dream research. That is, the "z" score derived from a proportions test is equal to the square root of chi square (e.g., Ferguson, 1981, pp. 211-213). The test for the significance of differences between two independent proportions is therefore recommended for use with Hall/Van de Castle content indicators (Domhoff, 1996, Appendix D). For those who adopt this statistical approach, both significance levels and confidence intervals are computed automatically on the DreamSAT spreadsheet that makes the analysis of Hall/Van de Castle codings faster, easier, and far more accurate (Schneider & Domhoff, 1999).
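The equivalence between the two tests can be checked numerically. The sketch below is not the DreamSAT implementation; it assumes the textbook pooled-variance form of the proportions test and the Pearson chi square without a continuity correction (the identity holds only in that case), and the counts are hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z for the difference between two independent proportions (pooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi square (no continuity correction) for the same 2 x 2 table."""
    a, b = x1, n1 - x1
    c, d = x2, n2 - x2
    n = n1 + n2
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def two_sided_p(z):
    """Two-sided p value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: the element appears in 75 of 500 reports in one sample
# and in 50 of 500 reports in the other.
z = two_proportion_z(75, 500, 50, 500)
chi2 = chi_square_2x2(75, 500, 50, 500)
print(f"z = {z:.3f}, z squared = {z * z:.3f}, chi square = {chi2:.3f}, p = {two_sided_p(z):.4f}")
```

For any 2 X 2 table the squared z and the chi square value printed here agree up to floating-point rounding, which is the relationship described above.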
Effect Sizes and the h-Profile
As just noted, statistical significance does not seem to be as important in psychological research as was claimed in the past. The more important matter is the magnitude of any reliable differences that are found, which is revealed by "effect sizes." For nominal data in 2 X 2 tables, the effect size is basically the percentage difference between the cells in the top row of the table. This percentage difference is also the same as the Pearson r for dichotomous variables (Rosenthal & Rubin, 1982). In addition, the two measures for effect size associated with chi square, phi (φ) and lambda (λ), are equal to r. However, there is a slight complication.
In determining effect sizes with data expressed in percentages, a mathematical correction has to be made because the standard deviation of the sampling distribution cannot be determined: the varying distances between scores are unknown when there are only percentile ranks. To deal with this problem, Cohen (1977) developed the "h" statistic, which makes the necessary correction using an arcsine transformation of the percentages for any two comparison groups. A table for determining h from percentages is provided by Domhoff (1996, p. 315), and h is calculated by DreamSAT for each percentage indicator in the Hall/Van de Castle system once the codings are entered (Schneider & Domhoff, 1999).
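For readers who want to see the computation, a minimal sketch of h follows, using the standard form of Cohen's statistic, h = 2·arcsin(√p1) − 2·arcsin(√p2). The function name is an assumption made for illustration, and the input values are the 15% and 10% "at least one failure" figures for women's and men's dreams quoted later in this paper.

```python
import math


def cohens_h(p1, p2):
    """Cohen's h for two proportions given on the 0-1 scale."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie between 0 and 1")
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# 15% of women's dreams versus 10% of men's dreams with at least one failure.
h = cohens_h(0.15, 0.10)
print(f"h = {h:.2f}")
```

The result is roughly 0.15, which falls below Cohen's conventional benchmark of 0.20 for a "small" effect (with 0.50 and 0.80 as the benchmarks for "medium" and "large"). The sign of h simply reflects which group is entered first.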
The importance of taking effect size into consideration is shown in this special issue in Domhoff and Schneider's (1999) reanalysis of the differences between the home and laboratory dreams collected in a large-scale study by Hall and Van de Castle in the 1960s (Hall, 1966b). The original study reported several statistically significant differences between dreams written down at home and those reported in the laboratory, contributing to a controversy over the relative merits of the two collection methods. However, the reanalysis using effect sizes shows that most of the statistically significant differences were not very large in terms of magnitude. The small to medium effect sizes in Domhoff and Schneider's reanalysis suggest that the overall controversy involved much ado about not very much. Perhaps the conflict could have been avoided if effect sizes rather than "significance levels" had been stressed at the time.
The effect sizes for any array of comparisons with Hall/Van de Castle indicators can be placed on a bar graph that resembles an MMPI profile. It reveals any unique patterns in the overall analysis, such as might be found in character categories or types of social interactions. This display has been named the "h-profile" (Domhoff, 1996). It can be especially useful in showing the way in which an individual or special population differs from the male or female norms. Effect sizes and h-profiles are produced as both tables and graphs by DreamSAT.
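The idea of an h-profile can be sketched in a few lines of Python that draw a text-only bar chart, with bars to the left of the center line for values below the norm and to the right for values above it. This is not how DreamSAT produces its graphs; the function names and layout are assumptions, and the percentages are invented purely for illustration.

```python
import math


def cohens_h(p1, p2):
    """Cohen's h for two proportions given on the 0-1 scale."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def h_profile(group, norms, width=20, scale=1.0):
    """Return one text line per indicator; a full-width bar corresponds to |h| = scale."""
    lines = []
    for indicator, norm_pct in norms.items():
        h = cohens_h(group[indicator] / 100, norm_pct / 100)
        bar = "#" * min(width, round(abs(h) / scale * width))
        left = bar.rjust(width) if h < 0 else " " * width
        right = bar if h >= 0 else ""
        lines.append(f"{indicator:<20}{left}|{right} {h:+.2f}")
    return lines


# Invented percentages for illustration only; these are not published findings.
norms = {"animal percent": 6.0, "friends percent": 31.0, "aggressor percent": 40.0}
group = {"animal percent": 12.0, "friends percent": 20.0, "aggressor percent": 40.0}

for line in h_profile(group, norms):
    print(line)
```

Reading down the center line shows at a glance which indicators depart from the comparison group and in which direction, which is the purpose the h-profile serves.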
Figure 1 presents the h-profile for 104 dream reports from 20 male schizophrenics that were coded by Hall (1966a) and then reanalyzed by Schneider and Domhoff (1999) by entering the original Hall codings into DreamSAT. The h-profile immediately reveals the magnitude of the several differences that were mentioned in an earlier section of this paper. The differences on friendliness categories and the self-negativity percent are notable. Such reanalyses of old codings are easily done with DreamSAT.
The case study by Kirschner (1999) comparing the dream reports of young adult women before and after taking an anxiety-reducing medication provides an especially good example of how helpful the h-profile can be in understanding a large number of results. In her paper, the before and after results can be compared with the female norms and with each other at a glance, showing that the post-medication dream content is generally closer to the normative findings.
A New Approach to Collecting Dream Reports
In addition to demonstrating the many possibilities for developing new findings with the Hall/Van de Castle coding system, this special issue extends the usefulness of a new approach for collecting good dream samples to children as young as ages 10-11 (Avila-White, Schneider, & Domhoff, 1999; Saline, 1999). Termed the Most Recent Dream (MRD) method, it simply asks everyone in a group setting to write down the Most Recent Dream they can remember, "whether it was last night, last week, or last month" (Domhoff, 1996, p. 67). To reinforce the emphasis on the last dream recalled, and to make it possible to eliminate dreams from months or years in the past, participants are also asked to write down the date on which they think the dream occurred.
In the past, dream researchers relied on one or more of four methods for the collection of dream reports: (1) awakenings in sleep laboratories; (2) brief dream diaries kept for a period of several nights, a week, or a month by volunteers at the request of a dream researcher; (3) lengthy dream journals kept by dreamers for their own reasons; and (4) the recording of dreams discussed in psychotherapy. Reports collected in the sleep laboratory provide the most systematic data, but with the decline in funding for dream studies (Foulkes, 1996) it has become necessary to rely on the other methods. It may be that dream studies from the laboratory have played their most important role, until such time as new funding is available, by establishing the standards against which the usefulness of other approaches to dream collection can be measured.
Brief dream diaries kept at the request of dream researchers are now used very widely, but this approach has numerous problems. First, it often takes weeks or months to obtain even four or five dreams. Second, a large minority of participants drop out or turn in only one or two dreams. Third, the samples consist of different numbers of reports from different subjects, which leads to questions about how to standardize the contribution of each subject to the overall sample. Finally, the demand characteristics of the study can be very strong, especially when the researcher has to prod subjects one or more times to write down dreams. Such pressures increase the probability of hasty or confabulated reports. In a survey by Tonay (1990-1991) of several hundred students in a psychology class at the University of California, Berkeley, 43% said they would be likely to make up dreams if required to turn them in as part of a course assignment.
The Most Recent Dream approach pioneered by Hartmann, Elkin, and Garg (1991, p. 316) has the great advantage of making it possible to collect large samples of dreams in an efficient and inexpensive manner from virtually everyone in a group in the space of 15 to 20 minutes. It also provides a standardized way of collecting dreams in classrooms and waiting rooms in many different countries or at times when forthcoming or unanticipated events would make it suddenly interesting to have large samples of dreams (e.g., impending visits to a city by a famous celebrity, dramatic political changes, the occurrence of natural disasters).
The credibility of the Most Recent Dream approach is enhanced by the finding that dream reports collected from everyday recall do not differ from those collected in the sleep laboratory when controls are introduced (Foulkes, 1979). To the degree that there are any differences, they involve various measures of aggression (Domhoff & Kamiya, 1964b; Weisz & Foulkes, 1970). Furthermore, the effect sizes are relatively small even where there are statistically significant differences, except in the case of physical aggression, which is more frequent in dreams collected outside the sleep laboratory (Domhoff & Schneider, 1999).
The minimum sample size necessary for MRD studies that utilize comparisons with the Hall/Van de Castle norms is 100-125 dream reports. This conclusion is based on a study in which many different subsamples of 25, 50, 75, 100, and 125 dream reports were drawn from the original Hall/Van de Castle codings of the 500 dreams used to establish the male norms. A determination of "average departures from the norms" for each subsample size found that such subsamples did not approximate the overall normative figures for most content indicators until they included at least 100 dream reports (Domhoff, 1996, pp. 65-66). The minimum sample size of 100 Most Recent Dreams was then supported by a study of 100 MRDs from women students at the University of California, Santa Cruz, in 1992 and 1993. There was not a single statistically significant difference between this sample and the female norms (Domhoff, 1996, p. 67).
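The logic of the subsampling study can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the 500 "reports" are synthetic true/false codings for a single made-up indicator, and "average departure" is taken here to mean the mean absolute gap, in percentage points, between the subsample percentage and the full-sample percentage, which may not match the exact measure used in the original study.

```python
import random


def average_departure(population, size, draws, rng):
    """Mean absolute gap (percentage points) between subsample and full-sample percentages."""
    norm = 100 * sum(population) / len(population)
    gaps = []
    for _ in range(draws):
        subsample = rng.sample(population, size)  # drawn without replacement
        gaps.append(abs(100 * sum(subsample) / size - norm))
    return sum(gaps) / draws


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for 500 coded reports: True means the element is present.
population = [True] * 150 + [False] * 350

departures = {
    size: average_departure(population, size, draws=500, rng=rng)
    for size in (25, 50, 75, 100, 125)
}
for size, gap in departures.items():
    print(f"n = {size:3d}: average departure = {gap:.1f} percentage points")
```

In a simulation of this kind the average departure shrinks as the subsample grows, which is the pattern that motivates a minimum of roughly 100 reports.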
The extension of the MRD method to preadolescents and young teenagers by Saline (1999) and Avila-White, Schneider, and Domhoff (1999) in this special issue is important for two reasons. First, the problems of the dream-diary method are probably magnified in studies of younger age groups (Foulkes, 1979). Both Bulkley (1970) and Howard (1978) reported difficulties in obtaining completed diaries, especially from boys. In their study of the maturity of interpersonal relations in the dreams of adolescents, Winegar and Levin (1997) had 182 initial volunteers out of the 550 students in the classrooms they visited, but only 115 turned in at least two dreams of 35 words or more; 63% of their final sample were girls, 37% boys.
Second, the two MRD studies of adolescents in this special issue are important because both drew their samples from schools with students who eventually go on to college in large numbers. This makes it possible to compare their results with the Hall/Van de Castle norms without concern about class levels and educational expectations as possible confounding variables.
Gender and Dream Content
The ways in which poor samples, inadequate content indicators, and inappropriate statistical analyses lead to confusion and seemingly contradictory findings in the dream literature can be demonstrated through a consideration of several studies on the relationship between gender and dream content. Controversy over these findings is magnified by feminist criticisms (e.g., Rupprecht, 1985) of earlier Freudian interpretations of empirical findings with the Hall/Van de Castle system (e.g., Hall & Domhoff, 1963b). However, such interpretations are separate from the findings themselves, and generally have been abandoned in any case due to the lack of evidence for Freudian and other clinically based theories (e.g., Domhoff, 1996, 1999). Thus, the emphasis in this section is on issues related to sampling, data analysis, and statistics in creating conflicting results.
The most systematic empirical findings on gender and dream content are provided by Hall and Van de Castle's (1966) normative study of 500 dreams from 100 women and the same number of dreams from a similar sample of men. As noted in an earlier section, these dreams were collected at Case Western Reserve University and Baldwin-Wallace College in Cleveland, Ohio, in the late 1940s. Table 2 and Figure 2 [not present in this on-line version of the paper] present the normative findings in table and bar graph form for the main Hall/Van de Castle indicators, along with significance levels and effect sizes. It should be stressed that there are many categories that show no gender differences or ones that are relatively small. Nor are any theoretical interpretations made for these empirical categories.
The norms for the character categories were replicated with 418 dreams collected in the sleep laboratory at the University of Cincinnati in the 1960s (Reichers, Kramer, & Trinder, 1970). Then the findings for all the major coding categories were replicated with 340 dreams from 69 women and 263 dreams from 53 men collected at the University of Richmond in 1979 (Hall, Domhoff, Blick, & Weesner, 1982).
Further support for the stability of the normative findings came in two separate investigations of women's dreams by female dream researchers in the 1980s. Tonay (1990-1991) collected and coded 500 dreams from 100 women at the University of California, Berkeley, in the late 1980s, finding very few differences except for three small ones with an Asian-American subsample. About the same time, Dudley and Fungaroli (1987) and Dudley and Swank (1990) collected two different samples at Salem College, an all-women's college in Winston-Salem, North Carolina, finding very few differences. Finally, the women's norms were replicated once again in the 1990s in the study of 100 Most Recent Dreams from the University of California, Santa Cruz, mentioned in the previous section; these dreams were collected by Domhoff and coded by Tonay (Domhoff, 1996, p. 67).
Despite this impressive record of replication, one that is rarely attained in psychological studies, several studies claim changes on one or more content indicators. These studies receive far more attention than the replications. However, in every instance the alleged changes are due to misunderstanding of the original norms, the use of inadequate statistics, or small sample sizes.
The first report to claim changes was based on 1,190 dreams collected from 11 women and 11 men in the sleep laboratory at the University of Cincinnati during the 1970s and early 1980s (Kramer, Kinney, & Scharf, 1983). The authors conclude that the "sexual revolution of the past two decades has indeed had some psychological impact in altering some of the traditional differences between men and women" (Kramer, Kinney, & Scharf, 1983, p. 1). Their primary reference is to the disappearance of differences between women and men in the categories for aggressions and misfortunes, but in fact their findings concerning the percentage of dreams with at least one aggression or at least one misfortune are the same as those reported by Hall and Van de Castle using this type of indicator. As Table 2 shows, there never were gender differences on the "at least one" indicator for either aggression or misfortune, nor for friendliness or food and eating, for that matter. Thus, the alleged change is a simple misreading of the norms.
Kramer, Kinney, and Scharf (1983) also report several changes for minor coding categories that are seldom used, most of them with very small frequencies. They include elements such as "auditory activity," mention of "old age," and the presence of straight lines and straight edges. However, there are two major methodological problems. First, they did not correct for differing lengths of dream reports. In this case the oversight is especially egregious because the sample is a rare one in that the men's dreams are longer than those of the women (116 words vs. 92), which stacks the deck even further towards differences from the norms. Second, the decision to use ratios with very small numbers also created major problems. The details of this critique can be found in Domhoff (1996, pp. 79-82).
Further claims concerning changes in gender differences come from a series of studies carried out by Lortie-Lussier and her collaborators at the University of Ottawa (e.g., Lortie-Lussier, Schwab, & De Koninck, 1985; Lortie-Lussier, Simond, Rinfret, & De Koninck, 1992; Rinfret, Lortie-Lussier, & De Koninck, 1991). Although most of their work concerns possible differences in the dream content of women in different roles, the main point of their efforts is that the findings for both women and men can vary with changing circumstances. Their studies are characterized by very small samples, the use of inappropriate parametric statistics, and very few statistically significant differences.
The original study by Lortie-Lussier, Schwab, and De Koninck (1985) compared a mere 30 dream reports from 15 working mothers with 30 dreams from 15 "nonworking" mothers (also called "homemakers" in some places). Mean scores per participant were derived for each pair of dreams for seven Hall/Van de Castle coding categories and three categories of their own (residential settings, vocational settings, and overt hostility). When the two groups were compared on the ten variables with t-tests, only a greater number of negative emotions for working mothers was statistically significant. This meager result is not surprising given the small sample size and the use of an inappropriate parametric significance test. The two groups were then compared by means of discriminant analysis, which searches for patterns of variables that differentiate groups and requires at least an interval level of measurement (e.g., Klecka, 1980, p. 8). This analysis suggested that residential settings and overt hostility, neither of which is a Hall/Van de Castle category, "contributed positively to the identification of the homemakers' dreams," whereas indoor settings and negative emotions, both Hall/Van de Castle categories, "were positive discriminators of working mothers" (Lortie-Lussier, Schwab, & De Koninck, 1985, p. 1015).
In the study by Rinfret, Lortie-Lussier, and De Koninck (1991), which compared working mothers between the ages of 27 and 39 with college-age women students, the working mothers dreamed more often of the work environment, their husbands, their children, and unpleasant emotions than did the women students who did not have work, husbands, or children. In the study by Lortie-Lussier, Simond, Rinfret, and De Koninck (1992), two dreams from each of 32 homemakers, 32 employed mothers, and 32 fathers were compared for 8 Hall/Van de Castle categories and several of their own scales using analysis of variance, which should not be used with nominal data, and without correction for differences in dream length, even though the homemakers had a mean word length of 230, the employed mothers 200, and the fathers 168. Although most of the hypothesized differences were not found, the homemakers did dream more about their families and the two wage-earning groups had more male characters, more unfamiliar characters, and more outdoor settings.
In all, these findings probably provide some evidence for the idea that familial and work roles can have some influence on dream content, which never has been disputed in any study by Hall or his collaborators. However, the findings are not very substantial and must be considered tentative because of methodological and statistical shortcomings that would have to be addressed with a reanalysis or with studies of new and larger samples.
Sources raising criticisms about the Hall/Van de Castle normative findings sometimes cite a study based on one of the most unorthodox sampling procedures imaginable (Krippner & Rubenstein, 1990; Rubenstein & Krippner, 1991) as evidence for changing gender differences. The sample in question was obtained when Krippner appeared on a nationwide morning television talk show and asked the viewing audience if they would send him dreams. Moreover, there seemed to be some suggestion that the participants might receive feedback about the dreams they sent. Not surprisingly, 33% of the 220 dream reports he received were described as recurrent dreams, and many other respondents reported they had experienced the dream many years earlier, sometimes in childhood (Rubenstein & Krippner, 1991, p. 41). It is therefore to be expected that the findings from such an atypical sample would differ from the Hall/Van de Castle norms in some respects.
The latest and most vigorous challenge to the Hall/Van de Castle norms reports that the only gender differences that continue to exist are greater physical aggression in men's dreams and more mentions of failure in women's dreams (Bursik, 1998). The latter difference was "not anticipated" according to Bursik (1998, p. 212), but the norms show that 15% of women's dreams have "at least one" failure compared to 10% of men's dreams. Instead of gender differences, the author finds "gender role" differences between "masculine," "feminine," "androgynous," and "undifferentiated" participants, as defined by the Bem Sex-Role Inventory. The "masculine" and "feminine" role orientations show most of the content differences associated with men and women in other studies.
Bursik (1998, p. 213) concludes that "these data can be viewed as additional evidence for the continuing transcendence of gender differences, at both the conscious and unconscious levels of experience," and suggests that "the time has come for contemporary dream researchers to move beyond the search for gender differences in manifest dream content." There are uncritical supporting references to the findings of Kramer, Kinney, and Scharf (1983), Lortie-Lussier, Schwab, and De Koninck (1985), and Krippner and Rubenstein (1990). There is no mention of the study by Waterman, de Jong, and Magdelijns (1988) in the Netherlands, with five dreams from each of 34 women and 32 men, that found that gender differences were more important than gender role orientation. Nor is there any consideration of the possibility that sampling problems and a very different methodological and statistical approach might have led to the differences.
The dream reports used in Bursik's study were collected at Suffolk University in Boston from volunteer participants in one of her psychology classes. Participants were given 1 month to provide 5 dreams, but it took 4 months to receive at least 4 dreams from 40 men and 40 women, for a total of 320 dream reports from 80 participants. After the Hall/Van de Castle codings were completed for specific categories like "outdoor settings," "weapons," "males," and "physical aggression" by two research assistants, each one coding the 320 dreams for different categories, the data were analyzed using multivariate analysis of covariance, with dream length as the covariate. In presenting the conclusion that most previous gender differences had disappeared, there is no mention of the fact that most Hall/Van de Castle studies have used percentages, ratios, and nonparametric statistics.
Nor are the problems of using parametric statistics with nominal categories ever acknowledged or addressed. There is no attempt made to demonstrate that the frequencies in the various categories are normally distributed. More seriously, there is no recognition that categories like "outdoors," "weapons," "clothes," "males," and "females" do not have an underlying continuous distribution. As Siegel and Castellan (1988) note, it is certainly possible to derive means and standard deviations from the frequencies in nominal categories, thereby making a multivariate analysis of covariance possible, but it is highly likely that the findings are misleading or erroneous. It is also unlikely that any differences could be found because Bursik has a relatively small number of cases (80) for the large number of variables she is analyzing, which is called the "case-to-variable-ratio" by statisticians (e.g., Tabachnick & Fidell, 1989), but that point need not be belabored given the use of inappropriate statistics for a nominal level of measurement.
It is striking that neither Bursik nor the peer reviewers or editors at Sex Roles found the sudden disappearance of replicated gender differences to be surprising enough to raise methodological questions about the results. This oversight by the editors at Sex Roles is made all the more surprising by a report of continuing gender differences in the dreams of adolescents between the ages of 15 and 18 by Winegar and Levin (1997) that appeared in the same journal just one year previously. If there were general changes taking place, as Bursik's (1998) discussion clearly insists, then these changes should have appeared in Winegar and Levin's (1997) study as well. The studies in this special issue on American children (Saline, 1999) and teenagers (Avila-White, Schneider, & Domhoff, 1999) are also relevant if it can be expected that converging gender orientations will manifest themselves at a younger age. Both studies report gender differences that are for the most part similar to those in the Hall/Van de Castle norms.
Although the contention over conflicting findings on gender and dream content may seem unusually strong, it is in fact the case that most of the dream literature suffers from the same sampling and methodological problems that have been demonstrated in this section. Sample sizes are usually far too small for replication to be possible. Corrections for dream length are often absent or inadequate. The statistics used to analyze the data often are not appropriate. At the least, the different types of analyses are not as commensurate as authors such as Bursik (1998) seem to assume, which means that there can be no general conclusions drawn from reviewing the literature.
This paper has presented an overview of the methods and statistics used in the seven empirical papers that follow in this special issue. It is therefore cited in these papers on issues that might be unfamiliar to readers, rather than repeating in each paper the rationales for the application of the Hall/Van de Castle system.
In addition, this paper has attempted to illuminate the methodological and statistical problems that plague the literature on dream content. Hopefully, it can contribute to a growing sophistication in dream content studies and to higher standards for publication by the editors of journals that receive submissions on dream content.