Cognitive Information Processing (Atkinson & Shiffrin – 1968)

The ideas of Ebbinghaus, Tolman and Kohler represent early foundations of cognitive research and cognitive theory that were influential in opening the way for broad acceptance of a more general theory of cognitive learning. Leahey and Harris (1997, p. 104) cite the three-stage, multi-store model of memory proposed by Atkinson and Shiffrin (1968) [1] as laying the groundwork for this theory. Atkinson (in Izawa, 1999) described the initial realization of the three-stage model and the academic context of this effort as follows:

An invitation to contribute a chapter to Psychology of Learning and Motivation provided an opportunity to pull the various empirical and theoretical strands together into a larger framework. In the process, Shiffrin and I realized that the short-term buffer process that we were using in our various models was merely a stand-in for a more complicated set of processes representing short-term memory, leading us to broaden the conception of short-term memory to “control processes,” a term standing for “active memory” or “working memory.” This conception in turn allowed us to put together a theoretical framework with relatively autonomous sensory processing, controlled processing in short-term memory, and a permanent long-term memory upon which control processes could operate to produce retrieval. The field was obviously ready to embrace this approach, and the publication of the chapter seemed to act like the nucleus that causes a solution in delicate equilibrium to precipitate. (p. x)

Atkinson attributed the success of the model to “more than just a matter of a publication arriving on the scene at a propitious moment”; in his view, it succeeded because it provided a good “quantitative fit” to “a wide array of experimental paradigms and conditions” (Izawa, 1999, p. x). Not many years after the publication of their chapter, just as Skinner and radical behaviorism proved to be the star of the behavioral stage, cognitive learning theory came to be dominated by one central model based on the ideas of Atkinson and Shiffrin. This model is commonly referred to as the information processing model.[2]

The information processing model has three major components (Eggen & Kauchak, 1999, pp. 243-244):

  1. Information stores – repositories used to hold information. Three types of storage are assumed: sensory, short-term (working), and long-term.
  2. Cognitive processes – intellectual actions that transform information and move it from one store to another. Processes include: attention, perception, rehearsal, encoding, and retrieval.
  3. Metacognition – knowing about and having control over cognitive processes; a form of self-regulation. Metacognition controls and directs the processes that move information from one store to another.

In this model of memory, a depiction of which can be found in Eggen and Kauchak (1999, p. 244), new information is assumed to flow from left to right, entering through the senses and moving through the processes of attention, perception, rehearsal, and encoding into long-term memory, where it is then available for later retrieval and use.

Information from the environment is temporarily stored in sensory memory so that it can be selectively attended to (i.e., brought into short-term, or working, memory) and further processed. Information in sensory memory decays rapidly and disappears within a few seconds if it is not transferred into working memory.

Strong evidence supporting the existence of sensory memory was presented by Sperling (1960), who demonstrated that subjects were able to consistently recall about 75 percent of 12 letters arranged in a 3×4 matrix and flashed on screen for only 50 milliseconds. In his “partial-report” method, subjects were not told in advance which portion of the matrix to attend to, but instead were signaled after the presentation by a high, middle, or low tone to report only the top, middle, or bottom line of the matrix.
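The inference behind the partial-report method can be made explicit with a little arithmetic. Because subjects could not know in advance which row would be cued, accuracy on the cued row is taken as an estimate of the fraction of the whole display that was still available when the tone sounded. The sketch below uses only the figures given above; the function name is hypothetical and chosen for illustration.

```python
# Partial-report estimate: the cued row is treated as a random sample of the
# whole display, so accuracy on that row estimates the fraction of all letters
# that were available in sensory memory at the time of the cue.
ROWS, COLS = 3, 4             # the 3x4 letter matrix described in the text
TOTAL_LETTERS = ROWS * COLS   # 12 letters in the display


def letters_available(proportion_correct_on_cued_row: float) -> float:
    """Estimate how many letters were available when the cue arrived."""
    return proportion_correct_on_cued_row * TOTAL_LETTERS


# About 75 percent correct on the cued row, as reported above.
print(letters_available(0.75))  # 9.0
```

On this reading, roughly nine of the twelve letters were briefly available, even though only one four-letter row was ever reported on a given trial.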

Sperling also found that decay of information brought in through the visual system, in cases where no further attention and processing is involved, is complete in one second or less. He replicated his study with the auditory system and found that decay took place in under four seconds. More recent research suggests that auditory sensory memory may last as long as ten seconds (Sams, Hari, Rif, & Knuutila, 1993, as cited in Leahey & Harris, 1997, p. 107). Regardless of the exact number, however, it is well established that sensory memory does not persist for more than a very brief period. This temporary trace is long enough to enable the aggregation of fragmented images (for visual information) and echoes (for auditory information) into meaningful units that can then be processed further in working memory, but short enough to prevent clutter and confusion. Leahey and Harris (1997) expressed their belief that the “fact that information decays so rapidly from sensory memory is actually an adaptive feature of our information-processing system,” and that it prevents “double images in the visual system or confusing echoes in the auditory system” (p. 107).

Sensory information selected for further processing is brought into working memory through attention and perception (or pattern recognition).[3] Attention has been conceptualized as a “filter” that selects from among different information input “channels” (Broadbent, 1958); a “tuner” that selectively attenuates, or raises the thresholds for accepting signals from non-relevant or non-interesting sources (Treisman, 1960); and as “mechanisms that control the significance of stimuli” through the allocation of limited resources or capacity (Kahneman, 1973, p. 2).[4] Attention can be thought of as having three dimensions: (a) voluntary vs. involuntary, (b) locus of interest, and (c) intensity.

Although attention may be voluntary (conscious) or involuntary (sub- or pre-conscious), cognitive psychologists have generally been most interested in voluntary attention, in which a subject attends to particular stimuli because they are relevant to a task he has chosen to perform. Attention may also be described in terms of the locus of interest—e.g., inputs from a particular source, targets of a particular type, a particular attribute of objects, or outputs in a particular category (Treisman, 1969, as cited in Kahneman, 1973, p. 3). Whether voluntary or involuntary, and regardless of focus, attention may also vary in intensity, or level of arousal. “Collative properties” [5] of stimuli, such as novelty, complexity, and incongruity cause some stimuli to be more arousing than others (Cupchik & Berlyne, 1979; Berlyne, 1951, 1960, and 1970, as cited in Kahneman, 1973, p. 3).

In Kahneman’s (1973) capacity or effort model of attention, “transient variations in the effort a subject invests in a task determine his ability to do something else at the same time” (p. 4). When attention is voluntary, the level of intensity corresponds to effort rather than to mere wakefulness or alertness. Ulric Neisser, whose theoretical contributions will be discussed further under our review of constructive learning theory, followed the lead of the Gestalt theorists’ notion of autochthonous[6] forces, and emphasized “pre-attentive” processes that “produce the objects which later mechanisms are to flesh out and interpret” (Neisser, 1967, p. 86). This view of “pre-attentive” processes resolves, in part, the apparent conflict between the contradictory observations that (a) “man often performs several activities in parallel, such as driving and talking, and apparently divides his attention between the two activities” but (b) “when two stimuli are presented at once: often, only one of them is perceived, while the other is completely ignored; if both are perceived, the responses that they elicit are often made in succession rather than simultaneously” (Kahneman, 1973, p. 5). Kahneman’s resolution to this conflict is a model of capacity, in which the total capacity of attention may be apportioned among multiple potential sources of input or activity. In his model, capacity of attention is finite and will be consumed more by some activities than others since “the ability to perform several mental activities concurrently depends, at least in part, on the effort which each of these activities demands when performed in isolation” and “not all activities of information-processing require an input of attention” (p. 9). Schneider and Shiffrin (1977) described the distinction between activities that demand attention and those that do not as a difference between controlled and automatic processes.[7]

Automatic processing is learned in long-term store, is triggered by appropriate inputs, and then operates independently of the subject’s control. An automatic sequence can contain components that control information flow, attract attention, or govern overt responses. Automatic sequences do not require attention, though they may attract it if training is appropriate, and they do not use up short-term capacity. They are learned following the earlier use of controlled processing that links the same nodes in sequence. In search, detection, and attention tasks, automatic detection develops when stimuli are consistently mapped to responses; then the targets develop the ability to attract attention and initiate responses automatically, immediately, and regardless of other inputs or memory load.

Controlled processing is a temporary activation of nodes in a sequence that is not yet learned. It is relatively easy to set up, modify, and utilize in new situations. It requires attention, uses up short-term capacity, and is often serial in nature. Controlled processing is used to facilitate long-term learning of all kinds, including automatic processing. In search, attention, and detection tasks, controlled processing usually takes the form of a serial comparison process at a limited rate. (pp. 51-52)

To summarize, attention may be voluntary or involuntary, focused on one or more sources of input or tasks either simultaneously or alternating serially, varied in level of intensity, and is of limited capacity. Sensory information may be brought into working memory when an individual chooses to attend to it, or when it simply catches his attention. While he is only able to attend to a limited amount of information, automaticity enables him to engage in multiple intellectual activities simultaneously.
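The limited-capacity idea summarized above can be illustrated with a toy calculation. The sketch below assumes that each activity has a fixed effort demand and that activities can run concurrently only while their combined demand fits within a fixed total. The capacity value, the effort numbers, and the function name are invented for illustration and are not empirical estimates from Kahneman or from Schneider and Shiffrin.

```python
from typing import Mapping

# Assumed total capacity of attention, in arbitrary units (illustrative only).
TOTAL_CAPACITY = 1.0


def can_perform_concurrently(effort_demands: Mapping[str, float],
                             capacity: float = TOTAL_CAPACITY) -> bool:
    """Return True if the summed effort demands fit within the capacity."""
    return sum(effort_demands.values()) <= capacity


# A well-practiced (largely automatic) activity demands little effort,
# so it can be combined with a second activity.
easy_drive = {"driving on an empty road": 0.3, "talking": 0.4}
# The same activity under demanding conditions leaves too little capacity.
hard_drive = {"driving in heavy traffic": 0.8, "talking": 0.4}

print(can_perform_concurrently(easy_drive))  # True
print(can_perform_concurrently(hard_drive))  # False
```

In this toy version an automatic process is simply an activity whose effort demand is at or near zero, which is why it does not compete with other activities for capacity.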

The next step in the information processing model is perception. Through the process of perceiving, sensory input becomes meaningful. As we perceive, we recognize either familiar patterns or novel entities. Hoffding (1891) described pattern matching as the attachment of an input sensation to consciousness:

Whatever states and farther effects it may be able to call up afterwards, the first condition is that there shall be an instinctive recognition, in other words that the sensation shall have a point of attachment in consciousness. This point…then forms the starting-point of further operations. (p. 153)

Neisser (1967) elaborated on Hoffding’s view, calling it out as the missing step in theories of the association of ideas:

To say that the sight of bread gives rise to the idea of butter “by virtue of previous association,” as was (and is) so commonly assumed, is to miss a crucial step. The present sight of bread, as a stimulus or a perceptual process, is not generally associated with butter; only stored memories of bread are associated in this way. Hence we must assume that the present event is somehow identified as bread first, i.e., that it makes contact with “memory traces” of earlier experiences with bread. Only then can the preexisting association be used. Association cannot be effective without prior pattern recognition. (p. 50)

Pattern recognition models are generally grouped under three classes: (a) template matching, (b) prototypes, and (c) feature analysis (Leahey & Harris, 1997, p. 114). Template matching is perhaps the most intuitive of the three models. This model assumes that incoming sensory images are compared against previously stored mental copies that serve as exact-match templates. This model has been criticized for lack of plausibility, i.e., because of inefficient storage and search time to determine a match. Under the second model, the prototype model, it is assumed that only an abstracted general instance is stored rather than an enormously large number of exact-match templates. The third model, feature analysis, takes the approach that rather than templates or even prototypes, what is stored are specific perceptual features of sensory input. Feature profiles extracted from sensory input are matched with existing profiles in memory by identifying features that are shared in common. The larger the number of common features the more likely the match.
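The matching rule in the feature analysis model (the more shared features, the more likely the match) can be sketched in a few lines. The stored profiles and feature names below are hypothetical and far simpler than anything a perceptual system would use. The sketch counts shared features only and does not model how features are extracted from sensory input.

```python
from typing import Dict, Set, Tuple

# Hypothetical feature profiles held in memory (illustrative only).
STORED_PROFILES: Dict[str, Set[str]] = {
    "A": {"left diagonal", "right diagonal", "horizontal bar"},
    "H": {"left vertical", "right vertical", "horizontal bar"},
    "L": {"left vertical", "bottom horizontal"},
}


def best_match(input_features: Set[str],
               profiles: Dict[str, Set[str]] = STORED_PROFILES) -> Tuple[str, int]:
    """Return the stored label sharing the most features with the input,
    together with the number of shared features."""
    scores = {label: len(input_features & features)
              for label, features in profiles.items()}
    label = max(scores, key=scores.get)
    return label, scores[label]


print(best_match({"left vertical", "right vertical", "horizontal bar"}))  # ('H', 3)
print(best_match({"left diagonal", "horizontal bar"}))                    # ('A', 2)
```

Unlike template matching, a partial or degraded input can still be recognized here, because a match requires only the largest overlap rather than an exact copy.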

Working memory, “variously called short-term memory, short-term store, working memory, immediate memory, active memory, or primary memory” (Eggen & Kauchak, 1999, p. 123), is limited in capacity and is subject to relatively rapid decay. One effective strategy that can be used to extend the amount of information held in working memory is chunking (Miller, 1956). Chunking is a process of recoding multiple bits of information into a meaningful representation that contains the same amount of information, but takes up fewer slots[8] in memory. Driscoll (2000, p. 89) gave the example of recoding seventeen individual letters (JFKFBIAIDSNASAMIT) into five meaningful acronyms (JFK, FBI, AIDS, NASA, and MIT), thereby reducing the number of chunks from seventeen to five. Leahey and Harris (1997) provided an example of using mental visualization to chunk three bits of information, the words rabbit, hat, and hamburger, into one bit of information in the form of a coherent image of “a rabbit wearing a baseball cap and chomping on a Big Mac” (p. 124). In Miller’s original study (1956), subjects chunked streams of binary digits into their decimal or octal equivalents (for example, 10 becomes 2, 1010 becomes 10, 0111 becomes 7, 10100 becomes 20, etc.). Automaticity is an important factor that determines the effectiveness of chunking. To produce significant savings, the re-encoding process itself needs to be automatic:

The recoding schemes increased their span for binary digits in every case. But the increase was not as large as we had expected on the basis of their span for octal digits. Since the discrepancy increased as the recoding ratio increased, we reasoned that the few minutes the subjects had spent learning the recoding schemes had not been sufficient. Apparently the translation from one code to the other must be almost automatic or the subject will lose part of the next group while he is trying to remember the translation of the last group. (Miller, 1956, p. 94)

The effort expended in learning the recoding scheme to a point of automaticity, however, may be well worth the investment, not just for parlor tricks, but for useful, everyday tasks:

It is a little dramatic to watch a person get 40 binary digits in a row and then repeat them back without error. However, if you think of this merely as a mnemonic trick for extending the memory span, you will miss the more important point that is implicit in nearly all such mnemonic devices. The point is that recoding is an extremely powerful weapon for increasing the amount of information that we can deal with. In one form or another we use recoding constantly in our daily behavior. (Miller, 1956, pp. 94-95)
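The binary recoding scheme described above is easy to make concrete. The sketch below splits a string of binary digits into fixed-size groups and recodes each group as a single number, so that the same information occupies fewer chunks. The function name and the 18-digit example sequence are chosen here for illustration.

```python
from typing import List


def recode_binary(digits: str, group_size: int) -> List[int]:
    """Split a binary string into groups of `group_size` digits and recode
    each group as one number (its value in base 2)."""
    groups = [digits[i:i + group_size]
              for i in range(0, len(digits), group_size)]
    return [int(group, 2) for group in groups]


sequence = "101000100111001110"          # 18 separate binary digits

pairs = recode_binary(sequence, 2)       # 2:1 recoding
triples = recode_binary(sequence, 3)     # 3:1 recoding (octal digits)

print(len(sequence), "items before recoding")   # 18 items before recoding
print(pairs, len(pairs), "chunks")               # [2, 2, 0, 2, 1, 3, 0, 3, 2] 9 chunks
print(triples, len(triples), "chunks")           # [5, 0, 4, 7, 1, 6] 6 chunks
```

The larger the group size, the fewer chunks must be held, but the more translations the subject has to know by heart, which is why the recoding must be nearly automatic to pay off.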

Through chunking we are able to extend the capacity of our working memory. Through maintenance rehearsal we are able to prevent the rapid decay of information in working memory. Maintenance rehearsal is “the process of repeating information over and over, either aloud or mentally, without altering its form” (Eggen & Kauchak, 1999, p. 257). A household example of maintenance rehearsal is when we repeat a phone number silently or out loud until we are able to write it down or dial it.

Maintenance rehearsal may be adequate for information that need only be retained temporarily, but it is not the best strategy for more permanent encoding. For long-term encoding to be successful, information must become meaningful to the learner through organization and elaboration. Organization is “the process of clustering related items of content into categories or patterns” (Eggen & Kauchak, 1999, p. 259). Research evidence suggests that we naturally organize information as we encode it in long-term memory. During a study in which subjects were asked to list items in specified categories, Bousfield and Sedgewick (1944) discovered that responses tended to cluster according to various types of categorical and spatial contiguity, for example, “groups of domesticated animals, commonly exhibited species, and various zoological phyla” (p. 153). Although this phenomenon was noted in their report, clustering of categories was not the focus of their study. As a follow-up, Bousfield (1953) later presented a technique for quantifying such clustering, the theoretical significance of which is “a means for obtaining additional information on the nature of organization as it operates in the higher mental processes” (p. 229). His experiment demonstrated that “subjects, when given a list of randomly arranged items will in their recall show a greater-than-chance tendency to group the items in clusters containing members of the same general category” (p. 237). It has also been found that repetition plays an important role in the organization of new material, that “subjective organization increases with repeated exposures and recall of the material, and that there is a positive correlation between organization and performance” (Tulving, 1962, p. 352).[9] In other words, with repeated exposure comes greater organization, and with greater organization come higher levels of performance.
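The notion of a “greater-than-chance” tendency to cluster can be illustrated by comparing an observed recall order with what a random order would produce. The sketch below is not Bousfield’s own statistic. It counts adjacent recalled items that share a category and compares that count with its expected value under a random ordering, which for category sizes n_i and N recalled items is the sum of n_i(n_i - 1) divided by N. The word list and category labels are hypothetical.

```python
from collections import Counter
from typing import List, Mapping


def observed_repetitions(recall: List[str],
                         category_of: Mapping[str, str]) -> int:
    """Count adjacent pairs of recalled items that share a category."""
    return sum(category_of[a] == category_of[b]
               for a, b in zip(recall, recall[1:]))


def expected_repetitions(recall: List[str],
                         category_of: Mapping[str, str]) -> float:
    """Expected number of same-category adjacent pairs if the recalled
    items were arranged in a random order."""
    sizes = Counter(category_of[item] for item in recall)
    return sum(n * (n - 1) for n in sizes.values()) / len(recall)


# Hypothetical study list with two categories.
CATEGORY = {"dog": "animal", "cat": "animal", "horse": "animal",
            "carrot": "vegetable", "pea": "vegetable", "bean": "vegetable"}

clustered = ["dog", "cat", "horse", "carrot", "pea", "bean"]
interleaved = ["dog", "carrot", "cat", "pea", "horse", "bean"]

print(observed_repetitions(clustered, CATEGORY))    # 4
print(observed_repetitions(interleaved, CATEGORY))  # 0
print(expected_repetitions(clustered, CATEGORY))    # 2.0
```

An observed count well above the expected value, as in the clustered order here, is the kind of pattern that indicates category-based organization during recall.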

Another important factor of encoding is the context in which it takes place. According to the principle of encoding specificity (Thomson & Tulving, 1970), the probability of later recall depends on the similarity of context during initial learning and the context during later recall. Context is defined by the material being learned, the mental set derived from the material, the environmental surroundings, and the mood or prevalent emotion and feeling of the individual during learning and retrieval (Leahey & Harris, 1997, pp. 147-149).

Craik and Lockhart (1972, as cited in Bower & Hilgard, 1981, p. 434) distinguished between maintenance rehearsal (“superficial recycling” of the material) and elaborative rehearsal. The organization and encoding of new information into long-term memory is facilitated by elaborative rehearsal. Elaborative rehearsal is “any form of rehearsal in which the to-be-remembered information is related to other information” (Bruning et al., 2004, p. 67). New information becomes meaningful when it is understood in the context of existing knowledge or when it becomes sufficiently well known so as to be familiar, easily recalled, and useful. Elaborative encoding strategies include mediation, imagery, mnemonics, guided questioning, and deep processing.

Mediation “involves tying difficult-to-remember information to something more meaningful” (Bruning et al., 2004, p. 67). For example, Montague, Adams, and Kiess found that subjects were better able to recall pairs of nonsense syllables such as RIS-KIR when they were tied to a natural language mediator such as race car (as cited in Bruning et al., 2004, p. 68).

Imagery is the encoding of information as mental pictures. The information may be simple, such as the meaning of the German word das Buch, or complex, such as the concept of abiogenesis. In the first case, a single mental picture of an old leather-bound bible with yellowed pages and the smell of old print will do nicely. In the second case, a more elaborate, complex, and detailed picture, or possibly an entire set of images, might be needed: one representing simple chemicals, one representing polymers, one representing the replication of polymers, one for the hypercycle, one for the protobiont, and finally one for bacteria, all tied together by a summative mental image containing a directed graph with the words “simple chemicals,” “polymers,” “replicating polymers,” “hypercycle,” “protobiont,” and “bacteria” linked by arrows.

Mnemonics involve attaching new information to well-known information—in possibly a very artificial way[10]— to facilitate elaboration, chunking or retrieval from memory. Mnemonic techniques include the use of rhymes such as “i before e except after c;” sayings like “thirty days has September, April, June, and November;” gestures as in the “right-hand rule” in physics to demonstrate the flow of electrical current in a magnetic field; and imagery (Bruning et al., 2004, p. 69). Common mnemonic methods include (a) the peg method, in which “students memorize a series of ‘pegs’ on which to-be-learned information can be ‘hung;’” (b) the method of loci, in which new information is mentally attached to familiar or present locations; (c) the link method, in which an image is formed for each item in a list to be learned and the image is pictured as interacting with the next item on the list (e.g., one can remember to take the dog to the groomer, pick up a roasted chicken for Sunday dinner, mail a letter, and pick up some two-inch light fixture screws by imagining the dog chasing a chicken with a letter in its beak and wearing two-inch screws as earrings); (d) stories, constructed from a list of words to remember; (e) the first-letter method, using the first letters of words to be learned to construct acronyms; and (f) the keyword method, which involves identifying a keyword that sounds like the vocabulary word to be remembered and an imagery link in which the keyword is imagined to interact with the word to be remembered (e.g., a 6th grade student remembers the word “captivate” by selecting the keyword “cap,” and visualizing their Uncle Bob who tells fascinating stories and always wears a cap (Bruning et al., 2004, p. 73)). Leahey and Harris (1997) summarized the value of mnemonics in learning as follows:

Mnemonics take advantage of our natural information-processing tendency to impose structure and organization on material that we process. It is natural, not exceptional, to try to make meaningful something that does not have much meaning. The more links we can establish between the material we are trying to learn and information already in our long-term memory, the more potential avenues we have available to retrieve the information later. (p. 145)

Once encoded, information can become more permanently affixed in long-term memory through elaborative rehearsal. Methods for promoting elaborative rehearsal include schema activation, guided questioning, and deep processing (Bruning et al., 2004, pp. 67-77). Schema activation refers to the activation of students’ relevant existing knowledge prior to engaging in a learning activity. The fundamental assumption of methods based on the idea of schema activation is that “students at any age will have some relevant knowledge to which new information can be related” (p. 75). For example, in a pre-school lesson on living things the children might be asked to describe animals and plants they have observed or interacted with. They might also be asked to describe objects that don’t eat, breathe, or move. These examples will then serve as anchoring points or exemplars of living vs. non-living objects.

Another method to promote elaborative rehearsal is that of teacher, self, or peer guided questioning (Bruning et al., 2004, p. 76). Using this method, questions are asked and answered as new material is presented. King (1994) found that questions which prompted “comparing and contrasting, inferring cause and effect, noting strengths and weaknesses, evaluating ideas, explaining, and justifying” (p. 340) were especially effective.

Elaborative rehearsal is also promoted through deep processing. According to Craik and Lockhart (1972), the ease with which items are recalled from memory depends on what learners do as they encode new information. Information processed at only a superficial or surface level will be less well remembered than information that is analyzed or processed at a much deeper level.[11] Bruning et al. (2004) gave an example to contrast shallow versus deep processing:

These two levels of processing may be seen in two common classroom assignments. In the first, students are asked to underline a set of vocabulary words in a brief essay. In the second, students are asked to read the same essay and be prepared to tell the class about it in their own words. If the students follow directions, the first assignment is a clear example of shallow processing; all they have to do is find the words in the essay and underline them. To perform this task, students do not have to think about the meaning of the essay and perhaps not even with the meaning of the words. Not surprisingly, if we tested these students for their understanding of the contents of the essay, the odds are they would remember relatively little.

In contrast, if the students who were asked to explain the essay to their classmates followed instructions, we would likely see a very different outcome. Putting an essay into one’s own words requires thinking about the meaning of the content. In so doing, the students would have had to carefully analyze and comprehend the material. If we were to surprise these students with a test measuring their understanding of the essay, they almost certainly would remember far more of its contents than could the group that underlined vocabulary words. (p. 77)

The third type of storage in the Atkinson and Shiffrin model (1968) is long-term memory. Long-term memory was distinguished from sensory and short-term memory as follows:

The last major component of our system is the long-term store. This store differs from the preceding ones in that information stored here does not decay and become lost in the same manner. All information eventually is completely lost from the sensory register and the short-term store, whereas information in the long-term store is relatively permanent (although it may be modified or rendered temporarily irretrievable as the result of other incoming information). Most experiments in the literature dealing with long-term store have been concerned with storage in the a-v-l mode [auditory-verbal-linguistic], but it is clear that there is long-term memory in each of the other sensory modalities, as demonstrated by an ability to recognize stimuli presented to these senses. There may even be information in the long-term store which is not classifiable into any of the sensory modalities, the prime example being temporal memory. (pp. 17-18)

With regard to the form of long-term memory, or how information is stored, Atkinson and Shiffrin (1968) generally accepted the prevailing view that memories are stored as a trace, in varying sense modalities, and of varying strength (pp. 25-30). The strength of the trace depends on the amount of rehearsal, which both increases “the length of stay in short-term store (during which time a trace is built up in LTS)” and gives “coding and other storage processes time to operate” (p. 35).

While the sensory trace model of memory gives an account of basic memory representation, others have been interested in accounting for relationships between knowledge units in memory and have developed network models, feature comparison models, connectionist and parallel distributed processing models, and dual systems models. The earliest and best-known hierarchical network model is the semantic network model of Collins and Quillian (1969). In this model, the structure of memory is a branching network of subordinate and superordinate nodes, with each node representing a concept such as bird. Attributes of the node are stored in the node itself and apply to all nodes below it in the hierarchy. For example, has wings applies to both CANARY and OSTRICH, but not to ANIMAL or FISH.

Collins and Quillian provided experimental support for the semantic network model by testing the speed at which subjects were able to verify sentences such as “A canary is yellow” or “A canary is an animal.” To verify the first sentence only one level of dereferencing is required (i.e., having located canary, the subject need only search first order attribute connections to verify the truth of the statement). To test the second, two levels of dereferencing are required, verifying first that a canary is a bird, and second that a bird is an animal. Collins and Quillian confirmed the predicted timing of such operations, but also received some unexpected results. Subjects were able to identify that a canary was a bird more quickly than they were able to identify that a penguin was a bird, suggesting that the model did not account for typicality of concepts. The more typical, or perhaps the more familiar, the concept, the more quickly statements about the concept can be verified.
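The hierarchical structure described above, with attributes stored once at a node and inherited by everything beneath it, can be sketched as a small data structure. The fragment of hierarchy, the attribute lists, and the function names are hypothetical. The sketch counts the superordinate links followed during verification, so an attribute stored on the starting node itself counts as zero links. That counting convention is an assumption of this sketch.

```python
from typing import Dict, Optional, Set

# Hypothetical fragment of a semantic network: each concept names its
# superordinate (parent) node and stores only its own attributes.
PARENT: Dict[str, Optional[str]] = {
    "animal": None,
    "bird": "animal",
    "fish": "animal",
    "canary": "bird",
    "ostrich": "bird",
}
ATTRIBUTES: Dict[str, Set[str]] = {
    "animal": {"eats", "breathes"},
    "bird": {"has wings", "has feathers"},
    "fish": {"has fins"},
    "canary": {"is yellow", "can sing"},
    "ostrich": {"is tall"},
}


def links_to_category(concept: str, category: str) -> Optional[int]:
    """Links followed upward from `concept` to reach `category`,
    or None if `category` is not a superordinate of `concept`."""
    links, node = 0, concept
    while node is not None:
        if node == category:
            return links
        node = PARENT[node]
        links += 1
    return None


def links_to_attribute(concept: str, attribute: str) -> Optional[int]:
    """Links followed upward from `concept` before `attribute` is found,
    or None if no node on the path stores it."""
    links, node = 0, concept
    while node is not None:
        if attribute in ATTRIBUTES[node]:
            return links
        node = PARENT[node]
        links += 1
    return None


print(links_to_category("canary", "bird"))       # 1
print(links_to_category("canary", "animal"))     # 2
print(links_to_attribute("canary", "is yellow")) # 0
print(links_to_attribute("ostrich", "has wings"))# 1
print(links_to_attribute("fish", "has wings"))   # None
```

If verification time grows with the number of links followed, this structure predicts the timing differences described above, but it gives a canary and a penguin the same distance to bird, which is exactly the typicality effect it fails to explain.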

The inconsistency in access times for typical vs. non-typical concepts was addressed by feature comparison models of memory that assume concepts in memory are not stored in hierarchical networks but instead represented by sets of defining features. Building on Meyer’s (1970) two-stage model for comparing attributes to verify statements of the form “ALL S ARE P” and “SOME S ARE P,” Smith, Shoben and Rips (1974) presented a feature model for making semantic decisions. Using Meyer’s model, the statement All robins are birds would be evaluated as follows:

Verification of the proposition A robin is a bird is based on a comparison process that determines whether every attribute of bird is also an attribute of robin. Note that this model determines semantic classification correctly only if the attributes associated with bird are necessary and sufficient criteria for category membership. Other attributes which we characteristically associate with bird (e.g., that they can fly) cannot be included in this set of attributes, for otherwise a proposition like A penguin is a bird will be incorrectly disconfirmed…. In the first stage of Meyer’s model, a set is retrieved which contains the names of all categories that have some members in common with the category of the predicate noun (bird in the previous example). Then this set of intersecting categories is searched for the subject noun…. If the subject noun is found in this set of intersecting categories, the second-stage attribute comparison is executed. Thus, structurally, the model represents each lexical item in two different ways—by the names of the item’s intersecting categories and by the item’s defining attributes—while processing involves comparison mechanisms that operate on this information. (E. E. Smith et al., 1974, p. 215)

In the model proposed by Smith et al., semantic information is represented only by semantic features (rather than both nouns and attribute sets). Their model also considered both characteristic and defining features, and proposed a two-stage comparison process. The first stage computes a measure of overall similarity by comparing all features, both characteristic and defining. If this measure exceeds an upper criterion, the statement can be classified as true; if it falls below a lower criterion, the statement can be classified as false. The second stage is executed only when the measure falls between the two criteria. In the second stage only defining features of the category (e.g., bird) are compared with features of the instance (e.g., robin) and a positive response can be made if: “(a) each defining dimension of the category is also a defining dimension of the instances, and (b) the particular values (features) on these dimensions which the instance possesses are within the range of allowable values for the category” (p. 223).
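The two-stage decision process of the feature comparison model can be sketched as follows. This is a simplified illustration and not the authors’ formal model: the feature sets are invented, overall similarity is measured here as the proportion of shared features (an assumption), the two criteria are arbitrary values, and the second stage is reduced to checking that every defining feature of the category is also a defining feature of the instance.

```python
from typing import Dict, Set, Tuple

# Hypothetical defining and characteristic features (illustrative only).
CONCEPTS: Dict[str, Dict[str, Set[str]]] = {
    "bird": {"defining": {"has feathers", "lays eggs"},
             "characteristic": {"flies", "sings", "small"}},
    "robin": {"defining": {"has feathers", "lays eggs", "red breast"},
              "characteristic": {"flies", "sings", "small"}},
    "penguin": {"defining": {"has feathers", "lays eggs"},
                "characteristic": {"swims", "large"}},
    "bat": {"defining": {"has fur", "bears live young"},
            "characteristic": {"flies", "small"}},
    "trout": {"defining": {"has scales", "has gills"},
              "characteristic": {"swims", "small"}},
}
UPPER, LOWER = 0.7, 0.2   # assumed decision criteria for stage one


def all_features(name: str) -> Set[str]:
    return CONCEPTS[name]["defining"] | CONCEPTS[name]["characteristic"]


def similarity(instance: str, category: str) -> float:
    """Stage-one measure: shared features as a proportion of all features."""
    a, b = all_features(instance), all_features(category)
    return len(a & b) / len(a | b)


def verify(instance: str, category: str) -> Tuple[bool, int]:
    """Return (answer, number of stages used) for 'An <instance> is a <category>'."""
    x = similarity(instance, category)
    if x >= UPPER:
        return True, 1            # fast "true" from overall similarity
    if x <= LOWER:
        return False, 1           # fast "false" from overall similarity
    # Stage two: compare defining features only.
    defining_ok = CONCEPTS[category]["defining"] <= CONCEPTS[instance]["defining"]
    return defining_ok, 2


print(verify("robin", "bird"))    # (True, 1)
print(verify("penguin", "bird"))  # (True, 2)
print(verify("bat", "bird"))      # (False, 2)
print(verify("trout", "bird"))    # (False, 1)
```

With these made-up features, the typical instance is accepted after one stage while the atypical instance needs two, which is how the model accounts for slower verification of atypical category members.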

A third type of memory model is the connectionist or parallel distributed processing model. This model is also a type of network model, but it is not tied to linguistic concepts. In the connectionist network model of memory, each node is a generic processing unit, and the form of the representation in memory is “not a passive recording or data structure but rather a pattern of activation, either excitory or inhibitory, and cognitive processing occurs through the propagation of activation patterns” (Leahey & Harris, 1997, pp. 140-141). Knowledge is not encoded in a particular location in the brain, but is made up of connection strengths between nodes. Learning is the process of strengthening or weakening these connections. The connectionist model is a direct model of the neural interconnections in the brain and is the predominantly accepted view of memory at present.
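
The idea that knowledge resides in connection strengths, and that learning strengthens or weakens those connections, can be sketched with a very small pattern associator. The sketch below is an assumption-laden illustration rather than any particular published model: the delta (error-correction) learning rule, the learning rate, the input patterns, and the function names are all chosen only for the example.

```python
# Minimal pattern associator: knowledge is stored only in the connection weights.
# The delta rule, the patterns, and the learning rate are illustrative assumptions.

def recall(weights, pattern):
    """Propagate an input activation pattern through the connections."""
    return [sum(w * x for w, x in zip(row, pattern)) for row in weights]


def train(pairs, n_inputs, n_outputs, rate=0.25, epochs=20):
    """Strengthen or weaken each connection in proportion to the output error."""
    weights = [[0.0] * n_inputs for _ in range(n_outputs)]
    for _ in range(epochs):
        for pattern, target in pairs:
            output = recall(weights, pattern)
            for j in range(n_outputs):
                error = target[j] - output[j]
                for i in range(n_inputs):
                    weights[j][i] += rate * error * pattern[i]
    return weights


# Two input patterns, each associated with a different output pattern.
pairs = [([1, 0, 1, 0], [1, -1]),
         ([0, 1, 0, 1], [-1, 1])]
weights = train(pairs, n_inputs=4, n_outputs=2)

print(recall(weights, [1, 0, 1, 0]))  # close to [1, -1]
print(weights[1][0])                  # a negative (inhibitory) connection
```

No single weight stores either association. Each pattern is recovered from the combined effect of several connections, some positive (excitatory) and some negative (inhibitory).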

In addition to the question of how information is stored in long-term memory, cognitive scientists have also been interested in what is stored. Although the network models cited above are based primarily on a verbal encoding of information, proponents of dual-code systems assume that information may be simultaneously encoded in both verbal and image forms. Paivio (2006) cites empirical evidence suggesting that it is much easier to encode concrete information that is easily represented as an image than it is to encode it in verbal form. He suggested that “a substrate of nonverbal representations and imagery derived from the child’s observations and behaviors related to concrete objects and events” (p. 7) is the foundation for all ensuing cognitive development. Investigators of human memory have also proposed other distinctions between different types of memory, such as acoustic versus verbal, and episodic versus semantic (Bower & Hilgard, 1981). Although the debate is far from settled, and the true nature of long-term memory remains a mystery, the empirical evidence already cited does suggest that information may be encoded in different forms.

Another question regarding long-term memory concerns the process of retrieval. Leahey and Harris (1997, pp. 141-144) listed two types of retrieval: recognition and recall. In a recognition task, the subject is required only to identify an item as familiar or not. A positive recognition depends on three factors: (a) the strength of the memory, (b) the match of the presented item to the memory trace, and (c) the criterion level set by the subject (how sure the subject needs to feel in order to give a positive response). The stronger the memory trace, and the closer the match of the presented item to the stored memory, the more likely an item will be identified as familiar. The criterion level set by the subject, however, may override the other two when the consequences of a false positive are severe, such as when a “yes” decision may result in sending an innocent man to prison. A variation on the general recognition task is the forced-choice recognition task. In this variation, most commonly experienced in a multiple-choice test, the subject must choose one from among several possible answers. The difficulty of the task depends greatly upon the alternative answers among which the correct answer is embedded.
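
The interplay of the three recognition factors can be made concrete with a small Python sketch. It is a simplification under stated assumptions: strength and match are treated as numbers between 0 and 1, they are combined by multiplication into a single familiarity value, and the function names are hypothetical. The source describes the three factors but does not specify how they combine.

```python
# Toy model of a recognition decision (the multiplicative combination is an assumption).

def familiarity(strength, match):
    """Combine trace strength and item-to-trace match into one familiarity value."""
    return strength * match


def recognize(strength, match, criterion):
    """Respond 'yes' only if familiarity reaches the subject's criterion level."""
    return familiarity(strength, match) >= criterion


def forced_choice(strength, alternatives):
    """Pick the alternative whose match to the memory trace yields the most familiarity."""
    return max(alternatives, key=lambda item: familiarity(strength, alternatives[item]))


# Same memory and same item, but a stricter criterion changes the response.
print(recognize(0.8, 0.9, criterion=0.5))   # lenient criterion
print(recognize(0.8, 0.9, criterion=0.95))  # strict criterion, e.g. eyewitness testimony

# Forced choice is easy when the distractor matches poorly, harder when it matches closely.
print(forced_choice(0.8, {"robin": 0.9, "chair": 0.1}))
```

Raising the criterion leaves the memory itself unchanged but turns a "yes" into a "no," which is the sense in which the criterion can override the other two factors.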

Since recognition is possible with a very low-strength memory trace, it is generally not considered to be as valid a measure of learning as a recall task. In a free recall task, retrieval is entirely unaided and the subject must remember the desired information without any clues or hints. It is generally accepted that when information is retrieved from long-term memory under these circumstances the memory trace is fairly strong, and that the subject really knows the information. A variation is the cued recall task. In a cued recall task the desired information is prompted or teased out with a hint. It is somewhat similar to a recognition task, in that it is not as demanding, but is still held to be a more valid estimate of how well something is known, since the provided cue is removed one or more links from the information to be recalled. Because of this, cued recall can be an effective mode of learning to promote encoding as links are established with existing information (cf. Skinner’s concept of vanishing (1986, p. 107)).

A final question about long-term memory is how and why information is forgotten. Forgetting is generally attributed to one of four primary causes (Driscoll, 2000, pp. 104-106; Leahey & Harris, 1997, pp. 149-150). They are:

  1. Failure to encode – the material was never encoded properly to begin with, meaning there may not have been enough elaborative rehearsal or active rehearsal strategies to transfer it to long-term memory.
  2. Retrieval failure – the material has been encoded and stored but has not been properly indexed and therefore cannot be retrieved (like a misplaced book on library shelves)[12]; alternatively, in the Freudian tradition, the memory has been repressed and hidden from oneself through the erection of defensive mental barriers.
  3. Decay – over time the memory trace loses strength.[13]
  4. Interference – “other events or information get in the way of effective retrieval” (Driscoll, 2000, pp. 104-105). Interference may be retroactive, in which material learned later interferes with the recall of material learned earlier, or proactive, in which previous learning interferes with later learning (e.g. the type of interference a long-time tennis player deals with when trying to learn racquetball).

[1] Atkinson first met Shiffrin in 1964 while serving on the faculty at Stanford. At the time he was working on mathematical models of memory, using a computer-controlled system to conduct experiments. Their collaboration began when Shiffrin’s graduate advisor, Gordon Bower, left on sabbatical and asked Atkinson to take over as Shiffrin’s research advisor.

[2] The review of the Information Processing Model given here is based primarily on the overviews by Bruner, Goodnow and Austin (1972), Driscoll (2000), Eggen and Kauchak (1999; 2007), and Leahey and Harris (1997).

[3] Eggen and Kauchak (1999, p. 256) use the term “perception,” defining it as “the process by which people attach meaning to their experiences.” Leahey and Harris (1997, p. 113) emphasize “pattern recognition” and define it as the process by which we “recognize environmental stimuli as exemplars of concepts already in memory.”

[4] Kahneman’s Attention and Effort (1973) provides an excellent review of the study of attention.

[5] Berlyne introduced the term “collative properties” in 1960 to describe “the effects of comparisons among elements which are presented either simultaneously or in succession” (Cupchik & Berlyne, 1979, p. 94). Collative properties refer to the structural relations between individual elements in a stimulus. For example, in a work of art the individual elements might be the “colour, texture, and medium of a visual work, the tones of a musical passage, or the words and rhyme of a poem” (p. 93). The collative properties would include “the degree of similarity or difference among the elements and intercorrelations among them” (p. 93).

[6] “Autochthonous Gestalt: A perceptual pattern induced by internal factors rather than by external stimulus” (Corsini, 1999, p. 83). “Autochthonous idea (E. Bleuler) A thought that originates within the mind, usually from an unconscious source, yet appears to arise independently of the person’s stream of consciousness, such as fantasies, dreams, delusions, inspirations, insights as well as repetitious thoughts of obsessive-compulsive individuals.”

[7] Neisser (1967) referred to “responses that can also become preattentive with sufficient practice” as “automatisms” (p. 101).

[8] From Leahey and Harris (1997, p. 123):


The metaphor of a series of slots (called the rehearsal buffer) in working memory has been useful. As the central executive processes new material and/or old material retrieved from long term memory, these slots are filled one-by-one until there is no remaining space in the rehearsal buffer. Once this happens, in order for new material to be added to working memory, something currently there must be “bumped,” either through forgetting or encoding and transfer to long-term memory, out of immediate consciousness. This slot metaphor reflects the intuitive feeling we often have that our mind is so full that there is no room for even one more piece of information without something already there being pushed out.

[9] Tulving’s study is an extension of Bousfield’s work, taking it “from one trial recall and experimentally organized materials…to a learning situation, with several successive trials, and experimentally unorganized materials” (Tulving, 1962, p. 352).

[10] Mnemonic attachment is often, perhaps typically, made where there is no intrinsic relation between the new information and what it is being attached to. For example, using the method of loci one might mentally place items to be purchased at the grocery store in various locations around one’s home. Although one would not normally find an oversized stick of butter melting on the keys of one’s piano and running down into the cracks, this appalling visual image is very memorable.

[11] Craik and Tulving (1975) later indicated that elaboration or spread might be a better descriptive term than depth.


[12] Leahey and Harris (1997) note that “the selection of an appropriate retrieval cue is critical to the retrieval process,” and that “what is an effective retrieval cue for one person might be ineffective to another person trying to retrieve the same information.”

[13] Because this is difficult to unequivocally demonstrate (Leahey & Harris, 1997, p. 150), forgetting is more popularly attributed to failure to encode, retrieval failure, or interference. For a particularly strong argument against Thorndike’s law of disuse—the original hypothesis behind the concept of forgetting as a process of decay—see McGeoch’s Learning and the Law of Disuse (1932).
