Report on the topic:

Linguistics and Other Fields

Crystal D. Linguistics. Second ed.

Penguin Books, 1990. – pp. 256-267.

The main merit of research over the past few years is that people now
have a much clearer idea as to what the important questions of
linguistic theory are: over the next few years, we may go some way
towards solving some of them. It should be clear from this attitude,
then, that those who clamour for applications of linguistics — myself
included — are not likely to be satisfied for a while. Too much of the
subject is in an unformulated state to be applied in any
useful way to the study of some other field — though, as we shall see,
some restricted areas have come to be fairly well investigated and
introduced. The absence of any complete grammar of English (which has
been the most analysed of all languages) is one of the most obvious
limitations of the applicability of linguistics at the present time. The
presence of so much fundamental theoretical disagreement, which has to
be gone into before one can adopt a particular ‘applied’ line, is
another. However, it would be wrong to criticize linguistics for failing
to come up to expectations, or for being too negative (in its criticisms
of earlier work), or for being too complicated and abstract — such
criticisms are not uncommon. The negative flavour of early linguistics
was, as we have seen, an essential preliminary to the development of a
more constructive and open-minded state of mind on the part of language
scholars. Understanding the weaknesses of early accounts of language
helped them to reach an understanding of the fact that it was complex,
and to appreciate the nature and extent of its complexity. It was this
awareness which promoted the careful analysis of data and the
development of the necessary (albeit abstract) distinctions of
phonetics, morphology, and the other levels. It is in fact this very
complexity which is the reason why linguistics has not developed further
than it has. It would be perfectly possible for any competent linguist
to sit down and write a linguistic grammar of English, in the light of
available knowledge, for the purpose of language teaching; but it is
unlikely that it would be a wholly satisfying job. There is still too
much dispute about the theoretical principles on which such a grammar
should be based, too much dispute over terminology, and too much
uncertainty over the facts of the language, to produce a sound,
comprehensible and comprehensive grammar. And bearing in mind that
linguistics has been with us such a short time, this inadequacy is
perhaps not surprising. A great deal has nonetheless been achieved.

Awareness of this inadequacy has not of course stopped people from
trying to write such grammars; nor should it. The more attempts there
are to formulate adequate grammars for particular applications in
teaching and elsewhere, the more quickly the difficulties will be
appreciated, and the sooner they will be overcome. What is important is
that the potential users of these books should not make premature
demands for their production (rushed research is regretted research),
and that the authors of these books — or their publishers — should not
make premature claims for their product. Such prematurity can arise
in two ways. First, a linguistic introduction to the structure
of English, let us say, can be premature in the sense that the kind of
model in which it presents its rules and facts has been outdated by new
ideas about the nature of the model, or about the formalization of the
rules, or even about the nature of the facts (e.g. new statistical
information about usage having become available). This has often
happened, particularly in generative grammar, where the development of
ideas has been so rapid that a grammar book is liable to find itself
dismissed as old-hat by linguists, even when it is hot off the press.
Naturally, teachers who are trying to get to grips with generative
grammar are disturbed by this reaction; but they should not be, if they
appreciate the inevitable movement in the progression of scientific
theory. They should use a grammar book, for the time being, not as an
authoritative account of linguistic structure, that has to be taught to
the letter; but as a set of suggestions about ways of looking at
language which they are likely to find illuminating and applicable to
specific problems. This can be done even though there is a likelihood of
further developments in the subject which will make some of the specific
features of the approach redundant. This critical attitude is also
helpful, I believe, in that it helps to reduce the difficulties inherent
in the second cause of prematurity, namely, that not
enough is known about the psychological and other demands linguistics
makes upon the student, or about the methodological difficulties
involved in grading linguistic material for presentation pedagogically.
One book may be suitable for pedagogical context A (e.g. language
teaching to immigrants from the West Indies), but not for context B
(e.g. language teaching to immigrants from India and Pakistan).
Teachers, however, who are eclectic in their use of linguistic material,
who build up, in a personal but informed way, their own ‘theory’ of
language and their own description of English, bearing in mind the
specific needs of the situation in which they are working, are likely to
avoid the more serious of these pragmatic difficulties. This of course
is what many teachers already try to do, if they are in the unfortunate
position of not having an applied linguistics research project trying to
do the job for them (and there are more and more such projects producing
materials in a variety of fields these days). It is good to see an
increasing number of centres in Britain and the United States organizing
courses, conferences, in-service training, and the like, in order to try
to bridge the gap between theory/research and pedagogy, and to develop a
positive and selective state of mind of this kind.

For such a gap does exist, and there is no point in trying to deny
it. There is a considerable gap in this book, for instance, between the
theoretical discussion and the practical claims and suggestions which show
the potential applicability
of the subject. There might almost seem to be two subjects involved, the
study of language, on the one hand, and the study of linguistics, on the
other — and there are those who make this distinction in their work. But
ultimately there is and can be no such distinction: whether or not we
commit ourselves to the detail of a specific linguistic approach, when
we commence the study of language, on no matter how small a scale, we
are necessarily committed to the demands for clarity, consistency and
accuracy, which it is the ultimate purpose of linguistic study to
fulfil. As soon as we ask ourselves how we are using terms, as soon as
we impose a certain grading or selection on material, we are committing
ourselves to a particular linguistic view of the world. Whether we
realize it is another matter. Naturally, one hopes that intelligent
people will take pains to realize what they are doing — linguists
included. But developing this awareness of principles of analysis is at
once to do linguistics. There is no natural gap between theory and
practice in language study; but there is a very real psychological and
practical gap, due to the apparent complexity of many linguistic ideas,
and the lack of time and material for people outside the subject to get
into it. Indeed, the bridging of this gap is the whole purpose of the
present book.

But there is another way in which this gap can be bridged, through
the development of the relationship between linguistics and other fields
of study. A cardinal principle underlying the whole linguistic approach
is that language is not an isolated phenomenon; it is a part of society,
and a part of ourselves. It is a distinctive feature of human nature
(some, who talk of ‘homo loquens’, say it is the distinctive feature);
and it is a prerequisite (or so it would appear) for the development of
any society or social group. […] it enters into a very large number of
specialized fields. Consequently, it is not possible to study language,
using the methods of linguistics or any other, without to some extent
studying — or at least presupposing the study of — other aspects of
society, behaviour, and experience. The way in which linguistics
overlaps in its subject-matter with other academic studies has become
well appreciated over the last few years, and in the past decade we have
seen the development of quite distinct interdisciplinary subjects, such
as sociolinguistics, psycholinguistics, philosophical linguistics,
biological linguistics, and mathematical linguistics. These, as their
titles suggest, refer to aspects of language which are relevant and
susceptible to study from two points of view (sociology and linguistics,
psychology and linguistics and so on), and which thus require awareness
and development of concepts and techniques derived from both. And as
many of the points of contact refer to issues which are obviously of
everyday concern, these marginal branches of the subject stand a much
better chance of avoiding the charges of irrelevance levelled at its
‘purer’ aspects. This can be seen by looking briefly at the kind of
topic covered by the two most important branches to have developed so
far, sociolinguistics and psycholinguistics.

Sociolinguistics studies the ways in which language interacts with
society. It is the study of the way in which language’s structure
changes in response to its different social functions, and the
definition of what these functions are. ‘Society’ here is used in its
broadest sense, to cover a spectrum of phenomena to do with race,
nationality, more restricted regional, social and political groups, and
the interactions of individuals within groups. Different labels have
sometimes been applied to various parts of this spectrum.
‘Ethno-linguistics’ is sometimes distinguished from the rest, referring
to the linguistic correlates and problems of ethnic groups — illustrated
at a practical level by the linguistic consequences of immigration;
there is a language side to race relations, as anyone working in this
field is all too readily aware. The term ‘anthropological linguistics’
is sometimes distinguished from ‘sociological linguistics’, depending on
one’s particular views as to the validity or otherwise of a distinction
between anthropology and sociology in the first place (e.g. the former
studying primitive cultures, the latter studying more ‘advanced’
political units). Usage of British and American scholars differs
considerably in this respect. ‘Stylistics’ is another label which is
sometimes distinguished, referring to the study of the distinctive
linguistic characteristics of smaller social groupings (such as those
due to occupational or class differences). More usually, however,
stylistics refers to the study of the literary expression of a
community, using linguistic methods. None of these labels has any
absolute basis: the subject-matter of ethnolinguistics gradually merges
into that of anthropological linguistics, that into sociological
linguistics, and that into stylistics, and the subject-matter of social
psychology. The kinds of problem which turn up are many and various, and
some have been illustrated in Chapter 1, which was very much concerned
with the role of language in society. They include: the problems of
communities which develop a standard language, and the reactions of
minority groups to this (as in Belgium, India, or Wales); the problems
of people who have to be educated to a linguistic level where they can
cope with the demands of a variety of social situations; the problems of
communication which exist between nations or groups using a different
language, which affects their ‘world-view’; the problems caused by
linguistic change in response to social factors; the problems caused
(and solved) by bilingualism or multilingualism; the problems caused by
the need for individuals to interact with others in specific linguistic
ways (language as an index of intimacy or distance, of solidarity, of
prestige or power, of pathology, and so on). I am not arguing that
sociolinguistics by itself can solve problems such as these; but it can
identify precisely what the problems are (this is sometimes a major task
in itself), and obtain information about the particular manifestation of
a problem in a given area, so that possible solutions can thereby be […]

One thing is clear. There is little chance of solving any of these
problems until certain basic principles about the relationship of
language to society have been established, and accurate techniques of
study developed. And so far, there are many basic issues about which
there is much controversy — for example, the extent to which our social
background determines our linguistic abilities, or the rationale on
which multilingual individuals use their different languages for
different social purposes. There are of course innumerable facts to be
discovered, even about a language as well investigated as English,
concerning the nature of the different kinds of English we use in
different situations — when we are talking to equals, superiors or
subordinates; when we are ‘on the job’; when we are old or young, upper
class or lower class, male or female; when we are trying to persuade,
inform or bargain; and so on. An informal definition of sociolinguistics
highlights this concern to get even the most elementary of descriptive
information down on paper: ‘Who can say what, how, using what means, to
whom, when, and why?’ If we knew all these factors, we would know a
great deal about social problems. These days sociolinguistics has
progressed far in accumulating its own data in order to answer these
questions.

To analyse a problem sociolinguistically implies being able to
analyse it linguistically. Sociolinguistics makes use of the findings of
linguistic theory and description in its work; and in one sense its
success is dependent on success in ‘pure linguistics’. On the other
hand, the nature of its subject-matter means that there will arise a
great deal which will be both theoretically and methodologically novel —
explanatory constructs of one kind or another which are not constructs
of either linguistics or sociology, but a derivative of both. One
example of this is the notion of ‘interference’, that is, linguistic
disturbance which results from two languages (or dialects) coming into
contact in a specific situation. The problem of interference is not
something which linguistics, or any other subject, on its own, could
handle. There has been some debate as to whether the existence of
uniquely sociolinguistic problems of this kind requires the
establishment of a quite independent discipline, with a theoretical
identity and methodology of its own, or whether the dependence on
linguistics in its general sense is so fundamental that such a prospect
is impossible. This is an issue which will doubtless continue to be
discussed for some time. Meanwhile, it is the case that for practical
purposes (as in teaching linguistics) most courses would not make a
clear-cut distinction, but would consider the study of sociolinguistics
to be an essential part of the explanation of the subject as a whole.

An even stronger link is argued these days for my second example of
interdisciplinary overlap, psycholinguistics. The relation of
linguistics to psychology has been the source of some heated discussion
of late, largely due to Chomsky’s particular emphasis on this question.
His view of linguistics, as outlined for instance in his book Language
and Mind, is that the most important contribution linguistics can make
is to the study of the human mind; and that linguistics is accordingly
best seen as a branch of cognitive psychology. This is not an altogether
surprising thing in view of the mentalistic claims of parts of his
theory (cf. p. 103) and his particular views on the nature of language
acquisition in children. But it is an extreme view, which most linguists
at the present time do not share. On the other hand, no one would want
to deny the existence of strong mutual bonds of interest operating
between psychology and linguistics. The extent to which language
mediates or structures thinking, the extent to which talk about language
‘simplicity’ or ‘complexity’ can be given any meaningful psychological
basis, the extent to which language is influenced by and itself
influences such things as memory, attention, recall and constraints on
perception, and the extent to which language has a central role to play
in the understanding of human development are broad illustrations of
such bonds.

Psycholinguistics as a distinct area of interest developed in the
early sixties, and in its early form covered the psychological
implications of an extremely broad area, from acoustic phonetics to
language pathology. Nowadays, certain areas of language and linguistic
theory tend to be concentrated on by those who call themselves
psycholinguists, and most of them have been influenced by the
development of generative theory. The most important area is the
investigation of the acquisition of language by children. Here, there
have been many studies of both a theoretical and a descriptive kind. The
descriptive need is prompted by the fact that until recently hardly
anything was known about the actual facts of language acquisition in
children, in particular about the order in which grammatical structures
were acquired. Even elementary questions such as when and how children
develop their ability to ask questions syntactically, or when they learn
the inflectional systems of their language, went unanswered. And a great
deal of work has gone on recently into the methodological and
descriptive problems involved in obtaining and analysing information of
this kind.

The theoretical questions have focused on the issue of how we can
account for the phenomenon of language development in children at all.
Normal children have mastered most of the structure of their language by
the age of five. The generative approach argued against the earlier
behaviourist assumptions that it was possible to explain language
development largely in terms of imitation and selective reinforcement.
It asserted that it was impossible to explain the rapidity or the
complexity of language development solely in terms of children imitating
the language used by the people around them. And as a result of the
arguments supporting this assertion, it would now be generally agreed
that imitation alone is not enough. Imitation is an important factor in
the development of language (cf. p. 46), but it cannot be the major one,
and thus the basis of any theory of language acquisition, because there
is too much of central importance in language which is not amenable to
direct observation, and thus not imitatable — the various
meaning-relations between sentences or parts of sentences, for instance,
or, more generally, the abstract knowledge of the grammatical rules of
their language which adults have as part of their competence. All normal
children come to develop this abstract knowledge for themselves; and the
generative approach argues that such a process is only explicable if one
postulates that certain features of this competence are present in the
brains of children right from the beginning. In other words, what is
being claimed is that children’s brains contain certain innate
characteristics which ‘pre-structure’ them in the direction of language
learning. To enable these innate features to develop into adult
competence, children must be exposed to human language, i.e. they must
be stimulated in order to respond. But the basis on which they develop
their linguistic abilities is not describable in behaviourist terms.

What we have here, then, is a hypothesis about the nature of
language acquisition. So far, it has not been tested in any convincing
way (and it may not be possible to test it, in the usual sense); but it
has provoked a great deal of speculation. In particular, it raises the
question of how far the innate features could be identified with the
primitive meaning-relations of grammatical theory — that is, the
linguistic universals talked about at the end of Chapter 4. Are all
children born with an ability to discriminate ‘subjects’ from ‘objects’,
let us say, in some sense? How many such basic relations might one
plausibly ascribe to the child? And how specific is its innateness?
Clearly, it is not possible to suggest that the child has any features
of a particular language innate, for instance a particular feature of
English syntax which does not occur in French or German. To suggest this
would be tantamount to saying that children of any race would find it
easier to learn English than to learn other languages (that is, their
brains would predispose them towards English); and all available
evidence points to the implausibility of this conclusion. A Zulu child
learns Zulu just as rapidly as an English child learns English, it
seems. No, the innate features must be sufficiently general,
sufficiently ‘deep’, to be capable of equally readily underlying the
structure of any language. And on this point, the identity of interests
between linguistic and psycholinguistic theory (at least, in this field)
should be clear. There have of course been a number of objections raised
to the innateness hypothesis — for example, on the grounds that what is
innate is not so much deep structural information, but rather learning
principles of a more general kind. Some people would like to see what
would happen if the hypothesis were formulated in terms other than those
provided by Chomsky’s later work. As someone put it once, ‘Why should we
see the child as if it were born with a copy of Aspects of the Theory of
Syntax tucked inside its head!’ Unkind, perhaps; for without Aspects,
and the work which followed it, many interesting questions might never
have been raised. The issue, however, is by no means determined.

In the 1980s, the interest in the innateness hypothesis has been
largely replaced by a focus on the relationship between language
development and a child’s cognitive skills, following on the influential
work of Jean Piaget and other psychologists. There has been renewed
interest in the strategies which children use in acquiring language, and
the significance of such topics as imitation has come to be reconsidered
in this light. Above all, there has been a concern to study the factors
which characterize children’s learning environment — in particular, the
nature of the input language they receive from mothers and other
caretakers (motherese). The Journal of Child Language, which commenced
publication in 1975, is now the best source of information on current
trends in the subject. Its contributors span the disciplines of
psychology and linguistics, and their work illustrates a wide range of
experimental and naturalistic approaches to the subject. Without doubt,
the field of language acquisition remains one of the most intriguing
areas of linguistic study at the present time, and one which will
certainly remain in the forefront of linguists’ attention over the next
few years.

There are many other applications of linguistics in fields not so
far mentioned, which tend to be grouped together anonymously as ‘applied
linguistics’. Foreign language teaching and learning is the major
application, as suggested in Chapter 1; but there is also native
language teaching, translation (either individually, or using machines),
the many facets of telecommunications, lexicography... The list could
go on for some time. Each of these fields selects its basic information
and theoretical framework from the overall perspective which linguistics
provides, and applies it to the clarification of some general area of
human experience. And it is surely the many branches of applied
linguistics that will ultimately provide the main link between Chapters
1 and 4, if such a link be needed. But, as always, we must remember that
an application is but the tip of a theoretical iceberg: many hours of
research and discussion, much of it highly specialized, abstract, and
quite unpractical, will have taken place in order to provide the basic
knowledge which can be implemented in a specific application. Indeed, in
many cases it is only through the illuminating models developed in
linguistic theory, and the demonstration of a coherent system underlying
apparently disorganized data, that applications and approaches to a
problem have been thought of at all.

[…] ‘What does it matter’, such queries run, ‘whether the basic
phonological unit is the phoneme or the distinctive feature? or whether
the morpheme concept fits all cases? or whether there is a boundary-line
between syntax and semantics?’ If these questions are still being asked,
then the arguments underlying my Chapter 3 about the scientific aims of
linguistics have not been appreciated. The kinds of distinction drawn
there are essential if we hope to build up a general theory of language;
we have to appreciate the kinds of reasoning relevant to this task, even
if we do not always agree with the conclusions reached. If we are
adopting a rational approach to our study of (or interest in) language,
then we cannot just blindly analyse and describe in a random, arbitrary
way. Whatever our purpose, whether ‘pure’ or ‘applied’, we must know why
we are doing what we are doing, if we hope to be clear and consistent
and wish to convince others (or even ourselves) of its validity. These
questions, and many others like them, do matter, because the
answers constitute our world-view of language. Choosing to work with
distinctive features is one choice we make, along with many others,
which ultimately builds up a coherent and self-consistent picture of
language structure that intuitively satisfies us. We sit back and say,
‘Yes, that makes sense.’ To a certain extent, then, our final decisions
about which concepts to work with are a matter of taste. But the more we
understand the relative merits and demerits of the various theories,
descriptions and procedures which the subject provides, the more likely
we will be to reach a view of language that is reasonable and
convincing, as well as personally satisfying.

Corpus Linguistics

Kennedy G. An Introduction to Corpus Linguistics.

London and New York: Addison Wesley Longman Limited, 1998. – pp. 1-12.


In the language sciences a corpus is a body of written text or
transcribed speech which can serve as a basis for linguistic analysis
and description. Over the last three decades the compilation and
analysis of corpora stored in computerized databases has led to a new
scholarly enterprise known as corpus linguistics. The purpose of this
book is to introduce the various activities which come within the scope
of corpus linguistics, and to set current work within its historical
context. It brings together some of the findings of corpus-based studies
of English, the language which has so far received the most attention
from corpus linguists, and shows how quantitative analysis can
contribute to linguistic description. It is hoped that, by concentrating
in particular on some of the results of corpus analysis, the book will
whet the appetites of the growing body of teachers and students with
access to corpora to discover more for themselves about how languages
work in all their variety. The book is intended primarily for those who
are already familiar with general linguistic concepts but who want to
know more of what can be done with a corpus and why corpus linguistics
may be relevant in research on language. Corpus linguistics is not an
end in itself but is one source of evidence for improving descriptions
of the structure and use of languages, and for various applications,
including the processing of natural language by machine and
understanding how to learn or teach a language.

The main focus of this book is on four major areas of activity in
corpus linguistics:

• corpus design and development

• corpus-based descriptions of aspects of English structure and use

• the particular techniques and tools used in corpus analysis

• applications of corpus-based linguistic description

Readers may choose to work through the book in the above order or
to begin with the sections dealing with corpus-based descriptions of
English in order first to become more familiar with some of the results
of corpus analysis. In focusing on the contribution of corpus
linguistics to the description of English and on some of the central
issues and problems which are being addressed within corpus linguistics,
the book also attempts to bring together disparate work which is often
hard to get hold of. However, such is the speed of development and
change in corpus linguistics at the present time that anyone writing
about it must be conscious that it would be easy to produce a Ptolemaic
picture of the field — with the world distorted and with Terra Australis
Incognita, the Great Southern Continent, both misconceived and
misplaced. Work relevant for corpus linguistics is being done in many
fields, including computer science and artificial intelligence, as well
as in various branches of descriptive and applied linguistics. It would
not be surprising if some of the scholars contributing to corpus
linguistics from these and other perspectives found that their work is
inadequately represented here. However, they can be assured that such
neglect is not intended.

Because corpus linguistics is a field where activity is increasing
very rapidly and where there is as yet no magisterial perspective, even
the very notion of what constitutes a valid corpus can still be
controversial. It also needs to be understood at the outset that not
every use of computers with bodies of text is part of corpus
linguistics. For example, the aim of Project Gutenberg to distribute
10,000 texts to 100 million computer users by the year 2001 is not in
itself part of corpus linguistics although texts included in this
ambitious project may conceivably provide textual data for corpus
analysis. Similarly, contemporary reviews of computing in the humanities
show the enormous extent of corpus-based work in literary studies. While
some of the methodology used in literary studies resembles some of the
activity being undertaken in corpus linguistics, research on authorial
attribution or thematic structure, for example, does not come within the
scope of this book. Nor does the book attempt to cover systematically
the wide range of corpus-based work being undertaken in computational
linguistics in such areas of natural language processing as speech
recognition and machine translation.

Although there have been spectacular advances in the development
and use of electronic corpora, the essential nature of text-based
linguistic studies has not necessarily changed as much as is sometimes
suggested. In this book, reference is made to corpus studies which were
undertaken manually before computers were available. Corpus linguistics
did not begin with the development of computers but there is no doubt
that computers have given corpus linguistics a huge boost by reducing
much of the drudgery of text-based linguistic description and vastly
increasing the size of the databases used for analysis. It should be
made clear, however, that corpus linguistics is not a mindless process
of automatic language description. Linguists use corpora to answer
questions and solve problems. Some of the most revealing insights on
language and language use have come from a blend of manual and computer
analysis. It is now possible for researchers with access to a personal
computer and off-the-shelf software to do linguistic analysis using a
corpus, and to discover facts about a language which have never been
noticed or written about previously. The most important skill is not to
be able to program a computer or even to manipulate available software
(which, in any case, is increasingly user-friendly). Rather, it is to be
able to ask insightful questions which address real issues and problems
in theoretical, descriptive and applied language studies. Many of the
key problems and challenges in corpus linguistics are associated with
the following questions:

• How can we best exploit the opportunities which arise from having
texts stored in machine-retrievable form?

• What linguistic theories will best help structure corpus-based
research?

• What linguistic phenomena should we look for?

• What applications can make use of the insights and improved
descriptions of languages which come out of this research?

In answering these and other questions, corpus linguistics has the potential
to provide solutions and new directions to some of the major issues and
problems in the study of human communication.


The definition of a corpus as a collection of texts in an electronic
database can beg many questions for there are many different kinds of
corpora. Some dictionary definitions suggest that corpora necessarily
consist of structured collections of text specifically compiled for
linguistic analysis, that they are large or that they attempt to be
representative of a language as a whole. This is not necessarily so. Not
all corpora which can be used for linguistic research were originally
compiled for that purpose. Historically it is not even the case that
corpora are necessarily stored electronically so that they can be
machine-readable, although this is nowadays the norm. […] electronic
corpora can consist of whole texts or collections of whole texts. They
can consist of continuous text samples taken from whole texts; they can
even be made up of collections of citations. At one extreme an
electronic dictionary may serve as a kind of corpus for certain types of
linguistic research while at the other extreme a huge unstructured
archive of texts may be used for similar purposes by corpus linguists.

Corpora have been compiled for many different purposes, which in turn
influence the design, size and nature of the individual corpus. Some
current corpora intended for linguistic research have been designed for
general descriptive purposes — that is, they have been designed so that
they can be examined or trawled to answer questions at various
linguistic levels on the prosody, lexis, grammar, discourse patterns or
pragmatics of the language. Other corpora have been designed for
specialized purposes such as discovering which words and word meanings
should be included in a learners’ dictionary; which words or meanings
are most frequently used by workers in the oil industry or economics; or
what differences there are between uses of a language in different
geographical, social, historical or work-related contexts.

A distinction is sometimes made between a corpus and a text archive
or text database. Whereas a corpus designed for linguistic analysis is
normally a systematic, planned and structured compilation of text, an
archive is a text repository, often huge and opportunistically
collected, and normally not structured. It is generally the case, as
Leech (1991:11) suggested, that ‘the difference between an archive and a
corpus must be that the latter is designed or required for a particular
“representative” function’. It is nevertheless not always easy to see
unequivocally what a corpus is representing, in terms of language

Databases which are made up not of samples, but which constitute an
entire population of data, may consist of a single book (e.g. George
Eliot’s Middlemarch) or of a number of works. These corpora may be the
work of a single author (e.g. the complete works of Jane Austen) or of
several authors (e.g. medieval lyrics), or all the editions of a
particular newspaper in a given year. Some projects have assembled all
the known available texts in a particular genre or from a particular
historical period. Some of these databases or text archives described in
Section 2.4 are very large indeed, and although they have rarely yet
been used as corpora for linguistic research, there is no reason why
they should not be in the future. In many respects it is thus the use to
which the body of textual material is put, rather than its design
features, which defines what a corpus is.

A corpus constitutes an empirical basis not only for identifying
the elements and structural patterns which make up the systems we use in
a language, but also for mapping out our use of these systems. A corpus
can be analysed and compared with other corpora or parts of corpora to
study variation. Most importantly, it can be analysed distributionally
to show how often particular phonological, lexical, grammatical,
discoursal or pragmatic features occur, and also where they occur.
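As a rough illustration of distributional analysis, the sketch below counts how often each word occurs and in which text it occurs. The two-text mini corpus and the crude tokenizing rule are invented for the example; a real study would use a properly sampled corpus and a more careful definition of "word".

```python
from collections import Counter, defaultdict
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z']+", text.lower())

def distribution(corpus):
    """Return overall word counts and, for each word, its count in each named text."""
    totals = Counter()
    by_text = defaultdict(Counter)
    for name, text in corpus.items():
        for token in tokenize(text):
            totals[token] += 1          # how often the item occurs overall
            by_text[token][name] += 1   # where (in which text) it occurs
    return totals, by_text

# Hypothetical mini corpus, invented for illustration only.
corpus = {
    "news": "The minister said the report was late.",
    "fiction": "She said nothing, and the rain fell.",
}
totals, by_text = distribution(corpus)
print(totals["the"])          # overall frequency of "the"
print(dict(by_text["the"]))   # its frequency in each text
```

Keeping the per-text counts separate from the totals is what allows one corpus, or one part of a corpus, to be compared with another.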

In the early 1980s it was possible to list on a few fingers the
main electronic corpora which a small band of devotees had put together
over the previous two decades for linguistic research. These corpora
were available to researchers on a non-profit basis, and were initially
available for processing only on mainframe computers. The development of
more powerful microcomputers from the mid-1970s and the advent of CD-ROM
in the 1980s made corpus-based research more accessible to a much wider
range of participants.

By the 1990s there were many corpus-making projects in various
parts of the world. Lancashire (1991) shows the huge range of corpora,
archives and other electronic databases available or being compiled for
a wide variety of purposes. Some of the largest corpus projects have
been undertaken for commercial purposes, by dictionary publishers.
Other projects in corpus compilation or analysis are on a smaller scale,
and do not necessarily become well known. Undertaken as part of graduate
theses or undergraduate projects, they enabled students to gain original
insights into the structure and use of language.

The role of computers in corpus linguistics

The analysis of huge bodies of text ‘by hand’ can be prone to error and
is not always exhaustive or easily replicable. Although manual analysis
has made an important contribution over the centuries, especially in
lexicography, it was the availability of digital computers from the
middle of the 20th century which brought about a radical change in
text-based scholarship. Rather than initiating corpus research,
developments in information technology changed the way we work with
corpora. Instead of using index cards and dictionary ‘slips’,
lexicographers and grammarians could use computers to store huge amounts
of text and retrieve particular words, phrases or whole chunks of text
in context, quickly and exhaustively, on their screens. Furthermore the
linguistic items could be sorted in many different ways, for example,
taking account of the items they collocate with and their typical
grammatical behaviour.
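The retrieval of words "in context" described here is usually presented as a key-word-in-context (KWIC) concordance. The following is a minimal sketch of the idea, not a reconstruction of any particular concordancing package; the function name `kwic`, the window width and the sample sentences are assumptions made for the example.

```python
import re

def kwic(text, node, width=3):
    """Return (left, node, right) context triples for every occurrence of node."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    # Sort on the right-hand context so that recurring collocates line up.
    return sorted(lines, key=lambda line: line[2])

# Invented sample text, for illustration only.
sample = ("Call me when you arrive. I was asleep when the phone rang. "
          "Nobody knows when it started.")
for left, node, right in kwic(sample, "when"):
    print(f"{left:>20}  {node}  {right}")
```

Sorting on the left-hand context instead (`key=lambda line: line[0]`) gives a different view of the same occurrences, which is the kind of re-sorting that index cards made so laborious.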

Corpus linguistics is thus now inextricably linked to the computer,
which has introduced incredible speed, total accountability, accurate
replicability, statistical reliability and the ability to handle huge
amounts of data. With modern software, computer-based corpora are easily
accessible, greatly reducing the drudgery and sheer bureaucracy of
dealing with the increasingly large amounts of data used for compiling
dictionaries and other information sources. In addition to greatly
increased reliability in such basic tasks as searching, counting and
sorting linguistic items, computers can show accurately the probability
of occurrence of linguistic items in text. They have thus facilitated
the development of mathematical bases for automatic natural language
processing, and brought to linguistic studies a high degree of accuracy
of measurement which is important in all science. Computers have
permitted linguists to work with a large variety of texts and thus to
seek generalizations about language and language use which can go beyond
particular texts or the intuitions of particular linguists. The
quantification of language use through corpus-based studies has led to
scientifically interesting generalizations and has helped renew or
strengthen links between linguistic description and various
applications. Machine translation, text-to-speech synthesis, content
analysis and language teaching have been among the beneficiaries.

Some idea of the changes which the computer has made possible in
text studies can be gauged from a report in an early issue of the ALLC
Bulletin, the forerunner of the journal Literary and Linguistic
Computing. A brief report by Govindankutty (1973) on the coming of the
computer to Dravidian linguistics captures the moment of transition
between manual and electronic databases. The text he was working with of
300,000 words is small by today’s standards, but what took the
researcher and his long-suffering colleagues nearly six years of data
management and analysis could, 20 years later, be carried out in

It took nearly six years’ hard labour and the co-operation
of colleagues and students to complete the Index of Kamparamayanam, the
longest middle Tamil text, in the Kerala University under the
supervision of Professor V. I. Subramoniam. The text consists of nearly
12,500 stanzas and each stanza has four lines; each line has an average
of six words. All the words and some of the suffixes were listed on
small cards by the late Mr. T. Velaven who is the architect of this
voluminous index. Later, the cards were sorted into alphabetical order
and each item was again arranged according to the ascending order of the
stanza and line. Finally, each entry was checked with the text and the
meaning and grammatical category were noted. The completed index
consists of about 3,500 typed pages (28 x 20 cm).

While indexing, some suffixes such as case were listed
separately. This posed some problems when I started to work on the
grammar of the language of the text. When it was necessary to find out
after what kind of words and after which phonemes and morphemes the
alternants of a suffix occur, it became necessary again to go through
all the entries. Though I have tried to work out the frequency of all
the suffixes, for want of time it was not completely possible. However,
the frequency study helped to unearth different strata in the linguistic
excavation and indirectly emphasized that it is a sine qua non, at
least, for such a descriptive and historical study.

Though it took a lot of time, energy and patience, the
birth of an index brought with it an unknown optimism in the grammatical
description. After completing the index and the grammatical study of
Kamparamayanam, three months ago I started indexing Ramacaritam, an
early Malayalam text, using small cards. This project is being carried
out in the Leiden University with the guidance of Professor F. B. J.
Kuiper. While I was half my way through the indexing, Dr. B. J. Hoff of
the Linguistics Department informed me of the work done in the Institute
for Dutch Lexicology with the help of a computer. When I discussed the
problems with Dr. F. de Tollenaere, who is the head of this institute,
he outlined with great enthusiasm how a computer can be utilized for
this purpose. Immediately, I started transcribing the text and now it is
being punched on paper tape, using an AREA paper tape punch at the
Institute. This paper tape punch, having an extra shift, has twice the
eighty-eight standard possibilities, which results in one hundred and
seventy-six different punching codes, which for the computer has the
value of one hundred and seventy-six characters. Moreover, a coding
system makes it possible to have up to two hundred and seven
possibilities, which are also available at the output stage, as the
Institute has at its disposal a print train with two hundred and seven

To a present-day corpus linguist, even the laborious data entry by
punched paper seems quaintly archaic, and Govindankutty’s task could now
be undertaken on a personal computer accessed directly through a

Until the mid-1980s corpus linguistics typically involved mainframe
computing and was largely associated with universities having access to
large machines. In the 1970s, with shared access to a standard
mainframe, it could take an hour or more to make a concordance
consisting of all the instances of a word such as when in a
one-million-word corpus. By the late 1980s, the time taken to run such a
program had been reduced to minutes. In the 1990s, the same job can be
done just as quickly on the faster personal computers running at 60 or
more megahertz. Hard disk drives of 500 megabytes or more on personal
computers and input from a CD-ROM are now common, thus facilitating
storage and rapid analysis.

In the early 1980s a captive computer scientist or friendly
computer programmer was almost indispensable to assist many aspiring
corpus linguists to cope with inevitable technical problems associated
with data management and the programming skills necessary for corpus
analysis. By the 1990s, improvements in personal computers of the kind
already mentioned, and the availability of commercial software packages
designed for corpus analysis, have meant that most corpus linguists can
now concentrate not on how to program and use a computer but on problems
and issues in linguistics which can be addressed through a corpus.

The scope of corpus linguistics

Corpus linguistics is based on bodies of text as the domain of study and
as the source of evidence for linguistic description and argumentation.
It has also come to embody methodologies for linguistic description in
which quantification of the distribution of linguistic items is part of
the research activity. As Leech (1992:107) has noted, the focus of study
is on performance rather than competence, and on observation of language
in use leading to theory rather than vice versa.

It would be misleading, however, to suggest that corpus linguistics
is a theory of language in competition with other theories of language
such as transformational grammar, or even more that it is a new or
separate branch of linguistics. Linguists have always needed sources of
evidence for theories about the nature, elements, structure and
functions of language, and as a basis for stating what is possible in a
language. At various times, such evidence has come from intuition or
introspection, from experimentation or elicitation, and from
descriptions based on observations of occurrence in spoken or written
texts. In the case of corpus-based research, the evidence is derived
directly from texts. In this sense corpus linguistics differs from
approaches to language which depend on introspection for evidence. In
his celebrated work, Coral Gardens and their Magic, Malinowski (1935: 9)
wrote about the paradigm shift which he considered was necessary in the
linguistics of the day.

The neglect of the obvious has often been fatal to the development of
scientific thought. The false conception of language as a means of
transfusing ideas from the head of the speaker to that of the listener
has, in my opinion, largely vitiated the philological approach to
language. The view set forth here is not merely academic: it compels us,
as we shall see, to correlate other activities, to interpret the meaning
— text; and this means a new departure in the handling of linguistic
evidence. It will also force us to define meaning in terms of experience
and situation.

Linguists may not see the necessity for such a sea change today.
However, it is the case that corpus linguists often have different
concerns from many other linguists. Corpus linguists are concerned
typically not only with what words, structures or uses are possible in a
language but also with what is probable — what is likely to occur in
language use. The use of a corpus as a source of evidence, however, is not
necessarily incompatible with any linguistic theory, and progress in the
language sciences as a whole is likely to benefit from a judicious use
of evidence from various sources: texts, introspection, elicitation or
other types of experimentation as appropriate. Any scientific enterprise
must be empirical in the sense that it has to be supported or falsified
on evidence and, in the final analysis, statements made about language
have to stand up to the evidence of language use. The evidence can be
based on the introspective judgment of speakers of the language or on a
corpus of text. The difference lies in the richness of the evidence and
the confidence we can have in the generalizability of that evidence, in
its validity and reliability. The boundaries, therefore, between
corpus-based description and argumentation and other approaches to
language description are not rigid, and linguists of varied theoretical
persuasions now use corpora for evidence which is complementary to
evidence obtained from other sources.

Corpus linguistics, like all linguistics, is concerned primarily
with the description and explanation of the nature, structure and use of
language and languages and with particular matters such as language
acquisition, variation and change. Corpus linguistics has nevertheless
developed something of a life of its own within linguistics, with a
tendency sometimes to focus on lexis and lexical grammar rather than
pure syntax. This is partly a result of using methodologies such as
concordancing where the contextual evidence available in a single line
of wide-carriage computer printout of 130 characters is sometimes too
limited for the analysis of syntax or discourse.

Work in corpus linguistics is currently associated with several
quite different activities. Scholars working in the field tend to be
identified with one or more of them. The first group of researchers
consists of corpus makers or compilers. These scholars are concerned
with the design and compilation of corpora, the collection of texts and
their preparation and storage for later analysis.

A second group of researchers has been concerned with developing
tools for the analysis of corpora. Important contributions to software
development especially for the syntactic analysis of corpora have been
associated particularly but not exclusively with researchers in
computational linguistics. These researchers have been concerned with
the use of corpora to develop, among other things, algorithms for
natural language processing and the modelling of linguistic theories.

A third group of researchers consists of descriptive linguists
whose main concern has been to make use of computerized corpora to
describe reliably the lexicon and grammar of languages, both of the
linguistic systems we use and our likely use of those systems. It is the
probabilistic aspect of corpus-based descriptive linguistic studies
which especially distinguishes them from conventional descriptive
fieldwork in linguistics or lexicography. That is, corpus-based
descriptive linguistics is concerned not only with what is said or
written, where, when and by whom, but how often particular forms are
used. The measurement of the distribution of words and grammar has
encouraged new ways of studying the linguistic basis of variation in
text types, language change and regional and other varieties of
language. The corpus provides contexts for the study of meaning in use
and, by making available techniques for extracting linguistic
information from texts on a scale previously undreamed of, it
facilitates linguistic investigations where empiricism is text based.

A fourth area of activity, which has been among the most innovative
outcomes of the corpus revolution, has been the exploitation of
corpus-based linguistic description for use in a variety of applications
such as language learning and teaching, and natural language processing
by machine, including speech recognition and translation.

At the present time in corpus linguistics, some researchers tend to
focus on issues in corpus design, others on methods for text analysis
and processing, and still others, probably the majority, on corpus-based
linguistic description and the application of such descriptions.

Although the scope of corpus linguistics may be defined in terms of
what people do with corpora, it would be a mistake to assume that corpus
linguistics is simply a faster way of describing how a language works,
or is about the nature of linguistic evidence. Analysis of a corpus by
means of standard corpus linguistic research software can and frequently
does reveal facts about a language which we might never previously have
thought of seeking. Altenberg’s (1991a) study of amplifier collocations
in English, for example, raised questions about semantic
classes of maximizers and boosters such as perfectly or awfully which
probably would not have been asked without the evidence of a corpus. He
found for example that frequent maximizers such as quite tend to
collocate with non-scalar words (quite obviously) while absolutely has a
greater tendency than other maximizers to collocate with negatives
(absolutely not). The major shift in methodology associated with corpus
linguistics comes not from theory but rather from what the use of
corpora makes possible.
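A very simple way to look for collocational tendencies of this kind is to count which words immediately follow a given item. The sketch below does only that. It is not Altenberg's method, and the sample sentences are invented, so the counts illustrate the technique rather than any finding about English.

```python
from collections import Counter
import re

def right_collocates(text, node):
    """Count the words that immediately follow each occurrence of node."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == node)

# Invented sentences, for illustration only.
sample = ("It was quite obviously wrong. Absolutely not, she said. "
          "That is quite obviously true, but absolutely not the point. "
          "He was absolutely right.")
print(right_collocates(sample, "quite").most_common())
print(right_collocates(sample, "absolutely").most_common())
```

On a real corpus the raw counts would normally be weighed against the overall frequency of each collocate before drawing conclusions.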

As we have seen, corpus linguistics goes beyond the use of corpora
as a source of evidence in linguistic description. It also revives and
carries on a concern of some linguists with the statistical distribution
of linguistic items in the context of use. From the 1920s there was,
especially in the United States and the United Kingdom, a tradition of
word counting in texts in order to discover the most frequent, and
arguably therefore the most pedagogically useful, words and grammatical
structures for language teaching purposes.

From the 1930s, Prague School linguists undertook quantitative
studies (mainly of Czech, English and Russian) of the frequency of
certain grammatical processes, the relative frequencies of different
parts of speech, the location and distribution of information in the
sentence, and the statistical distribution of syllable types and
structures. Some of this work was directed towards comparative stylistic
analysis (e.g. Kramsky, 1972) and some towards quantitative comparisons
of varieties of English (e.g. Duskova, 1977). Such Prague School
quantitative studies, which were carried out manually, differ from
modern computer corpus-based studies particularly in the size of the
corpora and in their representativeness. Duskova, for example, studied
10,000 finite verb forms from 10 plays to draw conclusions about the
functions and use of the preterite and the perfect in British and
American English, but it is not clear why these 10 plays were chosen as
representative of contemporary English. Nevertheless, the Prague School
focus on quantitative studies was commendable at a time when orthodox
linguistics eschewed them. Other quantitative studies were directed
towards discovering the ‘statistical laws’ of text.

The work of the American philologist George Zipf, from the 1930s,
was concerned with such quantitative analyses as the relation between
the frequency of words in text and text length, the frequency of words
and their antiquity, and the relation between the rank order of an item
in a word frequency list and the number of occurrences or tokens of that
item in a text. Zipf (1949) sets out his famous ‘law’ which held that
the product of the frequency of use of a word in a text and
the rank order of that word in a frequency list is a constant (f × r = c).
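The rank-frequency relation can be checked on any text by listing words in descending order of frequency and multiplying each frequency by its rank. The sketch below does this. The sample string is deliberately contrived so that the product comes out the same for every word; real texts follow the relation only approximately.

```python
from collections import Counter
import re

def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    rows = []
    for rank, (word, freq) in enumerate(counts.most_common(), start=1):
        rows.append((rank, word, freq, rank * freq))
    return rows

# Contrived sample: frequencies 6, 3 and 2 at ranks 1, 2 and 3.
sample = "the the the the the the of of of and and"
for rank, word, freq, product in rank_frequency(sample):
    print(rank, word, freq, product)
```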

As noted above, the earliest computerized corpora compiled for
linguistic research from the 1960s required the use of mainframe
computers, and researchers frequently had to design their own software
for analysis. Initial interest was often in lexis, including word
counts, but it was quickly apparent that a computer corpus facilitated
the study of permissible or likely word sequences or collocations (are
we more likely to write different from, different to or different than?)
and grammatical and stylistic characteristics of particular authors and
genres. There was a particular interest in what characterized
‘scientific style’, ‘newspaper style’ and ‘literary or imaginative
style’.

With a corpus stored in a computer, it is easy to find, sort and
count items, either as a basis for linguistic description or for
addressing language-related issues and problems. It is not surprising,
therefore, that a wide range of research activities have come to be
within the scope of corpus linguistics. Analyses can contribute to the
making of dictionaries, word lists, descriptive grammars, diachronic and
synchronic comparative studies of speech varieties, and to stylistic,
pedagogical and other applications. With appropriate software it is easy
to study the distribution of phonemes, letters, punctuation,
inflectional and derivational morphemes, words (as variously defined),
collocations, instances of particular word classes, syntactic patterns,
or discourse structures. Recent work at Birmingham University described
by Renouf (1993) shows how new words and new uses can be identified in
corpora at the time these words enter journalistic use.

The scope and current concerns of a field of scholarship can
sometimes be seen or defined through the topics which make up conference
programmes and the content of specialist journals. In the 1990s the
topics which appear on conference programmes and in journals which cover
corpus linguistics include improved ways of annotating corpora, the
tagging of parts of speech and the senses of polysemous word forms,
improved automatic parsing, identification of collocations,
phraseological units and discourse structure, text categorization,
research methodology in the face of more and bigger corpora, and the
application of this work in lexicography, syntactic description,
translation, speech and handwriting recognition, and language teaching.
Educational applications are increasingly on the agenda. At Lancaster
University in 1994 and 1996 the pedagogical significance of electronic
corpora was the subject of conferences on the teaching of linguistics
and the teaching of languages.

In March 1993, a Georgetown University Round Table meeting in
Washington, DC, on corpus-based linguistics identified the following
topics as those in particular need of investigation and dissemination at
a time when linguistics was returning to more text-based approaches to

• the design and development of text-speech corpora

• tools for searching and processing on-line corpora

• critical assessments of on-line corpora and corpus-processing tools

• methodological issues in corpus-based analysis

• applications and results in linguistics and related disciplines,
including language teaching, computational linguistics, historical
linguistics, discourse analysis and stylistic analysis

The scope of computer corpus-based scholarship can also be measured
by some of its achievements. In lexicography the revision of the Oxford
English Dictionary, its publication in electronic form on CD-ROM and the
publication of new learners’ dictionaries of English by other major
publishers were all based on corpora. The completion of the
100-million-word British National Corpus in 1994 set a new standard in
corpus design and compilation. Another important international standard
set in corpus preparation and formatting has been in the gradual
adoption of the Standard Generalized Markup Language (SGML) through the
Text Encoding Initiative (TEI) (see Section 2.6.5). In the analysis of
corpora there have been improvements in the accuracy of the automatic
grammatical tagging and parsing of texts. There has also been a
substantial and rapidly growing amount of descriptive detail on the
elements and structure of languages (particularly English) arising from
corpus-based research.

Current issues

Widdowson H.G. Linguistics. – Oxford University Press, 1996. – pp. 69-77.

Linguistics, like language itself, is dynamic and therefore subject to
change. It would lose its validity otherwise, for like all areas of
intellectual enquiry, it is continually questioning established ideas
and questing after new insights. That is what enquiry means. Its very
nature implies a degree of instability. So although there is, in
linguistics, a reasonably secure conceptual common ground, which this
book has sought to map out, there is, beyond that, a variety of
different competing theories, different visions and revisions,
disagreements and disputes, about what the scope and purpose of the
discipline should be. There are three related issues which are
particularly prominent in current debate. One has to do with the very
definition of the discipline and takes us back to the question of
idealization. Another issue concerns the nature of linguistic data and
has come into prominence with the development of computer programs for
the analysis of large corpora of language. A third issue raises the
question of accountability and the extent to which linguistic enquiry
should be made relevant to the practical problems of everyday life.

The scope of linguistics

[…] linguistics has traditionally been based on an idealization which
abstracts the formal properties of the language code from the contextual
circumstances of actual instances of use, seeking to identify some
relatively stable linguistic knowledge (langue, or competence) which
underlies the vast variety of linguistic behaviour (parole, or
performance). It was also pointed out that there are two reasons for
idealizing to such a degree of abstraction. One has to do with practical
feasibility: it is convenient to idealize in this way because the
actuality of language behaviour is too elusive to capture by any
significant generalization. But the other reason has to do with
theoretical validity, and it is this which motivates Chomsky’s
competence-performance distinction. The position here is that the data
of actual behaviour are disregarded not because they are elusive but
because they are of little real theoretical interest: they do not
provide reliable evidence for the essential nature of human language.
Over recent years, this formalist definition of the scope of linguistics
has been challenged with respect to both feasibility and validity.

As far as feasibility is concerned, it has been demonstrated that
the data of behaviour are not so resistant to systematic account as they
were made out to be. There are two aspects of behaviour. One is
psychological and concerns how linguistic knowledge is organized for
access and what the accessing processes might be in both the acquisition
and use of language. This has been a subject of enquiry in
psycholinguistics. The second aspect of behaviour is sociological. This
accessing of linguistic knowledge is prompted by some communicative
need, some social context which calls for an appropriate use of
language. These conditions for appropriateness can be specified, as
indeed was demonstrated in part in the discussion of pragmatics. The
account of the relationship between linguistic code and social context
is the business of sociolinguistics.

Psycholinguistic work on accessing processes and sociolinguistic
work on appropriateness conditions have demonstrated that there are
aspects of behaviour that can be systematically studied, and that
rigorous enquiry does not depend on the high degree of abstraction
proposed in formalist linguistics. In other words, psycholinguistics and
sociolinguistics have things to say about language which are also within
the legitimate scope of the discipline. Such a point of view would be a
tolerant and neighbourly one: we stake out different areas of language
study, each with its own legitimacy.

But the challenge to the formalist approach in respect to validity
is quite different. It is not tolerant and neighbourly at all, but a
matter of competing claims for the same territory. It is not just an
issue of delimitation but of definition, and proposes a functionalist
one in opposition to a formalist one. The argument here is that it
diminishes the very study of language to reduce it to abstract forms
because to do so is to eliminate from consideration just about
everything that is really significant about it and to make it hopelessly
remote from people’s actual experience. Language, the argument goes, is
not essentially a static and well-defined cognitive construct but a mode
of communication which is intrinsically dynamic and unstable. Its forms
are of significance only so far as we can associate them with their
communicative functions. On this account, the only valid linguistics is
functional linguistics.

But, as was indicated in Chapter 2, there are two senses in which
linguistic forms can be said to be associated with functions, and
therefore two ways of defining functional linguistics. Firstly, we can
consider how the linguistic code has developed in response to the uses
to which it is put. In this sense, functional linguistics is the study
of how the formal properties of language are informed by the functions
it serves, how it encodes perceptions of reality, ways of thinking,
cultural values, and so on.

Secondly, we can think of the form-function association as a matter
not of encoded meaning potential but of its actual realization in
communication; and here we are concerned with the way language forms
function pragmatically in different contexts of use. In this case
formalist linguistics is challenged not because it defines the language
code too narrowly without regard to the social factors which have formed
it, but because it defines language only in reference to the code,
without regard to how it is put to use in communication. The argument
here is that linguistics should extend its scope to account not only for
the knowledge of the internalized language of the code, or linguistic
competence, but for the knowledge people have of how this is
appropriately acted upon, or communicative competence.

These two senses of functional linguistics are frequently confused,
and there has sometimes been a tendency to suppose that if you define
the code in reference to the communicative functions that have
influenced its formation over time, then it follows that you will
automatically be accounting for the way in which the code functions in
communication here and now. But to do this is to equate the semantic
potential of the code with actual pragmatic realizations of it in
communication.

Functional linguistics, in both senses, considers language as an
essentially social phenomenon, designed for communication. There is no
interest in what makes human language a species-specific endowment, in
those universal features of language, described in Chapter 1, which might
provide evidence of innateness. The concerns of functional
linguistics are closer in this respect to the reality of language as
people experience it, and it is therefore often seen as more likely than
formal linguistics to be applicable to the problems of everyday life.
Opponents might argue that this is only achieved at the expense of
theoretical rigour. This raises the general question of how far
relevance and accountability are valid considerations in linguistic
enquiry, and this will be taken up again a little later. It also raises
the question of what the source of linguistic data should be, and it is
to this matter that we now turn.

The data of linguistics

There are, broadly speaking, three sources of linguistic data we can
draw upon to infer facts about language. We can, to begin with, use
introspection, appealing to our own intuitive competence as the data
source. This is a tradition in linguistics of long standing, and
essentially makes operational Saussure’s concept of langue as common
knowledge, imprinted in the mind like a book of which all members of the
community have identical copies. So if linguists want data, as
representative members of a language community they have only to consult
the copy in their head. Most grammars and dictionaries until recent
times have been based on this assumption that linguistic description can
be drawn from the linguist’s introspection. And it is not only
linguistic competence which is accessible to introspection, but
communicative competence as well, so the argument is that the
conventions that define appropriate language use can also be drawn from
the same intuitive source.

If, however, there is some reason to doubt the representative
nature of such intuitive sampling, there is a second way of getting at
data, namely by elicitation. In this case, you use other members of the
community as informants, drawing on their intuitions. And again, this
might be directed at obtaining the data of the code or its communicative
use. Thus, you might ask informants whether a particular combination of
linguistic elements is grammatically possible in their language, or
what would be an appropriate expression given a particular context.

Introspection and elicitation can be used to establish both the
formal properties of a language and how they typically function in use.
But in both cases the data are abstract knowledge, and not actual
behaviour. They reveal what people know about what they do but not what
they actually do. If you want data of that kind, the data of performance
rather than competence, you need to turn to observation.

The development of computer technology over recent years has made
observation possible on a vast scale. Programs have been devised within
corpus linguistics to collect and analyse large corpora of actually
occurring language, both written and spoken, and this analysis reveals
facts about the frequency and co-occurrence of lexical and grammatical
items which are not intuitively accessible by introspection or
elicitation.

It would seem on the face of it that this is a much more reliable
source of data. It is surely better to find out what people actually do
than depend on intuitions which are often uncertain and contradictory.
Claims have indeed been made that these large-scale observations reveal
patterns of attested usage which call for a complete revision of the
existing categories of linguistic description, which are generally based
on intuition and elicitation. Corpus linguistics, in dealing with actual
behaviour, clearly has an affinity with functional linguistics in that
it too claims to get closer to the facts of ‘real’ language.

There is no doubt that corpus analysis can reveal facts of usage,
the data of actual linguistic performance, which throw doubt on the
validity of any model of language based on the idea of a stable and
well-defined system. The elaborate picture it presents is very different
from the abstract painting proposed by the formal linguist. If language
use is indeed a rule-governed activity, as is often said, the rules are
not easy to discern in the detail. And it is also true that this detail
is not accessible to introspection or elicitation. Even a limited corpus
analysis can show patterns of occurrence of which language users, the
very producers of the data, are unaware. Corpus linguistics transcends
intuitive knowledge and in this respect can be seen as a valuable, and
valid, corrective to unfounded abstraction: a case of description
influencing theory for once, rather than the other way round.
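
The kind of counting on which such analysis rests can be illustrated
with a minimal sketch in Python. It is an illustration under stated
assumptions, not an account of any particular corpus program: the
three-sentence corpus, the function names, and the two-word window are
all hypothetical, and a real corpus would run to millions of words. The
sketch counts how often each word form occurs and how often two word
forms occur within two tokens of each other.

```python
import re
from collections import Counter

# Hypothetical miniature corpus, invented for illustration only.
CORPUS = [
    "The man opened the door.",
    "She opened the window and closed the door.",
    "The door was opened by the man.",
]


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(sentences):
    """Count how often each word form occurs across all sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    return counts


def cooccurrences(sentences, window=2):
    """Count unordered word pairs found within `window` tokens of each other."""
    pairs = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, left in enumerate(tokens):
            # Pair each token with the next `window` tokens in the sentence.
            for right in tokens[i + 1 : i + 1 + window]:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


frequencies = word_frequencies(CORPUS)
pairs = cooccurrences(CORPUS)

print(frequencies["the"])        # occurrences of 'the'
print(pairs[("door", "the")])    # times 'the' and 'door' fall within the window
```

Even on this tiny sample the counts show the grammatical word 'the'
outnumbering every lexical word, a fact of usage that speakers would be
unlikely to volunteer under elicitation.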

But the claims of corpus linguistics can be questioned too. The
facts of usage revealed by computer analysis, for example, carry no
guarantee of absolute truth. The intuitions that people have about their
language have their own validity as data. These conceptual constructs
are also real, but the reality is of a different order.

One example of this is the way lexical knowledge (in some areas of
vocabulary at least) seems to be organized semantically in terms of
prototypes, and these cannot be observed, but only elicited. Thus, when
a group of English-speaking informants were asked to give the first
example that came to mind of a more inclusive category of things they
showed a striking unanimity. The word ‘bird’ elicited ‘robin’ (rather
than, say, ‘chaffinch’ or ‘wren’) and the word ‘vegetable’ elicited
‘pea’ (rather than, say, ‘parsnip’ or ‘potato’). For these informants,
then, a robin is the prototypical bird, a pea the prototypical
vegetable. But this conceptual preference does not correspond with how
frequently these words actually occur in a corpus. The same point can be
made about grammatical structures. If English-speaking informants are
asked to provide examples of a sentence, they are likely to come up with
simple subject-verb-object (SVO) constructions (‘The man opened the
door’; ‘John kissed Mary’). These, we might say, are prototypical
English sentences. But they are unlikely to figure very frequently in a
corpus of actual usage. Since people do not use simple sentences like
this very often, they do not have much reality as observed data, but
they may have a significant psychological reality nevertheless. They may
be evidence of competence which is not reflected in the facts of
performance.

Prototypes thus elicited do not, of course, invalidate the observed
data of corpus linguistics. They provide a different kind of data which
are evidence of competence which is not directly projected into
performance. Intuitive, elicited, and observed data all have their own
validity, but this validity depends on what kind of evidence you are
looking for, on what aspects of language knowledge or behaviour you are
seeking to explain. If you are looking for evidence of the internal
relationship between language and the mind, you are more likely to
favour intuition and elicitation. If you are looking for evidence of how
language sets up external links with society, then you are more likely
to look to the observed data of actual occurrence. The validity of
different kinds of linguistic data is not absolute but relative: one
kind is no more ‘real’ than another. It depends on what you claim the
data are evidence of, and what you are trying to explain.

The relevance of linguistics

From questions of validity we turn now to questions of utility. What is
linguistics for? What good is it to anybody? What practical uses can it
be put to? One response to such questions is, of course, to deny the
presupposition that it needs any practical justification at all. Like
other disciplines, linguistics is an intellectual enquiry, a quest for
explanation, and that is sufficient justification in itself.
Understanding does not have to be accountable to practical utility,
particularly when it concerns the nature of language, which, as was
indicated in Chapter 1, is so essential and distinctive a feature of the
human species.

Whether or not linguistics should be accountable, it has been
turned to practical account. Indeed, one important impetus for the
development of linguistics in the first part of this century was the
dedicated work done in translating the Bible into languages hitherto
unwritten and undescribed. This practical task implied a prior exercise
in descriptive linguistics, since it involved the analysis of the
languages (through elicitation and observation) into which the
scriptures were to be rendered. And this necessarily called for a
continual reconsideration of established linguistic categories to ensure
that they were relevant to languages other than those, like English,
upon which they were originally based. The practical tasks of
description and translation inevitably raised issues of wider
theoretical import.

They raise other issues as well about the relationship between
theory and practice and the role of the linguist, issues which are of
current relevance in other areas of enquiry, and which bear upon the
relationship between descriptive and applied linguistics.

The process of translation involves the interpretation of a text
encoded in one language and the rendering of it into another text which,
though necessarily different in form, is, as far as possible, equivalent
in meaning. In so far as it raises questions about the differences
between language codes it can be seen as an exercise in contrastive
analysis. In so far as it raises questions about the meaning of
particular texts, particular communicative uses of the codes, it can be
seen as an exercise in discourse analysis. Both of these areas of
enquiry have laid claim to practical relevance and so to be the business
of applied linguistics.

With regard to contrastive analysis, one obvious area of
application is language teaching. After all, second language learning,
like translation, has to do with working out relationships between one
language and another: the first language (L1) you know and the second
language (L2) you do not. It seems self-evident that the points of
difference between the two codes will constitute areas of difficulty for
learners and that a contrastive analysis will therefore be of service in
the design of a teaching programme.

It turns out, however, that the findings of such analysis cannot be
directly applied in this way. Although learners do undoubtedly refer the
second language they are learning (L2) to their own mother tongue (L1),
in effect using translation as a strategy for learning, they do not do
so in any regular or predictable manner. Linguistic difference is not a
reliable measure of learning difficulty. The data of actual learner
performance, as established by error analysis, call for an alternative
theoretical explanation.

One possibility is that learners conform to a pre-programmed
cognitive agenda and so acquire features of language in a particular
order of acquisition. In this way they proceed through different interim
stages of an interlanguage which is unique to the acquisition process
itself. Enquiry into this possibility in Second Language Acquisition
(SLA) research has been extensive.

There is another possibility. It might be that the categories of
description typically used in contrastive analysis are not sufficiently
sensitive to record certain aspects of learner language. Learners may be
influenced by features of their L1 experience other than the most
obvious forms of the code. Contrastive analysis has been mainly
concerned with syntactic structure, but this is only one aspect of
language, and one which, furthermore, inter-relates with others in
complex ways. So it may be that the learners’ difficulties do correspond
to differences between their L1 and L2, but that we need a more
sophisticated theory to discern what the differences are, a theory which
takes a more comprehensive view of the nature of language by taking
discourse into account.

Discourse analysis is potentially relevant to the problems of
language pedagogy in two other ways. Firstly, it can provide a means of
describing the eventual goal of learning, the ability to communicate,
and so to cope with the conventions of use associated with certain
discourses, written or spoken. Secondly, it can provide the means of
describing the contexts which are set up in classrooms to induce the
process of learning. In this case it can provide a basis for classroom

But the relevance of discourse analysis is not confined to language
teaching. It can be used to investigate how language is used to sustain
social institutions and manipulate opinion; how it is used in the
expression of ideology and the exercise of power. Such investigations in
critical discourse analysis seek to raise awareness of the social
significance and the political implications of language use. Discourse
analysis can also be directed to developing awareness of the
significance of linguistic features in the interpretation of literary
texts, the particular concern of stylistics.

In these and other cases, descriptive linguistics becomes applied
linguistics to the extent that the descriptions can be shown to be
relevant to an understanding of practical concerns associated with
language use and learning. These concerns may take the form of quite
specific problems: how to design a literacy programme, for example, or
how to interpret linguistic evidence in a court of law (the concern of
the growing field of forensic linguistics).





... control of power. Language is too important a human resource for its
understanding to be kept confined to linguists. Language is so
implicated in human life that we need to be as fully aware of it as
possible, for otherwise we remain in ignorance of what constitutes our
essential humanity.
