Introduction to APiCS (Atlas volume)

1. The nature of this atlas

The Atlas of Pidgin and Creole Language Structures (APiCS) brings together the expertise of 88 language experts to provide a systematic comparison of key structural features of 76 creoles, pidgins and mixed languages in the areas of syntax, semantics, morphology, lexicon and phonology. It is accompanied by the three-volume Survey of Pidgin and Creole Languages, written by the same team of authors and editors.

To be able to address general questions about the nature and origin of contact languages, a broad comparative perspective is crucial. The languages need to be compared with their source languages (lexifiers, substrates, as well as other languages that have played a role in their history), but they also need to be compared with each other.

Since 2005, comparative linguists have had an important resource for the world-wide comparison of language structures, the World Atlas of Language Structures (WALS, Haspelmath et al. 2005, online since 2008). At about the time when WALS appeared, just after the “Conference on creole language structure between substrates and superstrates” (June 2005 in Leipzig), some creolists suggested that there should be a similar work dealing specifically with creole and pidgin languages. We took on this task in 2006, but we chose a different approach for gathering the data: While the maps on structural features in WALS were put together by comparative linguists who gathered the relevant information from grammatical descriptions of the languages, we adopted a consortium approach for APiCS. We invited experts on 76 languages and varieties to collaborate with us, supplying data on a detailed questionnaire of 130 features that we drew up, to serve as a basis for an atlas similar to WALS. In addition, we asked every author to write a chapter for the accompanying Survey of Pidgin and Creole Languages. 1

The language experts have thus made two contributions to the overall project: They provided a “structure dataset” in response to our questionnaire, with detailed exemplification and comments, and they wrote a chapter for the Survey. The datasets are published together online by the Max Planck Institute for Evolutionary Anthropology. Although there is no printed version of the datasets, they should be considered full-fledged scholarly publications, and cited as follows:

Mufwene, Salikoko. 2013. Kikongo-Kituba structure dataset. In: Michaelis, Susanne Maria & Maurer, Philippe & Haspelmath, Martin & Huber, Magnus (eds.), Atlas of Pidgin and Creole Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http:/

2. The APiCS Consortium

The Atlas contains 130 world maps of structural linguistic features showing 76 languages. The database underlying the maps is joint work of the four editors and the 76 authors or author teams for the 76 languages (88 authors in total). Each map is accompanied by a chapter written by one (or sometimes two) of the editors that explains and illustrates the feature and draws some conclusions. The input of the APiCS contributors was so crucial that we decided that they should be considered as coauthors of these chapters. In practical terms, listing all 88 authors with each chapter is impossible, so we adopted a convention from the natural sciences, where a team of scientists from different institutions who are working towards a common goal is called a “consortium”. These consortia can then be authors of research papers (e.g. Chimpanzee Sequencing and Analysis Consortium 2005). Thus, we call our 88 contributors the “APiCS Consortium” (listed in Table 1). This allows us to abbreviate the author designation for each chapter: “Philippe Maurer and the APiCS Consortium” means that there are actually 88 authors: Philippe Maurer (the editor who wrote the chapter) and the 87 other colleagues who contributed the database in response to our questionnaire. For this reason, the four editors appear as “editors” and not as “authors” of the Atlas, even though we wrote all the texts and put together all the maps in this book.

The 88 Consortium members who contributed to the APiCS database are listed in Table 1. Some people worked on several languages (or varieties), and the databases for some languages were contributed by several authors working together.

Table 1. The APiCS Consortium members
Aboh, Enoch O. | Saramaccan (with Tonjes Veenstra and Norval S.H. Smith)
Angelo, Denise | Kriol (with Eva Schultze-Berndt)
Ansaldo, Umberto | Singlish (with Lisa Lim)
Baker, Philip | Mauritian Creole (with Sibylle Kriegel)
Bakker, Peter | Michif
Baptista, Marlyse | Cape Verdean Creole of Brava
Bartens, Angela | (2) Nicaraguan Creole English, San Andres Creole English
Baxter, Alan N. | Papiá Kristang
Biagui, Noël Bernard | Casamancese Creole (with Nicolas Quint)
Biberauer, Theresa | Afrikaans (with Hans den Besten)
Bollée, Annegret | Reunion Creole
Bruyn, Adrienne | Early Sranan (with Margot C. van den Berg)
Cardoso, Hugo C. | Diu Indo-Portuguese
Clements, J. Clancy | Korlai
Colot, Serge | (2) Guadeloupean Creole, Martinican Creole (with Ralph Ludwig)
den Besten, Hans 2 | Afrikaans (with Theresa Biberauer)
Devonish, Hubert | Creolese (with Dahlia Thompson)
Ehrhart, Sabine | Tayo (with Melanie Revis)
Escure, Geneviève | Belizean Creole
Faraclas, Nicholas | Nigerian Pidgin
Farquharson, Joseph T. | Jamaican
Fattier, Dominique | Haitian Creole
Finney, Malcolm Awadajin | Krio
Foley, William A. | Yimas-Arafundi Pidgin
Grant, Anthony P. | Chinuk Wawa
Green, Lisa | African American English
Hackert, Stephanie | Bahamian Creole
Hagemeijer, Tjerk | Santome
Holm, John | Guinea-Bissau Kriyol (with Incanha Intumbo & Liliana Inverno)
Huber, Magnus | (editor) Ghanaian Pidgin English
Intumbo, Incanha | Guinea-Bissau Kriyol (with John Holm & Liliana Inverno)
Inverno, Liliana | Guinea-Bissau Kriyol (with John Holm & Incanha Intumbo)
Khin Khin Aye | Singapore Bazaar Malay
Klein, Thomas B. | Gullah
Klingler, Thomas A. | Louisiana Creole (with Ingrid Neumann-Holzschuh)
Kouwenberg, Silvia | (2) Berbice Dutch, Papiamentu
Kriegel, Sibylle | Mauritian Creole (with Philip Baker)
Lang, Jürgen | Cape Verdean Creole of Santiago
Li, Michelle | Chinese Pidgin English (with Stephen Matthews)
Lim, Lisa | Singlish (with Umberto Ansaldo)
Ludwig, Ralph | (2) Guadeloupean Creole, Martinican Creole (with Serge Colot)
Luffin, Xavier | Kinubi
Manfredi, Stefano | Juba Arabic (with Sara Petrollino)
Matthews, Stephen | Chinese Pidgin English (with Michelle Li)
Maurer, Philippe | (editor) (3) Angolar, Batavia Creole, Principense
Meakins, Felicity | Gurindji Kriol
Meeuwis, Michael | Lingala
Mesthrie, Rajend | Fanakalo
Meyerhoff, Miriam | Bislama
Michaelis, Susanne Maria | (editor) Seychelles Creole (with Marcel Rosalie)
Migge, Bettina | Nengee
Mous, Maarten | Mixed Ma’a/Mbugu
Mufwene, Salikoko S. | Kikongo-Kituba
Mühleisen, Susanne | Trinidad English Creole
Mühlhäusler, Peter | Norf’k
Muysken, Pieter | Media Lengua
Neumann-Holzschuh, Ingrid | Louisiana Creole (with Thomas A. Klingler)
Paauw, Scott | Ambon Malay
Perekhvalskaya, Elena | Chinese Pidgin Russian
Petrollino, Sara | Juba Arabic (with Stefano Manfredi)
Pfänder, Stefan | Guyanais
Plag, Ingo | Sranan (with Donald Winford)
Post, Marike | Fa d’Ambô
Prescod, Paula | Vincentian Creole
Quint, Nicolas | Casamancese Creole (with Noël Bernard Biagui)
Revis, Melanie | Tayo (with Sabine Ehrhart)
Roberts, Sarah J. | Pidgin Hawaiian
Rosalie, Marcel | Seychelles Creole (with Susanne Maria Michaelis)
Samarin, William J. | Sango
Schröder, Anne | Cameroon Pidgin English
Schultze-Berndt, Eva | Kriol (with Denise Angelo)
Schwegler, Armin | Palenquero
Siegel, Jeff | (2) Pidgin Hindustani, Tok Pisin (with Geoff Smith)
Sippola, Eeva | (2) Ternate Chabacano, Cavite Chabacano
Slomanson, Peter | Sri Lankan Malay
Smith, Geoff P. | Tok Pisin (with Jeff Siegel)
Smith, Ian R. | Sri Lanka Portuguese
Smith, Norval S.H. | Saramaccan (with Enoch O. Aboh & Tonjes Veenstra)
Steinkrüger, Patrick O. | Zamboanga Chabacano
Swolkien, Dominika | Cape Verdean Creole of São Vicente
Thompson, Dahlia | Creolese (with Hubert Devonish)
van den Berg, Margot C. | Early Sranan (with Adrienne Bruyn)
van der Voort, Hein | Eskimo Pidgin
van Sluijs, Robbert | Negerhollands
Veenstra, Tonjes | Saramaccan (with Enoch O. Aboh & Norval S.H. Smith)
Velupillai, Viveka | Hawai‘i Creole
Winford, Donald | Sranan (with Ingo Plag)
Yakpo, Kofi | Pichi

3. The languages of APiCS

The Atlas of Pidgin and Creole Language Structures contains information about 76 languages, listed in Table 2. This is a small subset of the existing contact languages (Smith (1994) lists over 500 pidgins, creoles, and mixed languages), but including more languages was not possible with our resources. Besides pidgins and creoles we have also included a few mixed languages. All these languages are also covered in the three-volume Survey of Pidgin and Creole Languages (though there are a few mismatches between the APiCS Consortium and the Survey authors, noted in footnotes to Table 2).

Table 2. APiCS languages and Consortium members
language (variety) | lexifier | Consortium member
African American English | English | Lisa Green
Afrikaans 3 | Dutch | Hans den Besten & Theresa Biberauer
Ambon Malay | Malay | Scott Paauw
Angolar | Portuguese | Philippe Maurer
Bahamian Creole | English | Stephanie Hackert
Batavia Creole | Portuguese | Philippe Maurer
Belizean Creole | English | Geneviève Escure
Berbice Dutch | Dutch | Silvia Kouwenberg
Bislama | English | Miriam Meyerhoff
Cameroon Pidgin English | English | Anne Schröder
Cape Verdean Creole of Brava | Portuguese | Marlyse Baptista
Cape Verdean Creole of Santiago | Portuguese | Jürgen Lang
Cape Verdean Creole of São Vicente | Portuguese | Dominika Swolkien
Casamancese Creole | Portuguese | Noël Bernard Biagui & Nicolas Quint
Cavite Chabacano | Spanish | Eeva Sippola
Chinese Pidgin English | English | Michelle Li & Stephen Matthews
Chinese Pidgin Russian | Russian | Elena Perekhvalskaya
Chinuk Wawa | Coastal Chinook | Anthony P. Grant
Creolese | English | Hubert Devonish & Dahlia Thompson
Diu Indo-Portuguese | Portuguese | Hugo C. Cardoso
Early Sranan 4 | English | Margot C. van den Berg & Adrienne Bruyn
Eskimo Pidgin | Eskimo | Hein van der Voort
Fa d’Ambô | Portuguese | Marike Post
Fanakalo | Zulu | Rajend Mesthrie (& Clarissa Surek-Clark) 5
Ghanaian Pidgin English | English | Magnus Huber
Guadeloupean Creole 6 | French | Serge Colot & Ralph Ludwig
Guinea-Bissau Kriyol | Portuguese | Incanha Intumbo & Liliana Inverno & John Holm
Gullah | English | Thomas B. Klein
Gurindji Kriol | Gurindji/Kriol | Felicity Meakins
Guyanais | French | Stefan Pfänder
Haitian Creole | French | Dominique Fattier
Hawai‘i Creole | English | Viveka Velupillai
Jamaican | English | Joseph T. Farquharson
Juba Arabic | Arabic | Stefano Manfredi & Sara Petrollino
Kikongo-Kituba | Kikongo-Kimanyanga | Salikoko S. Mufwene
Kinubi | Arabic | Xavier Luffin
Korlai | Portuguese | J. Clancy Clements
Krio | English | Malcolm Awadajin Finney
Kriol 7 | English | Eva Schultze-Berndt & Denise Angelo
Lingala | Bobangi | Michael Meeuwis
Louisiana Creole | French | Thomas A. Klingler & Ingrid Neumann-Holzschuh
Martinican Creole | French | Serge Colot & Ralph Ludwig
Mauritian Creole | French | Philip Baker & Sibylle Kriegel
Media Lengua | Spanish | Pieter Muysken
Michif | French/Cree | Peter Bakker
Mixed Ma’a/Mbugu | Cushitic/Maasai | Maarten Mous
Negerhollands | Dutch | Robbert van Sluijs
Nengee | English | Bettina Migge
Nicaraguan Creole English | English | Angela Bartens
Nigerian Pidgin | English | Nicholas Faraclas
Norf’k | English | Peter Mühlhäusler
Palenquero | Spanish | Armin Schwegler
Papiá Kristang | Portuguese | Alan N. Baxter
Papiamentu 8 | Spanish | Silvia Kouwenberg
Pichi | English | Kofi Yakpo
Pidgin Hawaiian | Hawaiian | Sarah J. Roberts
Pidgin Hindustani | Hindustani | Jeff Siegel
Principense | Portuguese | Philippe Maurer
Reunion Creole | French | Annegret Bollée
San Andres Creole English | English | Angela Bartens
Sango | Ngbandi | William J. Samarin
Santome | Portuguese | Tjerk Hagemeijer
Saramaccan | English | Enoch O. Aboh & Tonjes Veenstra & Norval S.H. Smith
Seychelles Creole | French | Susanne Maria Michaelis & Marcel Rosalie
Singapore Bazaar Malay | Malay | Khin Khin Aye
Singlish 9 | English | Lisa Lim & Umberto Ansaldo
Sranan | English | Donald Winford & Ingo Plag
Sri Lanka Portuguese | Portuguese | Ian R. Smith
Sri Lankan Malay | Malay | Peter Slomanson
Tayo | French | Sabine Ehrhart & Melanie Revis
Ternate Chabacano | Spanish | Eeva Sippola
Tok Pisin | English | Geoff P. Smith & Jeff Siegel
Trinidad English Creole | English | Susanne Mühleisen
Vincentian Creole | English | Paula Prescod
Yimas-Arafundi Pidgin | Yimas | William A. Foley
Zamboanga Chabacano | Spanish | Patrick O. Steinkrüger

Inevitably, the choice of languages for such an enterprise will be partially opportunistic and potentially controversial. For some little-studied languages that we would have liked to include, we did not find any authors that could have contributed a chapter. In choosing the languages, we were confronted with the problem that while there is a vibrant field of pidgin and creole language studies, there are no commonly agreed criteria by which the category of creoles and pidgins can be readily delimited. All experts in this field agree that pidgins and creoles are new languages that are distinct from the languages from which they took the bulk of their lexicon or their grammar, and not just “corrupted” or “broken” versions of their lexifiers (lexicon-providing languages). We know that all of them arose as a result of an unusually high degree of language contact influence in special social circumstances such as long-distance trade and forced or indentured labour. Sometimes these languages have therefore simply been called “contact languages” in recent years (cf. Holm & Michaelis (eds.) 2008). But the term “contact language” has also been used in a much broader sense, and since our focus was on the languages that are traditionally discussed under the heading of “pidgin” and “creole”, this term was not suitable as a title for our project.

Our choice of the term “pidgin and creole languages” thus simply follows traditional ordinary usage in the field, and should not be taken to imply a particular definition of “pidgin” or “creole”. Our main goal with this work (the Atlas and the Survey) has been to provide a new systematic and solid factual basis, on which future work can build, not to engage in theoretical or ideological debates.

Languages which underwent an unusually high degree of contact are not only those languages widely known as pidgins and creoles, but also include languages which are more often discussed under the rubric of “mixed languages” (cf. Matras & Bakker 2003), of which there are various types (e.g. “bilingual mixed languages”, “intertwined languages”). The debate on the nature and origin of all these varieties is still going on, so we decided to include a number of these languages in both the Atlas and the Survey. Typical cases of languages which are regarded as mixed languages that are not pidgins or creoles are Michif, Media Lengua, Gurindji Kriol, and Mixed Ma’a/Mbugu. Moreover, we also included languages with a history of high contact that are sometimes called “semi-creoles” (Holm 2004) and languages that are in other ways similar or related to pidgins or creoles, such as African American English, Afrikaans, and Sri Lankan Malay. Thus, our general approach was to be inclusive rather than exclusive. If a language is not included here that meets all the above criteria (and no doubt there are many such languages), this is for practical reasons, not for any principled reasons. We are aware that we were not able to fulfil everybody’s wishes, but we are confident that our selection of languages can be seen as representative of the kinds of languages that contact linguists have focused their research on over the last few decades. Readers who wish to classify our languages according to their criteria into groups are invited to do so on the basis of the sociohistorical and structural information provided in the individual survey chapters.

Where we want to contrast the languages dealt with in APiCS with the world’s languages, we rarely use the full description “creole languages, pidgin languages, and mixed languages”, as it would be cumbersome. Instead, we simply talk about “the APiCS languages” (or sometimes we use the term “contact languages” to encompass all three kinds of languages). Note that all the varieties included in APiCS are full-fledged languages with fairly fixed conventions. We have not included any jargons, i.e. speech forms that are used where no common full-fledged language is available and that do not have fixed conventions.

A factor that played no role in our choice of languages is the numerical or social significance of the language. Only a few of the languages can be regarded as having the status of “national language” of some sort, but most of these are languages of small countries, and where they are spoken throughout larger countries (like Haitian Creole and Nigerian Pidgin), they generally have low (or even very low) prestige. Since our interest is primarily in the language structures, this was irrelevant: We included languages regardless of their status and the number of their speakers. Some APiCS languages have very few speakers and never had many (e.g. Norf’k with 800 speakers, Tayo with about 3000 speakers, Fa d’Ambô with about 5000 speakers). Others were more vigorous in the past, but it is foreseeable that they will become extinct in the not so distant future (e.g. Papiá Kristang, Principense, Cavite Chabacano). Still others no longer have any speakers (e.g. Berbice Dutch, Batavia Creole, Eskimo Pidgin, Chinese Pidgin Russian). In one case, we included both an earlier and a modern variety of the same language: Early Sranan (based on the extensive documentation from the 18th century) and modern Sranan.

Sometimes the languages or varieties that the APiCS language experts described were not internally homogeneous, but different subvarieties (or lects) had different value choices for some feature. For example, in ordinary Guinea-Bissau Kriyol, “adjectives sometimes agree in gender with the head noun when the noun refers to a human” (Intumbo et al. 2013). But in the variety of the older generation, such gender agreement is entirely absent. In such cases, we allowed the authors to enter this information as well, but it is not reflected on the maps of this atlas, which only show the default lect. The information on such non-default lects is given only in APiCS Online. So the default lect that was primarily described by the contributors need not be representative of the entire language.

4. Classification by lexifier

To help the reader’s orientation, we have classified our languages into English-based, Dutch-based, Portuguese-based, and so on (see the middle column in Table 2). The 76 APiCS languages are shown in Map 0, with different colours for different lexifiers. Again, this classification is not entirely uncontroversial. On the one hand, contact languages are characterized by strong influence from multiple languages, so saying, for instance, that Haitian Creole is French-based is problematic, as it glosses over the very important contribution of the African languages, especially to the grammar of the language. For this reason, many authors have used expressions like “French-lexified”, “Dutch-lexified” for such languages, which only refer to the role of the European languages as primary lexicon-providers. 10 We agree that such terms are more precise, but they are also more cumbersome, so we have mostly used the older (and still much more widespread) manner of talking about groups of creoles and pidgins. We think that it is sufficiently well-known that “English-based” (etc.) is not meant to imply anything other than that the bulk of the language’s lexicon is derived from English.

On the other hand, the notion of being based on a language is problematic in the case of languages with several lexifiers, especially Gurindji Kriol and Michif. These are shown as having two lexifiers on Map 0. There are also a few other cases where it is not fully clear what the primary lexifier is. Saramaccan’s vocabulary has a very large Portuguese component, but for simplicity we classify it as English-based here. Papiamentu is often thought to be originally (Afro-)Portuguese-based, but as it has long been influenced much more by Spanish, we classify it as Spanish-based.

Map 0. The APiCS languages and their lexifiers

We made no attempt to use a uniform system for the language names, so we used names such as “Bahamian Creole”, “Nicaraguan Creole English” and “Trinidad English Creole” side by side. We used the names that the authors preferred, and that they think are most widely used, by scholars and/or by the speakers themselves. Since there is no agreement on what exactly a pidgin and a creole is, it was impossible to try to use these terms systematically in the language names. Moreover, quite a few of the languages have well-established names that do not contain the elements “pidgin” or “creole” at all (Bislama, Sranan, Tayo, Papiamentu, etc.). Note that when we refer to creole or pidgin languages in general, we never capitalize these terms. They are only capitalized as part of language names, and sometimes when “Pidgin” or “Creole” are used as abbreviations of a longer language name (as when “Creole” is used to refer to Seychelles Creole, in a context where it is clear that the creole of the Seychelles is referred to).

Of our 76 languages and varieties, 26 are English-based, 14 are Portuguese-based, 9 are French-based, and 6 are Spanish-based. Our sample of languages is thus not genealogically balanced at all. Most specialists of pidgins and creoles do not classify their languages by families, but the reasons why typologists usually work with genealogically balanced samples (e.g. D. Bakker 2011) also apply to pidgins and creoles: If the languages in the sample are not independent cases, it is easy to be misled, especially if one applies quantitative measures. Our sample of 76 languages does include a fair number of languages with non-European lexifiers, but most have European lexifiers, and almost half of those with European lexifiers are English-based. And these languages not only share a lexifier, but many of them (e.g. those in the Caribbean) share a significant part of their more recent history. Some of our languages are so similar to each other that they are mutually intelligible (e.g. Mauritian Creole and Seychelles Creole). Thus, when looking at the figures, one cannot translate “most APiCS languages have X” into “pidgin and creole languages generally have X”. The reader is urged to interpret all quantitative statements that we make with great caution.
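The effect of non-independent cases can be illustrated with a toy calculation. The following sketch is not part of the APiCS methodology: the ten languages, the five lexifier labels and the feature X are invented for illustration, and counting every lexifier group once is only one simple way of correcting for an unbalanced sample.

```python
from collections import defaultdict

# Hypothetical toy sample, NOT APiCS data: (language, lexifier, has feature X).
# Lexifier "A" is heavily over-represented, as English is in the real sample.
SAMPLE = [
    ("L1", "A", True), ("L2", "A", True), ("L3", "A", True),
    ("L4", "A", True), ("L5", "A", True), ("L6", "A", True),
    ("L7", "B", False), ("L8", "C", False), ("L9", "D", False),
    ("L10", "E", True),
]


def raw_share(rows):
    """Share of languages with X, every language counted once."""
    return sum(has_x for _, _, has_x in rows) / len(rows)


def share_by_lexifier_group(rows):
    """Share of X when every lexifier group counts once.

    Each group contributes its internal proportion of languages with X,
    and the groups are then averaged with equal weight.
    """
    groups = defaultdict(list)
    for _, lexifier, has_x in rows:
        groups[lexifier].append(has_x)
    proportions = [sum(flags) / len(flags) for flags in groups.values()]
    return sum(proportions) / len(proportions)


print(f"raw share of X:        {raw_share(SAMPLE):.2f}")
print(f"share by lexifier set: {share_by_lexifier_group(SAMPLE):.2f}")
```

In this invented sample, X is found in most languages, but in only a minority of the lexifier groups, because six of the seven languages with X belong to the same group.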

5. The structural features

Like the choice of languages, the choice of features involved some difficult decisions. Languages can be compared with respect to an indefinite number of diverse dimensions, and while we would ideally like to have a representative set of features covering all domains of language structure, it is not even clear what representativeness might mean here. So we had to proceed on the basis of our intuitions, choosing features that we regarded as interesting and that we hoped the users of this atlas would find interesting as well. We tried to include the classical features discussed in the literature on creoles (e.g. those discussed in the wake of Bickerton 1981). In deciding on the choice of features, we made ample use of inspiration from two sources: the World Atlas of Language Structures (WALS), and Comparative Creole Syntax (Holm & Patrick 2007). To a significant extent, the present work stands on the shoulders of these predecessors.

As in WALS, the features of APiCS are synchronic structural features, with a small set of fixed values (between two and nine). In contrast to WALS, we decided to allow multiple choice, i.e. a language can have two or more values on a given feature.

The features are structural features in the sense that they concern abstract structures of the languages, rather than concrete form-meaning combinations (morphs). In dialect atlases, one commonly finds maps displaying the distribution of specific morphs. For example, in Kortmann & Lunkenheimer (2011), Feature 1 is “She/her used for inanimate referents”. This makes reference to the specific morphs she and her. Such features are not possible when one compares languages that are not closely related. In order to capture similarities and differences between languages that do not descend from a common ancestor, one needs to define abstract structural features that make reference to structural properties that can be identified in any language. These can be general concepts of language form such as “precedes/follows”, “overt/zero”, “identical/different”, or semantic-pragmatic concepts like “negation”, “question”, “focus”, or more complex comparative concepts defined on the basis of such elementary formal concepts and semantic-pragmatic concepts (e.g. “subject”, “pronoun”).

The features are synchronic features in that they make no reference to the history of the relevant forms. Of course, in creole studies it is often diachronic changes that are of interest, and creolists often ask what is the origin of particular forms, e.g. whether the plural marker originates from a 3rd person plural pronoun (e.g. Papiamentu kas-nan ‘houses’, nan ‘they’). But our perspective is synchronic, even though the APiCS authors often provide interesting information about diachronic sources of grammatical markers. However, we can only ask questions that can in principle be answered for any language, even if nothing is known about its history. Thus, we refrained from including diachronic features in the Atlas of Pidgin and Creole Language Structures. But in the comments to the value assignments in the database, the APiCS contributors have often included information about diachronic aspects, and we often included diachronic considerations in the chapter texts.

However, we made exceptions in two lexical features, Chapter 109 (“Pequenino”) and Chapter 110 (“Savvy”). The Portuguese words pequenino ‘little’ and saber ‘know’ came to be widely used also in English-based pidgins and creoles (with forms such as pickaninny and savvy), so we thought that including them in APiCS would be interesting, even though these are specific forms, not abstract structures, and thus should not strictly speaking be part of a comparison of language structures.

The structural features have between two and nine different fixed values. Like the choice of features, the choice of values had to be based on our intuitions of what kinds of distinctions would give the most interesting results. For example, in Chapter 40 (“Gender agreement of adnominal adjectives”), we distinguish the four values in Table 3.

Table 3. The value box of Chapter 40

No adjective agrees with the noun
Only few adjectives agree with the noun
Many adjectives agree with the noun
All adjectives agree with the noun

Instead of four values, we could have distinguished only two (gender agreement exists/does not exist), or we could have made even more distinctions along any number of dimensions: optional/obligatory agreement, agreement depending on adjective position, agreement depending on semantic class of adjective, on animacy of head noun, and so on. Thus, there is nothing “natural” or “true” about the value choices that we made: These are our choices that we feel best capture the diversity and similarity among our languages, but alternative choices would always have been possible.
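A feature with a fixed value box and multiple choice might be represented as in the following minimal sketch. The class and method names (Feature, assign) are assumptions made for this example and do not reflect the format of the APiCS database. The value labels of Chapter 40 are those of Table 3, whereas the two value labels given for Chapter 3 are hypothetical placeholders, not the actual value box of that chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A structural feature with a small, fixed set of values (illustrative)."""
    chapter: int
    title: str
    values: tuple

    def __post_init__(self):
        # The features have between two and nine fixed values.
        if not 2 <= len(self.values) <= 9:
            raise ValueError("a feature needs between two and nine values")

    def assign(self, chosen):
        """Check a language's value choice; several values may be chosen."""
        chosen = frozenset(chosen)
        unknown = chosen - frozenset(self.values)
        if not chosen or unknown:
            raise ValueError(f"invalid value choice: {sorted(unknown)}")
        return chosen


GENDER_AGREEMENT = Feature(40, "Gender agreement of adnominal adjectives", (
    "No adjective agrees with the noun",
    "Only few adjectives agree with the noun",
    "Many adjectives agree with the noun",
    "All adjectives agree with the noun",
))

# Hypothetical value labels, used only to show a multiple-choice assignment.
ADJ_NOUN_ORDER = Feature(3, "Order of adjective and noun", (
    "Adjective precedes noun",
    "Adjective follows noun",
))

# A hypothetical language with one value on Chapter 40 and both orders on Chapter 3.
single = GENDER_AGREEMENT.assign(["Only few adjectives agree with the noun"])
both = ADJ_NOUN_ORDER.assign(ADJ_NOUN_ORDER.values)
```

Merging or splitting values, as discussed above, would amount to defining a different tuple of values for the same chapter.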

It is important to be aware that the concepts that we use to compare the languages in structural terms are a special set of comparative concepts (Haspelmath 2010) and need not be identical to the descriptive categories that one would use to describe these languages. Descriptive categories are defined in language-particular terms and thus not suitable for cross-linguistic comparison. For example, “adjective” may be defined in a variety of ways in different languages (Dixon 2004), making reference to lack of marking in adnominal contexts, presence of a copula in predicative contexts, presence of comparative marking, use of special degree words, and so on. But for cross-linguistic comparison, we must limit ourselves to criteria that can be applied to all languages, i.e. semantic criteria. Thus, in Chapter 3 (“Order of adjective and noun”), we define “adjectives” as words with property meanings such as ‘hot’, ‘old’ and ‘blue’, regardless of how they are treated grammatically in each language. Thus, in languages which do not make a grammatical adjective-verb distinction, it is still possible to answer the question about the order between the adjective and the noun. Likewise, in the chapters dealing with definite articles (Chapters 9, 28, 31), we provide a semantic definition of definite articles, and we classify elements as definite articles that are normally regarded as demonstratives in the languages in question.

Some grammatical concepts are generally treated as if they had a cross-linguistically valid definition, though this is not in fact the case. The most striking example of this is the concept ‘word’ (as well as its counterparts, ‘affix’ and ‘phrase’). Linguists often compare languages with respect to the contrast between morphological expression (by affixes) and syntactic expression (by words and phrases), and we do this occasionally in APiCS as well (e.g. in Chapters 37, 45-46, 62, 100), but one needs to be aware that it is very difficult to define ‘affix’ as a comparative concept (Haspelmath 2011). Thus, a more precisely defined notion of “tightness of combination” (as in Chapters 45-46) is likely to give more information than the information about affixhood (which tends to primarily reflect writing traditions).

6. Relationship between APiCS features and features in Holm & Patrick (2007) and in WALS

Table 4 shows those 48 APiCS features that are also represented in Holm & Patrick (2007) in one way or another, and those 47 features that are also represented in WALS (Haspelmath, Dryer, Gil and Comrie 2005).

Table 4. APiCS features with counterparts in Comparative Creole Syntax and WALS
APiCS chapter/feature title | Holm & Patrick (2007) number | WALS chapter number
Order of subject, object and verb
Order of possessor and possessum
Order of adjective and noun
Order of adposition and noun phrase
Order of demonstrative and noun
Order of cardinal numeral and noun
Order of relative clause and noun
Order of degree word and adjective
Position of interrogative phrases in content questions
Gender distinctions in personal pronouns
Inclusive/exclusive distinction in independent personal pronouns
Politeness distinctions in second person pronouns
Interrogative pronouns
Indefinite pronouns
Occurrence of nominal plural markers
Expression of nominal plural meaning
The associative plural
Nominal plural marker and 3rd person plural pronoun
Definite articles
Indefinite articles
Generic noun phrases in subject function
Cooccurrence of demonstrative and definite article
Pronominal and adnominal demonstratives
Distance contrasts in demonstratives
Adnominal distributive numerals
Ordinal numerals
Marking of pronominal possessors
Marking of possessor noun phrases
Independent pronominal possessor
Gender agreement of adnominal adjectives
Comparative adjective marking
Comparative standard marking | (15.6-7, 14.5)
Internal order of tense, aspect, and mood markers
Tightness of the link between the past marker and the verb
Tightness of link between the progressive marker and the verb
Uses of the progressive marker
Uses of the habitual marker
Present reference of stative verbs and past perfective reference of dynamic verbs
Aspect markers and inchoative meaning
Suppletion according to tense and aspect
The prohibitive
Alignment of case marking of full noun phrases
Alignment of case marking of personal pronouns
Ditransitive constructions with ‘give’
Expression of pronominal subjects
Comitatives and instrumentals
Noun phrase conjunction and comitative
Nominal and verbal conjunction
Predicative noun phrases
Predicative adjectives
Predicative locative phrases
Predicative noun phrases and predicative locative phrases
Predicative possession
Existential verb and transitive possession verb
Motion-to and motion-from
Directional serial verb constructions with ‘come’ and ‘go’
‘Give’ serial verb constructions
Intensifiers and reflexive pronouns
Reciprocal constructions
Passive constructions
Applicative constructions
Subject relative clauses | 9.3 (15.8, 17.9)
Object relative clauses | 9.4, 9.6 (17.9)
Instrument relative clauses | (9.5, 17.9)
Complementizer with verbs of speaking | 8.5-8.7, 14.4
Complementizer with verbs of knowing
‘Want’ complement subjects
Negative morpheme types
Position of standard negation
Negation and indefinite pronouns
Polar questions
Focusing of the noun phrase
Verb doubling and focus
Vocative markers
Para-linguistic usages of clicks
‘Hand’ and ‘arm’
Nasal vowels

In addition to these matching features, there are also about 25 chapters that have a topic that is related to the topic of a WALS chapter, but the definition of the feature and/or the choice of feature values is so different that a direct comparison is not possible.

And even for those 47 features where there is a close match between the WALS features and the APiCS features (usually because the APiCS feature was explicitly modeled on the WALS feature), we occasionally had to change the APiCS values. Sometimes this was necessary because we wanted to include more information and avoid uninformative values such as “other” or “mixed” (see also §7 below on multiple-choice features). In a number of cases we felt that more distinctions should be made than in WALS, and sometimes we ignored minor values that occur only in highly special circumstances and that would have been difficult to explain to our contributors. Another common situation is that certain values that occur in WALS do not occur in APiCS because they are rare and are not found in our relatively small sample of 76 languages.

While the Holm & Patrick (2007) features were specifically chosen in order to highlight properties that are characteristic of creoles, the features that were adopted from WALS are more neutral.

7. Multiple-choice features

In contrast to WALS, we decided to allow multiple-choice features, i.e. features in which a language allows several possibilities. Thus, in Chapter 2 (“Order of possessor and possessum”), we have two values (cf. Table 5), but there are three types of languages: those with possessor-possessum order (20 languages, exclusively value 1), those with possessum-possessor order (29 languages, exclusively value 2), and those with both orders (27 languages, sharing values 1 and 2).

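Table 5. The value box of a multiple-choice feature (Chapter 2)

Value | Exclusive | Shared | All
1. Possessor-possessum | 20 | 27 | 47
2. Possessum-possessor | 29 | 27 | 56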

Languages which show multiple values for a given feature are shown by a pie chart in different colours on the map.

In WALS, by contrast, there are three values, and for each language a single choice must be made. Table 6 shows the three values of WALS’s Chapter 86 (Order of genitive and noun, Dryer 2005f; note that Dryer uses “genitive” in the same sense as APiCS uses “possessor”). If a language allows both orders and neither of the two orders is dominant (as in English, where one can say both my friend’s house and the house of my friend), the language is assigned to the third type (“both”).

Table 6. Value distribution in Dryer’s (2005f) WALS chapter

1. Genitive-noun (GenN)


2. Noun-genitive (NGen)


3. Both orders occur with neither order dominant


If both orders are possible but one of the orders is marginal (e.g. genitive-noun order in Russian, which is possible under highly restricted conditions), then a language is classified in the WALS chapter according to the majority pattern, and the minority option is ignored.

In APiCS, we did not want to ignore minority patterns, and we wanted to distinguish between different situations of value combinations. Thus, we decided to add a weighting to the different values in a multiple-choice situation. Whenever more than one value was chosen for a feature, the contributors were asked to indicate the relative importance of each value, either numerically or by verbal description. There are five different degrees of relative importance as described verbally, which we translated into numerical values:

  • pervasive: 90%
  • majority: 70%
  • about half: 50%
  • minority: 30%
  • marginal: 10%

So if there are two values, and one of them is described as “majority” and the other as “minority” by the contributor, the pie chart on the map shows 70% of its area in the colour of the first value, and 30% in the colour of the second value. It has to be kept in mind that relative importance can refer to different concepts: For example, in the feature “Order of subject, object and verb” (Chapter 1) relative importance refers to text (or token) frequency. That is, it indicates how often a particular word order occurs in spoken or written language in comparison to the other word orders. On the other hand, token frequency is irrelevant for the feature “Female and male animals” (Chapter 117), where relative importance expresses paradigm frequency, i.e. it indicates how many different animal names a particular sex-denoting word-formation process applies to, irrespective of whether the names are often or rarely used.

But the pies are not limited to 10% vs. 90%, 30% vs. 70%, and 50% vs. 50%. There are two ways in which other divisions of the values can come about. On the one hand, the contributors were also allowed to indicate the relative importance numerically, by entering precise percentage numbers. Of course, just like the verbal importance indications, these numbers are usually based on impressionistic estimates, rather than on text counts, but we felt that the impressionistic knowledge of language experts was a valuable resource that we wanted to include in the database when it was available.

On the other hand, numerical distributions other than the ones implied by the five degrees listed above can come about when three (or more) different values are chosen. In such cases, the contributors often chose verbal relative-importance indications that add up to more than 100% by the above correspondences. For example, if three values are chosen, and one of them is said to be “pervasive”, while the other two values are “marginal”, the figures add up to 110%. To turn such value choices into a pie chart, we had to normalize the figures: The pervasive pattern now gets 82% (= 90/110) instead of 90%, while the two other values get 9% each (= 10/110) instead of 10%. Occasionally, this kind of normalization was also necessary when only two values were chosen. For example, if one value is said to be “pervasive” (90%) and the other value “minority” (30%), we assign the resulting importance values 75% (= 90/120) and 25% (= 30/120).
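
The arithmetic behind these pie charts can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually implemented for the atlas: the names `DEGREE_TO_PERCENT` and `pie_shares` are hypothetical, and the percentages are the ones implied by the examples in this section (pervasive 90%, majority 70%, about half 50%, minority 30%, marginal 10%).

```python
# Percentages implied by the five verbal degrees of relative importance
# (hypothetical names; the figures follow the examples in the text).
DEGREE_TO_PERCENT = {
    "pervasive": 90,
    "majority": 70,
    "about half": 50,
    "minority": 30,
    "marginal": 10,
}


def pie_shares(degrees):
    """Return rounded pie-chart percentages, one per chosen value."""
    raw = [DEGREE_TO_PERCENT[d] for d in degrees]
    total = sum(raw)
    # Rescale so that the slices fill the whole pie even when the
    # verbal indications add up to more (or less) than 100%.
    return [round(100 * r / total) for r in raw]


print(pie_shares(["majority", "minority"]))               # [70, 30]
print(pie_shares(["pervasive", "marginal", "marginal"]))  # [82, 9, 9]
print(pie_shares(["pervasive", "minority"]))              # [75, 25]
```

Because each share is rounded separately, the rounded figures need not add up to exactly 100.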

Of the 130 structural features, 43 are multiple-choice features and 87 are single-choice features.

8. The chapter texts

Each chapter consists of a text that explains and exemplifies the different structural types (or values) of the feature, as well as of a world map showing the APiCS languages. In most cases, the chapter text occupies two pages, and the world map occupies two pages as well. However, for several features that show relatively little variation in our languages, we decided to make the map smaller (occupying just half a page rather than two pages), and to fit the text into one and a half pages. This concerns features 10, 15, 27, 36, 58, 91, and 99. Moreover, some of the phonological features, where we had less to say in the chapter texts, are treated differently: The texts for Chapters 118-119 occupy just a single page each, and the texts for Chapters 121-130 occupy just half a page each and are accompanied by smaller maps.

The chapter texts were written by the editors, but since the texts are based on the value assignments and the examples provided by the members of the APiCS Consortium, we decided to make the APiCS Consortium a co-author of each chapter (as mentioned earlier in §2). This may look a bit unusual, especially in the context of a work of linguistics where named author groups are not (yet) common. But it accurately reflects the enormous contribution made by the Consortium members. We would not have felt comfortable if the individual chapters of the Atlas were referred to as “Michaelis 2013” or “Maurer 2013”, as if these were papers based entirely on our research. It must be emphasized that the co-authorship of the Consortium does not imply any responsibility for the interpretation of the results. An alternative would have been not to assign individual authorship to the chapters at all, but if we had chosen that option, the individual contributions of the editors would have been left unclear. As a rule, the first chapter author was responsible not only for writing the chapter, but also for reviewing and checking the corresponding feature dataset in the database (§10), as well as for formulating the feature values and the annotation in the first project phase. All this preparatory work was much more time-consuming than the actual writing of the chapter on the basis of the final database. When a chapter has two editors as co-authors, usually one of them did most of the preparatory work while the other one wrote most of the text.

The chapter texts explain the feature in question, define the (between two and nine) feature values, provide examples, discuss the geographical or historical distribution of the values, and situate it in a wider context, often relating it to debates in pidgin and creole studies or to the world-wide distribution of languages as reported in WALS. Due to the brevity of the APiCS chapters and the huge amount of relevant literature especially in creole studies, we were able to cite only a small part of it. Our chapters can thus be used as an introduction to the grammatical patterns of pidgins and creoles, but not as an overview of what has been written about them (see Holm (1988) for such an overview).

Each chapter contains a value box, which is largely identical to the value legend on the corresponding map (the value names are sometimes abbreviated on the maps). The value boxes of the single-choice features just give the number of languages in each value, while for the multiple-choice features, three numbers are given for each value: the number of languages that have this value exclusively, the number of languages that have this value among others (“shared”), and the number of languages that have this value at all (see Table 5 above).
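
How the three numbers in the value box of a multiple-choice feature relate to each other can be illustrated with a small Python sketch. The function name `value_box` and the toy assignments are invented for illustration; they are not APiCS data, and this is not the code used to produce the atlas.

```python
from collections import Counter


def value_box(assignments):
    """Count, per value, the languages that have it exclusively, shared, and at all.

    `assignments` maps a language name to the set of values chosen for it.
    """
    exclusive, shared = Counter(), Counter()
    for values in assignments.values():
        # A language with a single value counts as "exclusive" for that value;
        # a language with several values counts as "shared" for each of them.
        target = exclusive if len(values) == 1 else shared
        for v in values:
            target[v] += 1
    all_values = set(exclusive) | set(shared)
    return {
        v: {
            "exclusive": exclusive[v],
            "shared": shared[v],
            "all": exclusive[v] + shared[v],
        }
        for v in sorted(all_values)
    }


# Hypothetical toy data: three languages, two values.
toy = {
    "Language A": {1},
    "Language B": {2},
    "Language C": {1, 2},
}
print(value_box(toy))
```

Applied to the Chapter 2 figures given in §7 (20 and 29 languages with one order exclusively, 27 with both), the same counting yields 47 and 56 languages in the “all” column.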

Many of the chapters contain quite a few examples (usually example sentences) of the various phenomena from different languages of APiCS. These examples were almost all taken from the structure datasets supplied by the APiCS contributors, and in order to highlight these contributions, each example is accompanied by a reference to the structure dataset from which we took it. It should be noted that the examples often come from earlier published sources, which are mentioned in the structure dataset as published in APiCS Online.

The examples are usually accompanied by interlinear word-by-word glosses. For languages with more complex word-internal structure, we have often broken up the words into morphemes and provided morpheme-by-morpheme glosses. The interlinear glosses follow the conventions of the Leipzig Glossing Rules.

9. The maps

The maps show the world with country boundaries for orientation, and dot symbols (or pie charts) in different colours for each of the 76 languages. Each feature can have between two and nine different values. The colour of the dot tells the user for each language which value(s) it has for the given feature. The political boundaries are shown only for rough orientation, and they should by no means be taken to endorse a particular political view. Online versions of the maps are also available in APiCS Online, but the chapter texts are available only in the printed form of the atlas.

The world map uses the Gall-Peters projection, which looks a bit unusual, because the shapes of the continents are different from what we are used to from the more frequently used projections. However, some of the commonly used projections have the disadvantage of enlarging the circum-polar regions, especially the well-known Mercator projection. Since the majority of APiCS languages are spoken in the circum-equatorial regions, the Gall-Peters projection with its specialized cylindrical equal-area representation of the world seemed best to us. Even so, two inset maps were necessary because of the high number of languages in these regions: One for the eastern Caribbean and one for the Gulf of Guinea. When information is lacking for a language (as happens in about 3% of the cases), the language is simply not shown on the map.

The different feature values are shown by dots with different colours. We tried to choose the colours in such a way that they facilitate the interpretation of the differences between the languages. The principles that have guided us are quite similar to those used in WALS:

  • we avoid green dots because some readers cannot easily distinguish green and red
  • we use similar colours for similar values across different features
  • we use white dots for values expressing absence of a certain property
  • we normally use red and blue for the two main types (apart from absence of a property)
  • we try to use light red and light blue for values that are similar to the two main types
  • we use yellow for identical coding of a function, red for different coding, orange for overlap, and black for identity and differentiation
  • for constituent order features such as the order of adposition and noun phrase (Chapter 4), we use red for the order where the head (or grammatical element) precedes, and blue for the order where it follows
  • for word order features where there is no general correlation with the order of object and verb, we use yellow for the order where a modifier precedes the noun and purple for the order where it follows the noun
  • we use maximally distinct colours for rare types that need to be salient

In addition to the main world maps showing the APiCS languages, we have included small inset maps cited from WALS for those 47 features that match WALS features. However, the maps cite only the data of WALS, not the actual map layout. In WALS, a different map projection and orientation is chosen, so we adjusted the WALS maps to the APiCS maps in this regard. The WALS maps sometimes had to be modified, as described in the map caption (e.g. “with minor values omitted”). There is no separate legend for the WALS insets – the dot colours have (at least largely) the same meaning as in the APiCS map. These inset maps allow the users to get a good sense of the world-wide distribution of the feature, thus allowing them to assess the likelihood that the feature is due to universal trends or substrate effects.

10. The APiCS project 2006–2012

The Atlas of Pidgin and Creole Language Structures is the result of intensive large-scale collaboration over several years. In this section, we describe the process in its broad outlines.

The idea was originally proposed in the context of the Groupe Européen de Recherche en Langues Créoles in 2005 by Philippe Maurer. This proposal was taken up by Susanne Maria Michaelis, and Magnus Huber and Martin Haspelmath joined them as co-editors. Thus, the editorial team consisted of a specialist of Ibero-Romance-based creoles (Maurer), a specialist of French-based creoles (Michaelis), a specialist of English-based pidgins and creoles (Huber), and a general comparative linguist with an interest in language contact (Haspelmath).

Funding came from the Max Planck Institute for Evolutionary Anthropology (and its director Bernard Comrie), as well as (between 2008 and 2011) from the Deutsche Forschungsgemeinschaft through a grant to the University of Giessen (Susanne Michaelis and Magnus Huber). This allowed Susanne Maria Michaelis to work primarily on this project for most of the project time, and covered the costs for a substantial number of student assistants, as well as our main assistant Melanie Revis and our database manager Bradley Taylor. We also had funding for regular meetings of the editors (mostly in Leipzig, but once in Aubonne near Lake Geneva). In addition, we were given the generous opportunity to organize seven workshops and conferences in Leipzig between 2006 and 2010 (February 2006, October 2006, March 2007, October 2007, June 2008, November 2009, November 2010). All APiCS Consortium members were invited to attend at least one of these events, and almost all of them came to Leipzig at least once. Finally, the APiCS project was presented by one or more of the editors at various conferences on (pidgin and) creole languages during this time, so that many members of the field had the chance to hear about it early on (January 2007 at the SPCL meeting in Anaheim, June 2007 at the SPCL meeting in Amsterdam, July 2008 at the SCL/SPCL meeting in Cayenne, January 2009 at the SPCL meeting in San Francisco, April 2009 at the Giessen Creolistics Workshop, August 2009 at the SPCL meeting in Cologne).

Most of the key design features of the APiCS database, of the resulting Atlas, and of the Survey volumes were intensively discussed during the meetings in the early phase of the project, and later meetings discussed first results from the database.

The most important aspect of the work of the editors in the first phase was to put together a set of features and feature values for the APiCS questionnaire. It would hardly have been possible to formulate the definitive questionnaire without a pilot phase, because the definition of the phenomena and the subdivision of the types depend on what is actually there. For this reason, we formulated a test questionnaire of 153 features in 2006 (inspired by Holm & Patrick 2007 and by WALS, see §6) that we asked a number of colleagues to fill in. After getting these preliminary datasets, we thoroughly revised the questionnaire and reduced it to 120 primary features. To this we added 28 sociolinguistic features (available only online) and over 100 segment features (for the phonological segments, we asked the contributors to give the complete inventory), which were much easier to fill in than the 120 primary features. The 130 maps of this atlas show the 120 primary features plus 10 particularly interesting segment features.

The feedback from the test questionnaires and discussions with the contributors during the meetings in 2006 and 2007 helped the editors to formulate the APiCS questionnaire in the best possible way. In many cases, we realized that some of the concepts that we had used were not sufficiently clear and the definitions needed to be clarified (e.g. “definite article”, “adjective”, “serial verb”). Some of the contributors asked us what to do in case different subvarieties of their language show different types, so we allowed for the possibility of adding further lects, in addition to the default lect, to the database (note that the maps in this volume only show the distribution of values in the default lect). The question of how to deal with multiple-choice features and how to indicate relative importance (by words or by estimated percentages, as mentioned in §7 above) was also discussed intensively.

The final questionnaire was sent out to the contributors in the spring of 2008, and we received most of the completed questionnaires by the end of the year. A screenshot of the database questionnaire (main layout) is shown in Figure 1. The contributors’ tasks were to carefully read the feature description in the upper left corner of the layout, to look at the feature values and their annotations, and to select the correct value (or several, in the case of multiple-choice features). They were asked to provide a bibliographical reference (in case the phenomenon was discussed in the literature on their language) and one or more examples, and they were given the opportunity to enter a prose comment into a comment field.

Figure 1. Main layout of the APiCS questionnaire (application FileMaker Pro)

The resulting datasets were thoroughly reviewed by the editors: Each editor looked at the value choices, the comments and the examples for the features that he or she was responsible for, and then provided comments to the contributors. In the majority of cases, the comment was limited to “OK”, but in many other cases, we had to ask a question or point out an omission. It turned out that some definitions of features and values were not fully clear and demanded revision, and there were also cases where we later realized that even in the revised version of the questionnaire, some definitions remained unclear. This reviewing by the editors was rather time-consuming, but it was very important to ensure the consistency of the datasets.

At the next stage, the contributors got their datasets back with our comments and were asked to revise them in accordance with our requests. Thus, the datasets were revised much more thoroughly than an average journal paper, and our contributors had to be very patient with us. For this reason, we kept stressing their crucial role in this enterprise, which is reflected in the fact that the APiCS Consortium is a coauthor of every chapter of this Atlas (cf. §2 above).

In addition to the value assignment in the main layout, the contributors were asked to provide examples in a separate layout. This layout has fields for the primary text, an analyzed text (with hyphenation for morpheme breakup), an interlinear gloss, and an idiomatic translation into English. We also asked the contributors to give an example reference and to specify the example type (spoken, written, constructed, etc.). Further optional fields can contain translations into other languages and general comments.
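
The fields of the example layout can be pictured as a simple record. The following Python sketch is purely illustrative: the class name `Example`, its attribute names, and the method `is_aligned` are hypothetical and do not reproduce the actual FileMaker layout, and the sample sentence is an invented English illustration rather than an example from the database. The alignment check reflects the general principle that text and gloss correspond word by word and morpheme by morpheme.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Example:
    """One example record; the attribute names are illustrative only."""
    primary_text: str
    analyzed_text: str                   # words split into morphemes by hyphens
    gloss: str                           # interlinear gloss, aligned word by word
    translation: str                     # idiomatic English translation
    reference: Optional[str] = None
    example_type: Optional[str] = None   # e.g. "spoken", "written", "constructed"
    other_translations: Dict[str, str] = field(default_factory=dict)
    comment: str = ""

    def is_aligned(self) -> bool:
        """Check that text and gloss have the same number of words and hyphens."""
        words = self.analyzed_text.split()
        glosses = self.gloss.split()
        if len(words) != len(glosses):
            return False
        return all(w.count("-") == g.count("-") for w, g in zip(words, glosses))


# Invented English illustration, not taken from the APiCS database.
ex = Example(
    primary_text="the dogs barked",
    analyzed_text="the dog-s bark-ed",
    gloss="DEF dog-PL bark-PST",
    translation="The dogs barked.",
    example_type="constructed",
)
print(ex.is_aligned())  # True
```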

The final APiCS database contains over 15,000 examples, over half of them naturalistic spoken examples. Over 1000 of the examples were constructed by native speaker linguists (at least ten of the contributors are native speakers of the languages they have described). The glossing style of the examples was made fully consistent across the entire database due to the hard work of our student assistants, who spent a lot of time tracking down the meanings of unusual abbreviations and homogenizing capitalization and punctuation conventions. A major role was played by our main assistant Melanie Revis, who coordinated the students and also provided invaluable help in making other aspects of the database consistent (as well as editing the Survey). All examples are accessible via APiCS Online. The chapter texts in this Atlas just contain a small selection of the examples that we collected. The APiCS example collection must be one of the largest sets of interlinear glossed text from multiple languages.

Figure 2. Example layout of the APiCS questionnaire