Semitic language and lingua franca of the Arab world
Arabic is traditionally written with the Arabic alphabet, a right-to-left abjad. This alphabet is the official script for Modern Standard Arabic (MSA). Colloquial varieties were traditionally not written. However, since the emergence of social media, the amount of dialect writing online has increased significantly. Besides the Arabic alphabet, dialects are also often written in Latin characters (left to right) or in Hebrew characters (in Israel), with no standardized orthography.
Maltese and Hassaniya are the only varieties officially written in a Latin alphabet.
Arabic is usually classified as a Central Semitic language. Linguists still differ as to the best classification of Semitic language sub-groups. The Semitic languages changed significantly between Proto-Semitic and the emergence of Central Semitic languages, particularly in grammar. Innovations of the Central Semitic languages, all maintained in Arabic, include:
The conversion of the suffix-conjugated stative formation (jalas-) into a past tense.
The conversion of the prefix-conjugated preterite-tense formation (yajlis-) into a present tense.
The elimination of other prefix-conjugated mood/aspect forms (e.g., a present tense formed by doubling the middle root, a perfect formed by infixing a /t/ after the first root consonant, probably a jussive formed by a stress shift) in favor of new moods formed by endings attached to the prefix-conjugation forms (e.g., -u for indicative, -a for subjunctive, no ending for jussive, -an or -anna for energetic).
On the other hand, several Arabic varieties are closer to other Semitic languages and maintain features not found in Classical Arabic, indicating that these varieties cannot have developed from Classical Arabic. Thus, Arabic vernaculars do not descend from Classical Arabic: Classical Arabic is a sister language rather than their direct ancestor.
Arabia boasted a wide variety of Semitic languages in antiquity. In the southwest, various Central Semitic languages both belonging to and outside of the Ancient South Arabian (ASA) family (e.g. Southern Thamudic) were spoken. It is also believed that the ancestors of the Modern South Arabian languages (non-Central Semitic languages) were spoken in southern Arabia at this time. To the north, in the oases of northern Hejaz, Dadanitic and Taymanitic held some prestige as inscriptional languages. In Najd and parts of western Arabia, a language known to scholars as Thamudic C is attested. In eastern Arabia, inscriptions in a script derived from ASA attest to a language known as Hasaitic. Finally, on the northwestern frontier of Arabia, various languages known to scholars as Thamudic B, Thamudic D, Safaitic, and Hismaic are attested. The last two share important isoglosses with later forms of Arabic, leading scholars to theorize that Safaitic and Hismaic are in fact early forms of Arabic and that they should be considered Old Arabic.
Linguists generally believe that "Old Arabic" (a collection of related dialects that constitute the precursor of Arabic) first emerged around the 1st century CE. Previously, the earliest attestation of Old Arabic was thought to be a single 1st century CE inscription in Sabaic script at Qaryat al-Faw, in southern present-day Saudi Arabia. However, this inscription does not participate in several of the key innovations of the Arabic language group, such as the conversion of Semitic mimation to nunation in the singular. It is best reassessed as a separate language on the Central Semitic dialect continuum.
It was also thought that Old Arabic coexisted alongside, and then gradually displaced, epigraphic Ancient North Arabian (ANA), which was theorized to have been the regional tongue for many centuries. ANA, despite its name, was considered a language very distinct from, and mutually unintelligible with, "Arabic". Scholars named its variant dialects after the places where the inscriptions were discovered (Dadanitic, Taymanitic, Hismaic, Safaitic). However, most arguments for a single ANA language or language family were based on the shape of the definite article, a prefixed h-. It has been argued that the h- is an archaism and not a shared innovation, and thus unsuitable for language classification, rendering the hypothesis of an ANA language family untenable. Safaitic and Hismaic, previously considered ANA, should be considered Old Arabic because they participate in the innovations common to all forms of Arabic.
The earliest attestation of continuous Arabic text in an ancestor of the modern Arabic script is three lines of poetry by a man named Garm(')allāhe found in En Avdat, Israel, and dated to around 125 CE. This is followed by the Namara inscription, an epitaph of the Lakhmid king Imru' al-Qays bar 'Amro, dating to 328 CE, found at Namara, Syria. From the 4th to the 6th centuries, the Nabataean script evolved into the Arabic script recognizable from the early Islamic era. There are inscriptions in an undotted, 17-letter Arabic script dating to the 6th century CE, found at four locations in Syria (Zabad, Jabal 'Usays, Harran, Umm el-Jimal). The oldest surviving papyrus in Arabic dates to 643 CE, and it uses dots to produce the modern 28-letter Arabic alphabet. The language of that papyrus and of the Qur'an is referred to by linguists as "Quranic Arabic", as distinct from its codification soon thereafter into "Classical Arabic".
Arabic from the Quran in the old Hijazi dialect (Hijazi script, 7th century AD)
In late pre-Islamic times, a transdialectal and transcommunal variety of Arabic emerged in the Hejaz. It continued to lead a parallel life after literary Arabic had been institutionally standardized in the 2nd and 3rd centuries of the Hijra, most strongly in Judeo-Christian texts, keeping alive ancient features eliminated from the "learned" tradition (Classical Arabic). This variety and both its classicizing and "lay" iterations have been termed Middle Arabic in the past, but they are thought to continue an Old Higazi register. It is clear that the orthography of the Qur'an was not developed for the standardized form of Classical Arabic; rather, it shows the attempt on the part of writers to record an archaic form of Old Higazi.
The Qur'an has served and continues to serve as a fundamental reference for Arabic. (Blue Qur'an, 9th–10th century)
In the late 6th century AD, a relatively uniform intertribal "poetic koine" distinct from the spoken vernaculars developed based on the Bedouin dialects of Najd, probably in connection with the court of al-Ḥīra. During the first Islamic century, the majority of Arabic poets and Arabic-writing persons spoke Arabic as their mother tongue. Their texts, although mainly preserved in far later manuscripts, contain traces of non-standardized Classical Arabic elements in morphology and syntax.
Evolution of early Arabic script (9th–11th century), with the Basmala as an example, from kufic Qur'ān manuscripts: (1) early 9th century, script with no dots or diacritic marks; (2) and (3) 9th–10th century under the Abbasid dynasty, Abu al-Aswad's system established red dots, with each arrangement or position indicating a different short vowel; later, a second black-dot system was used to differentiate between letters like fā’ and qāf; (4) 11th century, in al-Farāhidi's system (the system used today), dots were changed into shapes resembling the letters to transcribe the corresponding long vowels.
Abu al-Aswad al-Du'ali (c. 603–689) is credited with standardizing Arabic grammar, or an-naḥw (النَّحو "the way"), and pioneering a system of diacritics to differentiate consonants (نقط الإعجام nuqat l-i'jām "pointing for non-Arabs") and indicate vocalization (التشكيل at-tashkil). Al-Khalil ibn Ahmad al-Farahidi (718–786) compiled the first Arabic dictionary, Kitāb al-'Ayn (كتاب العين "The Book of the Letter ع"), and is credited with establishing the rules of Arabic prosody. Al-Jahiz (776–868) proposed to Al-Akhfash al-Akbar an overhaul of the grammar of Arabic, but it would not come to pass for two centuries. The standardization of Arabic reached completion around the end of the 8th century. The first comprehensive description of the ʿarabiyya "Arabic", Sībawayhi's al-Kitāb, is based first of all upon a corpus of poetic texts, in addition to Qur'an usage and Bedouin informants whom he considered to be reliable speakers of the ʿarabiyya.
By the 8th century, knowledge of Classical Arabic had become an essential prerequisite for rising into the higher classes throughout the Islamic world, both for Muslims and non-Muslims. For example, Maimonides, the Andalusi Jewish philosopher, authored works in Judeo-Arabic (Arabic written in Hebrew script), including his famous The Guide for the Perplexed (דלאלת אלחאירין, دلالة الحائرين Dalālat al-ḥāʾirīn).
The koine theory claims that the modern Arabic dialects collectively descend from a single military koine that sprang up during the Islamic conquests; this view has been challenged in recent times. Ahmad al-Jallad proposes that there were at least two considerably distinct types of Arabic on the eve of the conquests: Northern and Central (Al-Jallad 2009). On this view, the modern dialects emerged from a new contact situation produced by the conquests. Instead of the emergence of a single koine or multiple koines, the dialects contain several sedimentary layers of borrowed and areal features, which they absorbed at different points in their linguistic histories. According to Versteegh and Bickerton, colloquial Arabic dialects arose from pidginized Arabic formed from contact between Arabs and conquered peoples. Pidginization and subsequent creolization among Arabs and arabized peoples could explain the relative morphological and phonological simplicity of vernacular Arabic compared to Classical Arabic and MSA.
The Nahda was a cultural and especially literary renaissance of the 19th century in which writers sought "to fuse Arabic and European forms of expression." According to James L. Gelvin, "Nahda writers attempted to simplify the Arabic language and script so that it might be accessible to a wider audience."
In the wake of the industrial revolution and European colonialism, pioneering Arabic presses, such as the Amiri Press established by Muhammad Ali (1819), dramatically changed the diffusion and consumption of Arabic literature and publications. Rifa'a al-Tahtawi proposed the establishment of Madrasat al-Alsun in 1836 and led a translation campaign that highlighted the need for a lexical injection in Arabic to suit concepts of the industrial and post-industrial age. In response, a number of Arabic academies modeled after the Académie française were established with the aim of developing standardized additions to the Arabic lexicon to suit these transformations, first in Damascus (1919), then in other cities (…), and in Tunis (1993). They review language development, monitor new words, and approve the inclusion of new words in their published standard dictionaries. They also publish old and historical Arabic manuscripts. In 1997, a bureau of Arabization standardization was added to the Educational, Cultural, and Scientific Organization of the Arab League. These academies and organizations have worked toward the Arabization of the sciences, creating terms in Arabic to describe new concepts, toward the standardization of these new terms throughout the Arabic-speaking world, and toward the development of Arabic as a world language. This gave rise to what Western scholars call Modern Standard Arabic. From the 1950s, Arabization became a postcolonial nationalist policy in countries such as Tunisia, Algeria, Morocco, and Sudan.
Arabic usually refers to Standard Arabic, which Western linguists divide into Classical Arabic and Modern Standard Arabic. It could also refer to any of a variety of regional vernacular Arabic dialects, which are not necessarily mutually intelligible.
Modern Standard Arabic (MSA) largely follows the grammatical standards of Classical Arabic and uses much of the same vocabulary. However, it has discarded some grammatical constructions and vocabulary that no longer have any counterpart in the spoken varieties and has adopted certain new constructions and vocabulary from the spoken varieties. Much of the new vocabulary is used to denote concepts that have arisen in the post-industrial era, especially in modern times. Due to its grounding in Classical Arabic, Modern Standard Arabic is more than a millennium removed from everyday speech, which is construed as a multitude of dialects of this language. These dialects and Modern Standard Arabic are described by some scholars as not mutually comprehensible. The former are usually acquired in families, while the latter is taught in formal education settings. However, there have been studies reporting some degree of comprehension of stories told in the standard variety among preschool-aged children. The relation between Modern Standard Arabic and these dialects is sometimes compared to that of Classical Latin and the Vulgar Latin vernaculars (which became the Romance languages) in medieval and early modern Europe.
MSA is the variety used in most current, printed Arabic publications, spoken by some of the Arabic media across North Africa and the Middle East, and understood by most educated Arabic speakers. "Literary Arabic" and "Standard Arabic" (فُصْحَى fuṣḥá) are less strictly defined terms that may refer to Modern Standard Arabic or Classical Arabic.
Some of the differences between Classical Arabic (CA) and Modern Standard Arabic (MSA) are as follows:
Certain grammatical constructions of CA that have no counterpart in any modern vernacular dialect (e.g., the energetic mood) are almost never used in Modern Standard Arabic.
Case distinctions are very rare in Arabic vernaculars. As a result, MSA is generally composed without case distinctions in mind, and the proper cases are added after the fact, when necessary. Because most case endings are noted using final short vowels, which are normally left unwritten in the Arabic script, it is unnecessary to determine the proper case of most words. The practical result of this is that MSA, like English and Standard Chinese, is written in a strongly determined word order, and alternative orders that were used in CA for emphasis are rare. In addition, because of the lack of case marking in the spoken varieties, most speakers cannot consistently use the correct endings in extemporaneous speech. As a result, spoken MSA tends to drop or regularize the endings except when reading from a prepared text.
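The point that case endings are usually invisible in writing can be checked mechanically. The following is a minimal Python sketch, assuming Unicode-encoded text, in which the short-vowel and related marks (harakat: fatḥa, ḍamma and kasra, their tanwīn forms, shadda and sukūn) occupy the code points U+064B through U+0652. The helper `strip_harakat` is hypothetical; it simply deletes those marks, mimicking ordinary unvocalized spelling, and shows the three definite case forms of al-kitāb 'the book' collapsing to one written word.

```python
# Arabic harakat (fathatan..sukun) occupy the contiguous range U+064B..U+0652.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_harakat(text: str) -> str:
    """Remove vowel marks, as in ordinary unvocalized Arabic script."""
    return "".join(ch for ch in text if ch not in HARAKAT)


# al-kitāb 'the book', with kasra on kāf and fatha on tāʼ (partially vocalized).
STEM = "\u0627\u0644\u0643\u0650\u062a\u064e\u0627\u0628"

# For a definite noun, each case ending is a single short-vowel mark.
CASE_ENDINGS = {
    "nominative": "\u064f",  # damma: al-kitābu
    "accusative": "\u064e",  # fatha: al-kitāba
    "genitive": "\u0650",    # kasra: al-kitābi
}

forms = {case: STEM + mark for case, mark in CASE_ENDINGS.items()}
unvocalized = {case: strip_harakat(word) for case, word in forms.items()}

if __name__ == "__main__":
    for case, word in forms.items():
        print(f"{case:11} {word} -> {unvocalized[case]}")
    # Three distinct vocalized forms, one unvocalized spelling.
    print(len(set(forms.values())), len(set(unvocalized.values())))
```

The indefinite accusative is a deliberate omission here: its tanwīn is normally accompanied by a written alif, so it does remain visible in unvocalized text.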
The numeral system in CA is complex and heavily tied in with the case system. This system is never used in MSA, even in the most formal of circumstances; instead, a significantly simplified system is used, approximating the system of the conservative spoken varieties.
MSA uses much Classical vocabulary (e.g., dhahaba 'to go') that is not present in the spoken varieties, but deletes Classical words that sound obsolete in MSA. In addition, MSA has borrowed or coined many terms for concepts that did not exist in Quranic times, and MSA continues to evolve. Some words have been borrowed from other languages (note that the transliteration mainly indicates spelling rather than actual pronunciation), e.g., فِلْم film 'film' or ديمقراطية dīmuqrāṭiyyah 'democracy'.
However, the current preference is to avoid direct borrowings, preferring either to use loan translations (e.g., فرع farʻ 'branch', also used for the branch of a company or organization; جناح janāḥ 'wing', also used for the wing of an airplane, building, air force, etc.), or to coin new words using forms within existing roots (استماتة istimātah 'apoptosis', using the root موت m/w/t 'death' put into the Xth form, or جامعة jāmiʻah 'university', based on جمع jamaʻa 'to gather, unite'; جمهورية jumhūriyyah 'republic', based on جمهور jumhūr 'multitude'). An earlier tendency was to redefine an older word, although this has fallen into disuse (e.g., هاتف hātif 'telephone' < 'invisible caller (in Sufism)'; جريدة jarīdah 'newspaper' < 'palm-leaf stalk').
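Coining words "using forms within existing roots" means slotting the consonants of a root into a fixed vowel pattern. A minimal Python sketch of that interdigitation follows, assuming the common teaching convention of writing patterns with C for each root consonant; the function `interdigitate` and the tuple encoding of roots are hypothetical. It handles only sound roots: a hollow root such as m/w/t in istimātah, whose middle radical does not surface as a consonant, needs extra rules that this sketch deliberately leaves out.

```python
def interdigitate(root: tuple[str, ...], pattern: str, slot: str = "C") -> str:
    """Slot the radicals of a consonantal root into a vowel pattern.

    Each occurrence of `slot` in `pattern` is replaced, left to right, by the
    next radical. Only sound roots are modelled; weak roots (e.g. m/w/t) and
    geminate roots undergo further changes that this sketch ignores.
    """
    if pattern.count(slot) != len(root):
        raise ValueError(
            f"pattern {pattern!r} has {pattern.count(slot)} slots "
            f"but the root has {len(root)} radicals"
        )
    radicals = iter(root)
    return "".join(next(radicals) if ch == slot else ch for ch in pattern)


# Roots as tuples of transliterated radicals (hypothetical encoding).
ROOT_J_M_A = ("j", "m", "ʻ")  # 'gather, unite'
ROOT_K_T_B = ("k", "t", "b")  # 'write'

if __name__ == "__main__":
    print(interdigitate(ROOT_J_M_A, "CaCaCa"))   # jamaʻa 'to gather, unite'
    print(interdigitate(ROOT_J_M_A, "CāCiCah"))  # jāmiʻah 'university'
    print(interdigitate(ROOT_K_T_B, "CiCāC"))    # kitāb 'book'
```

Keeping the root and the pattern as separate inputs mirrors how the same pattern (here CāCiCah) can be reused across many roots, which is what makes root-based coinage productive.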
Colloquial or dialectal Arabic refers to the many national or regional varieties which constitute the everyday spoken language. Colloquial Arabic has many regional variants; geographically distant varieties usually differ enough to be mutually unintelligible, and some linguists consider them distinct languages. However, research indicates a high degree of mutual intelligibility between closely related Arabic variants for native speakers listening to words, sentences, and texts; and between more distantly related dialects in interactional situations.
The varieties are typically unwritten. They are often used in informal spoken media, such as soap operas and talk shows, as well as occasionally in certain forms of written media such as poetry and printed advertising.
Hassaniya Arabic and Maltese are the only varieties of modern Arabic to have acquired official status. The Senegalese government adopted the Latin script to write Hassaniya. Maltese is spoken in Malta and written with the Latin script. Linguists agree that it is a variety of spoken Arabic, descended from Siculo-Arabic, though it has experienced extensive changes as a result of sustained and intensive contact with Italo-Romance varieties, and more recently also with English. Due to "a mix of social, cultural, historical, political, and indeed linguistic factors," many Maltese people today consider their language Semitic but not a type of Arabic.
Even during Muhammad's lifetime, there were dialects of spoken Arabic. Muhammad spoke in the dialect of Mecca, in the western Arabian peninsula, and it was in this dialect that the Quran was written. However, the dialects of the eastern Arabian peninsula were considered the most prestigious at the time, so the language of the Quran was ultimately converted to follow the eastern phonology. It is this phonology that underlies the modern pronunciation of Classical Arabic. The phonological differences between these two dialects account for some of the complexities of Arabic writing, most notably the writing of the glottal stop or hamzah (which was preserved in the eastern dialects but lost in western speech) and the use of alif maqṣūrah (representing a sound preserved in the western dialects but merged with ā in eastern speech).
Status and usage
The sociolinguistic situation of Arabic in modern times provides a prime example of the linguistic phenomenon of diglossia, which is the normal use of two separate varieties of the same language, usually in different social situations. In the case of Arabic, educated Arabs of any nationality can be assumed to speak both their school-taught Standard Arabic and their native dialects, which, depending on the region, may be mutually unintelligible. Some of these dialects can be considered to constitute separate languages, which may have "sub-dialects" of their own. When educated Arabs of different dialects engage in conversation (for example, a Moroccan speaking with a Lebanese), many speakers code-switch back and forth between the dialectal and standard varieties of the language, sometimes even within the same sentence. Arabic speakers often improve their familiarity with other dialects via music or film.
Tawleed is the process of giving a new shade of meaning to an old classical word. For example, al-hatif lexicographically means the one whose sound is heard but whose person remains unseen; now the term al-hatif is used for a telephone. The process of tawleed can therefore express the needs of modern civilization in a manner that would appear to be originally Arabic.
The issue of whether Arabic is one language or many languages is politically charged, in the same way it is for the varieties of Chinese, Scots and English, etc. In contrast to speakers of Hindi and Urdu, who claim they cannot understand each other even when they can, speakers of the varieties of Arabic will claim they can all understand each other even when they cannot. While there is a minimum level of comprehension between all Arabic dialects, this level can increase or decrease based on geographic proximity: for example, Levantine and Gulf speakers understand each other much better than they understand speakers from the Maghreb. Diglossia between spoken and written language is a significant complicating factor: a single written form, significantly different from any of the spoken varieties learned natively, unites a number of sometimes divergent spoken forms. For political reasons, Arabs mostly assert that they all speak a single language, despite significant issues of mutual unintelligibility among differing spoken versions.
From a linguistic standpoint, it is often said that the various spoken varieties of Arabic differ among each other collectively about as much as the Romance languages. This is an apt comparison in a number of ways. The period of divergence from a single spoken form is similar: perhaps 1500 years for Arabic, 2000 years for the Romance languages. Also, a linguistically innovative variety such as Moroccan Arabic, while comprehensible to people from the Maghreb, is essentially incomprehensible to Arabs from the Mashriq, much as French is incomprehensible to Spanish or Italian speakers but relatively easily learned by them. This suggests that the spoken varieties may linguistically be considered separate languages.
Status in the Arab world vis-à-vis other languages
With the sole exception of the medieval linguist Abu Hayyan al-Gharnati, who, while a scholar of the Arabic language, was not ethnically Arab, medieval scholars of the Arabic language made no efforts at studying comparative linguistics, considering all other languages inferior.
In modern times, the educated upper classes in the Arab world have taken a nearly opposite view. Yasir Suleiman wrote in 2011 that "studying and knowing English or French in most of the Middle East and North Africa have become a badge of sophistication and modernity and ... feigning, or asserting, weakness or lack of facility in Arabic is sometimes paraded as a sign of status, class, and perversely, even education through a mélange of code-switching practises."
As a foreign language
Arabic has been taught worldwide in many secondary schools, especially Muslim schools. Universities around the world have classes that teach Arabic as part of their Middle Eastern studies and religious studies courses.
Arabic language schools exist to assist students to learn Arabic outside the academic world. There are many Arabic language schools in the Arab world and other Muslim countries. Because the Quran is written in Arabic and all Islamic terms are in Arabic, millions of Muslims (both Arab and non-Arab) study the language. Software and books with tapes are also an important part of Arabic learning, as many Arabic learners may live in places where there are no academic or Arabic language school classes available. Radio series of Arabic language classes are also provided by some radio stations. A number of websites on the Internet provide online classes for all levels as a means of distance education; most teach Modern Standard Arabic, but some teach regional varieties from numerous countries.
The most important sources of borrowings into (pre-Islamic) Arabic are the related (Semitic) languages Aramaic, which used to be the principal international language of communication throughout the ancient Near and Middle East, and Ethiopic. In addition, many cultural, religious and political terms have entered Arabic from Iranian languages, notably Parthian and (Classical) Persian, and from Hellenistic Greek: kīmiyāʼ comes from the Greek khymia, meaning in that language the melting of metals (see Roger Dachez, Histoire de la Médecine de l'Antiquité au XXe siècle, Tallandier, 2008, p. 251); alembic (distiller) from ambix (cup); almanac (climate) from almenichiakon (calendar). (For the origin of the last three borrowed words, see Alfred-Louis de Prémare, Foundations of Islam, Seuil, L'Univers Historique, 2002.) Some Arabic borrowings from Semitic or Persian languages are, as presented in De Prémare's above-cited book:
medina (مدينة, city or city square), a word of Aramaic origin ܡܕ݂ܝܼܢ݇ܬܵܐ/"məḏī(n)ttā" (in which it means "state/city").
jazīrah (جزيرة), as in the well-known form الجزيرة "Al-Jazeera," means "island" and has its origin in the Syriac ܓܵܙܲܪܬܵܐ gāzartā.
lāzaward (لازورد) is taken from Persian لاژورد lājvard, the name of a blue stone, lapis lazuli. This word was borrowed in several European languages to mean (light) blue – azure in English, azur in French and azul in Portuguese and Spanish.
A comprehensive overview of the influence of other languages on Arabic is found in Lucas & Manfredi (2020).
In addition, English has many Arabic loanwords, some directly, but most via other Mediterranean languages. Examples of such words include admiral, adobe, alchemy, alcohol, algebra, algorithm, alkaline, almanac, amber, arsenal, assassin, candy, carat, cipher, coffee, cotton, ghoul, hazard, jar, kismet, lemon, loofah, magazine, mattress, sherbet, sofa, sumac, tariff, and zenith. Other languages such as Maltese and Kinubi derive ultimately from Arabic, rather than merely borrowing vocabulary or grammatical rules.
Terms borrowed range from religious terminology (like Berber taẓallit, "prayer", from salat (صلاة ṣalāh)), academic terms (like Uyghur mentiq, "logic"), and economic items (like English coffee) to placeholders (like Spanish fulano, "so-and-so"), everyday terms (like Hindustani lekin, "but", or Spanish taza and French tasse, meaning "cup"), and expressions (like Catalan a betzef, "galore, in quantity"). Most Berber varieties (such as Kabyle), along with Swahili, borrow some numbers from Arabic. Most Islamic religious terms are direct borrowings from Arabic, such as صلاة (salat), "prayer", and إمام (imam), "prayer leader."
In languages not directly in contact with the Arab world, Arabic loanwords are often transferred indirectly via other languages rather than directly from Arabic. For example, most Arabic loanwords in Hindustani and Turkish entered through Persian. Older Arabic loanwords in Hausa were borrowed from Kanuri. Most Arabic loanwords in Yoruba entered through Hausa.
Arabic words also made their way into several West African languages as Islam spread across the Sahara. Variants of Arabic words such as كتاب kitāb ("book") have spread to the languages of African groups who had no direct contact with Arab traders.
Since, throughout the Islamic world, Arabic occupied a position similar to that of Latin in Europe, many of the Arabic concepts in the fields of science, philosophy, commerce, etc. were coined from Arabic roots by non-native Arabic speakers, notably by Aramaic and Persian translators, and then found their way into other languages. This process of using Arabic roots, especially in Kurdish and Persian, to translate foreign concepts continued through to the 18th and 19th centuries, when swaths of Arab-inhabited lands were under Ottoman rule.
Colloquial Arabic is a collective term for the spoken dialects of Arabic used throughout the Arab world, which differ radically from the literary language. The main dialectal division is between the varieties within and outside of the Arabian peninsula, followed by that between sedentary varieties and the much more conservative Bedouin varieties. All the varieties outside of the Arabian peninsula (which include the large majority of speakers) have many features in common with each other that are not found in Classical Arabic. This has led researchers to postulate the existence of a prestige koine dialect in the one or two centuries immediately following the Arab conquest, whose features eventually spread to all newly conquered areas. These features are present to varying degrees inside the Arabian peninsula. Generally, the Arabian peninsula varieties have much more diversity than the non-peninsula varieties, but these have been understudied.
Within the non-peninsula varieties, the largest difference is between the non-Egyptian North African dialects (especially Moroccan Arabic) and the others. Moroccan Arabic in particular is hardly comprehensible to Arabic speakers east of Libya (although the converse is not true, in part due to the popularity of Egyptian films and other media).
One factor in the differentiation of the dialects is influence from the languages previously spoken in the areas, which have typically provided a significant number of new words and have sometimes also influenced pronunciation or word order; however, a much more significant factor for most dialects is, as among the Romance languages, retention (or change of meaning) of different classical forms. Thus Iraqi aku, Levantine fīh and North African kayən all mean 'there is', and all come from Classical Arabic forms (yakūn, fīhi, kā'in respectively), but now sound very different.
According to Charles A. Ferguson, the following are some of the characteristic features of the koiné that underlies all the modern dialects outside the Arabian peninsula. Although many other features are common to most or all of these varieties, Ferguson believes that these features in particular are unlikely to have evolved independently more than once or twice and together suggest the existence of the koine:
Loss of the dual number except on nouns, with consistent plural agreement (cf. feminine singular agreement in plural inanimates).
Change of a to i in many affixes (e.g., non-past-tense prefixes ti- yi- ni-; wi- 'and'; il- 'the'; feminine -it in the construct state).
Loss of third-weak verbs ending in w (which merge with verbs ending in y).
Reformation of geminate verbs, e.g., ḥalaltu 'I untied' → ḥalēt(u).
Conversion of separate words lī 'to me', laka 'to you', etc. into indirect-object suffixes.
Certain changes in the cardinal number system, e.g., khamsat ayyām 'five days' → kham(a)s tiyyām, where certain words have a special plural with prefixed t.
Egyptian Arabic is spoken by 67 million people in Egypt. It is one of the most understood varieties of Arabic, due in large part to the widespread distribution of Egyptian films and television shows throughout the Arabic-speaking world.
Maltese, spoken on the island of Malta, is the only fully separate standardized language to have originated from an Arabic dialect (the extinct Siculo-Arabic dialect), with independent literary norms. Maltese has evolved independently of Modern Standard Arabic and its varieties into a standardized language over the past 800 years in a gradual process of Latinisation. Maltese is therefore considered an exceptional descendant of Arabic that has no diglossic relationship with Standard Arabic or Classical Arabic. Maltese also differs from Arabic and other Semitic languages in that its morphology has been deeply influenced by Sicilian. It is also the only Semitic language written in the Latin script. In terms of basic everyday language, speakers of Maltese are reported to be able to understand less than a third of what is said to them in Tunisian Arabic, which is related to Siculo-Arabic, whereas speakers of Tunisian are able to understand about 40% of what is said to them in Maltese. This asymmetric intelligibility is considerably lower than the mutual intelligibility found between Maghrebi Arabic dialects. Maltese has its own dialects, with urban varieties of Maltese being closer to Standard Maltese than rural varieties.
Sudanese Arabic is spoken by 17 million people in Sudan and some parts of southern Egypt. It is quite distinct from the dialect of its northern neighbor; instead, it is similar to the Hejazi dialect.
Judeo-Arabic dialects are those spoken by the Jews who lived, or continue to live, in the Arab world. As Jewish migration to Israel took hold, these dialects did not thrive, and they are now considered endangered. So-called
Of the 29 Proto-Semitic consonants, only one has been lost: */ʃ/, which merged with /s/, while /ɬ/ became /ʃ/ (see Semitic languages). Various other consonants have changed their sound too, but have remained distinct. An original */p/ lenited to /f/, and */ɡ/, consistently attested in pre-Islamic Greek transcriptions of Arabic languages, became palatalized to /ɡʲ/ or /ɟ/ by the time of the Quran and to /ʒ/ or /ɟ/ after the early Muslim conquests and in MSA (see Arabic phonology#Local variations for more detail). An original voiceless alveolar lateral fricative */ɬ/ became /ʃ/. Its emphatic counterpart /ɬˠ~ɮˤ/ was considered by Arabs to be the most unusual sound in Arabic (hence Classical Arabic's appellation لُغَةُ ٱلضَّادِ lughat al-ḍād, 'language of the ḍād'); in most modern dialects it has become an emphatic stop /dˤ/, losing its laterality, or a plain /d/, losing any pharyngealization or velarization as well. (The classical pharyngealized ḍād pronunciation /ɮˤ/ still occurs in the Mehri language, and the similar sound without velarization, /ɮ/, exists in other Modern South Arabian languages.)
Other changes may also have happened. Classical Arabic pronunciation is not thoroughly recorded and different
reconstructions of the sound system of Proto-Semitic propose different phonetic values. One example is the emphatic consonants, which are pharyngealized in modern pronunciations but may have been velarized in the eighth century and glottalized in Proto-Semitic.
Reduction of /j/ and /w/ between vowels occurs in a number of circumstances and is responsible for much of the complexity of third-weak ("defective") verbs. Early Akkadian transcriptions of Arabic names show that this reduction had not yet occurred as of the early part of the 1st millennium BC.
The Classical Arabic language as recorded was a poetic koine that reflected a consciously archaizing dialect, chosen based on the tribes of the western part of the Arabian Peninsula, who spoke the most conservative variants of Arabic. Even at the time of Muhammad and before, other dialects existed with many more changes, including the loss of most glottal stops, the loss of case endings, the reduction of the diphthongs /aj/ and /aw/ into monophthongs /eː, oː/, etc. Most of these changes are present in most or all modern varieties of Arabic.
An interesting feature of the writing system of the Quran (and hence of Classical Arabic) is that it contains certain features of Muhammad's native dialect of Mecca, corrected through diacritics into the forms of standard Classical Arabic. Among these features visible under the corrections are the loss of the glottal stop and a differing development of the reduction of certain final sequences containing /j/: Evidently, final /-awa/ became /aː/ as in the Classical language, but final /-aja/ became a different sound, possibly /eː/ (rather than again /aː/ in the Classical language). This is the apparent source of the alif maqṣūrah 'restricted alif' where a final /-aja/ is reconstructed: a letter that would normally indicate /j/ or some similar high-vowel sound, but is taken in this context to be a logical variant of alif and represent the sound /aː/.
The "colloquial" spoken dialects of Arabic are learned at home and constitute the native languages of Arabic speakers. "Formal" Modern Standard Arabic is learned at school; although many speakers have a native-like command of the language, it is technically not the native language of any speakers. Both varieties can be written and spoken, although the colloquial varieties are rarely written down and the formal variety is spoken mostly in formal circumstances, e.g., in radio and TV broadcasts, formal lectures, parliamentary discussions and, to some extent, between speakers of different colloquial dialects. Even when the literary language is spoken, however, it is normally spoken in its pure form only when a prepared text is read aloud or when speakers of different colloquial dialects communicate. When speaking extemporaneously (i.e. making up the language on the spot, as in a normal discussion among people), speakers tend to deviate somewhat from the strict literary language in the direction of the colloquial varieties. In fact, there is a continuous range of "in-between" spoken varieties: from nearly pure Modern Standard Arabic (MSA), to a form that still uses MSA grammar and vocabulary but with significant colloquial influence, to a form of the colloquial language that imports a number of words and grammatical constructions from MSA, to a form that is close to pure colloquial but with the "rough edges" (the most noticeably "vulgar" or non-Classical aspects) smoothed out, to pure colloquial. The particular variant (or register) used depends on the social class and education level of the speakers involved and the level of formality of the speech situation. Often it will vary within a single encounter, e.g., moving from nearly pure MSA to a more mixed language in the process of a radio interview, as the interviewee becomes more comfortable with the interviewer. This type of variation is characteristic of the diglossia that exists throughout the Arabic-speaking world.
Although Modern Standard Arabic (MSA) is a unitary language, its pronunciation varies somewhat from country to country and from region to region within a country. The variation in individual "accents" of MSA speakers tends to mirror corresponding variations in the colloquial speech of the speakers in question, but with the distinguishing characteristics moderated somewhat. It is important in descriptions of "Arabic" phonology to distinguish between the pronunciation of a given colloquial (spoken) dialect and the pronunciation of MSA by these same speakers. Although they are related, they are not the same. For example, the phoneme that derives from Classical Arabic /ɟ/ has many different pronunciations in the modern spoken varieties, e.g., [d͡ʒ ~ ʒ ~ j ~ ɡʲ ~ ɡ], including the proposed original [ɟ]. Speakers whose native variety has either [d͡ʒ] or [ʒ] will use the same pronunciation when speaking MSA. Speakers from Cairo, whose native Egyptian Arabic has [ɡ], likewise normally use [ɡ] when speaking MSA. The [j] of Persian Gulf speakers is the only variant pronunciation not normally found in MSA; [d͡ʒ~ʒ] is used instead, although some speakers may use [j] in MSA for ease of pronunciation. Another reason for differing pronunciations is the influence of colloquial dialects. The differences in pronunciation among the colloquial dialects in turn reflect the influence of other languages previously spoken, and some still spoken, in the regions, such as Coptic in Egypt, Phoenician in North Africa, Modern South Arabian and Old South Arabian in Yemen and Oman, and Canaanite languages (including Phoenician) in the Levant.
Another example: many colloquial varieties are known for a type of vowel harmony in which the presence of an "emphatic consonant" triggers backed allophones of nearby vowels (especially of the low vowels /a aː/, which are backed to [ɑ(ː)] in these circumstances and very often fronted to [æ(ː)] in all other circumstances). In many spoken varieties, the backed or "emphatic" vowel allophones spread a fair distance in both directions from the triggering consonant; in some varieties (most notably Egyptian Arabic), the "emphatic" allophones spread throughout the entire word, usually including prefixes and suffixes, even at a distance of several syllables from the triggering consonant. Speakers of colloquial varieties with this vowel harmony tend to introduce it into their MSA pronunciation as well, but usually with a lesser degree of spreading than in the colloquial varieties. (For example, speakers of colloquial varieties with extremely long-distance harmony may allow a moderate, but not extreme, amount of spreading of the harmonic allophones in their MSA speech, while speakers of colloquial varieties with moderate-distance harmony may only harmonize immediately adjacent vowels in MSA.)
Modern Standard Arabic has six pure vowels, short /a i u/ and corresponding long /aː iː uː/ (most modern dialects have eight pure vowels, adding the long vowels /eː oː/). There are also two diphthongs: /aj/ and /aw/.
The pronunciation of the vowels differs from speaker to speaker, in a way that tends to reflect the pronunciation of the corresponding colloquial variety. Nonetheless, there are some common trends. Most noticeable is the differing pronunciation of /a/ and /aː/, which tend towards fronted [a(ː)] or [ɛ(ː)] in most situations, but a back [ɑ(ː)] in the neighborhood of emphatic consonants. Some accents and dialects, such as those of the Hejaz region, have an open [a(ː)] or a central [ä(ː)] in all situations. The vowel /a/ also varies towards [ə(ː)]. Because Arabic has only three short vowel phonemes, those phonemes can have a very wide range of allophones. The vowels /u/ and /i/ are often affected somewhat in emphatic neighborhoods as well, with generally more back or centralized allophones, but the differences are smaller than for the low vowels. The pronunciation of short /u/ and /i/ tends towards [ʊ~o] and [i~e~ɨ], respectively, in many dialects.
The definitions of both "emphatic" and "neighborhood" vary in ways that reflect (to some extent) corresponding variations in the spoken dialects. Generally, the consonants triggering "emphatic" allophones are the pharyngealized consonants /tˤ dˤ sˤ ðˤ/; /q/; and /r/, if not followed immediately by /i(ː)/. Frequently, the velar fricatives /x ɣ/ also trigger emphatic allophones; occasionally also the pharyngeal consonants /ʕ ħ/ (the former more than the latter). Many dialects have multiple emphatic allophones of each vowel, depending on the particular nearby consonants. In most MSA accents, emphatic coloring of vowels is limited to vowels immediately adjacent to a triggering consonant, although in some it spreads a bit farther: e.g., وقت waqt [wɑqt] 'time'; وطن waṭan [wɑtˤɑn] 'homeland'; وسط المدينة wasṭ al-madīnah [wæstˤ ɑl mæˈdiːnæ] 'downtown' (also [wɑstˤ æl mæˈdiːnæ] or similar).
In a non-emphatic environment, the vowel /a/ in the diphthong /aj/ is pronounced [æj] or [ɛj]: hence سيف sayf [sajf ~ sæjf ~ sɛjf] 'sword' but صيف ṣayf [sˤɑjf] 'summer'. However, in accents with no emphatic allophones of /a/ (e.g., in the Hejaz), the pronunciation [aj] or [äj] occurs in all situations.
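The MSA triggering conditions described in the two paragraphs above can be made concrete with a small Python sketch. It is a toy model under explicit assumptions, not a phonological analysis: words are pre-segmented lists of IPA symbols, only /a aː/ are recoloured, [æ] stands in for the single fronted variant (the text notes that accents differ), and the velar fricatives are an opt-in trigger because they trigger emphasis only frequently, not always. The names `is_trigger` and `msa_color` are hypothetical.

```python
# Toy model of MSA emphatic colouring: /a/ and /aː/ are backed to [ɑ(ː)]
# when an immediately adjacent segment is an emphatic trigger, and
# fronted to [æ(ː)] otherwise. Words are lists of IPA segments.
BASE_TRIGGERS = {"tˤ", "dˤ", "sˤ", "ðˤ", "q"}
VELAR_FRICATIVES = {"x", "ɣ"}


def is_trigger(segments: list[str], i: int, include_velars: bool = False) -> bool:
    seg = segments[i]
    if seg in BASE_TRIGGERS:
        return True
    if include_velars and seg in VELAR_FRICATIVES:
        return True
    if seg == "r":
        # /r/ triggers emphasis unless immediately followed by /i/ or /iː/.
        following = segments[i + 1] if i + 1 < len(segments) else None
        return following not in ("i", "iː")
    return False


def msa_color(segments: list[str], include_velars: bool = False) -> str:
    out = []
    for i, seg in enumerate(segments):
        if seg in ("a", "aː"):
            # Only the immediate neighbours count (no long-distance spreading).
            neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(segments)]
            emphatic = any(is_trigger(segments, j, include_velars) for j in neighbours)
            quality = "ɑ" if emphatic else "æ"
            out.append(quality + ("ː" if seg == "aː" else ""))
        else:
            out.append(seg)
    return "".join(out)


print(msa_color(["w", "a", "q", "t"]))        # wɑqt 'time'
print(msa_color(["w", "a", "tˤ", "a", "n"]))  # wɑtˤɑn 'homeland'
print(msa_color(["s", "a", "j", "f"]))        # sæjf 'sword'
print(msa_color(["sˤ", "a", "j", "f"]))       # sˤɑjf 'summer'
```

Widening the adjacency check into a larger window would approximate the longer-distance spreading that the text attributes to many colloquial varieties.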
The phoneme /d͡ʒ/ is represented by the Arabic letter jīm (ج) and has many standard pronunciations. [d͡ʒ] is characteristic of northern Algeria, Iraq, and most of the Arabian Peninsula, but with an allophonic [ʒ] in some positions; [ʒ] occurs in most of the Levant and most of North Africa; and [ɡ] is standard in Egypt, coastal Yemen, and western Oman. Generally this corresponds with the pronunciation in the colloquial dialects. In parts of Sudan and Yemen, and in some Sudanese and Yemeni varieties, it may be either [ɡʲ] or [ɟ], representing the original pronunciation of Classical Arabic. Foreign words containing /ɡ/ may be transcribed with ج, غ, ك, ق, گ, ݣ or ڨ, depending on the regional practice. In northern Egypt, where the Arabic letter jīm (ج) is normally pronounced [ɡ], a separate phoneme /ʒ/, which may be transcribed with چ, occurs in a small number of mostly non-Arabic loanwords, e.g., /ʒakitta/ 'jacket'.
/θ/ (ث) can be pronounced as [s]. In some places of the Maghreb it can also be pronounced as [
/x/ and /ɣ/ (خ, غ) are velar, post-velar, or uvular.
/l/ is pronounced as velarized [ɫ] in الله /ʔallaːh/, the name of God, i.e. Allah, when the word follows a, ā, u or ū (after i or ī it is unvelarized: بسم الله bismi l-lāh /bismillaːh/).
The emphatic consonant /dˤ/ was pronounced in Classical Arabic as [ɮˤ], or possibly [d͡ɮˤ]; either way, a highly unusual sound. The medieval Arabs termed their language lughat al-ḍād 'the language of the Ḍād' (the name of the letter used for this sound), since they thought the sound was unique to their language. (In fact, it also exists in a few other minority Semitic languages, e.g., Mehri.)
Arabic has consonants traditionally termed "emphatic" /tˤ, dˤ, sˤ, ðˤ/ (ط, ض, ص, ظ), which exhibit simultaneous pharyngealization [tˤ, dˤ, sˤ, ðˤ] as well as varying degrees of velarization [tˠ, dˠ, sˠ, ðˠ] (depending on the region), so they may be written with the "Velarized or pharyngealized" diacritic ( ̴) as: /t̴, d̴, s̴, ð̴/. This simultaneous articulation is described as "Retracted Tongue Root" by phonologists. In some transcription systems, emphasis is shown by capitalizing the letter, for example, /dˤ/ is written ⟨D⟩; in others the letter is underlined or has a dot below it, for example, ⟨ḍ⟩.
Vowels and consonants can be phonologically short or long. Long (geminate) consonants are normally written doubled in Latin transcription (i.e. bb, dd, etc.), reflecting the presence of the Arabic diacritic mark shaddah, which indicates doubled consonants. In actual pronunciation, doubled consonants are held twice as long as short consonants. This consonant lengthening is phonemically contrastive: قبل qabila 'he accepted' vs. قبّل qabbala 'he kissed'.
Arabic has two kinds of syllables: open syllables (CV and CVV) and closed syllables (CVC, CVVC and CVCC). The syllable types with two morae (units of time), i.e. CVC and CVV, are termed heavy syllables, while those with three morae, i.e. CVVC and CVCC, are superheavy syllables. Superheavy syllables in Classical Arabic occur in only two places: at the end of the sentence (due to pausal pronunciation) and in words such as حارّ ḥārr 'hot', مادّة māddah 'stuff, substance', تحاجوا taḥājjū 'they disputed with each other', where a long ā occurs before two identical consonants (a former short vowel between the consonants has been lost). (In less formal pronunciations of Modern Standard Arabic, superheavy syllables are common at the end of words or before clitic suffixes such as -nā 'us, our', due to the deletion of final short vowels.)
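The syllable inventory above can be checked mechanically. The following Python sketch, under stated assumptions, converts a transliteration into a C/V skeleton (long vowels as VV, geminates written doubled, one letter per sound, so digraphs such as sh or dh are not supported), splits it into CV(V)(C)(C) syllables, and labels each syllable by counting morae from the nucleus onwards. The function names are hypothetical; counting CV as one mora is the usual convention rather than something the paragraph states.

```python
import re

SHORT_VOWELS = set("aiu")
LONG_VOWELS = set("āīū")

# One syllable: a single onset C, a short (V) or long (VV) nucleus, then any
# consonants that are not the onset of the next syllable.
SYLLABLE = re.compile(r"CVV?(?:C(?!V))*")


def to_skeleton(word: str) -> str:
    """Map a one-letter-per-sound transliteration to a C/V skeleton."""
    out = []
    for ch in word:
        if ch in SHORT_VOWELS:
            out.append("V")
        elif ch in LONG_VOWELS:
            out.append("VV")
        elif ch.isalpha():
            out.append("C")
        # hyphens and other punctuation are ignored
    return "".join(out)


def syllables(word: str) -> list[str]:
    skeleton = to_skeleton(word)
    parts = SYLLABLE.findall(skeleton)
    if "".join(parts) != skeleton:
        # e.g. a word-initial vowel: every vowel needs a preceding consonant
        raise ValueError(f"{word!r} does not fit CV(V)(C)(C) syllable structure")
    return parts


def weight(syllable: str) -> str:
    morae = len(syllable) - syllable.index("V")  # nucleus plus coda
    return {1: "light", 2: "heavy"}.get(morae, "superheavy")


for w in ("maktabatun", "qabbala", "ḥārrun"):
    print(w, [weight(s) for s in syllables(w)])
```

A word-initial vowel, as in ijtimāʻ, raises an error here, which matches the constraint in the next paragraph that every surface vowel is preceded by a consonant.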
In surface pronunciation, every vowel must be preceded by a consonant (which may include the glottal stop [ʔ]). There are no cases of hiatus within a word (where two vowels occur next to each other, without an intervening consonant). Some words do have an underlying vowel at the beginning, such as the definite article al- or words such as اشترى ishtarā 'he bought', اجتماع ijtimāʻ 'meeting'. When actually pronounced, one of three things happens:
If the word occurs after another word ending in a consonant, there is a smooth transition from final consonant to initial vowel, e.g., الاجتماع al-ijtimāʻ 'the meeting' /alid͡ʒtimaːʕ/.
If the word occurs after another word ending in a vowel, the initial vowel of the word is elided, e.g., بيت المدير baytu (a)l-mudīr 'house of the director' /bajtulmudiːr/.
If the word occurs at the beginning of an utterance, a glottal stop [ʔ] is added onto the beginning, e.g., البيت هو al-baytu huwa ... 'The house is ...' /ʔalbajtuhuwa ... /.
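The three outcomes listed above amount to a small decision procedure. This Python sketch assumes transliterated input, treats only a, i, u and their long counterparts as vowels, and writes the result without a word space, as in the phonemic forms above; `attach` is a hypothetical name.

```python
from typing import Optional

VOWELS = set("aiuāīū")


def attach(word: str, previous: Optional[str] = None) -> str:
    """Realise `word`, whose underlying form may begin with a vowel.

    `previous` is the preceding word in the same utterance, or None if
    `word` is utterance-initial. The result has no word space.
    """
    if word[0] not in VOWELS:
        return word if previous is None else previous + word
    if previous is None:
        return "ʔ" + word              # utterance-initial: add a glottal stop
    if previous[-1] in VOWELS:
        return previous + word[1:]     # after a vowel: elide the initial vowel
    return previous + word             # after a consonant: smooth transition


print(attach("al-baytu"))                     # ʔal-baytu
print(attach("al-mudīr", previous="baytu"))   # baytul-mudīr
print(attach("ijtimāʻ", previous="al"))       # alijtimāʻ
```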
Word stress is not phonemically contrastive in Standard Arabic. It bears a strong relationship to vowel length. The basic rules for Modern Standard Arabic are:
A final vowel, long or short, may not be stressed.
Only one of the last three syllables may be stressed.
Given this restriction, the last heavy syllable (containing a long vowel or ending in a consonant) is stressed, if it is not the final syllable.
If the final syllable is superheavy and closed (of the form CVVC or CVCC), it receives stress.
If no syllable is heavy or superheavy, the first possible syllable (i.e. third from end) is stressed.
As a special exception, in Form VII and VIII verb forms stress may not be on the first syllable, despite the above rules: hence inkatab(a) 'he subscribed' (whether or not the final short vowel is pronounced), yankatib(u) 'he subscribes' (whether or not the final short vowel is pronounced), yankatib 'he should subscribe (juss.)'. Likewise Form VIII ishtarā 'he bought', yashtarī 'he buys'.
These rules may result in differently stressed syllables when final case endings are pronounced, vs. the normal situation where they are not pronounced, as in the above example of mak-ta-ba-tun 'library' in full pronunciation, but mak-ta-ba(-tun) 'library' in short pronunciation.
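The stress rules above can be sketched as a function over a pre-syllabified, hyphen-separated transliteration. Assumptions: one letter per sound, every syllable has a single onset consonant, and the Form VII/VIII exception is not implemented; `morae` and `stress_index` are hypothetical names. Applied to the example in the preceding paragraph, the rules put stress on ta in full mak-ta-ba-tun (neither non-final syllable among the last three is heavy, so the third from the end is stressed) but on mak in short mak-ta-ba (the last heavy non-final syllable).

```python
SHORT_VOWELS = set("aiu")
LONG_VOWELS = set("āīū")


def morae(syllable: str) -> int:
    """Morae from the nucleus onwards: CV=1, CVV/CVC=2, CVVC/CVCC=3."""
    skeleton = "".join(
        "VV" if ch in LONG_VOWELS else "V" if ch in SHORT_VOWELS else "C"
        for ch in syllable
    )
    return len(skeleton) - skeleton.index("V")


def stress_index(word: str) -> int:
    """Index of the stressed syllable of a hyphen-syllabified word."""
    syls = word.split("-")
    n = len(syls)
    if n == 1:
        return 0
    if morae(syls[-1]) >= 3:          # superheavy (closed) final syllable
        return n - 1
    for i in (n - 2, n - 3):          # last heavy non-final syllable of the last three
        if i >= 0 and morae(syls[i]) >= 2:
            return i
    return max(n - 3, 0)              # none heavy: third from end (or first)


for w in ("mak-ta-ba-tun", "mak-ta-ba", "ki-tāb", "ma-dī-na", "ka-ta-ba"):
    syls = w.split("-")
    i = stress_index(w)
    print("-".join("ˈ" + s if k == i else s for k, s in enumerate(syls)))
# mak-ˈta-ba-tun, ˈmak-ta-ba, ki-ˈtāb, ma-ˈdī-na, ˈka-ta-ba
```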
The restriction on final long vowels does not apply to the spoken dialects, where original final long vowels have been shortened and secondary final long vowels have arisen from loss of original final -hu/hi.
Some dialects have different stress rules. In the Cairo (Egyptian Arabic) dialect a heavy syllable may not carry stress more than two syllables from the end of a word, hence mad-ra-sah 'school', qā-hi-rah 'Cairo'. This also affects the way that Modern Standard Arabic is pronounced in Egypt. In the Arabic of Sanaa, stress is often retracted: bay-tayn 'two houses', mā-sat-hum 'their table', ma-kā-tīb 'desks', zā-rat-ḥīn 'sometimes', mad-ra-sat-hum 'their school'. (In this dialect, only syllables with long vowels or diphthongs are considered heavy; in a two-syllable word, the final syllable can be stressed only if the preceding syllable is light; and in longer words, the final syllable cannot be stressed.)
Levels of pronunciation
The final short vowels (e.g., the case endings -a -i -u and mood endings -u -a) are often not pronounced in Modern Standard Arabic, despite forming part of the formal paradigm of nouns and verbs. The following levels of pronunciation exist:
Full pronunciation with pausa
This is the most formal level actually used in speech. All endings are pronounced as written, except at the end of an utterance, where the following changes occur:
Final short vowels are not pronounced. (But possibly an exception is made for feminine plural -na and shortened vowels in the jussive/imperative of defective verbs, e.g., irmi! 'throw!'.)
The entire indefinite noun endings -in and -un (with nunation) are left off. The ending -an is left off of nouns preceded by a tāʾ marbūṭah ة (i.e. the -t in the ending -at- that typically marks feminine nouns), but pronounced as -ā in other nouns (hence its writing in this fashion in the Arabic script).
The tāʼ marbūṭah itself (typically of feminine nouns) is pronounced as h. (At least, this is the case in extremely formal pronunciation, e.g., some Quranic recitations. In practice, this h is usually omitted.)
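The pausal rules above can be expressed as a small function. This is a sketch under stated assumptions: the caller supplies the stem and case ending separately and says whether the stem ends in tāʾ marbūṭah, since a transliteration alone cannot tell ة from an ordinary t, and the possible exceptions for feminine plural -na and defective verbs are not handled. `pausal` is a hypothetical name, and kātib 'writer' and maktabat- 'library' are taken from the word list later in this section.

```python
CASE_ENDINGS = {"u", "a", "i", "un", "an", "in", ""}


def pausal(stem: str, ending: str, ta_marbuta: bool = False) -> str:
    """Pausal form of a singular noun in full pronunciation with pausa.

    stem       -- the noun without its case ending, e.g. "kātib" or "maktabat"
    ending     -- its case ending: "u", "a", "i", "un", "an", "in" or ""
    ta_marbuta -- True if the stem ends in the -at of tāʾ marbūṭah (ة)
    """
    if ending not in CASE_ENDINGS:
        raise ValueError(f"unexpected ending {ending!r}")
    if ta_marbuta and not stem.endswith("at"):
        raise ValueError("a tāʾ marbūṭah stem should end in -at")
    if ending == "an" and not ta_marbuta:
        form = stem + "ā"        # -an is pronounced -ā (hence the written alif)
    else:
        form = stem              # short vowels, -un, -in, and -an after ة drop
    if ta_marbuta:
        form = form[:-1] + "h"   # the tāʾ marbūṭah itself is pronounced h
    return form


print(pausal("kātib", "un"))                       # kātib
print(pausal("kātib", "an"))                       # kātibā
print(pausal("maktabat", "un", ta_marbuta=True))   # maktabah
```

Because the text notes that this final h is usually omitted in practice, a less formal variant would simply drop the last consonant of a tāʾ marbūṭah stem (maktaba rather than maktabah).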
Formal short pronunciation
This is a formal level of pronunciation sometimes seen. It is somewhat like pronouncing all words as if they were in pausal position (with influence from the colloquial varieties). The following changes occur:
Most final short vowels are not pronounced. However, the following short vowels are pronounced:
feminine plural -na
shortened vowels in the jussive/imperative of defective verbs, e.g., irmi! 'throw!'
second-person singular feminine past-tense -ti and likewise anti 'you (fem. sg.)'
final -a in certain short words, e.g., laysa 'is not', sawfa (future-tense marker)
nunation endings -an -in -un are not pronounced. However, they are pronounced in adverbial accusative formations, e.g., taqrīban تَقْرِيبًا 'almost, approximately', ʻādatan عَادَةً 'usually'.
The tāʾ marbūṭah ending ة is unpronounced, except in construct state nouns, where it sounds as t (and in adverbial accusative constructions, e.g., ʻādatan عَادَةً 'usually', where the entire -tan is pronounced).
The masculine singular nisbah ending -iyy is actually pronounced -ī and is unstressed (but plural and feminine singular forms, i.e. when followed by a suffix, still sound as -iyy-).
Full endings (including case endings) occur when a clitic object or possessive suffix is added (e.g., -nā 'us/our').
Informal short pronunciation
This is the pronunciation used by speakers of Modern Standard Arabic in extemporaneous speech, i.e. when producing new sentences rather than reading a prepared text. It is similar to formal short pronunciation except that the rules for dropping final vowels apply even when a clitic suffix is added. Basically, short-vowel case and mood endings are never pronounced and certain other changes occur that echo the corresponding colloquial pronunciations. Specifically:
All the rules for formal short pronunciation apply, except as follows.
The past tense singular endings written formally as -tu -ta -ti are pronounced -t -t -ti. But masculine ʾanta is pronounced in full.
Unlike in formal short pronunciation, the rules for dropping or modifying final endings are also applied when a clitic object or possessive suffix is added (e.g., -nā 'us/our'). If this produces a sequence of three consonants, then one of the following happens, depending on the speaker's native colloquial variety:
A short vowel (e.g., -i- or -ǝ-) is consistently added, either between the second and third or the first and second consonants.
Or, a short vowel is added only if an otherwise unpronounceable sequence occurs, typically due to a violation of the sonority hierarchy (e.g., -rtn- is pronounced as a three-consonant cluster, but -trn- needs to be broken up).
Or, a short vowel is never added, but consonants like r l m n occurring between two other consonants will be pronounced as a syllabic consonant (as in the English words "butter", "bottle", "bottom", "button").
When a doubled consonant occurs before another consonant (or finally), it is often shortened to a single consonant rather than a vowel being added. (However, Moroccan Arabic never shortens doubled consonants or inserts short vowels to break up clusters, instead tolerating arbitrarily long sequences of consonants; Moroccan Arabic speakers are therefore likely to follow the same rules in their pronunciation of Modern Standard Arabic.)
The clitic suffixes themselves tend also to be changed, in a way that avoids many possible occurrences of three-consonant clusters. In particular, -ka -ki -hu generally sound as -ak -ik -uh.
Final long vowels are often shortened, merging with any short vowels that remain.
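The first cluster-breaking strategy listed above, consistent vowel insertion, can be sketched as a single pass over the word. Assumptions: one letter per sound, i as the default epenthetic vowel (the text mentions -i- or -ǝ-), and bintnā (bint 'daughter' + -nā 'our' with the case vowel dropped) as an illustrative input. The sonority-based and syllabic-consonant strategies, and the shortening of doubled consonants, are not modelled; `break_clusters` is a hypothetical name.

```python
VOWELS = set("aiuāīūeoəǝ")


def break_clusters(word: str, vowel: str = "i", position: str = "second") -> str:
    """Insert `vowel` into every run of three consonants.

    position="second": between the 2nd and 3rd consonant (CC_C)
    position="first":  between the 1st and 2nd consonant (C_CC)
    """
    if position not in ("first", "second"):
        raise ValueError("position must be 'first' or 'second'")
    out: list[str] = []
    run = 0  # length of the consonant cluster ending at the current character
    for ch in word:
        if ch in VOWELS:
            run = 0
        else:
            run += 1
            if run == 3:
                if position == "second":
                    out.append(vowel)
                    run = 1                    # only ch remains in the new cluster
                else:
                    out.insert(len(out) - 1, vowel)
                    run = 2                    # previous consonant + ch
        out.append(ch)
    return "".join(out)


print(break_clusters("bintnā"))                    # bintinā
print(break_clusters("bintnā", position="first"))  # binitnā
print(break_clusters("bintnā", vowel="ǝ"))         # bintǝnā
```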
Depending on the level of formality, the speaker's education level, etc., various grammatical changes may occur in ways that echo the colloquial variants:
Any remaining case endings (e.g. masculine plural nominative -ūn vs. oblique -īn) will be leveled, with the oblique form used everywhere. (However, in words like ab 'father' and akh 'brother' with special long-vowel case endings in the construct state, the nominative is used everywhere, hence abū 'father of', akhū 'brother of'.)
Feminine plural endings in verbs and clitic suffixes will often drop out, with the masculine plural endings used instead. If the speaker's native variety has feminine plural endings, they may be preserved, but will often be modified in the direction of the forms used in the speaker's native variety, e.g. -an instead of -na.
Dual endings will often drop out except on nouns, and even there they are used only for emphasis (similar to their use in the colloquial varieties); elsewhere, the plural endings are used (or feminine singular, if appropriate).
As mentioned above, many spoken dialects have a process of emphasis spreading, where the "emphasis" (pharyngealization) of emphatic consonants spreads forward and back through adjacent syllables, pharyngealizing all nearby consonants and triggering the back allophone [ɑ(ː)] in all nearby low vowels. The extent of emphasis spreading varies. For example, in Moroccan Arabic, it spreads as far as the first full vowel (i.e. sound derived from a long vowel or diphthong) on either side; in many Levantine dialects, it spreads indefinitely, but is blocked by any /j/ or /ʃ/; while in Egyptian Arabic, it usually spreads throughout the entire word, including prefixes and suffixes. In Moroccan Arabic, /i u/ also have emphatic allophones [e~ɛ] and [o~ɔ], respectively.
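The three spreading patterns just described can be sketched as different stopping rules for a walk outward from each emphatic consonant. This Python sketch simplifies under stated assumptions: only /a aː/ are changed (consonant pharyngealization is not modelled), the Moroccan "full vowel" is approximated as a phonemically long vowel, the trigger set is limited to /tˤ dˤ sˤ ðˤ/, and the segment lists in the usage lines are invented strings rather than real words. `SpreadRule`, `RULES` and `spread_emphasis` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpreadRule:
    blockers: frozenset = frozenset()    # segments that stop spreading, unaffected
    stop_after: frozenset = frozenset()  # segments that are affected, then stop it


# Simplified from the descriptions above; real dialects have more nuance.
RULES = {
    "Egyptian": SpreadRule(),                                          # whole word
    "Levantine": SpreadRule(blockers=frozenset({"j", "ʃ"})),           # blocked by /j ʃ/
    "Moroccan": SpreadRule(stop_after=frozenset({"aː", "iː", "uː"})),  # to first full vowel
}
TRIGGERS = {"tˤ", "dˤ", "sˤ", "ðˤ"}
BACKED = {"a": "ɑ", "aː": "ɑː"}


def spread_emphasis(segments: list[str], dialect: str) -> str:
    rule = RULES[dialect]
    out = list(segments)
    for t, seg in enumerate(segments):
        if seg not in TRIGGERS:
            continue
        for step in (-1, 1):                     # walk left, then right
            i = t + step
            while 0 <= i < len(segments):
                s = segments[i]
                if s in rule.blockers:
                    break
                out[i] = BACKED.get(s, out[i])   # back low vowels, keep others
                if s in rule.stop_after:
                    break
                i += step
    return "".join(out)


word = ["sˤ", "a", "b", "a", "j", "a", "t"]      # invented segment string
print(spread_emphasis(word, "Egyptian"))         # sˤɑbɑjɑt
print(spread_emphasis(word, "Levantine"))        # sˤɑbɑjat (stopped by /j/)
word2 = ["sˤ", "a", "b", "aː", "l", "a"]
print(spread_emphasis(word2, "Moroccan"))        # sˤɑbɑːla (stops after the long vowel)
```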
Unstressed short vowels, especially /i u/, are deleted in many contexts. Many sporadic examples of short vowel change have occurred (especially /a/→/i/ and interchange /i/↔/u/). Most Levantine dialects merge short /i u/ into /ə/ in most contexts (all except directly before a single final consonant). In Moroccan Arabic, on the other hand, short /u/ triggers labialization of nearby consonants (especially velar consonants and uvular consonants), and then short /a i u/ all merge into /ə/, which is deleted in many contexts. (The labialization plus /ə/ is sometimes interpreted as an underlying phoneme /ŭ/.) This essentially causes the wholesale loss of the short-long vowel distinction, with the original long vowels /aː iː uː/ remaining as half-long [aˑ iˑ uˑ], phonemically /a i u/, which are used to represent both short and long vowels in borrowings from Literary Arabic.
Most spoken dialects have monophthongized original /aj aw/ to /eː oː/ in most circumstances, including adjacent to emphatic consonants, while keeping them as the original diphthongs in others, e.g., مَوْعِد /mawʕid/. In most of the
Moroccan, Algerian and Tunisian (except Sahel and Southeastern) Arabic dialects, they have subsequently merged into original /iː uː/.
In most dialects, there may be more or fewer phonemes than those listed in the chart above. For example, [g] is considered a native phoneme in most Arabic dialects except in Levantine dialects like Syrian or Lebanese, where ج is pronounced [ʒ] and ق is pronounced [ʔ]. [d͡ʒ] or [ʒ] (ج) is considered a native phoneme in most dialects except in Egyptian and a number of Yemeni and Omani dialects, where ج is pronounced [g]. [zˤ] or [ðˤ] and [dˤ] are distinguished in the dialects of Egypt, Sudan, the Levant and the Hejaz, but they have merged as [ðˤ] in most dialects of the Arabian Peninsula, Iraq and Tunisia and have merged as [dˤ] in Morocco and Algeria. The use of non-native [p] پ and [v] ڤ depends on the individual speaker, but they may be more prevalent in some dialects than others. Iraqi and Gulf Arabic also have the sound [t͡ʃ] and write it and [ɡ] with the Persian letters چ and گ, as in گوجة gawjah 'plum'; چمة chimah 'truffle'.
Early in the expansion of Arabic, the separate emphatic phonemes [ɮˤ] and [ðˤ] coalesced into a single phoneme [ðˤ]. Many dialects (such as Egyptian, Levantine, and much of the Maghreb) subsequently lost interdental fricatives, converting [θ ð ðˤ] into [t d dˤ]. Most dialects borrow "learned" words from the Standard language using the same pronunciation as for inherited words, but some dialects without interdental fricatives (particularly in Egypt and the Levant) render original [θ ð ðˤ dˤ] in borrowed words as [s z zˤ dˤ].
Another key distinguishing mark of Arabic dialects is how they render the original velar and uvular plosives /q/, /d͡ʒ/ (Proto-Semitic /ɡ/), and /k/:
ق /q/ retains its original pronunciation in widely scattered regions such as Yemen, Morocco, and urban areas of the Maghreb. It is pronounced as a glottal stop [ʔ] in several prestige dialects, such as those spoken in Cairo, Beirut and Damascus. But it is rendered as a voiced velar plosive [ɡ] in the Persian Gulf, Upper Egypt, parts of the Maghreb, and less urban parts of the Levant (e.g. Jordan). In Iraqi Arabic it sometimes retains its original pronunciation and is sometimes rendered as a voiced velar plosive, depending on the word. Some traditionally Christian villages in rural areas of the Levant render the sound as [k], as do Shiʻi Bahrainis. In some Gulf dialects, it is palatalized to [d͡ʒ] or [ʒ]. It is pronounced as a voiced uvular constrictive [ʁ] in Sudanese Arabic. Many dialects with a modified pronunciation for /q/ maintain the [q] pronunciation in certain words (often with religious or educational overtones) borrowed from the Classical language.
ج /d͡ʒ/ is pronounced as an affricate in Iraq and much of the Arabian Peninsula but is pronounced [ɡ] in most of North Egypt and parts of Yemen and Oman, [ʒ] in Morocco, Tunisia, and the Levant, and [j] or [i̠] in most words in much of the Persian Gulf.
ك /k/ usually retains its original pronunciation but is palatalized to /t͡ʃ/ in many words in Israel and the Palestinian Territories, Iraq, and countries in the eastern part of the Arabian Peninsula. Often a distinction is made between the suffixes /-ak/ ('you', masc.) and /-ik/ ('you', fem.), which become /-ak/ and /-it͡ʃ/, respectively. In Sana'a, Omani, and Bahrani varieties, /-ik/ is pronounced /-iʃ/.
Pharyngealization of the emphatic consonants tends to weaken in many of the spoken varieties, and to spread from emphatic consonants to nearby sounds. In addition, the "emphatic" allophone [ɑ] automatically triggers pharyngealization of adjacent sounds in many dialects. As a result, it may be difficult or impossible to determine whether a given coronal consonant is phonemically emphatic or not, especially in dialects with long-distance emphasis spreading. (A notable exception is the pair /t/ vs. /tˤ/ in Moroccan Arabic, because the former is pronounced as an affricate [t͡s] but the latter is not.)
Examples of how the Arabic root and form system works
As in other Semitic languages, Arabic has a complex and unusual morphology (i.e. method of constructing words from a basic root). Arabic has a nonconcatenative "root-and-pattern" morphology: a root consists of a set of bare consonants (usually three), which are fitted into a discontinuous pattern to form words. For example, the word for 'I wrote' is constructed by combining the root k-t-b 'write' with the pattern -a-a-tu 'I Xed' to form katabtu 'I wrote'. Other verbs meaning 'I Xed' will typically have the same pattern but with different consonants, e.g. qaraʼtu 'I read', akaltu 'I ate', dhahabtu 'I went', although other patterns are possible (e.g. sharibtu 'I drank', qultu 'I said', takallamtu 'I spoke', where the subpattern used to signal the past tense may change but the suffix -tu is always used).
From a single root k-t-b, numerous words can be formed by applying different patterns:
تَكَاتَبْنَا takātabnā 'we corresponded with each other'
أَكْتُبُ 'aktubu 'I write'
أُكَتِّبُ 'ukattibu 'I have (something) written'
أُكَاتِبُ 'ukātibu 'I correspond (with someone)'
أُكْتِبُ 'uktibu 'I dictate'
أَكْتَتِبُ 'aktatibu 'I subscribe'
نَتَكَاتَبُ natakātabu 'we correspond with each other'
كُتِبَ kutiba 'it was written'
أُكْتِبَ 'uktiba 'it was dictated'
مَكْتُوبٌ maktūbun 'written'
مُكْتَبٌ muktabun 'dictated'
كُتُبٌ kutubun 'books'
كَاتِبٌ kātibun 'writer'
كُتَّابٌ kuttābun 'writers'
مَكْتَبٌ maktabun 'desk, office'
مَكْتَبَةٌ maktabatun 'library, bookshop'
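The derivations listed above can be reproduced by filling numbered slots in a template. This Python sketch uses the digits 1, 2, 3 for the root consonants, a notation adopted only for this example (traditional Arabic grammar writes patterns with the root f-ʿ-l instead), and copies every other character of the pattern unchanged; `interdigitate` is a hypothetical name, and the patterns are read off the forms in the list above.

```python
from typing import Sequence


def interdigitate(root: Sequence[str], pattern: str) -> str:
    """Fill a pattern's numbered slots with root consonants.

    The digits 1, 2, 3 in `pattern` stand for the first, second and third
    root consonant; every other character is copied unchanged. Root
    consonants are strings so that digraphs such as "dh" can be used.
    """
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in pattern)


ktb = ("k", "t", "b")
for pattern in ("1a2a3tu", "'a12u3u", "ta1ā2a3nā", "ma12ū3un",
                "1ā2i3un", "1u22ā3un", "ma12a3atun"):
    print(interdigitate(ktb, pattern))
# katabtu, 'aktubu, takātabnā, maktūbun, kātibun, kuttābun, maktabatun

# The same 'I Xed' pattern with different roots:
print(interdigitate(("q", "r", "ʼ"), "1a2a3tu"))   # qaraʼtu
print(interdigitate(("dh", "h", "b"), "1a2a3tu"))  # dhahabtu
```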
Nouns and adjectives
Nouns in Literary Arabic have three grammatical cases (nominative, accusative, and genitive [also used when the noun is governed by a preposition]); three numbers (singular, dual and plural); two genders (masculine and feminine); and three "states" (indefinite, definite, and construct). The cases of singular nouns (other than those that end in long ā) are indicated by suffixed short vowels (/-u/ for nominative, /-a/ for accusative, /-i/ for genitive).
The feminine singular is often marked by ـَة /-at/, which is pronounced as /-ah/ before a pause. Plural is indicated either through endings (the sound plural) or internal modification (the broken plural). Definite nouns include all proper nouns, all nouns in "construct state" and all nouns which are prefixed by the definite article اَلْـ /al-/. Indefinite singular nouns (other than those that end in long ā) add a final /-n/ to the case-marking vowels, giving /-un/, /-an/ or /-in/ (which is also referred to as nunation or tanwīn).
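The case vowels and nunation just described combine in a regular way, which a short sketch can show. Assumptions: singular nouns whose stems do not end in long ā, definiteness expressed only via the article al- (construct state and proper nouns are not modelled), and no assimilation of the article's l to a following consonant; `noun_form` is a hypothetical name, and kātib 'writer' and maktabat- 'library' come from the word list above.

```python
CASE_VOWELS = {"nominative": "u", "accusative": "a", "genitive": "i"}


def noun_form(stem: str, case: str, definite: bool) -> str:
    """Singular noun with its case ending (stems not ending in long ā)."""
    vowel = CASE_VOWELS[case]
    if definite:
        return "al-" + stem + vowel    # definite: article, bare case vowel
    return stem + vowel + "n"          # indefinite: nunation adds -n


for case in CASE_VOWELS:
    print(case, noun_form("kātib", case, definite=False),
          noun_form("kātib", case, definite=True))
# nominative kātibun al-kātibu
# accusative kātiban al-kātiba
# genitive kātibin al-kātibi
```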
Adjectives in Literary Arabic are marked for case, number, gender and state, as for nouns. However, the plural of all non-human nouns is always combined with a singular feminine adjective, which takes the ـَة /-at/ suffix.
Pronouns in Literary Arabic are marked for person, number and gender. There are two varieties, independent pronouns and
enclitics. Enclitic pronouns are attached to the end of a verb, noun or preposition and indicate verbal and prepositional objects or possession of nouns. The first-person singular pronoun has a different enclitic form used for verbs (ـنِي /-nī/) and for nouns or prepositions (ـِي /-ī/ after consonants, ـيَ /-ya/ after vowels).
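The choice among the first-person singular enclitics can be written as a small decision function. This is a sketch under stated assumptions: the function name is hypothetical, noun and preposition hosts are given in transliteration without a final short (case) vowel, so "vowel-final" here means ending in a long vowel, and boundary adjustments that occur with some hosts (for example after long ī) are ignored.

```python
import unicodedata

LONG_VOWELS = {"ā", "ī", "ū"}


def attach_first_person(host: str, host_type: str) -> str:
    """Attach the first-person singular enclitic to a transliterated host."""
    if host_type == "verb":
        # Verbs take -nī (object 'me').
        return host + "nī"
    if host_type in ("noun", "preposition"):
        # Normalize only for the check, so a precomposed ā matches the set.
        last = unicodedata.normalize("NFC", host)[-1]
        # Nouns and prepositions take -ya after a vowel, -ī after a consonant.
        if last in LONG_VOWELS:
            return host + "ya"
        return host + "ī"
    raise ValueError(f"unknown host type: {host_type!r}")


print(attach_first_person("ḍaraba", "verb"))  # ḍarabanī 'he hit me'
print(attach_first_person("kitāb", "noun"))   # kitābī 'my book'
print(attach_first_person("ʻaṣā", "noun"))    # ʻaṣāya 'my stick'
```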
Nouns, verbs, pronouns and adjectives agree with each other in all respects. However, non-human plural nouns are grammatically considered to be feminine singular. Furthermore, a verb in a verb-initial sentence is marked as singular regardless of its semantic number when the subject of the verb is explicitly mentioned as a noun. Numerals between three and ten show "chiasmic" agreement, in that grammatically masculine numerals have feminine marking and vice versa.
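The three special agreement rules above (non-human plurals, verbs in verb-initial sentences, and chiasmic numerals) can be encoded as simple feature functions. This is a minimal sketch with hypothetical names; it returns abstract gender and number features rather than word forms, and it does not model case agreement or the many exceptions found in real usage.

```python
from dataclasses import dataclass


@dataclass
class Noun:
    gender: str   # "masculine" or "feminine"
    number: str   # "singular", "dual" or "plural"
    human: bool


def agreement_features(noun: Noun) -> tuple[str, str]:
    """Gender and number that an agreeing adjective, pronoun or verb takes."""
    if noun.number == "plural" and not noun.human:
        # Non-human plurals are treated as feminine singular.
        return ("feminine", "singular")
    return (noun.gender, noun.number)


def verb_agreement(subject: Noun, verb_initial: bool) -> tuple[str, str]:
    """Agreement of a verb whose subject is an explicitly mentioned noun."""
    gender, number = agreement_features(subject)
    if verb_initial:
        # In verb-initial sentences the verb is singular regardless of the
        # subject's number (it still agrees in gender).
        return (gender, "singular")
    return (gender, number)


def numeral_gender_marking(n: int, noun_gender: str) -> str:
    """Gender marking on the numerals 3 to 10 (chiasmic agreement)."""
    if not 3 <= n <= 10:
        raise ValueError("chiasmic agreement applies to the numerals 3 to 10")
    return "feminine" if noun_gender == "masculine" else "masculine"


teachers = Noun("masculine", "plural", human=True)
books = Noun("masculine", "plural", human=False)
print(verb_agreement(teachers, verb_initial=True))   # ('masculine', 'singular')
print(verb_agreement(teachers, verb_initial=False))  # ('masculine', 'plural')
print(agreement_features(books))                     # ('feminine', 'singular')
print(numeral_gender_marking(3, "masculine"))        # feminine
```

For instance, in thalāthatu rijālin 'three men' the numeral carries the feminine ending while rijāl 'men' is masculine.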
The past and non-past paradigms are sometimes also termed perfective and imperfective, indicating the fact that they actually represent a combination of tense and aspect. The moods other than the indicative occur only in the non-past, and the future tense is signaled by prefixing سَـ sa- or سَوْفَ sawfa onto the non-past. The past and non-past differ in the form of the stem (e.g., past كَتَبـ katab- vs. non-past ـكْتُبـ -ktub-), and they also use completely different sets of affixes for indicating person, number and gender: in the past, person, number and gender are fused into a single suffixal morpheme, while in the non-past, a combination of prefixes (primarily encoding person) and suffixes (primarily encoding gender and number) is used. The passive voice uses the same person/number/gender affixes but changes the vowels of the stem.
The following shows a paradigm of a regular Arabic verb, كَتَبَ kataba 'to write'. In Modern Standard Arabic, the energetic mood (in either long or short form, which have the same meaning) is almost never used.
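The contrast between the fused past suffixes and the non-past prefix-plus-suffix combinations can be sketched in Python for kataba. This is an illustration only, covering the active past and the active non-past indicative of this Form I verb in transliteration; the slot labels (e.g. "3ms" for third person masculine singular, "2d" for second person dual) and function names are our own, and the hamza is written ʼ.

```python
# Past: one fused suffix encodes person, number and gender together.
PAST_SUFFIXES = {
    "1s": "tu", "2ms": "ta", "2fs": "ti", "3ms": "a", "3fs": "at",
    "2d": "tumā", "3md": "ā", "3fd": "atā",
    "1p": "nā", "2mp": "tum", "2fp": "tunna", "3mp": "ū", "3fp": "na",
}

# Non-past indicative: a prefix (mainly person) plus a suffix
# (mainly gender and number).
NONPAST_AFFIXES = {
    "1s": ("ʼa", "u"), "2ms": ("ta", "u"), "2fs": ("ta", "īna"),
    "3ms": ("ya", "u"), "3fs": ("ta", "u"),
    "2d": ("ta", "āni"), "3md": ("ya", "āni"), "3fd": ("ta", "āni"),
    "1p": ("na", "u"), "2mp": ("ta", "ūna"), "2fp": ("ta", "na"),
    "3mp": ("ya", "ūna"), "3fp": ("ya", "na"),
}


def past(stem: str, slot: str) -> str:
    return stem + PAST_SUFFIXES[slot]


def nonpast(stem: str, slot: str) -> str:
    prefix, suffix = NONPAST_AFFIXES[slot]
    return prefix + stem + suffix


def future(stem: str, slot: str) -> str:
    # The future is the non-past with the prefix sa-.
    return "sa" + nonpast(stem, slot)


for slot in PAST_SUFFIXES:
    print(f"{slot:>3}  {past('katab', slot):<12} {nonpast('ktub', slot)}")

print(future("ktub", "3ms"))  # sayaktubu 'he will write'
print(past("kutib", "3ms"))   # kutiba 'it was written' (passive stem)
```

The last line reflects the point that the past passive reuses the same suffixes with a revocalized stem.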
For verbs, a given root can occur in many different
derived verb stems (of which there are about fifteen), each with one or more characteristic meanings and each with its own templates for the past and non-past stems, active and passive participles, and verbal noun. These are referred to by Western scholars as "Form I", "Form II", and so on through "Form XV" (although Forms XI to XV are rare). These stems encode grammatical functions such as the
reflexive. Stems sharing the same root consonants represent separate verbs, albeit often semantically related, and each is the basis for its own
conjugational paradigm. As a result, these derived stems are part of the system of
derivational morphology, not part of the inflectional system.
Examples of the different verbs formed from the root كتب k-t-b 'write' (using حمر ḥ-m-r 'red' for Form IX, which is limited to colors and physical defects):
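The derived stems can be generated mechanically by slotting the root into a template for each form. This sketch uses our own hypothetical template notation (digits 1 to 3 for root consonants) for the third person masculine singular past of Forms I to X. It does not claim that every form is attested for k-t-b, ignores the consonant assimilation that Form VIII shows with some roots, and omits the rare Forms XI to XV.

```python
# Hypothetical templates for the 3rd person masculine singular past.
FORM_TEMPLATES = {
    "I": "1a2a3a",
    "II": "1a22a3a",
    "III": "1ā2a3a",
    "IV": "ʼa12a3a",
    "V": "ta1a22a3a",
    "VI": "ta1ā2a3a",
    "VII": "in1a2a3a",
    "VIII": "i1ta2a3a",
    "IX": "i12a33a",
    "X": "ista12a3a",
}


def derive(form: str, root: tuple[str, str, str]) -> str:
    """Fill the numbered slots of a form's template with root consonants."""
    template = FORM_TEMPLATES[form]
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in template)


KTB = ("k", "t", "b")
HMR = ("ḥ", "m", "r")  # Form IX is limited to colors and physical defects
for form in FORM_TEMPLATES:
    root = HMR if form == "IX" else KTB
    print(f"Form {form:<4} {derive(form, root)}")
```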
Form II is sometimes used to create transitive
denominative verbs (verbs built from nouns); Form V is the equivalent used for intransitive denominatives.
The associated participles and verbal nouns of a verb are the primary means of forming new lexical nouns in Arabic. This is similar to the process by which, for example, the
English gerund "meeting" (similar to a verbal noun) has turned into a noun referring to a particular type of social, often work-related event where people gather together to have a "discussion" (another lexicalized verbal noun). Another fairly common means of forming nouns is through one of a limited number of patterns that can be applied directly to roots, such as the "nouns of location" in ma- (e.g. maktab 'desk, office' < k-t-b 'write', maṭbakh 'kitchen' < ṭ-b-kh 'cook').
The only three genuine suffixes are as follows:
The feminine suffix -ah, which variously derives terms for women from related terms for men or, more generally, terms along the same lines as the corresponding masculine, e.g. maktabah 'library' (also a writing-related place, but different from maktab, as above).
The nisbah suffix -iyy-. This suffix is extremely productive, and forms adjectives meaning "related to X". It corresponds to English adjectives in -ic, -al, -an, -y, -ist, etc.
The nisbah suffix -iyyah. This is formed by adding the feminine suffix -ah onto nisbah adjectives to form abstract nouns. For example, from the basic root sh-r-k 'share' can be derived the Form VIII verb ishtaraka 'to cooperate, participate', and in turn its verbal noun ishtirāk 'cooperation, participation' can be formed. This in turn can be made into a nisbah adjective ishtirākī 'socialist', from which an abstract noun ishtirākiyyah 'socialism' can be derived. Other recent formations are jumhūriyyah 'republic' (lit. "public-ness", < jumhūr 'multitude, general public'), and the
Gaddafi-specific variation jamāhīriyyah 'people's republic' (lit. "masses-ness", < jamāhīr 'the masses', pl. of jumhūr, as above).
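The suffix derivations and the "noun of location" pattern above can be sketched as string functions over transliterated stems. This is a simplified illustration with hypothetical function names: forms are given in their pausal/citation shape without case endings, and stem adjustments (such as dropping a stem-final feminine -ah before -iyy-) are not handled.

```python
def add_feminine(stem: str) -> str:
    """Feminine suffix -ah (pausal form)."""
    return stem + "ah"


def nisbah_adjective(stem: str) -> str:
    """Nisbah suffix -iyy-, cited in its pausal form -ī."""
    return stem + "ī"


def abstract_noun(stem: str) -> str:
    """-iyyah: the nisbah suffix -iyy- followed by the feminine -ah."""
    return add_feminine(stem + "iyy")


def noun_of_location(root: tuple[str, str, str]) -> str:
    """'Noun of location' pattern ma-C1C2aC3 applied directly to a root."""
    c1, c2, c3 = root
    return "ma" + c1 + c2 + "a" + c3


print(add_feminine("maktab"))              # maktabah 'library'
print(nisbah_adjective("ishtirāk"))        # ishtirākī 'socialist'
print(abstract_noun("ishtirāk"))           # ishtirākiyyah 'socialism'
print(abstract_noun("jumhūr"))             # jumhūriyyah 'republic'
print(noun_of_location(("ṭ", "b", "kh")))  # maṭbakh 'kitchen'
```

Building abstract_noun on top of add_feminine mirrors the description of -iyyah as the nisbah suffix plus the feminine suffix.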
The spoken dialects have lost the case distinctions and make only limited use of the dual (it occurs only on nouns and its use is no longer required in all circumstances). They have lost the mood distinctions other than imperative, but many have since gained new moods through the use of prefixes (most often /bi-/ for indicative vs. unmarked subjunctive). They have also mostly lost the indefinite "nunation" and the internal passive.
The following is an example of a regular verb paradigm in Egyptian Arabic.
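The indicative prefix bi- mentioned above can be illustrated on Egyptian katab 'to write'. The bare non-past forms below are the common Cairene forms in a simplified transliteration, given here as an assumption for illustration rather than as a reproduction of a reference paradigm; connected speech shows further vowel changes that this sketch ignores.

```python
# Bare (unmarked, subjunctive) non-past forms of katab 'write',
# keyed by the Egyptian independent pronoun.
BARE_FORMS = {
    "ana": "aktib",      # I
    "inta": "tiktib",    # you (m.)
    "inti": "tiktibi",   # you (f.)
    "huwwa": "yiktib",   # he
    "hiyya": "tiktib",   # she
    "iḥna": "niktib",    # we
    "intu": "tiktibu",   # you (pl.)
    "humma": "yiktibu",  # they
}


def indicative(bare: str) -> str:
    """Mark the indicative with the prefix bi-."""
    # Before the vowel-initial first-person form, bi- loses its vowel.
    if bare.startswith("a"):
        return "b" + bare
    return "bi" + bare


for pronoun, bare in BARE_FORMS.items():
    print(f"{pronoun:<6} {bare:<8} {indicative(bare)}")
```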
Arabic calligraphy written by a Malay Muslim in Malaysia. The calligrapher is making a rough draft.
The Arabic alphabet derives from the Aramaic through
Nabatean, to which it bears a loose resemblance like that of
Cyrillic scripts to
Greek script. Traditionally, there were several differences between the Western (North African) and Middle Eastern versions of the alphabet—in particular, the faʼ had a dot underneath and qaf a single dot above in the Maghreb, and the order of the letters was slightly different (at least when they were used as numerals).
However, the old Maghrebi variant has been abandoned except for calligraphic purposes in the Maghreb itself, and remains in use mainly in the Quranic schools (
zaouias) of West Africa. Arabic, like all other Semitic languages (except for the Latin-written Maltese, and the languages with the
Ge'ez script), is written from right to left. There are several styles of scripts such as
rayhan, and notably
naskh, which is used in print and by computers, and
ruqʻah, which is commonly used for correspondence.
Originally, Arabic was written using only the rasm (the basic letter skeleton), without diacritical marks. Later, diacritical points (in Arabic, nuqaṭ) were added, allowing readers to distinguish between letters such as b, t, th, n and y. Finally, signs known as tashkil were used for short vowels (known as harakat) and for other purposes, such as marking final postnasalized or long vowels.
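In Unicode text the distinction between letters and tashkil survives directly: the dots are part of each letter's code point, while harakat, shadda, sukun and tanwin are separate combining marks (general category Mn). The sketch below, whose function name is hypothetical, strips those marks to show how differently vocalized words collapse onto the same unvocalized spelling. Note that it removes every combining mark, including those of other scripts.

```python
import unicodedata


def strip_tashkil(text: str) -> str:
    """Drop combining marks (Unicode category Mn), keeping the dotted letters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# kataba 'he wrote', kutiba 'it was written', kutubun 'books'
for word in ["كَتَبَ", "كُتِبَ", "كُتُبٌ"]:
    print(word, "->", strip_tashkil(word))
```

All three words reduce to the same three letters كتب, which is why unvocalized text relies on context for its reading.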
After Khalil ibn Ahmad al Farahidi finally fixed the Arabic script around 786, many styles were developed, both for writing down the Quran and other books and for decorative inscriptions on monuments.
Arabic calligraphy has not fallen out of use as calligraphy has in the Western world, and is still considered by
Arabs to be a major art form; calligraphers are held in great esteem. Being cursive by nature, unlike the Latin script, Arabic script is used in calligraphic compositions of a
verse of the Quran, a
hadith, or a
proverb. The composition is often abstract, but sometimes the writing is shaped into an actual form such as that of an animal. One of the current masters of the genre is
Hassan Massoudy.
In modern times, there is concern that a typographic approach to the language, necessary for digital standardization, will not always preserve the meanings conveyed by the intrinsically calligraphic nature of written Arabic.
There are a number of different standards for the
romanization of Arabic, i.e. methods of accurately and efficiently representing Arabic with the Latin script. There are various, sometimes conflicting, motivations involved, which has led to multiple systems. Some are interested in
transliteration, i.e. representing the spelling of Arabic, while others focus on
transcription, i.e. representing the pronunciation of Arabic. (They differ in that, for example, the same letter ي is used to represent both a consonant, as in "you" or "yet", and a vowel, as in "me" or "eat".) Some systems, e.g. for scholarly use, are intended to accurately and unambiguously represent the phonemes of Arabic, generally making the phonetics more explicit than the original word in the Arabic script. These systems are heavily reliant on
diacritical marks such as "š" for the sound equivalently written sh in English. Other systems (e.g. the
Bahá'í orthography) are intended to help readers who are neither Arabic speakers nor linguists with intuitive pronunciation of Arabic names and phrases. These less "scientific" systems tend to avoid
diacritics and use
digraphs (like sh and kh). These are usually simpler to read, but sacrifice the definiteness of the scientific systems, and may lead to ambiguities, e.g. whether to interpret sh as a single sound, as in gash, or a combination of two sounds, as in gashouse. The
ALA-LC romanization solves this problem by separating the two sounds with a
prime symbol ( ′ ); e.g., as′hal 'easier'.
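The ALA-LC disambiguation rule can be sketched as a join over romanized segments. This is a minimal illustration, assuming the input has already been split into one segment per Arabic sound (so "sh" as a single segment is one consonant) and that the digraphs at risk are th, dh, kh, sh and gh; the function name is hypothetical.

```python
PRIME = "\u2032"  # the prime symbol used by ALA-LC
# Letters that would form a digraph (th, dh, kh, sh, gh) before "h".
DIGRAPH_STARTS = {"t", "d", "k", "s", "g"}


def join_segments(segments: list[str]) -> str:
    """Join romanized segments, inserting a prime to break false digraphs."""
    out = ""
    for seg in segments:
        if out and out[-1] in DIGRAPH_STARTS and seg.startswith("h"):
            out += PRIME
        out += seg
    return out


print(join_segments(["a", "s", "h", "a", "l"]))  # as′hal 'easier'
print(join_segments(["sh", "a", "h", "r"]))      # shahr 'month'
```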
During the last few decades and especially since the 1990s, Western-invented text communication technologies have become prevalent in the Arab world, such as
personal computers, the
World Wide Web,
bulletin board systems,
instant messaging and
mobile phone text messaging. Most of these technologies originally supported only the Latin script, and some still do not offer the Arabic script as an option. As a result, Arabic-speaking users communicated through these technologies by transliterating Arabic text into the Latin script, a practice sometimes known as IM Arabic.
To handle those Arabic letters that cannot be accurately represented using the Latin script, numerals and other characters were appropriated. For example, the numeral "3" may be used to represent the Arabic letter ⟨ع⟩. There is no universal name for this type of transliteration, but some have named it
Arabic Chat Alphabet. Other systems of transliteration exist, such as using dots or capitalization to represent the "emphatic" counterparts of certain consonants. For instance, using capitalization, the letter ⟨د⟩ may be represented by d, and its emphatic counterpart, ⟨ض⟩, by D.
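Both conventions can be illustrated with a character mapping. This is a sketch only: the table is a small, hypothetical subset that mixes the numeral convention (2, 3, 7 for hamza, ʿayn and ḥāʾ) with capitalization for emphatics, whereas real usage varies by region and user and also supplies vowels that unvocalized Arabic text does not show.

```python
# Partial, illustrative mapping; conventions vary between regions and users.
CHAT_MAP = {
    "ء": "2",  # hamza
    "ع": "3",  # ʿayn
    "ح": "7",  # ḥāʾ
    "ب": "b", "ت": "t", "د": "d", "ر": "r", "س": "s",
    "ك": "k", "ل": "l", "م": "m", "ن": "n",
    # Capitalization for emphatic counterparts of plain consonants.
    "ض": "D",  # emphatic counterpart of د
    "ط": "T",  # emphatic counterpart of ت
    "ص": "S",  # emphatic counterpart of س
}


def to_chat(text: str) -> str:
    """Map each Arabic letter to its chat form, passing unknown characters through."""
    return "".join(CHAT_MAP.get(ch, ch) for ch in text)


print(to_chat("عرب"))  # 3rb
print(to_chat("ضرب"))  # Drb
```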
In most of present-day North Africa, the
Western Arabic numerals (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) are used. However, in Egypt and Arabic-speaking countries to the east of it, the
Eastern Arabic numerals (٠ – ١ – ٢ – ٣ – ٤ – ٥ – ٦ – ٧ – ٨ – ٩) are in use. When representing a number in Arabic, the lowest-valued
position is placed on the right, so the order of positions is the same as in left-to-right scripts. Sequences of digits such as telephone numbers are read from left to right, but numbers are spoken in the traditional Arabic fashion, with units and tens reversed from the modern English usage. For example, 24 is said "four and twenty", as in German (vierundzwanzig) and
Classical Hebrew, and 1975 is said "a thousand and nine-hundred and five and seventy" or, more eloquently, "a thousand and nine-hundred five seventy".
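Both points, the unchanged digit order and the spoken units-before-tens order, can be sketched in Python. The function names are hypothetical, and the spoken-order function only returns numeric components for 1 to 9999 rather than Arabic number words.

```python
# Eastern Arabic digits are U+0660 (zero) to U+0669 (nine).
EASTERN_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x0660 + d) for d in range(10))
)


def to_eastern(number: int) -> str:
    """Write a number with Eastern Arabic digits.

    The lowest-valued digit stays rightmost, so the digit order is unchanged.
    """
    return str(number).translate(EASTERN_DIGITS)


def spoken_components(n: int) -> list[int]:
    """Components of 1 to 9999 in traditional spoken order (units before tens)."""
    if not 0 < n < 10_000:
        raise ValueError("this sketch handles 1 to 9999 only")
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    tens, units = divmod(rest, 10)
    parts = []
    if thousands:
        parts.append(thousands * 1000)
    if hundreds:
        parts.append(hundreds * 100)
    if units:
        parts.append(units)  # units are said before tens
    if tens:
        parts.append(tens * 10)
    return parts


print(to_eastern(1975))         # ١٩٧٥
print(spoken_components(24))    # [4, 20]
print(spoken_components(1975))  # [1000, 900, 5, 70]
```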
Arabic alphabet and nationalism
There have been many instances of national movements to convert Arabic script into Latin script or to Romanize the language. Currently, the only Arabic variety to use Latin script is Maltese.
The Beirut newspaper La Syrie pushed for the change from Arabic script to Latin letters in 1922. The leading figure of this movement was
Louis Massignon, a French Orientalist, who brought his concern before the Arabic Language Academy in Damascus in 1928. Massignon's attempt at Romanization failed because the academy and the population viewed the proposal as an attempt by the Western world to take over their country.
Sa'id Afghani, a member of the academy, asserted that the movement to Romanize the script was a
Zionist plan to dominate Lebanon. Said Akl created a Latin-based alphabet for
Lebanese and used it in a newspaper he founded, Lebnaan, as well as in some books he wrote.
After the period of colonialism in Egypt, Egyptians were looking for a way to reclaim and re-emphasize Egyptian culture. As a result, some Egyptians pushed for an Egyptianization of the Arabic language in which formal and colloquial Arabic would be combined into one language written in the Latin alphabet. There was also the idea of using
hieroglyphics instead of the Latin alphabet, but this was seen as too complicated. The scholar
Salama Musa agreed with the idea of applying a Latin alphabet to Arabic, as he believed that it would allow Egypt to have a closer relationship with the West. He also believed that Latin script was key to Egypt's success, as it would allow for more advances in science and technology. This change in alphabet, he believed, would solve the problems inherent in the Arabic script, such as the lack of written vowels and the difficulty of writing foreign words, which made it hard for non-native speakers to learn. Ahmad Lutfi As Sayid and
Muhammad Azmi, two Egyptian intellectuals, agreed with Musa and supported the push for Romanization. The idea that Romanization was necessary for modernization and growth in Egypt continued with Abd Al-Aziz Fahmi in 1944, who was the chairman of the Writing and Grammar Committee of the Arabic Language Academy of Cairo. However, this effort failed because Egyptians felt a strong cultural tie to the Arabic alphabet. In particular, older generations of Egyptians believed that the Arabic alphabet had strong connections to Arab values and history, given its long history in Muslim societies (Shrivtiel, 189).
Shachmon, Ori; Mack, Merav (2016). "Speaking Arabic, Writing Hebrew. Linguistic Transitions in Christian Arab Communities in Israel". Wiener Zeitschrift für die Kunde des Morgenlandes. University of Vienna. 106: 223–224.
Weninger, Stefan, ed. (2011). Semitic Languages: An International Handbook. In collaboration with Geoffrey Khan, Michael P. Streck and Janet C. E. Watson. Berlin/Boston: Walter de Gruyter GmbH & Co. KG.
Bernards, Monique (2021). "Ibn Jinnī". In Fleet, Kate; Krämer, Gudrun; Matringe, Denis; Nawas, John; Rowson, Everett (eds.). Encyclopaedia of Islam, THREE. First published online 2021; first print edition 2021, ISBN 9789004435964. Consulted online on 27 May 2021.
Borg and Azzopardi-Alexander (1997). Maltese. Routledge. p. xiii. ISBN 978-0-415-02243-9. "In fact, Maltese displays some areal traits typical of Maghrebine Arabic, although over the past 800 years of independent evolution it has drifted apart from Tunisian Arabic."
"Maltese – an unusual formula". Archived from the original on 8 December 2015. Retrieved 17 February 2018. "Originally Maltese was an Arabic dialect but it was immediately exposed to Latinisation because the Normans conquered the islands in 1090, while Christianisation, which was complete by 1250, cut off the dialect from contact with Classical Arabic. Consequently Maltese developed on its own, slowly but steadily absorbing new words from Sicilian and Italian according to the needs of the developing community."
Thelwall, Robin (2003). "Arabic". Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge: Cambridge University Press.
Traini, R. (1961), Vocabolario di arabo [Dictionary of Modern Written Arabic] (in Italian), Rome: I.P.O., Harrassowitz
Vaglieri, Laura Veccia, Grammatica teorico-pratica della lingua araba, Rome: I.P.O.