Combining linguistics and genetics to better understand the evolution of languages

It is often difficult to interpret a linguistic corpus without drawing on other fields of research. A breakthrough in linguistic research therefore requires a new approach. Starting from this premise, the ANR project LANGECO set out to break down the barrier between the humanities and the natural sciences by re-examining linguistic diversity in the light of existing results from linguistic and genetic research.
More specifically, the project studied the border region between the provinces of Gansu and Qinghai in China. Coinciding with a section of the Silk Road, this linguistic area presents a very complex situation: different ethnic groups live there, and the languages spoken, which belong to different language families, have dramatically influenced the Chinese language.

A better understanding of Dongxiang and the discovery of a little-known language

The team studied the (non-)correlation between the languages and genes of several populations in this region. Their work provides a clear picture of the history of human migration between Central Asia and East Asia, making it possible to better understand the divergence (languages of the same origin) and the convergence (different languages within one linguistic area) of languages. The results in linguistics, history and biology (genetics) showed that Dongxiang is a new language that belongs to the Mongolic group and whose original lexicon was gradually replaced. The team also studied a language (Xuejiawan) that was unknown before its investigations (known only as a non-Chinese language).
This work has resulted in some thirty international publications, in Chinese and English. The team continues its work by broadening the scope of its research and collaboration. Joint work with the Ministry of Education in China thus began in the course of 2016.

Four books related to the project have been or will be published

- 2014. Tangwang hua yanjiu [Studies on the Tangwang language]. Beijing: Minzu chubanshe. 472p. (Dan XU, sole author)
- 2017. Languages and genes in Northwestern China and adjacent regions. Singapore: Springer Nature. 156p. (Author and co-editor with Hui Li)
- 2017. Tangwang-A case study from interdisciplinary perspective. Dordrecht: Springer. 190p. (Dan XU, sole author)
- 2017. Yuyan jiechu yu yuyan bianyi [Language contact and variation]. Beijing: Shangwu yinshuguan [Commercial Press]. (Author and co-editor with Jingqi Fu)


Language mixing and replacement due to migration from Central Asia to Northwest China

The Gansu-Qinghai area was the most important migration corridor between Central and East Asia. The languages and populations of this region have been competing, mixing and merging for centuries. We have focused on three languages and populations, Dongxiang (Santa), Bao’an and Salar, to illustrate the situation. Dongxiang and Bao’an are Mongolic languages, while Salar is categorized as a Turkic language.

The region targeted by our studies is located around the Gansu-Qinghai border, and partially overlaps the Silk Road. The Gansu-Qinghai area is famously the site of the Majiayao (3300-2100 BC) and Qijia (2200-1600 BC) cultures. The Gansu-Qinghai area is inhabited by different ethnic groups: Amdo Tibetan, Chinese (Han), Hui (Muslims), Dongxiang (Santa), Bao’an (Baonan), Monguor (Tu), Eastern Yugur, Western Yugur, Salar, etc. From a linguistic point of view, the Dongxiang, Bao’an, Monguor and Eastern Yugur languages belong to the Mongolic group of the Altaic family (still disputed), while Western Yugur and Salar are part of the easternmost branch of Turkic, and the Amdo Tibetan group is classified in the Sino-Tibetan family.
The Sinitic languages of this region show a clear convergence towards non-Han languages: borrowing is not limited to words and word orders, but also includes morphology and even parts of the phonological system. At the syntactic level, these languages have changed from SVO word order to SOV order. Morphologically, they have begun to display a case marking system, and mark plurality without regard to a noun’s human feature. These new morphological and syntactic means are categorically different from those of other Sinitic languages. As for the phonological system, which is in general the most difficult to change, some Sinitic languages in this region have begun a process of tone simplification, or have even lost their tone system, even though tones are phonemic in Chinese. In summary, language contact has affected the languages spoken in this area (both Han and non-Han) in both directions: Sinitic languages tend to preserve their vocabulary remarkably well but adopt syntactic means from non-Chinese languages, while non-Chinese languages tend to borrow Sinitic vocabulary but preserve their syntax (Xu 2017).
The Dongxiang, Bao’an and Salar populations came from Central Asia (and some from Western Asia) and today speak different languages. The Dongxiang and Bao’an languages belong to the Mongolic family, while Salar is a Turkic language. The Dongxiang and Bao’an languages formed as a result of violent historical events. Historical documents indicate that at the beginning of the 13th century, a huge number of migrants were brought to China by Genghis Khan’s armies and later by his descendants. They included young men forced to enlist in the army, as well as artisans practicing various handicrafts. This migration has left a mark on Chinese history, and even on East Asian history (Liu 2003: 143). The Dongxiang and Bao’an migrants were forced to learn Mongolian, and today one finds few traces of Central Asian languages among these people. These Central Asians were Mongolized and their languages formed from 13th century Mongolian. This in part explains why their Mongolic languages are more similar to Ancient Mongolian, i.e. the language attested in the Secret History of the Mongols, a document dated to 1240, than to other contemporary Mongolic dialects. In effect, Dongxiang and Bao’an are the outcome of language replacement, even though they are classified as Mongolic languages. In previous research, scholars often treated the Arabic, Turkic and Persian words in the Dongxiang and Bao’an languages as “loan words”. We have shown that these words reflect the substrate of the speakers’ ancestral languages rather than loans (Xu et al. 2013).
The Salar people came from Central Asia, and oral legends tell us that two brothers of the “Salur tribe” in Samarkand (Uzbekistan) led their families eastward on white camels to flee persecution by local rulers. It is interesting that speakers of the Salar language, part of the easternmost branch of Turkic languages, have preserved their ancestral language(s) relatively well in a linguistically isolated environment. Geographically, the Salar language is surrounded by Sinitic languages to the East, and by Mongolic and especially Amdo Tibetan languages to the West, North and South. In other words, even though Salar has borrowed almost 32% (Xu and Wen 2017: 71) of its words from Chinese, its syntax remains Turkic, unlike the Dongxiang and Bao’an, who completely lost their Turkic, Arabic, and Persian ancestral languages.
During the formation of the Dongxiang, Bao’an and Salar groups, other ethnic groups joined them, forming populations with diverse origins. The genetic data (see details in Xu and Wen 2017) converge with historical documents in indicating that the main paternal lineages of these three populations came from Central Asia (haplogroup R, colored orange in the pie charts) and Western Asia (haplogroups I and J, in blue). Since Y chromosomes are transmitted only from father to son and males drive migration, Y chromosomes are assumed to coevolve with languages more closely than mitochondrial DNA does (Comas et al. 2008 among others). The other haplogroups, which represent typical East Asian Y-DNA such as C, D, N and O, might have mixed with these migrants on their migration route or during subsequent settlement in Northwest China. The paternal lineages observed in these three populations today reflect long-term admixture with local people, but the dominant lineages in these three populations’ paternal gene pools were identified as coming from Central and West Asia.
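The regional grouping of paternal haplogroups described above can be illustrated with a minimal Python sketch. The mapping from haplogroups to regions follows the text (R for Central Asia, I and J for Western Asia, C, D, N and O for East Asia); the function name `regional_shares` and the sample list are hypothetical and invented purely for illustration. They are not data or methods from the study, whose actual figures are in Xu and Wen (2017).

```python
from collections import Counter

# Regional grouping of Y-DNA haplogroups as described in the text.
HAPLOGROUP_REGION = {
    "R": "Central Asia",
    "I": "Western Asia",
    "J": "Western Asia",
    "C": "East Asia",
    "D": "East Asia",
    "N": "East Asia",
    "O": "East Asia",
}


def regional_shares(haplogroups):
    """Return the fraction of samples assigned to each region.

    Haplogroups missing from HAPLOGROUP_REGION are counted as "Other".
    An empty input yields an empty dict.
    """
    counts = Counter(HAPLOGROUP_REGION.get(h, "Other") for h in haplogroups)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {region: n / total for region, n in counts.items()}


# HYPOTHETICAL sample of ten individuals, invented for illustration only;
# these are NOT frequencies reported by the project.
sample = ["R"] * 5 + ["J"] * 2 + ["O"] * 2 + ["C"] * 1
shares = regional_shares(sample)

for region, share in sorted(shares.items()):
    print(f"{region}: {share:.0%}")
```

Such a tally only summarizes which regional component dominates a paternal gene pool; it says nothing by itself about when the admixture took place.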
Our research has shown that the complex linguistic situation in the Gansu-Qinghai area is closely related to the composition of its speaker populations. The historical migration from Central Asia to China mentioned above affected each of these populations and languages differently. An interdisciplinary approach linking linguistics, molecular biology, and history will help us provide a historical description that is closer and more faithful to reality.
Dan XU
