
Evaluation of methods was done on a validation set of 108 pairs.Īlignment of the word vectors was performed with the Procrustes method in which an orthogonal weight matrix is learned to map source to target word vectors (Conneau et al., 2018) and with the Retrieval Criterion following (Joulin et al., 2018). A dictionary consisting of 1097 word pairs was scraped and manually edited for supervised alignment. Given the absence of parallel data, we performed Unsupervised Translation, which relies on cross-lingual embeddings. We initialized word vectors with Glove and fine tuned on the corpus with a CBOW model trained with 8 negative samples, window size of 5, dimension of 300 and a batch size of 3000 for 5 epochs. dem say na serious gbege if dem catch anybody with biabia for inside di campus dis one na one of di first songs wey commot dis year for nigeria but as dem release am, yawa dey. Below are examples of sentences in the corpus: In total, we obtained a corpus consisting of 56048 sentences and 32925 unique words by scraping pidgin news websites. Obtaining Corpus and Training Word Vectors Unsupervised Machine Translation from Pidgin to Englishġ.Cross-lingual embedding of Pidgin and English.Provision of a Pidgin corpus and training of Pidgin Word Vectors.The problems this research addresses are the following: There are over 75 million speakers in Nigeria alone, however, there is no known Natural Language Processing work on this language. Despite the obvious diversity amongst these languages, one language significantly unifies them all - Pidgin English. Over 1000 languages are spoken across West and Central Africa, with over 250 of them being Nigerian. However, little work has been done on African languages. Many Machine Translation works have focused on popular languages like English, French, German, Chinese and so on. Translation is an important area of research in Artificial Intelligence, and, most of all, communication.
Pidgin english translator code#
You can skip to the Results section at the end of the article to see some example translations by our model and the link to the code on github. TLDR: We trained a model that can translate sentences from West African Pidgin (Creole) to English - and vice versa - without showing it a single parallel sentence (a Pidgin sentence and its English equivalent) to learn from.
