Yoruba Language Task Description:
The task is to develop low-level NLP tools for Standard Yorùbá.
Standard Yorùbá is a language spoken by over 35 million of Nigeria's 200 million people, and in other countries such as Benin, Togo, Ghana, Côte d'Ivoire, Sudan and Sierra Leone. Outside Africa, large numbers of speakers live in Brazil, Cuba, Haiti, the Caribbean islands, Trinidad and Tobago, the UK and America.
It is among the under-resourced languages of the world, i.e. languages for which limited digital resources exist and whose computerization therefore poses unique challenges. These challenges include the non-availability of electronic lexica, standardized electronic corpora, and NLP tools such as a Part-of-Speech (POS) tagger, a Multilevel Segment Tokenizer and a Morphology Analyzer (MA).
For this task, we are providing medium-sized data sets for Yorùbá for developing a tokenizer, a morphological analyzer and an Automatic Language Identification system.
Multilevel Segment Tokenization is defined as the task of decomposing a stream of text into its segmental units (i.e. at the phone, syllable and word levels).
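As an illustration only, the word level and an approximate letter level of such a tokenizer might be sketched as follows. This is a minimal sketch, not the organizers' tool: the function names `word_tokens` and `letter_units` are hypothetical, orthographic letters are used as a rough stand-in for phones, the digraph "gb" is treated as one unit, and the syllable level is left out because it needs language-specific rules.

```python
import re
import unicodedata

# A word is a run of letters and combining marks (tone marks, underdots).
# Python's \w does not match combining marks, so they are listed explicitly.
WORD_RE = re.compile(r"(?:[^\W\d_]|[\u0300-\u036f])+")


def word_tokens(text):
    """Return the word-level tokens of `text`, with diacritics kept attached."""
    normalized = unicodedata.normalize("NFC", text)
    return WORD_RE.findall(normalized)


def letter_units(word):
    """Split a word into orthographic letter units.

    Combining marks stay with their base letter, and the digraph "gb"
    is kept together as a single unit (an assumption of this sketch).
    """
    units = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and units:
            units[-1] += ch          # attach diacritic to the previous letter
        elif ch.lower() == "b" and units and units[-1].lower() == "g":
            units[-1] += ch          # merge "g" + "b" into the digraph "gb"
        else:
            units.append(ch)
    return [unicodedata.normalize("NFC", unit) for unit in units]


if __name__ == "__main__":
    for token in word_tokens("Yoru\u0300ba\u0301 gbogbo"):
        print(token, letter_units(token))
```

Normalizing to a single Unicode form before matching matters here, because the same Yorùbá letter can be encoded either as one precomposed character or as a base letter followed by combining marks.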
Morphological analysis (MA) is an important NLP task used for quick and accurate analysis of text for automatic translation. The task involves breaking words down into morphemes and grammatical constituents.
Automatic Language Identification (ALI) is a system that detects the language in which a document is written. ALI has a variety of applications, e.g. applying text processing techniques to real-world data, information storage and retrieval, and detecting the language of a document for machine translation.
The task has 3 subtasks:
4.A Multilevel Segment Tokenizer
4.B Morphology Analyzer
4.C Automatic Language Identification
The link to the data set for the task will be made available soon.
The standard evaluation metric for evaluating and ranking the teams will be the macro-averaged F1 score.
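For reference, macro-averaged F1 is the unweighted mean of the per-label F1 scores, so rare labels count as much as frequent ones. The sketch below shows one common way to compute it; the function name `macro_f1` and the example tags are hypothetical, and it assumes the average is taken over every label that appears in either the gold or the predicted sequence, which may differ from the organizers' scorer.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Unweighted mean of per-label F1 over all labels seen in gold or predicted."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            true_pos[g] += 1
        else:
            false_pos[p] += 1   # p was predicted but is wrong
            false_neg[g] += 1   # g was missed

    labels = set(gold) | set(predicted)
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * true_pos[label] + false_pos[label] + false_neg[label]
        scores.append(2 * true_pos[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    gold = ["N", "V", "N", "ADJ"]        # hypothetical tags
    predicted = ["N", "N", "N", "ADJ"]
    print(round(macro_f1(gold, predicted), 3))
```

In this example the per-label F1 scores are 0.8 for N, 0.0 for V and 1.0 for ADJ, giving a macro average of 0.6.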
A simple probabilistic baseline (the most frequent tag is assigned to each token) will be provided by the organizers.
The training data set will be made available by 22nd April, 2019. Other deadlines are as per the workshop schedule.
Results will be made available as per the workshop schedule.
Paper submission instructions will be the same as for the workshop.
If you have any queries regarding this task, please contact the task organizers:
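A most-frequent-tag baseline of this kind can be sketched as follows. This is one plausible reading, not the organizers' implementation: each token receives the tag it most often carried in the training data, and unseen tokens fall back to the overall most frequent tag. The names `train_baseline` and `predict`, and the tags in the example, are hypothetical.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_tokens):
    """Learn the most frequent tag per token and the overall most frequent tag.

    `tagged_tokens` is an iterable of (token, tag) pairs.
    """
    per_token = defaultdict(Counter)
    overall = Counter()
    for token, tag in tagged_tokens:
        per_token[token][tag] += 1
        overall[tag] += 1
    if not overall:
        raise ValueError("training data is empty")

    default_tag = overall.most_common(1)[0][0]
    lookup = {tok: counts.most_common(1)[0][0] for tok, counts in per_token.items()}
    return lookup, default_tag


def predict(tokens, lookup, default_tag):
    """Tag each token with its most frequent training tag, or the default if unseen."""
    return [lookup.get(token, default_tag) for token in tokens]


if __name__ == "__main__":
    training = [("a", "X"), ("a", "X"), ("a", "Y"), ("b", "Y"), ("c", "X")]
    lookup, default_tag = train_baseline(training)
    print(predict(["a", "b", "z"], lookup, default_tag))
```

Because the baseline ignores context entirely, its score gives a floor that any context-aware system should be expected to beat.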
Adeyanju Sosimi email@example.com
Sunday O. Ojo <OjoSO@tut.ac.za>