A finite state transducer for morphological segmentation of Swahili verbs/ (Record no. 88540)

MARC details
000 -LEADER
fixed length control field 03469nam a2200217 4500
003 - CONTROL NUMBER IDENTIFIER
control field KE-MeUCS
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20240430113817.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 240409b xxu||||| |||| 00| 0 eng d
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 000000
040 ## - CATALOGING SOURCE
Transcribing agency KE-MeUCS
Modifying agency KE-MeUCS
050 ## - LIBRARY OF CONGRESS CALL NUMBER
Classification number QA76.76.M38 2023
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Muthee George Mutwiri
245 ## - TITLE STATEMENT
Title A finite state transducer for morphological segmentation of Swahili verbs/
Statement of responsibility, etc George Mutwiri Muthee
260 ## - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc Meru
Name of publisher, distributor, etc George Mutwiri Muthee
Date of publication, distribution, etc 2023
300 ## - PHYSICAL DESCRIPTION
Extent xi, 120 p.
500 ## - GENERAL NOTE
General note A thesis submitted in partial fulfillment of the requirements for the conferment of the degree of Master of Science in Computer Science of Meru University of Science and Technology
520 ## - SUMMARY, ETC.
Summary, etc Morphological segmentation is a subtask of natural language processing (NLP) that focuses on identifying the constituent morphemes of the words in a language. As a subtask of morphological analysis, morphological segmentation is a crucial preprocessing step that improves the overall output of an NLP system. Low-resource languages have recently received research attention, with researchers aiming to improve NLP for these languages. However, finer language-specific details are often overlooked, which has led to low-quality results in a variety of studies. Since morphological analysis is an important NLP task, extracting the morphological structure of verbs remains crucial. In this work, the researcher demonstrates a finite state transducer for the morphological segmentation of the Swahili verb. The research set out to achieve four objectives: to analyze key parameters of morphological segmentation techniques for Swahili verbs; to implement a web scraper to populate a dataset of Swahili verbs; to integrate the morphological segmentation parameters into a finite state transducer for Swahili verb segmentation; and to validate the finite state transducer on Swahili verbs. The model performs morphological analysis of the Swahili verb by identifying morphological slots such as the subject, object and derivational suffixes, and by flagging any grammatical errors within the verb. It was implemented as a finite state network built out of regular expressions in an object-oriented programming (OOP) language. The same finite state transducer was also implemented in the Xerox Finite State Tool (XFST). Input verbs were extracted from an online dictionary using a web scraper and separated into two datasets: dataset A comprised 163 simple Swahili verbs, while dataset B comprised 715 non-Arabic verbs. The OOP model outperformed its XFST counterpart, achieving 98.77% accuracy on dataset A and 68.67% accuracy on dataset B.
The experimental results show that, on these datasets, the OOP rule-based implementation performs better than its XFST-based counterpart. The research was quantitative, with the accuracy of the models evaluated through experiments. This work is beneficial for optimizing Swahili search engines, where verbal keywords need to be segmented to obtain their roots. It can also assist learners new to Swahili in understanding the structure of the verb and in exploring the combinations of morphemes that make up a correctly formed verb. Further, the work contributes towards the development of a spell checker, a corpus and a syntax analyzer for Swahili verbs.
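To illustrate what "a finite state network built out of regular expressions" that identifies verb slots can look like, the following is a minimal sketch, not the thesis's implementation. It assumes Python (the summary does not name the OOP language), a hypothetical slot order of negation, subject, tense/aspect, object, root, derivational extensions and final vowel, and deliberately tiny, hypothetical morpheme and root inventories. Verbs the pattern rejects (for example, a form missing its final vowel) return `None`, a crude stand-in for flagging an ill-formed verb.

```python
import re
from typing import List, Optional

# Hypothetical toy inventories for illustration only; they are NOT the
# morpheme lists, root lexicon, or rules used in the thesis.
ROOTS = ["pend", "pig", "pik", "som", "fany"]

# One regular expression whose named groups correspond to verb slots.
VERB = re.compile(
    r"(?P<neg>ha)?"                        # optional negation prefix
    r"(?P<subj>ni|tu|wa|u|m|a)"            # subject concord
    r"(?P<tam>na|li|ta|me|ku)"             # tense/aspect marker
    r"(?P<obj>ni|ku|mw|tu|wa|m)?"          # optional object concord
    r"(?P<root>" + "|".join(ROOTS) + r")"  # root drawn from a small lexicon
    r"(?P<ext>(?:ish|esh|an|i|e|w)*)"      # causative, reciprocal, applicative, passive
    r"(?P<final>a)"                        # final vowel
)
EXT_TOKEN = re.compile(r"ish|esh|an|i|e|w")
SLOTS = ("neg", "subj", "tam", "obj", "root", "ext", "final")


def segment(verb: str) -> Optional[List[str]]:
    """Return the verb's morphemes in slot order, or None if the toy grammar rejects it."""
    match = VERB.fullmatch(verb.lower())
    if match is None:
        return None
    morphemes: List[str] = []
    for slot in SLOTS:
        value = match.group(slot)
        if not value:
            continue  # empty or absent optional slot
        if slot == "ext":
            # Split the run of extensions into individual suffixes.
            morphemes.extend(EXT_TOKEN.findall(value))
        else:
            morphemes.append(value)
    return morphemes


if __name__ == "__main__":
    for word in ["hawakunipenda", "tunasomeshwa", "wanapendana", "ninapend"]:
        result = segment(word)
        print(word, "->", "-".join(result) if result else "rejected")
```

With these inventories, `hawakunipenda` segments as ha-wa-ku-ni-pend-a and `tunasomeshwa` as tu-na-som-esh-w-a. Restricting roots to a lexicon keeps the extension slot from swallowing root-final letters (as would happen with, say, "-kana" under a free-form root pattern); a compiled transducer such as one built in XFST would encode the same lexicon and slot constraints as a network rather than a single pattern.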
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element finite state transducer
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element morphological segmentation of Swahili verbs
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Library of Congress Classification
Koha item type Thesis
Cataloguer Intern
Holdings
Source of classification or shelving scheme Library of Congress Classification
Home library Meru University
Current library Meru University
Shelving location Periodical Section
Date acquired 09/04/2024
Source of acquisition -
Cost, normal purchase price 0.00
Cataloger Intern
Full call number QA76.76.M38 2023
Barcode 24-37888
Date last seen 09/04/2024
Price effective from 09/04/2024
Koha item type Thesis


Meru University of Science and Technology | P.O. Box 972-60200 Meru. | Tel 020 2092048 Fax 0208027449 | Email: library@must.ac.ke