Method for Developing Phonetically Rich and Balanced Lexical Corpus
Potential Commercialised
Reg. ID : 16960
Description
The various embodiments of the present invention provide a method for developing a phonetically rich and balanced lexical corpus. The method comprises accumulating a plurality of sentences in a target language from a plurality of data sources. Sentences collected from web sources are raw: they contain unstructured data such as duplicated sentences, alien words, special characters, and boilerplate. At least one sentence is selected from the accumulated sentences such that the selection is phonetically rich and balanced while keeping the database relatively small. The plurality of selected sentences is then evaluated to create a balanced database. The result of the method is a phonetically rich and balanced lexical corpus that is only a fraction of the size of the initial lexical database.
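The description does not state how sentences are cleaned or selected, so the following is only a minimal Python sketch of one plausible reading, not the patented method. It assumes that raw sentences containing characters outside a target alphabet are discarded, that duplicates are dropped, and that selection is a greedy coverage heuristic. The function names (`clean_sentences`, `units`, `select_rich_subset`) are hypothetical, and letter bigrams stand in for phonetic units; a real system would use a pronunciation lexicon or grapheme-to-phoneme converter for the target language.

```python
import re


def clean_sentences(raw_sentences, allowed_chars="abcdefghijklmnopqrstuvwxyz '"):
    """Normalise whitespace and case, drop sentences containing characters
    outside the (assumed) target alphabet, and remove duplicates."""
    seen = set()
    cleaned = []
    for sentence in raw_sentences:
        text = re.sub(r"\s+", " ", sentence).strip().lower().rstrip(".!?")
        if not text or any(ch not in allowed_chars for ch in text):
            continue  # empty, or contains special characters / alien symbols
        if text in seen:
            continue  # duplicate sentence
        seen.add(text)
        cleaned.append(text)
    return cleaned


def units(sentence):
    """Hypothetical stand-in for phonetic units: letter bigrams inside words."""
    result = set()
    for word in sentence.split():
        result.update(word[i:i + 2] for i in range(len(word) - 1))
    return result


def select_rich_subset(sentences):
    """Greedy coverage: repeatedly pick the sentence that adds the most
    not-yet-covered units, until every unit in the input is covered."""
    target = set().union(*(units(s) for s in sentences))
    covered = set()
    selected = []
    remaining = list(sentences)
    while covered != target:
        best = max(remaining, key=lambda s: len(units(s) - covered))
        covered |= units(best)
        selected.append(best)
        remaining.remove(best)
    return selected


raw = [
    "The cat sat.",
    "the cat  sat",                 # duplicate after normalisation
    "Click here >> to subscribe!",  # boilerplate with special characters
    "A dog ran",
    "The cat ran",
    "The dog sat",
]
cleaned = clean_sentences(raw)
selected = select_rich_subset(cleaned)
print(len(cleaned), "cleaned sentences ->", len(selected), "selected")
```

This sketch only maximises coverage of units. Balancing their relative frequencies, which the evaluation step in the description implies, would need an additional scoring criterion that the source does not specify.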
Contact Person/Inventor
Name | Email | Contact Phone |
---|---|---|
UM Centre of Innovation and Enterprise (UMCIE) | umcie@um.edu.my | 013-2250151 / 03-79677351 |
Award
Award Title | Award Achievement | Award Year Received |
---|---|---|
0 | 0 | 0 |