kun conjunction | ano pronoun (interrogative) | kadakó predicative | an determiner (common) | butones referential | sugad predicative | man clitic (class 3) | an determiner (common) | kadákó predicative | han determiner (common) | ohales referential |
Part of Speech | Example |
---|---|
tr.ir.imp. (transitive.irrealis.imperative) | -kuha, get (something) |
tr.ir.prt. (transitive.irrealis.partitive) | -kuhai, get some |
tr.ir.inty. (transitive.irrealis.intiretive) | -kuhaa, get the entire thing |
tr.ir.prd. (transitive.irrealis.predictive) | -kuhaon, will get s.t. |
intr.ir.prd. (intransitive.irrealis.predictive) | -makuha, will get s.t. |
intr.ir.dcd. (intransitive.irrealis.decided) | -tikuha, will get s.t. |
tr.r.n. (transitive.realis.neutral) | -nakuha, got s.t. (long syllable on ku, naKUha) |
intr.r.n. (intransitive.realis.neutral) | -nakuha, is getting s.t. (long syllables on both first & 2nd, NAKUha) |
tr.r.cntrl. (transitive.realis.controlled) | -kinuha, got s.t. |
intr.r.cntrl. (intransitive.realis.controlled) | -kinuha, got s.t. (long syllables on both first & 2nd, KINUha) (Eastern Samar variety); -kumuha, got s.t. (Leyte-Samar variety) |
tr.r.del. (transitive.realis.deliberate) | -ginkuha, got s.t. |
This algorithm is based on principles outlined by Voltaire Oyzon in "A Corpus-based study of the morphosyntactic functions of Waray substantive lexical items" (2020). It uses a dictionary of known syntactic patterns (a word's location in the clause) and morphological patterns (prefixes and suffixes) in the Waray language to evaluate 23 rules (below). It then applies a scoring system to estimate the probability that the target word is predicative (verb), referential (noun), or modificative (adjective).
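As a rough illustration of the scoring step, the sketch below adds up the m/r/p points of every rule that fires and normalizes the totals into probabilities. The weights are copied from four rows of the rule table; the rule names (`followed_by_nga` and so on), the function names, and the normalization by the grand total are assumptions made for this example, not a description of the actual implementation.

```python
# Weights copied from four rows of the rule table. The key names are hypothetical.
RULE_WEIGHTS = {
    "followed_by_nga": {"m": 5, "r": 1, "p": 0},
    "followed_by_pronoun": {"m": 0, "r": 0, "p": 1},
    "begins_clause": {"m": 0, "r": 0, "p": 3},
    "ends_clause": {"m": 0, "r": 3, "p": 0},
}


def score(fired_rules):
    """Sum the modificative/referential/predicative points of the fired rules."""
    totals = {"m": 0, "r": 0, "p": 0}
    for rule in fired_rules:
        for label, points in RULE_WEIGHTS[rule].items():
            totals[label] += points
    return totals


def to_probabilities(totals):
    """Assumed normalization: each label's share of all points awarded."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No rule fired, so every label is equally likely.
        return {label: 1 / len(totals) for label in totals}
    return {label: points / grand_total for label, points in totals.items()}


# Which rules fire for a given word is assumed here for illustration.
totals = score(["followed_by_nga", "ends_clause"])
probabilities = to_probabilities(totals)
best_guess = max(probabilities, key=probabilities.get)
```

With these two rules the totals are m: 5, r: 4, p: 0, so the sketch would rank modificative slightly ahead of referential.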
Common modifiers (e.g., "la", "pa", "gad", "ngay-an") are often inserted between the substantive words that would otherwise indicate part of speech. The algorithm therefore ignores them when evaluating syntax. For example, it parses "gin-aanak pa la hiya" as "gin-aanak hiya" and can then identify that a pronoun ("hiya") follows the word "gin-aanak".
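A minimal sketch of this filtering step, assuming whitespace tokenization. The set below contains only the four modifiers named above (the real list is presumably longer), and `strip_modifiers` is a hypothetical helper name.

```python
# Only the modifiers named in the text; the full list is not given here.
IGNORED_MODIFIERS = {"la", "pa", "gad", "ngay-an"}


def strip_modifiers(tokens):
    """Drop common modifiers so substantive words become adjacent."""
    return [t for t in tokens if t.lower() not in IGNORED_MODIFIERS]


tokens = strip_modifiers("gin-aanak pa la hiya".split())
# "hiya" now directly follows "gin-aanak", so a
# "followed by a pronoun" check can look at tokens[1].
```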
For similar reasons, clause-initial words ("kun", "kay", "ano") are ignored. For example, in "Kun ano kadakó an butones sugad man an kadákó han ohales", the algorithm treats "kadakó" as the beginning of the clause for the purposes of identifying part of speech.
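The same idea can be sketched for clause openers: skip every leading opener and treat the first remaining word as the clause start. The set holds only the three words listed above, and `clause_start_index` is a hypothetical name.

```python
# Only the clause openers named in the text.
CLAUSE_OPENERS = {"kun", "kay", "ano"}


def clause_start_index(tokens):
    """Return the index of the first token that is not a clause opener."""
    i = 0
    while i < len(tokens) and tokens[i].lower() in CLAUSE_OPENERS:
        i += 1
    return i


sentence = "Kun ano kadakó an butones sugad man an kadákó han ohales"
tokens = sentence.split()
start = clause_start_index(tokens)
first_word = tokens[start]  # the word treated as beginning the clause
```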
The part of speech of adjacent words often indicates a word's likely part of speech. For example, a predicative is less likely to appear next to another predicative than next to a modificative or referential. The algorithm therefore evaluates the part of speech of adjacent words to predict the target word's part of speech. It does this in two steps: first, it checks the Waray dictionary for the adjacent word's part of speech; if the word is not found in the dictionary, it applies its own part-of-speech guessing algorithm to the adjacent word (the algorithm uses itself to improve its guess for the target word!). If it can establish a high probability for the adjacent word's part of speech, it can then apply 'adjacency' rules to the target word.
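The lookup-then-guess fallback might look roughly like the sketch below. The dictionary entry, the 0.7 confidence threshold, the depth limit that stops the recursion, and the prefix-only `toy_guess` stand-in are all assumptions for illustration; only the three "precedes target word" weights come from the rule table.

```python
# Hypothetical dictionary entry, for illustration only.
DICTIONARY = {"babayi": "referential"}

# Assumed cutoff for trusting a guessed part of speech.
CONFIDENCE_THRESHOLD = 0.7

# "X precedes target word" weights, copied from the rule table.
PRECEDING_WEIGHTS = {
    "predicative": {"m": 3, "r": 3, "p": 0},
    "modificative": {"m": 0, "r": 3, "p": 3},
    "referential": {"m": 3, "r": 0, "p": 3},
}
NO_POINTS = {"m": 0, "r": 0, "p": 0}


def toy_guess(tokens, i, depth):
    """Stand-in for the full guesser: trusts two verbal prefixes only.

    A real guesser would pass `depth` on when it inspects its own neighbors.
    """
    word = tokens[i].lower()
    if word.startswith(("nag", "gin")):
        return "predicative", 0.8
    return None, 0.0


def neighbor_pos(tokens, i, guess=toy_guess, depth=0, max_depth=1):
    """Dictionary lookup first; otherwise fall back to the guesser."""
    if not 0 <= i < len(tokens):
        return None
    word = tokens[i].lower()
    if word in DICTIONARY:
        return DICTIONARY[word]
    if depth >= max_depth:
        return None  # assumed guard against unbounded recursion
    label, confidence = guess(tokens, i, depth + 1)
    return label if confidence >= CONFIDENCE_THRESHOLD else None


def preceding_points(tokens, i, guess=toy_guess):
    """Points the target at index i earns from the word before it."""
    pos = neighbor_pos(tokens, i - 1, guess)
    return PRECEDING_WEIGHTS.get(pos, NO_POINTS)


tokens = "Kay it' babayi nabuburod man".split()
points = preceding_points(tokens, 3)  # target: "nabuburod"
```

Here "babayi" is found in the (hypothetical) dictionary as a referential, so the target "nabuburod" receives the "Referential precedes target word" weights without any guessing.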
At present, this algorithm does not evaluate infixes, which are a common feature of Filipino languages (e.g., "palit" [buy] becomes "pumalit" [bought] with the infix "um"). In order for the algorithm to evaluate infixes, it would need a dictionary of Waray word roots. This effort is underway.
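To show why a root dictionary is needed, here is a hedged sketch of how infix detection could work once such a dictionary exists. The two-entry `ROOTS` set is a placeholder, `find_infix` is a hypothetical name, and the sketch only handles the "um" infix placed after a single initial consonant.

```python
# Placeholder root dictionary; the real one is still being built.
ROOTS = {"palit", "kuha"}

# Only the infix named in the text.
INFIXES = ("um",)


def find_infix(word, roots=ROOTS, infixes=INFIXES):
    """Return (root, infix) if removing an infix yields a known root."""
    word = word.lower()
    for infix in infixes:
        # The infix sits right after the first consonant: p-um-alit.
        if word[1:1 + len(infix)] == infix:
            candidate = word[0] + word[1 + len(infix):]
            if candidate in roots:
                return candidate, infix
    return None


result = find_infix("pumalit")
```

Without the root check, any word whose second and third letters happen to be "um" would be misread as infixed, which is why the sketch refuses to report an infix unless the remainder is a known root.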
A planned enhancement is to add corpus similarity comparison to the algorithm: if a sample sentence can be found in the corpus that demonstrates sufficient similarity (e.g., the adjacent words in the sample sentence are the same as in the evaluated sentence), the dictionary's part of speech can be factored into the probability.
A caveat about part-of-speech tagging algorithms: they cannot be 100% accurate. To give just one example from English, in the sentence "Working late into the night is draining," the word "working" functions as a referential. However, when the same phrase appears in "Working late into the night, Mark was drained," "working" functions as a predicative.
Rule | Example (target word is underlined) | Weight |
---|---|---|
Target word is preceded by a pronoun | ...harayo kaupay han akon gindakoan | m: 0 points r: 2 points p: 0 points |
Target word is followed by a pronoun | Dinako ako nga waray kag-anak | m: 0 points r: 0 points p: 1 points |
Target word is preceded by "nga" | ...makit-an it' usa nga burod | m: 0 points r: 1 points p: 0 points |
Target word is followed by "nga" | Nakabati ka na han burod nga lalaki nga baboy? | m: 5 points r: 1 points p: 0 points |
Predicative precedes target word | Harayo na gud ngay-an an ak' ginkaturungan kagab-i nga salida | m: 3 points r: 3 points p: 0 points |
Modificative precedes target word | Waray palad nga maraut, waray palad nga maupay. | m: 0 points r: 3 points p: 3 points |
Referential precedes target word | Kay it' babayi nabuburod man; | m: 3 points r: 0 points p: 3 points |
Predicative follows target word | | m: 3 points r: 3 points p: 0 points |
Modificative follows target word | | m: 0 points r: 3 points p: 3 points |
Referential follows target word | | m: 3 points r: 0 points p: 3 points |
Following word likely indicates target is predicative | Min, magpapatron na | m: 0 points r: 0 points p: 3 points |
Following word suggests target is predicative | Kun ano kadakó an butones sugad man an kadákó han ohales. | m: 0 points r: 0 points p: 1 points |
Preceding word likely indicates target is referential | | m: 0 points r: 3 points p: 0 points |
Preceding word suggests target is referential | Waray hunong an dalagan. | m: 0 points r: 1 points p: 0 points |
Preceding word likely indicates target is modificative | | m: 3 points r: 0 points p: 0 points |
Target word begins clause | Nagtadong hiya ngan nag-asawa | m: 0 points r: 0 points p: 3 points |
Target word ends clause | Dii liwat pwede sumakob it' táwo | m: 0 points r: 3 points p: 0 points |
Prefix likely indicates predicative | Nakit-an ko hi Papa Jesus nga gin-aanak pa la hiya | m: 0 points r: 0 points p: 4 points |
Prefix suggests predicative | Didto han tabo ha Palo an ak' tawgi napalit mo intawon; Nagtadong hiya ngan nag-asawa | m: 0 points r: 0 points p: 1 points |
Prefix suggests modificative | | m: 1 points r: 0 points p: 0 points |
Suffix suggests predicative | | m: 0 points r: 0 points p: 1 points |
Suffix suggests modificative, less likely referential | | m: 2 points r: 1 points p: 0 points |
Suffix suggests referential | Waray hunong an dalagan. | m: 0 points r: 1 points p: 0 points |