Part of Speech Identifier


Sentence to be tagged

kun: conjunction
ano: pronoun (interrogative)
kadakó: predicative
an: determiner (common)
butones: referential
sugad: predicative
man: clitic (class 3)
an: determiner (common)
kadákó: predicative
han: determiner (common)
ohales: referential


Reference: Tagging of Verbal predicates

Part of Speech - Example
tr.ir.imp. (transitive.irrealis.imperative) - kuha, get (something)
tr.ir.prt. (transitive.irrealis.partitive) - kuhai, get some
tr.ir.inty. (transitive.irrealis.intiretive) - kuhaa, get the entire thing
tr.ir.prd. (transitive.irrealis.predictive) - kuhaon, will get s.t.
intr.ir.prd. (intransitive.irrealis.predictive) - makuha, will get s.t.
intr.ir.dcd. (intransitive.irrealis.decided) - tikuha, will get s.t.
tr.r.n. (transitive.realis.neutral) - nakuha, got s.t. (long syllable on ku: naKUha)
intr.r.n. (intransitive.realis.neutral) - nakuha, is getting s.t. (long syllables on both the first and second: NAKUha)
tr.r.cntrl. (transitive.realis.controlled) - kinuha, got s.t.
intr.r.cntrl. (intransitive.realis.controlled) - kinuha, got s.t. (long syllables on both the first and second: KINUha) (Eastern Samar variety); kumuha, got s.t. (Leyte-Samar variety)
tr.r.del. (transitive.realis.deliberate) - ginkuha, got s.t.


Methodology of the Waray Part of Speech Identifier

This algorithm is based on principles outlined by Voltaire Oyzon in "A Corpus-based study of the morphosyntactic functions of Waray substantive lexical items" (2020). It uses a dictionary of known syntax (location in clause) and morphology (prefix, suffix) patterns in the Waray language to evaluate 23 rules (below). It then applies a scoring system to estimate the probability that the target word is a predicative (verb), referential (noun), or modificative (adjective).
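The morphology rules can be pictured as a lookup from affix to weight. The sketch below is illustrative only: the prefix lists are assumptions inferred from examples on this page (gin-aanak, nag-asawa, napalit), not the tool's actual dictionary, and `prefix_scores` is a hypothetical name. The weights (4 points and 1 point toward predicative) come from the rule table.

```python
# Hypothetical prefix lists, inferred from this page's examples only.
STRONG_PREDICATIVE_PREFIXES = ("gin",)      # "likely indicates predicative": p +4
WEAK_PREDICATIVE_PREFIXES = ("nag", "na")   # "suggests predicative": p +1


def prefix_scores(word):
    """Return points for m (modificative), r (referential), p (predicative)."""
    scores = {"m": 0, "r": 0, "p": 0}
    lowered = word.lower()
    # str.startswith accepts a tuple and matches any of its members.
    if lowered.startswith(STRONG_PREDICATIVE_PREFIXES):
        scores["p"] += 4
    elif lowered.startswith(WEAK_PREDICATIVE_PREFIXES):
        scores["p"] += 1
    return scores


print(prefix_scores("gin-aanak"))  # {'m': 0, 'r': 0, 'p': 4}
print(prefix_scores("napalit"))    # {'m': 0, 'r': 0, 'p': 1}
```

A plain string-prefix test like this would also fire on any word that merely begins with the same letters, which is one reason a real implementation needs a curated pattern dictionary.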

Common modifiers (e.g., "la", "pa", "gad", "ngay-an") are often inserted between the substantive words that would otherwise indicate part of speech, so the algorithm ignores them when evaluating syntax. For example, it parses "gin-aanak pa la hiya" as "gin-aanak hiya" and can therefore identify that a pronoun ("hiya") follows the word "gin-aanak".
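A minimal sketch of this filtering step, assuming whitespace tokenization and only the four modifiers named above (the tool's full list is not given here; `strip_modifiers` is a hypothetical name):

```python
# Modifiers named in the text; the tool's complete list is not shown on this page.
COMMON_MODIFIERS = {"la", "pa", "gad", "ngay-an"}


def strip_modifiers(tokens):
    """Drop common modifiers so substantive neighbours become adjacent."""
    return [t for t in tokens if t.lower() not in COMMON_MODIFIERS]


tokens = "gin-aanak pa la hiya".split()
print(strip_modifiers(tokens))  # ['gin-aanak', 'hiya']
```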

For similar reasons, clausal beginnings ("kun", "kay", "ano") are ignored. For example, in "Kun ano kadakó an butones sugad man an kadákó han ohales", the algorithm treats "kadakó" as the beginning of the clause for the purposes of identifying part of speech.
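The same idea can be sketched for clause openers. This assumes whitespace tokenization and only the three openers listed above; `clause_start_index` is a hypothetical helper, not the tool's own function.

```python
# Clause openers named in the text; the tool may recognise others.
CLAUSE_OPENERS = {"kun", "kay", "ano"}


def clause_start_index(tokens):
    """Return the index of the first token that is not a clause opener."""
    i = 0
    while i < len(tokens) and tokens[i].lower() in CLAUSE_OPENERS:
        i += 1
    return i


tokens = "Kun ano kadakó an butones sugad man an kadákó han ohales".split()
start = clause_start_index(tokens)
print(tokens[start])  # kadakó
```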

The part of speech of adjacent words often indicates a word's likely part of speech. For example, a predicate is less likely to sit next to another predicate than next to a modifier or referential. The algorithm therefore evaluates the part of speech of adjacent words to predict the target word's part of speech. It does this in two steps: first, it checks the Waray dictionary for the adjacent word's part of speech; if the word is not found there, it applies its own part-of-speech guessing algorithm to the adjacent word (the algorithm uses itself to improve its guess for the target word!). If it can establish that word's part of speech with high probability, it then applies the 'adjacency' rules to the target word.
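The two-step lookup might be sketched as follows. Everything here is an assumption made for illustration: the function names, the toy dictionary and guesser, and the 0.7 cut-off for "high probability" (the page does not state the actual threshold).

```python
def neighbour_pos(word, dictionary, guess, threshold=0.7):
    """Return the neighbour's part of speech, or None if it is too uncertain.

    dictionary: maps known words to a tag ("m", "r" or "p").
    guess: callable returning (tag, probability) for an unknown word.
    threshold: assumed cut-off for a "high probability" guess.
    """
    tag = dictionary.get(word.lower())
    if tag is not None:
        return tag                      # step 1: dictionary lookup
    tag, probability = guess(word)      # step 2: fall back to the guesser
    return tag if probability >= threshold else None


# Toy stand-ins for the real dictionary and guesser.
toy_dictionary = {"butones": "r"}


def toy_guess(word):
    # Pretend words with the "nag" prefix are confidently predicative.
    return ("p", 0.9) if word.lower().startswith("nag") else ("r", 0.4)


print(neighbour_pos("butones", toy_dictionary, toy_guess))    # r
print(neighbour_pos("Nagtadong", toy_dictionary, toy_guess))  # p
print(neighbour_pos("ohales", toy_dictionary, toy_guess))     # None
```

In a design like this, the guesser applied to the neighbour would need to skip its own adjacency rules (or cap recursion depth) so that two unknown words do not wait on each other; how the actual tool handles this is not stated.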

At present, this algorithm does not evaluate infixes, which are a common feature of Filipino languages (e.g., "palit" [buy] becomes "pumalit" [bought] with the infix "um"). In order for the algorithm to evaluate infixes, it would need a dictionary of Waray word roots. This effort is underway.
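To illustrate why a root dictionary is needed, here is a hedged sketch of "um" infix detection. It covers only the pattern shown above (infix placed after a single initial consonant); the `roots` set is a toy stand-in and `strip_um_infix` is a hypothetical name.

```python
def strip_um_infix(word, roots):
    """Return the root if `word` is a known root with "um" after its first letter."""
    w = word.lower()
    # Pattern: first letter + "um" + remainder, e.g. p-um-alit -> palit.
    if len(w) > 3 and w[1:3] == "um":
        candidate = w[0] + w[3:]
        if candidate in roots:
            return candidate
    return None


roots = {"palit", "kuha"}  # toy root dictionary
print(strip_um_infix("pumalit", roots))  # palit
print(strip_um_infix("kumuha", roots))   # kuha
print(strip_um_infix("sumakob", roots))  # None (root not in this toy set)
```

Without the root check, any word whose second and third letters happen to be "um" would be misread as infixed.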

A planned enhancement is to add corpus similarity comparison to the algorithm: if a sample sentence can be found in the corpus that demonstrates sufficient similarity (e.g., the adjacent words in the sample sentence are the same as in the evaluated sentence), the dictionary's part of speech can be factored into the probability.

A caveat about part of speech tagging algorithms: they cannot be 100% accurate. To give just one example from English, in the sentence "Working late into the night is draining," the word "working" functions as a referential. However, in "Working late into the night, Mark was drained," the same word functions as a predicative.

Weights are points added toward m (modificative), r (referential), or p (predicative).

Rule | Example (target word is underlined) | m | r | p
Target word is preceded by a pronoun | ...harayo kaupay han akon gindakoan | 0 | 2 | 0
Target word is followed by a pronoun | Dinako ako nga waray kag-anak | 0 | 0 | 1
Target word is preceded by "nga" | ...makit-an it' usa nga burod | 0 | 1 | 0
Target word is followed by "nga" | Nakabati ka na han burod nga lalaki nga baboy? | 5 | 1 | 0
Predicative precedes target word | Harayo na gud ngay-an an ak' ginkaturungan kagab-i nga salida | 3 | 3 | 0
Modificative precedes target word | Waray palad nga maraut, waray palad nga maupay. | 0 | 3 | 3
Referential precedes target word | Kay it' babayi nabuburod man; | 3 | 0 | 3
Predicative follows target word | (none) | 3 | 3 | 0
Modificative follows target word | (none) | 0 | 3 | 3
Referential follows target word | (none) | 3 | 0 | 3
Following word likely indicates target is predicative | Min, magpapatron na | 0 | 0 | 3
Following word suggests target is predicative | Kun ano kadakó an butones sugad man an kadákó han ohales. | 0 | 0 | 1
Preceding word likely indicates target is referential | (none) | 0 | 3 | 0
Preceding word suggests target is referential | Waray hunong an dalagan. | 0 | 1 | 0
Preceding word likely indicates target is modificative | (none) | 3 | 0 | 0
Target word begins clause | Nagtadong hiya ngan nag-asawa | 0 | 0 | 3
Target word ends clause | Dii liwat pwede sumakob it' táwo | 0 | 3 | 0
Prefix likely indicates predicative | Nakit-an ko hi Papa Jesus nga gin-aanak pa la hiya | 0 | 0 | 4
Prefix suggests predicative | Didto han tabo ha Palo an ak' tawgi napalit mo intawon; Nagtadong hiya ngan nag-asawa | 0 | 0 | 1
Prefix suggests modificative | (none) | 1 | 0 | 0
Suffix suggests predicative | (none) | 0 | 0 | 1
Suffix suggests modificative, less likely referential | (none) | 2 | 1 | 0
Suffix suggests referential | Waray hunong an dalagan. | 0 | 1 | 0
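As a worked illustration of the scoring, the sketch below tallies the weights for "dalagan" in "Waray hunong an dalagan." (m, r, and p stand for modificative, referential, and predicative). Two assumptions are made here: that exactly these three rules fire for the word (the table cites this sentence for the first two, and the third is inferred from its sentence-final position), and that points are turned into probabilities by dividing by the total, which the page does not specify.

```python
from collections import Counter

# Weights copied from three rows of the rule table (m, r, p).
RULE_WEIGHTS = {
    "preceding word suggests referential": {"m": 0, "r": 1, "p": 0},
    "suffix suggests referential":         {"m": 0, "r": 1, "p": 0},
    "target word ends clause":             {"m": 0, "r": 3, "p": 0},
}


def tally(fired_rules):
    """Sum the points of every rule that fired for the target word."""
    total = Counter({"m": 0, "r": 0, "p": 0})
    for rule in fired_rules:
        total.update(RULE_WEIGHTS[rule])  # Counter.update adds counts
    return dict(total)


def normalise(scores):
    """Assumed normalisation: each tag's share of the total points."""
    points = sum(scores.values())
    if points == 0:
        return {tag: 0.0 for tag in scores}
    return {tag: value / points for tag, value in scores.items()}


scores = tally(RULE_WEIGHTS)  # assume all three rules fire for "dalagan"
print(scores)             # {'m': 0, 'r': 5, 'p': 0}
print(normalise(scores))  # {'m': 0.0, 'r': 1.0, 'p': 0.0}
```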