Part of Speech Identifier


Sentence to be tagged: Kun ano kadakó an butones sugad man an kadákó han ohales

kun: conjunction
ano: pronoun (interrogative)
kadakó: predicative
an: determiner (common)
butones: referential
sugad: predicative
man: clitic (class 3)
an: determiner (common)
kadákó: predicative
han: determiner (common)
ohales: referential


Reference: Tagging of Verbal Predicates

Part of speech: example
tr.ir.imp. (transitive.irrealis.imperative): kuha, get (something)
tr.ir.prt. (transitive.irrealis.partitive): kuhai, get some
tr.ir.inty. (transitive.irrealis.intiretive): kuhaa, get the entire thing
tr.ir.prd. (transitive.irrealis.predictive): kuhaon, will get s.t.
intr.ir.prd. (intransitive.irrealis.predictive): makuha, will get s.t.
intr.ir.dcd. (intransitive.irrealis.decided): tikuha, will get s.t.
tr.r.n. (transitive.realis.neutral): nakuha, got s.t. (long syllable on ku, naKUha)
intr.r.n. (intransitive.realis.neutral): nakuha, is getting s.t. (long syllables on both first & 2nd, NAKUha)
tr.r.cntrl. (transitive.realis.controlled): kinuha, got s.t.
intr.r.cntrl. (intransitive.realis.controlled): kinuha, got s.t. (long syllables on both first & 2nd, KINUha) (Eastern Samar variety); kumuha, got s.t. (Leyte-Samar variety)
tr.r.del. (transitive.realis.deliberate): ginkuha, got s.t.
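The affix patterns in this reference list can be sketched as a lookup from surface form to tag. This is a hypothetical illustration built only from the "kuha" examples above. The helper names `verbal_forms` and `tag_verb` are invented here, and applying the same affixes to other roots is an assumption, since some roots may take different forms. The sketch also shows why stress matters: without marked stress, "nakuha" and "kinuha" each map to two tags.

```python
def verbal_forms(root: str) -> dict[str, list[str]]:
    """Map each surface form of `root` to the tag(s) listed for it.

    Hypothetical: the affix patterns are generalized from the single root "kuha".
    """
    head, tail = root[0], root[1:]  # -in- and -um- go after the first consonant
    patterns = [
        ("tr.ir.imp.", root),                   # kuha
        ("tr.ir.prt.", root + "i"),             # kuhai
        ("tr.ir.inty.", root + "a"),            # kuhaa
        ("tr.ir.prd.", root + "on"),            # kuhaon
        ("intr.ir.prd.", "ma" + root),          # makuha
        ("intr.ir.dcd.", "ti" + root),          # tikuha
        ("tr.r.n.", "na" + root),               # nakuha (naKUha)
        ("intr.r.n.", "na" + root),             # nakuha (NAKUha): differs only in stress
        ("tr.r.cntrl.", head + "in" + tail),    # kinuha
        ("intr.r.cntrl.", head + "in" + tail),  # kinuha (KINUha), Eastern Samar
        ("intr.r.cntrl.", head + "um" + tail),  # kumuha, Leyte-Samar
        ("tr.r.del.", "gin" + root),            # ginkuha
    ]
    forms: dict[str, list[str]] = {}
    for tag, form in patterns:
        forms.setdefault(form, []).append(tag)
    return forms


def tag_verb(word: str, root: str) -> list[str]:
    """Return every tag whose pattern produces `word` from `root` (empty if none)."""
    return verbal_forms(root.lower()).get(word.lower(), [])


print(tag_verb("kuhaon", "kuha"))  # ['tr.ir.prd.']
print(tag_verb("nakuha", "kuha"))  # ['tr.r.n.', 'intr.r.n.']
```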


Methodology of the Waray Part of Speech Identifier

This algorithm is based on principles outlined by Voltaire Oyzon in "A Corpus-based study of the morphosyntactic functions of Waray substantive lexical items" (2020). It uses a dictionary of known syntactic patterns (position in the clause) and morphological patterns (prefixes and suffixes) in the Waray language to evaluate 23 rules (listed below). It then applies a scoring system to estimate the probability that the target word is a predicative (verb), referential (noun), or modificative (adjective).
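A minimal sketch of the final scoring step follows. It assumes, as most percentages in the baseline tests suggest, that each category's confidence is its share of the total points. The function names are hypothetical. In the baseline tests, rows where only a few points land in a single category (e.g., "táwo", which fires only "Target word ends clause") show 70% rather than 100%, so the actual tool appears to apply an extra adjustment that this sketch does not reproduce.

```python
def to_confidence(points: dict[str, int]) -> dict[str, float]:
    """Convert summed rule points into a percentage per category (m, p, r)."""
    total = sum(points.values())
    if total == 0:  # no rule fired: no basis for a guess
        return {pos: 0.0 for pos in points}
    return {pos: 100 * score / total for pos, score in points.items()}


def predict(points: dict[str, int]) -> tuple[str, dict[str, float]]:
    """Pick the category with the highest confidence (ties go to the first key)."""
    confidence = to_confidence(points)
    return max(confidence, key=confidence.get), confidence


# "ginpalit" in the baseline tests: its three rules add up to m 5, p 4, r 3.
best, confidence = predict({"m": 5, "p": 4, "r": 3})
print(best, {pos: round(pct) for pos, pct in confidence.items()})
# m {'m': 42, 'p': 33, 'r': 25}
```

The rounded result matches the baseline row for "ginpalit" (m 42%, p 33%, r 25%).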

Common modifiers (e.g., "la", "pa", "gad", "ngay-an") are often inserted between the substantive words whose order indicates part of speech. The algorithm therefore ignores them when evaluating syntax. For example, it parses "gin-aanak pa la hiya" as "gin-aanak hiya" and can thus identify that a pronoun ("hiya") follows the word "gin-aanak".
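A minimal sketch of this filtering step, assuming whitespace tokenization. The modifier set contains only the examples named above; the tool's real list is presumably longer, and the name `syntactic_view` is hypothetical.

```python
# Only the modifiers named in the text; the tool's actual list is likely longer.
COMMON_MODIFIERS = {"la", "pa", "gad", "ngay-an"}


def syntactic_view(tokens: list[str]) -> list[str]:
    """Drop common modifiers so syntax rules see only the substantive words."""
    return [token for token in tokens if token.lower() not in COMMON_MODIFIERS]


tokens = "gin-aanak pa la hiya".split()
print(syntactic_view(tokens))  # ['gin-aanak', 'hiya']
```

After filtering, the word that follows "gin-aanak" is the pronoun "hiya", which is what the "Target word is followed by a pronoun" rule needs to see.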

For similar reasons, clause-initial words ("kun", "kay", "ano") are ignored. For example, in "Kun ano kadakó an butones sugad man an kadákó han ohales," the algorithm treats "kadakó" as the beginning of the clause for the purposes of identifying part of speech.
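The clause-start rule can be sketched as skipping leading clause-initial words. This assumes whitespace tokenization and uses only the three words named above; `clause_start` and `begins_clause` are hypothetical names.

```python
# Clause-initial words named in the text; the full list is an assumption.
CLAUSE_BEGINNINGS = {"kun", "kay", "ano"}


def clause_start(tokens: list[str]) -> int:
    """Index of the first token that counts as the start of the clause."""
    index = 0
    while index < len(tokens) and tokens[index].lower() in CLAUSE_BEGINNINGS:
        index += 1
    return index


def begins_clause(tokens: list[str], index: int) -> bool:
    """Hypothetical check behind the 'Target word begins clause' rule."""
    return index == clause_start(tokens)


tokens = "Kun ano kadakó an butones sugad man an kadákó han ohales".split()
print(tokens[clause_start(tokens)])  # kadakó
print(begins_clause(tokens, 2))      # True
```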

The part of speech of adjacent words often indicates a word's likely part of speech. For example, in a clause a predicative is less likely to be adjacent to another predicative than to a modificative or a referential. The algorithm therefore evaluates the part of speech of adjacent words to predict the target word's part of speech. It does this in two ways: first, it checks the Waray dictionary for the adjacent word's part of speech; if the word is not found in the dictionary, it applies its own part-of-speech guessing algorithm to the adjacent word (the algorithm uses itself to improve its guess for the target word!). If it can establish the adjacent word's part of speech with high probability, it then applies 'adjacency' rules to the target word.
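The two-step lookup can be sketched as follows. The 70% threshold, the `guess` callable, and the function names are assumptions, since the text only says "high probability". The points come from the rule table: a neighbor of one category adds 3 points to each of the other two categories (e.g., "Predicative follows target word" gives m 3, r 3, p 0). A real recursive guess would also need a depth limit so that two unknown neighbors do not evaluate each other forever. This sketch leaves recursion to the caller.

```python
from typing import Callable, Optional

CATEGORIES = ("m", "r", "p")
ADJACENCY_POINTS = 3  # weight of every "... precedes/follows target word" rule


def neighbor_pos(word: str,
                 dictionary: dict[str, str],
                 guess: Callable[[str], dict[str, float]],
                 threshold: float = 70.0) -> Optional[str]:
    """Dictionary lookup first; otherwise trust the tagger's own guess if confident enough."""
    if word.lower() in dictionary:
        return dictionary[word.lower()]
    confidence = guess(word)
    best = max(confidence, key=confidence.get)
    return best if confidence[best] >= threshold else None


def adjacency_points(pos: Optional[str]) -> dict[str, int]:
    """A neighbor of one category adds points to each of the other two."""
    if pos is None:  # neighbor's part of speech is too uncertain to use
        return {c: 0 for c in CATEGORIES}
    return {c: 0 if c == pos else ADJACENCY_POINTS for c in CATEGORIES}


# Toy stand-ins: a one-entry dictionary and a fixed guess for unknown words.
dictionary = {"nagmata": "p"}


def toy_guess(word: str) -> dict[str, float]:
    return {"m": 10.0, "r": 20.0, "p": 70.0}


pos = neighbor_pos("nagmata", dictionary, toy_guess)
print(pos, adjacency_points(pos))  # p {'m': 3, 'r': 3, 'p': 0}
```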

At present, this algorithm does not evaluate infixes, which are a common feature of Philippine languages (e.g., "palit" [buy] becomes "pumalit" [bought] with the infix "um"). To evaluate infixes, the algorithm would need a dictionary of Waray word roots; this effort is underway.
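Once a root dictionary exists, infix detection could work roughly like this hypothetical sketch. It assumes that "-um-" and "-in-" are inserted right after a root's first consonant, as in "pumalit" (palit) and "kinuha" (kuha, from the verbal predicate reference). It does not handle vowel-initial roots, where these affixes behave as prefixes, or other infixes. The root set is a toy stand-in.

```python
from typing import Optional

INFIXES = ("um", "in")


def split_infix(word: str, roots: set[str]) -> Optional[tuple[str, str]]:
    """Return (root, infix) if removing -um-/-in- after the first letter gives a known root."""
    word = word.lower()
    for infix in INFIXES:
        if word[1:3] == infix:
            root = word[0] + word[3:]
            if root in roots:  # the dictionary check guards against false matches
                return root, infix
    return None


roots = {"palit", "kuha"}  # toy stand-in for the planned root dictionary
print(split_infix("pumalit", roots))  # ('palit', 'um')
print(split_infix("kinuha", roots))   # ('kuha', 'in')
```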

A planned enhancement is to add corpus similarity comparison to the algorithm: if a sample sentence in the corpus is sufficiently similar to the evaluated sentence (e.g., the words adjacent to the target are the same in both), the part of speech recorded in the dictionary can be factored into the probability.
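One possible form of the "same adjacent words" test is sketched below. It is a hypothetical illustration of the planned feature, not its design. It assumes whitespace tokenization with punctuation stripped from word edges, and it uses two baseline sentences as a toy corpus.

```python
from typing import Optional

PUNCTUATION = ".,;:?!"


def normalize(sentence: str) -> list[str]:
    """Lowercase whitespace-separated words and strip punctuation from their edges."""
    return [w.strip(PUNCTUATION).lower() for w in sentence.split()]


def context(words: list[str], i: int) -> tuple[Optional[str], Optional[str]]:
    """The words immediately before and after position i (None at the edges)."""
    before = words[i - 1] if i > 0 else None
    after = words[i + 1] if i + 1 < len(words) else None
    return before, after


def similar_examples(sentence: str, target: str, corpus: list[str]) -> list[str]:
    """Corpus sentences where `target` has the same neighbors as in `sentence`."""
    target = target.lower()
    words = normalize(sentence)
    wanted = {context(words, i) for i, w in enumerate(words) if w == target}
    matches = []
    for example in corpus:
        ex_words = normalize(example)
        if any(w == target and context(ex_words, j) in wanted
               for j, w in enumerate(ex_words)):
            matches.append(example)
    return matches


corpus = [
    "Hala gad bga baga nauli na an nánay nakadungog hia ngatitinuok an iya anak",
    "Amo baya ini an nánay nga halas",
]
print(similar_examples("an nánay nga", "nánay", corpus))
# ['Amo baya ini an nánay nga halas']
```

The first sentence is rejected because "nánay" is followed by "nakadungog" rather than "nga".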

A caveat about part-of-speech tagging algorithms: they cannot be 100% accurate. To give just one example from English, in the sentence "Working late into the night is draining," the word "working" functions as a referential. However, when the same phrase appears in "Working late into the night, Mark was drained," "working" functions as a predicative.

Rule (example): weight, in points added to the modificative (m), referential (r), and predicative (p) scores
  • Target word is preceded by a pronoun (e.g., "...harayo kaupay han akon gindakoan"): m 0, r 2, p 0
  • Target word is followed by a pronoun (e.g., "Dinako ako nga waray kag-anak"): m 0, r 0, p 1
  • Target word is preceded by "nga" (e.g., "...makit-an it' usa nga burod"): m 0, r 1, p 0
  • Target word is followed by "nga" (e.g., "Nakabati ka na han burod nga lalaki nga baboy?"): m 5, r 1, p 0
  • Predicative precedes target word (e.g., "Harayo na gud ngay-an an ak' ginkaturungan kagab-i nga salida"): m 3, r 3, p 0
  • Modificative precedes target word (e.g., "Waray palad nga maraut, waray palad nga maupay."): m 0, r 3, p 3
  • Referential precedes target word (e.g., "Kay it' babayi nabuburod man;"): m 3, r 0, p 3
  • Predicative follows target word: m 3, r 3, p 0
  • Modificative follows target word: m 0, r 3, p 3
  • Referential follows target word: m 3, r 0, p 3
  • Following word likely indicates target is predicative (e.g., "Min, magpapatron na"): m 0, r 0, p 3
  • Following word suggests target is predicative (e.g., "Kun ano kadakó an butones sugad man an kadákó han ohales."): m 0, r 0, p 1
  • Preceding word likely indicates target is referential: m 0, r 3, p 0
  • Preceding word suggests target is referential (e.g., "Waray hunong an dalagan."): m 0, r 1, p 0
  • Preceding word likely indicates target is modificative: m 3, r 0, p 0
  • Target word begins clause (e.g., "Nagtadong hiya ngan nag-asawa"): m 0, r 0, p 3
  • Target word ends clause (e.g., "Dii liwat pwede sumakob it' táwo"): m 0, r 3, p 0
  • Prefix likely indicates predicative (e.g., "Nakit-an ko hi Papa Jesus nga gin-aanak pa la hiya"): m 0, r 0, p 4
  • Prefix suggests predicative (e.g., "Didto han tabo ha Palo an ak' tawgi napalit mo intawon"; "Nagtadong hiya ngan nag-asawa"): m 0, r 0, p 1
  • Prefix suggests modificative: m 1, r 0, p 0
  • Suffix suggests predicative: m 0, r 0, p 1
  • Suffix suggests modificative, less likely referential: m 2, r 1, p 0
  • Suffix suggests referential (e.g., "Waray hunong an dalagan."): m 0, r 1, p 0
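The rule table maps directly onto a data structure. In this hypothetical sketch, each rule's (m, r, p) weights are copied from the table and the points of the rules that fired are summed. Detecting which rules fire is not shown; the example passes in the rules that the baseline tests list for "gindakóan". Their totals (p 1, r 6 out of 7) correspond to the 14% and 86% shown in that row.

```python
# (m, r, p) points per rule, copied from the rule table above.
RULE_WEIGHTS: dict[str, tuple[int, int, int]] = {
    "Target word is preceded by a pronoun": (0, 2, 0),
    "Target word is followed by a pronoun": (0, 0, 1),
    'Target word is preceded by "nga"': (0, 1, 0),
    'Target word is followed by "nga"': (5, 1, 0),
    "Predicative precedes target word": (3, 3, 0),
    "Modificative precedes target word": (0, 3, 3),
    "Referential precedes target word": (3, 0, 3),
    "Predicative follows target word": (3, 3, 0),
    "Modificative follows target word": (0, 3, 3),
    "Referential follows target word": (3, 0, 3),
    "Following word likely indicates target is predicative": (0, 0, 3),
    "Following word suggests target is predicative": (0, 0, 1),
    "Preceding word likely indicates target is referential": (0, 3, 0),
    "Preceding word suggests target is referential": (0, 1, 0),
    "Preceding word likely indicates target is modificative": (3, 0, 0),
    "Target word begins clause": (0, 0, 3),
    "Target word ends clause": (0, 3, 0),
    "Prefix likely indicates predicative": (0, 0, 4),
    "Prefix suggests predicative": (0, 0, 1),
    "Prefix suggests modificative": (1, 0, 0),
    "Suffix suggests predicative": (0, 0, 1),
    "Suffix suggests modificative, less likely referential": (2, 1, 0),
    "Suffix suggests referential": (0, 1, 0),
}


def score(fired_rules: list[str]) -> dict[str, int]:
    """Sum the points of every rule that fired for the target word."""
    totals = {"m": 0, "r": 0, "p": 0}
    for rule in fired_rules:
        m, r, p = RULE_WEIGHTS[rule]
        totals["m"] += m
        totals["r"] += r
        totals["p"] += p
    return totals


# Rules that fired for "gindakóan" in the baseline tests.
print(score([
    "Target word ends clause",
    "Target word is preceded by a pronoun",
    "Prefix suggests predicative",
    "Suffix suggests referential",
]))  # {'m': 0, 'r': 6, 'p': 1}
```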

Baseline tests for part of speech algorithm

Each entry gives the target word, the test sentence, the actual and predicted part of speech (m = modificative, p = predicative, r = referential), the result, the confidence for each category, and the rules that fired (rationale).
ninang: "Ini hi Mak-Mak kinanhi, an babayi ikakasal ha bulan han yana nga june, gin-iimbitar ak' usa nga ninang, an upod hi Grace." (actual r, predicted r, PASS; m 0%, p 0%, r 70%)
  • Target word ends clause
  • Target word is preceded by "nga"
dalagan: "Waray hunong an dalagan." (actual r, predicted r, PASS; m 0%, p 0%, r 100%)
  • Target word ends clause
  • Preceding word suggests target is referential
  • Suffix suggests referential
Nagtadong: "Nagtadong hiya ngan nag-asawa" (actual p, predicted p, PASS; m 0%, p 100%, r 0%)
  • Target word begins clause
  • Target word is followed by a pronoun
  • Prefix suggests predicative
magpapatron: "Min, magpapatron na" (actual p, predicted p, PASS; m 20%, p 70%, r 10%)
  • Following word likely indicates target is predicative
  • Prefix likely indicates predicative
  • Suffix suggests modificative, less likely referential
tadong: "han iya tadong nga balan" (actual m, predicted m, PASS; m 63%, p 0%, r 38%)
  • Target word is preceded by a pronoun
  • Target word is followed by "nga"
nánay: "Hala gad bga baga nauli na an nánay nakadungog hia ngatitinuok an iya anak" (actual r, predicted r, PASS; m 43%, p 0%, r 57%)
  • Preceding word suggests target is referential
  • Predicative follows target word
nanay: "Amo adto hi nanay nagmata" (actual r, predicted r, PASS; m 33%, p 11%, r 56%)
  • Target word is preceded by a pronoun
  • Prefix suggests predicative
  • Predicative follows target word
Nánay: "Nánay ka na." (actual p, predicted p, PASS; m 0%, p 70%, r 0%)
  • Target word begins clause
  • Target word is followed by a pronoun
nánay: "Amo baya ini an nánay nga halas" (actual m, predicted m, PASS; m 71%, p 0%, r 29%)
  • Preceding word suggests target is referential
  • Target word is followed by "nga"
táwo: "Dii liwat pwede sumakob it' táwo" (actual r, predicted r, PASS; m 0%, p 0%, r 70%)
  • Target word ends clause
matáwo: "Ini usa nga matáwo ha akon balay amo an magigin akon sumuronod." (actual p, predicted p, PASS; m 0%, p 67%, r 33%)
  • Target word is preceded by "nga"
  • Following word suggests target is predicative
  • Prefix suggests predicative
táwo: "Nakilal-an an nasabi nga táwo nga hi X nga nakita nga makuri hidapkan." (actual m, predicted m, PASS; m 71%, p 0%, r 29%)
  • Target word is preceded by "nga"
  • Target word is followed by "nga"
anak: "Dadayawon nire usa nga masinugtanon nga anak" (actual r, predicted r, PASS; m 0%, p 0%, r 70%)
  • Target word ends clause
  • Target word is preceded by "nga"
gin-aanak: "Nakit-an ko hi Papa Jesus nga gin-aanak pa la hiya" (actual p, predicted p, PASS; m 0%, p 83%, r 17%)
  • Target word is preceded by "nga"
  • Target word is followed by a pronoun
  • Prefix likely indicates predicative
anak: "Ini ka han tikaiha na kay tungod iton nga an ira problema iton usa nga anak nga kabayo" (actual m, predicted m, PASS; m 71%, p 0%, r 29%)
  • Target word is preceded by "nga"
  • Target word is followed by "nga"
ginsisiring: "Ito hiya an ginsisiring nga may-ada healthy lifestyle" (actual m, predicted m, PASS; m 63%, p 13%, r 25%)
  • Preceding word suggests target is referential
  • Target word is followed by "nga"
  • Prefix suggests predicative
gindakóan: "...harayou kaupay han akon gindakóan" (actual r, predicted r, PASS; m 0%, p 14%, r 86%)
  • Target word ends clause
  • Target word is preceded by a pronoun
  • Prefix suggests predicative
  • Suffix suggests referential
kadakó: "Kun ano kadakó an butones sugad man an kadákó han ohales." (actual p, predicted p, PASS; m 0%, p 100%, r 0%)
  • Target word begins clause
  • Following word suggests target is predicative
  • Prefix suggests predicative
damo: "Linalauman an damo pa nga LGUs ug iba pa nga regulatory offices" (actual m, predicted m, PASS; m 71%, p 0%, r 29%)
  • Preceding word suggests target is referential
  • Target word is followed by "nga"
napalit: "Didto han tabo ha Palo an ak' tawgi napalit mo intawon" (actual p, predicted p, PASS; m 38%, p 63%, r 0%)
  • Target word is followed by a pronoun
  • Prefix suggests predicative
  • Referential precedes target word
ginpalit: "Ta, kay an bucket an am' ginpalit nga tag-80 di na la nam' babaydan." (actual m, predicted m, PASS; m 42%, p 33%, r 25%)
  • Target word is preceded by a pronoun
  • Target word is followed by "nga"
  • Prefix likely indicates predicative
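In every baseline row above, the prediction is the category with the highest confidence, and the result is PASS when that prediction matches the actual tag. The following small harness could rerun such checks whenever rules or weights change. The `BaselineRow` name is hypothetical, and the three rows are copied from the baseline tests.

```python
from dataclasses import dataclass


@dataclass
class BaselineRow:
    word: str
    actual: str                 # "m", "p" or "r"
    confidence: dict[str, int]  # percentages as listed in the baseline tests

    @property
    def prediction(self) -> str:
        """The category with the highest confidence."""
        return max(self.confidence, key=self.confidence.get)

    @property
    def result(self) -> str:
        return "PASS" if self.prediction == self.actual else "FAIL"


rows = [
    BaselineRow("dalagan", "r", {"m": 0, "p": 0, "r": 100}),
    BaselineRow("magpapatron", "p", {"m": 20, "p": 70, "r": 10}),
    BaselineRow("ginpalit", "m", {"m": 42, "p": 33, "r": 25}),
]
for row in rows:
    print(row.word, row.prediction, row.result)
passed = sum(row.result == "PASS" for row in rows)
print(f"{passed}/{len(rows)} passed")  # 3/3 passed
```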