Punctuation (recognized by the corresponding Unicode property) that is conventionally written adjacent to the preceding or following word is separated during tokenization.
Some special cases worth mentioning:

An abbreviation marked by a period, as in թ. “year”, becomes two tokens: թ and the period ( . ).

A compound containing a hyphen becomes three tokens (two words and the hyphen), as in անգլո-ամերիկյան “anglo-american”, պատմա-բանասիրական “historical-philological”.
In these cases, the first token is a special form of adjective that never occurs independently.
Compounds without a hyphen are not split; thus ռազմածովային “navy” is one token, whereas հասարակական-քաղաքական “civic-social” is three tokens.
Another common case of splitting on a hyphen is reduplicative or echo words, as in մեծ-մեծ “very big”, շուն-մուն “dog or something”.
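
The hyphen rule above can be sketched in a few lines of Python. This is only an illustration, not the tool used to build the treebank: the function name `split_on_hyphen` is hypothetical, and the sketch assumes the hyphen is the ASCII hyphen-minus (U+002D); other hyphen characters would have to be added to the pattern.

```python
import re


def split_on_hyphen(word: str) -> list[str]:
    """Split a word on hyphens, keeping each hyphen as its own token.

    Assumption: only the ASCII hyphen-minus (U+002D) is treated as a hyphen.
    """
    # The capturing group makes re.split keep the hyphen in the result;
    # empty strings (e.g. from a leading hyphen) are dropped.
    return [part for part in re.split(r"(-)", word) if part]


print(split_on_hyphen("անգլո-ամերիկյան"))  # three tokens: word, hyphen, word
print(split_on_hyphen("ռազմածովային"))     # unhyphenated compound: one token
```

Because the split is purely orthographic, hyphenated compounds and reduplicative words such as մեծ-մեծ are handled by the same call.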

Inflectional bound morphemes and hyphens that follow phrases or sentences used as names in quotation marks, or that follow abbreviations marked by a period, as in «Երկիր Նաիրի»-ից “from ‘Yerkir Nairi’” or 1937 թ.-ին “in year 1937”, are split and treated as separate tokens: { « , Երկիր , Նաիրի , » , - , ից } and { 1937 , թ , . , - , ին } .
The word before the hyphen is the head, and the bound morpheme is attached to it with the goeswith relation. Tokenizing and segmenting this way seems easier for parsing.
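
As a rough illustration of how punctuation, hyphens and trailing bound morphemes end up as separate tokens, the sketch below splits on whitespace and turns every character whose Unicode general category starts with "P" into its own token. The function name `tokenize` is hypothetical. The sketch deliberately ignores words with infixed punctuation and the apostrophe, which need different treatment, and it does not assign any dependency relations.

```python
import unicodedata


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization that also isolates every punctuation character.

    Assumption: "punctuation" means any character whose Unicode general
    category starts with "P" (Pd, Po, Pi, Pf, ...). Infixed marks are NOT
    handled specially here.
    """
    tokens: list[str] = []
    current: list[str] = []  # characters of the word being built

    def flush() -> None:
        if current:
            tokens.append("".join(current))
            current.clear()

    for ch in text:
        if ch.isspace():
            flush()
        elif unicodedata.category(ch).startswith("P"):
            flush()
            tokens.append(ch)  # each punctuation character is one token
        else:
            current.append(ch)
    flush()
    return tokens


print(tokenize("«Երկիր Նաիրի»-ից"))
print(tokenize("1937 թ.-ին"))
```

On the two examples above this yields six and five tokens respectively, with the hyphen and the case ending each standing alone.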

Words that contain “infixed” punctuation (question, exclamation, emphasis and Armenian abbreviation marks), as in ինչո՞ւ “why?”, are treated as multi-word tokens and become two tokens, ինչու and ՞ . The exception is the apostrophe, as in Ժաննա դ՚Արկ “Joan of Arc”: the word is split at the apostrophe, and the apostrophe stays attached to the preceding part, { Ժաննա , դ՚ , Արկ }.
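
A minimal sketch of the infix rule follows. The names `INFIX_MARKS`, `APOSTROPHE` and `split_infixed` are hypothetical, and the code points are an assumption about which characters the four mark types correspond to: U+055B (emphasis), U+055C (exclamation), U+055E (question), U+055F (abbreviation) and U+055A (apostrophe).

```python
# Assumed code points for the Armenian marks named in the text.
INFIX_MARKS = {"\u055B", "\u055C", "\u055E", "\u055F"}
APOSTROPHE = "\u055A"


def split_infixed(word: str) -> list[str]:
    """Split one orthographic word according to the infix rule.

    - Apostrophe: split after it, keeping it on the preceding part.
    - Other infixed marks: return the word without the marks, followed
      by each mark as a separate token.
    """
    if APOSTROPHE in word:
        head, _, tail = word.partition(APOSTROPHE)
        return [part for part in (head + APOSTROPHE, tail) if part]

    marks = [ch for ch in word if ch in INFIX_MARKS]
    if not marks:
        return [word]
    stripped = "".join(ch for ch in word if ch not in INFIX_MARKS)
    # A bare mark with no letters yields only the mark itself.
    return ([stripped] if stripped else []) + marks


print(split_infixed("ինչո\u055Eւ"))   # the word "why" plus the question mark
print(split_infixed("դ\u055AԱրկ"))    # apostrophe stays on the first part
```

The sketch handles only the first apostrophe in a word and does not record the multi-word-token span; both would be needed in a real pipeline.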

Sentence splitting

Each sentence contains only one root.
Splitting is usually performed after an end-of-sentence full stop or after a dot, ellipsis or colon when these punctuation marks separate unrelated subparts of a sentence. Items in a list may sometimes be rendered as separate sentences.
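
Only the first of these criteria is mechanical. The sketch below splits after the Armenian full stop, assumed here to be U+0589 ( ։ ), when it is followed by whitespace; the function name `split_sentences` is hypothetical. Splits after a dot, ellipsis or colon depend on whether the parts are unrelated, which is a judgment the sketch does not attempt.

```python
import re

# Assumption: the end-of-sentence full stop is U+0589 (Armenian full stop).
FULL_STOP = "\u0589"


def split_sentences(text: str) -> list[str]:
    """Split text after each Armenian full stop that is followed by whitespace."""
    # The lookbehind keeps the full stop with the sentence it closes.
    parts = re.split(rf"(?<={FULL_STOP})\s+", text)
    return [part.strip() for part in parts if part.strip()]


print(split_sentences("Ա\u0589 Բ\u0589"))
```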