A Probabilistic Representation of Systemic
Functional Grammar


Robert Munro
Endangered Languages Archive
Department of Linguistics
School of Oriental and African Studies, University of London
[email protected]

Abstract

The notion of language as probabilistic is well known within Systemic Functional Linguistics. Aspects of language are discussed as meaningful tendencies, not as deterministic rules. In past computational representations of functional grammars, this probabilistic property has typically been omitted. This paper presents the results of a recent project aimed at the computational learning, representation and application of a fundamentally probabilistic functional grammar. Recent advances in machine learning have allowed the large-scale inference of truly probabilistic representations of language for the first time. In this work, a machine learning algorithm is developed that learns aspects of a functional grammar from labelled text. The grammar is represented probabilistically, in the sense that there is a measurable gradation of functional realisation between all categories. Looking at a single term, this allows that term to be described as realising multiple functions simultaneously. Looking at all the terms in a text or register, it allows us to examine the relationships between the functions with respect to the closeness and/or overlap of functions, and the extent to which these relationships differ between different texts or registers. With a focus on function within the noun phrase (nominal group), the methodology is shown to infer an accurate description of functional categories that classifies new examples with above 90% accuracy, even across registers of text that are very different from the text that was learned from. Importantly, the learner is deliberately restricted from remembering specific words, so that the functions are (necessarily) learned and represented in terms of features such as part-of-speech, context and collocational tendencies. This restriction allows successful application to different registers and demonstrates that function is much more a product of context than a property of the words themselves. The inferred grammar is also shown to have interesting applications in the analysis of layers of delicacy. The discovery of finer delicacies occurs with a high level of sophistication, indicating a potential for the automated discovery and representation of lexis as most delicate grammar.

1 Introduction

Research describing functional grammars is often prefaced with strong assertions that the grammars (and therefore the systems, constraints, constituencies and dependencies) are probabilistic, with aspects of language variously described as gradational, fuzzy and/or clinal (Hasan, 1987; Halliday, 1994; Tucker, 1998; Fawcett, 2000; Halliday, 2002). While functional categories have long been described as meaningful tendencies in a continuous space, these shades of grammar have rarely been explored.

More commonly, 'probabilistic linguistics' is used to refer to confidences across multiple deterministic models, or within a single deterministic model (probability of constituency), rather than a single gradational model. This is largely because probabilistic parsing techniques have grown out of deterministic theories.

The functions of modification within the noun phrase (nominal group) provide good examples for describing such gradations.
Classifiers such as those in 'the 1,000 metre race' and 'the red wine' still function close to the Numerative and Epithet from which they originated, and will typically realise both functions. Gradient representations of function are necessary to describe this gradience of realisation.

Even where individual instances of functional modification are not gradationally realised, gradational modelling is still necessary. Common solutions for describing some new object/concept include creating a new word (often through compounding), creating a new sense for an existing word, or using multiple words. Combinations of the three are possible, as can be seen in the phrase 'notebook computer'. 'Notebook' was created as a compound, 'notebook computer' became a multi-word entity, and now 'notebook' alone has the new sense of a type of computer. There is little ambiguity between Epithets, Classifiers and Things here, but the uptake of the new term/sense will not be uniform, and a given person's use may not be consistent (they may only use the new sense of 'notebook' in the context of computers). The computational modelling of nominals therefore still needs to be gradational, even when modelling across deterministic instances.

It might be assumed that part-of-speech is a good indicator of functional modification, giving an insight into the part of the world we represent in a noun phrase (the experiential metafunction of nominals), with exceptions being rare or idiosyncratic. Previous functional parsers have relied on this assumption. In this work, it is demonstrated that assuming the unmarked functions given by part-of-speech and word order will only account for about half the instances of Classifiers in the registers investigated here, showing that more sophisticated modelling is required for computational representations.

The difficulty in building a fundamentally probabilistic model of a grammar lies in defining the gradations. Defining a probability distribution across two or more categories in terms of a large number of features is a difficult manual task, and it is not surprising that previous models have relied on computational processing over labelled data to calculate these. Machine learning is the most popular method for combining this with the ability to predict new instances. In this work, a new machine learning algorithm, Seneschal, is developed that models tendencies in the data as an optimal number of soft clusters, using the probability of membership of a cluster to make supervised classifications of new data.

The most sophisticated models utilising machine learning have been probabilistic context-free grammars and stochastic grammars that have focused their interpretation of results on the accuracy of the inferred syntax (Bod, 1993; Collins, 1999; Charniak, 2000; Johnson, 2003). In a functional lexicogrammar this roughly corresponds to only the logical metafunction (although the feature spaces used are much richer, and gradational models have been suggested (Aarts, 2004; Manning, 2003)), but similar techniques can be used for modelling more complicated functional relationships.

In Systemic Functional Grammar (SFG), computational representations and applications of artificial intelligence are not new, but most work in this area has focussed on language generation (Mann and Matthiessen, 1985; Matthiessen and Bateman, 1991), and machine learning has not previously been used in the inference of a functional grammar [1]. The most well-known systemic parser is WAG (O'Donnell, 1994).
It was the first parser to implement a full SFG formalism, and it performed both parsing and text generation. Drawing from work with context-free grammars, it treated the grammar as deterministic, giving good but limited coverage. It did not attempt the disambiguation of the unmarked cases of the functions of words. There have been a number of earlier implementations of SFG parsers, but with more limited coverage (Kasper, 1988; O'Donoghue, 1991; Dik, 1992). For German, Bohnet, Klatt and Wanner implemented a successful method for the identification of Deictics, Numeratives, Epithets and Classifiers within the noun phrase by implementing a bootstrapping algorithm that relied on the general ordering of the functions (Bohnet et al., 2002). They were able to assign a function to 95% of words, with a little under 85% precision. A more extensive review of related work can be found in Munro (2003b).

[1] Machine learning has been used to learn formal grammars that include functional constraints, such as Lexical Functional Grammar (Bresnan, 2001), a theory that is also still evolving. Its F-structure could be described as a functional grammar by some (or arguably many) definitions. Describing the relationship between LFG and SFG theories is outside the scope of this paper, but it is a comparison that is probably overdue.

2 Machine Learning for Linguistic Analysis

Supervised machine learning algorithms are typically used as black boxes, restricted to classifying independent categories or flat structures (for an exception in computational linguistics see Lane and Henderson (2001)). Unsupervised machine learning is a technique for finding meaningful rules, clusters and/or trends in unlabelled data, and is more commonly used to discover fuzzy (soft), hierarchical and/or connectionist structures. As such, the goal of unsupervised learning is often analysis, not classification. In this work, unsupervised and supervised learning are combined so that a single model can be described both in its ability to identify functions and in its ability to provide information for detailed analysis.

Here, we seek to discover finer layers of delicacy by looking for meaningful clusters within each function. In SFG, 'delicacy' describes the granularity chosen in describing a given function. For example, in Table 1, the terms 'one' and 'first' both function as Numeratives, but could have been broken down into the more delicate functions of Quantitatives and Ordinatives respectively. As more delicate functions are sought, more constraints and tendencies can be described, and therefore we can build a more informative model.

3 Scope of Study

This study explored functional categories across all groups/phrases of English, but only those of the noun phrase are described here. See Munro (2003b) for the results and analysis of the other functions. Examples of nominal functions taken from the corpus used here are given in Table 1.

Definitions are drawn from Halliday (1994), Matthiessen (1995) and O'Donnell (1998). Below we describe the functions that were the targets of the supervised classification, and those that were or could be discovered through unsupervised learning at finer layers of delicacy:

Deictic: Deictics fix the noun phrase in relation to the speech exchange, usually through the orientation of the speaker. At a finer layer of delicacy this includes Demonstratives ('this', 'that', 'those') and Possessives ('my', 'their', 'Dr Smith's').

Ordinative: An ordering Numerative ('first', '2nd', 'last').

Quantitative: A quantifying Numerative ('one', '2', 'many', 'few', 'more'). They may be used Discursively ('the 12 championships') or may simply be tabulated results, which was common here due to the choice of registers.
Epithet: Describes some quality or process. At a finer layer of delicacy there are Attitudinal Epithets ('the ugly lamp') and Experiential Epithets ('the red lamp'). They are most commonly realised by an adjective, but are also commonly realised by a verb ('the running water').

Classifier: Describes a sub-classification. Classifiers are commonly realised by a noun ('the table lamp'), a verb ('the running shoe') or an adjective ('the red wine'), but other realisations are also possible. Classifiers are commonly thought of as providing a taxonomic function: a Hyponymic Classifier. They may also be used to expand the description of the Head: an Expansive Classifier (Matthiessen, 1995). The latter are classifications that can more easily be reworded as Qualifiers or expanded clauses; for example, 'knee surgery' can be rewritten as 'surgery of the knee'. In the work described here, they were particularly interesting cases, as they allowed anaphoric reference to non-Head terms ('she underwent knee surgery after it was injured...').

Thing: Typically the semantic head of the phrase; some entity, be it physical ('the lamp') or abstract ('the idea'), undergoing modification by the other noun phrase constituents. Delicacies within Thing include Countable and non-Countable Things, Named Entities (First, Intermediate and Last Names), and those simply realised by nouns and non-nouns. Of all the functions in the noun phrase, variation in the function of the Thing corresponds most strongly with variation in the function of the phrase, such as the Referring and Informing functions of a noun phrase (the heads of such phrases are called Stated and Described Things respectively). When a noun phrase is realised by a single word, the function is best described in terms of the function of the phrase.

Deictic    | Numerative | Epithet          | Classifier      | Thing
the        | third      | fastest          |                 | time
the        |            |                  |                 | Atlanta Olympics
Burundi's  |            |                  | 5,000 metres    | champion
their      | first      |                  |                 | World Cup
Colombia's |            | former           | team            | boss
the        |            | defending        |                 | champion
a          |            | controversial    |                 | final
the        |            |                  | Superman riding | style
           | three      |                  | first-round     | matches
the        |            |                  | bronze          | medal
           |            | real             | data            | sets
           |            | robust           | parametric      | methods
a          | single     |                  | microarray      | chip
the        |            | bootstrapped     |                 | version
her        |            | own              |                 | fortunes
the        |            | smooth unmarked  |                 | outline
a          |            | little           | parchment       | volume
this       | one        |                  |                 | scene

Table 1: Examples of functional categories

4 Testing Framework

4.1 Algorithm

Seneschal is a hybrid of supervised and unsupervised clustering techniques. It has been demonstrated to be generally suited to the efficient supervised classification and analysis of various data sets (Munro, 2003a). Similar to the EM algorithm and Bayesian learning, it seeks to describe the data in terms of an Information Measure (IM), combining agglomerative and hierarchical clustering methods.

Given an item i with value i_alpha for categorical attribute alpha, where i_alpha occurs with frequency f(i_alpha, C) in cluster C and f(i_alpha, T) in the data set T, i's information measure over n categorical attributes for a cluster C of size s(C) is given by:

    IM(i, C) = \sum_{\alpha=1}^{n} -\ln \frac{f(i_\alpha, C) + (1 - f(i_\alpha, T)/s(T))}{s(C) + (1 - f(i_\alpha, T)/s(T))}        (1)

Given an item i with value i_beta for continuous attribute beta, i's information measure over n continuous attributes, for a cluster C with mean mu_{C,beta} and standard deviation sigma_{C,beta} for attribute beta, is given by:

    IM(i, C) = \sum_{\beta=1}^{n} \frac{(i_\beta - \mu_{C\beta})^2}{2 \sigma_{C\beta}^2}        (2)
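To make the measure concrete, the following is a minimal sketch of equations (1) and (2) in Python. It is not the Seneschal implementation: the data structures, the function names, and the simple addition of the two sums for mixed-type items are our own assumptions.

    import math

    def categorical_im(item, cluster, dataset):
        """Equation (1): sum of -ln of smoothed within-cluster relative
        frequencies of the item's categorical attribute values. Assumes
        no value occurs in every item of the data set (otherwise the
        smoothing term is zero)."""
        total = 0.0
        for alpha, value in item["categorical"].items():
            f_c = cluster["cat_freq"][alpha].get(value, 0)   # f(i_a, C)
            f_t = dataset["cat_freq"][alpha].get(value, 0)   # f(i_a, T)
            smooth = 1.0 - f_t / dataset["size"]             # 1 - f(i_a, T)/s(T)
            total += -math.log((f_c + smooth) / (cluster["size"] + smooth))
        return total

    def continuous_im(item, cluster):
        """Equation (2): squared deviation from the cluster mean, scaled
        by twice the cluster variance, summed over continuous attributes."""
        total = 0.0
        for beta, value in item["continuous"].items():
            mu = cluster["mean"][beta]
            sigma = cluster["stdev"][beta]
            total += (value - mu) ** 2 / (2.0 * sigma ** 2)
        return total

    def information_measure(item, cluster, dataset):
        # How the two sums are weighted against each other is not stated
        # in the paper; a plain sum is assumed here.
        return categorical_im(item, cluster, dataset) + continuous_im(item, cluster)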
The algorithm maps to an SFG in the following ways:

1. It is probabilistic, giving a gradation of membership across all categories (one way of deriving such a gradation is sketched after this list).

2. The algorithm treats all classes independently. If the feature space describes two classes as overlapping, this will be apparent in the model, capturing the overlapping categories. This is particularly important here, as we need a learner that represents each class as accurately as possible. A learner that only represents categories by defining boundaries between them goes against our knowledge of multiple and gradational realisation. [2]

3. The discovery of the optimal number of clusters within a class maps to the task of describing the emergent finer layers of delicacy within a function.

4. Beyond a minimum threshold, the algorithm is not frequency sensitive, so it will not intrinsically favour the patterns of realisation of functions in the training corpus. This makes it more appropriate than other algorithms that seek to discover an optimal number of clusters through strong a priori assumptions about optimal cluster size. [3]

[2] In terms of Aarts's definitions of Subsective and Intersective Gradience (2004), the probability of cluster membership described here is Subsective Gradience, and the cross-cluster costs are Intersective Gradience. Note that if the clusters were not formed independently, and were instead prevented from overlapping, then the probability of membership could not be thought of as Subsective Gradience, as the cluster (category) would be partially defined in terms of its intersection with other categories.

[3] In the work reported here, assuming that the relative frequencies of the categories are the same in the test set is equivalent to the learner assuming that all text is sports newswires. This is a well-known problem in natural language processing, known as domain dependence, and the algorithm described goes some way towards addressing it. Gradation not wholly dependent on observed frequency is, in itself, a desirable quality when dealing with sparse data.
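Point 1 can be made concrete with the following sketch, which reuses information_measure() from the sketch above. The exponential normalisation of IM costs into a distribution is our assumption, not a detail given in the paper; the paper states only that the probability of cluster membership is used to make supervised classifications.

    import math

    def membership_gradation(item, clusters, dataset):
        """Turn per-cluster IM costs into a distribution over clusters
        that sums to 1 (lower cost -> higher membership). Shifting by
        the lowest cost keeps the exponentials numerically stable."""
        costs = {name: information_measure(item, c, dataset)
                 for name, c in clusters.items()}
        low = min(costs.values())
        weights = {name: math.exp(low - cost) for name, cost in costs.items()}
        z = sum(weights.values())
        return {name: w / z for name, w in weights.items()}

    def classify(item, clusters, cluster_to_function, dataset):
        """Supervised step: report the function of the most probable
        cluster, keeping the full gradation available for analysis."""
        grades = membership_gradation(item, clusters, dataset)
        best = max(grades, key=grades.get)
        return cluster_to_function[best], grades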
4.2 Corpora

One training corpus and four test corpora were used. The process of manually tagging the corpus with the correct functions took about 20 hours, performed by two linguists with input from domain experts in the fields of bio-informatics and motor sports. Here, we simply labelled a term with its most dominant function. [4]

[4] It would be interesting to see how explicitly defining gradient membership for the training data would affect the model learned, but this would be a complicated task in a largely untested area of machine learning.

The training corpus comprised 10,000 words of Reuters sports newswires from 1996. It was chosen because Reuters is one of the most common sources of text used in computational linguistics, and the choice of only sports newswires was motivated by two factors: taking the corpus from only one register was desirable for testing purposes, and sports terminology is known to be interesting and difficult to study, as, for example, it is necessary to learn that a 'test match' is a type of cricket match and a '1,000 metre race' is a type of race (this is what allows 'I won the 1,000' and 'They played the test' to be grammatical).

Four testing corpora were used, all of approximately 1,000 words. The register (domain) dependence of NLP tasks is well known, so they were drawn from a variety of registers:

1. Reuters sports newswires from 1996 (Reuters-A), from the same corpus as the training set.

2. Reuters sports newswires from 2003 (Reuters-B). This is presumed to be the same register, but is included to test the extent to which 'topic shift' is overcome.

3. Bio-informatics abstracts (BIO-INF), to test the domain dependence of results in a register with a high frequency of rare words/phrases, and with some very large and marked Classifier constructions.

4. An excerpt from a modernist fiction (MOD-FIC), 'The Voyage Out', Virginia Woolf (1915), to test the domain dependence of results in an Epithet-frequent register.

4.3 Features

part-of-speech: POS was assigned by mxpost (Ratnaparkhi, 1996), and was modelled to a context window of two words. The standard codes for POS are used here.

POS augmentations: Features representing capitalisation and type of number were used, as mxpost over-assigned NNPs to capitalised words and under-assigned numbers. Number codes: NUM = only numerals, WRD = word equivalent of a numeral, MIX = a mix, e.g. '6-Jan', '13th'.

punctuation: Features were included that represented punctuation occurring before and after the term. Punctuation itself was not treated as a token.

collocational tendencies: Features were included that represented the collocational tendencies of a term with the previous and following words, and the ratio between them. These were obtained automatically using the alltheweb search engine, as it reports the number of web documents containing a searched term, and could therefore be used to automatically extract measures from a large source. For two terms 'A' and 'B', the measure is given by the number of documents containing the bi-gram 'A B', divided by the number of documents containing both 'A' and 'B' (a sketch of this and the following feature is given at the end of this section).

repetition: (self-co-occurrence) The observed percentage of documents containing a term that contained more than one instance of that term. These were taken from a large corpus of about one hundred thousand documents of Reuters newswires, bio-informatics abstracts, and the full 'The Voyage Out' split into chunks of equivalent size.

phrase context and boundary: The following and previous phrase types were included, as was the term's position in its own phrase.

The words themselves were omitted from the study to demonstrate that functions are not simply a property of a word (like most parts-of-speech) but a product of context. It is expected that allowing the algorithm to learn that a certain word has previously had a certain function would give a small increase in accuracy but a substantial increase in domain dependency.

Other additional features were considered, such as the use of lexico-semantic ontologies and more complex modelling of repetition, but they were not included here, to simplify the analysis (or were investigated independently).
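As referenced above, a minimal sketch of the two corpus-count features. The raw document counts stand in for the alltheweb counts described in the feature list, and the function names are ours:

    def collocation_tendency(bigram_count, cooccurrence_count):
        """Collocational tendency of a word pair: documents containing
        the bi-gram 'A B' over documents containing both 'A' and 'B'
        anywhere, guarding the empty case."""
        if cooccurrence_count == 0:
            return 0.0
        return bigram_count / cooccurrence_count

    def repetition(term, documents):
        """Self-co-occurrence: of the documents (token lists) containing
        the term at all, the proportion containing it more than once."""
        containing = [d for d in documents if d.count(term) >= 1]
        if not containing:
            return 0.0
        repeated = sum(1 for d in containing if d.count(term) >= 2)
        return repeated / len(containing)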
[Figure 1: Gradational realisation: the IM costs between functions, at two layers of delicacy]

5 Analysis

The raw accuracy of classifying functions within the noun phrase was 89.9%. A parser only seeking to describe unmarked functions based on part-of-speech and word order would classify with 82.6% accuracy on these test corpora, so the method here almost halved the error of existing methods. This baseline was reached by Seneschal after only 5% of the data had been seen (the overall accuracy for all other group/phrase types was over 95%).

A confusion matrix (the number of cross-categorical errors) does not capture the probabilistic nature of the distribution. Here, the gradations are measured as the average IM cost of assigning items between all clusters/functions. Figure 1 represents these pairwise calculations of gradation topographically (a sketch of the calculation is given below). The top map shows the relationships between the targets of the supervised task, the bottom map those between the more delicate clusters/functions. If there were no probabilistic boundaries between the functions, the maps in Figure 1 would be a diagonal series of white peaks on a black background, with the height of a peak representing how tightly that function was defined by the features. In this study, the significance of the discovered delicacies is precisely the difference in complexity between the two maps in Figure 1.

The ordering of the functions in Figure 1 is simply the general observed ordering. What is not represented in Figure 1 is the attributes that were the most significant in distinguishing the various functions, that is, the attributes that contributed most significantly to a given 'valley'.

The remainder of this section describes the more delicate functions, including the features that were the most significant in distinguishing them. It is important to remember that these features are both a description of that function and the reason that Seneschal identified it, and that co-significant features are also features that correlated with each other for that function.
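Before turning to the individual functions, a sketch of how such a map can be derived from the model, under our reading that each cell holds the average IM cost of assigning one function's items to another function's cluster (information_measure() as sketched in Section 4.1; all names are ours):

    def gradation_map(items_by_function, clusters, dataset):
        """Pairwise gradations as plotted in Figure 1: rows are the
        source function's items, columns the target cluster. Low
        off-diagonal values indicate overlapping functions; a sharp
        diagonal would indicate near-deterministic categories."""
        names = sorted(clusters)
        grid = {}
        for source in names:
            items = items_by_function[source]
            for target in names:
                costs = [information_measure(i, clusters[target], dataset)
                         for i in items]
                grid[(source, target)] = sum(costs) / len(costs)
        return grid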
5.1 Deictics and Numeratives

There were two clusters/functions discovered within the Deictic function, corresponding well to the more delicately described functions of Demonstratives and Possessives. The Possessive cluster contained mostly Genitives in the form of embedded noun phrases. The profiles of these and of the Numeratives are given in Table 2. While the differentiation in the part-of-speech distributions is as expected, the phrase context is particularly interesting, as it shows that the Possessives are more likely to occur in the Subject position, given by their being more likely to occur before a verb phrase and after a noun phrase.

Demonstrative
    pos: DT=80%, PRP=15%
    prev phrs: prep=56%, verb=36%
    next phrs: prep=44%, noun=23%
    Examples: 'a', 'the', 'these'

Possessive
    pos: DT=32%, POS=25%, NNP=24%
    prev phrs: noun=48%, prep=39%
    next phrs: verb=57%, prep=32%
    Examples: 'our', 'The Country Club's'

Tabular (Quantitative)
    num type: NUM=69%, MIX=18%
    prev phrs: noun=100%
    phrs end: yes=92%
    coll prev: (ave=0.10, var=0.04)
    coll next: (ave=0.05, var=0.01)
    Examples: '1, 2, 20'

Discursive (Quantitative)
    num type: WRD=39%, MIX=26%
    prev phrs: prep=40%, verb=30%
    phrs end: yes=41%
    coll prev: (ave=0.02, var=0.00)
    coll next: (ave=0.11, var=0.06)
    Examples: '2 cars,', 'the twelve championships,'

Ordinative
    num type: ORD=88%, WRD=8%
    prev phrs: prep=42%, verb=36%
    phrs end: yes=23%
    coll prev: (ave=0.24, var=0.09)
    coll next: (ave=0.22, var=0.15)
    Examples: 'the third fastest', 'the top four'

Table 2: Properties of the Deictic and Numerative functions

The Quantitative function was divided into two sub-clusters, here simply labelled 'Tabular' and 'Discursive', as they divide along the lines of reported results and of modifiers within a phrase. As Figure 1 shows, the relationship between the tabulated numbers and the other functions is the least probabilistic. It would be easy to assume that no relationship existed between them at all, but they do leak into each other in the final phrase of sentences like 'Fernando Gonzalez beat American Brian Vahaly 7-5, 6-2'.

The Ordinatives differentiate themselves from the Quantitatives by possessing particularly strong collocational tendencies with the previous words, as an Ordinative is much more likely to require exact determination by a small selection of closed-group words.

5.2 Epithets and Classifiers

The difference between Attitudinal and Experiential Epithets is probably the most common example of delicacy given in the literature. Nonetheless, this distinction was not discovered: either the attributes failed to capture it, the learner failed to find it, or it was not present in the corpora.

The profiles for Epithets and Classifiers are given in Table 3. Within Classifiers, the discovered clusters corresponded well to the functions of Expansive and Hyponymic Classification.

Epithet
    pos: JJ=78%, RB=4%, JJR=4%
    prev pos: DT=47%, IN=10%
    repetition: (ave=0.26, var=0.04)
    coll prev: (ave=0.16, var=0.05)
    coll next: (ave=0.20, var=0.13)
    Examples: 'erratic play', 'bigger chance'

Expansive (Classifier)
    pos: JJ=34%, NN=31%, NNP=16%
    prev pos: IN=30%, NN=16%
    repetition: (ave=0.42, var=0.06)
    coll prev: (ave=0.02, var=0.00)
    coll next: (ave=0.34, var=0.19)
    Examples: 'knee surgery', 'optimization problems'

Hyponymic (Classifier)
    pos: NN=53%, JJ=17%, NNP=14%
    prev pos: JJ=37%, DT=27%
    repetition: (ave=0.47, var=0.04)
    coll prev: (ave=0.26, var=0.15)
    coll next: (ave=0.30, var=0.16)
    Examples: 'the gold medal', 'the world 3,000 metres record'

Table 3: Properties of the Epithet and Classifier functions

Expansive Classifiers are more closely related to Epithets, and Hyponymic Classifiers more closely related to multi-word Things, so the distinction is roughly along the lines of marked and unmarked Classifiers, although both contain a considerable percentage of marked cases realised by adjectives. It is interesting that Figure 1 shows the difference between the two types of Classifiers to be one of the most well defined, indicating that the adjectives realising marked Hyponymic Classifiers were confidently identified.

Hyponymic Classifiers are much more likely to occur in compound or recursive Classifying structures (Matthiessen, 1995), which is why they exhibit strong collocational tendencies with the previous word, while the Expansive Classifiers exhibit almost none. As expected, the collocational tendency with the following word was greater for Classifiers than for Epithets, although the variance is also quite high. The selection of parts-of-speech context also differs between the functions.
While the Hyponymic Classifiers seem to follow adjectives, and are therefore likely to follow other Classifiers or Epithets, the Expansive Classifiers most commonly follow a preposition, indicating that they are likely to occur without a Deictic or Numerative and without sub-modification.

Epithets generally occur more frequently than Classifiers, so the probability of repetition of a Classifier within a document being almost twice as high is especially significant.

5.3 Thing

The clusters that were discovered can be roughly divided between those describing Named Entities (First, Intermediate and Last Names), those with the phrase realised by a single word (corresponding to Nominative and non-Nominative functions within the clause), and nominals corresponding to the Referring and Informing functions of a noun phrase. The properties of the Named Entity and Nominative/non-Nominative functions are well known, and there were few surprises in the features describing them here.

Here, we investigate the relative frequencies of functional modification of Stated and Described Things, assuming that most are some combination of the Referring and Informing functions (O'Donnell, 1998).

Stated (Thing)
    phrs start: yes=2%
    pos: NN=67%, NNS=30%
    prev pos: JJ=32%, DT=27%, NN=16%
    prev phrs: prep=46%, verb=33%, noun=12%
    next phrs: prep=45%, noun=21%, verb=10%
    coll prev: (ave=0.31, var=0.16)
    coll next: (ave=0.08, var=0.02)
    Examples: 'media questions', 'the invitation', 'such comparisons'

Described (Thing)
    phrs start: yes=58%
    pos: NN=45%, NNS=13%
    prev pos: JJ=21%, NN=21%, NNP=19%
    prev phrs: noun=78%, verb=12%, prep=8%
    next phrs: noun=91%, verb=4%, conj=2%
    coll prev: (ave=0.10, var=0.05)
    coll next: (ave=0.01, var=0.00)
    Examples: '20.67 seconds', 'former winner', 'our implementation'

Table 4: Properties of the Stated/Described Things

The distinction between the two may be seen in the choices made within the Deictic and Classification systems of delicacy. While the Stated Thing is twice as likely to be modified by a Deictic, over 80% of these are Demonstratives, which do not feature in the Described's modifications. This trend is reversed for Classifiers: the Described Things are more than twice as likely to be modified by a Classifier, and within this over 70% of cases are Expansive, as opposed to about 25% for the Stated Things.

As Figure 1 shows, the trend of Hyponymic Classifiers being more closely related to the Thing is reversed for the Stated Things: unlike other Things, a Stated Thing is most closely related to an Expansive Classifier. An explanation for this reversal is that a Hyponymic Classifier may itself undergo Classification while an Expansive Classifier generally does not, although the Stated Thing seems to define a number of aberrant 'hills and valleys' at its intersections with the other functions in Figure 1, indicating that it may represent something more complicated.

Not described in Figure 2 is that the percentage of Epithets is much lower than the percentage of preceding adjectives given in Table 4, indicating that markedness is common to both. The fact that the Stated is twice as likely as the Described to be modified Epithetically indicates that the labels given to them are not quite sufficient in describing the complexities of the differences.
This also demonstrates that at finer layers of delicacy, the variation in function can quickly become very emergent, even when the corresponding parts-of-speech and other surface-level phenomena independently differ only slightly.

5.4 Inference of unmarked function

The inference of unmarked function and register variation was investigated using traditional measures from categorical analysis. Precision is the percentage of classifications made that were correct. Recall is the percentage of actual target classes that were correctly identified. An F(beta=1) value is the harmonic average of the two (these measures are sketched in code below).

The baseline here was defined as that given by an assumption of unmarked function, that is, the optimal result given by word order and part-of-speech.

It might be assumed that functions within a noun phrase are typically unmarked. This work is the first empirical investigation of this assumption, and it shows the assumption to be false: less than 40% of non-final adjectives realised Epithets; less than 50% of Classifiers were nouns; and 44% of Classifiers were marked. While the relative frequency of the various functions varied between registers (Munro, 2003b), the ratio of marked to unmarked function was consistent. The only functions with an F(beta=1) baseline above 0.7 across all registers were Deictics and Things. For these two functions, word order and closed-group word lists could have produced the same results without part-of-speech knowledge.
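For reference, the three evaluation measures above in code form (a standard formulation; the function name is ours):

    def precision_recall_f1(true_pos, false_pos, false_neg):
        """Precision: correct classifications over classifications made.
        Recall: targets correctly identified over all actual targets.
        F(beta=1): the harmonic mean of precision and recall.
        Assumes at least one classification and one actual target."""
        precision = true_pos / (true_pos + false_pos)
        recall = true_pos / (true_pos + false_neg)
        f1 = 2.0 * precision * recall / (precision + recall)
        return precision, recall, f1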
