Description

Installation

Unzip the Stanford CoreNLP archive and set the CORE_NLP environment variable to point to the unzipped directory.

Install pynlp with pip:

pip3 install pynlp
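
Before launching anything, it can help to confirm that CORE_NLP is set and points to a real directory. The snippet below is a minimal sketch using only the standard library; the helper name core_nlp_dir is hypothetical and is not part of pynlp.

```python
import os
from pathlib import Path


def core_nlp_dir(env=os.environ):
    """Return the directory named by CORE_NLP, or None if it is unusable.

    Hypothetical helper for illustration; pynlp does not provide it.
    """
    value = env.get("CORE_NLP")
    if not value:
        return None  # variable is unset or empty
    path = Path(value).expanduser()
    # Only accept the value if it names an existing directory.
    return path if path.is_dir() else None


if __name__ == "__main__":
    found = core_nlp_dir()
    print("CORE_NLP ->", found if found else "not set or not a directory")
```

If this reports that the variable is not set, export it in your shell profile and open a new terminal before trying again.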

Quick Start

Launch the server

Launch the StanfordCoreNLPServer by following the official Stanford CoreNLP server instructions. Alternatively, simply run the pynlp module.

python3 -m pynlp

By default, this launches the server on localhost, using port 9000 and 4 GB of RAM for the JVM. Use the --help option for instructions on custom configurations.
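
To check that something is actually listening on the default address before annotating, a plain TCP probe is usually enough. This is a sketch under the assumption that the defaults above (localhost, port 9000) are in use; server_is_up is a hypothetical helper, and it only confirms that the port accepts connections, not that the process behind it is CoreNLP.

```python
import socket


def server_is_up(host="localhost", port=9000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper for illustration; not part of pynlp.
    """
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("server reachable:", server_is_up())
```

The JVM may need a few seconds to start, so a False result immediately after launching is not necessarily an error.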

Example

Let's start off with an excerpt from a CNN article.

text = ('GOP Sen. Rand Paul was assaulted in his home in Bowling Green, Kentucky, on Friday, '
        'according to Kentucky State Police. State troopers responded to a call to the senator\'s '
        'residence at 3:21 p.m. Friday. Police arrested a man named Rene Albert Boucher, who they '
        'allege "intentionally assaulted" Paul, causing him "minor injury". Boucher, 59, of Bowling '
        'Green was charged with one count of fourth-degree assault. As of Saturday afternoon, he '
        'was being held in the Warren County Regional Jail on a $5,000 bond.')

Annotate text

The nlp instance is callable. Use it to annotate the text and return a Document object.

document = nlp(text)
print(document)  # prints 'text'

Sentence splitting

Let's test the ssplit annotator. A Document object iterates over its Sentence objects.

for index, sentence in enumerate(document):
    print(index, sentence, sep=') ')

Output:

0) GOP Sen. Rand Paul was assaulted in his home in Bowling Green, Kentucky, on Friday, according to Kentucky State Police.
1) State troopers responded to a call to the senator's residence at 3:21 p.m. Friday.
2) Police arrested a man named Rene Albert Boucher, who they allege "intentionally assaulted" Paul, causing him "minor injury".
3) Boucher, 59, of Bowling Green was charged with one count of fourth-degree assault.
4) As of Saturday afternoon, he was being held in the Warren County Regional Jail on a $5,000 bond.
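
The loop above relies only on the document being iterable. The sketch below illustrates that protocol with a hypothetical stand-in class, FakeDocument, which holds pre-split sentences as plain strings. It is not pynlp's Document and performs no annotation; it only shows how iteration and enumerate produce the numbered output.

```python
class FakeDocument:
    """Hypothetical stand-in for illustration; not pynlp's Document class."""

    def __init__(self, sentences):
        self._sentences = list(sentences)

    def __iter__(self):
        # Iterating over the document yields its sentences in order.
        return iter(self._sentences)

    def __str__(self):
        # Printing the document gives back the full text.
        return " ".join(self._sentences)


def numbered(document):
    """Format each sentence as 'index) sentence'."""
    return ["{}) {}".format(i, s) for i, s in enumerate(document)]


if __name__ == "__main__":
    doc = FakeDocument(["State troopers responded.", "Police arrested a man."])
    for line in numbered(doc):
        print(line)
```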

Lemmatization

Coreference resolution

Let's use pynlp to find the first CorefChain in the text.

chain = document.coref_chains[0]
print(chain)

Output:

((GOP Sen. Rand Paul))-[id=4] was assaulted in (his)-[id=5] home in Bowling Green, Kentucky, on Friday, according to Kentucky State Police.
State troopers responded to a call to (the senator's)-[id=10] residence at 3:21 p.m. Friday.
Police arrested a man named Rene Albert Boucher, who they allege "(intentionally assaulted" Paul)-[id=16], causing him "minor injury.

In the string representation, coreferences are marked with parentheses and the referent with double parentheses.
Each is also labelled with a coref_id. Let's take a closer look at the referent.

ref = chain.referent
print('Coreference: {}\n'.format(ref))
for attr in 'type', 'number', 'animacy', 'gender':
    print(attr, getattr(ref, attr), sep=': ')

# Note that we can also index coreferences by id
assert chain[4].is_referent
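
If you only have the printed form of a chain (for example, in a log file), the parenthesis notation described above can be read back with a regular expression. This is a rough sketch that assumes the mention text itself contains no parentheses; parse_mentions is a hypothetical helper, and when a live chain object is available its own attributes are the better source.

```python
import re

# Matches "(text)-[id=N]" and "((text))-[id=N]" as shown in the printed chain.
MENTION = re.compile(r"(\(\(|\()([^()]+)(\)\)|\))-\[id=(\d+)\]")


def parse_mentions(rendered):
    """Return (coref_id, text, is_referent) tuples found in a printed chain.

    Hypothetical helper for illustration; not part of pynlp.
    """
    mentions = []
    for match in MENTION.finditer(rendered):
        opening, text, closing, coref_id = match.groups()
        # Double parentheses on both sides mark the referent.
        is_referent = opening == "((" and closing == "))"
        mentions.append((int(coref_id), text, is_referent))
    return mentions


if __name__ == "__main__":
    line = ("((GOP Sen. Rand Paul))-[id=4] was assaulted in (his)-[id=5] home in "
            "Bowling Green, Kentucky, on Friday, according to Kentucky State Police.")
    for mention in parse_mentions(line):
        print(mention)
    # (4, 'GOP Sen. Rand Paul', True)
    # (5, 'his', False)
```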