Modelling Claim Language

Playing around with natural language processing has given me the confidence to attempt some claim language modelling. Such a model could be used as part of a claim drafting tool or to process patent publication data. This short post describes the work in progress.

Here, a caveat: this modelling will be imperfect. There will be claims that cannot be modelled. However, our aim is not a “perfect” model but a model whose utility outweighs its failings. For example, a model may be used to present suggestions to a human being. If useful output is provided 70% of the time, then this may prove beneficial to the user.

To start, we will keep it simple and look only at system or apparatus claims. As an example, we can take a claim for Square’s payment dongle:

1. A decoding system, comprising:

a decoding engine running on a mobile device, the decoding engine in operation decoding signals produced from a read of a buyer’s financial transaction card, the decoding engine in operation accepting and initializing incoming signals from the read of the buyer’s financial transaction card until the signals reach a steady state, detecting the read of the buyer’s financial transaction card once the incoming signals are in a steady state, identifying peaks in the incoming signals and digitizing the identified peaks in the incoming signals into bits;

and

a transaction engine running on the mobile device and coupled to the decoding engine, the transaction engine in operation receiving as its input decoded buyer’s financial transaction card information from the decoding engine and serving as an intermediary between the buyer and a merchant, so that the buyer does not have to share his/her financial transaction card information with the merchant.

Let’s say a claim consists of “entities”. These are roughly the subjects of claim clauses, i.e. the things in our claim. They may appear as noun phrases, where the head word of the phrase is modelled as the core “entity”. They may be thought of as “objects” from an object-oriented perspective, or “nodes” in a graph-based approach.

In the above claim, we have core entities of:

“a decoding system”

“a decoding engine”

“a transaction engine”

An entity may have “properties” (i.e. it “is” something) or may have other entities (i.e. it “has” something).

In our example, the “decoding system” has the “decoding engine” and the “transaction engine” as child entities. Or put another way, the “decoding engine” and the “transaction engine” have the “decoding system” as a parent entity.
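As a minimal sketch of this parent/child structure, the entities could be held as simple objects. The `Entity` class and its `add_child` method are names made up for illustration, not part of any existing library, and this is only one of several ways the model could be laid out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """Hypothetical model of one 'thing' in a claim."""

    name: str
    # What the entity "is": free-text property clauses.
    properties: List[str] = field(default_factory=list)
    # What the entity "has": child entities.
    children: List["Entity"] = field(default_factory=list)
    # Kept out of repr and comparison so the parent link is not followed recursively.
    parent: Optional["Entity"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "Entity") -> "Entity":
        """Attach a child entity and record this entity as its parent."""
        child.parent = self
        self.children.append(child)
        return child


system = Entity("decoding system")
decoder = system.add_child(Entity("decoding engine"))
transactions = system.add_child(Entity("transaction engine"))

print([child.name for child in system.children])
print(decoder.parent.name)
```

The same structure could equally be stored as nodes and edges in a graph; the object form is used here only because it is short.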

In the example, the properties of the entities are more complex. The “decoding system” does not have any. It just has the child entities. The “decoding engine” “is”:

“running on a mobile device”

“in operation decoding signals produced from a read of a buyer’s financial transaction card”

“in operation accepting and initializing incoming signals from the read of the buyer’s financial transaction card until the signals reach a steady state”

“detecting the read of the buyer’s financial transaction card once the incoming signals are in a steady state”

“identifying peaks in the incoming signals and digitizing the identified peaks in the incoming signals into bits”

In these “is” properties, we have a number of implicit entities. These are not claimed as parts of the system but are referred to by the claim. They are basically the other nouns in our claim. They include:

“mobile device”

“read”

“buyer’s financial transaction card”

“signals”

“peaks”

“bits”
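One way to picture how these implicit entities hang off the “is” properties is to record which property mentions which entity. The sketch below uses plain whole-phrase matching as a stand-in for proper noun phrase extraction; the `mentions` function is a hypothetical helper, and the two lists are simply copied by hand from the example above.

```python
import re

# "is" properties of the decoding engine, copied from the example claim.
properties = [
    "running on a mobile device",
    "in operation decoding signals produced from a read of a buyer’s "
    "financial transaction card",
    "in operation accepting and initializing incoming signals from the read "
    "of the buyer’s financial transaction card until the signals reach a "
    "steady state",
    "detecting the read of the buyer’s financial transaction card once the "
    "incoming signals are in a steady state",
    "identifying peaks in the incoming signals and digitizing the identified "
    "peaks in the incoming signals into bits",
]

# Implicit entities, listed by hand rather than extracted automatically.
implicit_entities = [
    "mobile device",
    "read",
    "buyer’s financial transaction card",
    "signals",
    "peaks",
    "bits",
]


def mentions(properties, entities):
    """Map each entity to the positions of the properties that mention it."""
    index = {}
    for entity in entities:
        # Whole-phrase match, so "read" does not match inside a longer word.
        pattern = re.compile(r"\b" + re.escape(entity) + r"\b")
        index[entity] = [
            position
            for position, text in enumerate(properties)
            if pattern.search(text)
        ]
    return index


index = mentions(properties, implicit_entities)
for entity, positions in index.items():
    print(entity, positions)
```

Each entry can be read as a set of edges from a property to the implicit entities it relies on, which fits the graph-based view mentioned earlier.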

[When modelling, the part-of-speech tagger is mostly there but will probably require human tweaking and confirmation.]

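Extracting these entities is a three-step process: 1) tokenise the claim text, 2) tag each token with its part of speech and 3) chunk the tagged tokens into noun phrases. Now, the NLTK toolkit provides default functions for 1) and 2). For 3) we have the options of a RegexpParser, for which we need to supply noun phrase patterns, or classifier-based chunkers. Both need a little extra work but there are tutorials on the Net.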
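As a rough sketch of the regular-expression option, the snippet below runs NLTK's `RegexpParser` over a fragment of the claim. It assumes NLTK is installed. The tags are assigned by hand, so no tokeniser or tagger models need to be downloaded, and the noun phrase pattern is only an illustrative guess that would need tuning for real claims.

```python
import nltk

# Hand-tagged (word, tag) pairs for "a decoding engine running on a mobile device".
# Tagging "decoding" as a noun is a manual choice made for this example.
tagged = [
    ("a", "DT"), ("decoding", "NN"), ("engine", "NN"),
    ("running", "VBG"), ("on", "IN"),
    ("a", "DT"), ("mobile", "JJ"), ("device", "NN"),
]

# Illustrative pattern: optional determiner, any adjectives, one or more nouns.
grammar = "NP: {<DT>?<JJ.*>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(tagged)

# Collect the words under every NP subtree.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)  # ['a decoding engine', 'a mobile device']
```

In practice the hand-tagged list would be replaced by the output of the tokenising and tagging functions, with a human checking the tags for awkward words such as “decoding”.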

Noun phrases should be used consistently throughout claim sentences – this can be used to resolve ambiguity.
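One simple reading of this consistency rule is that a phrase introduced with “a” or “an” should later be referred to with “the”, and that any “the” phrase should already have been introduced. The sketch below checks that, under the assumption that noun phrases have already been split into a determiner and the rest. The `unintroduced` function is a hypothetical helper and the phrase list is a hand-picked subset of the example claim.

```python
def unintroduced(noun_phrases):
    """Return 'the' phrases that were never introduced with 'a' or 'an'."""
    seen = set()
    problems = []
    for determiner, noun in noun_phrases:
        if determiner in ("a", "an"):
            seen.add(noun)
        elif determiner == "the" and noun not in seen:
            problems.append(noun)
    return problems


# (determiner, rest of phrase) pairs in the order they appear in the claim.
phrases = [
    ("a", "decoding engine"),
    ("a", "mobile device"),
    ("the", "decoding engine"),
    ("a", "buyer’s financial transaction card"),
    ("the", "buyer’s financial transaction card"),
    ("a", "transaction engine"),
    ("the", "mobile device"),
    ("the", "buyer"),
    ("a", "merchant"),
    ("the", "merchant"),
]

print(unintroduced(phrases))  # ['buyer']
```

Here “the buyer” is flagged because the buyer is only introduced indirectly, inside “a buyer’s financial transaction card”. That is the kind of case where the model would present a suggestion to a human rather than make a decision.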