On Tue, May 22, 2007 at 03:20:04PM +0900, Hans Fugal wrote:
> I would like to identify partial matching of a regular expression, for a
> stream of input, as described in the pcrepartial(3) manpage. Is this
> possible with ruby Regexp, or would I have to wrap (a piece of) pcre?
> (or implement my own regular expression engine, hah!)

It looks like someone has wrapped pcre already:
http://raa.ruby-lang.org/project/pcre/
but that's quite old so you might need to fiddle with it a bit.

> As an aside, what I am really trying to do is write a lexer that works
> on stream input, and can decide whether any of the eligible tokens match
> before reading EOF (which may be a long, long way off both in bytes and
> time). If you can think of another approach (that still uses regexes)
> that'd work too.

Well, you can use regexps to distinguish a complete token from a partial
one, simply by checking if it is followed by a character which is not part
of the token. A little care is needed to handle EOF correctly - at worst you
could just stick a sentinel character onto the end.
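
To make the sentinel idea concrete, here is a rough sketch. The method name lex_with_sentinel and the choice of "\0" as the sentinel are assumptions for illustration only, and it presumes the sentinel never occurs in real input. With the sentinel appended at EOF, "followed by a character which is not part of the token" becomes the only completeness test needed:

```ruby
require 'stringio'

SENTINEL = "\0"  # assumption: this character never appears in real input

# Hypothetical helper, not from any library: lexes \w+ and \s+ tokens,
# using a sentinel appended at EOF instead of a separate EOF check.
def lex_with_sentinel(io)
  tokens = []
  buffer = String.new
  at_eof = false
  loop do
    break if at_eof && buffer == SENTINEL   # only the sentinel is left
    match = buffer[/\A(?:\w+|\s+)/]
    if match.nil? && !buffer.empty?
      raise ArgumentError, "syntax error at #{buffer.inspect}"
    elsif match && match.size < buffer.size
      # Something that is not part of the token follows it: token complete.
      tokens << buffer.slice!(0, match.size)
    else
      # Buffer is empty or is entirely one (possibly partial) token.
      chunk = io.read(1)
      if chunk
        buffer << chunk
      else
        buffer << SENTINEL
        at_eof = true
      end
    end
  end
  tokens
end

p lex_with_sentinel(StringIO.new("wibble bibble boing"))
```

The cost is that a literal sentinel character in the input is reported as a syntax error, so this only suits inputs where such a character can be ruled out.
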
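A simple example, which matches (\w+) and (\s+) as tokens:

require 'stringio'

stream = StringIO.new("wibble bibble boing")
token = ""                # buffer of input not yet emitted as a token
chunk = stream.read(1)    # nil at EOF
token << chunk if chunk

loop do
  break if token.empty?   # nothing to lex (empty input)

  # Find the token at the start of the buffer
  case token
  when /\A\w+/
    match = $&
  when /\A\s+/
    match = $&
  else
    puts "Syntax error here! " + token.inspect
    break
  end

  if match.size < token.size or chunk.nil?
    # A non-token character follows the match (or we hit EOF),
    # so the token is complete
    puts "Match token: " + token.slice!(0, match.size).inspect
    break if chunk.nil?
  else
    # The match reaches the end of the buffer, so it may still grow:
    # read more input before deciding
    #puts "Partial match: " + token.inspect
    chunk = stream.read(1)
    token << chunk if chunk
  end
end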
This should also work if you use, say, read(4096) instead of read(1), which
makes far fewer read calls, so it ought to be pretty efficient.
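
As a quick check of that claim, here is a sketch of the same loop wrapped in a method with the read size as a parameter. The name lex and its signature are made up for illustration, and it collects tokens into an array rather than printing them:

```ruby
require 'stringio'

# Hypothetical helper: the same approach as the loop above, with the
# read size passed in. Returns the tokens instead of printing them.
def lex(io, chunk_size)
  tokens = []
  buffer = String.new
  chunk = io.read(chunk_size)          # nil at EOF
  buffer << chunk if chunk
  until buffer.empty?
    match = buffer[/\A(?:\w+|\s+)/]
    raise ArgumentError, "syntax error at #{buffer.inspect}" if match.nil?
    if match.size < buffer.size || chunk.nil?
      # Followed by a non-token character, or at EOF: token is complete.
      tokens << buffer.slice!(0, match.size)
    else
      # The match runs to the end of the buffer, so read more first.
      chunk = io.read(chunk_size)
      buffer << chunk if chunk
    end
  end
  tokens
end

input = "wibble bibble boing"
small = lex(StringIO.new(input), 1)
large = lex(StringIO.new(input), 4096)
p small
p small == large
```

One caveat: read(n) counts bytes, so with multibyte input a chunk boundary can fall in the middle of a character. This sketch only considers ASCII tokens.
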
Regards,
Brian.