On Oct 3, 2005, at 7:01 AM, Gavin Kistner wrote:
> str = 'foo,bar ,, baz,qux,,,jorb,jing,,,,blat'
> out = []
> str.scan( /(.+?[^,],{2}*)(?:,(?!,)|$)/ ){ |a,b|
> out << a.gsub( ',,', ',' )
> }
> p out
> #=> ["foo", "bar , baz", "qux,", "jorb", "jing,,blat"]
Whenever I find myself about to do something like the above, I say to
myself:
"Hey, buddy, pre-allocating an array and shoving stuff onto it in a
block is neat as an exercise of the closure, but you should be using
something like #map."
Unfortunately, #scan with a block doesn't automagically collect the
value returned from each iteration into an array; it just returns the
original string. Man, wouldn't that be nice?
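For a one-off call, #scan *without* a block already returns an array, so #map can be chained onto it. The array has one element per match, and each element is an array of captures when the pattern has groups. Below is a minimal sketch of that. It writes the pair of commas as `(?:,,)*` instead of `,{2}*` only to make the grouping explicit.

```ruby
str = 'foo,bar ,, baz,qux,,,jorb,jing,,,,blat'

# Blockless String#scan returns one Array of captures per match
# (here a one-element Array, since the pattern has a single group),
# so the results can be mapped without a pre-allocated accumulator.
out = str.scan(/(.+?[^,](?:,,)*)(?:,(?!,)|$)/).map { |captures|
  captures.first.gsub(',,', ',')
}
p out
```

The trade-off is that the #map block no longer sees $~ for each match. That only matters if you need offsets or other match details beyond the captures.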
Following is my hackish attempt at a String#scan_and_map method that
does the above.
A few questions for the gurus:
a) Is there a better way to handle a leading ^ with StringScanner
than checking #bol? by hand? StringScanner lets ^ match at the scan
pointer even in mid-line. (Boy, it'd be nice if there were a
Regexp#uses_bol_at_start_of_match? method.)
b) Is there a clean way to tell the 'arity' of a regexp (the maximum
number of captures it can produce)? (Boy, it'd be nice if there were a
Regexp#arity method.)
c) Without knowing the arity, is there a clean/fast way to gather all
the 1..n submatches from a StringScanner's last match? (Boy, it'd be
nice if StringScanner exposed the subcaptures as a single array, and
if it set the $1..$9 vars.)
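Later Ruby versions may cover much of (a) through (c). The sketch below assumes a reasonably recent Ruby. I believe the `fixed_anchor:` keyword arrived with Ruby 2.7's strscan, and StringScanner#size and #captures arrived with Ruby 2.5. The `regexp_arity` helper is a hypothetical name, not a built-in. It relies on a common trick: union the regexp with an empty pattern, match against "", and count the (nil) capture slots. It counts named groups instead if the regexp uses them, because unnamed groups then don't capture.

```ruby
require 'strscan'

# (a) By default StringScanner matches as if the string began at the scan
# pointer, so ^ succeeds even mid-line. fixed_anchor: true (newer strscan)
# keeps ^ and \A tied to the real string.
text = "ab\nbc"
loose = StringScanner.new(text)
loose.pos = 1
loose_hit = loose.match?(/^b/)     # 1: ^ matched although "b" is mid-line

fixed = StringScanner.new(text, fixed_anchor: true)
fixed.pos = 1
fixed_mid = fixed.match?(/^b/)     # nil: position 1 is not a line start
fixed.pos = 3
fixed_bol = fixed.match?(/^b/)     # 1: position 3 follows "\n"

# (b) Hypothetical helper: the empty alternative always matches "", and the
# resulting MatchData still has one (nil) capture slot per group in +regexp+.
def regexp_arity(regexp)
  Regexp.union(regexp, //).match('').captures.size
end

arity = regexp_arity(/(.+?[^,](?:,,)*)(?:,(?!,)|$)/)  # (?:...) groups don't count

# (c) StringScanner#size is 1 + the number of groups in the last match,
# so the submatches can be gathered without knowing the arity up front;
# #captures returns them directly.
ss = StringScanner.new('foo,bar ,, baz')
ss.scan(/(\w+),(\w+)/)
by_index = (1...ss.size).map { |i| ss[i] }
caps = ss.captures
```

StringScanner still doesn't set $~ or $1..$9, as far as I know, so the captures have to come from the scanner itself.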
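require 'strscan'
class String
  def scan_and_map( regexp )
    # A naive check for a beginning-of-line anchor at the start of the pattern
    use_bol = regexp.inspect =~ /\/(?:\((?:\?:)?)*\^/
    # A naive check for sub-expression groups
    # Will fail for an unescaped ( inside [], for example,
    # and also counts non-capturing (?:...) groups
    use_groups = regexp.inspect =~ /(\^|[^\\])\\{2}*\(/
    results = []
    ss = StringScanner.new( self )
    until ss.eos?
      # Step forward one character at a time until the regexp matches at
      # the scan pointer. (scan_until would consume the next match, and
      # would loop forever once no matches remain.)
      unless ss.match?( regexp )
        ss.getch
        next
      end
      if use_bol and not ss.bol?
        ss.getch
      else
        result = ss.scan( regexp )
        if use_groups
          result = (1..9).map{ |i| ss[i] }
        end
        results << yield( result )
        # Don't loop forever on a zero-length match
        ss.getch if ss.matched_size == 0
      end
    end
    results
  end
end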
str = 'foo,bar ,, baz,qux,,,jorb,jing,,,,blat'
p str.scan_and_map( /(.+?[^,],{2}*)(?:,(?!,)|$)/ ){ |saved,others|
  saved.gsub( ',,', ',' )
}
#=> ["foo", "bar , baz", "qux,", "jorb", "jing,,blat"]
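String#scan searches the whole string, so ^ only matches at real line starts, and it sets $~ inside its block. Because of that, the StringScanner bookkeeping and the two naive `inspect` checks may not be needed at all. The following is a sketch under that assumption. `map_matches` is a hypothetical name, not part of Ruby. It yields each match's MatchData, so the block can read any number of captures without knowing the arity.

```ruby
class String
  # Hypothetical helper (not part of Ruby): runs String#scan and collects
  # the block's return value for each match. The block receives the full
  # MatchData, so captures, offsets, etc. are all available.
  def map_matches( regexp )
    results = []
    scan( regexp ) { results << yield( Regexp.last_match ) }
    results
  end
end

str = 'foo,bar ,, baz,qux,,,jorb,jing,,,,blat'
out = str.map_matches( /(.+?[^,](?:,,)*)(?:,(?!,)|$)/ ){ |m|
  m[1].gsub( ',,', ',' )
}
p out
```

The accumulator is still there, but it is hidden inside the helper rather than repeated at every call site.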