Ruby Issue Tracking System: https://bugs.ruby-lang.org/
Ruby trunk - Feature #1873: MatchData#[]: Omits All But Last Captures Corresponding to the Same Named Group
https://bugs.ruby-lang.org/issues/1873?journal_id=5098 | 2009-08-04T12:51:04Z | naruse (Yui NARUSE) naruse@airemix.jp
<ul></ul><p>=begin<br>
In other words, this request is to enable capture history.</p>
<p>A-5. Disabled functions by default syntax<br>
+ capture history<br>
(?@...) and (?@&lt;name&gt;...)<br>
ex. /(?@a)*/.match(&quot;aaa&quot;) ==&gt; [&lt;0-1&gt;, &lt;1-2&gt;, &lt;2-3&gt;]<br>
see sample/listcap.c file.<br>
<a href="http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt">http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt</a><br>
=end</p>
https://bugs.ruby-lang.org/issues/1873?journal_id=5101 | 2009-08-04T18:40:02Z | runpaint (Run Paint Run Run) runrun@runpaint.org
<ul></ul><p>=begin</p>
<blockquote>
<p>In other words, this request is to enable capture history.</p>
</blockquote>
<p>I&#39;m not sure I understand. The example shows that the history is already captured. Both the #inspect output and #to_a show that the data is already stored inside the MatchData object; it&#39;s just not accessible with the #[Symbol] accessor.</p>
<p>IOW:</p>
<blockquote>
<blockquote>
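<p>&#39;abc&#39;.match(/(?&lt;a&gt;a)(?&lt;a&gt;b)(?&lt;a&gt;c)/)<br>
=&gt; #&lt;MatchData &quot;abc&quot; a:&quot;a&quot; a:&quot;b&quot; a:&quot;c&quot;&gt;</p>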
</blockquote>
</blockquote>
<p>Group &#39;a&#39; is shown as having the values &#39;a&#39;, &#39;b&#39;, and &#39;c&#39;, but #[:a] only returns &#39;c&#39;:</p>
<blockquote>
<blockquote>
<p>&#39;abc&#39;.match(/(?&lt;a&gt;a)(?&lt;a&gt;b)(?&lt;a&gt;c)/)[:a]<br>
=&gt; &quot;c&quot;<br>
&#39;abc&#39;.match(/(?&lt;a&gt;a)(?&lt;a&gt;b)(?&lt;a&gt;c)/).captures<br>
=&gt; [&quot;a&quot;, &quot;b&quot;, &quot;c&quot;]</p>
</blockquote>
</blockquote>
<p>This doesn&#39;t seem like a disabled function to me; just an assumption that each group name will only appear once.<br>
=end</p>
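<p>A minimal sketch of the behaviour described above, assuming a current Ruby with the Onigmo engine. It spells the named groups as <code>(?'a'...)</code>, which Ruby accepts as an equivalent of the angle-bracket form; the variable names are illustrative only.</p>

```ruby
# Three distinct groups that all share the name "a".
re = /(?'a'a)(?'a'b)(?'a'c)/
md = 'abc'.match(re)

# The name maps to every group number that uses it...
p re.named_captures  # lists group numbers 1, 2 and 3 under "a"
# ...and every group's text is kept inside the MatchData...
p md.captures        # prints ["a", "b", "c"]
# ...but the name-based accessor returns only the last matched one.
p md[:a]             # prints "c"
```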
https://bugs.ruby-lang.org/issues/1873?journal_id=5112 | 2009-08-05T02:09:09Z | naruse (Yui NARUSE) naruse@airemix.jp
<ul></ul><p>=begin<br>
Yes, of course; your desired API would need more implementation even after the option is enabled.</p>
<p>As I said in my earlier comment, Ruby 1.9&#39;s regexp engine is based on Oniguruma. So a function that is already implemented in Oniguruma is easier to realize than one that is not.<br>
=end</p>
https://bugs.ruby-lang.org/issues/1873?journal_id=5114 | 2009-08-05T02:33:40Z | runpaint (Run Paint Run Run) runrun@runpaint.org
<ul></ul><p>=begin<br>
Yui,</p>
<p>Thank you for your help. I didn&#39;t realise it would be difficult. I&#39;d assumed that as:</p>
<p>/(?&lt;a&gt;a)(?&lt;a&gt;b)(?&lt;a&gt;c)/.named_captures<br>
=&gt; {&quot;a&quot;=&gt;[1, 2, 3]}</p>
<p>And:</p>
<p>&#39;abc&#39;.match(/(?&lt;a&gt;a)(?&lt;a&gt;b)(?&lt;a&gt;c)/)[1..3]<br>
=&gt; [&quot;a&quot;, &quot;b&quot;, &quot;c&quot;]</p>
<p>It would simply be a matter of mapping one to the other.</p>
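<p>The mapping described here can be sketched in plain Ruby. <code>all_captures_for</code> is a hypothetical helper, not a MatchData method: it looks up the group numbers for a name with <code>Regexp#named_captures</code> and indexes the MatchData with each of them. Returning an empty Array for an unknown name is an assumption of this sketch.</p>

```ruby
# Hypothetical helper (not part of Ruby): every capture for one group name.
def all_captures_for(md, name)
  # Regexp#named_captures uses String keys, so accept Symbols as well.
  numbers = md.regexp.named_captures.fetch(name.to_s, [])
  numbers.map { |n| md[n] }
end

md = 'abc'.match(/(?'a'a)(?'a'b)(?'a'c)/)
p all_captures_for(md, :a)  # prints ["a", "b", "c"]
```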
<p>How about then if we set this ticket&#39;s priority to &#39;Low&#39;, then add a note to the documentation of MatchData#[] that explains this quirk? :-)</p>
<p>(As an aside that Oniguruma document would make a great start for the RDoc of Regexp. Currently the actual syntax of the patterns doesn&#39;t seem to appear anywhere in <code>ri</code>.)<br>
=end</p>
https://bugs.ruby-lang.org/issues/1873?journal_id=5117 | 2009-08-05T03:33:25Z | naruse (Yui NARUSE) naruse@airemix.jp
<ul></ul><p>=begin<br>
Oh, sorry, I misunderstood. I thought you wanted to access &quot;b&quot; with /(?\w)+/.match(&quot;abc&quot;).</p>
<p>Your proposal may be implementable, because that data is already close at hand. (You showed me that with the inspect output, sorry!)</p>
<p>Anyway, changing the return value from always a String to either a String or an Array is difficult because of compatibility.<br>
If you want to access the other matched strings, I suggest adding a new API for them.<br>
So the problem is what the desirable API should be.<br>
=end</p>
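<p>The two readings in this exchange behave differently, which a short sketch makes visible (assuming a current Ruby; the group name <code>ch</code> is arbitrary). A single group under a quantifier keeps only its last iteration, so earlier iterations are not in the MatchData at all and recovering them would need something like Oniguruma's capture history. Several groups that share one name each keep their own value.</p>

```ruby
# One named group repeated by a quantifier: only the last iteration survives.
repeated = /(?'ch'\w)+/.match('abc')
p repeated[:ch]      # prints "c"
p repeated.captures  # prints ["c"]; "a" and "b" are not stored

# Several distinct groups sharing one name: every value is stored.
shared = /(?'ch'a)(?'ch'b)(?'ch'c)/.match('abc')
p shared.captures    # prints ["a", "b", "c"]

# For the simple repeated case, String#scan is one workaround.
p 'abc'.scan(/\w/)   # prints ["a", "b", "c"]
```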
https://bugs.ruby-lang.org/issues/1873?journal_id=5130 | 2009-08-05T15:17:03Z | runpaint (Run Paint Run Run) runrun@runpaint.org
<ul></ul><p>=begin</p>
<blockquote>
<p>Oh, sorry, I misunderstood. I thought you wanted to access &quot;b&quot; with /(?\w)+/.match(&quot;abc&quot;).</p>
</blockquote>
<p>That&#39;s quite alright. :-)</p>
<blockquote>
<p>Anyway, changing the return value from always a String to either a String or an Array is difficult because of<br>
compatibility.</p>
</blockquote>
<p>Hmmm... That&#39;s unfortunate; MatchData#[] is a lovely API. I guess one approach is to return an Array when there are multiple matches; a String otherwise. Anybody who currently relies on only the last match being returned is both in the minority and taking advantage of a bug. But I confess to not being overly fond of this solution. :-/ I&#39;m not sure.<br>
=end</p>
https://bugs.ruby-lang.org/issues/1873?journal_id=5133 | 2009-08-05T15:59:05Z | naruse (Yui NARUSE) naruse@airemix.jp
<ul></ul><p>=begin</p>
<blockquote>
<p>one approach is to return an Array when there are multiple matches; a String otherwise</p>
</blockquote>
<p>Few APIs return different types, apart from true/false or obj/nil.<br>
This is because such APIs disturb duck typing.</p>
<p>So a new API that always returns an Array seems the way to go.<br>
=end</p>
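<p>A small sketch of the duck-typing concern, with two hypothetical helpers that are not MatchData methods: <code>lookup_mixed</code> models the rejected String-or-Array design, while <code>lookup_all</code> always returns an Array.</p>

```ruby
# Hypothetical: every capture for a name, always as an Array.
def lookup_all(md, name)
  md.regexp.named_captures.fetch(name.to_s, []).map { |n| md[n] }
end

# Hypothetical rejected design: a String for one capture, an Array for several.
def lookup_mixed(md, name)
  values = lookup_all(md, name)
  values.size == 1 ? values.first : values
end

one  = 'xyz'.match(/(?'a'x)yz/)
many = 'abc'.match(/(?'a'a)(?'a'b)(?'a'c)/)

# Mixed design: callers cannot iterate without checking the type first.
p lookup_mixed(one, :a).respond_to?(:each)   # prints false (it is a String)
p lookup_mixed(many, :a).respond_to?(:each)  # prints true

# Always-Array design: one code path handles both cases.
[one, many].each do |md|
  p lookup_all(md, :a).map { |s| s.upcase }
end
```

<p>With <code>lookup_all</code>, callers never need a type check or an <code>Array(...)</code> wrapper before iterating.</p>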
https://bugs.ruby-lang.org/issues/1873?journal_id=5134 | 2009-08-05T16:20:23Z | runpaint (Run Paint Run Run) runrun@runpaint.org
<ul></ul><p>=begin<br>
I think adding another method of this form to MatchData will be confusing. How about overloading MatchData#values_at? It currently takes one or more integer indices and returns an Array of corresponding values. It could be modified to take a list of Symbols (and Strings, if it must) and return an Array of the matches. This is backward compatible, requires no new methods, and uses the same principle as MatchData#[], which previously only accepted Integer arguments, and now accepts Symbols/Strings, too.</p>
<pre>&gt;&gt; &#39;haystack&#39;.match(/(?&lt;h&gt;ay).+(?&lt;h&gt;ack)/).values_at(:h)
[&#39;ay&#39;,&#39;ack&#39;]
&gt;&gt; &#39;haystack&#39;.match(/(?&lt;h&gt;ay).+(?&lt;h&gt;a(?&lt;seek&gt;ck))/).values_at(:seek)
[&#39;ck&#39;]
&gt;&gt; &#39;haystack&#39;.match(/(?&lt;h&gt;ay).+(?&lt;h&gt;a(?&lt;seek&gt;ck))/).values_at(:seek, &#39;h&#39;)
[&#39;ck&#39;,&#39;ay&#39;,&#39;ack&#39;]
</pre>
<p>=end</p>
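<p>The proposed semantics can be prototyped without patching MatchData. <code>named_values_at</code> below is a hypothetical free function, not the attached patch: an Integer behaves as in the existing <code>values_at</code>, while a Symbol or String expands to every capture for that name, flattened into one Array. Raising <code>TypeError</code> for other key types is a choice of this sketch.</p>

```ruby
# Hypothetical prototype of the proposed MatchData#values_at overload.
def named_values_at(md, *keys)
  keys.flat_map do |key|
    case key
    when Integer
      [md[key]]                                   # existing index behaviour
    when Symbol, String
      numbers = md.regexp.named_captures.fetch(key.to_s, [])
      numbers.map { |n| md[n] }                   # every group with that name
    else
      raise TypeError, "unsupported key: #{key.inspect}"
    end
  end
end

md = 'haystack'.match(/(?'h'ay).+(?'h'a(?'seek'ck))/)
p named_values_at(md, :h)          # prints ["ay", "ack"]
p named_values_at(md, :seek)       # prints ["ck"]
p named_values_at(md, :seek, 'h')  # prints ["ck", "ay", "ack"]
```

<p>Because one name can expand to several values, the result length need not equal the number of arguments: <code>named_values_at(md, :h)</code> takes one key and returns two strings.</p>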
https://bugs.ruby-lang.org/issues/1873?journal_id=5272 | 2009-08-16T23:12:42Z | erikh (Erik Hollensbe) erik@hollensbe.org
<ul><li><strong>File</strong> <a href="/attachments/477/re_named_values_at.patch.gz">re_named_values_at.patch.gz</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/477/re_named_values_at.patch.gz">re_named_values_at.patch.gz</a> added</li></ul><p>=begin<br>
Attached is a patch which implements the overloaded #values_at functionality. If there is a problem, let me know and I&#39;ll alter it. Test cases and docs included.</p>
<p>For now, any argument (be it a named capture, an index, or an unexpected type) that doesn&#39;t yield a result is effectively a no-op: it is ignored, and nothing for it appears in the array. This is the behavior I noticed in other areas of the MatchData class, and I figured it was safest to honor it.<br>
=end</p>
https://bugs.ruby-lang.org/issues/1873?journal_id=5311 | 2009-08-19T05:20:58Z | runpaint (Run Paint Run Run) runrun@runpaint.org
<ul></ul><p>=begin<br>
Thanks, Erik. I tried the patch out and it works well. :-)<br>
=end</p>
https://bugs.ruby-lang.org/issues/1873?journal_id=9137 | 2010-03-20T03:07:57Z | mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p>=begin<br>
Hi,</p>
<blockquote>
<p>How about overloading MatchData#values_at ?</p>
</blockquote>
<p>Why do you want to reuse an existing method? :-/<br>
I think it is better to introduce a new method such as MatchData#all_values.</p>
<blockquote>
<blockquote>
<blockquote>
<p>&#39;haystack&#39;.match(/(?&lt;h&gt;ay).+(?&lt;h&gt;a(?&lt;seek&gt;ck))/).values_at(:seek, &#39;h&#39;)<br>
[&#39;ck&#39;,&#39;ay&#39;,&#39;ack&#39;]</p>
</blockquote>
</blockquote>
</blockquote>
<p>I expect values_at returns an array whose length is equal to the number<br>
of arguments.</p>
<p>By the way, I think it is strange (or even a bug) for MatchData#values_at<br>
to reject Symbols.</p>
<p>-- <br>
Yusuke ENDOH <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a><br>
=end</p>
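<p>One way to keep the result length equal to the number of arguments is to return one Array per requested name. The sketch below borrows the <code>all_values</code> name suggested here, but as a hypothetical free function rather than a MatchData method, and it does not reflect any patch attached to this issue.</p>

```ruby
# Hypothetical: one Array of captures per requested name, so the result
# always has exactly as many elements as there are arguments.
def all_values(md, *names)
  table = md.regexp.named_captures
  names.map do |name|
    table.fetch(name.to_s, []).map { |n| md[n] }
  end
end

md = 'haystack'.match(/(?'h'ay).+(?'h'a(?'seek'ck))/)
result = all_values(md, :seek, 'h')
p result       # prints [["ck"], ["ay", "ack"]]
p result.size  # prints 2, one entry per argument
```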
https://bugs.ruby-lang.org/issues/1873?journal_id=9679 | 2010-04-02T08:23:44Z | znz (Kazuhiro NISHIYAMA)
<ul><li><strong>Target version</strong> set to <i>2.0.0</i></li></ul><p>=begin</p>
<p>=end</p>
https://bugs.ruby-lang.org/issues/1873?journal_id=24694 | 2012-03-18T14:39:41Z | nahi (Hiroshi Nakamura) nakahiro@gmail.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/24694/diff?detail_id=17974">diff</a>)</li><li><strong>Assignee</strong> set to <i>naruse (Yui NARUSE)</i></li></ul>
https://bugs.ruby-lang.org/issues/1873?journal_id=24700 | 2012-03-18T15:01:51Z | naruse (Yui NARUSE) naruse@airemix.jp
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Feedback</i></li></ul><p>This feature itself is acceptable, but the proposed method name (API) is not.</p>
https://bugs.ruby-lang.org/issues/1873?journal_id=24913 | 2012-03-18T21:31:49Z | trans (Thomas Sawyer)
<ul></ul><p>This is the first time I&#39;ve seen regular expression groups, so it&#39;s interesting.</p>
<p>It occurs to me that with this addition MatchData is both a sort of Array and a sort of Hash. That being so, consider <code>md.to_h</code>.</p>
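<p>A rough sketch of the <code>to_h</code> idea, assuming each name should map to an Array of all its captures. <code>captures_hash</code> is a hypothetical helper. For comparison, Ruby 2.4 later added <code>MatchData#named_captures</code>, which keeps only one value per name.</p>

```ruby
# Hypothetical helper: name => Array of every capture recorded for that name.
def captures_hash(md)
  md.regexp.named_captures.each_with_object({}) do |(name, numbers), hash|
    hash[name] = numbers.map { |n| md[n] }
  end
end

md = 'haystack'.match(/(?'h'ay).+(?'h'a(?'seek'ck))/)
h = captures_hash(md)
p h['h']     # prints ["ay", "ack"]
p h['seek']  # prints ["ck"]
```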
https://bugs.ruby-lang.org/issues/1873?journal_id=25827 | 2012-04-11T18:34:22Z | erikh (Erik Hollensbe) erik@hollensbe.org
<ul><li><strong>File</strong> <a href="/attachments/2593/re_all_values.patch.gz">re_all_values.patch.gz</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/2593/re_all_values.patch.gz">re_all_values.patch.gz</a> added</li></ul><p>I&#39;ve attached a new patch -- this implements the same functionality but refers to it as <code>all_values</code> and reverts the old changes to <code>values_at</code>. This is fundamentally the same functionality as <code>values_at</code> with the overridden functionality described in the ticket.</p>
<p>Sorry for the latency on this, it&#39;s been a crazy few years. :)</p>
<p>Tests pass, including the new ones.</p>
https://bugs.ruby-lang.org/issues/1873?journal_id=26829 | 2012-05-26T06:48:05Z | Anonymous
<ul></ul><p>Hi everyone. I am a newbie user of computer languages (&lt; 1 year), and I am<br>
providing my feedback from this position.</p>
<p>Summary:<br>
I find that this feature proposal is basically an extension of Regex state<br>
machine functionality. I am against it. I think that the current behavior<br>
is natural: when one uses the same capture group name again, the old value<br>
is lost, just like when one assigns a new value to the same variable name.<br>
In a Regex machine, I value simplicity and memorability over an abundance of<br>
features. Moreover, as runpaint points out himself, this feature is really<br>
not missing: &quot;lost&quot; captures are available via the #to_a and #captures methods.</p>
<p>Rationale:<br>
As a newbie, I still remember the hard time I had learning Regex. I find<br>
the learning overhead for Ruby acceptable enough to make it useful as a tool for<br>
people who are not programmers by profession. But I found that to actually<br>
solve even simple domain-specific programming tasks, one also has to learn<br>
a seemingly endless list of formats, sub-standards, sub-languages, programmers&#39;<br>
editors and other idiosyncrasies, which in total take much greater effort<br>
than learning Ruby itself. Regex is one of these sub-languages.</p>
<p>The idea of a simple state machine that performs matching tasks far beyond<br>
find &amp; replace comes very naturally to me. I might have learned Regex in<br>
1 hour in the good old days when it only had 25 features. But today, Regex has<br>
(lemme count) 8 anchors, 9 character classes, 8 assertions, 10 quantifiers,<br>
9 backreferences, 10 range syntaxes, 7 pattern modifiers, 14 metacharacters,<br>
passive/active, greedy/ungreedy concept, in total, roughly 75 symbols and<br>
concepts to confuse one&#39;s head. Old dog programmers who have been with Regex<br>
throughout its history might not be noticing this, but for newbies, who have<br>
to memorize the whole Regex machine in one fell swoop, it is already very hard<br>
to learn. I think that with the number of features that Regex already has,<br>
adding more causes disproportionate growth in learning overhead for newbies.</p>
https://bugs.ruby-lang.org/issues/1873?journal_id=27879 | 2012-07-09T11:50:31Z | erikh (Erik Hollensbe) erik@hollensbe.org
<ul></ul><p>Hi folks, can I get some feedback on this patch before the feature freeze? Thanks.</p>
https://bugs.ruby-lang.org/issues/1873?journal_id=27889 | 2012-07-09T18:41:07Z | naruse (Yui NARUSE) naruse@airemix.jp
<ul></ul><p>erikh (Erik Hollensbe) wrote:</p>
<blockquote>
<p>I&#39;ve attached a new patch -- this implements the same functionality but refers to it as <code>all_values</code> and reverts the old changes to <code>values_at</code>. This is fundamentally the same functionality as <code>values_at</code> with the overridden functionality described in the ticket.</p>
</blockquote>
<p>I don&#39;t think all_values is a good name.<br>
Your implementation is good for making the feature clear.</p>
<p>mame says</p>
<blockquote>
<p>I expect values_at returns an array whose length is equal to the number of arguments.</p>
</blockquote>
<p>I agree with this.<br>
It comes down to the usual question: what is the use case?</p>
<blockquote>
<p>Tests pass, including the new ones.</p>
</blockquote>
<p>Adding tests is a good contribution, but it doesn&#39;t cover the feature.<br>
Anyway, how it should behave depends on the use case.<br>
I&#39;m considering a simpler API to get the array of strings captured by a group,<br>
but it also needs a use case.</p>
<p>boris_stitnicky (Boris Stitnicky) wrote:</p>
<blockquote>
<p>Hi everyone. I am a newbie user of computer languages (&lt; 1 year), and I am<br>
providing my feedback from this position.</p>
</blockquote>
<p>Regexp is not for newbies, but for well-trained programmers.</p>
<blockquote>
<p>Summary:<br>
I find that this feature proposal is basically an extension of Regex state<br>
machine functionality. I am against it. I think that the current behavior<br>
is natural: when one uses the same capture group name again, the old value<br>
is lost, just like when one assigns a new value to the same variable name.<br>
In a Regex machine, I value simplicity and memorability over an abundance of<br>
features. Moreover, as runpaint points out himself, this feature is really<br>
not missing: &quot;lost&quot; captures are available via the #to_a and #captures methods.</p>
</blockquote>
<p>As you can see from the patch, Oniguruma, the regexp engine of Ruby 1.9,<br>
doesn&#39;t lose the old value.</p>
<blockquote>
<p>Rationale:<br>
As a newbie, I still remember the hard time I had learning Regex. I find<br>
the learning overhead for Ruby acceptable enough to make it useful as a tool for<br>
people who are not programmers by profession. But I found that to actually<br>
solve even simple domain-specific programming tasks, one also has to learn<br>
a seemingly endless list of formats, sub-standards, sub-languages, programmers&#39;<br>
editors and other idiosyncrasies, which in total take much greater effort<br>
than learning Ruby itself. Regex is one of these sub-languages.</p>
</blockquote>
<p>Ruby won&#39;t reject a new feature just because it is difficult for a newbie,<br>
because Ruby believes a newbie will become a professional.<br>
Ruby won&#39;t be a barrier to the growth of programmers.</p>
<blockquote>
<p>The idea of a simple state machine that performs matching tasks far beyond<br>
find &amp; replace comes very naturally to me. I might have learned Regex in<br>
1 hour in the good old days when it only had 25 features. But today, Regex has<br>
(lemme count) 8 anchors, 9 character classes, 8 assertions, 10 quantifiers,<br>
9 backreferences, 10 range syntaxes, 7 pattern modifiers, 14 metacharacters,<br>
passive/active, greedy/ungreedy concept, in total, roughly 75 symbols and<br>
concepts to confuse one&#39;s head. Old dog programmers who have been with Regex<br>
throughout its history might not be noticing this, but for newbies, who have<br>
to memorize the whole Regex machine in one fell swoop, it is already very hard<br>
to learn. I think that with the number of features that Regex already has,<br>
adding more causes disproportionate growth in learning overhead for newbies.</p>
</blockquote>
<p>You haven&#39;t yet seen the delight of regular expressions.</p>
https://bugs.ruby-lang.org/issues/1873?journal_id=31629 | 2012-10-27T04:49:51Z | ko1 (Koichi Sasada)
<ul><li><strong>Target version</strong> changed from <i>2.0.0</i> to <i>next minor</i></li></ul><p>I changed the target to &quot;next minor&quot; because there has been no discussion.</p>