Ruby Issue Tracking System (https://bugs.ruby-lang.org/)
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer
https://bugs.ruby-lang.org/issues/905?journal_id=2129, 2008-12-19T09:16:28Z, brixen (Brian Shirai) brixen@gmail.com
<p>In Rubinius, we have found it useful to have <code>String.pattern(size, value)</code>, where <code>value</code> can be a Fixnum or a String. The resulting string is <code>size</code> characters long and is filled by repeating <code>value</code>. For example:</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"><span class="constant">String</span>.pattern(<span class="integer">5</span>, <span class="integer">?a</span>) <span class="comment"># =&gt; &quot;aaaaa&quot;</span>
<span class="constant">String</span>.pattern(<span class="integer">5</span>, <span class="string"><span class="delimiter">&quot;</span><span class="content"> </span><span class="delimiter">&quot;</span></span>) <span class="comment"># =&gt; &quot;     &quot;</span>
<span class="constant">String</span>.pattern(<span class="integer">5</span>, <span class="string"><span class="delimiter">&quot;</span><span class="content">110</span><span class="delimiter">&quot;</span></span>) <span class="comment"># =&gt; &quot;11011&quot;</span>
</span></code></pre>
<p>Something like <code>&quot;ab&quot; * 5</code> then becomes <code>String.pattern(&quot;ab&quot;.size * 5, &quot;ab&quot;)</code> behind the scenes in our implementation.</p>
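<p>The semantics described above can be approximated in plain Ruby. This is a sketch only: <code>string_pattern</code> is a hypothetical helper, not Rubinius&#39;s implementation, and unlike a native version it builds an intermediate string, so it reproduces the results rather than the single-allocation benefit.</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"># Hypothetical pure-Ruby approximation of the Rubinius-internal String.pattern.
# value may be an Integer character code (what ?a returned in Ruby 1.8) or a String.
def string_pattern(size, value)
  unit = value.is_a?(Integer) ? value.chr : value.to_s
  raise ArgumentError, "pattern must not be empty" if unit.empty?
  # Repeat enough whole copies to cover size, then cut to exactly size characters.
  (unit * (size / unit.length + 1))[0, size]
end

string_pattern(5, 97)    # "aaaaa"
string_pattern(5, " ")   # "     "
string_pattern(5, "110") # "11011"
</span></code></pre>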
https://bugs.ruby-lang.org/issues/905?journal_id=2130, 2008-12-19T09:31:31Z, matz (Yukihiro Matsumoto) matz@ruby-lang.org
<p>Hi</p>
<p>In message &quot;Re: [Feature <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: Add String.new(fixnum) to preallocate large buffer (Closed)" href="https://bugs.ruby-lang.org/issues/905">#905</a>] Add String.new(fixnum) to preallocate large buffer&quot;<br>
on Fri, 19 Dec 2008 08:46:13 +0900, Charles Nutter <a href="mailto:redmine@ruby-lang.org">redmine@ruby-lang.org</a> writes:</p>
<blockquote>
<p>Because Strings are used in ruby as arbitrary byte buffers, and because the cost of growing a String increases as it gets larger (especially when it starts small), <code>String.new</code> should support a form that takes a fixnum and ensures the backing store will have at least that much room. This is analogous to <code>Array.new(fixnum)</code> which does the same thing.</p>
</blockquote>
<p>I like the idea.</p>
<p>But I&#39;d prefer adding a new class method for the purpose,<br>
say, <code>String.buffer(n)</code>, to adding a new role to an argument by type,<br>
or there may be a better name.</p>
<pre> matz.
</pre>
https://bugs.ruby-lang.org/issues/905?journal_id=2132, 2008-12-19T09:40:40Z, pragdave (Dave Thomas) dave@pragprog.com
<p>On Dec 18, 2008, at 6:23 PM, Yukihiro Matsumoto wrote:</p>
<blockquote>
<p>But I&#39;d prefer adding a new class method for the purpose,<br>
say, <code>String.buffer(n)</code>, to adding a new role to an argument by type,<br>
or there may be a better name.</p>
</blockquote>
<p>Maybe</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay">str = <span class="constant">String</span>.capacity(n)
</span></code></pre>
<p>or</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay">str = <span class="constant">String</span>.sized(n)
</span></code></pre>
<p>(presumably <code>n</code> is in bytes, not characters)</p>
https://bugs.ruby-lang.org/issues/905?journal_id=2133, 2008-12-19T10:13:10Z, headius (Charles Nutter) headius@headius.com
<p>Yukihiro Matsumoto wrote:</p>
<blockquote>
<p>I like the idea.</p>
<p>But I&#39;d prefer adding a new class method for the purpose,<br>
say, <code>String.buffer(n)</code>, to adding a new role to an argument by type,<br>
or there may be a better name.</p>
</blockquote>
<p>I thought <code>String.new(1000)</code> would be a nice equivalent to <br>
<code>Array.new(1000)</code>, since they both do essentially the same thing. But I&#39;m <br>
not opposed to a separate method. I would vote for something active and <br>
descriptive like &quot;allocate&quot;, but that&#39;s obviously not available. <br>
&quot;buffer&quot; isn&#39;t bad.</p>
<p>I like &quot;new&quot; best. And of course <code>Array.new</code> changes behavior depending on <br>
argument types too.</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"><span class="constant">Array</span>.new(size=<span class="integer">0</span>, obj=<span class="predefined-constant">nil</span>)
<span class="constant">Array</span>.new(array)
<span class="constant">Array</span>.new(size) {|index| block }
</span></code></pre>
<p>What&#39;s good for the goose...</p>
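<p>For reference, these are the existing <code>Array.new</code> forms in question, each selected by the arguments passed. The variable names are illustrative only.</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"># Each Array.new form below is chosen by the arguments it receives.
sized    = Array.new(3)               # [nil, nil, nil]
filled   = Array.new(2, 0)            # [0, 0]
copied   = Array.new([1, 2, 3])       # [1, 2, 3], a new Array object
computed = Array.new(4) { |i| i * i } # [0, 1, 4, 9]
</span></code></pre>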
https://bugs.ruby-lang.org/issues/905?journal_id=2134, 2008-12-19T10:35:04Z, headius (Charles Nutter) headius@headius.com
<p>Brian Ford wrote:</p>
<blockquote>
<p>Issue <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: Add String.new(fixnum) to preallocate large buffer (Closed)" href="https://bugs.ruby-lang.org/issues/905">#905</a> has been updated by Brian Ford.</p>
<p>In Rubinius, we have found it useful to have String.pattern(size, value) where value can be a fixnum or a string. The string created will be size characters where value is repeated. For example:</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"><span class="constant">String</span>.pattern(<span class="integer">5</span>, <span class="integer">?a</span>) <span class="comment"># =&gt; &quot;aaaaa&quot;</span>
<span class="constant">String</span>.pattern(<span class="integer">5</span>, <span class="string"><span class="delimiter">&quot;</span><span class="content"> </span><span class="delimiter">&quot;</span></span>) <span class="comment"># =&gt; &quot;     &quot;</span>
<span class="constant">String</span>.pattern(<span class="integer">5</span>, <span class="string"><span class="delimiter">&quot;</span><span class="content">110</span><span class="delimiter">&quot;</span></span>) <span class="comment"># =&gt; &quot;11011&quot;</span>
</span></code></pre>
<p>Something like <code>&quot;ab&quot; * 5</code> then becomes <code>String.pattern(&quot;ab&quot;.size * 5, &quot;ab&quot;)</code>.</p>
</blockquote>
<p>Yeah, seems like a reasonably good idea. The string formats seem a<br>
little weird to me though...not what I&#39;d expect. The analog with an<br>
array would be to accept another array as the fill value, which seems<br>
equally weird.</p>
https://bugs.ruby-lang.org/issues/905?journal_id=2137, 2008-12-19T12:13:56Z, headius (Charles Nutter) headius@headius.com
<p>Dave Thomas wrote:</p>
<blockquote>
<p>(presumably <code>n</code> is in bytes, not characters)</p>
</blockquote>
<p>Yes, I&#39;m thinking bytes myself. Presumably you either want just a byte<br>
buffer or you know how many bytes you need to allocate for the character<br>
encoding you intend.</p>
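<p>A brief illustration of why the unit matters: the same text has one character count but different byte counts depending on its encoding, so a byte capacity has to be chosen with the target encoding in mind. The sample text is arbitrary.</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"># "h\u00e9llo w\u00f6rld" is "héllo wörld": 11 characters, two of them non-ASCII.
text = "h\u00e9llo w\u00f6rld"

chars       = text.length                      # 11
utf8_bytes  = text.bytesize                    # 13: each accented letter takes 2 bytes in UTF-8
utf16_bytes = text.encode("UTF-16LE").bytesize # 22: 2 bytes per character for this text
</span></code></pre>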
https://bugs.ruby-lang.org/issues/905?journal_id=2145, 2008-12-19T17:11:08Z, duerst (Martin Dürst) duerst@it.aoyama.ac.jp
<p>At 09:08 08/12/19, Brian Ford wrote:</p>
<blockquote>
<p>Something like <code>&quot;ab&quot; * 5</code> then becomes <code>String.pattern(&quot;ab&quot;.size * 5, &quot;ab&quot;)</code>.</p>
</blockquote>
<p>I&#39;m at a total loss to see why</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"><span class="constant">String</span>.pattern(<span class="string"><span class="delimiter">&quot;</span><span class="content">ab</span><span class="delimiter">&quot;</span></span>.size * <span class="integer">5</span>, <span class="string"><span class="delimiter">&quot;</span><span class="content">ab</span><span class="delimiter">&quot;</span></span>)
</span></code></pre>
<p>should be in any way better than</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"><span class="string"><span class="delimiter">&quot;</span><span class="content">ab</span><span class="delimiter">&quot;</span></span> * <span class="integer">5</span>
</span></code></pre>
<p>but maybe that&#39;s just me. Can somebody explain?</p>
<p>Regards, Martin.</p>
<p>#-#-# Martin J. Du&quot;rst, Assoc. Professor, Aoyama Gakuin University<br>
#-#-# <a href="http://www.sw.it.aoyama.ac.jp">http://www.sw.it.aoyama.ac.jp</a> mailto:<a href="mailto:duerst@it.aoyama.ac.jp">duerst@it.aoyama.ac.jp</a></p>
https://bugs.ruby-lang.org/issues/905?journal_id=2146, 2008-12-19T17:34:26Z, rklemme (Robert Klemme) shortcutter@googlemail.com
<p>2008/12/19 Martin Duerst <a href="mailto:duerst@it.aoyama.ac.jp">duerst@it.aoyama.ac.jp</a>:</p>
<blockquote>
<p>At 09:08 08/12/19, Brian Ford wrote:</p>
<blockquote>
<p>Something like &quot;ab&quot; * 5 then becomes String.pattern(&quot;ab&quot;.size * 5, &quot;ab&quot;).</p>
</blockquote>
<p>I&#39;m at a total loss to see why</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"><span class="constant">String</span>.pattern(<span class="string"><span class="delimiter">&quot;</span><span class="content">ab</span><span class="delimiter">&quot;</span></span>.size * <span class="integer">5</span>, <span class="string"><span class="delimiter">&quot;</span><span class="content">ab</span><span class="delimiter">&quot;</span></span>)
</span></code></pre>
<p>should be in any way better than</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"> <span class="string"><span class="delimiter">&quot;</span><span class="content">ab</span><span class="delimiter">&quot;</span></span> * <span class="integer">5</span>
</span></code></pre>
<p>but maybe that&#39;s just me. Can somebody explain?</p>
</blockquote>
<p>It&#39;s probably not. But if you want <code>&quot;ab&quot;</code> repeated 5 times, then <code>&quot;ab&quot; * 5</code><br>
is probably the ideal solution.</p>
<p>Fixing the length in bytes is a new and different feature: your<br>
pattern gets cut off. With <code>String.pattern(3, &quot;bo&quot;)</code> you would get <code>&quot;bob&quot;</code>,<br>
or maybe <code>&quot;bo&quot;</code> but in a buffer that is 3 bytes long.</p>
<p>We could also name it &quot;resize&quot; with these semantics:</p>
<p><code>String.resize(100, &quot;foo&quot;)</code> -&gt; &quot;foo&quot; and buffer has 100 bytes<br>
<code>String.resize(100)</code> -&gt; &quot;&quot; and buffer has 100 bytes</p>
<p>Or maybe just &quot;size&quot;. IMHO the idea was that you do not need to have<br>
a <code>String</code> beforehand, so all solutions that are instance methods receiving<br>
a <code>String</code> as an argument are probably suboptimal: they will<br>
typically be invoked as shown above, i.e. with a <code>String</code> constructor,<br>
which costs one object allocation (including GC bookkeeping overhead)<br>
and a bit of memory.</p>
<p>A typical use case would be this:</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"><span class="constant">File</span>.open <span class="string"><span class="delimiter">&quot;</span><span class="content">foo</span><span class="delimiter">&quot;</span></span> <span class="keyword">do</span> |io|
  buffer = <span class="constant">String</span>.size(<span class="integer">1024</span>) <span class="comment"># proposed constructor (hypothetical), not an existing method</span>
  <span class="keyword">while</span> io.read(<span class="integer">1024</span>, buffer)
    <span class="comment"># or even io.read(buffer.capacity, buffer)</span>
    <span class="global-variable">$stdout</span>.write(buffer)
  <span class="keyword">end</span>
<span class="keyword">end</span>
</span></code></pre>
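<p>Since <code>String.size</code> and <code>capacity</code> above are proposals, the loop does not run as written. A minimal runnable sketch of the same idea, assuming Ruby 2.4 or later (where <code>String.new</code> accepts a <code>capacity:</code> keyword) and using <code>StringIO</code> in place of a real file; <code>copy_in_chunks</code> is a hypothetical name:</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay">require "stringio"

# Copy src to dst in fixed-size chunks, reusing one preallocated buffer.
# Assumes Ruby 2.4+, where String.new accepts a capacity: keyword.
def copy_in_chunks(src, dst, chunk_size = 1024)
  # Empty string whose backing store has room for chunk_size bytes.
  buffer = String.new(capacity: chunk_size)
  # read(length, outbuf) refills buffer in place and returns nil at EOF.
  while src.read(chunk_size, buffer)
    dst.write(buffer)
  end
  dst
end

out = copy_in_chunks(StringIO.new("x" * 3000), StringIO.new)
out.string.bytesize # 3000
</span></code></pre>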
<p>Kind regards</p>
<p>robert</p>
<hr>
<p>remember.guy do |as, often| as.you_can - without end</p>
https://bugs.ruby-lang.org/issues/905?journal_id=2150, 2008-12-20T00:41:22Z, headius (Charles Nutter) headius@headius.com
<p>Jim Weirich wrote:</p>
<blockquote>
<p>There is a slight difference. <code>Array.new(1000)</code> creates an array with a <br>
thousand elements. The proposed <code>String.new(1000)</code> would create a string <br>
with zero characters, but the ability to grow to 1000 characters (umm, <br>
bytes) without internal reallocation.</p>
<p>I think this difference is enough to warrant different names.</p>
</blockquote>
<p>Yes, that&#39;s a good point; and the resulting array would <code>&lt;&lt;</code> to the 1001st <br>
element.</p>
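<p>The difference Jim points out can be seen directly. The <code>Array</code> half runs on any Ruby; the string half assumes Ruby 2.4 or later, whose <code>String.new</code> takes a <code>capacity:</code> keyword that reserves room without adding content.</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"># Array.new(1000) creates 1000 real (nil) elements, so the next push is element 1001.
arr = Array.new(1000)
arr.push(:next)
arr.size # 1001

# A capacity only reserves room in the backing store; the string starts out empty.
str = String.new(capacity: 1000)
str.size # 0
</span></code></pre>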
<p>I suppose then coming up with a common name that works for both would be <br>
a good idea. I&#39;m back to liking &quot;buffer&quot; in both cases.</p>
<p><code>String.buffer(1000)</code> produces an empty string that can grow to 1000 bytes <br>
without needing to resize/copy.</p>
<p><code>Array.buffer(1000)</code> produces an empty array that can grow to 1000 <br>
elements without needing to resize/copy.</p>
<p>Whatever is decided, I think it&#39;s going to be something people want <br>
(need) on 1.8, so I&#39;ll probably submit a backport request as well (and <br>
perhaps write up a simple extension people can use until then).</p>
https://bugs.ruby-lang.org/issues/905?journal_id=2151, 2008-12-20T00:58:53Z, pragdave (Dave Thomas) dave@pragprog.com
<p>On Dec 19, 2008, at 9:33 AM, Charles Oliver Nutter wrote:</p>
<blockquote>
<p>I suppose then coming up with a common name that works for both<br>
would be a good idea. I&#39;m back to liking &quot;buffer&quot; in both cases.</p>
<p><code>String.buffer(1000)</code> produces an empty string that can grow to 1000<br>
bytes without needing to resize/copy.</p>
<p><code>Array.buffer(1000)</code> produces an empty array that can grow to 1000<br>
elements without needing to resize/copy.</p>
</blockquote>
<p>I think the reason I dislike this is that you&#39;re creating methods that<br>
are polymorphic on the types of their arguments, and yet we generally<br>
don&#39;t do that in Ruby-level code. So by creating these methods, you&#39;re<br>
giving them a different flavor from methods that would be written in<br>
straight Ruby.</p>
<p>How about something more Ruby-like:</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay">s = <span class="constant">String</span>.new(<span class="key">initial_capacity</span>: <span class="integer">1000</span>)
t = <span class="constant">String</span>.new(buffer, <span class="key">initial_capacity</span>: <span class="integer">2</span>*buffer.length)
</span></code></pre>
<p>Dave</p>
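<p>For comparison, Ruby 2.4 and later accept keywords on <code>String.new</code> in much this spirit, spelled <code>capacity:</code> rather than <code>initial_capacity:</code>, alongside <code>encoding:</code>. A brief sketch under that assumption; <code>buffer</code> here is just an example string:</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"># Assumes Ruby 2.4+, where String.new accepts encoding: and capacity: keywords.
buffer = "header:"

s = String.new(capacity: 1000)                        # empty, room for at least 1000 bytes
t = String.new(buffer, capacity: 2 * buffer.bytesize) # a copy of buffer, with spare room
b = String.new("", encoding: Encoding::BINARY, capacity: 4096)
</span></code></pre>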
https://bugs.ruby-lang.org/issues/905?journal_id=2152, 2008-12-20T03:56:58Z, headius (Charles Nutter) headius@headius.com
<p>Dave Thomas wrote:</p>
<blockquote>
<p>I think the reason I dislike this is that you&#39;re creating methods that<br>
are polymorphic on the types of their arguments, and yet we generally<br>
don&#39;t do that in Ruby-level code. So by creating these methods, you&#39;re<br>
giving them a different flavor from methods that would be written in<br>
straight Ruby.</p>
</blockquote>
<p>Neither of those methods are polymorphic on anything. They&#39;re both new<br>
methods that accept a <code>Fixnum</code>.</p>
https://bugs.ruby-lang.org/issues/905?journal_id=2153, 2008-12-20T06:38:28Z, pragdave (Dave Thomas) dave@pragprog.com
<p>On Dec 19, 2008, at 12:48 PM, Charles Oliver Nutter wrote:</p>
<blockquote>
<p>Neither of those methods are polymorphic on anything. They&#39;re both<br>
new methods that accept a <code>Fixnum</code>.</p>
</blockquote>
<p><code>.new</code> is</p>
<p><code>String.buffer</code> is not a meaningful name for a constructor, in my<br>
opinion, whereas <code>String.new</code> has a pedigree. By adding an optional<br>
<code>initial_size: n</code> argument, you express the meaning exactly: you&#39;re asking for<br>
an initial allocation when <code>String.new</code> executes. Similarly,</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"><span class="constant">Array</span>.new([<span class="integer">1</span>,<span class="integer">2</span>,<span class="integer">3</span>], <span class="key">initial_size</span>: <span class="integer">100</span>)
</span></code></pre>
<p>lets you both initialize and allocate a new array.</p>
<p>Right now, we have <code>File.open(&quot;fred&quot;, &quot;w&quot;)</code>, rather than<br>
<code>File.open_write(&quot;fred&quot;)</code>. It seems like a good idea, particularly for<br>
an interface that&#39;s likely to grow over time (I can foresee</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"><span class="constant">String</span>.new(<span class="key">initial_size</span>: <span class="integer">1000</span>, <span class="key">fill_with</span>: <span class="string"><span class="delimiter">&quot;</span><span class="content"> </span><span class="delimiter">&quot;</span></span>, <span class="key">encoding</span>: binary,
<span class="key">etc</span>: ...)
</span></code></pre>
<p>Cheers</p>
<p>Dave</p>
https://bugs.ruby-lang.org/issues/905?journal_id=2154, 2008-12-20T07:36:35Z, headius (Charles Nutter) headius@headius.com
<p>Dave Thomas wrote:</p>
<blockquote>
<p>On Dec 19, 2008, at 12:48 PM, Charles Oliver Nutter wrote:</p>
<blockquote>
<p>Neither of those methods are polymorphic on anything. They&#39;re both new <br>
methods that accept a Fixnum.</p>
</blockquote>
<p>.new is</p>
</blockquote>
<p>And already has multiple forms in Array, so there&#39;s precedent. Also, <br>
adding multiple forms with different named arguments doesn&#39;t reduce the <br>
complexity of that single method any.</p>
<blockquote>
<p>String.buffer is not a meaningful name for a constructor, in my opinion, <br>
whereas String.new has a pedigree. By adding a initial_size: n optional <br>
argument, you exactly express the meaning—you&#39;re asking for an initial <br>
allocation when String.new executes. Similarly,</p>
<p>Array.new([1,2,3], initial_size: 100)</p>
</blockquote>
<p>But Array.new(initial_size: 100).size would == 0. That&#39;s confusing...I <br>
think buffer better expresses that it&#39;s the backing store being sized <br>
than the outward expression of the String or Array itself, which is what <br>
initial_size means.</p>
<p>I would also expect that the cost of allocating and populating an <br>
arguments hash for this would negate some of the gain from adding the <br>
new form. Array.buffer(100) adds almost no overhead on top of the <br>
physical creation of the backing store and object to wrap it, where <br>
Array.new(initial_size: 100) creates both a new Array and a new Hash. <br>
An implementation detail, sure, but I think we just need something <br>
simple here. Perhaps buffer just doesn&#39;t express it clearly enough?</p>
<blockquote>
<p>Right now, we have File.open(&quot;fred&quot;, &quot;w&quot;), rather than <br>
File.open_write(&quot;fred&quot;). It seems like a good idea, particularly for an <br>
interface that&#39;s likely to grow over time (I can forsee</p>
<p>String.new(initial_size: 1000, fill_with: &quot; &quot;, encoding: binary, etc: <br>
...)</p>
</blockquote>
<p>It seems to me this is making the semantics of String.new much more <br>
complicated, rather than simpler and more uniform. And at least encoding <br>
is already available outside of &quot;new&quot;, so this is little more than a <br>
shortcut. But there&#39;s absolutely no way at present to allocate a string <br>
with a guaranteed backing store size, and that&#39;s the sole intention of <br>
this RFE.</p>
https://bugs.ruby-lang.org/issues/905?journal_id=2155, 2008-12-20T08:21:47Z, pragdave (Dave Thomas) dave@pragprog.com
<p>On Dec 19, 2008, at 4:28 PM, Charles Oliver Nutter wrote:</p>
<blockquote>
<blockquote>
<p>String.buffer is not a meaningful name for a constructor, in my<br>
opinion, whereas String.new has a pedigree. By adding a<br>
initial_size: n optional argument, you exactly express the meaning— <br>
you&#39;re asking for an initial allocation when String.new executes.<br>
Similarly,<br>
Array.new([1,2,3], initial_size: 100)</p>
</blockquote>
<p>But Array.new(initial_size: 100).size would == 0. That&#39;s<br>
confusing...I think buffer better expresses that it&#39;s the backing<br>
store being sized than the outward expression of the String or Array<br>
itself, which is what initial_size means.</p>
<p>I would also expect that the cost of allocating and populating an<br>
arguments hash for this would negate some of the gain from adding<br>
the new form. Array.buffer(100) adds almost no overhead on top of<br>
the physical creation of the backing store and object to wrap it,<br>
where Array.new(initial_size: 100) creates both a new Array and a<br>
new Hash. An implementation detail, sure, but I think we just<br>
need something simple here. Perhaps buffer just doesn&#39;t express it<br>
clearly enough?</p>
</blockquote>
<p>I don&#39;t think the cost of a hash is going to be significant--if it is,<br>
then I&#39;d hope that implementors find a way of optimizing these styles<br>
of keyword hashes, because they&#39;re used more and more<br>
(<em>cough</em> Rails <em>cough</em>).</p>
<p>I agree initial_size: is misleading. Perhaps String.new(preallocate: n)</p>
<p>Dave</p>
https://bugs.ruby-lang.org/issues/905?journal_id=2156, 2008-12-20T12:17:22Z, headius (Charles Nutter) headius@headius.com
<p>Dave Thomas wrote:</p>
<blockquote>
<p>I don&#39;t think the cost of a hash is going to be significant--if it is, <br>
then I&#39;d hope that implementors find a way of optimizing these styles of <br>
keyword hashes, because they&#39;re used more and more (<em>cough</em> Rails <em>cough</em>).</p>
</blockquote>
<p>Hard to do, since it has to be a hash on the callee side. Constructing <br>
the hash could perhaps be delayed, in case the callee was a C function, <br>
but it still has to be something. In comparison, new(1000) is almost free.</p>
https://bugs.ruby-lang.org/issues/905?journal_id=2157, 2008-12-20T14:53:02Z, rogerdpack (Roger Pack) rogerpack2005@gmail.com
<blockquote>
<p>Hard to do, since it has to be a hash on the callee side. Constructing the<br>
hash could perhaps be delayed, in case the callee was a C function, but it<br>
still has to be something. In comparison, new(1000) is almost free.</p>
</blockquote>
<p>I suppose a clever implementation could optimize that out.<br>
-=R</p>
https://bugs.ruby-lang.org/issues/905?journal_id=2158, 2008-12-20T15:26:40Z, brixen (Brian Shirai) brixen@gmail.com
<p>On Dec 19, 12:02 am, Martin Duerst <a href="mailto:due...@it.aoyama.ac.jp">due...@it.aoyama.ac.jp</a> wrote:</p>
<blockquote>
<p>At 09:08 08/12/19, Brian Ford wrote:</p>
<blockquote>
<p>Something like &quot;ab&quot; * 5 then becomes String.pattern(&quot;ab&quot;.size * 5, &quot;ab&quot;).</p>
</blockquote>
<p>I&#39;m at a total loss to see why<br>
String.pattern(&quot;ab&quot;.size * 5, &quot;ab&quot;)<br>
should be in any way better than<br>
&quot;ab&quot; * 5<br>
but maybe that&#39;s just me. Can somebody explain?</p>
</blockquote>
<p>It&#39;s not better. It&#39;s not intended that you see it. Behind the scenes<br>
it has been useful for us to have this method. This is one example of<br>
its usage. The string is allocated in one step and filled in one loop.</p>
<p>There are other places it has been useful. The most useful aspect is<br>
requesting a particular size. The initial contents are a lesser, but<br>
still useful, aspect.</p>
<p>The mileage for other implementations may vary.</p>
<p>Cheers,<br>
Brian</p>
<blockquote>
<p>Regards, Martin.</p>
<p>#-#-# Martin J. Du&quot;rst, Assoc. Professor, Aoyama Gakuin University<br>
#-#-# <a href="http://www.sw.it.aoyama.ac.jp">http://www.sw.it.aoyama.ac.jp</a> mailto:<a href="mailto:due...@it.aoyama.ac.jp">due...@it.aoyama.ac.jp</a></p>
</blockquote>
https://bugs.ruby-lang.org/issues/905?journal_id=2159, 2008-12-20T16:35:54Z, headius (Charles Nutter) headius@headius.com
<p>Gary Wright wrote:</p>
<blockquote>
<p>How about:</p>
<p>String.reserve(100)<br>
Array.reserve(100)</p>
</blockquote>
<p>&quot;reserve&quot; is pretty good. I&#39;ll abstain from commenting on any other <br>
forms since I really just want the single-param fixnum version myself.</p>
https://bugs.ruby-lang.org/issues/905?journal_id=2254, 2008-12-23T01:46:23Z, headius (Charles Nutter) headius@headius.com
<p>I guess the relative silence on this issue means there&#39;s not much more <br>
to discuss. Here&#39;s a summary up to now:</p>
<ul>
<li><p>Everyone seems to agree it&#39;s a good idea to add, so we should add it. <br>
And I would like to see it backported to 1.8.6/7.</p></li>
<li><p>Everyone likes the flat fixnum form except Dave Thomas, who would like <br>
it to be a keyword argument. But that would not support backporting and <br>
no core methods currently accept keyword arguments, plus it would create <br>
a throw-away hash in all current implementations.</p></li>
<li><p>Several names have been suggested: overload &#39;new&#39;, buffer, <br>
preallocate, capacity, sized, reserve. I prefer &#39;buffer&#39; and &#39;reserve&#39;, <br>
with a strong lean toward &#39;buffer&#39; because it mimics a well-known idiom <br>
in the Java world: &quot;String.buffer(1000)&quot; == &quot;new StringBuffer(1000)&quot;.</p></li>
<li><p>Other forms have been suggested that accept a fill fixnum or fill <br>
string; however I believe we should skip these cases for now since we&#39;re <br>
not actually creating a string of a certain size (and content), we&#39;re <br>
creating an empty string with a backing store of a certain size. The <br>
expectation is that the contents of that backing store are unimportant <br>
(perhaps \000s), and so fill params are meaningless.</p></li>
</ul>
<p>So for me, the solution is String.buffer(1000). I rest my case, your honor.</p>
https://bugs.ruby-lang.org/issues/905?journal_id=2257, 2008-12-23T03:01:47Z, pragdave (Dave Thomas) dave@pragprog.com
<p>On Dec 22, 2008, at 10:37 AM, Charles Oliver Nutter wrote:</p>
<blockquote>
<p>no core methods currently accept keyword arguments, plus it would<br><br>
create a throw-away hash in all current implementations.</p>
</blockquote>
<p>File.open...</p>
https://bugs.ruby-lang.org/issues/905?journal_id=2953, 2009-02-03T10:44:16Z, shyouhei (Shyouhei Urabe) shyouhei@ruby-lang.org
<ul><li><strong>Assignee</strong> set to <i>matz (Yukihiro Matsumoto)</i></li></ul>
https://bugs.ruby-lang.org/issues/905?journal_id=8575, 2010-03-04T01:26:35Z, mame (Yusuke Endoh) mame@ruby-lang.org
<ul><li><strong>Status</strong> changed from <i>Open</i> to <i>Feedback</i></li></ul><p>Hi,</p>
<blockquote>
<p>This would allow heavy string-appending algorithms and libraries (like ERb) to avoid doing so many memory copies while they run.</p>
</blockquote>
<p>Is it really a bottleneck? Please run an experiment and show us<br>
the results.</p>
<p>We can continue API discussion after we confirm the feature really<br>
makes sense.</p>
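<p>An experiment of the kind requested might look like the following sketch. It assumes Ruby 2.4 or later for <code>String.new(capacity:)</code>; <code>build</code>, <code>elapsed</code>, the chunk size, and the iteration count are arbitrary, hypothetical choices, and no timings are claimed here.</p>
<pre><code class="ruby syntaxhl"><span class="CodeRay"># Append pieces copies of chunk to one string, optionally presizing it first.
# Assumes Ruby 2.4+ for String.new(capacity:). build is a hypothetical helper.
def build(pieces, chunk, preallocate:)
  out = preallocate ? String.new(capacity: pieces * chunk.bytesize) : String.new
  pieces.times { out.concat(chunk) }
  out
end

# Wall-clock seconds taken by the block, measured with a monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

chunk  = "x" * 64
pieces = 100_000

grown    = elapsed { build(pieces, chunk, preallocate: false) }
presized = elapsed { build(pieces, chunk, preallocate: true) }
puts format("grown: %.4fs  presized: %.4fs", grown, presized)
</span></code></pre>
<p>Repeating each measurement several times, and trying different chunk sizes, would make any difference easier to judge.</p>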
<p>-- <br>
Yusuke Endoh <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a></p>
https://bugs.ruby-lang.org/issues/905?journal_id=8590, 2010-03-04T15:11:02Z, coatl (caleb clausen)
<p>Do we really need a benchmark to confirm that copying large strings is expensive? Pre-sized buffers are a well-known performance win on other systems, so why not for ruby as well?</p>
<p>I would like to try to create a benchmark to prove this would help, but it may be some time before I can get to it.</p>
https://bugs.ruby-lang.org/issues/905?journal_id=8593, 2010-03-04T18:03:14Z, now (Nikolai Weibull) now@disu.se
<p>On Thu, Mar 4, 2010 at 07:11, caleb clausen <a href="mailto:redmine@ruby-lang.org">redmine@ruby-lang.org</a> wrote:</p>
<blockquote>
<p>Issue <a class="issue tracker-2 status-5 priority-4 priority-default closed" title="Feature: Add String.new(fixnum) to preallocate large buffer (Closed)" href="https://bugs.ruby-lang.org/issues/905">#905</a> has been updated by caleb clausen.</p>
<p>Do we really need a benchmark to confirm that copying large strings is expensive? Pre-sized buffers are a well-known performance win on other systems, so why not for ruby as well?</p>
</blockquote>
<p>Doesn’t this unnecessarily expose implementation details about String?<br>
Preallocation doesn’t make as much sense if Strings were implemented<br>
using, for example, Ropes [1].</p>
<p>[1] <a href="http://en.wikipedia.org/wiki/Rope_(computer_science)">http://en.wikipedia.org/wiki/Rope_(computer_science)</a></p>
https://bugs.ruby-lang.org/issues/905?journal_id=8599, 2010-03-04T21:26:54Z, mame (Yusuke Endoh) mame@ruby-lang.org
<p>Hi,</p>
<p>2010/3/4 caleb clausen <a href="mailto:redmine@ruby-lang.org">redmine@ruby-lang.org</a>:</p>
<blockquote>
<p>Pre-sized buffers are a well-known performance win on other systems, so why not for ruby as well?</p>
</blockquote>
<p>Indeed, it will speed Ruby up, but if the speedup is<br>
negligibly small, it is not only useless in practice but also bad<br>
for code maintenance.</p>
<p>If we confirm that the performance improvement is significant and a patch is<br>
available, we will be strongly encouraged to discuss the feature<br>
actively and to import the patch.</p>
<p>There is one more thing I forgot to mention. I also wonder in how many cases<br>
we can predict the exact length of the output that ERB will generate.<br>
If we cannot in most cases, the feature may still be useless.</p>
<p>Well, it may only be worthwhile if the feature can be used in Rails...</p>
<p>-- <br>
Yusuke ENDOH <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a></p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8607 | 2010-03-05T01:39:29Z | murphy (Kornelius Kalnbach) murphy@rubychan.de
<ul></ul><p>=begin<br>
Doesn&#39;t Ruby already allocate using a &quot;double memory if you run out&quot;<br>
rule? That makes string concatenation (amortized) linear, even if the<br>
string must be moved in memory.</p>
<p>I doubt that there are real-world use cases that would be much faster<br>
with preallocation. As Yusuke said, ERb is more of a counter-example.</p>
<p>Even with this API extension, we wouldn&#39;t have control over the<br>
generation of the string buffer in many use cases, as in Array#join,<br>
String#% or in literals using #{}. Its use would be limited to String#&lt;&lt;.</p>
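<p>murphy&#39;s point that only String#&lt;&lt; could benefit can be checked directly. The sketch below assumes Ruby 2.4 or later, where String.new accepts a capacity: keyword (that keyword is not part of this thread). String#&lt;&lt; returns the receiver itself, while String#+, interpolation, Array#join and String#% each return a new object, so capacity reserved on the receiver is never used by them.</p>

```ruby
# Only String#<< writes into the receiver's existing buffer and returns
# the receiver itself. The other ways of building strings each create a
# brand-new String, so capacity reserved on `buf` never helps them.
buf = String.new(capacity: 64)   # capacity: keyword needs Ruby 2.4+
same = (buf << "a" << "b")
p same.equal?(buf)               # => true (appended in place)

results = {
  "String#+"      => buf + "c",
  "interpolation" => "#{buf}c",
  "Array#join"    => [buf, "c"].join,
  "String#%"      => "%s%s" % [buf, "c"],
}

results.each do |name, str|
  # Each result is a different object from the preallocated buffer.
  puts "#{name}: new object = #{!str.equal?(buf)}"
end
```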
<p>[murphy]</p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8608 | 2010-03-05T02:13:33Z | hgs (Hugh Sasse) hgs@dmu.ac.uk
<ul></ul><p>=begin<br>
On Fri, 5 Mar 2010, Kornelius Kalnbach wrote:</p>
<blockquote>
<p>Doesn&#39;t Ruby allocate already using a &quot;double memory if you run out&quot;<br>
rule? That makes string concatenation (amortized) linear, even if the<br>
string must be moved in the memory.</p>
</blockquote>
<p>Yes (last time I looked), but while this sort of thing is<br>
being looked at I&#39;d like to remind people of the cunning code inside<br>
Lua for handling large string concatenations:</p>
<p><a href="http://www.lua.org/pil/11.6.html">http://www.lua.org/pil/11.6.html</a></p>
<p>It seems relevant in terms of moving data about.</p>
<pre> HTH
Hugh
</pre>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8609 | 2010-03-05T02:29:17Z | mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p>=begin<br>
Hi,</p>
<p>2010/3/5 Kornelius Kalnbach <a href="mailto:murphy@rubychan.de">murphy@rubychan.de</a>:</p>
<blockquote>
<p>Doesn&#39;t Ruby allocate already using a &quot;double memory if you run out&quot;<br>
rule? That makes string concatenation (amortized) linear, even if the<br>
string must be moved in the memory.</p>
</blockquote>
<p>Yes, it does. This is why I think an experiment is needed.</p>
<p>Because the suggested feature can be used to skip the first few<br>
expansions, it will actually reduce time. But I guess the<br>
reduction is not very large.</p>
<blockquote>
<p>Even with this API extension, we wouldn&#39;t have control over the<br>
generation of the string buffer in many use cases, as in Array#join,<br>
String#% or in literals using #{}. Its use would be limited to String#&lt;&lt;.</p>
</blockquote>
<p>Absolutely. The feature is hard to use.<br>
Even if we pre-allocated a string, calling some method on the<br>
string may shrink it.</p>
<p>I think we should call the feature just an &quot;optimization hint&quot;<br>
rather than an API. It is better to assume that the hint may even be<br>
ignored.</p>
<p>-- <br>
Yusuke ENDOH <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a></p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8610 | 2010-03-05T02:39:45Z | mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p>=begin<br>
Hi,</p>
<p>2010/3/5 Hugh Sasse <a href="mailto:hgs@dmu.ac.uk">hgs@dmu.ac.uk</a>:</p>
<blockquote>
<p>Yes (last time I looked), but while this sort of thing is<br>
being looked at I&#39;d like to remind people of the cunning code inside<br>
Lua for handling large string concatenations:</p>
<p><a href="http://www.lua.org/pil/11.6.html">http://www.lua.org/pil/11.6.html</a></p>
</blockquote>
<p>At first glance, the document explains the difference between destructive<br>
and non-destructive concatenation, like String#&lt;&lt; and #+.</p>
<p>It is an entirely different topic from pre-allocation.</p>
<p>-- <br>
Yusuke ENDOH <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a></p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8611 | 2010-03-05T02:51:50Z | hgs (Hugh Sasse) hgs@dmu.ac.uk
<ul></ul><p>=begin</p>
<p>On Fri, 5 Mar 2010, Yusuke ENDOH wrote:</p>
<blockquote>
<p>Hi,</p>
<p>2010/3/5 Hugh Sasse <a href="mailto:hgs@dmu.ac.uk">hgs@dmu.ac.uk</a>:</p>
<blockquote>
<p>Yes (last time I looked), but while this sort of thing is<br>
being looked at I&#39;d like to remind people of the cunning code inside<br>
Lua for handling large string concatenations:</p>
<p><a href="http://www.lua.org/pil/11.6.html">http://www.lua.org/pil/11.6.html</a></p>
</blockquote>
<p>At first glance, the document explains the difference of destructive<br>
and non-destructive concatenations, like String#+ and #&lt;&lt;.</p>
<p>It is absolutely different topic from pre-allocation.</p>
</blockquote>
<p>It is related: the algorithm constructs large strings from smaller<br>
ones in an elegant way using a &quot;tower of Hanoi&quot;, and if the top<br>
string concatenation gets bigger than the one below it, only then<br>
are they joined together. The result is less copying and merging.<br>
Admittedly, it is less applicable with mutable strings, but while<br>
only the top of the tower is modified, there&#39;d be less churn in<br>
memory.</p>
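<p>Hugh&#39;s description can be made concrete with a short Ruby transcription of the Lua addString function from the linked PiL page. This is an illustrative sketch, not a proposal for MRI, and the class name HanoiBuffer is made up here. It uses non-destructive String#+ to mimic Lua&#39;s immutable strings and merges the top piece into the one below whenever the lower piece is not longer.</p>

```ruby
# Ruby transcription of the "tower of Hanoi" buffer from Programming in
# Lua, section 11.6. HanoiBuffer is a hypothetical name. Pieces are
# combined with non-destructive String#+ to mimic Lua's immutable strings.
class HanoiBuffer
  def initialize
    @stack = []
  end

  def add(str)
    @stack.push(str)
    # Merge downward while the piece below is not longer than the top
    # one, so each piece ends up longer than the piece above it.
    while @stack.size >= 2 && @stack[-2].bytesize <= @stack[-1].bytesize
      top = @stack.pop
      @stack[-1] = @stack[-1] + top
    end
    self
  end

  # Number of pieces not yet merged.
  def depth
    @stack.size
  end

  # Final concatenation of the remaining pieces, bottom (oldest) first.
  def to_s
    @stack.join
  end
end

buf = HanoiBuffer.new
1000.times { |i| buf.add("line #{i}\n") }
puts "pieces left on the stack: #{buf.depth}"
puts "total bytes: #{buf.to_s.bytesize}"
```

<p>Whether this beats plain String#&lt;&lt; on mutable strings is exactly the open question in this thread; the sketch only shows the mechanics.</p>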
<pre> Hugh
</pre>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8615 | 2010-03-05T05:08:12Z | coatl (caleb clausen)
<ul></ul><p>=begin<br>
If String#&lt;&lt; is really O(1), there would seem to be little reason to change anything. I still want to investigate this myself when I get a chance. </p>
<p>I do like the Tower of Hanoi algorithm Hugh pointed out. But it seems like a big change from where Ruby&#39;s String class is now.<br>
=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8617 | 2010-03-05T06:09:06Z | murphy (Kornelius Kalnbach) murphy@rubychan.de
<ul></ul><p>=begin<br>
On 04.03.10 21:08, caleb clausen wrote:</p>
<blockquote>
<p>If String#&lt;&lt; is really O(1), there would seem to be little reason to<br>
change anything. I still want to investigate this myself when I get a<br>
chance.</p>
</blockquote>
<p>It is O(n), where n is the size of the appended string.</p>
<p>But I think it&#39;s always worth looking into speedups even if we can&#39;t<br>
expect to change the complexity class. The constant factor may not be interesting<br>
to theorists, but it matters greatly to programmers. JRuby, for example,<br>
concatenates strings almost twice as fast in this benchmark:</p>
<pre><code class="ruby">require 'benchmark'

N = 10_000_000
Benchmark.bm 20 do |results|
  results.report 'loop' do
    N.times { }
  end
  results.report "'' &lt;&lt;" do
    s = ''
    N.times { s &lt;&lt; '.' &lt;&lt; 'word' }
  end
end
</code></pre>
<pre>ruby19 string_buffer.rb
                          user     system      total        real
loop                  1.240000   0.010000   1.250000 (  1.255154)
'' &lt;&lt;                 5.820000   0.060000   5.880000 (  5.889959)

jruby string_buffer.rb
                          user     system      total        real
loop                  0.584000   0.000000   0.584000 (  0.488000)
'' &lt;&lt;                 2.900000   0.000000   2.900000 (  2.900000)
</pre>
<p>So, there is room for optimization somewhere.</p>
<p>[murphy]</p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8618 | 2010-03-05T09:12:57Z | kstephens (Kurt Stephens)
<ul></ul><p>=begin<br>
+1</p>
<p>Preallocation of String would be immensely useful in large ERB templates.</p>
<p>So much so that I was looking into exposing rb_str_resize(str, len) through a method, to get around related performance issues. Ruby Strings already distinguish between the string length and the allocated buffer size -- we need to expose that and ensure that Strings do not automatically &quot;shrink&quot; their internal buffers. There should probably be a method to explicitly shrink the internal buffer, if needed.</p>
<p>From what I can tell, string growth is roughly O(log2 N) because of the power-of-2 buffer resizing. For large strings, making this O(1) helps performance and reduces malloc() memory fragmentation.</p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8620 | 2010-03-05T11:58:46Z | murphy (Kornelius Kalnbach) murphy@rubychan.de
<ul></ul><p>=begin<br>
On 05.03.10 01:13, Kurt Stephens wrote:</p>
<blockquote>
<p>Preallocation of String would be immensely useful in large ERB<br>
templates.</p>
</blockquote>
<p>How big would the buffer size have to be for this template?</p>
<p>&lt;%= link_to @record.name, @record %&gt;</p>
<blockquote>
<p>So much so, I was looking to patching into rb_str_resize(str, len)<br>
with a method, to get around related performance issues. Ruby<br>
Strings already support the difference between the string length and<br>
the allocated buffer size -- we need to expose it and ensure that<br>
Strings do not automatically &quot;shrink&quot; the internal String buffers.<br>
There should probably be a method to explicitly shrink the internal<br>
buffer, if needed.</p>
</blockquote>
<p>This sounds like C to me.</p>
<blockquote>
<p>From what I can tell string growth is roughly O(log2 N) because of<br>
the power-of-2 buffer resizing.</p>
</blockquote>
<p>You probably mean O(N * log2 N). But even in the worst case (smallest<br>
possible steps, string data must be relocated for each buffer<br>
extension), it&#39;s still just O(N), where N is the length of the final<br>
string. Example:</p>
<pre><code class="ruby">s = ''
s &lt;&lt; '1'  # allocate 1 byte, relocate 0 bytes, write 1 byte
s &lt;&lt; '2'  # allocate 2 bytes, relocate 1 byte, write 1 byte
s &lt;&lt; '3'  # allocate 4 bytes, relocate 2 bytes, write 1 byte
s &lt;&lt; '4'  # write 1 byte (buffer is long enough)
s &lt;&lt; '5'  # allocate 8 bytes, relocate 4 bytes, write 1 byte
...
</code></pre>
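<p>So it&#39;s exactly n bytes for the writes, and O(n) bytes must be relocated<br>
in total: fewer than 2n, since 2<sup>0</sup> + 2<sup>1</sup> + ... + 2<sup>k</sup> = 2<sup>k+1</sup> &minus; 1 &lt; 2<sup>k+1</sup>, and the largest relocated block 2<sup>k</sup> is smaller than n. Allocation itself<br>
is O(1) for each step.</p>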
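<p>murphy&#39;s accounting can be checked with a small model of the doubling rule. It also shows where an O(log2 N) figure can come from: the number of reallocations grows logarithmically, while the total number of bytes relocated stays below 2n. This simulates the growth policy described above; it does not measure MRI&#39;s actual allocator.</p>

```ruby
# Models appending one byte at a time to a buffer whose capacity doubles
# whenever it is full (capacity 0 -> 1 -> 2 -> 4 -> ...). Counts
# reallocations and the bytes that must be relocated (copied) for each.
def simulate_doubling(n)
  capacity = 0
  length = 0
  reallocations = 0
  bytes_relocated = 0

  n.times do
    if length == capacity
      capacity = capacity.zero? ? 1 : capacity * 2
      reallocations += 1
      bytes_relocated += length  # existing contents move to the new block
    end
    length += 1                  # write the appended byte
  end

  { reallocations: reallocations, bytes_relocated: bytes_relocated }
end

[1_000, 1_000_000].each do |n|
  stats = simulate_doubling(n)
  puts "n=#{n}: #{stats[:reallocations]} reallocations, " \
       "#{stats[:bytes_relocated]} bytes relocated"
end
```

<p>For n = 1,000 the model performs 11 reallocations and relocates 1,023 bytes; for n = 1,000,000 it performs 21 reallocations and relocates 1,048,575 bytes, still under 2n.</p>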
<p>But I&#39;m not saying it can&#39;t be optimized further in the real world.</p>
<blockquote>
<p>For large buffers making this O(1)<br>
for large strings helps performance and reduces malloc() memory<br>
fragmentation.</p>
</blockquote>
<p>Ropes have been mentioned; they provide constant-time concatenation, but<br>
have slower iteration and indexing. They also use more memory.</p>
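<p>For comparison, here is a deliberately tiny rope in Ruby that illustrates this trade-off: concatenation only allocates a node and copies no characters, while indexing has to walk down the tree and flattening visits every leaf. The Rope class is a hypothetical sketch written for this page, not a library class, and it omits the rebalancing a real rope needs.</p>

```ruby
# A minimal, unbalanced rope. Leaves are plain Strings; inner nodes
# hold two children and cache the total length.
class Rope
  attr_reader :length

  def initialize(left, right = nil)
    @left = left
    @right = right
    @length = left.length + (right ? right.length : 0)
  end

  # O(1) concatenation: builds a new node, copies no characters.
  def +(other)
    Rope.new(self, other)
  end

  # Indexing walks down the tree, so it costs O(depth), not O(1).
  def [](i)
    return nil if i < 0 || i >= length
    i < @left.length ? @left[i] : @right[i - @left.length]
  end

  # Flattening visits every leaf once and copies it into one String.
  def to_s
    out = +""
    append_to(out)
    out
  end

  protected

  def append_to(out)
    [@left, @right].each do |part|
      case part
      when Rope then part.append_to(out)
      when String then out << part
      end
    end
  end
end

r = Rope.new("Hello, ") + "rope" + Rope.new("!")
p r.length  # => 12
p r[7]      # => "r"
p r.to_s    # => "Hello, rope!"
```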
<p>Is Array#join optimized for the case where all entries are strings? As in:</p>
<pre><code class="ruby">if array.all? { |obj| obj.is_a? String }
  buffer_size = array.map { |str| str.size }.sum
else
  buffer_size = whatever
end
result = allocate(buffer_size)
array.each { |str| result &lt;&lt; str }
</code></pre>
<p>We could then get rope-like performance for concatenation by collecting<br>
the pieces in an Array and #join-ing them in linear time to get the final result. That wouldn&#39;t<br>
change the complexity, but it is probably faster.</p>
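<p>The pseudocode above can be made runnable in current Ruby. This is a sketch assuming Ruby 2.4 or later for String.new(capacity:) and Array#sum; join_preallocated is a made-up name. It sizes the buffer in bytes (bytesize) rather than characters, since bytes are what the allocation needs. MRI&#39;s built-in Array#join may already do something similar internally, so this is not expected to beat it.</p>

```ruby
# Hypothetical helper: joins an array of Strings into a buffer that is
# allocated once with the exact total byte size.
def join_preallocated(array)
  # Mixed input (numbers, nil, ...) falls back to the built-in join.
  return array.join unless array.all? { |obj| obj.is_a?(String) }

  total = array.sum(&:bytesize)  # bytes, not characters, size the buffer
  result = String.new(capacity: total, encoding: Encoding::UTF_8)
  array.each { |str| result << str }
  result
end

words = %w[alpha beta gamma]
p join_preallocated(words)                # => "alphabetagamma"
p join_preallocated(words) == words.join  # => true
p join_preallocated([1, "a"])             # => "1a"
```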
<p>[murphy]</p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8624 | 2010-03-05T17:20:19Z | mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p>=begin<br>
Hi,</p>
<p>2010/3/5 Hugh Sasse <a href="mailto:hgs@dmu.ac.uk">hgs@dmu.ac.uk</a>:</p>
<blockquote>
<blockquote>
<p>At first glance, the document explains the difference of destructive<br>
and non-destructive concatenations, like String#+ and #&lt;&lt;.</p>
<p>It is absolutely different topic from pre-allocation.</p>
</blockquote>
<p>It is related: the algorithm constructs large strings from smaller<br>
ones in an elegant way using a &quot;tower of Hanoi&quot;, and if the top<br>
string concatenation gets bigger than the one below it, only then<br>
are they joined together. Result is less copying and merging.</p>
</blockquote>
<p>Ah, sorry. I should have read it all more carefully.</p>
<p>The algorithm itself is interesting, but as I understand it, it is<br>
just a workaround for implementing an efficient string buffer on top of<br>
<em>immutable</em> strings (Lua strings seem to be always immutable).</p>
<p>But Ruby strings are mutable. Is it also more efficient with<br>
<em>mutable</em> strings than the current direct concatenation? I wonder<br>
if the algorithm needs more memcpy calls than the current approach.</p>
<p>-- <br>
Yusuke ENDOH <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a></p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8625 | 2010-03-05T17:58:55Z | mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p>=begin<br>
Hi,</p>
<p>2010/3/5 Kornelius Kalnbach <a href="mailto:murphy@rubychan.de">murphy@rubychan.de</a>:</p>
<blockquote>
<blockquote>
<p>Preallocation of String would be immensely useful in large ERB<br>
templates.</p>
</blockquote>
<p>How big would the buffer size have to be for this template?</p>
<p>&lt;%= link_to @record.name, @record %&gt;</p>
</blockquote>
<p>Yes, it is generally difficult to determine the size.</p>
<p>We may be able to estimate it by using domain knowledge in some cases<br>
(e.g., a certain page is empirically known to be about 10KB).<br>
But if the estimate turns out wrong, it will cause wasteful memory<br>
allocation or no speedup.</p>
<blockquote>
<blockquote>
<p>So much so, I was looking to patching into rb_str_resize(str, len)<br>
with a method, to get around related performance issues. Ruby<br>
Strings already support the difference between the string length and<br>
the allocated buffer size -- we need to expose it and ensure that<br>
Strings do not automatically &quot;shrink&quot; the internal String buffers.<br>
There should probably be a method to explicitly shrink the internal<br>
buffer, if needed.</p>
</blockquote>
<p>This sounds like C to me.</p>
</blockquote>
<p>Agreed. It is too easy to waste memory.</p>
<blockquote>
<p>But I don&#39;t say it can&#39;t be further optimized in the real world.</p>
</blockquote>
<p>Agreed. So, we need a benchmark to discuss this.</p>
<blockquote>
<blockquote>
<p>For large buffers making this O(1)<br>
for large strings helps performance and reduces malloc() memory<br>
fragmentation.</p>
</blockquote>
<p>Ropes have been mentioned, they provide constant time concatenation, but<br>
have slower iteration and indexing. They also use more memory.</p>
<p>Is Array#join optimized for the case where all entries are strings?</p>
</blockquote>
<p>I think Array#join already does so.</p>
<p>Thank you very much for saying almost all I want to say :-)</p>
<p>-- <br>
Yusuke ENDOH <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a></p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8627 | 2010-03-05T19:07:56Z | hgs (Hugh Sasse) hgs@dmu.ac.uk
<ul></ul><p>=begin<br>
On Fri, 5 Mar 2010, Yusuke ENDOH wrote:</p>
<blockquote>
<p>Hi,</p>
<p>2010/3/5 Hugh Sasse <a href="mailto:hgs@dmu.ac.uk">hgs@dmu.ac.uk</a>:</p>
<blockquote>
<blockquote>
<p>At first glance, the document explains the difference of destructive<br>
and non-destructive concatenations, like String#+ and #&lt;&lt;.</p>
<p>It is absolutely different topic from pre-allocation.</p>
</blockquote>
<p>It is related: the algorithm constructs large strings from smaller<br>
ones in an elegant way using a &quot;tower of Hanoi&quot;, and if the top<br>
string concatenation gets bigger than the one below it, only then<br>
are they joined together. Result is less copying and merging.</p>
</blockquote>
<p>Ah, sorry. I had to read all more carefully.</p>
<p>The algorithm itself is interesting, but I understand it is<br>
just workaround to implement efficient string buffer by using<br>
<em>immutable</em> strings (because Lua String seems always immutable).</p>
<p>But Ruby String is mutable. Is it also more efficient with<br>
<em>mutable</em> string than current direct concatenation? I wonder<br>
if the algorithm needs more memcpy than the current.</p>
</blockquote>
<p>Possibly. I&#39;ve not gone into this in much depth. I thought it<br>
might be helpful to raise it in case this would give significant<br>
help to garbage collection. I&#39;m thinking that as the strings get<br>
longer they fill up space in the heap, so they need to be moved to the<br>
newly allocated space. Dealing with only the top of the &quot;tower of<br>
Hanoi&quot; would be handling smaller chunks. I think this would need to<br>
be tested, but could be worth exploring. Lua is rather quick, and<br>
the article talks about a big speed increase.</p>
<p>On the other hand, it is difficult to decide when to invoke this<br>
algorithm. It is probably too heavy for just joining two strings,<br>
but for reading in lots of chunks and appending them, it could be a<br>
big help. I don&#39;t know how to detect that distinction in user code.<br>
It might be too much work.</p>
<pre> Hugh
</pre>
<blockquote>
<p>-- <br>
Yusuke ENDOH <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a></p>
</blockquote>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8637 | 2010-03-06T02:25:19Z | now (Nikolai Weibull) now@disu.se
<ul></ul><p>=begin<br>
On Fri, Mar 5, 2010 at 17:25, Caleb Clausen <a href="mailto:vikkous@gmail.com">vikkous@gmail.com</a> wrote:</p>
<blockquote>
<p>On 3/5/10, Yusuke ENDOH <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a> wrote:</p>
<blockquote>
<p>2010/3/5 Kornelius Kalnbach <a href="mailto:murphy@rubychan.de">murphy@rubychan.de</a>:</p>
<blockquote>
<p>How big would the buffer size have to be for this template?</p>
<p> &lt;%= link_to @record.name, @record %&gt;</p>
</blockquote>
<p>Yes, it is generally difficult to determine the size.</p>
<p>We may be able to estimate it by using domain knowledge in some cases.<br>
(e.g., certain page size is empirically known as about 10KB, etc.)<br>
But if the expectation is disappointed, it will cause wasteful memory<br>
allocation or no speed up.</p>
</blockquote>
<p>Generally, a given template should expand to about the same size every<br>
time.</p>
</blockquote>
<p>I’m getting the feeling that the only real use case that we’ve got<br>
for this so far is ERb. Wouldn’t it make more sense to change the way<br>
ERb (and similar “string concatenators”) creates its result?</p>
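<p>Caleb&#39;s observation that a given template expands to about the same size each time suggests taking the size hint from the previous render instead of guessing it. A minimal sketch, assuming Ruby 2.4 or later for String.new(capacity:); SizeHintCache and render_list are hypothetical, and ERB&#39;s generated code would have to be changed to append into such a buffer before real templates could benefit.</p>

```ruby
# Hypothetical helper, not part of ERB or any library: remembers how many
# bytes each template produced last time and reserves that much capacity
# for the next render. A wrong guess only costs a reallocation or some
# unused memory; the output itself is unaffected.
class SizeHintCache
  def initialize
    @last_size = Hash.new(0)  # templates never seen before get no hint
  end

  # Yields a buffer preallocated from the previous run for +key+
  # (String.new(capacity:) needs Ruby 2.4+), then records its final size.
  def build(key)
    buf = String.new(capacity: @last_size[key], encoding: Encoding::UTF_8)
    yield buf
    @last_size[key] = buf.bytesize
    buf
  end

  def hint_for(key)
    @last_size[key]
  end
end

render_list = lambda do |out, items|
  out << "Fruit list\n"
  items.each { |item| out << "- " << item << "\n" }
  out
end

cache = SizeHintCache.new
fruits = %w[apple banana cherry]
first  = cache.build(:fruit_list) { |out| render_list.call(out, fruits) }
second = cache.build(:fruit_list) { |out| render_list.call(out, fruits) } # uses the hint
p cache.hint_for(:fruit_list) == second.bytesize  # => true
```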
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8641 | 2010-03-06T07:44:02Z | murphy (Kornelius Kalnbach) murphy@rubychan.de
<ul></ul><p>=begin<br>
On 05.03.10 18:25, Nikolai Weibull wrote:</p>
<blockquote>
<p>I’m getting the feeling that the only real use case that we’ve got<br>
for this so far is ERb. Wouldn’t it make more sense to change the way<br>
ERb (and similar “string concatenators”) creates its result?</p>
</blockquote>
<p>How about a StringBuffer class in the stdlib that&#39;s optimized for<br>
this kind of stuff? But only if we really find a way to speed it up.</p>
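<p>One possible shape for such a class, as a sketch only: StringBuffer below is hypothetical (no such class exists in the standard library) and assumes Ruby 2.4 or later for String.new(capacity:). Exposing only appending means no String method can shrink the reserved storage while the buffer is being filled, which sidesteps the concern that other methods may shrink a preallocated string.</p>

```ruby
# Hypothetical StringBuffer: not a standard library class. It exposes
# only appending, so no String method can shrink the preallocated
# storage while the buffer is being filled.
class StringBuffer
  def initialize(capacity: 0, encoding: Encoding::UTF_8)
    # Capacity is only a hint; String#<< still grows the buffer if needed.
    @buf = String.new(capacity: capacity, encoding: encoding)
  end

  # Appends obj.to_s and returns self so calls can be chained.
  def <<(obj)
    raise "buffer already consumed" unless @buf
    @buf << obj.to_s
    self
  end

  def bytesize
    @buf ? @buf.bytesize : 0
  end

  # Hands out the accumulated String without copying it; the buffer
  # cannot be appended to afterwards.
  def to_s
    raise "buffer already consumed" unless @buf
    result = @buf
    @buf = nil
    result
  end
end

sb = StringBuffer.new(capacity: 1024)
sb << "Total: " << 42 << "\n"
text = sb.to_s
p text  # => "Total: 42\n"
```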
<p>[murphy]</p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8645 | 2010-03-06T10:26:07Z | murphy (Kornelius Kalnbach) murphy@rubychan.de
<ul></ul><p>=begin<br>
On 06.03.10 01:31, Kurt Stephens wrote:</p>
<blockquote>
<p>ERB template rendering is one of my greatest performance issues right now.</p>
</blockquote>
<p>Have you really identified String concatenation as the primary issue?<br>
There&#39;s so much more going on when building a template (especially in<br>
Rails).</p>
<p>Somehow, my feeling is that the actual concatenation of a small string<br>
takes even less time than the calling overhead of String#&lt;&lt; (accessing<br>
self, method lookup, checking arguments, returning the receiver, ...).<br>
We could be talking about, say, 2% of the time your template needs to<br>
compile.</p>
<p>By the way, fact check: ERb really uses String#&lt;&lt;, right?<br>
[murphy]</p>
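<p>One way to answer the fact check is to print the code ERB generates. In current Ruby versions, at least, the output variable (_erbout by default) is appended to with String#&lt;&lt;; the exact generated code differs between ERB versions, and the result_with_hash method used here needs Ruby 2.5 or later.</p>

```ruby
require "erb"

template = ERB.new("Hello, <%= name %>!\n")

# The generated Ruby source: the literal text and each <%= %> tag are
# appended to the output variable (_erbout by default).
puts template.src

# result_with_hash needs Ruby 2.5+.
puts template.result_with_hash(name: "world")
```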
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8647 | 2010-03-06T11:49:35Z | murphy (Kornelius Kalnbach) murphy@rubychan.de
<ul><li><strong>File</strong> <a href="/attachments/876/string_buffer.diff">string_buffer.diff</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/876/string_buffer.diff">string_buffer.diff</a> added</li></ul><p>=begin<br>
Here&#39;s a patch that doesn&#39;t work. I don&#39;t know what I&#39;m doing wrong here: RESIZE_CAPA seemed just right.</p>
<p>Any hints?<br>
=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8773 | 2010-03-07T05:44:16Z | mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p>=begin<br>
Hi,</p>
<p>2010/3/6 Kornelius Kalnbach <a href="mailto:redmine@ruby-lang.org">redmine@ruby-lang.org</a>:</p>
<blockquote>
<p>Here&#39;s a patch that doesn&#39;t work. I don&#39;t know what I&#39;m doing wrong here: RESIZE_CAPA seemed just right.</p>
</blockquote>
<p>Thank you for writing a patch!<br>
It seems to work in my environment. What made you think it does<br>
not work?</p>
<p>I confirmed it by the following program:</p>
<pre><code class="ruby">opt = false
s = ""
t = "x" * 1_000_000
s.buffer(100_000_000) if opt
100.times { s &lt;&lt; t }
p s.size
</code></pre>
<p>The above program takes 0.205 sec. when opt is false, and takes<br>
0.195 sec. when opt is true.</p>
<p>But this is an artificial example with a very big string (100 MB).<br>
Consider the following, more realistic case (100 KB):</p>
<pre><code class="ruby">opt = false
1000.times do
  s = ""
  s.buffer(opt ? 100_001 : 100)
  x = "x" * 1000
  100.times { s &lt;&lt; x }
end
</code></pre>
<p>It takes 0.115 sec. when opt is false and 0.130 sec. when opt is true.<br>
I don&#39;t know why it becomes slower, but the story does not seem to be<br>
that simple.</p>
<p>Anyway, the overhead of concatenation does not seem very large. I doubt<br>
that it is the bottleneck.</p>
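<p>The &quot;realistic&quot; case above can be rerun on current Ruby. The s.buffer method came from the experimental patch attached to this issue, so this sketch substitutes String.new(capacity:) (Ruby 2.4 or later); build_page is a made-up helper name. Timings depend heavily on the platform and the malloc implementation, so no result is claimed here.</p>

```ruby
require "benchmark"

# Builds a 100 KB string from 1 KB chunks, starting from the given
# capacity hint (String.new(capacity:) needs Ruby 2.4+).
def build_page(capacity)
  s = String.new(capacity: capacity)
  x = "x" * 1000
  100.times { s << x }
  s
end

ROUNDS = 1000

Benchmark.bm(14) do |bm|
  # Mirrors opt = false in the original: a small 100-byte hint.
  bm.report("small hint")   { ROUNDS.times { build_page(100) } }
  # Mirrors opt = true: room for the whole result plus one byte.
  bm.report("100_001 hint") { ROUNDS.times { build_page(100_001) } }
end
```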
<p>-- <br>
Yusuke ENDOH <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a></p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8776 | 2010-03-07T08:29:00Z | murphy (Kornelius Kalnbach) murphy@rubychan.de
<ul></ul><p>=begin<br>
On 06.03.10 21:44, Yusuke ENDOH wrote:</p>
<blockquote>
<p>2010/3/6 Kornelius Kalnbach <a href="mailto:redmine@ruby-lang.org">redmine@ruby-lang.org</a>:</p>
<blockquote>
<p>Here&#39;s a patch that doesn&#39;t work. I don&#39;t know what I&#39;m doing wrong here: RESIZE_CAPA seemed just right.</p>
</blockquote>
<p>Thank you for your writing a patch!<br>
It seems to work on my environment. What made you think it does<br>
not work?</p>
</blockquote>
<p>The fact that the memory used by the Ruby process, as shown in top, didn&#39;t change.<br>
I requested a 200MB buffer, and the process was still at 2.8MB.</p>
<blockquote>
<p>Anyway, the overhead of concatenation seems not so big. I doubt<br>
if it is the bottleneck.</p>
</blockquote>
<p>That&#39;s my conclusion, too. But the JRuby team seems to have seen about a<br>
10% speedup:</p>
<p><a href="http://gist.github.com/323431">http://gist.github.com/323431</a> - without and with preset buffer</p>
<p>Maybe the question is, is it worth it?</p>
<p>[murphy]</p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8794 | 2010-03-07T14:57:31Z | mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p>=begin<br>
Hi,</p>
<p>2010/3/7 Kornelius Kalnbach <a href="mailto:murphy@rubychan.de">murphy@rubychan.de</a>:</p>
<blockquote>
<p>On 06.03.10 21:44, Yusuke ENDOH wrote:</p>
<blockquote>
<p>2010/3/6 Kornelius Kalnbach <a href="mailto:redmine@ruby-lang.org">redmine@ruby-lang.org</a>:</p>
<blockquote>
<p>Here&#39;s a patch that doesn&#39;t work. I don&#39;t know what I&#39;m doing wrong here: RESIZE_CAPA seemed just right.</p>
</blockquote>
<p>Thank you for your writing a patch!<br>
It seems to work on my environment. What made you think it does<br>
not work?</p>
</blockquote>
<p>The fact that the memory taken by the Ruby process didn&#39;t change in top.<br>
I requested a 200MB buffer, and the process was still at 2.8MB.</p>
</blockquote>
<p>Hmm, I guess you were looking at the physical memory allocated.<br>
On many platforms, physical memory is not allocated until<br>
the page is actually written to.</p>
<p>If you use Linux, look at the virtual memory size (the VSZ column of<br>
the ps command) instead of %MEM. It should reflect your huge<br>
allocation.</p>
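<p>The physical versus virtual distinction can be observed from Ruby itself. This sketch assumes Ruby 2.4 or later for String.new(capacity:) and uses ObjectSpace.memsize_of from the objspace library; the VmSize/VmRSS part reads /proc/self/status and therefore only works on Linux. Typically VmSize grows by roughly the requested capacity while VmRSS barely moves, because the untouched pages have no physical memory behind them yet, but the exact numbers depend on the allocator.</p>

```ruby
require "objspace"

# Reads a "Field:  value kB" line from /proc/self/status (Linux only).
# Returns nil on other platforms.
def proc_status_kb(field)
  path = "/proc/self/status"
  return nil unless File.readable?(path)
  line = File.foreach(path).find { |l| l.start_with?("#{field}:") }
  line && line.split[1].to_i
end

CAPACITY = 64 * 1024 * 1024  # 64 MB, reserved but never written

vsz_before = proc_status_kb("VmSize")
rss_before = proc_status_kb("VmRSS")
big = String.new(capacity: CAPACITY)
vsz_after = proc_status_kb("VmSize")
rss_after = proc_status_kb("VmRSS")

# Ruby accounts for the whole reserved buffer...
puts "memsize_of: #{ObjectSpace.memsize_of(big)} bytes, bytesize: #{big.bytesize}"

# ...while the kernel only has to back the pages that were touched.
if vsz_before && vsz_after
  puts "VmSize grew by #{vsz_after - vsz_before} kB"
  puts "VmRSS grew by  #{rss_after - rss_before} kB"
end
```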
<p>The performance may be improved by using madvise, but I<br>
don&#39;t think it should be supported by Ruby core.</p>
<p>-- <br>
Yusuke ENDOH <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a></p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8796 | 2010-03-07T18:38:08Z | kosaki (Motohiro KOSAKI) kosaki.motohiro@gmail.com
<ul></ul><p>=begin<br>
Hi</p>
<p>At least on Linux, madvise doesn&#39;t improve performance in such a case. The current CRuby + Linux (glibc) realloc implementation already behaves very well:<br>
a large string leads to a large realloc(), and a large realloc() uses mremap(2) internally, so realloc() doesn&#39;t copy the string at all.</p>
<p>In other words, the main benefit of String#buffer would be to reduce the cost of realloc(), but that cost is already essentially zero, so I don&#39;t think the method is worth adding. Sadly, I expect most developers would never use a method that brings no improvement.</p>
<p>Instead, I would propose improving JRuby&#39;s internal string representation and its string concatenation implementation.</p>
<p>Thanks.<br>
=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8798 | 2010-03-07T18:47:45Z | wanabe (_ wanabe) s.wanabe@gmail.com
<ul></ul><p>=begin<br>
Hi, </p>
<blockquote>
<p>opt = false<br>
1000.times do<br>
s = &quot;&quot;<br>
s.buffer(opt ? 100_001 : 100)<br>
x = &quot;x&quot; * 1000<br>
100.times { s &lt;&lt; x }<br>
end</p>
<p>takes 0.115 sec. when opt is false, 0.130 sec. when opt is true.</p>
</blockquote>
<p>I tried it too.<br>
Interestingly, it gets faster in my environment.</p>
<pre>$ cat test.rb
require 'benchmark'
opt = ARGV[0]
list = Array.new(10) do
  Benchmark.realtime do
    1000.times do
      s = ""
      s.buffer(opt ? 100_001 : 100)
      x = "x" * 1000
      100.times { s &lt;&lt; x }
    end
  end
end
list.sort!
p list.first, list.last

$ ./ruby -v -Ilib test.rb opt
ruby 1.9.2dev (2010-03-07 trunk 26843) [i386-mingw32]
0.1780099868774414
0.18601107597351074

$ ./ruby -v -Ilib test.rb
ruby 1.9.2dev (2010-03-07 trunk 26843) [i386-mingw32]
0.21401190757751465
0.22301316261291504
</pre>
<p>But I guess the patch may not work as expected in some cases:<br>
some methods (String#succ!, sub!, []=, and so on) can cause CAPA to shrink.<br>
=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8799 | 2010-03-07T19:58:52Z | mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p>=begin<br>
Hi,</p>
<p>2010/3/5 Kornelius Kalnbach <a href="mailto:murphy@rubychan.de">murphy@rubychan.de</a>:</p>
<blockquote>
<p>JRuby, for example,<br>
concats strings almost twice as fast in this benchmark:</p>
<p>require &#39;benchmark&#39;</p>
<p>N = 10_000_000<br>
Benchmark.bm 20 do |results|<br>
results.report &#39;loop&#39; do<br>
N.times { }<br>
end<br>
results.report &quot;&#39;&#39; &lt;&lt;&quot; do<br>
s = &#39;&#39;<br>
N.times { s &lt;&lt; &#39;.&#39; &lt;&lt; &#39;word&#39; }<br>
end<br>
end</p>
<p>ruby19 string_buffer.rb<br>
user system total real<br>
loop 1.240000 0.010000 1.250000 ( 1.255154)<br>
&#39;&#39; &lt;&lt; 5.820000 0.060000 5.880000 ( 5.889959)</p>
<p>jruby string_buffer.rb<br>
user system total real<br>
loop 0.584000 0.000000 0.584000 ( 0.488000)<br>
&#39;&#39; &lt;&lt; 2.900000 0.000000 2.900000 ( 2.900000)</p>
</blockquote>
<p>I wonder why such a simple loop is slower than JRuby...?</p>
<p>I retested.</p>
<pre>ruby19
                          user     system      total        real
loop                  2.100000   0.000000   2.100000 (  2.095623)
'' &lt;&lt;                11.720000   0.040000  11.760000 ( 11.768111)

jruby
                          user     system      total        real
loop                  2.263000   0.000000   2.263000 (  2.228000)
'' &lt;&lt;                10.193000   0.000000  10.193000 ( 10.193000)
</pre>
<p>Ko1 told me that GC makes the second benchmark slower than JRuby.<br>
In MRI, a string literal is duplicated whenever evaluated.<br>
I moved the literals out of the loop:</p>
<pre><code class="ruby">results.report "'' &lt;&lt;" do
  s = ''
  s1, s2 = '.', 'word'
  N.times { s &lt;&lt; s1 &lt;&lt; s2 }
end
</code></pre>
<pre>ruby19
                          user     system      total        real
'' &lt;&lt;                 6.810000   0.040000   6.850000 (  6.851979)

jruby
                          user     system      total        real
'' &lt;&lt;                 7.159000   0.000000   7.159000 (  7.126000)
</pre>
<p>Indeed, there is room for optimization in MRI, but in this case,<br>
it is not in string concatenation, I guess.</p>
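<p>The effect Ko1 described can be measured without timing, by counting allocated objects with GC.stat(:total_allocated_objects). This is a sketch that assumes the file has no frozen_string_literal: true magic comment (with it, the literals would not be reallocated on each evaluation); the exact counts can differ slightly between Ruby versions.</p>

```ruby
# Counts objects allocated while running the block.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

N = 10_000

# Literals inside the loop: each evaluation of '.' and 'word' creates
# a new String unless frozen string literals are enabled.
s = +""
inline = allocations { N.times { s << '.' << 'word' } }

# Literals hoisted out of the loop: the same two objects are reused.
t = +""
s1, s2 = '.', 'word'
hoisted = allocations { N.times { t << s1 << s2 } }

puts "literals inside the loop: #{inline} objects allocated"
puts "literals hoisted out:     #{hoisted} objects allocated"
```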
<p>-- <br>
Yusuke ENDOH <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a></p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8800 | 2010-03-07T20:46:17Z | kosaki (Motohiro KOSAKI) kosaki.motohiro@gmail.com
<ul></ul><p>=begin</p>
<blockquote>
<p>$ cat test.rb<br>
require &#39;benchmark&#39;<br>
opt = ARGV[0]<br>
list = Array.new(10) do<br>
Benchmark.realtime do<br>
1000.times do<br>
s = &quot;&quot;<br>
s.buffer(opt ? 100_001 : 100)<br>
x = &quot;x&quot; * 1000<br>
100.times { s &lt;&lt; x }<br>
end<br>
end<br>
end<br>
list.sort!<br>
p list.first, list.last</p>
<p>$ ./ruby -v -Ilib test.rb opt<br>
ruby 1.9.2dev (2010-03-07 trunk 26843) [i386-mingw32]<br>
0.1780099868774414<br>
0.18601107597351074</p>
<p>$ ./ruby -v -Ilib test.rb<br>
ruby 1.9.2dev (2010-03-07 trunk 26843) [i386-mingw32]<br>
0.21401190757751465<br>
0.22301316261291504</p>
</blockquote>
<p>Ah, yes. &quot;x&quot; * 1000 is not a very big string, so its realloc() doesn&#39;t use mremap.<br>
That means string concatenation (i.e. the &quot;&lt;&lt;&quot; operator) may copy the string each time it is reallocated. But is<br>
this a real issue? Does copying small strings cause a big performance issue, and when? So I<br>
think we need a good, realistic benchmark.</p>
<p>Thanks.<br>
=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8803 | 2010-03-07T22:21:09Z | murphy (Kornelius Kalnbach) murphy@rubychan.de
<ul></ul><p>=begin<br>
On 07.03.10 06:57, Yusuke ENDOH wrote:</p>
<blockquote>
<p>Hmm, I guess you saw physical memory size allocated.<br>
On many platform, physical memory is not allocated until<br>
writing into the page actually occurs.</p>
</blockquote>
<p>I didn&#39;t know that. Thanks!<br>
[murphy]</p>
<p>=end</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8848 | 2010-03-08T00:34:52Z | headius (Charles Nutter) headius@headius.com
<ul></ul><p>On Sun, Mar 7, 2010 at 4:58 AM, Yusuke ENDOH <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a> wrote:</p>
<blockquote>
<p>Ko1 told me that GC makes the second benchmark slower than JRuby.<br>
In MRI, a string literal is duplicated whenever evaluated.<br>
I moved the literals out of the loop:</p>
</blockquote>
<p>JRuby behaves the same, since literal strings are still separate<br>
objects and mutable.</p>
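<p>The point about literals can be checked from Ruby itself. In this illustration the magic comment explicitly disables frozen string literals, matching the only behaviour available in 2010, so each evaluation of <code>"word"</code> inside the block allocates a new String, while a literal hoisted out of the loop is allocated once and then shared.</p>

```ruby
# frozen_string_literal: false

# Literal evaluated inside the block: a new String object per iteration.
in_loop = Array.new(3) { "word" }

# Literal evaluated once, outside the loop: every slot holds the same object.
s2 = "word"
hoisted = Array.new(3) { s2 }

puts in_loop.map(&:object_id).uniq.size # prints 3
puts hoisted.map(&:object_id).uniq.size # prints 1
```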
<blockquote>
<pre>results.report &quot;&#39;&#39; &lt;&lt;&quot; do
  s = &#39;&#39;
  s1, s2 = &#39;.&#39;, &#39;word&#39;
  N.times { s &lt;&lt; s1 &lt;&lt; s2 }
end</pre>
<p> ruby19<br>
user system total real<br>
&#39;&#39; &lt;&lt; 6.810000 0.040000 6.850000 ( 6.851979)</p>
<p> jruby<br>
user system total real<br>
&#39;&#39; &lt;&lt; 7.159000 0.000000 7.159000 ( 7.126000)</p>
<p>Indeed, there is room for optimization in MRI, but in this case,<br>
it is not in string concatenation, I guess.</p>
</blockquote>
<p>My numbers came out somewhat differently. Make sure you&#39;re running<br>
with the JVM&#39;s &quot;server&quot; mode if you run on Hotspot (Sun/OpenJDK):</p>
<p>~/projects/jruby ➔ jruby --server string_bench.rb<br>
user system total real<br>
loop 0.572000 0.000000 0.572000 ( 0.523000)<br>
&#39;&#39; &lt;&lt; 1.470000 0.000000 1.470000 ( 1.470000)</p>
<p>~/projects/jruby ➔ ruby1.9 string_bench.rb<br>
user system total real<br>
loop 0.810000 0.000000 0.810000 ( 0.838414)<br>
&#39;&#39; &lt;&lt; 2.670000 0.040000 2.710000 ( 2.733041)</p>
<p>Here are the numbers with a prototype String.buffer implementation:</p>
<p>~/projects/jruby ➔ jruby --server string_bench.rb<br>
user system total real<br>
loop 0.655000 0.000000 0.655000 ( 0.606000)<br>
&#39;&#39; &lt;&lt; 1.390000 0.000000 1.390000 ( 1.390000)<br>
user system total real<br>
loop 0.321000 0.000000 0.321000 ( 0.321000)<br>
&#39;&#39; &lt;&lt; 1.241000 0.000000 1.241000 ( 1.241000)<br>
user system total real<br>
loop 0.314000 0.000000 0.314000 ( 0.314000)<br>
&#39;&#39; &lt;&lt; 1.229000 0.000000 1.229000 ( 1.229000)</p>
<p>Of course, this 10-15% improvement could simply be because the JVM<br>
does not provide a &quot;realloc&quot; for its arrays (for various reasons, some<br>
of them presumably because it moves objects around in memory a lot).<br>
In order to grow a string, we have to allocate a new array and copy<br>
its contents. Under those circumstances, String.buffer makes a lot of<br>
sense, since the copying can get expensive at large sizes.</p>
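<p>The copying cost described here can be made concrete with a toy model. The class below is a hypothetical simulation, not JRuby&#39;s actual ByteList growth policy: it assumes a buffer that can only grow by allocating a larger array and copying the old contents (no realloc), with capacity doubling as the growth rule. It counts the bytes copied while appending 100 chunks of 1000 bytes, with and without an up-front capacity.</p>

```ruby
# Toy model of a growable byte buffer on a runtime without realloc:
# whenever the buffer outgrows its backing array, a new array is
# allocated and the existing contents are copied into it.
class CopyingBuffer
  attr_reader :size, :capacity, :bytes_copied

  def initialize(capacity = 16)
    @capacity = [capacity, 1].max # doubling from 0 would never grow
    @size = 0
    @bytes_copied = 0
  end

  # Simulate appending n bytes; only sizes are tracked, not contents.
  def append(n)
    needed = @size + n
    if needed > @capacity
      @capacity *= 2 while @capacity < needed # assumed doubling policy
      @bytes_copied += @size                  # old contents copied to new array
    end
    @size = needed
    self
  end
end

def simulate(initial_capacity, chunks: 100, chunk_size: 1000)
  buf = CopyingBuffer.new(initial_capacity)
  chunks.times { buf.append(chunk_size) }
  buf
end

default = simulate(16)
preallocated = simulate(100_000)
puts "default:      #{default.bytes_copied} bytes copied"
puts "preallocated: #{preallocated.bytes_copied} bytes copied"
```

<p>Under these assumptions the default buffer copies 128,000 bytes in total (at the 2nd, 3rd, 5th, 9th, 17th, 33rd and 66th appends) on top of the 100,000 bytes appended, while the preallocated one copies none. Real runtimes use different growth factors and allocators, so only the shape of the result carries over.</p>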
<p>I don&#39;t know enough about MRI internals to implement an equivalent<br>
String.buffer, but here&#39;s the patch to JRuby:</p>
<pre>diff --git a/src/org/jruby/RubyString.java b/src/org/jruby/RubyString.java
index 71e6b63..e618ec8 100644
--- a/src/org/jruby/RubyString.java
+++ b/src/org/jruby/RubyString.java
@@ -451,6 +451,11 @@ public class RubyString extends RubyObject implements EncodingCapable {
     public static RubyString newStringLight(Ruby runtime, int size) {
         return new RubyString(runtime, runtime.getString(), new ByteList(size), false);
     }
+
+    @JRubyMethod(meta = true)
+    public static IRubyObject buffer(ThreadContext context, IRubyObject self, IRubyObject size) {
+        return newStringLight(context.getRuntime(), (int)size.convertToInteger().getLongValue());
+    }
 
     public static RubyString newString(Ruby runtime, CharSequence str) {
         return new RubyString(runtime, runtime.getString(), str);</pre>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=8859 | 2010-03-08T12:40:17Z | mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p>Hi,</p>
<p>2010/3/8 Charles Oliver Nutter <a href="mailto:headius@headius.com">headius@headius.com</a>:</p>
<blockquote>
<blockquote>
<p>Indeed, there is room for optimization in MRI, but in this case,<br>
it is not in string concatenation, I guess.</p>
</blockquote>
<p>My numbers came out somewhat differently. Make sure you&#39;re running<br>
with the JVM&#39;s &quot;server&quot; mode if you run on Hotspot (Sun/OpenJDK):</p>
</blockquote>
<p>Ah, I didn&#39;t specify the option:</p>
<pre>user system total real
loop 1.471000 0.000000 1.471000 ( 1.248000)
&#39;&#39; &lt;&lt; 5.906000 0.000000 5.906000 ( 5.906000)</pre>
<p>JRuby is great :-)</p>
<blockquote>
<p>Here&#39;s numbers with a prototypical String.buffer implementation:</p>
<p><em>snip</em></p>
<p>Of course, this 10-15% improvement could simply be because the JVM<br>
does not provide a &quot;realloc&quot; for its arrays (for various reasons, some<br>
of them presumably because it moves objects around in memory a lot).<br>
In order to grow a string, we have to allocate a new array and copy<br>
its contents. Under those circumstances, String.buffer makes a lot of<br>
sense, since the copying can get expensive at large sizes.</p>
</blockquote>
<p>OK, we have finally grasped the situation. To sum up:</p>
<ul>
<li>This feature is meaningless with MRI, at least, on Linux.</li>
<li>But it serves as a workaround for slow string concatenation in JRuby,
which cannot be optimized because of the JVM.</li>
<li>Should MRI provide the feature just for script compatibility?</li>
</ul>
<p>I cannot make the judgment. Please wait for matz.</p>
<p>-- <br>
Yusuke ENDOH <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a></p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=9670 | 2010-04-02T08:10:44Z | znz (Kazuhiro NISHIYAMA)
<ul><li><strong>Target version</strong> set to <i>2.0.0</i></li></ul>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=23597 | 2012-02-08T03:15:33Z | kosaki (Motohiro KOSAKI) kosaki.motohiro@gmail.com
<ul></ul><p>Matz, should we close this ticket?</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=26828 | 2012-05-26T04:25:53Z | Anonymous
<ul></ul><p>Uh oh, this discussion is already a pain to read.</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=29510 | 2012-09-19T16:17:31Z | headius (Charles Nutter) headius@headius.com
<ul></ul><p>Trying to wake this beast up...</p>
<p>mame: I don&#39;t think we can say it would not help MRI without testing an implementation, can we? I misunderstood realloc in my comment from two years (!!!) ago. According to the realloc docs:</p>
<pre> The realloc() function tries to change the size of the allocation pointed to by ptr to size, and returns ptr. If there is not enough room to enlarge the memory allocation pointed
to by ptr, realloc() creates a new allocation, copies as much of the old data pointed to by ptr as will fit to the new allocation, frees the old allocation, and returns a pointer to
the allocated memory.
</pre>
<p>This seems to indicate that except under rare circumstances where the memory after the pointer is known to be free, realloc will behave exactly like the JVM, creating a new pointer, copying data, and freeing the old pointer.</p>
<p>To me, this means that a pre-allocated String construction method is most definitely useful.</p>
<p>It also occurred to me recently that String.new does not accept an integer argument. Perhaps all we need to do is add a String.new form that takes Integer, and possibly an optional fill byte/codepoint/single-char string?</p>
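<p>The proposed form (a size plus an optional fill byte, codepoint or string, in the spirit of Rubinius&#39;s <code>String.pattern(size, value)</code>) can be approximated in plain Ruby. <code>filled_string</code> below is a hypothetical helper, not a core method; it uses <code>String.new(capacity:)</code>, which MRI only gained later (Ruby 2.4), to reserve the space before filling it.</p>

```ruby
# Hypothetical helper sketching "String.new(size, fill)": returns a string of
# exactly `size` characters made by repeating `fill` and truncating the tail.
def filled_string(size, fill = " ")
  raise ArgumentError, "negative size" if size.negative?
  fill = fill.chr(Encoding::UTF_8) if fill.is_a?(Integer) # codepoint -> 1-char string
  raise ArgumentError, "fill must not be empty" if fill.empty?

  full, rest = size.divmod(fill.length)
  # Reserve an upper bound on the bytes needed so the appends below
  # should not have to grow the buffer (capacity is only a hint).
  buf = String.new(encoding: fill.encoding, capacity: size * fill.bytesize)
  full.times { buf << fill }
  buf << fill[0, rest] # partial copy of the pattern to reach `size`
end

p filled_string(5, 97)    # prints "aaaaa"
p filled_string(5, "110") # prints "11011"
p filled_string(3)        # prints "   "
```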
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=29511 | 2012-09-19T16:36:38Z | shyouhei (Shyouhei Urabe) shyouhei@ruby-lang.org
<ul></ul><p>Just a technical comment, not for the feature itself:</p>
<p>headius (Charles Nutter) wrote:</p>
<blockquote>
<pre> to by ptr, realloc() creates a new allocation, copies as much of the old data
</pre></blockquote>
<p>This &quot;copy&quot; is done by the mremap(2) system call, which just rearranges the OS&#39;s process-private virtual memory map to move a region of memory elsewhere, in O(1). That is what mame meant by &quot;This feature is meaningless with MRI, at least, on Linux.&quot;</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=30669 | 2012-10-15T05:02:17Z | headius (Charles Nutter) headius@headius.com
<ul></ul><p>I do not believe for a moment that realloc or mremap can in all cases perform the operation in O(1) time, and the docs seem to agree with me: first the realloc doc above, and then this doc on mremap:</p>
<pre> MREMAP_MAYMOVE
By default, if there is not sufficient space to expand a mapping at its current location, then mremap() fails. If this flag
is specified, then the kernel is permitted to relocate the mapping to a new virtual address, if necessary. If the mapping is
relocated, then absolute pointers into the old mapping location become invalid (offsets relative to the starting address of
the mapping should be employed).
</pre>
<p>It seems to me that preallocation is most definitely useful, even in the presence of realloc and mremap. I would like to see it added.</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=31624 | 2012-10-27T04:39:37Z | ko1 (Koichi Sasada)
<ul></ul><p>Who can judge this ticket?<br>
I can&#39;t understand this issue because the discussion is long.<br>
Could anyone summarize the conclusion?</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=31744 | 2012-10-27T10:44:46Z | mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p>Hello headius,</p>
<p>headius (Charles Nutter) wrote:</p>
<blockquote>
<p>Trying to wake this beast up...</p>
<p>mame: I don&#39;t think we can say it would not help MRI without testing an implementation, can we? I misunderstood realloc in my comment from two years (!!!) ago. According to the realloc docs:</p>
</blockquote>
<p>Linux&#39;s realloc(3) man-page does NOT say that.</p>
<p><a href="http://www.kernel.org/doc/man-pages/online/pages/man3/malloc.3.html">http://www.kernel.org/doc/man-pages/online/pages/man3/malloc.3.html</a></p>
<p>Perhaps you saw OS X&#39;s realloc?<br>
I wonder whether this issue is valid on OS X.<br>
Can anyone conduct a quantitative investigation?</p>
<p>headius (Charles Nutter) wrote:</p>
<blockquote>
<p>I do not believe for a moment that realloc or mremap can in all cases perform the operation in O(1) time, and the docs seem to agree with me...first based on the doc above for realloc, and then for this doc on mremap:</p>
</blockquote>
<p>Looks irrelevant. I guess realloc(3) just uses mremap with MREMAP_MAYMOVE<br>
internally.</p>
<p>-- <br>
Yusuke Endoh <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a></p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=31745 | 2012-10-27T10:50:01Z | mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p>ko1 (Koichi Sasada) wrote:</p>
<blockquote>
<p>Who can judge this ticket?<br>
I can&#39;t understand this issue because the discussion is long.<br>
Could anyone summarize the conclusion?</p>
</blockquote>
<p>Not concluded, but currently we know:</p>
<ul>
<li><p>This feature provides &quot;a Ruby-level workaround&quot; for a poor realloc<br>
implementation on some runtimes, such as the JVM and possibly OS X.</p></li>
<li><p>But at least Linux&#39;s (more precisely, libc&#39;s?) realloc is well implemented,<br>
so this feature is meaningless in practice in such environments.</p></li>
</ul>
<p>-- <br>
Yusuke Endoh <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a></p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=31766 | 2012-10-27T12:02:25Z | mame (Yusuke Endoh) mame@ruby-lang.org
<ul><li><strong>Target version</strong> changed from <i>2.0.0</i> to <i>next minor</i></li></ul>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=31782 | 2012-10-27T15:51:45Z | headius (Charles Nutter) headius@headius.com
<ul></ul><p>mame: I do not understand how there&#39;s any way Linux would be different from any other platform. If there&#39;s no room in contiguous memory to expand a pointer, the data must be moved elsewhere in memory. Am I missing something?</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=31790 | 2012-10-27T19:33:12Z | mame (Yusuke Endoh) mame@ruby-lang.org
<ul></ul><p>headius (Charles Nutter) wrote:</p>
<blockquote>
<p>mame: I do not understand how there&#39;s any way Linux would be different from any other platform. If there&#39;s no room in contiguous memory to expand a pointer, the data must be moved elsewhere in memory. Am I missing something?</p>
</blockquote>
<p>Almost all modern practical operating systems use virtual memory.</p>
<p><a href="http://en.wikipedia.org/wiki/Virtual_memory">http://en.wikipedia.org/wiki/Virtual_memory</a></p>
<p>In an OS based on this mechanism, there is a mapping from virtual memory addresses to physical ones.<br>
By changing the map, contiguous virtual memory addresses can be (re)assigned without moving the data in physical memory.<br>
(This is why the system call in question is named &quot;mremap&quot;, I think.)</p>
<p>-- <br>
Yusuke Endoh <a href="mailto:mame@tsg.ne.jp">mame@tsg.ne.jp</a></p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=32121 | 2012-11-01T05:06:00Z | headius (Charles Nutter) headius@headius.com
<ul></ul><p>So we have something like this:</p>
<p>Platforms known to not support any sort of O(1) realloc: JVM</p>
<p>Platforms that may not support O(1) realloc: OS X, others?</p>
<p>Platforms that do (should?) support O(1) realloc: Linux</p>
<p>In any case, I still see that there&#39;s value in this feature:</p>
<ul>
<li>It would help JRuby and all runtimes that run on non-efficient-realloc platforms.</li>
<li>It does no harm and matches Array.new behavior.</li>
<li>For folks doing crypto stuff that want to know exactly how big the buffer is right away, this provides a way to do so.</li>
</ul>
<p>I won&#39;t try to argue whether realloc is consistently efficient across platforms or not. It seems like it&#39;s not guaranteed to be on any platform.</p>
<p>It&#39;s also such a tiny addition...why not?</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=32129 | 2012-11-01T05:53:18Z | kosaki (Motohiro KOSAKI) kosaki.motohiro@gmail.com
<ul></ul><blockquote>
<p>So we have something like this:</p>
<p>Platforms known to not support any sort of O(1) realloc: JVM</p>
<p>Platforms that may not support O(1) realloc: OS X, others?</p>
<p>Platforms that do (should?) support O(1) realloc: Linux</p>
<p>In any case, I still see that there&#39;s value in this feature:</p>
<ul>
<li>It would help JRuby and all runtimes that run on non-efficient-realloc platforms.</li>
<li>It does no harm and matches Array.new behavior.</li>
<li>For folks doing crypto stuff that want to know exactly how big the buffer is right away, this provides a way to do so.</li>
</ul>
<p>I won&#39;t try to argue whether realloc is consistently efficient across platforms or not. It seems like it&#39;s not guaranteed to be on any platform.</p>
<p>It&#39;s also such a tiny addition...why not?</p>
</blockquote>
<p>I don&#39;t imagine many people will play the String.buffer game for<br>
optimization if it doesn&#39;t bring a big benefit. Right now it is unclear how<br>
useful this feature is outside the JVM, and how useful it is on the JVM.<br>
AFAIK, nobody has shown a realistic benchmark result or a comprehensive list<br>
of affected platforms. I&#39;m not inclined to agree to a <em>guessing</em> game.</p>
<p>If the benefit is small, the feature will die and be forgotten quickly.</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=32130 | 2012-11-01T05:53:18Z | normalperson (Eric Wong) normalperson@yhbt.net
<ul></ul><p>&quot;headius (Charles Nutter)&quot; <a href="mailto:headius@headius.com">headius@headius.com</a> wrote:</p>
<blockquote>
<ul>
<li>For folks doing crypto stuff that want to know exactly how big the
buffer is right away, this provides a way to do so.</li>
</ul>
</blockquote>
<p>I&#39;m not sure exactly what you mean. Do you mean to avoid leaving<br>
sensitive data in the heap from realloc()? Yes it would help, but<br>
I think this is a poor API for that purpose.</p>
<p>Perhaps special methods like String#secure_cat and String#secure_wipe<br>
would be more obvious for security-conscious users.</p>
<blockquote>
<p>I won&#39;t try to argue whether realloc is consistently efficient across<br>
platforms or not. It seems like it&#39;s not guaranteed to be on any<br>
platform.</p>
</blockquote>
<p>I absolutely agree this can help performance regardless of platform,<br>
however...</p>
<blockquote>
<p>It&#39;s also such a tiny addition...why not?</p>
</blockquote>
<p>I&#39;m not a VM expert, but shouldn&#39;t it be possible for the VM to track<br>
the growth of strings allocated at different call sites and<br>
automatically optimize preallocations as time goes on?</p>
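<p>Eric&#39;s suggestion can be approximated by hand at the Ruby level, which may help show both the idea and why a VM would have to be careful doing it automatically. The module below is purely hypothetical (no Ruby implementation is claimed to work this way): it remembers the largest final size seen per call site and passes that as a <code>capacity:</code> hint the next time a buffer is built there. It ignores threads, shrinking workloads and memory wasted by over-reserving, which are exactly the kinds of problems a real VM optimization would have to handle.</p>

```ruby
# Hypothetical per-call-site capacity profiling, done by hand in Ruby.
module SiteSizedBuffer
  @max_sizes = Hash.new(0) # call-site key => largest bytesize observed

  class << self
    # Build a string at `site`, reserving the largest size previously seen
    # there; the block appends to the buffer.
    def build(site = caller_locations(1, 1).first.to_s)
      buf = String.new(capacity: @max_sizes[site])
      yield buf
      @max_sizes[site] = buf.bytesize if buf.bytesize > @max_sizes[site]
      buf
    end

    def hint_for(site)
      @max_sizes[site]
    end
  end
end

# The first call at a site starts with no hint; later calls reuse it.
3.times do
  SiteSizedBuffer.build("report") { |b| 100.times { b << "x" * 1000 } }
end
p SiteSizedBuffer.hint_for("report") # prints 100000
```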
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=32131 | 2012-11-01T09:30:28Z | headius (Charles Nutter) headius@headius.com
<ul></ul><p>On Wed, Oct 31, 2012 at 3:43 PM, Eric Wong <a href="mailto:normalperson@yhbt.net">normalperson@yhbt.net</a> wrote:</p>
<blockquote>
<p>&quot;headius (Charles Nutter)&quot; <a href="mailto:headius@headius.com">headius@headius.com</a> wrote:</p>
<blockquote>
<ul>
<li>For folks doing crypto stuff that want to know exactly how big the
buffer is right away, this provides a way to do so.</li>
</ul>
</blockquote>
<p>I&#39;m not sure exactly what you mean. Do you mean to avoid leaving<br>
sensitive data in the heap from realloc()? Yes it would help, but<br>
I think this is a poor API for that purpose.</p>
</blockquote>
<p>For security, you don&#39;t want strings to be growing and copying stuff<br>
around in memory, so being able to allocate a specific size ahead of<br>
time is useful.</p>
<blockquote>
<p>Perhaps special methods like String#secure_cat and String#secure_wipe<br>
would be more obvious for security-conscious users.</p>
</blockquote>
<p>And if secure_cat didn&#39;t use realloc (because it could leave sensitive<br>
data on the heap) you&#39;d <em>still</em> have a need to preallocate what you<br>
need. That doesn&#39;t solve anything.</p>
<blockquote>
<blockquote>
<p>I won&#39;t try to argue whether realloc is consistently efficient across<br>
platforms or not. It seems like it&#39;s not guaranteed to be on any<br>
platform.</p>
</blockquote>
<p>I absolutely agree this can help performance regardless of platform,<br>
however...</p>
<blockquote>
<p>It&#39;s also such a tiny addition...why not?</p>
</blockquote>
<p>I&#39;m not a VM expert, but shouldn&#39;t it be possible for the VM to track<br>
the growth of strings allocated at different call sites and<br>
automatically optimize preallocations as time goes on?</p>
</blockquote>
<p>A sufficiently smart compiler can do anything, of course. However I<br>
know of no VMs that track the eventual size of objects allocated at a<br>
given call site and eagerly allocate that memory, and such an<br>
optimization would be very tricky to do right.</p>
<ul>
<li>Charlie</li>
</ul>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=56720 | 2016-01-26T23:31:56Z | mame (Yusuke Endoh) mame@ruby-lang.org
<ul><li><strong>Related to</strong> <i><a class="issue tracker-2 status-5 priority-4 priority-default closed" href="/issues/12024">Feature #12024</a>: Add String.buffer, for creating strings with large capacities</i> added</li></ul>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=59752 | 2016-07-21T19:22:49Z | headius (Charles Nutter) headius@headius.com
<ul></ul><p>I accept <a href="https://bugs.ruby-lang.org/issues/12024" class="external">String.new(capacity: size)</a> as a satisfactory implementation of this request.</p>
Ruby trunk - Feature #905: Add String.new(fixnum) to preallocate large buffer | https://bugs.ruby-lang.org/issues/905?journal_id=59795 | 2016-07-26T07:33:58Z | shyouhei (Shyouhei Urabe) shyouhei@ruby-lang.org
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Closed</i></li></ul><p>Closing. Please use String.new with capacity.</p>
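<p>For reference, a small illustration of the resolution&#39;s semantics on a current MRI (Ruby 2.4 or later). The <code>String.new</code> behaviour shown is documented, but the memory figure comes from <code>ObjectSpace.memsize_of</code>, which is MRI-specific and approximate. It also shows the contrast with <code>Array.new(n)</code>, which the thread compared this feature to: the array gets <code>n</code> elements, while <code>capacity:</code> only reserves space and the string stays empty.</p>

```ruby
require "objspace"

# capacity: reserves space (in bytes) but the string itself starts empty.
buf = String.new(capacity: 100_000)
p buf.length                             # prints 0
p buf.encoding == Encoding::BINARY       # prints true (no source string or encoding: given)
p ObjectSpace.memsize_of(buf) >= 100_000 # prints true on MRI (approximate, MRI-specific)

# Pass encoding: (or a source string) to control the encoding of the result.
text = String.new("abc", encoding: Encoding::UTF_8, capacity: 4096)
p text          # prints "abc"
p text.encoding # prints #<Encoding:UTF-8>

# Array.new(n), by contrast, creates n elements rather than reserving space.
p Array.new(3)  # prints [nil, nil, nil]
```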