Clojure JIRAhttp://dev.clojure.org/jira
This file is an XML representation of an issueen-us4.464925-07-2011[CLJS-728] (get [1 2 3] nil) -> 1http://dev.clojure.org/jira/browse/CLJS-728
ClojureScriptCLJS-728(get [1 2 3] nil) -> 1DefectMajorClosedCompletedDavid NolenDavid NolenMon, 16 Dec 2013 17:00:07 -0600Mon, 24 Feb 2014 06:04:06 -0600Mon, 24 Feb 2014 06:04:03 -060002<p>This is due to javascript coercing null to 0 in the various places where -assoc and -nth check (&lt;= 0 n) or (bit-and n MAGIC-NUMBER). E.g. in PersistentVector:</p>
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">IIndexed
(-nth [coll n]
(aget (array-<span class="code-keyword">for</span> coll n) (bit-and n 0x01f)))
(-nth [coll n not-found]
(<span class="code-keyword">if</span> (and (&lt;= 0 n) (&lt; n cnt))
(-nth coll n)
not-found))</pre>
</div></div>
<p>I can track all these down and patch them by changing the guards to <tt>(and (not (nil? n)) (&lt;= 0 n) (&lt; n cnt))</tt> (or maybe even <tt>(or (zero? n) (&lt; 0 n cnt))</tt>) and adding <tt>(when-not (nil? n) ...)</tt> guards where appropriate.</p>
<p>However I wasn't sure if there's any intention to make the comparison operators <tt>&lt;= &lt; &gt; &gt;=</tt> null-safe. I.e. instead of adding these extra nil checks, should we instead make sure comparing against nil always returns false or throws an exception? (We would still need nil checks to make sure (get <span class="error">&#91;1&#93;</span> nil) returns nil like Clojure.)</p><p>Currently failing test cases.</p>
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">(<span class="code-keyword">assert</span> (nil? (get [1 2] nil)))
(<span class="code-keyword">assert</span> (= :fail (<span class="code-keyword">try</span> (nth [1 2] nil) (<span class="code-keyword">catch</span> js/Error e :fail))))
(<span class="code-keyword">assert</span> (= 4 (get [1 2] nil 4)))
(<span class="code-keyword">assert</span> (= 4 (nth [1 2] nil 4)))
(<span class="code-keyword">assert</span> (= :fail (<span class="code-keyword">try</span> (assoc [1 2] nil 4) (<span class="code-keyword">catch</span> js/Error e :fail))))
(<span class="code-keyword">assert</span> (nil? (get (<span class="code-keyword">transient</span> [1 2]) nil)))
(<span class="code-keyword">assert</span> (= :fail (<span class="code-keyword">try</span> (nth (<span class="code-keyword">transient</span> [1 2]) nil) (<span class="code-keyword">catch</span> js/Error e :fail))))
(<span class="code-keyword">assert</span> (= 4 (get (<span class="code-keyword">transient</span> [1 2]) nil 4)))
(<span class="code-keyword">assert</span> (= 4 (nth (<span class="code-keyword">transient</span> [1 2]) nil 4)))
(<span class="code-keyword">assert</span> (= :fail (<span class="code-keyword">try</span> (assoc! (<span class="code-keyword">transient</span> [1 2]) nil 4)
(<span class="code-keyword">catch</span> js/Error e :fail))))
(<span class="code-keyword">assert</span> (nil? (get (subvec [1 2] 1) nil)))
(<span class="code-keyword">assert</span> (= :fail (<span class="code-keyword">try</span> (nth (subvec [1 2] 1) nil) (<span class="code-keyword">catch</span> js/Error e :fail))))
(<span class="code-keyword">assert</span> (= 4 (get (subvec [1 2] 1) nil 4)))
(<span class="code-keyword">assert</span> (= 4 (nth (subvec [1 2] 1) nil 4)))
(<span class="code-keyword">assert</span> (= :fail (<span class="code-keyword">try</span> (assoc (subvec [1 2] 1) nil 4)
(<span class="code-keyword">catch</span> js/Error e :fail))))</pre>
</div></div><p>Actually this fails for other values that may coerce to zero as well (e.g. false, numeric strings, etc). I solved this with a <tt>number?</tt> type check in appropriate places. Patch attached with comprehensive tests.</p><p>Putting <tt>number?</tt> checks on critical paths is unacceptable for performance reasons. Ideally we only add one <tt>number?</tt> check. And even then we'd want to know the performance implications.</p><p>Is <tt>number?</tt> really that slow? At least on v8 it seems to be twice as fast as <tt>nil?</tt>. Some repl checks in v8 don't seem to show a significant performance regression, but without a proper profiling harness and test cases I don't feel comfortable saying that for sure.</p>
<p>I think we can give up some type safety but still catch silent-coerce-to-zero cases by expressing the conditional as <tt>(and (&lt; i (.-cnt pv)) (or (&lt; 0 i) (zero? i)))</tt>. I'll try that later today and try to compare the performance.</p>
<p>Your concern about checking bounds only once can be resolved by moving the guts of <tt>array-for</tt> into <tt>unchecked-array-for</tt> and <tt>editable-array-for</tt> into <tt>unchecked-editable-array-for</tt>. This is probably a good idea even if we do nothing about (get <span class="error">&#91;1&#93;</span> nil) --at least <tt>-pop</tt> and <tt>-kv-reduce</tt> would benefit along with the not-found arities of <tt>-nth</tt> and <tt>-lookup</tt>. Perhaps a new ticket for this?</p><p>No coercions. Any check should be higher up the chain, like at <tt>nth</tt> and nowhere else.</p><p>What do you mean by "no coercions"? Do you mean edge cases like {{(get <span class="error">&#91;1 2&#93;</span> "1") =&gt; 2}} are unacceptable even if it means we can avoid a <tt>typeof</tt> check?</p><p>No coercions means no coercions. The only valid key to an <tt>IVector</tt> instance for lookup is a JavaScript number.</p><p>OK, then in that case a <tt>number?</tt> check somewhere is absolutely required, but I can move code around a bit to ensure that no type or bounds checking is done more than once for persistent and transient vectors (there is already a fair amount of redundant bounds checking). I'll work on this tonight.</p>
<p>(I sent a CA about 10 days ago but I see my name isn't on the <a href="http://clojure.org/contributing">CA page</a> yet. It's probably just delayed by holiday stuff.)</p><p>Updated patch.</p>
<ul>
<li>I've removed all redundant bounds checking that I could easily eliminate using <tt>unchecked-array-for</tt> in place of <tt>array-for</tt>.</li>
<li>IVector and IIndexed protocol implementations don't have <tt>number?</tt> checks in them. <tt>nth</tt> does the check.</li>
<li>But the <tt>number?</tt> check <b>is</b> in the IAssociative and ILookup implementations for these types.</li>
<li>The logic here is that for protocols which semantically require a numeric index, we assume the user of the protocol (<tt>nth</tt> or <tt>ci-reduce</tt> for example) has already done the necessary type checking on the outside. But for protocols which don't have this semantic (e.g. <tt>ILookup</tt>), we need to do the check in the protocol implementation.</li>
<li>I ran <tt>script/benchmark</tt> and compared master and this patch on the v8 and spidermonkey engines. I didn't see any performance regressions.</li>
</ul>
<p>Can we remove old patches, thanks.</p><p><br/>
JIRA won't let me delete them. The latest patch is cljs-728-nth-number.patch</p><p>This patch does too much. Why are we not just checking in <tt>nth</tt>?</p><p>Remember the problem isn't just (get <span class="error">&#91;1&#93;</span> nil), it's also (get <span class="error">&#91;1&#93;</span> :a) and also on Range, subvectors, and transientvector types (any non-numeric lookup on an indexed type). So a nil? check in <tt>nth</tt> is nowhere near enough.</p>
<p>For indexed types, <tt>get</tt> calls <tt>-lookup</tt> calls <tt>-nth</tt> (<tt>nth</tt> is not called), and you didn't want to do a <tt>nil?</tt> or <tt>number?</tt> check inside <tt>-nth</tt> (for good reason, as we can assume callers of <tt>-nth</tt> know they need to supply a number). This patch puts the number check on indexed types in <tt>-lookup</tt> (rationale given in my previous comment) and makes sure that other core code is calling <tt>-nth</tt> safely without checks.</p>
<p>The patch also removes some bounds and type checking that was redundant (the <tt>unchecked-array-for</tt> and <tt>-assoc-n</tt> stuff) which I noticed because I had to examine all those calling paths.</p>
<p>I suppose an alternative that would change less code is to have <tt>get</tt> use <tt>IIndexed</tt> if present and do the number check there, but then it's impossible for public code to use ILookup if there were a type where there was a difference. (e.g., Suppose an xml-element type wanted to provide access attributes with <tt>get</tt> and child elements with <tt>nth</tt>.)</p><p>The only code that needs to change it seems to me is <tt>nth</tt> and implementations of <tt>ILookup</tt> for <tt>IIndexed</tt> types.</p><p>-assoc and -assoc! need fixing too on various types because (assoc <span class="error">&#91;:a&#93;</span> nil :b) =&gt; <span class="error">&#91;:b&#93;</span>.</p>
<p>I have attached a smaller patch which does not include the hunks related to redundant bounds checking and will move those to <a href="http://dev.clojure.org/jira/browse/CLJS-757" title="Redundant bounds checking in indexed types"><del>CLJS-757</del></a>. This is the smallest patch which will pass all the tests related to using non-numeric indexes with indexed types.</p><p><tt>-assoc</tt> and <tt>-assoc!</tt> do not need fixing. Only <tt>-assoc-n</tt> and <tt>-assoc-n!</tt> do. The patch needs to be far more minimal if it hopes to get accepted.</p><p>It doesn't make sense for indexed types to have -lookup call -nth but -assoc-n call -assoc (as PersistentVector and Subvec do, but not TransientVector). This means if we know we are using a numeric key we call -nth on read, but call -assoc on writes instead of -assoc-n. -assoc-n is what has the required "key is a number" semantic, not -assoc, and this leads to an unnecessary <tt>number?</tt> check being done in the implementation of <tt>Subvec -conj</tt>. (There are also other places, e.g. Subvec -assoc, where -assoc-n could be called instead of assoc or -assoc, but I don't address those.)</p>
<p>In other words, it would be best if -assoc-n/-assoc-n! consistently meant "numeric key" 1) so that callers could avoid a number? check if they know their key is numeric, 2) so there is a sane analogy -lookup:-nth :: -assoc:-assoc-n :: -assoc!:-assoc-n!</p>
<p>New patch cljs-728-more-minimal.patch</p>
<p>Hunk-by-hunk justification of changes:</p>
<ol>
<li>core_test.cljs
<ol>
<li>test (get x n) (get x n nf) (nth x n) (nth x n nf) and (assoc x n v) with non-numeric <tt>n</tt> against PersistentVector, Subvec, TransientVector, and Range types.</li>
</ol>
</li>
<li>core.cljs
<ol>
<li>number? check in nth</li>
<li>number? check in PersistentVector -lookup</li>
<li>number? check in PersistentVector -assoc</li>
<li>number? check in Subvec -lookup</li>
<li>number? check in Subvec -assoc</li>
<li>number? check in TransientVector -assoc!</li>
<li>number? check in TransientVector -lookup</li>
</ol>
</li>
</ol>
<p>Although it does not swap -assoc-n and -assoc on Subvec and PersistentVector, I would hardly describe this patch as<br/>
"far more minimal" than the previous one--I hope that does not mean it is unacceptable. The only way I can see to<br/>
make this patch smaller is to file a different ticket for each hunk and corresponding test, which seems crazy.</p>
<p>If the patch is still not acceptable, could you please be much more specific about what exactly is wrong with it so we can close this bug.</p><p>That the ClojureScript vector types implement <tt>assoc</tt> in terms of <tt>assoc-n</tt> and not the other way around should be considered a flaw. Please refer to Clojure's PersistentVector.java, RT.java for a guide about how the logic should be implemented. I'll take a patch that aligns ClojureScript's logic closer to Clojure. Thanks.</p><p>Patch that aligns ClojureScript vector type's -assoc-n/-assoc with Clojure's is in <a href="http://dev.clojure.org/jira/browse/CLJS-767" title="vector and subvector implement assoc-n with assoc; (assoc [0] nil 1) -&gt; [1]"><del>CLJS-767</del></a>, and patch that fixes assoc! on TransientVectors is in <a href="http://dev.clojure.org/jira/browse/CLJS-768" title="(persistent! (assoc! (transient [0]) nil 1)) =&gt; [1]"><del>CLJS-768</del></a>.</p>
<p>I will cut conflicting hunks out of this <a href="http://dev.clojure.org/jira/browse/CLJS-728" title="(get [1 2 3] nil) -&gt; 1"><del>CLJS-728</del></a> patch (hunks 2.3, 2.5 and 2.6 as outlined above) so all can apply cleanly with one another. (I'll have to do it this evening though.)</p>
<p>Full list of related tickets and patches (stuff that was factored out of the initial patches):</p>
<ol>
<li><a href="http://dev.clojure.org/jira/browse/CLJS-757" title="Redundant bounds checking in indexed types"><del>CLJS-757</del></a>: remove redundant bounds checking in indexed types (will need a rebase once the others are in).</li>
<li><a href="http://dev.clojure.org/jira/browse/CLJS-767" title="vector and subvector implement assoc-n with assoc; (assoc [0] nil 1) -&gt; [1]"><del>CLJS-767</del></a>: Fix assoc on PersistentVector and Subvec</li>
<li><a href="http://dev.clojure.org/jira/browse/CLJS-768" title="(persistent! (assoc! (transient [0]) nil 1)) =&gt; [1]"><del>CLJS-768</del></a>: Fix assoc! on TransientVector</li>
</ol>
<p>Francis Avila, many thanks will look over these.</p><p>Patch rebased and trimmed: cljs-728-only-nth-lookup.patch.</p><p>Rebased cljs-728-only-nth-lookup.patch again. (core_test.cljs changes would have "conflicted".)</p><p>fixed <a href="https://github.com/clojure/clojurescript/commit/71f781c75bf6e9d19be8d0e529ed0275fa523942">https://github.com/clojure/clojurescript/commit/71f781c75bf6e9d19be8d0e529ed0275fa523942</a></p>Global Rank