Clojure JIRA
http://dev.clojure.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+DCSV+AND+resolution+%3D+Unresolved+AND+component+is+EMPTY+ORDER+BY+priority+DESC
An XML representation of a search request
RE: [DCSV-13] Port data.csv to clojurescript
http://dev.clojure.org/jira/browse/DCSV-13?focusedCommentId=44578#comment-44578
Tue, 22 Nov 2016 06:32:53 -0600
Erik Assum
<p>I guess this could be merged in the near future, thanks to Alex's work on the build platform?</p>
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=slipset">Erik Assum</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-13">DCSV-13</a>)</td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=44578#worklog-44578
RE: [DCSV-15] Use Reducers/Transducers for better performance & resource handling
http://dev.clojure.org/jira/browse/DCSV-15?focusedCommentId=43871#comment-43871
Fri, 16 Sep 2016 08:46:31 -0500
Rick Moynihan
<p>I agree not loading data into memory is a huge benefit, but we shouldn't necessarily conflate that streaming property with laziness/eagerness. </p>
<p>With reducers/transducers you can still stream through a CSV file row by row in constant memory. For example, reducing into a count of rows doesn't require holding the rows in memory, even though it is eager. Likewise, if we made a `CSVFile` object `CollReduce`-able and used it with a transducer via `transduce`, you could request a lazy seq of results with `sequence`, where the parsing itself paid no laziness tax; alternatively, you could load the results into memory eagerly by transducing into a vector.</p>
<p>Apologies for not providing any benchmark results with this ticket. It was actually Alex Miller who suggested I write it after we discussed things briefly on Slack, and he said I needn't provide timings because the costs of laziness are well known. Regardless, I'll tidy up the code I used to take the timings and put it in a gist, maybe later today.</p>
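<p>Below is a minimal sketch of the eager-but-streaming idea. It assumes a hypothetical `reducible-rows` helper, which is not part of data.csv. The helper splits each line on commas with no quote handling, so it shows only the reduce/transduce plumbing and not real CSV parsing. The source implements `clojure.lang.IReduceInit`, so `reduce`, `transduce` and `into` consume the reader eagerly, one row at a time. Only what the reducing function keeps is retained.</p>

```clojure
(ns reducible-csv-sketch
  (:require [clojure.string :as str])
  (:import (java.io BufferedReader StringReader)))

(defn reducible-rows
  "Hypothetical helper, not part of data.csv: an eager, reducible view of the
  lines of rdr, each split naively on commas (no quote handling). Each row is
  handed straight to the reducing function; nothing is cached."
  [^BufferedReader rdr]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (loop [acc init]
        (if (reduced? acc)
          @acc                                  ; early termination (e.g. take)
          (if-let [line (.readLine rdr)]
            (recur (f acc (str/split line #"," -1)))
            acc))))))

(defn count-rows
  "Count rows without retaining them, so memory use stays constant."
  [rdr]
  (reduce (fn [n _] (inc n)) 0 (reducible-rows rdr)))

(defn data-rows
  "Eagerly load every row after the header into a vector."
  [rdr]
  (into [] (drop 1) (reducible-rows rdr)))
```

<p>With a real file, this would typically be used as `(with-open [rdr (clojure.java.io/reader path)] (count-rows rdr))`, where `path` is a placeholder. `sequence` is left out because it would need the source to be iterable or seqable as well, and this `reify` is only reducible.</p>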
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=rickmoynihan">Rick Moynihan</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-15">DCSV-15</a>)</td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=43871#worklog-43871
RE: [DCSV-15] Use Reducers/Transducers for better performance & resource handling
http://dev.clojure.org/jira/browse/DCSV-15?focusedCommentId=43870#comment-43870
Fri, 16 Sep 2016 02:18:21 -0500
Jonas Enlund
<p>Can you share this benchmark? I did some comparisons when I initially wrote the lib and I didn't see such big differences.</p>
<p>I think that the lazy approach is an important feature in many cases where you don't want all those gigabytes in memory.</p>
<p>If we add some non-lazy parsing for performance reasons, I would argue it should be an addition to the public API rather than a replacement.</p>
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=jonase">Jonas Enlund</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-15">DCSV-15</a>)</td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=43870#worklog-43870
RE: [DCSV-13] Port data.csv to clojurescript
http://dev.clojure.org/jira/browse/DCSV-13?focusedCommentId=42694#comment-42694
Mon, 11 Apr 2016 15:13:44 -0500
Erik Assum
<p>Fixed in the 0003 patch.</p>
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=slipset">Erik Assum</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-13">DCSV-13</a>)</td>
</tr>
<tr>
<td>Edited by:</td>
<td><a href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=slipset">Erik Assum</a></td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=42694#worklog-42694
RE: [DCSV-13] Port data.csv to clojurescript
http://dev.clojure.org/jira/browse/DCSV-13?focusedCommentId=42693#comment-42693
Mon, 11 Apr 2016 14:58:43 -0500
Jonas Enlund
<p>Can we resolve the reflection warnings in patch 0002?</p>
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">$ rlwrap mvn clojure:repl
...
Clojure 1.8.0
user=&gt; (set! *warn-on-reflection* <span class="code-keyword">true</span>)
<span class="code-keyword">true</span>
user=&gt; (require '[clojure.data.csv :as csv])
Reflection warning, clojure/data/csv.cljc:62:8 - call to method unread on java.io.PushbackReader can't be resolved (argument types: unknown).
Reflection warning, clojure/data/csv.cljc:91:8 - call to method write on java.io.Writer can't be resolved (argument types: unknown).
nil
user=&gt;</pre>
</div></div>
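<p>Warnings like these mean the compiler cannot pick a Java overload at compile time, so the call falls back to reflection. `PushbackReader.unread` and `Writer.write` are both overloaded. The usual remedy is a type hint on the receiver plus a hint or coercion on the argument. Here is a minimal sketch with hypothetical function names. It is not the actual code at csv.cljc:62 or csv.cljc:91.</p>

```clojure
(ns reflection-hints-sketch
  (:import (java.io PushbackReader StringReader StringWriter Writer)))

(set! *warn-on-reflection* true)

(defn unread-char
  "Push c (an int as returned by .read) back onto reader. The ^PushbackReader
  hint plus the (int c) coercion let the compiler pick unread(int) statically."
  [^PushbackReader reader c]
  (.unread reader (int c)))

(defn write-str
  "Write s to writer; the ^Writer and ^String hints select write(String)."
  [^Writer writer ^String s]
  (.write writer s))
```

<p>Loading this with `*warn-on-reflection*` set should print no warnings. Removing the `(int c)` coercion or the `^String` hint makes the call ambiguous again and brings the warning back.</p>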
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=jonase">Jonas Enlund</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-13">DCSV-13</a>)</td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=42693#worklog-42693
RE: [DCSV-13] Port data.csv to clojurescript
http://dev.clojure.org/jira/browse/DCSV-13?focusedCommentId=42692#comment-42692
Mon, 11 Apr 2016 12:58:32 -0500
Erik Assum
<p>Bummer<br/>
<a href="https://groups.google.com/forum/m/#!topic/clojure-dev/PDyOklDEv7Y">https://groups.google.com/forum/m/#!topic/clojure-dev/PDyOklDEv7Y</a></p>
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=slipset">Erik Assum</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-13">DCSV-13</a>)</td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=42692#worklog-42692
RE: [DCSV-13] Port data.csv to clojurescript
http://dev.clojure.org/jira/browse/DCSV-13?focusedCommentId=42691#comment-42691
Mon, 11 Apr 2016 12:53:06 -0500
Jonas Enlund
<p>I can't run the Clojure tests with `mvn clojure:test`, which I think the CI server uses. I get the following exception:</p>
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">Exception in thread <span class="code-quote">"main"</span> java.io.FileNotFoundException: Could not locate clojure/data/test_runner/__init.class or clojure/data/test_runner/.clj on classpath:
at clojure.lang.RT.load(RT.java:432)
at clojure.lang.RT.load(RT.java:400)
at clojure.core$load$fn__4890.invoke(core.clj:5415)
at clojure.core$load.doInvoke(core.clj:5414)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.core$load_one.invoke(core.clj:5227)
at clojure.core$load_lib.doInvoke(core.clj:5264)
at clojure.lang.RestFn.applyTo(RestFn.java:142)
at clojure.core$apply.invoke(core.clj:603)
at clojure.core$load_libs.doInvoke(core.clj:5298)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at clojure.core$apply.invoke(core.clj:603)
at clojure.core$require.doInvoke(core.clj:5381)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at user$eval1.invoke(run-test3909933917568395357.clj:1)
at clojure.lang.<span class="code-object">Compiler</span>.eval(<span class="code-object">Compiler</span>.java:6511)
at clojure.lang.<span class="code-object">Compiler</span>.load(<span class="code-object">Compiler</span>.java:6952)
at clojure.lang.<span class="code-object">Compiler</span>.loadFile(<span class="code-object">Compiler</span>.java:6912)
at clojure.main$load_script.invoke(main.clj:283)
at clojure.main$script_opt.invoke(main.clj:343)
at clojure.main$main.doInvoke(main.clj:427)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.lang.Var.invoke(Var.java:415)
at clojure.lang.AFn.applyToHelper(AFn.java:161)
at clojure.lang.Var.applyTo(Var.java:532)
at clojure.main.main(main.java:37)</pre>
</div></div>
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=jonase">Jonas Enlund</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-13">DCSV-13</a>)</td>
</tr>
<tr>
<td>Edited by:</td>
<td><a href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=jonase">Jonas Enlund</a></td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=42691#worklog-42691
RE: [DCSV-10] Specify RFC4180 compatibilty in README
http://dev.clojure.org/jira/browse/DCSV-10?focusedCommentId=38227#comment-38227
Thu, 19 Mar 2015 05:33:40 -0500
Jonas Enlund
<p>According to the RFC4180 spec:</p>
<ul>
<li>lines should end with CRLF; this library also accepts LF-only line endings</li>
<li>cells should be separated by commas; this library also supports other separators</li>
</ul>
<p>I don't think "relaxed" is a standard term. I would certainly accept a patch that enhances the documentation.</p>
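<p>Both relaxations listed above can be seen through data.csv's own options. This small sketch assumes data.csv is on the classpath. `read-csv` accepts LF-only as well as CRLF line endings, and `:separator` selects a different cell separator. `write-csv` takes `:newline :cr+lf` to emit CRLF line endings. The var and function names are illustrative only.</p>

```clojure
(ns rfc4180-relaxations-sketch
  (:require [clojure.data.csv :as csv])
  (:import (java.io StringWriter)))

;; Reading: LF-only and CRLF line endings are both accepted.
(def lf-rows   (csv/read-csv "a,b\nc,d"))
(def crlf-rows (csv/read-csv "a,b\r\nc,d"))

;; Reading with a non-comma separator via the :separator option.
(def semicolon-rows (csv/read-csv "a;b\nc;d" :separator \;))

;; Writing: :newline :cr+lf asks write-csv for CRLF line endings.
(defn write-crlf [rows]
  (let [w (StringWriter.)]
    (csv/write-csv w rows :newline :cr+lf)
    (str w)))
```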
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=jonase">Jonas Enlund</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-10">DCSV-10</a>)</td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=38227#worklog-38227
RE: [DCSV-10] Specify RFC4180 compatibilty in README
http://dev.clojure.org/jira/browse/DCSV-10?focusedCommentId=38226#comment-38226
Thu, 19 Mar 2015 05:13:13 -0500
Leon Grapenthin
<p>Thanks for the explanation. <br/>
Then the README should point out in which respects CSV input need not adhere to the spec, whether a strict mode exists or is planned, and whether such a mode is, or would be, more or less performant.</p>
<p>P.S.: Out of curiosity, is this meaning of "relaxed" some kind of standard term in IT? I googled it but couldn't find anything related.</p>
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=lgs32a">Leon Grapenthin</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-10">DCSV-10</a>)</td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=38226#worklog-38226
RE: [DCSV-10] Specify RFC4180 compatibilty in README
http://dev.clojure.org/jira/browse/DCSV-10?focusedCommentId=38214#comment-38214
Wed, 18 Mar 2015 10:54:33 -0500
Jonas Enlund
<p>"relaxed" means it will <b>read</b> some files that do not adhere to the RFC4180 spec. Files written with write-csv will follow the spec; if they don't, it should be considered a bug.</p>
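<p>A quick way to exercise the write side is a round trip: write awkward cells with `write-csv`, then read them back with `read-csv`. This sketch assumes data.csv is on the classpath, and `csv-string` and `tricky-rows` are hypothetical names. `write-csv` is expected to quote cells that contain the separator, a quote or a line break, and to double embedded quotes.</p>

```clojure
(ns write-csv-roundtrip-sketch
  (:require [clojure.data.csv :as csv])
  (:import (java.io StringWriter)))

(defn csv-string
  "Render rows with write-csv and return the resulting text."
  [rows]
  (let [w (StringWriter.)]
    (csv/write-csv w rows)
    (str w)))

(def tricky-rows
  ;; cells containing a separator, a quote and a line break
  [["id" "comment"]
   ["1" "a,b"]
   ["2" "say \"hi\""]
   ["3" "two\nlines"]])
```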
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=jonase">Jonas Enlund</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-10">DCSV-10</a>)</td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=38214#worklog-38214
RE: [DCSV-6] read-csv can not handle white-space at end of line
http://dev.clojure.org/jira/browse/DCSV-6?focusedCommentId=34754#comment-34754
Thu, 29 May 2014 09:31:29 -0500
Paul Schulz
<p>This is related to DCSV-8.</p>
<p>A value that starts with a quote but whose closing quote falls in the middle of the cell (e.g. additional characters appear after the second quote) causes the same problem.</p>
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=pschulz01">Paul Schulz</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-6">DCSV-6</a>)</td>
</tr>
<tr>
<td>Edited by:</td>
<td><a href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=pschulz01">Paul Schulz</a></td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=34754#worklog-34754
RE: [DCSV-7] data.csv does not handle BOMs
http://dev.clojure.org/jira/browse/DCSV-7?focusedCommentId=31630#comment-31630
Mon, 12 Aug 2013 23:46:53 -0500
Jonas Enlund
<p>This isn't really a CSV-specific problem. When I've encountered files with a byte order mark, I have simply called (.skip reader 1) before handing the reader to read-csv. Is that not a good enough solution?</p>
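<p>`(.skip reader 1)` drops the first character unconditionally, which loses data when a file has no BOM. Java's UTF-8 decoder does not strip a BOM, so a BOM shows up as the character U+FEFF. A slightly safer sketch only skips that character when it is actually present. `skip-bom` is a hypothetical helper, not part of data.csv.</p>

```clojure
(ns skip-bom-sketch
  (:import (java.io PushbackReader Reader StringReader)))

(defn skip-bom
  "Hypothetical helper: wraps rdr in a PushbackReader and consumes the first
  character only if it is a U+FEFF byte order mark; otherwise that character
  is pushed back so nothing is lost."
  ^PushbackReader [^Reader rdr]
  (let [pr (PushbackReader. rdr)
        c  (.read pr)]
    (when-not (or (= c -1) (= c 0xFEFF))   ; -1 means the input was empty
      (.unread pr (int c)))
    pr))
```

<p>Since read-csv accepts a Reader, the result can be handed over directly, e.g. `(with-open [rdr (skip-bom (clojure.java.io/reader path))] (doall (read-csv rdr)))`, with `path` as a placeholder.</p>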
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=jonase">Jonas Enlund</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-7">DCSV-7</a>)</td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=31630#worklog-31630
RE: [DCSV-6] read-csv can not handle white-space at end of line
http://dev.clojure.org/jira/browse/DCSV-6?focusedCommentId=31154#comment-31154
Fri, 24 May 2013 16:35:40 -0500
Cees van Kemenade
<p>To take the issue a little further, the same holds for whitespace in the middle of a line, between a closing quote and the separator:<br/>
=&gt; (read-csv (java.io.StringReader. "\"a\" , 5\n \"b,b\",\"6\"" ))<br/>
Exception CSV error (unexpected character: ) clojure.data.csv/read-quoted-cell (csv.clj:36)</p>
<p>This raises the question of what happens if you put a space between the separator and the opening quote. First, the case without such a space:<br/>
=&gt; (read-csv (java.io.StringReader. "\"a\", 5\n\"b\",\"6\"" ))<br/>
([&quot;a&quot; &quot; 5&quot;] [&quot;b&quot; &quot;6&quot;])</p>
<p>Now add one space before the opening quote on the second line:<br/>
=&gt; (read-csv (java.io.StringReader. "\"a\", 5\n \"b\",\"6\"" ))<br/>
([&quot;a&quot; &quot; 5&quot;] [&quot; \&quot;b\&quot;&quot; &quot;6&quot;])</p>
<p>Interestingly, the whitespace is taken as the start of an unquoted value, and the quote that follows becomes part of the text value that is read.<br/>
The main reason for using quotes is to allow separators in text, so let us see what happens if we put a separator inside the quoted string:<br/>
=&gt; (read-csv (java.io.StringReader. "\"a\", 5\n \"b,b\",\"6\"" ))<br/>
([&quot;a&quot; &quot; 5&quot;] [&quot; \&quot;b&quot; &quot;b\&quot;&quot; &quot;6&quot;])</p>
<p>Now the separator is no longer quoted and, as expected, the line is parsed as three values instead of two.</p>
<p>Files produced by standard libraries usually don't have these issues. However, custom code that emits CSV files, or small manual fixes to a CSV file, can easily introduce them, and they are then quite hard to diagnose. <br/>
Therefore I would opt for a mode of operation in which whitespace before an opening quote or after a closing quote is ignored (unless it is an escaped quote like "").</p>
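<p>Here is a minimal sketch of such a lenient mode. It is restricted to a single line with no embedded line breaks, and `parse-line-lenient` is hypothetical, not part of data.csv. Whitespace between a separator and an opening quote is dropped. So is whitespace between a closing quote and the next separator. Leading whitespace of an unquoted cell is kept, as read-csv does today. Any other character after a closing quote, which is the DCSV-8 case, still raises an error.</p>

```clojure
(ns lenient-quotes-sketch)

(defn parse-line-lenient
  "Hypothetical parser for ONE line (no embedded line breaks). Whitespace
  between a separator and an opening quote, or between a closing quote and the
  next separator, is ignored. \"\" inside a quoted cell is an escaped quote."
  [^String line sep]
  (let [n (count line)]
    (loop [i 0, state :start, cell (StringBuilder.), cells []]
      (if (= i n)
        (conj cells (str cell))
        (let [c (.charAt line i)]
          (case state
            :start
            (cond
              (= c \") (recur (inc i) :quoted (StringBuilder.) cells) ; drop pending blanks
              (= c sep) (recur (inc i) :start (StringBuilder.) (conj cells (str cell)))
              (Character/isWhitespace c) (recur (inc i) :start (.append cell c) cells)
              :else (recur (inc i) :unquoted (.append cell c) cells))

            :unquoted
            (if (= c sep)
              (recur (inc i) :start (StringBuilder.) (conj cells (str cell)))
              (recur (inc i) :unquoted (.append cell c) cells))

            :quoted
            (cond
              (not= c \") (recur (inc i) :quoted (.append cell c) cells)
              (and (> n (inc i)) (= \" (.charAt line (inc i))))
              (recur (+ i 2) :quoted (.append cell \") cells)       ; "" is a literal quote
              :else (recur (inc i) :after-quote cell cells))

            :after-quote
            (cond
              (= c sep) (recur (inc i) :start (StringBuilder.) (conj cells (str cell)))
              (Character/isWhitespace c) (recur (inc i) :after-quote cell cells)
              :else (throw (ex-info "Unexpected character after closing quote"
                                    {:index i :char c})))))))))
```

<p>For example, `(parse-line-lenient " \"b,b\",\"6\"" \,)` yields `["b,b" "6"]`, where read-csv produced three cells above.</p>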
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=cvkemenade">Cees van Kemenade</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-6">DCSV-6</a>)</td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=31154#worklog-31154
RE: [DCSV-5] No option for parsing into maps
http://dev.clojure.org/jira/browse/DCSV-5?focusedCommentId=31148#comment-31148
Fri, 24 May 2013 12:41:50 -0500
Cees van Kemenade
<p>I've run into the same question and prepared a small library for my CSV processing. It uses data.csv as the workhorse but adds some functionality on top of it:</p>
<ol>
<li>csv-to-map: does the same as the csv-data-&gt;maps helper Jonas suggested, but also maps the strings in the first line to keywords. Optionally, the keys can be translated to lowercase, which is often needed when submitting the CSV data to a database.</li>
<li>csv-columnMap: selects a subset of columns and renames them (i.e. renames entries in the first line of the CSV data).</li>
<li>read-csv: my primary entry point, combining data.csv, csv-to-map and csv-columnMap.</li>
<li>read-csv-lazy: a lazy variant that takes a processing function to be applied in the inner loop (to support large CSV datasets).</li>
<li>read-csv-to-db: pumps a CSV file into a database.</li>
<li>map-seq-to-csv: converts a uniform sequence of hash maps into a dataset that can be written to a CSV file (the first line contains the keys).</li>
</ol>
<p>Feel free to reuse parts of the code. You can find it here:</p>
<p><a href="https://github.com/cvkem/vinzi.tools/blob/master/vinzi.tools/src/main/clojure/vinzi/tools/vCsv.clj">https://github.com/cvkem/vinzi.tools/blob/master/vinzi.tools/src/main/clojure/vinzi/tools/vCsv.clj</a></p>
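<p>The last item, going from maps back to rows, is the inverse of parsing into maps. A minimal sketch of that direction follows, assuming a uniform sequence of maps. `maps->csv-data` is a hypothetical name and does not reproduce the linked implementation. The header comes from the keys of the first map, and each row lists its values in the same column order. The result is plain vectors of strings, which is the shape `write-csv` expects.</p>

```clojure
(ns maps-to-csv-sketch)

(defn maps->csv-data
  "Hypothetical helper (not part of data.csv): turn a uniform sequence of maps
  into rows for write-csv. The header row is taken from the keys of the first
  map; every row lists its values in that column order, missing ones as \"\"."
  [maps]
  (let [columns (keys (first maps))]
    (cons (mapv name columns)
          (map (fn [m] (mapv #(str (get m %)) columns)) maps))))
```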
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=cvkemenade">Cees van Kemenade</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-5">DCSV-5</a>)</td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=31148#worklog-31148
RE: [DCSV-5] No option for parsing into maps
http://dev.clojure.org/jira/browse/DCSV-5?focusedCommentId=31129#comment-31129
Tue, 21 May 2013 13:28:44 -0500
Jonas Enlund
<p>I've seen this feature request before, so I think something like this should be added. One approach would be to provide a helper function:</p>
<div class="code panel" style="border-width: 1px;"><div class="codeContent panelContent">
<pre class="code-java">(defn csv-data-&gt;maps [vecs]
(map zipmap (repeat (first vecs)) (<span class="code-keyword">rest</span> vecs)))
(csv-data-&gt;maps (read-csv reader))</pre>
</div></div>
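<p>The helper above works on any sequence of rows, so it can be tried on literal data without a file. The sketch below adds a variant with lower-cased keyword keys. `csv-data->keyword-maps` is a hypothetical name and is not part of data.csv.</p>

```clojure
(ns csv-maps-sketch
  (:require [clojure.string :as str]))

(defn csv-data->maps
  "The helper from the comment above: zip the header row with each data row."
  [vecs]
  (map zipmap (repeat (first vecs)) (rest vecs)))

(defn csv-data->keyword-maps
  "Hypothetical variant: header strings become lower-cased keywords."
  [vecs]
  (csv-data->maps (cons (map (comp keyword str/lower-case) (first vecs))
                        (rest vecs))))

(csv-data->keyword-maps [["Name" "Age"] ["Ann" "34"] ["Bob" "27"]])
;; => ({:name "Ann", :age "34"} {:name "Bob", :age "27"})
```

<p>Both read-csv and `map` are lazy. When the rows come from a reader opened with `with-open`, realize the result inside the `with-open` body (for example with `doall`) before the reader is closed.</p>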
<br/>
<br/>
<table>
<tr>
<td>Author:</td>
<td><a
href="http://dev.clojure.org/jira/secure/ViewProfile.jspa?name=jonase">Jonas Enlund</a>
(<a href="http://dev.clojure.org/jira/browse/DCSV-5">DCSV-5</a>)</td>
</tr>
</table>
http://dev.clojure.org/jira/browse/DCSV-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=31129#worklog-31129