Details

Description

Reported by dimi...@gashinsky.com, May 09, 2009
;; A lot of times I needed a padding option on the partition. This is
;; my attempt to solve this problem. Any suggestions are welcome. I
;; hope this patch or something similar will make its way into the
;; core.
Some discussion that happened here:
http://groups.google.com/group/clojure/browse_frm/thread/6fcc1dd999a5ec02?tvc=1
was integrated into the patch.

chouser@n01se.net said: It looks to me like the 3- and 4-arg bodies could be combined resulting in less code and no significant loss of performance. A pad of nil could be treated the same as no pad supplied, which would be different from a numeric pad (including 0).

Assembla Importer
added a comment - 28/Sep/10 7:52 AM chouser@n01se.net said: It looks to me like the 3- and 4-arg bodies could be combined resulting in less code and no significant loss of performance. A pad of nil could be treated the same as no pad supplied, which would be different from a numeric pad (including 0).

Interpreting nil passed to the 4 argument version as a request to get the behavior of the 3 argument version looks wrong to me. There would be no way to express both a desire for padding and that the padding value should be nil.

Assembla Importer
added a comment - 28/Sep/10 7:52 AM scgilardi said: By "combined" do you mean that the 3 argument version should call the 4 argument version with an explicit pad of nil?
Currently, the 3 argument version does no padding. Instead it stops as soon as there are less than n args left:

Interpreting nil passed to the 4 argument version as a request to get the behavior of the 3 argument version looks wrong to me. There would be no way to express both a desire for padding and that the padding value should be nil.

scgilardi said: Looking into the issue further, I see I made a mistake in my previous comment. The padding is given as a sequence, not a value. However, the key difference between the 3 and 4 arg versions remains. The 3 argument version never returns a "short" sequence at the end. The 4 arg version can.

Along the lines of your suggestion, we could combine the two by changing the 4 arg version to interpret a pad value of "[]" to mean "return a short sequence at the end if necessary" and a pad value of "nil" to mean "never return a short sequence". This would involve interpreting "nil" different from "the empty sequence" where in many other contexts they're equivalent. I'm not sure whether or not saving some code is a good trade for introducing that subtle difference.

Assembla Importer
added a comment - 28/Sep/10 7:52 AM scgilardi said: Looking into the issue further, I see I made a mistake in my previous comment. The padding is given as a sequence, not a value. However, the key difference between the 3 and 4 arg versions remains. The 3 argument version never returns a "short" sequence at the end. The 4 arg version can.
Along the lines of your suggestion, we could combine the two by changing the 4 arg version to interpret a pad value of "[]" to mean "return a short sequence at the end if necessary" and a pad value of "nil" to mean "never return a short sequence". This would involve interpreting "nil" different from "the empty sequence" where in many other contexts they're equivalent. I'm not sure whether or not saving some code is a good trade for introducing that subtle difference.

chouser@n01se.net said: I had also mistaken pad to be a number. I think you're right, producing different behavior when pad is an empty seq vs. nil is just asking for trouble. Thanks for taking a second look.

Assembla Importer
added a comment - 28/Sep/10 7:52 AM chouser@n01se.net said: I had also mistaken pad to be a number. I think you're right, producing different behavior when pad is an empty seq vs. nil is just asking for trouble. Thanks for taking a second look.

Assembla Importer
added a comment - 28/Sep/10 7:52 AM chouser@n01se.net said: Thanks for that link. Your final solution (not using nil) is already committed, but we should get those tests into clojure-test once it's location has been settled.

Assembla Importer
added a comment - 28/Sep/10 7:52 AM scgilardi said: I've uploaded part.clj which is another possible alternative. It handles a step of 1 without using the pad. Some invariants in the partitions it produces for the 4 argument case:

for a step size of n, if you provide n - 1 objects in the padding sequence, all output sequences will be of size n

for a step size n, the output sequences will begin with elements at offsets 0, n, 2n, 3n, ... in the original sequence until no such element exists.

scgilardi said: Ugh, the second "invariant" I listed doesn't hold for step = 1. It appears to hold for other step values. Perhaps part.clj will still be useful to someone in coming up with a better solution.

Assembla Importer
added a comment - 28/Sep/10 7:52 AM scgilardi said: Ugh, the second "invariant" I listed doesn't hold for step = 1. It appears to hold for other step values. Perhaps part.clj will still be useful to someone in coming up with a better solution.

Assembla Importer
added a comment - 28/Sep/10 7:52 AM richhickey said: I don't see why step of 1 should get special treatment. If the rule is the second one (yield partitions as long as step offsets are present in original coll), then

scgilardi said: The patch includes and updated doc string, the new 4 argument case, and a change to the whitespace in the 3 argument case for consistent indentation with the 4 argument case (current emacs clojure-mode). I can provide a patch that doesn't touch the 3 argument case whitespace if desired.

Assembla Importer
added a comment - 28/Sep/10 7:52 AM scgilardi said: The patch includes and updated doc string, the new 4 argument case, and a change to the whitespace in the 3 argument case for consistent indentation with the 4 argument case (current emacs clojure-mode). I can provide a patch that doesn't touch the 3 argument case whitespace if desired.

richhickey said: I think the doc should be even more explicit, something like:

... do not overlap. If no pad argument is supplied, will produce only complete partitions of size n, possibly not including items at the end if the size of coll is not a multiple of n. If the pad argument is supplied, will produce a partition at every offset present in the supplied collection, using the pad elements as necessary to pad trailing partitions up to n items each. If pad is nil or a collection containing fewer than n-1 items, ...

Assembla Importer
added a comment - 28/Sep/10 7:52 AM richhickey said: I think the doc should be even more explicit, something like:
... do not overlap. If no pad argument is supplied, will produce only complete partitions of size n, possibly not including items at the end if the size of coll is not a multiple of n. If the pad argument is supplied, will produce a partition at every offset present in the supplied collection, using the pad elements as necessary to pad trailing partitions up to n items each. If pad is nil or a collection containing fewer than n-1 items, ...

"Returns a lazy seq of lazy subseqs of coll, each of (nominally) n
items. The subseqs begin at offsets 0, step, 2*step, etc. in coll. If
step is not supplied, it defaults to n yielding adjacent, non-overlapping
subseqs. If pad is not supplied, produces only complete subseqs of n
items, possibly not including some items at the end of coll. If pad is
supplied, produces a subseq at every offset present in coll, using any
available items from pad to pad shorter subseqs up to n items. If pad is
a seq of at least n-1 items, produces only complete padded subseqs of n
items. If pad is shorter (or nil) trailing padded subseqs may be
shorter."

I adopted the mathematical terminology that the partition is the operation (the division into parts) and that the individual pieces are not "partitions", but something else: part, block, chunk, or as I propose here, subseq. I like subseq because it embodies succinctly the fact that the items from coll are always kept sequential.

"Returns a lazy seq of lazy subseqs of coll, each of (nominally) n
items. The subseqs begin at offsets 0, step, 2*step, etc. in coll. If
step is not supplied, it defaults to n yielding adjacent, non-overlapping
subseqs. If pad is not supplied, produces only complete subseqs of n
items, possibly not including some items at the end of coll. If pad is
supplied, produces a subseq at every offset present in coll, using any
available items from pad to pad shorter subseqs up to n items. If pad is
a seq of at least n-1 items, produces only complete padded subseqs of n
items. If pad is shorter (or nil) trailing padded subseqs may be
shorter."

I adopted the mathematical terminology that the partition is the operation (the division into parts) and that the individual pieces are not "partitions", but something else: part, block, chunk, or as I propose here, subseq. I like subseq because it embodies succinctly the fact that the items from coll are always kept sequential.

Assembla Importer
added a comment - 28/Sep/10 7:52 AM scgilardi said: OK, paddiing isn't necessary because the same thing can be accomplished by using concat to append a padding sequence of exactly n-1 items to coll before processing it with partition.
partition can't return lazy subseqs because it counts them which (in the general case) will realize them.
ticket-120.patch contains modified docs for partition, removed arity 4 case from partition, take-subs implemented, tests for take-subs based on tests for partition, enhanced one sub-test for partition.

digash said: I like the new take-subs, but I cannot figure out how to use concat instead of partition without counting it and realizing the whole sequence.
The initial motivation was to use partition with destructuring and creating matrix from a sequences. I do not see an easy way to reproduce this case with the new implementation.

I was playing around with partition as it was throwing StackOverflowError on a big (poorly designed) lazy tree of mine. Would it make sense to include a recur version of it? The following is more like partition-all at the moment and it is missing the padding part, but I thought it might still be useful. Here is the rough snipped:

Andrea Richiardi
added a comment - 11/Apr/15 10:47 AM I was playing around with partition as it was throwing StackOverflowError on a big (poorly designed) lazy tree of mine. Would it make sense to include a recur version of it? The following is more like partition-all at the moment and it is missing the padding part, but I thought it might still be useful. Here is the rough snipped:

Andrea, there are very few people who pay attention to comments made on closed tickets. I would recommend asking on the Clojure Google group, or the Clojure Dev Google group, if you are a member of that.

Andy Fingerhut
added a comment - 11/Apr/15 12:05 PM Andrea, there are very few people who pay attention to comments made on closed tickets. I would recommend asking on the Clojure Google group, or the Clojure Dev Google group, if you are a member of that.