final def ##(): Int

final def ==(arg0: Any): Boolean

Aggregate the elements of each partition, and then the results for all the partitions, using
given combine functions and a neutral "zero value".

Aggregate the elements of each partition, and then the results for all the partitions, using
given combine functions and a neutral "zero value". This function can return a different result
type, U, than the type of this RDD, T. Thus, we need one operation for merging a T into a U
and one operation for merging two U's, as in scala.TraversableOnce. Both of these functions are
allowed to modify and return their first argument instead of creating a new U to avoid memory
allocation.
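
A minimal sketch, assuming sc is an existing SparkContext (as in spark-shell) and the Scala RDD API; it computes a sum and a count in one pass, with U = (Int, Int):

  val nums = sc.parallelize(1 to 100)
  val (sum, count) = nums.aggregate((0, 0))(
    (acc, x) => (acc._1 + x, acc._2 + 1),   // seqOp: merge a T into a U
    (a, b) => (a._1 + b._1, a._2 + b._2)    // combOp: merge two U's
  )
  val mean = sum.toDouble / count           // 50.5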

final def asInstanceOf[T0]: T0

Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of
elements (a, b) where a is in this and b is in other.
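
A small sketch, assuming sc is an existing SparkContext:

  val xs = sc.parallelize(Seq(1, 2))
  val ys = sc.parallelize(Seq("a", "b"))
  xs.cartesian(ys).collect()
  // all four pairs: (1,a), (1,b), (2,a), (2,b), in some partition-dependent order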

def checkpoint(): Unit

Mark this RDD for checkpointing.

Mark this RDD for checkpointing. It will be saved to a file inside the checkpoint
directory set with SparkContext.setCheckpointDir() and all references to its parent
RDDs will be removed. This function must be called before any job has been
executed on this RDD. It is strongly recommended that this RDD is persisted in
memory, otherwise saving it on a file will require recomputation.
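
A sketch of the recommended usage, assuming sc is an existing SparkContext; the checkpoint directory path is hypothetical:

  sc.setCheckpointDir("/tmp/checkpoints")   // hypothetical directory
  val data = sc.parallelize(1 to 1000).map(_ * 2)
  data.persist()      // recommended: avoids recomputing the RDD when it is saved
  data.checkpoint()   // mark before any job has been executed on this RDD
  data.count()        // the first action materializes the RDD and writes the checkpoint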

Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.

Approximate version of count() that returns a potentially incomplete result
within a timeout, even if not all tasks have finished.

The confidence is the probability that the error bounds of the result will
contain the true value. That is, if countApprox were called repeatedly
with confidence 0.9, we would expect 90% of the results to contain the
true count. The confidence must be in the range [0,1] or an exception will
be thrown.

timeout

maximum time to wait for the job, in milliseconds

confidence

the desired statistical confidence in the result

returns

a potentially incomplete result, with error bounds
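
A sketch, assuming sc is an existing SparkContext; countApprox returns a PartialResult whose current estimate carries error bounds:

  val big = sc.parallelize(1 to 1000000, 100)
  val partial = big.countApprox(timeout = 1000L, confidence = 0.90)
  println(partial.initialValue)   // a BoundedDouble: an estimate with [low, high] bounds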

def countApproxDistinct(relativeSD: Double): Long

Return approximate number of distinct elements in the RDD.

Return approximate number of distinct elements in the RDD.

The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice:
Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm".

relativeSD

Relative accuracy. Smaller values create counters that require more space.
It must be greater than 0.000017.
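
A sketch, assuming sc is an existing SparkContext; the RDD below holds 10,000 values with exactly 1,000 distinct ones:

  val ids = sc.parallelize((1 to 10000).map(_ % 1000))
  ids.countApproxDistinct(relativeSD = 0.05)   // roughly 1000, within about 5% relative error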

Aggregate the elements of each partition, and then the results for all the partitions, using a
given associative function and a neutral "zero value".

Aggregate the elements of each partition, and then the results for all the partitions, using a
given associative function and a neutral "zero value". The function
op(t1, t2) is allowed to modify t1 and return it as its result value to avoid object
allocation; however, it should not modify t2.

This behaves somewhat differently from fold operations implemented for non-distributed
collections in functional languages like Scala. This fold operation may be applied to
partitions individually, and then fold those results into the final result, rather than
apply the fold to each element sequentially in some defined ordering. For functions
that are not commutative, the result may differ from that of a fold applied to a
non-distributed collection.
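
A sketch, assuming sc is an existing SparkContext; the zero value and the operation should be chosen so that partition-level folding is safe:

  val nums = sc.parallelize(Seq(3, 1, 4, 1, 5))
  nums.fold(0)(_ + _)                                 // 14; addition is associative and commutative
  nums.fold(Int.MinValue)((a, b) => math.max(a, b))   // 5; Int.MinValue is a true identity for max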

def saveAsObjectFile(path: String): Unit

Save this RDD as a SequenceFile of serialized objects.

def saveAsTextFile(path: String): Unit

Save this RDD as a text file, using string representations of elements.
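
A sketch of both save methods, assuming sc is an existing SparkContext; the output paths are hypothetical, and each becomes a directory of part-files:

  val data = sc.parallelize(Seq(1, 2, 3))
  data.saveAsTextFile("/tmp/out-text")        // string representations, one line per element
  data.saveAsObjectFile("/tmp/out-objects")   // SequenceFile of serialized objects
  val back = sc.objectFile[Int]("/tmp/out-objects")   // read the object file back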

final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes

AnyRef

def take(num: Int): List[T]

Take the first num elements of the RDD.

Take the first num elements of the RDD. This currently scans the partitions *one by one*, so
it will be slow if a lot of partitions are required. In that case, use collect() to get the
whole RDD instead.

Note

this method should only be used if the resulting array is expected to be small, as
all the data is loaded into the driver's memory.
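
A sketch, assuming sc is an existing SparkContext:

  val nums = sc.parallelize(1 to 1000, 10)
  nums.take(5)   // the first five elements, materialized in the driver's memory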

final def wait(arg0: Long, arg1: Int): Unit

final def wait(arg0: Long): Unit

Zips this RDD with another one, returning key-value pairs with the first element in each RDD,
second element in each RDD, etc.

Zips this RDD with another one, returning key-value pairs with the first element in each RDD,
second element in each RDD, etc. Assumes that the two RDDs have the *same number of
partitions* and the *same number of elements in each partition* (e.g. one was made through
a map on the other).
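
A sketch, assuming sc is an existing SparkContext; deriving one RDD from the other via map guarantees matching partitioning:

  val a = sc.parallelize(Seq(1, 2, 3), 2)
  val b = a.map(_.toString)   // same partition count and sizes as a
  a.zip(b).collect()          // (1,"1"), (2,"2"), (3,"3")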

Zip this RDD's partitions with one (or more) RDD(s) and return a new RDD by
applying a function to the zipped partitions.

Zip this RDD's partitions with one (or more) RDD(s) and return a new RDD by
applying a function to the zipped partitions. Assumes that all the RDDs have the
*same number of partitions*, but does *not* require them to have the same number
of elements in each partition.
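
A sketch, assuming sc is an existing SparkContext; the two RDDs share a partition count but not per-partition sizes:

  val a = sc.parallelize(1 to 6, 3)
  val b = sc.parallelize(Seq("x", "y", "z"), 3)
  val combined = a.zipPartitions(b) { (it1, it2) =>
    Iterator(it1.size + it2.size)   // emit one combined size per partition
  }
  combined.collect()                // three per-partition totals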

Zips this RDD with its element indices. The ordering is first based on the partition index
and then the ordering of items within each partition. So the first item in the first
partition gets index 0, and the last item in the last partition receives the largest index.
This is similar to Scala's zipWithIndex but it uses Long instead of Int as the index type.
This method needs to trigger a Spark job when this RDD contains more than one partition.
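
A sketch, assuming sc is an existing SparkContext:

  val words = sc.parallelize(Seq("a", "b", "c"), 2)
  words.zipWithIndex().collect()   // ("a",0), ("b",1), ("c",2); indices are Long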

Zips this RDD with generated unique Long ids. Items in the kth partition will get ids k, n+k,
2*n+k, ..., where n is the number of partitions. So there may exist gaps, but this method
won't trigger a Spark job, which is different from org.apache.spark.rdd.RDD#zipWithIndex.
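
A sketch, assuming sc is an existing SparkContext, with n = 2 partitions:

  val words = sc.parallelize(Seq("a", "b", "c", "d"), 2)
  words.zipWithUniqueId().collect()
  // items in partition 0 get ids 0, 2, ...; items in partition 1 get ids 1, 3, ...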