incubator-couchdb-user mailing list archives

On Mon, Aug 10, 2009 at 01:05:42PM -0700, Tommy Chheng wrote:
> It is a Ruby app using Couchrest(which uses restclient/net ruby lib)
>
> I'm basically comparing one document against all other documents(+30K
> documents in the dataset; so it's huge number of connections if the
> connections aren't being closed properly) like this:
> grants = NsfGrant.all.paginate(:page => current_page, :per_page =>
> page_size)
> grants.each do |doc2|
> NsfGrantSimilarity.compute_and_store(doc1, doc2)
But presumably NsfGrant.all only makes a single HTTP request, not 30K
separate requests? Looking at "netstat -n" will give you a rough idea, at
least for seeing how many sockets are left in TIME_WAIT state, but the
surest way is with tcpdump:
  tcpdump -i lo -n -s0 'host 127.0.0.1 and tcp dst port 5984 and (tcp[tcpflags] & tcp-syn != 0)'
should show you one line for each new HTTP connection made to CouchDB.
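
If you want to tally the netstat output rather than eyeball it, something like the following rough sketch might help. It assumes the Linux-style "netstat -n" column layout (Proto, Recv-Q, Send-Q, local address, foreign address, state); count_states is a made-up helper name, not part of any library.

```ruby
# Count sockets per TCP state for connections whose remote end is the
# given port (5984 is CouchDB's default).
# Assumption: each TCP line has six whitespace-separated columns, with the
# foreign address in column 5 and the state in column 6 (Linux netstat).
def count_states(netstat_output, port = 5984)
  counts = Hash.new(0)
  netstat_output.each_line do |line|
    fields = line.split
    # Skip headers, UDP/unix sockets and anything too short to parse.
    next unless fields.length >= 6 && fields[0].start_with?('tcp')
    # Only count connections going *to* CouchDB, not CouchDB's own side.
    next unless fields[4].end_with?(":#{port}")
    counts[fields[5]] += 1
  end
  counts
end

# Possible usage (runs the real command, so left commented out here):
#   p count_states(`netstat -n`)
```

A large and growing TIME_WAIT count would suggest a new connection per request; on other platforms the column positions may differ, so adjust the indexes accordingly.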
But in any case, for parsing 30K documents, you may not want to load all 30K
into RAM and then compare them afterwards. Couchrest lets you do a streaming
view, so that one object is read at a time - I think if you call view with a
block, then it works this way automatically. You need to have curl installed
for this to work, as it shells out a separate curl process and then reads
the response one line at a time.
  # Query a CouchDB view as defined by a <tt>_design</tt> document. Accepts
  # parameters as described in http://wiki.apache.org/couchdb/HttpViewApi
  def view(name, params = {}, &block)
    keys = params.delete(:keys)
    name = name.split('/') # I think this will always be length == 2, but maybe not...
    dname = name.shift
    vname = name.join('/')
    url = CouchRest.paramify_url "#{@uri}/_design/#{dname}/_view/#{vname}", params
    if keys
      CouchRest.post(url, {:keys => keys})
    else
      if block_given?
        @streamer.view("_design/#{dname}/_view/#{vname}", params, &block)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      else
        CouchRest.get url
      end
    end
  end
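
To give a feel for what "one line at a time" means, here is a minimal sketch of that kind of reader. It is not Couchrest's actual streamer code; each_view_row is a hypothetical helper, and it assumes CouchDB writes the view response with a header line, then one row per line, then a closing line.

```ruby
require 'json'
require 'stringio'

# Yield each row of a CouchDB view response without loading the whole
# body into memory.
# Assumed response layout (one row per line):
#   {"total_rows":2,"offset":0,"rows":[
#   {"id":"a","key":"a","value":1},
#   {"id":"b","key":"b","value":2}
#   ]}
def each_view_row(io)
  io.gets # discard the header line (total_rows, offset, start of rows)
  while (line = io.gets)
    text = line.strip.chomp(',') # drop the trailing comma between rows
    # The closing "]}" line is not a JSON object, so skip it.
    next unless text.start_with?('{') && text.end_with?('}')
    yield JSON.parse(text)
  end
end

# Demonstration against an in-memory response; a real run could pass the
# IO from a curl subprocess instead.
body = "{\"total_rows\":2,\"offset\":0,\"rows\":[\n" \
       "{\"id\":\"a\",\"key\":\"a\",\"value\":1},\n" \
       "{\"id\":\"b\",\"key\":\"b\",\"value\":2}\n" \
       "]}\n"
each_view_row(StringIO.new(body)) { |row| puts row['id'] }
```

Only one parsed row is held at a time, so memory use stays flat however many documents the view returns.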
HTH,
Brian.