The end of vulnerabilities. Alternating between Python and Ruby; R&D, Consulting, and Ops; Linux and BSD. Moving from Austin to Skokie to Baltimore. Adoptive to Bio. Republican to Democrat, and other Things Done Backwards

Wednesday, January 13, 2010

It has been ages since I've played around with any of the Java scripting languages, so I thought I'd give Jython a spin with MongoDB. I have no idea how the pure Python driver performs against the Java driver, but it would be an interesting benchmark.

This was done on Ubuntu 9.10 with OpenJDK in the standard repositories and assumes the jython shell script is in your path. It also assumes the Java MongoDB driver is in your path and I was lazy so I didn't bother with CLASSPATH.

To me it wasn't immediately obvious from the MongoDB Advanced Query documentation that you can string together multiple operators to perform existence, membership, and greater-than tests. And since JSON can get very messy (and long!) and the syntax is slightly different from the JavaScript in the documentation, instead of passing JSON directly to the find method of your collection, pass a dictionary and assign the various conditions.
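A minimal sketch of the idea, not necessarily the exact query from the post. The field names and values are hypothetical, and the `find` call is left commented out because it assumes a pymongo collection object named `collection`:

```python
# Each key is a field name; each value is a dict mapping operator -> operand.
# Conditions on different fields, and multiple operators on one field,
# must all hold for a document to match.
query = {
    "url": {"$exists": True},               # existence test
    "method": {"$in": ["GET", "CONNECT"]},  # membership test
    "size": {"$gt": 1000, "$lt": 20000},    # two range operators strung together
}

# Assumption: `collection` is a pymongo collection object.
# for doc in collection.find(query):
#     print(doc)
```

Building the conditions as a plain dictionary also means they can be assembled piece by piece (for example, adding the `size` condition only when a threshold is supplied) before the query is ever sent.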

So in my previous blog (using JavaScript) I introduced queries but you really can't do anything useful without using a cursor. If you've ever done any MySQL coding before you should be familiar with the concept. Basically it allows you to iterate through the results of a query.

Here we have the same expressions, but you obviously need to quote the $gt in Python.

So the MongoDB developer documentation is actually pretty decent, but it doesn't really use examples with real data. For me, that made it more difficult for some of the API and shell commands to sink in.

So to generate some real-world queries I created a Python script that parsed the access.log file[s] generated by squid. I'll follow this blog with one that covers pymongo, but I think this will be helpful and, like most of the posts, will provide a good reference, because when you are rapidly approaching 40 not only your eyes go, but your memory too. So here goes...

First of all, this assumes you are running the mongo JavaScript shell, and yeah, I know running as root is a bad idea and not even necessary (I don't think), but sue me.

root@opti620:~/mongodb# ./bin/mongo

MongoDB shell version: 1.2.1

url: test

connecting to: test

type "help" for help

> show dbs

admin

local

mongosquid

test

> use mongosquid

switched to db mongosquid

> show collections

raw

system.indexes

>

Now let's have some fun. This was actually when I had just imported a few lines from the log file, so there are a relatively small number of documents. A collection is essentially like a table, but since this is #nosql it really isn't a table. It is just a collection of documents. We'll see those next.

> db.raw.find().count()

1029

> db.raw.find()[1029]

> db.raw.find()[1028]

{

"_id" : ObjectId("4b496cddb15cb004a4000404"),

"squidcode" : "TCP_MISS/200",

"source" : "192.168.1.254",

"stamp" : 1263102993.841,

"format" : "-",

"url" : "agmoviecontrol.netflix.com:443",

"method" : "CONNECT",

"size" : 17499

}

The JSON above is the "document." Something you'll notice is that, apart from the _id, there are basically two different data types: strings and numbers. The timestamp is obviously a float, and the size field is numeric as well. That hash-looking thing is an ObjectId, a 12-byte identifier that MongoDB generates and that is supposedly unique.
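The parsing script itself isn't shown, so here is a minimal sketch of how a line in squid's native access.log format could be turned into a document with these fields. The function name is hypothetical, the mapping of "format" to the content-type column is an assumption, and the sample line is illustrative only (its elapsed time and peer address are made up):

```python
def parse_squid_line(line):
    """Turn one line of squid's native access.log format into a dict."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError("expected 10 whitespace-separated fields, got %d" % len(fields))
    return {
        "stamp": float(fields[0]),  # seconds since the epoch, with milliseconds
        "source": fields[2],        # client address
        "squidcode": fields[3],     # squid result code / HTTP status
        "size": int(fields[4]),     # bytes delivered to the client
        "method": fields[5],        # request method
        "url": fields[6],           # requested URL (host:port for CONNECT)
        "format": fields[9],        # assumption: the content-type column
    }


# Illustrative line only: the elapsed time and peer address are made up.
sample = ("1263102993.841  45123 192.168.1.254 TCP_MISS/200 17499 "
          "CONNECT agmoviecontrol.netflix.com:443 - DIRECT/203.0.113.10 -")

doc = parse_squid_line(sample)
print(doc)
```

Converting the timestamp and size up front is what lets the database store them as numbers rather than strings, so range queries on them behave as expected.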

So one of the cool built-in queries returns only the unique values for a given field. This is handled by the distinct method.

So we can see here that there were no HTTP POSTs.

> db.raw.distinct("method")

[ "CONNECT", "GET" ]

And because of my screwed-up NATing I can't tell which of my kids was going to Netflix.

I was pleased to find that you can use regular expressions. The first query tells me there are 3199 documents that have port 443 in them and the 2nd query returns the first document. One of the things I noticed is that retrieving the document based on the "index" is really really slow. But I believe that is because it isn't really an index, but we'll get to them later.
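The queries themselves aren't reproduced here. As a rough Python sketch of the same idea, pymongo accepts a compiled `re` pattern as a query value. The exact pattern is a guess at what "port 443" meant, and `collection` is an assumed pymongo collection object, so the call is commented out:

```python
import re

# Guess at the intent: match URLs that end in port 443.
port_443 = re.compile(r":443$")

# A compiled pattern can be used directly as the value for a field.
query = {"url": port_443}

# Assumption: `collection` is a pymongo collection object.
# for doc in collection.find(query):
#     print(doc["url"])

# The pattern can be sanity-checked locally without a database.
print(bool(port_443.search("agmoviecontrol.netflix.com:443")))
print(bool(port_443.search("www.example.com:80")))
```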

Saturday, January 09, 2010

It looks like the driver for rum has changed slightly between FreeBSD 7.2 and FreeBSD 8.0, because I was not able to use the same command-line syntax as I did previously. Basically the only thing I did differently was the ifconfig wlan create...

I had this card running on an old Dell Optiplex acting as a bridge for my kids' network (and they were watching a lot of streaming media) and I was surprisingly impressed with it. Decent performance.

And while I'm at it, I hadn't seen anyone who actually installed 8.0 on a Lenovo Netbook, but so far so good. I've got X working (I'll blog on that later) and re seems to work well enough. Obviously the Broadcom 4312s aren't going to work, but if you have a USB wifi card or a tether you will be ok.

Next step: see if I can get my Novatel u727 card working. I suspect it should work just fine, because it worked well on OpenBSD, but you never know...

Thursday, January 07, 2010

So I first got turned on to #nosql databases a little over (or under) a year ago with CouchDB, but lately I've been quite enamored with MongoDB.

So forget about deep architectural reasons for using it. Here are some quite practical reasons for when you are not a full-time developer (or database guru) but you find yourself doing development that involves a data store, and the thought of using MySQL (so like 2000s) in your app doesn't appeal:

Abhorrence for schemas, ORMs, and migrations - this is basically the laziness argument. Basically I want/need to store stuff. And the stuff I want to store might change, and I don't want to have to deal with changing the schema (and my app) to adapt to those changes. This is where document-oriented databases like CouchDB and MongoDB rule. If everything is a JSON object, you have a great place to store stuff.

Ease of Installation & Compilation -- yep, CouchDB has been in the latest Ubuntu repos for a while, but I use Lenny/Hardy server side, so forget about it. Dealing with Erlang (and finding all the dependencies to build SpiderMonkey) was a big pain in the ass. Beam, what the hell is beam? Mongo has 32/64 bit Linux binaries that just work, and I briefly managed to get it to compile on FreeBSD 7.2. And unlike some of the others out there it doesn't require a JRE.

Map/Reduce hurts my head - one of the key differentiators between Mongo and CouchDB is ease of use, specifically the simplicity of queries. I'm not an expert yet, but having to create Map/Reduce functions to create views to get at your data was a slippery concept for me.

Tuesday, January 05, 2010

So unfortunately someone who I [used to] follow over on @frednecksec cited an article over on prisonplanet.com, which allowed me to check out the cool sponsors such as the one pictured above, but don't forget Silverlungs.

To each his own; I'm inhaling gaseous gold myself. Much better preparation for the "End Times," the "New World" or whatever the "elites" have in store for us.