Chris Umbel's Blog
http://www.chrisumbel.com
The blog of Chris Umbel, software developer and database administrator.
Fri, 22 Feb 2019 15:12:12 GMT
Adding a Stand-alone Windows Worker to a bosh-managed Concourse Deployment
http://www.chrisumbel.com/article/windows_worker_to_bosh_deployed_concourse
<a href="http://concourse.ci"><img src="http://29dcc57c841c4009fbaa-6fd0170e32fb359031f5f9240015d9c4.r12.cf1.rackcdn.com/concourse.png" align="right" border="0"/></a><strong>THIS ARTICLE IS OBSOLETE. BOSH-MANAGED WINDOWS WORKERS ARE A THING NOW.</strong>
<p>In this article I'll demonstrate the procedure for adding a manually-built
Windows worker to an existing <a href="http://bosh.io/">bosh</a>-managed
<a href="http://concourse.ci/">Concourse</a> deployment. For demonstrative purposes I'll also
use our deployment to run a simple pipeline that builds a .NET console
application on our newly-created worker.</p>
<p>I'll assume that the Concourse deployment is relatively vanilla and the "tsa" job
hasn't substantially changed from the manifest provided by the
<a href="http://concourse.ci/clusters-with-bosh.html">installation instructions</a>.</p>
<p>This article will *not* be covering building Windows bosh stemcells or
deploying Windows workers with bosh. One day when the space matures a
little I may do so. </p>
<p><strong>Homework</strong></p>
<p>I'll assume you can provide the following resources. Everything else will be downloaded,
generated, built or otherwise conjured.</p>
<p>
<ul>
<li>A running Windows server</li>
<li>A bosh-managed Concourse deployment</li>
<li>A workstation with ssh-keygen</li>
<li>The bosh cli targeted at your Concourse's bosh director.</li>
<li>A basic understanding of <a href="http://concourse.ci/architecture.html">Concourse's architecture</a>,
specifically the <a href="http://concourse.ci/architecture.html#architecture-tsa">TSA</a> component.</li>
</ul>
</p>
<p>
<strong>Generating SSH Keys</strong>
</p>
<p>In order for a trust relationship to be established and for a worker to register itself with
the TSA two ssh RSA keypairs are required.</p>
<p>
<ul>
<li><strong>TSA Host Key</strong> - the key identifying the TSA service.</li>
<li><strong>Worker Key</strong> - the key identifying the worker(s). Multiple
workers can share this key, but you can also have many of these keys spread across
many workers.</li>
</ul>
</p>
<p>The TSA Host and Worker keys can respectively be generated with commands similar to the
following:</p>
<pre class="output_blog">
~/ $ ssh-keygen -f tsakey -t rsa -N ''
~/ $ ssh-keygen -f workerkey -t rsa -N ''
</pre>
<p>which would grant us the files:</p>
<pre class="output_blog">
~/ $ ls
tsakey tsakey.pub workerkey workerkey.pub
</pre>
<p>The "tsakey" and "workerkey" files are the encoded private keys for the TSA host
and workers, respectively, with "tsakey.pub" and "workerkey.pub" being their
public keys. </p>
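<p>Since the public and private files are easy to mix up when pasting them into a manifest, here is a minimal Python sketch that checks whether a line is shaped like an OpenSSH public key. The function name is hypothetical, and it only inspects the format (type prefix, base64 body, embedded type name); it does not verify the key cryptographically.</p>

```python
import base64
import struct


def parse_openssh_public_key(line):
    """Return (key_type, comment) for a one-line OpenSSH public key.

    Raises ValueError if the line is not shaped like
    "<type> <base64> [comment]" or if the type name embedded in the
    base64 blob disagrees with the prefix.
    """
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64> [comment]'")
    key_type, blob_b64 = parts[0], parts[1]
    comment = parts[2] if len(parts) == 3 else ""
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except ValueError as exc:
        raise ValueError("key body is not valid base64") from exc
    # The blob starts with a 4-byte big-endian length, then the type name.
    if len(blob) < 4:
        raise ValueError("key body is too short")
    (name_len,) = struct.unpack(">I", blob[:4])
    embedded = blob[4:4 + name_len].decode("ascii", errors="replace")
    if embedded != key_type:
        raise ValueError("type mismatch: %r vs %r" % (key_type, embedded))
    return key_type, comment
```

<p>Feeding it the content of "workerkey.pub" should return the key type and comment, while the first line of a private key file such as "workerkey" is rejected with a ValueError.</p>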
<p><strong>Preparing the TSA</strong></p>
<p>All of the configuration necessary to prepare the TSA to register our Windows worker
involves the keys generated above. </p>
<p>If your Concourse's "tsa" job remains stock it will likely look like this in your
deployment manifest:</p>
<pre class="ruby" name="code">
- name: tsa
  release: concourse
  properties: {}
</pre>
<p>We're going to want to add 3 properties to supplement that default configuration:</p>
<p><ul>
<li><strong>authorized_keys</strong> - an array of public keys belonging to workers that TSA should trust.
Add this property and a string entry containing the content of "workerkey.pub". This
establishes trust from our TSA to workers with the corresponding private key.</li>
<li><strong>host_key</strong> - the private key belonging to our deployment's TSA job. Add this
property with its content copied from the "tsakey" file.</li>
<li><strong>host_public_key</strong> - the public key belonging to our deployment's TSA job.
Add this property with its content copied from the "tsakey.pub" file. Note that we'll be
distributing this key to our workers.</li>
</ul></p>
<p>resulting in a deployment manifest outlined like this:</p>
<pre class="ruby" name="code">
- name: tsa
  release: concourse
  properties:
    authorized_keys:
    - "<content of workerkey.pub>"
    host_key: |
      -----BEGIN RSA PRIVATE KEY-----
      <content of tsakey>
      -----END RSA PRIVATE KEY-----
    host_public_key: "<content of tsakey.pub>"
</pre>
<p>After updating our deployment as such</p>
<pre class="output_blog">
~/ $ bosh deploy
</pre>
<p>our TSA will be updated with the keys necessary to register the Windows worker we'll build.
</p>
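<p>Pasting multi-line keys into YAML by hand is error-prone, so here is a rough Python sketch that renders the "tsa" job entry from the three key strings. The helper names are hypothetical, it uses plain string formatting rather than a YAML library, and it assumes the public keys contain no double quotes or backslashes.</p>

```python
def indent(text, spaces):
    """Indent every line of text by the given number of spaces."""
    pad = " " * spaces
    return "\n".join(pad + line for line in text.strip().splitlines())


def render_tsa_job(worker_pub, tsa_key, tsa_pub):
    """Render the "tsa" job entry as YAML text.

    worker_pub, tsa_key and tsa_pub are the contents of workerkey.pub,
    tsakey and tsakey.pub respectively.
    """
    return "\n".join([
        "- name: tsa",
        "  release: concourse",
        "  properties:",
        "    authorized_keys:",
        '    - "%s"' % worker_pub.strip(),
        "    host_key: |",
        # The private key is a block scalar, so each line is indented.
        indent(tsa_key, 6),
        '    host_public_key: "%s"' % tsa_pub.strip(),
    ])
```

<p>The output is only the job fragment; it still has to be merged into the rest of the deployment manifest before running the deploy.</p>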
<p><strong>Preparing The Windows Worker</strong></p>
<p>Now we turn our attention to our Windows server that we'll be turning into a Concourse worker. </p>
<p>First we'll want to establish a directory to house our binaries for the worker service
and its data, e.g. C:\concourse</p>
<pre class="output_blog">
C:\> mkdir concourse
C:\> cd concourse
C:\concourse>
</pre>
<p>Now download the Windows concourse binary (named something like "concourse_windows_amd64.exe")
from <a href="http://concourse.ci/downloads.html"> the Concourse download page</a> and
place it in our working directory. Also, we'll want to copy the "tsakey.pub" and
"workerkey" files there as well. </p>
<p>The fact that we'll provide our local concourse binary with "tsakey.pub" establishes that
we cryptographically trust the TSA server from our deployment.</p>
<p>We're now ready to start the worker and have it register itself with the TSA.</p>
<pre class="output_blog">
C:\concourse> .\concourse_windows_amd64.exe worker ^
/work-dir .\work /tsa-host <IP of the TSA> ^
/tsa-public-key .\tsakey.pub ^
/tsa-worker-private-key .\workerkey
</pre>
<p>If all goes well we should see output similar to:</p>
<pre class="output_blog">
{"timestamp":"1478361158.394949198","source":"tsa","message":"tsa.connection.forward-worker.register.done","log_level":1
,"data":{"remote":"<IP:SOURCE-PORT of the TSA>","session":"3.1.4","worker-address":"<IP:PORT of this worker>","worker-platform":"windows",
"worker-tags":""}}
</pre>
<p>and the new worker should appear in the list via the Concourse CLI as such:</p>
<pre class="output_blog">
~/ $ fly -t ci workers
name containers platform tags team
2a334e70-c75c 3 linux none none
WORKERSHOSTNAME 0 windows none none
</pre>
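<p>The registration message shown earlier is a single JSON object per line, so spotting it in a script is straightforward. Here is a minimal Python sketch; the function name is hypothetical and the field names are taken only from the sample output above, so other log lines may be shaped differently.</p>

```python
import json


def registered_worker(log_line):
    """Return the "data" dict if the line reports a finished registration.

    Returns None for anything else, including lines that are not JSON.
    """
    try:
        entry = json.loads(log_line)
    except ValueError:
        return None
    if not isinstance(entry, dict):
        return None
    if not str(entry.get("message", "")).endswith("register.done"):
        return None
    return entry.get("data", {})
```

<p>For the sample line, the returned dict carries the "worker-platform" value "windows", which is a quick way to confirm the right machine registered.</p>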
<p><strong>Testing Things Out</strong></p>
<p>Assuming the .NET framework is present on our Worker with the build tools in the path
we could test this out by building this simple .NET Console app project:
<a href="https://github.com/chrisumbel/DatDotNet.git">https://github.com/chrisumbel/DatDotNet.git</a>.</p>
<p>Consider the pipeline:</p>
<pre class="ruby" name="code">
resources:
- name: code
  type: git
  source:
    uri: https://github.com/chrisumbel/DatDotNet.git
    branch: master
jobs:
- name: build
  plan:
  - aggregate:
    - get: code
      trigger: true
  - task: compile
    privileged: true
    file: code/Pipeline/compile.yml
</pre>
<p>with the build task:</p>
<pre class="ruby" name="code">
platform: windows
inputs:
- name: code
run:
  dir: code
  path: msbuild
</pre>
<p>Note that the platform specified in the build task is "windows". That instructs
concourse to place the task on a Windows worker.</p>
<p>If all went well we should see a successful build with output similar to:</p>
<pre class="output_blog">
~/ $ fly -t ci trigger-job -j datdotnet/build --watch
started datdotnet/build #8
using version of resource found in cache
initializing
running msbuild
Microsoft (R) Build Engine version 4.6.1085.0
[Microsoft .NET Framework, version 4.0.30319.42000]
Copyright (C) Microsoft Corporation. All rights reserved.
Building the projects in this solution one at a time. To enable parallel build, please add the "/m" switch.
Build started 11/5/2016 4:04:00 PM.
...
nces, or take a dependency on references with a processor architecture that matches the targeted processor architecture of your project. [C:\concourse\work\containers\00000arl2se\tmp\build\36d0981b\code\DatDotNet\DatDotNet.csproj]
3 Warning(s)
0 Error(s)
Time Elapsed 00:00:00.22
succeeded
</pre>
http://www.chrisumbel.com/article/windows_worker_to_bosh_deployed_concourse
Sat, 05 Nov 2016 12:00:00 GMT
Raspberry Pi Pig Tank
http://www.chrisumbel.com/article/raspberry_pi_pig_tank
<img src="http://29dcc57c841c4009fbaa-6fd0170e32fb359031f5f9240015d9c4.r12.cf1.rackcdn.com/pig_side_thumb.jpg" align="right">This is my first post about the Raspberry Pi Pig-Tank -- a project my son and I have been working on for a little while. The project is still incomplete, but very functional. When it finally feels "1.0" we'll make a YouTube video that details the project further, but in the meantime I thought it'd be potentially useful to others to see our progress thus far.
<p>Eventually I'll also get a post together that outlines measurements, part numbers and has detailed photos, but that's a ways off.</p>
<p><strong>Motivation and Goals</strong></p>
<p>See, I've been fiddling with Raspberry Pi for several years now, but have largely used them simply as small, lower-power computers. Anything more interesting hardware-wise would have me resorting to a microcontroller. I was interested in using a Pi as a control system just for the heck of it, but lacked the inspiration for an interesting project.</p>
<p>After building a simple RC tank kit from Popular Mechanics my son mentioned the idea of mounting a camera on it. The kit itself lacked any real intelligence. It was pretty much just a chassis with treads, motors, motor drivers, IR receiver (this kit is technically not RC because it's not *Radio* controlled), battery box... standard stuff. If we were going to get a camera involved we were going to need to get some proper computing power on there. Considering I have more Raspberry Pi and camera boards than I can shake a stick at it seemed perfect for the project. </p>
<p><img src="http://29dcc57c841c4009fbaa-6fd0170e32fb359031f5f9240015d9c4.r12.cf1.rackcdn.com/pig_tank_before.jpg" align="left"/></p>
<p>As our conversations continued it was clear that we'd have the opportunity to extend the range of the vehicle. A Pi can easily interface with various wireless communication systems: XBee, WiFi, GSM... To keep things simple to start we agreed that WiFi made sense so we could at least control the vehicle around the house. </p>
<p>Also, it seemed reasonable that we should ditch specialized remote controls and go with something web-based. That would allow us to use our laptops, tablets and phones to control the vehicle. Not only are they easy for dealing with the video output of cameras but it also increases the cool factor a bit.</p>
<p>We agreed that our end goal should be to pilot the vehicle 0.6 miles from our house to the local grocery store and back just as an arbitrary measure of awesomeness (we'd obviously have to go beyond simple WiFi to get it done). If anything I pushed for that goal just to get it in his brain that remote control can actually be *very* remote. Maybe it would give him some additional appreciation for the wonderful work that's been done on massively-frickin-ridiculously-remote-controlled vehicles (MFRRCVs) like the great work NASA has done on the Mars rovers. If we can take this thing from tens of feet to hundreds to miles the possibilities are endless!</p>
<p><strong>Materials</strong></p>
<p>So I set about getting materials together to build the bloody thing. </p>
<p>The plan was simple for the structure and drive. Use the chassis, treads and motors from the kit for the structure and drive system. From there holes could be drilled and components fastened.</p>
<p><img src="http://29dcc57c841c4009fbaa-6fd0170e32fb359031f5f9240015d9c4.r12.cf1.rackcdn.com/pig_tank_box.jpg" align="middle"/></p>
<p>The control system will be a Raspberry Pi with a USB WiFi dongle. Due to favorable power consumption characteristics a Model A was chosen (the Model A runs sub-200 mA idle while the B runs > 450 mA). </p>
<p>We went with these 7.4V (both 1000 mAh and 2200 mAh versions fit in my case) LiPo batteries I had lying around. Although the Pi and motors were all fine with 5V these batteries would give us some extra voltage for other additional systems in the future (maybe an amplifier for some crazy sound or something). The drawback of that extra 2.4V was that I had to employ a voltage regulator and waste some space to heatsink it, but that didn't seem unreasonable.</p>
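<p>To put rough numbers on the regulator trade-off, here is a back-of-the-envelope Python sketch. It assumes a linear regulator (which burns the dropped voltage as heat) and counts only the Pi at the 200 mA idle figure mentioned above; motors, WiFi, camera and LEDs are ignored, so real runtimes will be shorter.</p>

```python
def regulator_heat_watts(v_in, v_out, load_amps):
    """Heat dissipated by a linear regulator: dropped voltage times load current."""
    return (v_in - v_out) * load_amps


def runtime_hours(capacity_mah, load_ma):
    """Idealized runtime: battery capacity divided by a constant load."""
    return capacity_mah / load_ma


# 7.4 V battery regulated down to 5 V, Pi alone drawing 200 mA (assumed).
heat = regulator_heat_watts(7.4, 5.0, 0.2)   # roughly 0.48 W of heat
small = runtime_hours(1000, 200)             # 5 hours, idealized
large = runtime_hours(2200, 200)             # 11 hours, idealized
```

<p>About half a watt at idle is manageable with a small heatsink, but the heat scales linearly with current, so it grows quickly once the motors are in play.</p>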
<p><img src="http://29dcc57c841c4009fbaa-6fd0170e32fb359031f5f9240015d9c4.r12.cf1.rackcdn.com/pig_battery.jpg" align="middle"/></p>
<p>The standard Raspberry Pi camera module seemed to be small and light enough to fit the bill, plus I had a few stashed away. There are also relatively easy-to-use programs (raspistill, raspivid) that provided a great starting point.</p>
<p>We also eventually decided to add some red LED eyes and use a 10mm white, ultra-bright LED as a headlight so I grabbed some from the parts bin.</p>
<p>Now to drive DC motors in both directions something like an H-bridge would be required. Sure, I could have monkeyed around with some MOSFETs to get one in place, but I decided to go with a <a href="http://www.pololu.com/product/713">little driver board from Pololu</a> instead (uses a Toshiba TB6612FNG dual motor driver). I've used different driver boards of theirs in 3D printing with good results so I figured it would likely save me some headaches. </p>
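<p>For the curious, the direction logic of a dual H-bridge driver like this boils down to two input pins per motor. Here is a small Python sketch of that mapping for tank-style steering. The pin labels follow the TB6612FNG's AIN/BIN naming as I understand the datasheet; which level pattern means "forward" depends on how the motors are wired, and the command names are made up for illustration. The PWM and standby pins are left out.</p>

```python
# Input levels per drive state for one channel: (IN1, IN2).
CHANNEL_STATES = {
    "forward": (1, 0),
    "reverse": (0, 1),
    "brake": (1, 1),   # both high: short brake
    "coast": (0, 0),   # both low: outputs off
}

# Tank-style steering: each command picks a state for (left, right) tracks.
COMMANDS = {
    "forward": ("forward", "forward"),
    "back": ("reverse", "reverse"),
    "left": ("reverse", "forward"),    # spin in place to the left
    "right": ("forward", "reverse"),   # spin in place to the right
    "stop": ("brake", "brake"),
}


def pin_levels(command):
    """Return the logic level for each direction pin for a drive command."""
    left, right = COMMANDS[command]
    ain1, ain2 = CHANNEL_STATES[left]
    bin1, bin2 = CHANNEL_STATES[right]
    return {"AIN1": ain1, "AIN2": ain2, "BIN1": bin1, "BIN2": bin2}
```

<p>Keeping the mapping as plain data means it can be checked on a laptop before any GPIO pin is involved.</p>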
<p>Since there was no way all of our madness was going to fit into the body from the tank kit it was clear that we were going to have to mount a larger enclosure on the kit's chassis. We selected an appropriately-sized project box from Radio Shack to serve this need. Sure, it would end up looking like a rolling box, but who cares. </p>
<p><strong>Physical Construction</strong></p>
<p>The first step was to get the electronics together to prove that all of the ideas would work. After breadboarding everything out I put a board together from prototyping PCB with 8-pin female headers for the motor driver; 4-pin headers for ground, 7.4V and 5V power rails and the regulator with its associated capacitors/rigamarole. </p>
<p><img src="http://29dcc57c841c4009fbaa-6fd0170e32fb359031f5f9240015d9c4.r12.cf1.rackcdn.com/pig_driver.jpg" align="middle"/></p>
<p>To get the basic structure in place we started with the chassis from the tank kit and sawed off the connector pieces that the body attached to. From there we drilled some holes through both the base of the project box and the chassis for mounting hardware and motor wires. Machine screws fastened the bottom of the box to the chassis and long M2.5x20mm-ish screws were inserted to ultimately mount the Pi and the custom board mentioned above. </p>
<p><img src="http://29dcc57c841c4009fbaa-6fd0170e32fb359031f5f9240015d9c4.r12.cf1.rackcdn.com/pig_bottom.jpg" align="middle"/></p>
<p>We mounted the Pi and the custom board on the skinny screws using nylon spacers to keep everything apart. Everything was (and still is) wired up using jumper wires. 5V goes to the Pi's 5V in, the Pi's GPIO pins go to the motor driver's inputs and pretty much everything works as expected.</p>
<p><img src="http://29dcc57c841c4009fbaa-6fd0170e32fb359031f5f9240015d9c4.r12.cf1.rackcdn.com/pig_pi.jpg" align="middle"/></p>
<p>After toggling GPIO pins from test code to move the motors it was clear that we were on the right track. But, you know, the boy wanted more. He wanted LED eyes. Feature-creep happens at home too, I guess. It seemed that if we were going to go as far as to add ornamental LEDs a headlamp would be a useful addition. Since the Pi's GPIO pins don't source enough to get the 10mm ultra-bright LED to burn your retinas I put a board together with PNP transistors to trip it.</p>
<p>With the connection of the camera and mounting of the wheels/tracks we were in good shape. Everything worked with some test code so it was time to turn my attention to putting proper control software together.</p>
<p><strong>Software</strong></p>
<p>Now we came to the point where I'm actually somewhat professionally qualified (I'm a software developer, I don't hack apart toys by trade). It was time to develop the control software. </p>
<p>Since we were sticking with a vanilla Raspbian install on the Pi our options were open. It seemed the path of least resistance was to hack in Python and use the "RPi.GPIO" package to control the peripherals. As I stated initially we wanted the system to be web-based. Our needs were simple and small so I chose CherryPy for the job.</p>
<p><img src="http://29dcc57c841c4009fbaa-6fd0170e32fb359031f5f9240015d9c4.r12.cf1.rackcdn.com/pig_software.jpg" align="middle"/></p>
<p>To start we kept the video simple. Rather than shoot and stream video we just used raspistill to shoot stills a few times per second. Our web interface would then periodically refresh to get a current view of the situation. If this proves cumbersome we may instead choose to stream video.</p>
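<p>As a rough illustration of the stills approach, here is a Python sketch that builds a raspistill command line and works out how often the page should refresh. This is not the tank's actual code: the function names, image size, delay and output path are all assumptions, and only the -n, -t, -w, -h and -o options of raspistill are used.</p>

```python
def still_command(output_path, width=640, height=480, delay_ms=500):
    """Build the argument list for capturing one still with raspistill.

    -n disables the preview window, -t is the delay before capture in
    milliseconds, -w/-h set the image size and -o names the output file.
    """
    return [
        "raspistill", "-n",
        "-t", str(delay_ms),
        "-w", str(width),
        "-h", str(height),
        "-o", output_path,
    ]


def refresh_interval_ms(frames_per_second):
    """How often the web page should reload the image, in milliseconds."""
    return int(round(1000.0 / frames_per_second))
```

<p>The list from still_command can be handed to subprocess.call on the Pi, and the interval can drive a timer in the page that reloads the image.</p>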
<p>In time I'll get the software more organized and get it up on Github, but I'm just not there yet.</p>
<p><strong>The Result</strong></p>
<p>Well, here it is in its current form doing my bidding.</p>
<p><iframe width="560" height="315" src="//www.youtube.com/embed/DTtN8WPUoYw" frameborder="0" allowfullscreen></iframe></p>
http://www.chrisumbel.com/article/raspberry_pi_pig_tank
Fri, 07 Mar 2014 00:00:00 GMT
SELinux on Amazon's AWS Linux AMI
http://www.chrisumbel.com/article/selinux_amazon_aws_ec2_ami_linux
<a href="http://selinuxproject.org/page/Main_Page"><img src="http://29dcc57c841c4009fbaa-6fd0170e32fb359031f5f9240015d9c4.r12.cf1.rackcdn.com/selinux-penguin.jpg" align="right" border="0"/></a>One interesting omission from Amazon's Linux AMI is <a href="http://selinuxproject.org/page/Main_Page">SELinux</a>, and I recently had occasion to install it on a few EC2 instances. The process of installing and enabling SELinux in this environment is actually quite straightforward, although it can require digging through quite a bit of incorrect and obsolete documentation.
<p>The instructions below are what worked for me using the <a href="https://aws.amazon.com/amazon-linux-ami/2012.09-release-notes">2012.09 release of the AMI</a>. 2012.09 ships with kernel 3.2.30-49.59.amzn1.x86_64, but these instructions will upgrade it.</p>
<p>The first step is to install the following packages which include SELinux and some accompanying tools.</p>
<pre class="output_blog">[root@EC2]# yum install libselinux libselinux-utils selinux-policy-minimum selinux-policy-mls selinux-policy-targeted policycoreutils </pre>
<p>Now we have to tell the kernel to enable SELinux on boot. Append the following to the kernel line in your /etc/grub.conf for your current kernel. Note that if you want to boot into permissive mode replace <em>enforcing=1</em> with <em>enforcing=0</em>.</p>
<pre class="output_blog">selinux=1 security=selinux enforcing=1</pre>
<p>In my case the resulting /etc/grub.conf looked like:</p>
<pre class="output_blog"># created by imagebuilder
default=0
timeout=1
hiddenmenu
title Amazon Linux 2012.09 (3.2.30-49.59.amzn1.x86_64)
root (hd0)
kernel /boot/vmlinuz-3.2.30-49.59.amzn1.x86_64 root=LABEL=/ console=hvc0 selinux=1 security=selinux enforcing=1
initrd /boot/initramfs-3.2.30-49.59.amzn1.x86_64.img</pre>
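<p>If you have more than a couple of instances to touch, the edit is easy to script. Here is a minimal Python sketch that appends the parameters to every kernel line in the text of a grub.conf, skipping any that are already present. The function name is hypothetical, and it works on a string rather than the file itself, so reading, backing up and writing /etc/grub.conf is left to you.</p>

```python
SELINUX_ARGS = ["selinux=1", "security=selinux", "enforcing=1"]


def add_kernel_args(grub_conf_text, args=SELINUX_ARGS):
    """Append missing kernel parameters to each "kernel" line.

    Lines that already carry a parameter are not given a duplicate,
    so running the function twice changes nothing the second time.
    """
    out = []
    for line in grub_conf_text.splitlines():
        words = line.split()
        if words and words[0] == "kernel":
            missing = [arg for arg in args if arg not in words]
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        out.append(line)
    return "\n".join(out) + "\n"
```

<p>Because the function is idempotent it is safe to run again after a later kernel update adds a new entry.</p>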
<p>Now install a new kernel and build a new RAM disk. Don't worry, the options you added above will propagate to the new kernel.</p>
<pre class="output_blog">[root@EC2]# yum -y update</pre>
<p>Schedule a relabel of the root filesystem for the next boot.</p>
<pre class="output_blog">[root@EC2]# touch /.autorelabel</pre>
<p>Now examine /etc/selinux/config and ensure the enforcement level and policy you desire are enabled. In my case I stuck with the default fully enforced targeted policy.</p>
<pre class="output_blog"># This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted</pre>
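<p>When checking several instances it can be handy to read these two settings programmatically. Here is a small Python sketch that parses the text of /etc/selinux/config into a dictionary; the function name is hypothetical, and it simply assumes the KEY=value format shown above with "#" comments.</p>

```python
def read_selinux_config(text):
    """Parse KEY=value lines, ignoring blank lines and "#" comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings
```

<p>For the file above this yields "enforcing" for SELINUX and "targeted" for SELINUXTYPE.</p>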
<p>Now reboot the instance</p>
<pre class="output_blog">[root@EC2]# reboot</pre>
<p>Because the root file-system was set to be relabeled rebooting will take a few minutes longer than usual.</p>
<p>Once the instance comes back up log in and verify your work. If everything went as planned the <em>getenforce</em> command will generate the following (for full enforcement).</p>
<pre class="output_blog">[root@EC2]# getenforce
Enforcing</pre>
<p>And you're done! SELinux is installed and operating on your instance.</p>
http://www.chrisumbel.com/article/selinux_amazon_aws_ec2_ami_linux
Fri, 14 Dec 2012 00:00:00 GMT
MySQL Replication with Minimal Downtime Using R1Soft Hot Copy for Linux
http://www.chrisumbel.com/article/mysql_replication_no_downtime_r1soft_hot_copy_linux
<a href="http://www.mysql.com"><img src="http://29dcc57c841c4009fbaa-6fd0170e32fb359031f5f9240015d9c4.r12.cf1.rackcdn.com/mysql.jpg" align="left" border="0"/></a>At the office a while back I was experimenting with techniques to initialize <a href="http://www.mysql.com">MySQL</a> replication for both InnoDB and MyISAM tables without significant downtime. The MySQL systems in question didn't use LVM, and locking all tables while performing a backup to ship to the slave simply takes far too long. The method that I ultimately ended up adding to my toolbelt was an adaptation of the process outlined in <a href="http://badan.wordpress.com/2011/05/12/how-to-setup-mysql-replication-with-virtually-no-downtime-without-locking-tables-without-lvm/">this article by Badan Sergiu</a> which uses a tool from <a href="http://www.idera.com/">Idera</a> called <a href="http://r1soft.idera.com/tools/linux-hot-copy/">R1Soft Hot Copy for Linux</a>.
<p><strong>R1Soft Hot Copy</strong></p>
<p><a href="http://www.idera.com"><img src="http://29dcc57c841c4009fbaa-6fd0170e32fb359031f5f9240015d9c4.r12.cf1.rackcdn.com/idera.jpeg" align="right" border="0"/></a>What differentiates this process from a more standard approach is the employment of R1Soft Hot Copy. R1Soft Hot Copy is a tool that facilitates the creation of a snapshot of a block device. When changes to the original device occur only the differences are placed in the snapshot in a Copy-on-Write fashion (similar to VSS in Microsoft Windows). This allows an administrator to create a functional, mountable backup of an entire device almost instantly with very little effort.</p>
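<p>To make the Copy-on-Write idea concrete, here is a toy Python model. It is purely illustrative and says nothing about how Hot Copy is actually implemented: the "device" is just a list of blocks, and the class name is made up.</p>

```python
class CowSnapshot:
    """Toy copy-on-write snapshot over a list of blocks."""

    def __init__(self, device):
        self.device = device   # live blocks, still being written
        self.saved = {}        # index -> block content at snapshot time

    def write(self, index, data):
        # Preserve the original block only the first time it is overwritten.
        if index not in self.saved:
            self.saved[index] = self.device[index]
        self.device[index] = data

    def read_snapshot(self, index):
        # Unchanged blocks are read straight from the live device.
        return self.saved.get(index, self.device[index])
```

<p>Creating the snapshot copies nothing up front, which is why it is nearly instant; the cost is paid one block at a time as writes arrive.</p>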
<p><strong>Motivation and Caveats</strong></p>
<p>I'm posting these instructions because I'd like some feedback not only on my adaptation, but also on the initial method. Feel free to use any of this information, but please be careful. It worked for me, but I'm not qualified to write authoritative tutorials on the subject.</p>
<p><strong>Prerequisites and Requirements</strong></p>
<p>I'm going to make the assumption that the reader knows how to set up MySQL replication using the methods outlined in the <a href="http://dev.mysql.com/doc/refman/5.5/en/replication.html">official documentation</a> and that they've read <a href="http://badan.wordpress.com/2011/05/12/how-to-setup-mysql-replication-with-virtually-no-downtime-without-locking-tables-without-lvm">the source article</a> mentioned above.</p>
<p>Also keep in mind that R1Soft Hot Copy is a Linux utility making this article not directly applicable to other operating systems.</p>
<p><strong>Methods</strong></p>
<p>A central theme in Badan Sergiu's article was to avoid locking tables (or using LVM). The cost of not locking tables was a restart of the MySQL service itself on the master, meaning that even read queries could not be processed momentarily. My idea was to instead flush and lock tables in the standard fashion while creating the Hot Copy mount. That should allow read queries to still be processed and connection attempts to succeed. Writes will be temporarily blocked, but only briefly, and clients should have an error-free, albeit slower, experience.</p>
<p><strong>Step 1: Install R1Soft Hot Copy</strong></p>
<p>Use the <a href="http://r1soft.idera.com/tools/linux-hot-copy/">instructions on Idera's website</a> to install Hot Copy and then run</p>
<pre class="output_blog">
# hcp-setup --get-module
</pre>
<p>on the master.</p>
<p><strong>Step 2: Configure master</strong></p>
<p>Enable binary logging on the master server and configure a server id in my.cnf.</p>
<pre class="output_blog">
log-bin=mysql-bin
server-id=1
</pre>
<p>On the master create a user specifically to be used for replication.</p>
<pre class="sql" name="code">
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'SLAVE_IP_OR_HOSTNAME' IDENTIFIED BY 'slavepass';
</pre>
<p><strong>Step 3: Create/mount a snapshot</strong></p>
<p>Ensure MySQL has flushed all data to disk and then lock tables so no writes can occur.</p>
<pre class="sql" name="code">
mysql> FLUSH TABLES WITH READ LOCK;
</pre>
<p>Obtain log coordinates. Record the values of the File and Position fields.</p>
<pre class="sql" name="code">
mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000002 | 1234 | | |
+------------------+----------+--------------+------------------+
</pre>
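<p>If you'd rather not copy the coordinates by hand, here is a minimal Python sketch that pulls the File and Position values out of the table text printed by the mysql client. The function name is hypothetical, and it assumes the bordered table layout shown above with a single data row.</p>

```python
def parse_master_status(table_text):
    """Return (file, position) from the text of a SHOW MASTER STATUS table."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in table_text.splitlines()
        if line.strip().startswith("|")
    ]
    if len(rows) < 2:
        raise ValueError("expected a header row and a data row")
    # Pair each header with the value in the same column of the data row.
    record = dict(zip(rows[0], rows[1]))
    return record["File"], int(record["Position"])
```

<p>The two returned values are exactly what the slave configuration in Step 7 needs.</p>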
<p>Create and mount the snapshot on the master. Because all tables are locked the coordinates obtained above will be consistent with the data in the snapshot.</p>
<pre class="output_blog">
# hcp -o /dev/sda2
</pre>
<p>... where /dev/sda2 is the device containing the filesystem which houses the MySQL databases to be replicated. Watch the output for the resulting mount point. This process should take mere seconds.</p>
<p>Release locks on the tables. This will return operation on the master to normal.</p>
<pre class="sql" name="code">
mysql> UNLOCK TABLES;
</pre>
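<p>The ordering in this step matters: lock, record coordinates, snapshot, unlock. Here is a Python sketch of that sequence which also makes sure the lock is released if the snapshot fails. Both run_sql and create_snapshot are hypothetical callables you would supply yourself; run_sql must reuse one MySQL connection for all three statements, because the read lock belongs to the session that took it.</p>

```python
def snapshot_under_lock(run_sql, create_snapshot):
    """Lock tables, record binlog coordinates, snapshot, and always unlock.

    run_sql(statement) executes SQL on a single, persistent connection.
    create_snapshot() creates the snapshot and returns its mount point.
    """
    run_sql("FLUSH TABLES WITH READ LOCK")
    try:
        coordinates = run_sql("SHOW MASTER STATUS")
        mount_point = create_snapshot()
    finally:
        # Release the lock even if the snapshot fails, so writes resume.
        run_sql("UNLOCK TABLES")
    return coordinates, mount_point
```

<p>Wrapping the snapshot in try/finally keeps a failed hcp run from leaving the master read-only.</p>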
<p><strong>Step 4: Shut down the slave's mysqld and copy the data</strong></p>
<p>Run these commands on the slave:</p>
<pre class="output_blog">
# /etc/init.d/mysql stop
# rm -rf /var/lib/mysql
# rsync -avz root@MASTER_IP_OR_HOST:/var/hotcopy/sda2_hcp1/lib/mysql /var/lib/
</pre>
<p>... where /var/lib/mysql is an example path to MySQL's data.</p>
<p><strong>Step 5: Unmount the snapshot on the master</strong></p>
<pre class="output_blog">
# hcp -r /dev/hcp1
</pre>
<p><strong>Step 6: Configure the slave's identity and start MySQL</strong></p>
<p>Edit /etc/mysql/my.cnf on the slave and set a server id.</p>
<pre class="output_blog">
[mysqld]
server-id=2
</pre>
<pre class="output_blog">
# /etc/init.d/mysql start
</pre>
<p><strong>Step 7: Configure and start slave</strong></p>
<p>Now it's time to point the slave at the master and start replication. The MASTER_LOG_FILE and MASTER_LOG_POS should be set to the File and Position fields recorded in Step 3.</p>
<pre class="sql" name="code">
mysql> CHANGE MASTER TO
-> MASTER_HOST='MASTER_IP_OR_HOST',
-> MASTER_USER='repl',
-> MASTER_PASSWORD='slavepass',
-> MASTER_LOG_FILE='mysql-bin.000002',
-> MASTER_LOG_POS=1234;
mysql> START SLAVE;
</pre>
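<p>When setting up more than one slave, generating the statement from the recorded coordinates avoids transcription mistakes. Here is a small Python sketch; the function name is hypothetical, and the quoting is a simple backslash escape suitable for pasting into the mysql client, not a substitute for parameterized queries.</p>

```python
def change_master_sql(host, user, password, log_file, log_pos):
    """Build a CHANGE MASTER TO statement from the recorded coordinates."""

    def quote(value):
        # Escape backslashes and single quotes, then wrap in single quotes.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return (
        "CHANGE MASTER TO "
        "MASTER_HOST=%s, MASTER_USER=%s, MASTER_PASSWORD=%s, "
        "MASTER_LOG_FILE=%s, MASTER_LOG_POS=%d;"
        % (quote(host), quote(user), quote(password), quote(log_file), int(log_pos))
    )
```

<p>The log position is forced through int() so a stray string cannot end up in the statement unquoted.</p>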
<p><strong>Conclusion</strong></p>
<p>At this point replication should be running and the only major service interruption was that writes were blocked for a short period on the master.</p>
<p>There's nothing fundamentally different in the finished product between replication setup in this fashion and a more typical dump-and-copy process. That means monitoring and maintenance should be quite standard.</p>
<p><strong>Thanks</strong></p>
<p>Also, thanks Badan Sergiu for posting the original article. It helped me immensely.</p>
http://www.chrisumbel.com/article/mysql_replication_no_downtime_r1soft_hot_copy_linux
Sun, 18 Nov 2012 00:00:00 GMT
Nubimus - A Programmatic Dialogue
http://www.chrisumbel.com/article/nubimus
<strong>Nubimus123:</strong> Hey, hey!
<img src="http://c243025.r25.cf1.rackcdn.com/programetes.jpg" border="0" align="right"/>
<p><strong>Pr0gram4tez:</strong> Ah, dear Nubimus. What inspires this chat window so close to happy hour on a Friday? Surely not a bug wreaking havoc on your users or offensive performance bottlenecks?</p>
<p><strong>Nubimus123:</strong> No, Programetes. Quite the opposite, in fact. I was interrupting your work to invite you for a beer to celebrate my team's recent success. Surely you will join me.</p>
<p><strong>Pr0gram4tez:</strong> Perhaps, my friend. First tell me about your success. I wish to learn from your work while my head is still clear. My week was full of disappointment and your youthful wisdom and inspiration may be necessary for solutions in the week that follows.</p>
<p><strong>Nubimus123:</strong> Gladly, Programetes. The victory was one of cryptography. Our client's data contains highly sensitive personal information and credit card numbers. They sought us out to secure their data that it never be consumed by either hacker or fool.</p>
<p><strong>Pr0gram4tez:</strong> By Schneier! The protection of data of that nature is indeed important and critical to the order of the state.</p>
<p><strong>Nubimus123:</strong> And that is why they rightly entrusted us with securing it.</p>
<p><strong>Pr0gram4tez:</strong> Tell me, fellow developer, how did you achieve the security necessary to meet the requirements of your customer? Did you labor diligently designing an algorithm channelling all of your inventiveness? Did you employ all of your training and education designing a computation that was quick to perform while at the same time married to the highest standard of protection? Did you seek the cutting edge of mathematics and piety? My excitement implores an explanation!</p>
<p><strong>Nubimus123:</strong> LOL!!!!!111 I must confess to you, who have always been honest with me. It all boils down to how we got the contract initially. Our bid was low because we allowed the gods to do the work for us. We used the crypto API in the framework crafted by the Olympians themselves.</p>
<p><strong>Pr0gram4tez:</strong> How wonderful! It is said that mortals achieve their highest greatness when letting the gods handle the lofty computation so that they may focus their efforts on terrestrial business logic.</p>
<p><strong>Nubimus123:</strong> Exactly.</p>
<p><strong>Pr0gram4tez:</strong> Tell me, friend Nubimus, what class of divinely-crafted algorithms did you choose? Perhaps a symmetric-key cipher with Hermetic dispatch? Of either the stream or block subtype? Maybe even an asymmetric-key cipher with Alice and Bob unaware of each other's cryptographic secrets?</p>
<p><strong>Nubimus123:</strong> SMH... I believe you've failed to learn the very lesson you described mere sentences ago, Programetes. Such details are of the divine and, while important, could only be made poorly by simple, mortal software developers. Though I believe the default algorithm for the framework was something called Athena’s Encryption Standard or AES. I'm led to believe it's the finest cryptographic work produced by god or man. I know not of style or tactics.</p>
<p><strong>Pr0gram4tez:</strong> Interesting. The framework is so robust that the implementation required virtually no attention. You've not studied the algorithm itself?</p>
<p><strong>Nubimus123:</strong> Like I said. That's a matter for the gods.</p>
<p><strong>Pr0gram4tez:</strong> We can agree that the code behind the cryptography is hallowed, but I'm not aware of methods that tailor the implementation to your business case as a matter of course in any framework. There must be much to the implementation, surely. Can you not harm your client if your implementation is incorrect? Could not the gods themselves attempt to injure your business interests?</p>
<p><strong>Nubimus123:</strong> Assuredly not, Programetes. The framework's code is not only holy but was developed in pairs and certified by a consortium of deities eliminating the possibility of any one Olympian's mischief or mistake.</p>
<p><strong>Pr0gram4tez:</strong> The first question remains unanswered, methinks. It is not possible for you to hinder the service with even the most well-intentioned practical application?</p>
<p><strong>Nubimus123:</strong> Perhaps a layman. We are a professional team. We may not be gods, but I'm confident we consumed the framework as intended. Besides, the gods will assist us. We’ve prayed to them in social media. We’ve sacrificed to them with licensing fees.</p>
<p><strong>Pr0gram4tez:</strong> The admission of the potential fallibility of an implementation using the framework troubles me, even if by a neophyte. I can accept the quality of the framework as it stands but you've only offered the credentials of your team as proof the implementation is correct. You say you're professionals, but admit that you accepted the algorithm as a default without significant technical consideration. Did you investigate other algorithms offered by the framework? They’re equally divine and perhaps better suited to your toil.</p>
<p><strong>Nubimus123:</strong> No, old friend. We have to deliver! Besides, the customer will be satisfied with reasonable security. What they're really paying for is our work-flow and interface. There's no way parameters to a cryptographic problem will significantly impact the effective security of our application in a negative way within the realm of practicality.</p>
<p><strong>Pr0gram4tez:</strong> Whether implemented by either god or man are the cryptographic parameters not identically effective or ineffective? Speed and stability may be the domain of the gods, but would a small key size make you equally vulnerable regardless of the algorithm's origin? Can you not impart your human fallibility on these holy tools?</p>
<p><strong>Nubimus123:</strong> I must admit, Programetes, I'm unaware of the practical impact of key size in our implementation, but it's certainly adjustable. It's a problem easily solved with configuration. It's still overwhelmingly in the hands of the gods who sold us the framework.</p>
<p><strong>Pr0gram4tez:</strong> Very well, Nubimus. But keep in mind even with adequately-sized keys there are more cryptographic parameters such as the mode of operation. Even if adjustable at deploy-time are you aware of them now? Can you guarantee the data provided by your client will be ciphered optimally upon delivery so you don't have to trouble them for migrations later?<p>
<p><strong>Nubimus123:</strong> Another admission, oh thorough Programetes, follows. I have no answer as I've not studied modes of operation, but can note it for consideration with the team next week. Again, likely a matter of configuration. You've proven the need for an audit so our configuration is correct upon delivery, but nothing has been stated yet indicating code needs to be changed or that our choice was flawed.<p>
<p><strong>Pr0gram4tez:</strong> Please understand my goal is not to convince you to change your code, dear Nubimus! I'm simply trying to understand how productive the framework your team uses is. Perhaps I've turned up some points that require attention as is common for a grey-beard like myself. I do have some questions that are somewhat higher-level if you have time before we leave.<p>
<p><strong>Nubimus123:</strong> My thirst grows, and my brain becomes weary on this Friday afternoon, but in exchange for the service you've provided I will certainly entertain them.<p>
<p><strong>Pr0gram4tez:</strong> My gratitude is immense, Nubimus. Well then, the keys... Does the framework you employ provide a secure key storage facility?<p>
<p><strong>Nubimus123:</strong> It does indeed, but we don't use it. In an effort to centralize our data we store the keys directly in the application's relational store.<p>
<p><strong>Pr0gram4tez:</strong> I'm puzzled, Nubimus. You store the ciphertext in the relational store, correct?<p>
<p><strong>Nubimus123:</strong> That is true, Programetes.<p>
<p><strong>Pr0gram4tez:</strong> And, as you've stated, the keys are in the relational store. All of them?<p>
<p><strong>Nubimus123:</strong> Quite true.<p>
<p><strong>Pr0gram4tez:</strong> So in the event of a compromise, beg the gods not, an adversary, perhaps an agent of Syracuse, would have all the tools necessary to recover the plaintext and use it against you or your clients or Athens itself?<p>
<p><strong>Nubimus123:</strong> Well, I suppose so, Programetes. That seems unlikely, though. The attacker would have to compromise multiple, higher-level layers to achieve access.<p>
<p><strong>Pr0gram4tez:</strong> What evidence do you have that any other layer of your application isn't equally vulnerable? Have you not used the same strategy of trusting the framework above the judgement of mortal programmers? Is not the desired result of cryptography to protect your clients' data in the event the higher levels of the application are compromised?<p>
<p><strong>Nubimus123:</strong> Well, I fear that may be true, rare friend.<p>
<p><strong>Pr0gram4tez:</strong> Tell me, then. Is the cryptography not potentially rendered irrelevant by your key storage practices? Is the net result that there is little security benefit but there is complexity incurred in the application?<p>
<p><strong>Nubimus123:</strong> I admit that my understanding, and perhaps that of the whole team, of how to implement a cryptographic system was inadequate. We became emboldened by the code crafted by the gods and it made us feel invincible. I now know that a tool of the gods in the hands of man doesn't make that same man godlike.<p>
<p><strong>Pr0gram4tez:</strong> I regret that our conversation may leave you with your mood diminished, Nubimus. I am sorry for that. It was not my intention to play your adversary, but I am thankful that you and your clients may benefit.<p>
<p><strong>Nubimus123:</strong> And I thank you for it, teacher Programetes. Regardless, my eyes can focus on pixels no longer. It's time to close this chat window.<p>
<p><strong>Pr0gram4tez:</strong> Beer, then?<p>
<p><strong>Nubimus123:</strong> I've changed my mind, Programetes. I feel I must stay sharp for a busy weekend of study.<p>
http://www.chrisumbel.com/article/nubimusSat, 10 Mar 2012 11:50:32 GMTThe Sylvester Matrix Library Comes to Node.js http://www.chrisumbel.com/article/sylvester_node_js_matrix_vector_math
<a href="http://nodejs.org/"><img src="http://c243025.r25.cf1.rackcdn.com/new_node_logo.jpg" border="0" align="right"/></a>Before some recent machine learning implementations in node I set out to find
a reasonable matrix math/linear algebra library for node.js. The pickings were
slim but I managed to dig up a general JavaScript matrix math library written
by James Coglan called <a href="http://sylvester.jcoglan.com/">sylvester</a>. Clearly, sylvester had to be node-ified.
<p>With the help of Rob Ellis (a collaborator on <a href="https://github.com/NaturalNode/natural">natural</a>) it's been wrapped up into
a <a href="https://github.com/NaturalNode/node-sylvester">node project titled node-sylvester</a>, published to <a href="http://search.npmjs.org/#/sylvester">NPM</a>, and has had some features added such as element-wise multiplication,
QR, LU, SVD decompositions and basic solving of systems of linear equations. </p>
<p>In this post I'll cover some of the basic structures and operations supported by
sylvester, but it will by no means be complete. I'll focus solely on the <em>Matrix</em>
and <em>Vector</em> prototypes, but sylvester also supports <em>Line</em>, <em>Plane</em>,
and <em>Polygon</em>. Also within the covered prototypes I'll only
demonstrate a small, but useful subset of their functionality.</p>
<p>Currently, the only reasonable documentation for functionality
that exists only in the node port is the <a href="https://github.com/NaturalNode/node-sylvester/blob/master/README.rdoc">README</a>, while general sylvester
functionality is covered in its <a href="http://sylvester.jcoglan.com/">API docs</a>. In time I will (hopefully with the
help of the community) provide some more complete documentation.</p>
<p><strong>Installation</strong></p>
<p>Getting ready to use the node port of sylvester is what you'd expect, a standard
NPM install.</p>
<pre class="output_blog">npm install sylvester</pre>
<p>You can then require-up sylvester in your node code and use the prototypes
within.</p>
<pre class="JavaScript" name="code">
var sylvester = require('sylvester'),
Matrix = sylvester.Matrix,
Vector = sylvester.Vector;
</pre>
<p><strong><em>Vector</em> and <em>Matrix</em> Prototypes</strong></p>
<p> Matrices and vectors are abstracted by the <em>Matrix</em> and <em>Vector</em> prototypes.</p>
<p>Instances can be created using their <em>create</em> functions and passing in an array of values. <em>Vector.create</em> accepts a one-dimensional array of numbers and <em>Matrix.create</em> accepts a two-dimensional array (an array of rows).</p>
<pre class="JavaScript" name="code">
var x = Vector.create([1, 2, 3]);
var a = Matrix.create([[1, 2], [3, 4]]);
</pre>
<p>representing:</p>
<p>
<img src="http://c271180.r80.cf1.rackcdn.com/x.png" align="middle"/>
<img src="http://c271180.r80.cf1.rackcdn.com/A.png" align="middle"/>
</p>
<p>Global shortcuts exist to clone <em>Vector</em> and <em>Matrix</em> prototypes with <em>$V</em> and <em>$M</em>
respectively. The following is the semantic equivalent of the previous example.</p>
<pre class="JavaScript" name="code">
var x = $V([1, 2, 3]);
var a = $M([[1, 2], [3, 4]]);
</pre>
<p><em>Matrix</em>'s <em>Ones</em> function will create a matrix of specified dimensions with all
elements set to 1.</p>
<pre class="JavaScript" name="code">
var ones = Matrix.Ones(3, 2);
</pre>
<p>
<img src="http://c271180.r80.cf1.rackcdn.com/ones.png"/>
</p>
<p><em>Matrix</em>'s <em>Zeros</em> function does the same except with all elements set to 0.</p>
<pre class="JavaScript" name="code">
var zeros = Matrix.Zeros(3, 2);
</pre>
<p>
<img src="http://c271180.r80.cf1.rackcdn.com/zeros.png"/>
</p>
<p>An identity matrix of a given size can be created with <em>Matrix</em>'s <em>I</em> function.</p>
<pre class="JavaScript" name="code">
var eye = Matrix.I(4);
</pre>
<p>
<img src="http://c271180.r80.cf1.rackcdn.com/eye.png"/>
</p>
<p><strong>Data Access</strong></p>
<p>Values can be retrieved from within a <em>Matrix</em> or <em>Vector</em> using the <em>e</em> method.</p>
<pre class="JavaScript" name="code">
a.e(2, 1); // returns 3
x.e(3); // returns 3
</pre>
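<p>Note that the indices passed to <em>e</em> start at 1, not 0, which is why <em>a.e(2, 1)</em> reaches the second row. The following is only a plain-JavaScript sketch of that lookup over an array-of-arrays. The <em>elementAt</em> helper is hypothetical and not part of sylvester, and returning null for out-of-range indices is an assumption of this sketch.</p>

```javascript
// Hypothetical helper, not part of sylvester: 1-based element lookup
// over a plain array-of-arrays.
function elementAt(rows, i, j) {
  // Reject indices outside 1..rowCount and 1..columnCount.
  if (i < 1 || i > rows.length || j < 1 || j > rows[0].length) {
    return null;
  }
  // Translate the 1-based indices to JavaScript's 0-based arrays.
  return rows[i - 1][j - 1];
}

var rows = [[1, 2], [3, 4]];
console.log(elementAt(rows, 2, 1)); // second row, first column: 3
```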
<p>The entire set of values can be retrieved/manipulated with the <em>elements</em> member of <em>Matrix</em> and <em>Vector</em>. <em>elements</em> is an array-of-arrays-of-numbers in the case of matrices and it is a simple array-of-numbers for vectors.</p>
<pre class="JavaScript" name="code">
var a_elements = a.elements; // [[1, 2], [3, 4]]
var x_elements = x.elements; // [1, 2, 3]
</pre>
<p><strong>Basic Math</strong></p>
<p>Many standard mathematical operations are supported in <em>Vector</em> and <em>Matrix</em> clones. In general the arguments can either be scalar values or properly-dimensioned <em>Matrix</em>/<em>Vector</em> clones. While not covered here, element-wise versions of many multiplicative/specialized operations are also available.</p>
<p>Matrices and vectors can be added-to and subtracted-from by scalar values using
the <em>add</em> and <em>subtract</em> functions.</p>
<pre class="JavaScript" name="code">
var b = a.add(1);
var d = a.subtract(1);
</pre>
<p>symbolizing:</p>
<p>
<img src="http://c271180.r80.cf1.rackcdn.com/scalar_add2.png"/>
</p>
<p>
<img src="http://c271180.r80.cf1.rackcdn.com/scalar_sub2.png"/>
</p>
<p>Naturally, like-dimensioned matrices and vectors can be added to and subtracted from each other.</p>
<pre class="JavaScript" name="code">
var e = a.add($M([[2, 3], [4, 5]]));
var f = a.subtract($M([[0, 2], [3, 3]]));
</pre>
<p>meaning:</p>
<p>
<img src="http://c271180.r80.cf1.rackcdn.com/matrix_add2.png"/>
</p>
<p>
<img src="http://c271180.r80.cf1.rackcdn.com/matrix_sub2.png"/>
</p>
<p>Matrices/vectors can be multiplied with matrices/vectors of
appropriate dimensions.</p>
<pre class="JavaScript" name="code">
var g = a.x($V([3, 4]));
</pre>
<p>
<img src="http://c271180.r80.cf1.rackcdn.com/matrix_mul.png"/>
</p>
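<p>To make the arithmetic behind that product concrete, here is a rough plain-JavaScript sketch that does not use sylvester at all. The <em>multiplyMatrixVector</em> helper is hypothetical and exists only for illustration. Each element of the result is one row of the matrix multiplied element-by-element with the vector and summed.</p>

```javascript
// Plain-JavaScript sketch of a matrix-vector product (hypothetical helper,
// not sylvester's implementation).
function multiplyMatrixVector(rows, vector) {
  return rows.map(function (row) {
    // Each output element is the dot product of one row with the vector.
    return row.reduce(function (sum, value, k) {
      return sum + value * vector[k];
    }, 0);
  });
}

// Same values as a.x($V([3, 4])) with a = [[1, 2], [3, 4]].
console.log(multiplyMatrixVector([[1, 2], [3, 4]], [3, 4])); // [ 11, 25 ]
```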
<p>The dot-product of two vectors can be computed with <em>Vector</em>'s <em>dot</em> method.</p>
<pre class="JavaScript" name="code">
var y = x.dot($V([4, 5, 6]));
</pre>
<p>depicting:</p>
<p>
<img src="http://c271180.r80.cf1.rackcdn.com/dot_product2.png"/>
</p>
<p><strong>Transposing</strong></p>
<p>A <em>Matrix</em> can be transposed with the <em>transpose</em> method.</p>
<pre class="JavaScript" name="code">
var at = a.transpose();
</pre>
<p>where <em>at</em> represents the matrix:</p>
<p>
<img src="http://c271180.r80.cf1.rackcdn.com/transpose.png"/>
</p>
<p><strong>Segmentation/Augmentation</strong></p>
<p>The first n elements can be removed from a <em>Vector</em> with the <em>chomp</em> method.</p>
<pre class="JavaScript" name="code">
var n = 2;
var xa = x.chomp(n);
</pre>
<p>In contrast the first n elements can be retrieved with the <em>top</em> method.</p>
<pre class="JavaScript" name="code">
var xb = x.top(n);
</pre>
<p>Vectors can have a list of values appended to the end with the <em>augment</em> method.</p>
<pre class="JavaScript" name="code">
var xc = x.augment([4, 5]);
</pre>
<p>Matrices can have a column appended to the right with <em>Matrix</em>'s <em>augment</em> method.</p>
<pre class="JavaScript" name="code">
var m = a.augment($V([3, 5]));
</pre>
<p>A sub-block of a <em>Matrix</em> clone can be retrieved with the <em>slice</em> method. <em>slice</em>
accepts a starting row, ending row, starting column and ending column as
parameters.</p>
<pre class="JavaScript" name="code">
var aa = $M([[1, 2, 3], [4, 5, 6], [7, 8, 9]]);
var ab = aa.slice(1, 2, 1, 2);
</pre>
<p>The code above produces a matrix <em>ab</em> shaped like:</p>
<p>
<img src="http://c271180.r80.cf1.rackcdn.com/slice.png"/>
</p>
<p><strong>Decompositions/Advanced functions</strong></p>
<p>A small number of decompositions are currently supported. These have all been
recently added and at the time of writing are, while functional, not
necessarily as computationally efficient as they could be. Also their stability cannot be guaranteed at this time.</p>
<p>Note that these were all implemented in pure JavaScript. My personal hope is to
keep node-sylvester 100% functional in pure JavaScript to maintain the best
compatibility across all platforms. In the long run, however, I hope to employ
time-tested, computationally-efficient native libraries to perform these
operations if the host machine allows.</p>
<p>A QR decomposition is made possible via <em>Matrix</em>'s <em>qr</em> method. An object is
returned containing both the <em>Q</em> and the <em>R</em> matrix. </p>
<pre class="JavaScript" name="code">
var qr = a.qr();
</pre>
<p>resulting in the object:</p>
<pre class="output_blog">
{ Q:
[-0.316227766016838, -0.9486832980505138]
[-0.9486832980505138, 0.3162277660168381],
R:
[-3.162277660168379, -4.427188724235731]
[5.551115123125783e-16, -0.6324555320336751] }
</pre>
<p>Singular Value Decompositions (SVD) can be produced via <em>Matrix</em>'s aptly named <em>svd</em>
method. An object is returned containing the <em>U</em> (left singular), <em>S</em> (singular
diagonal), and <em>V</em> (right singular) matrices.</p>
<pre class="JavaScript" name="code">
var svd = a.svd();
</pre>
<p>which returns:</p>
<pre class="output_blog">
{ U:
[0.40455358483359943, 0.9145142956773742]
[0.9145142956773744, -0.40455358483359943],
S:
[5.4649857042190435, 0]
[0, 0.36596619062625785],
V:
[0.5760484367663301, -0.8174155604703566]
[0.8174155604703568, 0.5760484367663302] }
</pre>
<p>Principal Component Analysis can be performed (dimensionality reduced) and
reversed (dimensionality restored with approximated values) using <em>Matrix</em>'s
<em>pcaProject</em> and <em>pcaRecover</em> methods respectively.</p>
<p><em>pcaProject</em> accepts the number of target dimensions as a parameter and returns
an object with the reduced data <em>Z</em> and the eigenvectors of the covariance matrix <em>U</em>.</p>
<pre class="JavaScript" name="code">
var pca = a.pcaProject(1);
</pre>
<pre class="output_blog">
{ Z:
[2.2108795577070475]
[4.997807552180416],
U:
[0.5760484367663208, -0.8174155604703633]
[0.8174155604703633, 0.5760484367663208] }
</pre>
<p>and pcaRecover approximately recovers the data to the original dimensionality.</p>
<pre class="JavaScript" name="code">
pca.Z.pcaRecover(pca.U);
</pre>
<p>which produces:</p>
<p>
<img src="http://c271180.r80.cf1.rackcdn.com/pca_recover.png"/>
</p>
<p><strong>Solving Linear Equations</strong></p>
<p>Simple linear equations in the form of</p>
<p><img src="http://c271180.r80.cf1.rackcdn.com/linear_equation.png"/></p>
<p>can be solved using the suitably-named <em>solve</em> method.</p>
<pre class="JavaScript" name="code">
var A = $M([
[2, 4],
[2, 3],
]);
var b = $V([2, 2]);
var x = A.solve(b);
</pre>
<p>which is representative of</p>
<p><img src="http://c271180.r80.cf1.rackcdn.com/le_A.png"/></p>
<p><img src="http://c271180.r80.cf1.rackcdn.com/le_b.png"/></p>
<p><img src="http://c271180.r80.cf1.rackcdn.com/le_x.png"/></p>
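<p>As a sanity check on a system this small, the answer can also be worked out by hand. The sketch below uses Cramer's rule in plain JavaScript. It is not how sylvester solves the system, and the <em>solve2x2</em> helper is hypothetical. Substituting the result back into both equations should give the right-hand side values.</p>

```javascript
// Hypothetical helper: solve a 2x2 system A x = b with Cramer's rule.
function solve2x2(A, b) {
  var det = A[0][0] * A[1][1] - A[0][1] * A[1][0];
  if (det === 0) {
    throw new Error('Matrix is singular; no unique solution.');
  }
  return [
    (b[0] * A[1][1] - A[0][1] * b[1]) / det,
    (A[0][0] * b[1] - b[0] * A[1][0]) / det
  ];
}

var solution = solve2x2([[2, 4], [2, 3]], [2, 2]);

// Substitute back into each equation; both should reproduce b.
console.log(2 * solution[0] + 4 * solution[1]); // 2
console.log(2 * solution[0] + 3 * solution[1]); // 2
```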
<p><strong>Conclusion</strong></p>
<p>Thanks to James Coglan's hard work on sylvester plenty of matrix math
functionality has been exposed to JavaScript. With a little extra work we've
been able to expose quite a bit more to node. There's still more work to do
not only in optimization but also in capabilities. In time hopefully
<a href="https://github.com/NaturalNode/node-sylvester">node-sylvester</a> will become fast and complete.</p>
<p>As a reminder, the functionality outlined above is not representative of the
full list of current capabilities. Between the <a href="https://github.com/NaturalNode/node-sylvester/blob/master/README.rdoc">README</a> in the source and the
<a href="http://sylvester.jcoglan.com/">sylvester API documentation</a> you should be able to get a reasonably clear
picture. Still, if you're looking for ways to help with the project a concerted
documentation effort would be wonderful and appreciated.</p>
http://www.chrisumbel.com/article/sylvester_node_js_matrix_vector_mathSat, 17 Dec 2011 05:00:00 GMTIo, A Beautiful, Prototype-Based Languagehttp://www.chrisumbel.com/article/io_language_prototype
<a href="http://iolanguage.com/"><img src="http://c243025.r25.cf1.rackcdn.com/Io-logo.png" align="right" border="0"/></a>I recently started reading <a href="http://pragprog.com/book/btlang/seven-languages-in-seven-weeks"><em>Seven Languages in Seven Weeks</em></a>
by Bruce A. Tate and intended to ultimately post a review. Something went wrong
along the way, however. On the second language covered in the book I became so
intrigued that I had to dedicate a single post to it immediately. That language is <a href="http://iolanguage.com/">Io</a>,
a prototype-based language created by Steve Dekorte.
<p>Now, you have to keep in mind that I've only been hacking Io for about three
days. I'm not an expert. I can't promise any of the examples I'm going to provide
are idiomatic, robust or performant. I can't promise any of my advice will be prudent or that this article can even properly tutor you on Io.</p>
<p><a href="http://pragprog.com/book/btlang/seven-languages-in-seven-weeks"><img src="http://c243025.r25.cf1.rackcdn.com/seven-languages.jpg" align="left" border="0"/></a>I can, however, make the strong recommendation that you give Io a shot especially if you're
a novice JavaScript programmer. Being prototype-based and syntactically
simple Io can help an aspiring JavaScripter truly understand the patterns
without the baggage that comes along with JavaScript. I can only promise an honest attempt at whetting your appetite.</p>
<p>Here I'll just outline some examples rather than deliver thorough instruction. For that <a href="http://pragprog.com/book/btlang/seven-languages-in-seven-weeks"><em>Seven Languages in Seven Weeks</em></a>
does a fine job (most of these examples are adapted from its exercises) and the <a href="http://www.iolanguage.com/scm/io/docs/IoGuide.html"><em>Io Guide</em></a>
can be a wonderful help as well.</p>
<p><strong>Prototype-based</strong></p>
<p>As I mentioned several times already Io is prototype-based. There are no classes
yet there are objects. Objects can clone other objects providing an inheritance
mechanism. Consider the following example.</p>
<pre class="javascript" name="code">
Thing := Object clone
</pre>
<p>That code created a prototype named "Thing" from the super-prototype "Object".
Note the ":=" assignment operator, which creates the destination slot if it does not exist
yet. Had "Thing" already been assigned, "=" would have sufficed, as it can only assign to
an existing slot (slots are the named locations in an object where data members and methods can be stored).</p>
<p>Now let's add a method to the "Thing" prototype. The following code will add a
method into the "printMessage" slot of the "Thing" prototype.</p>
<pre class="javascript" name="code">
Thing printMessage := method(
writeln("Hello, thing!")
)
</pre>
<p>We can clone the "Thing" prototype into an instance and call the "printMessage"
method like so.</p>
<pre class="javascript" name="code">
thing := Thing clone
thing printMessage
</pre>
<p>which would output</p>
<pre class="output_blog">
Hello, thing!
</pre>
<p>Io thinks of things in terms of message-passing. Rather than saying, "we just
called thing's printMessage method" we should say, "we just sent a printMessage
message to thing."</p>
<p>Note that we cloned into a lower-case "thing". When you're defining a pure
prototype you start the identifier with a capital and instances with lower case.</p>
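<p>For readers coming from JavaScript, the snippets so far map fairly directly onto <em>Object.create</em>. This is only a rough analogy, not a claim that the two languages behave identically. The method here returns its text rather than printing it, purely to keep the sketch easy to check.</p>

```javascript
// Rough JavaScript analogue of the Io snippets above.
var Thing = Object.create(Object.prototype); // Thing := Object clone

// Thing printMessage := method(...)
Thing.printMessage = function () {
  return 'Hello, thing!';
};

var thing = Object.create(Thing); // thing := Thing clone

// The lookup walks up to the prototype; the instance has no slot of its own.
console.log(thing.printMessage());
console.log(Object.prototype.hasOwnProperty.call(thing, 'printMessage')); // false
```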
<p>Now let's add a data member to the instance and a method to the prototype to
demonstrate parameters and encapsulation.</p>
<pre class="javascript" name="code">
Thing calculatePrice := method(markup,
self price + markup
)
thing price := 10
thing calculatePrice(2) println
</pre>
<p>which outputs</p>
<pre class="output_blog">
12
</pre>
<p>That example also demonstrates chaining messages together. calculatePrice
returned a Number and we in turn pass a println message to the number (as an
alternative to passing the number itself to writeln).</p>
<p><strong>Metaprogramming</strong></p>
<p>One of the exercises covered in <em>Seven Languages in Seven Weeks</em> that I found
interesting was changing the behavior of Io itself so that division by zero
would result in a 0 not infinity. Of course, you'd probably never want to do
that, but it's a wonderful example of how much control you have over the
runtime itself.</p>
<p>For instance if you execute the following in Io without any additional
intervention</p>
<pre class="javascript" name="code">
(6 / 0) println
</pre>
<p>you get</p>
<pre class="output_blog">
inf
</pre>
<p>Now let's reach into Io and change how division works.</p>
<pre class="javascript" name="code">
Number oldDiv := Number getSlot("/")
Number / := method(d,
if(d == 0, 0, self oldDiv(d))
)
(6 / 2) println
(6 / 0) println
</pre>
<p>outputting</p>
<pre class="output_blog">
3
0
</pre>
<p>The first output was done just to illustrate that we didn't break division
entirely. The second illustrates that we sure did change how division by 0
works: rather than "inf" we got "0".</p>
<p>Let's break it down line by line.</p>
<pre class="javascript" name="code">
Number oldDiv := Number getSlot("/")
</pre>
<p>That may look reasonable to a Rubyist who would "alias" in a situation like this. We're basically copying the division operator's (the method in slot "/") logic into another slot named "oldDiv". Since we're going to rewrite "/"
we'll want to keep the functionality around for later use and "oldDiv" is a fine
place.</p>
<pre class="javascript" name="code">
Number / := method(d,
if(d == 0, 0, self oldDiv(d))
)
</pre>
<p>Now we've changed the "/" method of Number. If the denominator (the lone
parameter) is zero we will return zero. Otherwise we rely on the "oldDiv" to
perform normal division.</p>
<p><strong>DSLs/Structured Data Parsing</strong></p>
<p>As a Rubyist by trade I always chuckle when I hear .NET or Java programmers
claim that Domain Specific Languages are a waste of time. From their perspective
the time investment is far greater than any value that might be yielded. Nine
times out of ten they're probably right. They work with tools with a fixed idea
of what instructions should look like.</p>
<p>Io gives you a great deal of control over the language's parser itself. Rather
than writing a parser or implementing a DSL within the confines of the host language you can teach Io to parse and evaluate your DSL within its own interpreter! In Io the line between internal and external DSLs is somewhat blurred and I think that's just fantastically nifty.</p>
<p>We're going to write a little JSON parser here. Sure, there are probably better ways of parsing
JSON with Io but it provides an effective example that's easy to relate to.</p>
<p>Take a file named "test.json" with the following content as given:</p>
<pre class="javascript" name="code">
{
"name": "Chris Umbel",
"lucky_numbers": [6, 13],
"job" : {
"title": "Software Developer"
}
}
</pre>
<p>The following code will parse the JSON data, albeit liberally. This is meant to
be demonstrative, not robust.</p>
<pre class="javascript" name="code">
OperatorTable addAssignOperator(":", "atPutNumber")
curlyBrackets := method(
data := Map clone
call message arguments foreach(arg,
data doMessage(arg))
data
)
squareBrackets := method(
arr := list()
call message arguments foreach(arg,
arr push(call sender doMessage(arg)))
arr
)
Map atPutNumber := method(
self atPut(
# strip off leading and trailing quotes
call evalArgAt(0) asMutable removePrefix("\"") removeSuffix("\""),
call evalArgAt(1)
)
)
s := File with("test.json") openForReading contents
json := doString(s)
json at("name") println
json at("lucky_numbers") println
json at("job") at("title") println
</pre>
<p>which will output</p>
<pre class="output_blog">
Chris Umbel
list(6, 13)
Software Developer
</pre>
<p>Now to break it down.</p>
<pre class="javascript" name="code">
OperatorTable addAssignOperator(":", "atPutNumber")
</pre>
<p>Here we told Io to accept a brand spanking new assignment operator with the text
":". It will then pass the argument along to the target object via the
"atPutNumber" message.</p>
<pre class="javascript" name="code">
curlyBrackets := method(
data := Map clone
call message arguments foreach(arg,
data doMessage(arg))
data
)
</pre>
<p>That instructs Io what to do when it comes across a curly bracket when parsing
code. In our case it creates a Map and begins to fill it. "call" performs
reflection on the argument data passed to the method. "call message arguments"
accesses the list of all arguments received.</p>
<pre class="javascript" name="code">
squareBrackets := method(
arr := list()
call message arguments foreach(arg,
arr push(call sender doMessage(arg)))
arr
)
</pre>
<p>Here we instructed Io how to deal with JSON arrays. Per the slot name it builds
a list of all elements enclosed in square brackets.</p>
<pre class="javascript" name="code">
Map atPutNumber := method(
self atPut(
# strip off leading and trailing quotes
call evalArgAt(0) asMutable removePrefix("\"") removeSuffix("\""),
call evalArgAt(1)
)
)
</pre>
<p>The Map prototype has been given an atPutNumber method that will strip the quotes off of the
element names and slap the value specified by the JSON into the corresponding
value in the Map. "evalArgAt" grabs the argument data at a specified index.</p>
<pre class="javascript" name="code">
s := File with("test.json") openForReading contents
json := doString(s)
</pre>
<p>Now there's the money. We loaded up the JSON document and slapped its contents in a
string. We then essentially eval it with "doString" letting the Io interpreter
do the dirty work.</p>
<p><strong>Message Forwarding</strong></p>
<p>In Ruby something called "method_missing" is relied upon to handle methods of
arbitrary names at runtime. The Io equivalent is "forward". The following example
supplies a simple mechanism to build XML like:</p>
<pre class="xml" name="code">
<movies>
<movie>
<title>
The Thing
</title>
<genre>
Horror
</genre>
</movie>
</movies>
</pre>
<p>with code like:</p>
<pre class="javascript" name="code">
builder := Builder clone
builder movies(
movie(
title("The Thing"),
genre("Horror")
)
)
</pre>
<p>Here's the code to make it happen.</p>
<pre class="javascript" name="code">
Builder := Object clone do(
depth := 0
)
Builder indent := method(
depth repeat(
write(" ")
)
)
Builder emit := method(
indent
call message arguments foreach(arg,
write(call sender doMessage(arg))
)
writeln
)
Builder emitStart := method(
emit("<", call evalArgs join, ">")
)
Builder emitEnd := method(text,
emitStart("/", text)
)
# handles messages for nonexistent methods.
Builder forward := method(
emitStart(call message name)
depth = depth + 1
call message arguments foreach(arg,
content := self doMessage(arg)
if(content type == "Sequence",
emit(content)
)
)
depth = depth - 1
emitEnd(call message name)
)
builder := Builder clone
builder movies(
movie(
title("The Thing"),
genre("Horror")
)
)
</pre>
<p>Now that's a big example, but let me extract the section of most relevance.</p>
<pre class="javascript" name="code">
# handles messages for nonexistent methods.
Builder forward := method(
emitStart(call message name)
depth = depth + 1
call message arguments foreach(arg,
content := self doMessage(arg)
if(content type == "Sequence",
emit(content)
)
)
depth = depth - 1
emitEnd(call message name)
)
</pre>
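<p>For comparison, the closest JavaScript analogue to a "forward" slot is probably a <em>Proxy</em> with a <em>get</em> trap. The sketch below is a hypothetical, much-simplified builder: it returns the markup as one unindented string instead of writing indented lines, and <em>makeBuilder</em> is a name invented for this illustration.</p>

```javascript
// Hypothetical sketch: a Proxy plays the role of Io's "forward" slot.
function makeBuilder() {
  return new Proxy({}, {
    // Called for every property lookup, whatever the name.
    get: function (target, name) {
      var tag = String(name);
      // Hand back a function that wraps its arguments in that tag.
      return function () {
        var children = Array.prototype.slice.call(arguments);
        return '<' + tag + '>' + children.join('') + '</' + tag + '>';
      };
    }
  });
}

var b = makeBuilder();
var xml = b.movies(b.movie(b.title('The Thing'), b.genre('Horror')));
console.log(xml);
```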
<p>By implementing a method in the "forward" slot we allow the Builder prototype to
accept messages of any name. Our "forward" method then obtains the name of the message sent with
"call message name" and handles it accordingly (making a node of that name).</p>
<p><strong>Next Steps</strong></p>
<p>One thing not covered here is the concurrency story, and Io really shines
there. Easy coroutines, actors and futures are available to provide refreshing
simplicity to what's typically a tricky problem.</p>
<p>There are also reasonable libraries available for common tasks like networking
and XML parsing.</p>
<p>Check out the <a href="http://www.iolanguage.com/scm/io/docs/IoGuide.html"><em>Io Guide</em></a> for
information on these and other topics. Of course <a href="http://pragprog.com/book/btlang/seven-languages-in-seven-weeks"><em>Seven Languages in Seven Weeks</em></a>
is wonderful as well.</p>
http://www.chrisumbel.com/article/io_language_prototypeTue, 06 Sep 2011 03:00:00 GMTThe node.js Natural Language Storyhttp://www.chrisumbel.com/article/node_js_natural_language_nlp
In early May of 2011 I started work on <em>natural</em>, a general Natural Language
Processing module for <a href="http://nodejs.org/">node.js</a>. I was loosely basing the idea off of the
ever-popular <a href="http://www.nltk.org/">Natural Language ToolKit (NLTK)</a> for python. I wanted to create a
one-stop shop for NLP but for the node.js platform.
<p><a href="http://nodejs.org/"><img src="http://c243025.r25.cf1.rackcdn.com/new_node_logo.jpg" border="0" align="right"/></a>I'm excited to see that I'm not the only one with an interest in NLP under
node.js. Considering there's no way I can be totally comprehensive with
<em>natural</em> it's imperative that the community is hacking away, building a great NLP
story for node.</p>
<p>Here I'm going to outline the interesting node NLP projects that I've
found so far.</p>
<p><strong>Projects</strong></p>
<p><a href="https://github.com/NaturalNode/natural">natural</a> - In some shameless self-promotion I'll list myself first:) Like I
mentioned above, <em>natural</em> is a general natural language facility for node.js
written by yours truly. Stemming, classification, phonetics, n-grams, tf-idf, WordNet, and some
inflection are currently supported.</p>
<p><a href="https://github.com/fortnightlabs/pos-js">pos-js</a> - Here's an excellent part of speech tagger by Percy Wegmann and Gerad Suyderhoud. It's a port of
<a href="http://www.markwatson.com/opensource/">Mark Watson's FastTag Part of Speech Tagger</a> for Java
which in turn uses <a href="http://en.wikipedia.org/wiki/Eric_Brill">Eric Brill</a>'s POS ruleset.</p>
<p><a href="https://github.com/harthur/glossary">glossary</a> - Here's an auto tagger written by
<a href="http://harthur.wordpress.com/">Heather Arthur</a> which can extract keywords from text.</p>
<p><a href="https://github.com/visionmedia/reds.git">reds</a> - a Redis Full-text search
implementation by the prolific <a href="http://tjholowaychuk.com/">TJ Holowaychuk</a>.</p>
<p><a href="https://github.com/hanssonlarsson/tfidf">tfidf</a> - an easy to use term frequency-inverse document frequency library
for Node.js by <a href="http://hanssonlarsson.se/">Linus G Thiel</a> of <a href="http://hanssonlarsson.se/">Hansson & Larsson</a>.</p>
<p><a href="https://github.com/visionmedia/lingo">Lingo</a> - a general linguistics module by
<a href="http://tjholowaychuk.com/">TJ Holowaychuk</a> which does inflection, translation, and some casing.</p>
<p><a href="https://github.com/spencermountain/nlp-node">nlp-node</a> - rule-based NLP tools for
node including date extraction and inflection by Spencer (not sure he wants his
last name given).</p>
<p>Know of any others? <a href="mailto:chris@chrisumbel.com">Contact Me!</a></p>
<p><strong>Help Me!</strong></p>
<p>And finally I'd like to ask for help with <em>natural</em>. I'd love to make it as comprehensive as possible and there is a mountain of algorithms to implement for English alone. I'm also interested in supporting algorithms for other languages. If you have the capacity and interest <a href="mailto:chris@chrisumbel.com">let me know</a>.</p>
http://www.chrisumbel.com/article/node_js_natural_language_nlpSat, 20 Aug 2011 04:00:00 GMTNatural Language Processing in node.js with "natural"http://www.chrisumbel.com/article/node_js_natural_language_porter_stemmer_lancaster_bayes_naive_metaphone_soundex
Over the last few years I've developed a bit of an interest in natural-language
processing. It's never been the focus of my work, but when you're exposed to as
many enterprise-class data storage/search systems as I have you have no choice
but to absorb some details. Several hobby projects, sometimes involving
home-brewed full-text searching, have also popped up requiring at least a
cursory understanding of stemming and phonetic algorithms. Another recurring
theme in my hobby projects has been classification for custom spam filtering and
analyzing twitter sentiment.
<a href="http://nodejs.org/"><img style="width:200px;" src="http://c243025.r25.cf1.rackcdn.com/node.png" border="0" alt="node.js logo" align="right"/></a>
<p>In general, accomplishing these goals simply required the use of someone else's
hard work, whether it be having <a href="http://lucene.apache.org/solr/">Solr</a>/
<a href="http://lucene.apache.org/java/docs/index.html">Lucene</a> to stem my corpora at the office,
using the <a href="http://www.ruby-lang.org/">Ruby</a>
<a href="http://classifier.rubyforge.org/">classifier gem</a> to analyze tweets
about stocks or using the <a href="http://www.python.org/">Python</a>
<a href="http://www.nltk.org/">Natural Language Toolkit</a> for... Well, pretty
much anything.</p>
<p>Recent months have brought a new platform into my hobby work,
<a href="http://nodejs.org/">node.js</a>, which, while stable, still has
maturing to do. As with so many things I work with these days, the need for
natural-language facilities arose and I found the pickings pretty slim. I have
to be honest: that's *exactly* what I was hoping for, an opportunity to sink my
teeth into the algorithms themselves.</p>
<p>Thus I began work on <a href="https://github.com/NaturalNode/natural">"natural"</a>, a module of natural language algorithms for
node.js. The idea is loosely based on the Python NLTK in that all algorithms
are in the same package, however it will likely never be anywhere near as
complete. I'd be lucky for "natural" to ever do 1/2 of what NLTK does without
plenty of help. As of version 0.0.17 it has two stemmers (Porter and Lancaster),
one classifier (Naive Bayes), two phonetic algorithms (Metaphone and SoundEx)
and an inflector.</p>
<p>The strategy was to cast a wide enough net to see how the algorithms
might fit together in terms of interface and dependencies first. Making them
performant and perfectly accurate is step two, which admittedly will still
require some work. At the time of writing "natural" is in version 0.0.17 and
everything seems to work (not in an official beta of any kind) but until the
version ticks 0.1.0 it's subject to significant internal change. Hopefully the
interfaces will stay the same.</p>
<p>With the exception of the Naive Bayes classifier (to which you can supply tokens
of your own stemming) all of these algorithms have no real applicability outside
of English. This is a problem I'd like to rectify after solidifying a 0.1.0
release and would love to get some more people involved to accomplish it.</p>
<p><strong>Installing</strong></p>
<p>In order to use "natural" you have to install it... naturally. Like most node
modules "natural" is published as an npm package and can be installed from the command
line as such:</p>
<pre class="output_blog">npm install natural</pre>
<p>If you want to install from source (which can be found <a href="https://github.com/NaturalNode/natural">here on github</a>), pull it and install it with npm from the source directory.</p>
<pre class="output_blog">git clone git://github.com/NaturalNode/natural.git
cd natural
npm install .</pre>
<p><strong>Stemming</strong></p>
<p>The first class of algorithms I'd like to outline is stemming. As stated
above the Lancaster and Porter algorithms are supported as of 0.0.17. Here's
a basic example of stemming a word with a Porter Stemmer.</p>
<pre class="javascript" name="code">var natural = require('natural'),
stemmer = natural.PorterStemmer;
var stem = stemmer.stem('stems');
console.log(stem);
stem = stemmer.stem('stemming');
console.log(stem);
stem = stemmer.stem('stemmed');
console.log(stem);
stem = stemmer.stem('stem');
console.log(stem);</pre>
<p>Above I simply required-up the main "natural" module and grabbed the
PorterStemmer sub-module from within. Calling the "stem" function takes an
arbitrary string and returns the stem. The above code returns the following
output:</p>
<pre class="output_blog">stem
stem
stem
stem</pre>
<p>For convenience stemmers can patch String with methods to simplify the process
by calling the <i>attach</i> method. String objects will then have a <i>stem</i> method.</p>
<pre class="javascript" name="code">stemmer.attach();
stem = 'stemming'.stem();
console.log(stem);</pre>
<p>Generally you'd be interested in stemming an entire corpus. The <i>attach</i> method
provides a <i>tokenizeAndStem</i> method to accomplish this. It breaks the owning
string up into an array of strings, one for each word, and stems them all. For
example:</p>
<pre class="javascript" name="code">var stems = 'stems returned'.tokenizeAndStem();
console.log(stems);</pre>
<p>produces the output:</p>
<pre class="output_blog">[ 'stem', 'return' ]</pre>
<p>Note that the <i>tokenizeAndStem</i> method will omit certain words by default that are
considered irrelevant (stop words) from the return array. To instruct the
stemmer to not omit stop words pass a <i>true</i> in to <i>tokenizeAndStem</i> for the <i>keepStops</i> parameter.
Consider:</p>
<pre class="javascript" name="code">console.log('i stemmed words.'.tokenizeAndStem());
console.log('i stemmed words.'.tokenizeAndStem(true));</pre>
<p>outputting:</p>
<pre class="output_blog">[ 'stem', 'word' ]
[ 'i', 'stem', 'word' ]</pre>
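<p>To make the stop-word behavior a bit more concrete, here's a rough sketch in plain JavaScript of what a tokenize-and-stem step boils down to. This is not natural's code: the stop-word list, the <i>toyStem</i> suffix stripper and the <i>toyTokenizeAndStem</i> function are all made up for illustration, and a real Porter or Lancaster stemmer applies far more rules than this.</p>

```javascript
// Toy illustration only; NOT how "natural" is implemented.
// A tiny, hypothetical stop-word list.
var STOP_WORDS = ['i', 'a', 'an', 'the', 'and', 'of', 'to'];

// Crude suffix stripper standing in for a real stemming algorithm.
function toyStem(word) {
    return word
        .replace(/(ing|ed|s)$/, '')       // drop one common suffix
        .replace(/([^aeiou])\1$/, '$1');  // undouble a trailing consonant: 'stemm' -> 'stem'
}

// Lowercase, split into words, optionally drop stop words, then stem each word.
function toyTokenizeAndStem(text, keepStops) {
    return text.toLowerCase()
        .split(/[^a-z0-9]+/)              // split on anything that isn't a letter or digit
        .filter(function(token) {
            if (token === '') return false;   // left behind by trailing punctuation
            return keepStops || STOP_WORDS.indexOf(token) === -1;
        })
        .map(toyStem);
}

console.log(toyTokenizeAndStem('i stemmed words.'));
console.log(toyTokenizeAndStem('i stemmed words.', true));
```

<p>For this particular sentence the sketch happens to produce the same two arrays as above, but that's about as far as the resemblance to a real stemmer goes.</p>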
<p>All of the code above would also work with a Lancaster stemmer by requiring the
LancasterStemmer module instead, like:</p>
<pre class="javascript" name="code">var natural = require('natural'),
stemmer = natural.LancasterStemmer;</pre>
<p>Of course the actual stems produced could be different depending on the
algorithm chosen.</p>
<p><strong>Phonetics</strong></p>
<p>Phonetic algorithms are also provided to determine what words sound like and
compare them accordingly. The old (and I mean old... like 1918 old) SoundEx and
the more modern Metaphone algorithm are supported as of 0.0.17.</p>
<p>The following example compares the string "phonetics" and the intentional
misspelling "fonetix" and determines they sound alike according to the Metaphone
algorithm.</p>
<pre class="javascript" name="code">var natural = require('natural'),
phonetic = natural.Metaphone;
var wordA = 'phonetics';
var wordB = 'fonetix';
if(phonetic.compare(wordA, wordB))
console.log('they sound alike!');</pre>
<p>The raw code the phonetic algorithm produces can be retrieved with the <i>process</i>
method:</p>
<pre class="javascript" name="code">var phoneticCode = phonetic.process('phonetics');
console.log(phoneticCode);</pre>
<p>resulting in:</p>
<pre class="output_blog">FNTKS</pre>
<p>Like the stemming implementations the phonetic modules have an <i>attach</i> method
that patches String with shortcut methods, most notably <i>soundsLike</i> for
comparison:</p>
<pre class="javascript" name="code">phonetic.attach();
if(wordA.soundsLike(wordB))
console.log('they sound alike!');</pre>
<p><i>attach</i> also patches in <i>phonetics</i> and <i>tokenizeAndPhoneticize</i> methods to
retrieve the phonetic code for a single word and an entire corpus respectively.</p>
<pre class="javascript" name="code">console.log('phonetics'.phonetics());
console.log('phonetics rock'.tokenizeAndPhoneticize());</pre>
<p>which outputs:</p>
<pre class="output_blog">FNTKS
[ 'FNTKS', 'RK' ]</pre>
<p>The above code could also use SoundEx by substituting the following in for the
require.</p>
<pre class="javascript" name="code">var natural = require('natural'),
phonetic = natural.SoundEx;</pre>
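<p>If you're curious what a phonetic code actually is, here's a small stand-alone sketch of the classic American Soundex rules as I understand them: keep the first letter, map the remaining consonants to digits, collapse neighbors with the same digit and pad to four characters. The <i>toySoundex</i> and <i>toySoundsLike</i> names are mine, and I'm not claiming this matches natural's SoundEx module code for code.</p>

```javascript
// Textbook American Soundex, written for illustration; not taken from "natural".
var CODES = {
    b: 1, f: 1, p: 1, v: 1,
    c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
    d: 3, t: 3,
    l: 4,
    m: 5, n: 5,
    r: 6
};

function toySoundex(word) {
    var letters = word.toLowerCase().replace(/[^a-z]/g, '');
    if (letters === '') return '';

    var result = letters.charAt(0).toUpperCase();   // the first letter is kept as-is
    var previous = CODES[letters.charAt(0)] || 0;   // but its digit still suppresses repeats

    for (var i = 1; i < letters.length && result.length < 4; i++) {
        var ch = letters.charAt(i);
        if (ch === 'h' || ch === 'w') continue;     // h and w don't separate equal digits
        var code = CODES[ch] || 0;                  // vowels and y code to 0
        if (code !== 0 && code !== previous) result += code;
        previous = code;
    }

    while (result.length < 4) result += '0';        // pad short codes with zeros
    return result;
}

function toySoundsLike(a, b) {
    return toySoundex(a) === toySoundex(b);
}

console.log(toySoundex('Robert'));
console.log(toySoundex('Rupert'));
console.log(toySoundsLike('Robert', 'Rupert'));
```

<p>Because Soundex keeps the first letter verbatim, this sketch would not consider "phonetics" and "fonetix" a match (P532 versus F532), which is one reason the more modern algorithms exist.</p>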
<p><strong>Inflector</strong></p>
<p>Basic inflectors are in place to convert nouns between plural and singular forms
and to turn integers into string counters (e.g. '1st', '2nd', '3rd', '4th',
etc.).</p>
<p>The following example converts the word "radius" into its plural form "radii".</p>
<pre class="javascript" name="code">var natural = require('natural'),
nounInflector = new natural.NounInflector();
var plural = nounInflector.pluralize('radius');
console.log(plural);</pre>
<p>Singularization follows the same pattern as is illustrated in the following
example which converts the word "beers" to its singular form, "beer".</p>
<pre class="javascript" name="code">var singular = nounInflector.singularize('beers');
console.log(singular);</pre>
<p>Just like the stemming and phonetic modules an <i>attach</i> method is provided to
patch String with shortcut methods.</p>
<pre class="javascript" name="code">nounInflector.attach();
console.log('radius'.pluralizeNoun());
console.log('beers'.singularizeNoun()); </pre>
<p>A NounInflector instance can do custom conversion if you provide expressions
via the <i>addPlural</i> and <i>addSingular</i> methods. Because these conversions aren't
always symmetric (sometimes more patterns may be required to singularize
forms than pluralize) there needn't be a one-to-one relationship between
<i>addPlural</i> and <i>addSingular</i> calls.</p>
<pre class="javascript" name="code">nounInflector.addPlural(/(code|ware)/i, '$1z');
nounInflector.addSingular(/(code|ware)z/i, '$1');
console.log('code'.pluralizeNoun());
console.log('ware'.pluralizeNoun());
console.log('codez'.singularizeNoun());
console.log('warez'.singularizeNoun());</pre>
<p>which would result in:</p>
<pre class="output_blog">codez
warez
code
ware</pre>
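<p>One way to picture custom rules like these is as an ordered list of pattern/replacement pairs where the first matching pattern wins. The sketch below is a hypothetical stand-in (<i>ToyInflector</i> is my own name, and the "add an s" / "drop an s" fallbacks are deliberately naive), not a description of how NounInflector is built internally.</p>

```javascript
// Hypothetical rule-list inflector for illustration; not natural's NounInflector.
function ToyInflector() {
    this.pluralRules = [];    // [pattern, replacement] pairs, newest first
    this.singularRules = [];
}

ToyInflector.prototype.addPlural = function(pattern, replacement) {
    this.pluralRules.unshift([pattern, replacement]);   // the newest rule is tried first
};

ToyInflector.prototype.addSingular = function(pattern, replacement) {
    this.singularRules.unshift([pattern, replacement]);
};

// Apply the first rule whose pattern matches, otherwise use the fallback.
function applyRules(rules, word, fallback) {
    for (var i = 0; i < rules.length; i++) {
        if (rules[i][0].test(word)) {
            return word.replace(rules[i][0], rules[i][1]);
        }
    }
    return fallback(word);
}

ToyInflector.prototype.pluralize = function(word) {
    return applyRules(this.pluralRules, word, function(w) { return w + 's'; });
};

ToyInflector.prototype.singularize = function(word) {
    return applyRules(this.singularRules, word, function(w) { return w.replace(/s$/, ''); });
};

var toy = new ToyInflector();
toy.addPlural(/(code|ware)/i, '$1z');
toy.addSingular(/(code|ware)z/i, '$1');

console.log(toy.pluralize('code'));
console.log(toy.singularize('warez'));
console.log(toy.pluralize('beer'));
```

<p>Keeping the plural and singular lists separate is what lets the two directions have a different number of rules.</p>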
<p>Here's an example of using the CountInflector module to produce string counters
for integers.</p>
<pre class="javascript" name="code">var natural = require('natural'),
countInflector = natural.CountInflector;
console.log(countInflector.nth(1));
console.log(countInflector.nth(2));
console.log(countInflector.nth(3));
console.log(countInflector.nth(4));
console.log(countInflector.nth(10));
console.log(countInflector.nth(11));
console.log(countInflector.nth(12));
console.log(countInflector.nth(13));
console.log(countInflector.nth(100));
console.log(countInflector.nth(101));
console.log(countInflector.nth(102));
console.log(countInflector.nth(103));
console.log(countInflector.nth(110));
console.log(countInflector.nth(111));
console.log(countInflector.nth(112));
console.log(countInflector.nth(113));</pre>
<p>producing:</p>
<pre class="output_blog">1st
2nd
3rd
4th
10th
11th
12th
13th
100th
101st
102nd
103rd
110th
111th
112th
113th</pre>
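<p>The only wrinkle in that output is the teens: anything ending in 11, 12 or 13 takes "th" even though the last digit alone would suggest "st", "nd" or "rd". Here's a minimal sketch of that rule for non-negative integers. The <i>toyNth</i> function is my own illustration, not CountInflector's source.</p>

```javascript
// Illustrative ordinal suffix logic for non-negative integers.
function toyNth(n) {
    var lastTwo = n % 100;

    // 11, 12 and 13 (and 111, 112, 113, ...) always take 'th'
    if (lastTwo >= 11 && lastTwo <= 13) return n + 'th';

    // otherwise the last digit decides
    switch (n % 10) {
        case 1: return n + 'st';
        case 2: return n + 'nd';
        case 3: return n + 'rd';
        default: return n + 'th';
    }
}

[1, 2, 3, 4, 11, 12, 13, 101, 111].forEach(function(n) {
    console.log(toyNth(n));
});
```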
<p><strong>Classification</strong></p>
<p>At the moment classification is supported only by the Naive Bayes algorithm.
There are two basic steps involved in using the classifier: training and
classification.</p>
<p>The following example requires-up the classifier and trains it with data. The
<i>addDocument</i> method accepts a sample corpus and the name of its
classification; <i>train</i> is then called once all samples have been added.</p>
<pre class="javascript" name="code">var natural = require('natural'),
classifier = new natural.BayesClassifier();
classifier.addDocument("my unit-tests failed.", 'software');
classifier.addDocument("tried the program, but it was buggy.", 'software');
classifier.addDocument("the drive has a 2TB capacity.", 'hardware');
classifier.addDocument("i need a new power supply.", 'hardware');
classifier.train();
</pre>
<p>By default the classifier will tokenize the corpus and stem it with a
LancasterStemmer. You can use a PorterStemmer by passing it in to the
BayesClassifier constructor as such:</p>
<pre class="javascript" name="code">var natural = require('natural'),
stemmer = natural.PorterStemmer,
classifier = new natural.BayesClassifier(stemmer);</pre>
<p>With the classifier trained it can now classify documents via the <i>classify</i>
method:</p>
<pre class="javascript" name="code">console.log(classifier.classify('did the tests pass?'));
console.log(classifier.classify('did you buy a new drive?'));</pre>
<p>resulting in the output:</p>
<pre class="output_blog">software
hardware</pre>
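<p>For anyone wondering what's going on under the hood, here's a bare-bones sketch of the Naive Bayes idea: count how often each token shows up per label, then pick the label with the highest log-probability for a new document. The <i>ToyBayes</i> class is my own stand-in. It does no stemming, keeps stop words, uses add-one smoothing and does its counting right in <i>addDocument</i>, so it shouldn't be read as natural's BayesClassifier.</p>

```javascript
// Minimal multinomial Naive Bayes with add-one smoothing; illustration only.
function tokenize(text) {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function ToyBayes() {
    this.docCounts = Object.create(null);    // label -> number of training documents
    this.tokenCounts = Object.create(null);  // label -> { token -> occurrences }
    this.totalTokens = Object.create(null);  // label -> total tokens seen for that label
    this.vocabulary = Object.create(null);   // every distinct token seen in training
    this.totalDocs = 0;
}

ToyBayes.prototype.addDocument = function(text, label) {
    if (!this.tokenCounts[label]) {
        this.docCounts[label] = 0;
        this.tokenCounts[label] = Object.create(null);
        this.totalTokens[label] = 0;
    }
    this.docCounts[label]++;
    this.totalDocs++;
    tokenize(text).forEach(function(token) {
        this.tokenCounts[label][token] = (this.tokenCounts[label][token] || 0) + 1;
        this.totalTokens[label]++;
        this.vocabulary[token] = true;
    }, this);
};

ToyBayes.prototype.classify = function(text) {
    var tokens = tokenize(text);
    var vocabSize = Object.keys(this.vocabulary).length;
    var best = null;
    var bestScore = -Infinity;

    Object.keys(this.docCounts).forEach(function(label) {
        // log prior: share of training documents carrying this label
        var score = Math.log(this.docCounts[label] / this.totalDocs);
        // log likelihood of each token, smoothed so unseen tokens aren't fatal
        tokens.forEach(function(token) {
            var count = this.tokenCounts[label][token] || 0;
            score += Math.log((count + 1) / (this.totalTokens[label] + vocabSize));
        }, this);
        if (score > bestScore) {
            bestScore = score;
            best = label;
        }
    }, this);

    return best;
};

var toyClassifier = new ToyBayes();
toyClassifier.addDocument('my unit-tests failed.', 'software');
toyClassifier.addDocument('tried the program, but it was buggy.', 'software');
toyClassifier.addDocument('the drive has a 2TB capacity.', 'hardware');
toyClassifier.addDocument('i need a new power supply.', 'hardware');

console.log(toyClassifier.classify('did the tests pass?'));
console.log(toyClassifier.classify('did you buy a new drive?'));
```

<p>With only four training sentences the decision hinges on a word or two ("tests" in the first case, "a", "new" and "drive" in the second), which is a decent reminder of how much a real classifier depends on its training data.</p>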
<p>Similarly the classifier can be trained on arrays rather than strings, bypassing
tokenization and stemming. This allows the consumer to perform custom
tokenization and stemming if any at all. This is especially useful in a
non-natural language scenario.</p>
<pre class="javascript" name="code">
classifier.addDocument( ['unit', 'test'], 'software');
classifier.addDocument( ['bug', 'program'], 'software');
classifier.addDocument(['drive', 'capacity'], 'hardware');
classifier.addDocument(['power', 'supply'], 'hardware');
classifier.train();
</pre>
<p>It's possible to persist the results of a training via the <i>save</i>
method:</p>
<pre class="javascript" name="code">var natural = require('natural'),
classifier = new natural.BayesClassifier();
classifier.addDocument( ['unit', 'test'], 'software');
classifier.addDocument( ['bug', 'program'], 'software');
classifier.addDocument(['drive', 'capacity'], 'hardware');
classifier.addDocument(['power', 'supply'], 'hardware');
classifier.train();
classifier.save('classifier.json', function(err, classifier) {
// the classifier is saved to the classifier.json file!
});
</pre>
<p>The training could then be recalled later with the load method:</p>
<pre class="javascript" name="code">var natural = require('natural'),
classifier = new natural.BayesClassifier();
natural.BayesClassifier.load('classifier.json', null, function(err, classifier) {
console.log(classifier.classify('did the tests pass?'));
});</pre>
<p><strong>Conclusion</strong></p>
<p>This concludes the current state of "natural". Like I said in the introduction,
there are certainly potential improvements in both terms of accuracy and
performance. Now that 0.0.17 has been released features are frozen while I
focus on improving both for 0.1.0.</p>
<p>Post-0.1.0 I intend to make "natural" more complete; slowly starting to match the
NLTK with additional algorithms of all classifications and hopefully for
additional languages. For that I humbly ask assistance:)</p>
http://www.chrisumbel.com/article/node_js_natural_language_porter_stemmer_lancaster_bayes_naive_metaphone_soundexSun, 22 May 2011 22:29:06 GMTMirah, Ruby Syntax on the JVMhttp://www.chrisumbel.com/article/mirah_ruby_syntax_jvm_jruby
<a href="http://www.mirah.org/"><img src="http://c243025.r25.cf1.rackcdn.com/mirah.png" border="0" align="right"/></a>
Although languages like Java and C# have soured with me over the last few years
I still believe their runtimes (the JVM and CLR respectively) are sound. A sizable portion of the code I write
for my day job is in <a href="http://www.jruby.org/">JRuby</a>. We get a number of advantages from that. We can
use the industrial-strength infrastructure components Java brings to the table
and leverage mountains of existing, production-quality java libraries. We can
also benefit from the expressiveness and dynamism of <a href="http://www.ruby-lang.org/">Ruby</a>.
<p>Still, JRuby isn't perfect. While its performance is fine for the average
application or script you'd have a hell of a time writing something like, say,
Lucene in JRuby as it's impractical from a performance point of view.</p>
<p>I suppose the lesson there is that Java is the systems language of the JVM
and JRuby/Jython/Groovy are fine for abstract implementations.</p>
<p>There is a specific alternative to the Java language, though, that's
piqued my interest lately. It was not only born from JRuby but employs JRuby in
its compilation: <a href="http://www.mirah.org/">Mirah</a>.</p>
<p>Mirah is developed primarily by <a href="http://blog.headius.com/">Charles Nutter</a>,
a principal developer of JRuby, and is an attempt at applying Ruby syntax to the raw JVM.
It's statically-typed, equally performant to Java and doesn't require a runtime
library a la JRuby.</p>
<p>Basically the idea is that you could use it as a drop-in replacement for the
Java language itself.</p>
<p>Here I'll walk you through some examples I wrote while acquainting myself with Mirah. </p>
<p><strong>Example 1: Hello, World!</strong></p>
<p>Ah, the canonical "Hello, World!" example. This is very Ruby-like. No class, no
static main method, just good old "puts". The compiled bytecode will include a
class and static main method, but Mirah takes care of that for us.</p>
<pre class="ruby" name="code">puts "hello, World!"</pre>
<p><strong>Example 2: Java Objects</strong></p>
<p>Now this will look a little more Java-like and include classes familiar to a
Java developer. The actual iteration looks far more Ruby-like, however.</p>
<pre class="ruby" name="code">import java.util.ArrayList
list = ArrayList.new([3, 9, 5])
list.each do |x|
puts x
end</pre>
<p><strong>Example 3: Basic Class</strong></p>
<p>Of course this is the JVM we're talking about so classes are a core component as
is demonstrated here. This is also the first example that shows the
static typing in action. The "name" parameter of the constructor is followed by
":String" indicating that it's of type java.lang.String.</p>
<pre class="ruby" name="code">class Person
def name
@name
end
def initialize(name:String)
@name = name
end
end
person = Person.new('Gerald McGoo')
puts person.name</pre>
<p><strong>Example 4: Inheritance</strong></p>
<p>Inheritance follows the typical Ruby syntax.</p>
<pre class="ruby" name="code">class Person
def name
@name
end
def initialize(name:String)
@name = name
end
end
class Programmer < Person
def favorite_language
@favorite_language
end
def initialize(name:String, favorite_language:String)
super(name)
@favorite_language = favorite_language
end
end
programmer = Programmer.new('Gerald McGoo', 'Mirah')
puts "#{programmer.name} loves #{programmer.favorite_language}"</pre>
<p><strong>Example 5: Swing and Blocks</strong></p>
<p>This example, while demonstrating Swing, illustrates a much more important
point. See the ruby-block-ish implementation of the action listener closure? Now
that's clean! Also note that clicked_count is not final.</p>
<p>This is also a good example to demonstrate the type inference of Mirah. Sure,
Mirah is statically-typed, but I'm not explicitly declaring the type of "frame".
Mirah sees that I'm assigning it to a new instance of JFrame and goes with that.</p>
<pre class="ruby" name="code">import javax.swing.JFrame
import javax.swing.JButton
import javax.swing.JOptionPane
frame = JFrame.new 'Click Counter'
frame.setSize 200, 100
frame.setVisible true
button = JButton.new 'Click me'
frame.add button
clicked_count = 0
button.addActionListener do |e|
clicked_count += 1
JOptionPane.showMessageDialog nil, String.valueOf(clicked_count)
end
frame.show</pre>
<p><strong>Performance Comparison</strong></p>
<p>As I stated above one of the primary intentions of Mirah was to maintain an
identical performance profile to Java. I admit my attempt to benchmark it here
isn't overly scientific and is rather rough but certainly illustrates the point
that bytecode resultant from Mirah performs similarly to straight Java.</p>
<p>Also here you'll see some of the more strongly-typed characteristics of Mirah
such as casting [(type)variableName in Java becomes type(variable) in Mirah].</p>
<p>These examples essentially perform internet checksums quite crudely on 8 byte
chunks of a 10MB data file storing the results in an ArrayList. A little
computation, a little IO (probably too much) and a little usage of typical Java
collections.</p>
<p>Java</p>
<pre class="java" name="code">import java.io.FileInputStream;
import java.util.ArrayList;
public class Main {
public static int inetChecksum(byte buff[]) {
long sum = 0;
int datum = 0;
for(int i = 0; i < buff.length; i += 2) {
datum = (0xffff & buff[i] << 8) | (0xff & buff[i + 1]);
sum += datum;
}
while((sum >> 16) > 0)
sum = (sum >> 16) + (sum & 0xffff);
return ~(int)sum & 0xffff;
}
public static void main(String[] args) {
byte[] data = new byte[8];
ArrayList sums = new ArrayList();
try {
long start = System.currentTimeMillis();
FileInputStream fis = new FileInputStream("test.dat");
while(fis.read(data) > 0) {
sums.add(new Integer(inetChecksum(data)));
}
fis.close();
System.out.println(System.currentTimeMillis() - start);
} catch (Exception ex) {
}
}
}</pre>
<p>Mirah</p>
<pre class="ruby" name="code">import java.io.FileInputStream
import java.util.ArrayList
def inet_checksum(buff:byte[]):int
sum = long(0)
datum = 0
i = 0
while i < buff.length
datum = (0xffff & buff[i] << 8) | (0xff & buff[i += 1])
sum += datum
i += 1
end
sum = (int(sum) >> 16) + (sum & 0xffff) while (int(sum) >> 16) > 0
~int(sum) & 0xffff
end
data = byte[8]
begin
start = System.currentTimeMillis
sums = ArrayList.new
fis = FileInputStream.new 'test.dat'
sums.add Integer.new(inet_checksum(data)) while fis.read(data) > 0
fis.close
puts System.currentTimeMillis - start
rescue
end</pre>
<p>I ran three trials resulting in:</p>
<p>Java</p>
<pre class="output_blog">
2127 ms
2088 ms
2119 ms
AVG: 2111 ms
</pre>
<p>Mirah</p>
<pre class="output_blog">
2231 ms
2031 ms
2043 ms
AVG: 2101 ms
</pre>
<p>Let's just call that about even.</p>
<p><strong>Doesn't Haves</strong></p>
<p>Thus far I've yammered on about what Mirah does. Now here are some notes on what
Mirah doesn't do.</p>
<ul>
<li><strong>Generics</strong> - At this point Mirah doesn't do generics like Java (as of version 5
I believe). I spoke with Charles Nutter about this and he believes it'll
likely be included in the future, however.</li>
<li><strong>Ranges</strong> and other Ruby goodness - Mirah isn't Ruby so a few facilities a Ruby developer might be used
to are missing such as ranges. In other words the following Ruby code isn't
valid Mirah: (0..10).step(2) {|i| puts i}</li>
</ul>
<p><strong>Final Thoughts</strong></p>
<p>Like I mentioned in the introduction I see great value in the modern runtimes
but am not particularly thrilled with the typical languages used to program
them. I'll admit that a lot of that has to do with just my personal taste but I
truly believe languages like Mirah offer some additional conciseness and
features that enhance productivity.</p>
<p>I, for one, am excited about it.</p>
http://www.chrisumbel.com/article/mirah_ruby_syntax_jvm_jrubyTue, 29 Mar 2011 02:00:52 GMTGPU-Based Computing in C with NVIDIA's CUDAhttp://www.chrisumbel.com/article/cuda_nvidia_c_hpc_parallel
<a href="http://www.nvidia.com/object/cuda_home_new.html"><img src="http://c243025.r25.cf1.rackcdn.com/nvidia_cuda.jpg" alt="NVIDIA CUDA Logo" align="right" border="0"/></a>
At work we do plenty of video and image manipulation, particularly video encoding.
While it's certainly not a specialty of mine I've had plenty of exposure to
enterprise-level transcoding for projects like <a href="http://transcode.it/">transcode.it</a>, our free public
transcoding system; <a href="http://www.uencode.com">uEncode</a>, a RESTful encoding service; and our own in-house solutions
(we're <a href="http://www.dvdempire.com">DVD Empire</a>, BTW).
<p>Of course my exposure is rather high-level and large portions of the process
still elude me but I've certainly developed an appreciation for boxes with multiple
GPUs chugging away performing complex computation.</p>
<p>For a while now I've been hoping to dedicate some time to peer into the inner
workings of GPUs and explore the possibility of using them for general-purpose,
highly-parallel computing.</p>
<p>Well, I finally got around to it.</p>
<p>Most of the machines I use on a regular basis have some kind of <a href="http://www.nvidia.com/content/global/global.php">NVIDIA</a> card so
I decided to see what resources they
had to offer for general-purpose work. Turns out they set you up quite well!</p>
<p>They offer an architecture called <a href="http://developer.nvidia.com/object/gpucomputing.html">CUDA</a>
which does a great job of rather directly exposing the compute resources of
GPUs to developers. It supports Windows, Linux and Macs equally well as far as
I can tell and while it has bindings for many higher-level languages
it's primarily accessible via a set of C extensions.</p>
<p>Like I said, I'm relatively new to this so I'm in no position to profess, but
figured I might as well share one of the first experiments I did while
familiarizing myself with CUDA.</p>
<p>Also, I'd like some feedback as I'm just getting my feet wet as well.</p>
<p><strong>Getting Started & Documentation</strong></p>
<p>Before going any further check out the
"Getting Started Guide"
for your platform on <a href="http://developer.nvidia.com/object/cuda_3_2_downloads.html" target="_new">the CUDA download page</a>. It will indicate
what you specifically have to download and how to install it. I've only
done so on Macs but the process was simple, I assure you.</p>
<p><strong>Example</strong></p>
<p>Ok, here's a little example C program that performs two parallel executions (the advantage
of using GPUs is parallelism after all) of the "Internet Checksum"
algorithm on some hard-coded sample data.</p>
<p>First I'll blast you with the full source then I'll walk through it piece by
piece.</p>
<pre class="Cpp" name="code">
#include <stdio.h>
#include <cuda_runtime.h>
/* "kernel" to compute an "internet checksum" */
__global__ void inet_checksum(unsigned char *buff, size_t pitch,
int len, unsigned short *checksums) {
int i;
long sum = 0;
/* advance to where this thread's data starts. the pitch
ensured optimal alignment. */
buff += threadIdx.x * pitch;
unsigned short datum;
for(i = 0; i < len / 2; i++) {
datum = *buff++ << 8;
datum |= *buff++;
sum += datum;
}
while (sum >> 16)
sum = (sum & 0xffff) + (sum >> 16);
sum = ~sum;
/* write data back for host */
checksums[threadIdx.x] = (unsigned short)sum;
}
int main (int argc, char **argv) {
int device_count;
int size = 8;
int count = 2;
unsigned short checksums[count];
int i;
unsigned char data[16] = {
/* first chunk */
0xe3, 0x4f, 0x23, 0x96, 0x44, 0x27, 0x99, 0xf3,
/* second chunk */
0xe4, 0x50, 0x24, 0x97, 0x45, 0x28, 0x9A, 0xf4};
/* ask cuda how many devices it can find */
cudaGetDeviceCount(&device_count);
if(device_count < 1) {
/* if it couldn't find any fail out */
fprintf(stderr, "Unable to find CUDA device\n");
} else {
/* for the sake of this example just use the first one */
cudaSetDevice(0);
unsigned short *gpu_checksum;
/* create a place for the results to be stored in the GPU's
memory space. */
cudaMalloc((void **)&gpu_checksum, count * sizeof(short));
unsigned char *gpu_buff;
size_t gpu_buff_pitch;
/* create a 2d pointer in the GPU's memory space */
cudaMallocPitch((void**)&gpu_buff, &gpu_buff_pitch,
size * sizeof(unsigned char), count);
/* copy our hard-coded data from above into the GPU's
memory space, correctly aligned for 2d access. */
cudaMemcpy2D(gpu_buff, gpu_buff_pitch, &data,
sizeof(unsigned char) * size,
size, count, cudaMemcpyHostToDevice);
/* execute the checksum operation. two threads
of execution will be executed due to the count param. */
inet_checksum<<<1, count>>>(gpu_buff, gpu_buff_pitch, size,
gpu_checksum);
/* copy the results from the GPU's memory to the host's */
cudaMemcpy(&checksums, gpu_checksum,
count * sizeof(short), cudaMemcpyDeviceToHost);
/* clean up the GPU's memory space */
cudaFree(gpu_buff);
cudaFree(gpu_checksum);
for(i = 0; i < count; i++)
printf("Checksum #%d 0x%x\n", i + 1, checksums[i]);
}
return 0;
}
</pre>
<p><strong>Dissection</strong></p>
<p>Phew, alright. There wasn't really all that much to it, but I'm sure many
of you will appreciate some explanation.</p>
<p>I'm sure you know the first directive. The second is obviously
the inclusion of CUDA.</p>
<pre class="Cpp" name="code">
#include <stdio.h>
#include <cuda_runtime.h>
</pre>
<p>The following is what's referred to as a "kernel" in CUDA. It's basically a
function that can execute on a GPU. Note the function is __global__ and
has no return type. The details of the function really aren't the subject of the
article. In this case it calculates the "internet checksum" of the incoming
buff but here's where you'd put your highly-parallelizable, computationally
intensive code.</p>
<p>The pitch will make more sense later as it helps to deal with memory alignment of multi-dimensional data, which is what the buff turns out to be despite being passed as a one-dimensional vector: one row per thread, each eight bytes long.</p>
<p>Also have a look at the threadIdx.x. That's how you can determine which thread you are and can use it to read/write from the correct indexes in vectors, etc.</p>
<pre class="Cpp" name="code">
/* "kernel" to compute an "internet checksum" */
__global__ void inet_checksum(unsigned char *buff, size_t pitch,
int len, unsigned short *checksums) {
int i;
long sum = 0;
/* advance to where this thread's data starts. the pitch
ensured optimal alignment. */
buff += threadIdx.x * pitch;
unsigned short datum;
for(i = 0; i < len / 2; i++) {
datum = *buff++ << 8;
datum |= *buff++;
sum += datum;
}
while (sum >> 16)
sum = (sum & 0xffff) + (sum >> 16);
sum = ~sum;
/* write data back for host */
checksums[threadIdx.x] = (unsigned short)sum;
}
</pre>
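<p>When you're poking at a kernel like this it helps to have a known-good answer to compare against. Below is a small CPU-side reference for the same checksum over the same two eight-byte chunks, sketched in plain JavaScript purely so it can be run anywhere. The <i>inetChecksum</i> function is my own reference version, not part of CUDA or the program above.</p>

```javascript
// CPU-side reference for the "internet checksum" over big-endian 16-bit words.
function inetChecksum(bytes) {
    var sum = 0;

    // add up the data two bytes at a time
    for (var i = 0; i + 1 < bytes.length; i += 2) {
        sum += (bytes[i] << 8) | bytes[i + 1];
    }

    // fold any carry above 16 bits back into the low 16 bits
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }

    // one's complement, truncated to 16 bits
    return ~sum & 0xffff;
}

var data = [
    /* first chunk */
    0xe3, 0x4f, 0x23, 0x96, 0x44, 0x27, 0x99, 0xf3,
    /* second chunk */
    0xe4, 0x50, 0x24, 0x97, 0x45, 0x28, 0x9a, 0xf4
];
var size = 8;

for (var chunk = 0; chunk * size < data.length; chunk++) {
    var checksum = inetChecksum(data.slice(chunk * size, (chunk + 1) * size));
    console.log('Checksum #' + (chunk + 1) + ' 0x' + checksum.toString(16));
}
```

<p>Working the sums by hand, the first chunk adds up to 0x1e4ff, folds to 0xe500 and complements to 0x1aff; the second comes out to 0x16fb. Those are the values I'd expect the GPU version to report for this sample data.</p>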
<p>Getting the party started. Note that this indicates that we have two elements
of eight bytes apiece with the size and count variables. They'll carve up
our hard-coded data.</p>
<pre class="Cpp" name="code">
int main (int argc, char **argv) {
int device_count;
int size = 8;
int count = 2;
unsigned short checksums[count];
int i;
</pre>
<p>Now here's the data we're going to checksum. We'll actually be treating this as
two distinct values later. The first eight bytes will be checksummed on one GPU
thread while the second eight bytes are checksummed on another.</p>
<pre class="Cpp" name="code">
unsigned char data[16] = {
/* first chunk */
0xe3, 0x4f, 0x23, 0x96, 0x44, 0x27, 0x99, 0xf3,
/* second chunk */
0xe4, 0x50, 0x24, 0x97, 0x45, 0x28, 0x9A, 0xf4};
</pre>
<p>The comment says it all. We're just asking CUDA how many devices it can find.
We could then use that information later to distribute load to GPUs.</p>
<pre class="Cpp" name="code">
/* ask cuda how many devices it can find */
cudaGetDeviceCount(&device_count);
</pre>
<p>For the most part we'll ignore it, however. We will make sure at least one was
found as there's no point to all this if we can't slap our load on a GPU!
Assuming a GPU was found we'll call cudaSetDevice to direct CUDA to run our
GPU routines there.</p>
<pre class="Cpp" name="code">
if(device_count < 1) {
/* if it couldn't find any fail out */
fprintf(stderr, "Unable to find CUDA device\n");
} else {
/* for the sake of this example just use the first one */
cudaSetDevice(0);
</pre>
<p>Now I'll create a vector for the checksums to be written into by our
"kernel". Think of the cudaMalloc as a typical malloc call except the memory
is reserved in the GPU's space. We won't directly access that memory. Instead
we'll copy in and out of it. The use of count indicates that it'll have room
for two unsigned short values.</p>
<pre class="Cpp" name="code">
unsigned short *gpu_checksum;
/* create a place for the results to be stored in the GPU's
memory space. */
cudaMalloc((void **)&gpu_checksum, count * sizeof(short));
</pre>
<p>Here's some more allocation but in this case it's using a pitch. This is for
the memory we'll write our workload into. We're using cudaMallocPitch because
this data is essentially two dimensional and the pitch facilitates optimal
alignment in memory. It's basically allocating two rows of eight byte columns.</p>
<pre class="Cpp" name="code">
unsigned char *gpu_buff;
size_t gpu_buff_pitch;
/* create a 2d (pitched) buffer in the GPU's memory space */
cudaMallocPitch((void**)&gpu_buff, &gpu_buff_pitch,
size * sizeof(unsigned char), count);
</pre>
<p>Now cudaMemcpy2D will shove the workload into the two-dimensional buffer we
allocated above. Think memcpy for the GPU. Care is taken to specify the
dimensions of the data with the pitch, size and count.
The cudaMemcpyHostToDevice parameter directs the data to the GPU's memory space
rather than from it.</p>
<pre class="Cpp" name="code">
/* copy our hard-coded data from above into the GPU's
memory space, correctly aligned for 2d access. */
cudaMemcpy2D(gpu_buff, gpu_buff_pitch, &data,
sizeof(unsigned char) * size,
size, count, cudaMemcpyHostToDevice);
</pre>
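<p>If the pitch still feels abstract, here's a rough simulation of a pitched allocation and a 2D copy in plain Python. This is only a sketch: the helper names alloc_pitched and memcpy_2d are my own, the 512-byte alignment is a made-up value (the real pitch is chosen by CUDA and handed back by cudaMallocPitch), and the data is a stand-in for our sixteen bytes.</p>

```python
def alloc_pitched(width, height, alignment=512):
    """Mimic a pitched allocation: each row is padded up to `alignment` bytes.

    The alignment of 512 is a made-up value for illustration; the real
    pitch is chosen by CUDA and returned by cudaMallocPitch.
    """
    pitch = -(-width // alignment) * alignment  # round width up to a multiple
    return bytearray(pitch * height), pitch


def memcpy_2d(dst, dpitch, src, spitch, width, height):
    """Copy `height` rows of `width` bytes between buffers with different pitches."""
    for row in range(height):
        dst[row * dpitch:row * dpitch + width] = src[row * spitch:row * spitch + width]


size, count = 8, 2
data = bytes(range(16))  # stand-in for the 16 bytes of host data
gpu_buff, pitch = alloc_pitched(size, count)
memcpy_2d(gpu_buff, pitch, data, size, size, count)

# row N starts at N * pitch, which is what "buff += threadIdx.x * pitch" relies on
row1 = bytes(gpu_buff[1 * pitch:1 * pitch + size])
print(pitch, list(row1))
```

<p>The source is tightly packed (its pitch equals its width) while the destination rows are padded, which is why the copy needs both pitches as well as the width and height.</p>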
<p>Here's the money. See the <<<..., ...>>> business? The first argument is
"blocks per grid" but I'll leave NVIDIA to explain that one to you in <a href="http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/NVIDIA_CUDA_Programming_Guide_2.3.pdf">the CUDA C Programming Guide</a>. The second
argument indicates how many threads will be spawned. Like I said, this is all
about parallelism. Consider our inet_checksum "kernel" hereby invoked twice
in parallel!</p>
<pre class="Cpp" name="code">
/* execute the checksum operation. two threads
will be launched in a single block due to the count param. */
inet_checksum<<<1, count>>>(gpu_buff, gpu_buff_pitch, size,
gpu_checksum);
</pre>
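<p>To make the two launch arguments a little more concrete, here's a small Python sketch that enumerates the threads of a one-dimensional launch. The helper name launch_indices is hypothetical. The global index formula, blockIdx.x * blockDim.x + threadIdx.x, is the usual CUDA convention; our kernel gets away with threadIdx.x alone only because it runs in a single block.</p>

```python
def launch_indices(blocks_per_grid, threads_per_block):
    """List (blockIdx.x, threadIdx.x, global index) for a 1-D launch."""
    return [
        (block, thread, block * threads_per_block + thread)
        for block in range(blocks_per_grid)
        for thread in range(threads_per_block)
    ]


# the launch in this article: <<<1, count>>> with count == 2
for block, thread, global_index in launch_indices(1, 2):
    print(block, thread, global_index)
```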
<p>The kernel launch itself returns immediately, but the cudaMemcpy that follows
waits for the kernel to finish before copying. At that point we've successfully
executed our logic on a GPU! The results are still sitting in the GPU's memory
space, however, so we'll copy them out with cudaMemcpy while specifying
cudaMemcpyDeviceToHost for the direction. The results are then in the checksums vector.</p>
<pre class="Cpp" name="code">
/* copy the results from the GPU's memory to the host's */
cudaMemcpy(&checksums, gpu_checksum,
count * sizeof(short), cudaMemcpyDeviceToHost);
</pre>
<p>CUDA has its own allocation and copying routines and, of course, its own
clean-up. We'll be good citizens and use it here.</p>
<pre class="Cpp" name="code">
/* clean up the GPU's memory space */
cudaFree(gpu_buff);
cudaFree(gpu_checksum);
</pre>
<p>Might as well let the user know the results, no?</p>
<pre class="Cpp" name="code">
for(i = 0; i < count; i++)
printf("Checksum #%d 0x%x\n", i + 1, checksums[i]);
}
return 0;
}
</pre>
<p><strong>Compiling and Execution</strong></p>
<p>Assuming you've installed the CUDA SDK according to the <a href="http://developer.nvidia.com/object/cuda_3_2_downloads.html">documentation</a>
you can compile with:</p>
<pre class="output_blog">
> nvcc -o yourprogram yoursourcefile.cu
</pre>
<p>and execution produces:</p>
<pre class="output_blog">
> ./yourprogram
Checksum #1 0x1aff
Checksum #2 0x16fb
</pre>
<p>The .cu extension is what nvcc expects for CUDA source files.</p>
<p><strong>Conclusion</strong></p>
<p>There you have it. Execution of your own logic on a GPU.</p>
<p>Where to go from here? Well, this barely scratched the surface but
<a href="http://www.nvidia.com/object/cuda_home_new.html">NVIDIA's CUDA Zone</a>
site is the starting point to much more.</p>
<p><a href="http://gpgpu.org/">GPGPU.org</a> is also a more platform-independent source of
information on general-purpose GPU computing.</p>
http://www.chrisumbel.com/article/cuda_nvidia_c_hpc_parallelSun, 09 Jan 2011 03:01:45 GMTSolr Data Access in Ruby with SolrMapperhttp://www.chrisumbel.com/article/ruby_solrmapper_solr_mapper
<a href="http://www.theskunkworx.com/"><img src="http://c243025.r25.cf1.rackcdn.com/skunkworx_logo.png" border="0" alt="Skunkworx Logo" align="right"/></a>Recently my employer (The <a href="http://www.theskunkworx.com/">Skunkworx</a>) released our first open source project, <a href="http://github.com/skunkworx/solr_mapper/ ">SolrMapper</a>. Like the readme says, it's a Ruby Object Document Mapper for the <a href="http://www.apache.org">Apache Foundation</a>'s <a href="http://lucene.apache.org/solr/">Solr</a> search platform. It's loosely patterned after ActiveRecord and <a href="http://github.com/jnunemaker/mongomapper">MongoMapper</a> so it should feel somewhat familiar.
<p>What differentiates SolrMapper from many other Solr libraries is that it's not necessarily dependent on another form of persistence a la <a href="http://github.com/railsfreaks/acts_as_solr">acts_as_solr</a>. It could certainly allow you to use Solr as a stand-alone, general purpose data store if that floats your boat.</p>
<p><strong>Installing</strong></p>
<pre class="output_blog">
gem install solr_mapper
</pre>
<p><strong>Examples</strong></p>
<p>I might as well get started with a simple example of a model. This example assumes a Solr index located at http://localhost:8080/solr/article with an abbreviated schema of:</p>
<pre class="xml" name="code">
<fields>
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="title" type="text" indexed="true" stored="true"/>
<field name="content" type="text" indexed="true" stored="false" multiValued="true"/>
</fields>
</pre>
<p>and the model</p>
<pre class="ruby" name="code">
require 'solr_mapper'
class Article
include SolrMapper::SolrDocument
bind_service_url 'http://localhost:8080/solr/article'
end
</pre>
<p>Not much to it. bind_service_url simply indicates the URL of the Solr instance housing our data. There's no real schema definition here. SolrDocument is just mixed-in and the model is pointed at an index.</p>
<p><strong>Creating</strong></p>
<p>I could then create an article as such:</p>
<pre class="ruby" name="code">
# assumes the uuid gem is installed
require 'uuid'
id = UUID.new().generate()
article = Article.new(
:_id => id,
:title => 'This is a sample article',
:content => 'Here is some sample content for our sample article.'
)
article.save()
</pre>
<p>Note the field identified by the symbol :_id. This field is somewhat special. SolrMapper prepends the underscore for this field only, as it maps to "id" in Solr. The remainder of the fields map straight to their respective fields in Solr.</p>
<p><strong>Querying</strong></p>
<p>I could then retrieve the article we just saved like:</p>
<pre class="ruby" name="code">
article = Article.find(id)
#print out the article's title
puts article.title
</pre>
<p>producing the output</p>
<pre class="output_blog">
This is a sample article
</pre>
<p>Of course the whole point of Solr is rich, full-text searches. To demonstrate I'll need some additional sample data.</p>
<pre class="ruby" name="code">
Article.new(
:_id => UUID.new().generate(),
:title => 'Yet another article',
:content => 'Have another one.'
).save
Article.new(
:_id => UUID.new().generate(),
:title => 'Number three',
:content => 'The third in a string of three.'
).save
</pre>
<p>The simplest way to accomplish this in SolrMapper is via a model's query method, which can accept a typical Solr query string.</p>
<pre class="ruby" name="code">
articles = Article.query('title:article')
articles.each do |article|
puts article.title
end
</pre>
<p>producing</p>
<pre class="output_blog">
This is a sample article
Yet another article
</pre>
<p>It's also possible to pass in the query in a more structured fashion via a hash.</p>
<pre class="ruby" name="code">
articles = Article.query({:title => 'article'})
articles.each do |article|
puts article.title
end
</pre>
<p>producing the same results as the search example above.</p>
<p>Additional Solr parameters can be passed in as well such as this title sort.</p>
<pre class="ruby" name="code">
articles = Article.query({:title => 'article'}, {:sort => 'title desc'})
</pre>
<p><strong>Updating</strong></p>
<p>Updating a portion of a live record can be accomplished with the familiar update_attributes method:</p>
<pre class="ruby" name="code">
article.update_attributes(:title => 'An updated title')
</pre>
<p><strong>Pagination</strong></p>
<p>Pretty much any traditional web app that lists or searches entities needs paging. SolrMapper
integrates will_paginate to accomplish this.</p>
<pre class="ruby" name="code">
articles_page = Article.paginate({:title => 'article'})
</pre>
<p>On to page 2 with:</p>
<pre class="ruby" name="code">
articles_page = Article.paginate({:title => 'article'}, {:page => 2})
</pre>
<p>The page size can be modified with:</p>
<pre class="ruby" name="code">
articles_page = Article.paginate({:title => 'article'}, {:rows => 5, :page => 2})
</pre>
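<p>For the curious, Solr itself pages with the start and rows request parameters. Here's a minimal Python sketch of the arithmetic. It assumes that SolrMapper's :page and :rows options translate to Solr parameters this way, which I haven't shown here; the function name solr_page_params is hypothetical, and the default of 10 rows mirrors Solr's own default rather than anything SolrMapper-specific.</p>

```python
from urllib.parse import urlencode


def solr_page_params(query, page=1, rows=10):
    """Build the query string for one page of Solr results.

    Solr expects a zero-based offset in `start`, so page 1 starts at 0.
    """
    start = (page - 1) * rows
    return urlencode({"q": query, "start": start, "rows": rows})


# page 2 with five rows per page skips the first five documents
print(solr_page_params("title:article", page=2, rows=5))
# q=title%3Aarticle&start=5&rows=5
```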
<p><strong>Relationships</strong></p>
<p>Solr itself isn't relational and separate indexes are, well, separate. SolrMapper allows you to support
rudimentary ActiveRecord-esque relationships. Remember though, there is no low-level joining going on. This all happens at the Ruby level so you must be judicious.</p>
<pre class="ruby" name="code">
class Biography
include SolrMapper::SolrDocument
has_many :articles
bind_service_url 'http://localhost:8080/solr/bio'
end
class Article
include SolrMapper::SolrDocument
belongs_to :biography
bind_service_url 'http://localhost:8080/solr/article'
end
Biography.find('SOME UUID').articles.each do |article|
puts article.title
end
</pre>
<p><strong>Auto-Generated IDs</strong></p>
<p>As of 0.1.9 solr_mapper can auto-generate the id field with a UUID when you define your model as follows.</p>
<pre class="ruby" name="code">
class Item
include SolrMapper::SolrDocument
# tells solr_mapper to generate the id
auto_generate_id
bind_service_url 'http://localhost:8080/solr/item'
end
</pre>
<p><strong>Conclusion</strong></p>
<p>SolrMapper is intended to be simple and familiar and I hope we've accomplished that. We're looking for help either in patches or feedback, especially because we're just getting started here. If you have either please contact us <a href="http://github.com/skunkworx">at github</a>.</p>
http://www.chrisumbel.com/article/ruby_solrmapper_solr_mapperTue, 26 Oct 2010 02:08:14 GMTUsing MongoDB as a Backend for Django with django-mongodb-enginehttp://www.chrisumbel.com/article/django_python_mongodb_engine_mongo
<a href="http://mongodb.org"><img src="http://c243025.r25.cf1.rackcdn.com/mongodb.png" align="right" alt="MongoDB logo" border="0"/></a>I've been pretty taken with <a href="http://www.mongodb.org/">MongoDB</a> of late.
It's nearly disgusting how productive it is. However, like all database
systems it's only as productive as the higher-level systems that interface with it.
<p>Personally I've used it primarily from Java and Ruby on Rails (via
<a href="http://github.com/jnunemaker/mongomapper">MongoMapper</a>) and from Python via
<a href="http://pypi.python.org/pypi/pymongo/1.9">PyMongo</a>.</p>
<p>PyMongo essentially exposes MongoDB via Python dictionaries. Sure, it's
plenty elegant and plenty pythonic but when it came to <a href="http://www.djangoproject.com/">django</a>
what I wanted was something more MongoMapper-like, an honest-to-goodness Object-Document-Mapper.</p>
<p><a href="http://www.djangoproject.com/"><img src="http://c243025.r25.cf1.rackcdn.com/django.jpg" align="left" alt="django logo" border="0"/></a>Months and months ago when I looked into the existence of a MongoDB driver for django
all that turned up was dead, false-start projects, but after revisiting it
recently <a href="http://github.com/aparo/django-mongodb-engine">django-mongodb-engine</a>
came to my attention. django-mongodb-engine is
pretty much exactly what I was looking for. The authors describe it as,
"a database backend that adds mongodb support to django". In this post I intend to introduce it to you.</p>
<p>I'm going to assume you're comfortable getting a django application started.
If that's not the case please check out
<a href="http://docs.djangoproject.com/en/1.2/intro/overview/">the official getting-started docs</a>.</p>
<p><strong>Requirements</strong></p>
<p>In order to leverage MongoDB from django you'll need the following software
installed and operating:</p>
<ul>
<li>
<a href="http://www.python.org/">Python</a> - If you're reading this article odds are you already have it.
</li>
<li>
<a href="http://www.mongodb.org/">MongoDB</a> - I guess this is somewhat self
explanatory, but you'll need MongoDB itself.
</li>
<li>
<a href="http://www.allbuttonspressed.com/projects/django-nonrel">django-nonrel</a> -
a special version of django designed for use with non-relational database engines in general.
</li>
<li>
<a href="http://github.com/aparo/djangotoolbox">djangotoolbox</a> - a general purpose utility library upon which
django-mongodb-engine depends
</li>
<li>
<a href="https://github.com/django-nonrel/mongodb-engine">mongodb-engine</a> - the MongoDB driver for django.
</li>
</ul>
<p><strong>Application</strong></p>
<p>With the infrastructure in place, I'll go ahead and create a django project named
"testproj" with an application named "testapp".</p>
<pre class="output_blog">
django-admin.py startproject testproj
cd testproj/
django-admin.py startapp testapp
</pre>
<p><strong>Setup</strong></p>
<p>Naturally the django project must be configured to talk to a specific database
in settings.py.</p>
<pre class="python" name="code">
DATABASES = {
'default': {
'ENGINE': 'django_mongodb_engine',
'NAME': 'mydatabase',
'USER': '',
'PASSWORD': '',
'HOST': 'localhost',
'PORT': '27017',
'SUPPORTS_TRANSACTIONS': False,
},
}
</pre>
<p>
<em>Edit 2012-11-20: Older versions may require ENGINE to be 'django_mongodb_engine.mongodb'.</em>
</p>
<p><strong>Models</strong></p>
<p>In said application you could create a model in models.py like</p>
<pre class="python" name="code">
from django.db import models

class Article(models.Model):
    title = models.CharField(max_length = 64)
    content = models.TextField()
</pre>
<p>It looks like a standard old django model, right? Nothing fancy here, just a
plain old model with plain old fields.</p>
<p>Note that it's important not to create an AutoField named "id" or things will
blow up when saving. That's because Mongo wants to put a proper ObjectId in
there.</p>
<p><strong>Saving</strong></p>
<p>We can then save some model data straight away from a django view.</p>
<pre class="python" name="code">
from django.http import HttpResponse
from models import *

def testview(request):
    article = Article(title = 'test title',
                      content = 'test content')
    article.save()
    return HttpResponse("<h1>Saved!</h1>")
</pre>
<p>If you then peer into Mongo with a native javascript query like</p>
<pre class="javascript" name="code">
db.testapp_article.find()
</pre>
<p>you'll find your document returned</p>
<pre class="output_blog">
{ "_id" : ObjectId("4cb4f9a01a8ff904fa000001"), "content" : "test content", "title" : "test title" }
</pre>
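<p>The collection name isn't arbitrary. Django's default table name is the app label, an underscore and the lowercased model name, and judging by the query above the backend reuses that for the collection. A tiny sketch of the convention follows; the helper name default_collection_name is hypothetical and this is an illustration of the naming rule, not code from django-mongodb-engine.</p>

```python
def default_collection_name(app_label, model_name):
    """Mirror Django's default db_table convention: <app_label>_<modelname>."""
    return "%s_%s" % (app_label, model_name.lower())


# the Article model in the testapp application
print(default_collection_name("testapp", "Article"))  # testapp_article
```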
<p><strong>Querying</strong></p>
<p>Of course it's a simple matter to query Mongo from django to retrieve a list
of Article objects just like you would with a relational store.</p>
<pre class="python" name="code">
articles = Article.objects.all()
</pre>
<p><strong>Embedding Documents</strong></p>
<p>Many document-database-esque features are covered as well, but I'll just touch
on one here. With a minor alteration to our model</p>
<pre class="python" name="code">
from django.db import models
from django_mongodb_engine.mongodb.fields import EmbeddedModel
from django_mongodb_engine.fields import ListField

class Comment(EmbeddedModel):
    name = models.CharField(max_length = 160)
    content = models.TextField()

class Article(models.Model):
    title = models.CharField(max_length = 160)
    content = models.TextField()
    comments = ListField()
</pre>
<p>then we can embed Comment documents into Articles.</p>
<pre class="python" name="code">
article = Article(title = 'test title',
content = 'test content')
# Comment defines "name" and "content" fields
article.comments.append(Comment(name = 'alice', content = 'foo bar'))
article.comments.append(Comment(name = 'bob', content = 'fun baz'))
article.save()
</pre>
<p><strong>Conclusion</strong></p>
<p>Thanks to the hard work of others it's a simple matter for us to use MongoDB as
a backend for django. To see some specific examples of non-relational-style features check out the
<a href="http://github.com/aparo/django-mongodb-engine/tree/master/tests/testproj/">tests from the django-mongodb-engine project</a>.</p>
http://www.chrisumbel.com/article/django_python_mongodb_engine_mongoWed, 13 Oct 2010 01:14:48 GMTRuby in my Enterprise with JRuby Thanks to JRubyConfhttp://www.chrisumbel.com/article/jruby_ruby_enterprise
<a href="http://jruby.org/"><img src="http://c243025.r25.cf1.rackcdn.com/jruby.jpg" align="right" border="0" alt="JRuby logo"/></a>
Last weekend I went to <a href="http://jrubyconf.com/">JRubyConf</a> and had a blast. I left armed with some new
knowledge, some new contacts and a new academic appreciation for /whiske?y/. One of the
best parts is that I really only had to pay for the hotel because I won the ticket
at a <a href="http://pghrb.org/">Pittsburgh Ruby Brigade</a> meeting.
<p>Now, even though I was attending a <a href="http://jruby.org/">JRuby</a> conference I wasn't really all that
familiar with JRuby itself. Of course I understood that it was a port of Ruby
to the JVM thus facilitating use of existing Java code. I understood that JRuby has favorable concurrency characteristics
due to the lack of a global interpreter lock. But I always assumed it was
generally immature and difficult to work with.</p>
<p>Wow, was I wrong. It turns out that it's robust, solves many of the <a href="http://www.ruby-lang.org/en/">Ruby</a>
(and <a href="http://rubyonrails.org/">Rails</a>) problems
I've been having and simplifies tasks I figured I'd have to simplify myself.</p>
<p>It only took two key advantages for me to turn our Rails department on a dime and start JRuby adoption: great x64
Windows support and .war file deployment of Rails apps.</p>
<p><strong>Excellent 64-bit Windows Support</strong></p>
<p>This took me off guard and I felt very stupid for not thinking of it earlier. JRuby runs
great on every meaningful, modern platform within my enterprise. This stems from the proliferation
of perfectly solid 100% pure Java libraries and drivers that exist for nearly any task including
rock solid JDBC drivers for database connectivity.</p>
<p>Personally, I only use 64-bit Macs and 32-bit Linux boxes at the office, but a number of people who
contribute to my projects are using 64-bit Windows machines, and... well... aren't going to change.
Even though the MRI's quite happy on the platforms I use personally, many gems that involve native code have proven
to be unstable on Windows x64, thus screwing my coworkers (therefore annoying me). For instance it took quite a bit of
hacking around to get a working mysql gem on my boss's workstation even though other x64 windows machines seemed to
be ok with the same libmysql.dll.</p>
<p>With JRuby I never had to worry about it. I used a rock solid JDBC driver with an ActiveRecord
adapter and every machine everywhere was happy... Happy and FAST!</p>
<p>I guess I always operated under the assumption that the MRI was as portable as Ruby runtimes will get.
At its core it's pretty portable I suppose, but it's crippled by severe reliance on native
code extensions. JRuby, however, is largely free of such concerns as the Java extensions are
plenty fast enough and incredibly portable.</p>
<p><strong>Easy .war File Deployment of Rails Applications</strong></p>
<p>I was also caught off guard by JRuby on Rails' capacity to deploy applications as .war files.</p>
<p>My surprise is somewhat more understandable here, though. I'm not really a Java guy
despite having been immersed in it recently. Sure, I hacked out plenty of Java from
1995-ish to 2001-ish but my head was never in Java Servlet-based web apps. I've only
deployed other people's .war files, namely Solr, which felt more third-party-appliance-like. It wasn't something I was looking for and therefore didn't know that I wanted it.</p>
<p>As soon as I heard a speaker indicate I could wrap up the JRuby runtime itself along
with whatever gems my rails project needs with a single “warble” command I was sold.
That's some easy deployment!</p>
<p>All you have to do is install the <a href="http://kenai.com/projects/warbler/pages/Home">warbler</a> gem</p>
<pre class="output_blog">jruby -S gem install warbler</pre>
<p>and then type</p>
<pre class="output_blog">warble</pre>
<p>in your rails project's directory. A .war file will then be produced that you can deploy into the servlet container of your choice.</p>
<p><strong>Conclusion</strong></p>
<p>These two pieces of knowledge have already had a wonderful impact on productivity at the office and it's
barely been a week. Even if I had to pay for the ticket it would have been worth it.</p>
http://www.chrisumbel.com/article/jruby_ruby_enterpriseThu, 07 Oct 2010 04:36:20 GMTRich-Style Formatting of an Android TextViewhttp://www.chrisumbel.com/article/android_textview_rich_text_spannablestring
<a href="http://www.android.com/"><img src="http://c243025.r25.cf1.rackcdn.com/android.png" border="0" align="right" alt="android logo"/></a>
Even a developer-friendly mobile platform like
<a href="http://www.android.com/">Android</a> can leave a developer
feeling a little lost when trying to perform simple tasks on an
unfamiliar platform.
<p>One of these simple, however poorly documented, tasks is rich-style text formatting
within a <a href="http://developer.android.com/reference/android/widget/TextView.html">TextView</a>.</p>
<p><strong>SpannableString</strong></p>
<p>While it's possible to set a TextView's text property to a simple String and
configure the TextView to have the formatting you desire, you're then limited
in how granularly you can control the formatting within the TextView itself. The
<a href="http://developer.android.com/reference/android/text/SpannableString.html">SpannableString</a>
class allows you to easily format certain pieces (spans) of a string
one way and other pieces another by applying extensions of
CharacterStyle (e.g. <a href="http://developer.android.com/reference/android/text/style/ForegroundColorSpan.html">ForegroundColorSpan</a>) via the
<a href="http://developer.android.com/reference/android/text/Spannable.html#setSpan(java.lang.Object, int, int, int)">setSpan</a> method.</p>
<p>In the end this isn't limited to formatting. It also allows the developer
to add behaviors to spans such as reacting to click events.</p>
<p><strong>Example</strong></p>
<p>Here's an example <a href="http://developer.android.com/reference/android/app/Activity.html#onCreate(android.os.Bundle)">onCreate</a>
method of an <a href="http://developer.android.com/reference/android/app/Activity.html">Activity</a>. This assumes there's a main.xml
layout with a TextView identified by "rich_text".</p>
<p>Essentially this code will set a TextView's text to the familiar,
"Lorem ipsum dolor sit amet" and perform the following formatting:</p>
<p>
<ul>
<li>Make "Lorem" red</li>
<li>Make "ipsum" 1.5 times bigger than the TextView's setting</li>
<li>Make "dolor" display a toast message when touched</li>
<li>Strike through "sit"</li>
<li>Make "amet" twice as big as the TextView's setting, green
and a link to this site</li>
</ul>
</p>
<pre class="java" name="code">
@Override
public void onCreate(Bundle icicle) {
super.onCreate(icicle);
setContentView(R.layout.main);
TextView richTextView = (TextView)findViewById(R.id.rich_text);
// this is the text we'll be operating on
SpannableString text = new SpannableString("Lorem ipsum dolor sit amet");
// make "Lorem" (characters 0 to 5) red
text.setSpan(new ForegroundColorSpan(Color.RED), 0, 5, 0);
// make "ipsum" (characters 6 to 11) one and a half times bigger than the textbox
text.setSpan(new RelativeSizeSpan(1.5f), 6, 11, 0);
// make "dolor" (characters 12 to 17) display a toast message when touched
final Context context = this;
ClickableSpan clickableSpan = new ClickableSpan() {
@Override
public void onClick(View view) {
Toast.makeText(context, "dolor", Toast.LENGTH_LONG).show();
}
};
text.setSpan(clickableSpan, 12, 17, 0);
// make "sit" (characters 18 to 21) struck through
text.setSpan(new StrikethroughSpan(), 18, 21, 0);
// make "amet" (characters 22 to 26) twice as big, green and a link to this site.
// it's important to set the color after the URLSpan or the standard
// link color will override it.
text.setSpan(new RelativeSizeSpan(2f), 22, 26, 0);
text.setSpan(new URLSpan("http://www.chrisumbel.com"), 22, 26, 0);
text.setSpan(new ForegroundColorSpan(Color.GREEN), 22, 26, 0);
// make our ClickableSpans and URLSpans work
richTextView.setMovementMethod(LinkMovementMethod.getInstance());
// shove our styled text into the TextView
richTextView.setText(text, BufferType.SPANNABLE);
}
</pre>
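<p>The start and end arguments to setSpan are a start index and an exclusive end index, which is why "Lorem" is 0 to 5 rather than 0 to 4. Hard-coding them gets fragile quickly, so here's a small sketch of how they could be computed instead. It's written in Python purely for brevity (the same arithmetic applies with String.indexOf in Java), the helper name word_spans is my own, and it assumes the words are separated by single spaces and don't repeat.</p>

```python
def word_spans(text):
    """Map each space-separated word to its (start, end) range, end exclusive."""
    spans = {}
    position = 0
    for word in text.split(" "):
        spans[word] = (position, position + len(word))
        position += len(word) + 1  # skip the word and the space after it
    return spans


for word, (start, end) in word_spans("Lorem ipsum dolor sit amet").items():
    print(word, start, end)
```

<p>The ranges this yields line up with the hard-coded indexes in the onCreate method above.</p>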
<p>The results of which will look something like:</p>
<p><img src="http://c243025.r25.cf1.rackcdn.com/spannable.png"/></p>
<p>
Note that we set the TextView's movement method to a
<a href="http://developer.android.com/reference/android/text/method/LinkMovementMethod.html">LinkMovementMethod</a> instance.
Without that the
<a href="http://developer.android.com/reference/android/text/style/ClickableSpan.html">ClickableSpan</a>
and <a href="http://developer.android.com/reference/android/text/style/URLSpan.html">URLSpan</a>s won't perform their intended actions.</p>
<p><strong>Next Steps</strong></p>
<p>This covers the fundamental concepts, but there are many extensions of
<a href="http://developer.android.com/reference/android/text/style/CharacterStyle.html">CharacterStyle</a>
I haven't covered here. Check out
<a href="http://developer.android.com/reference/android/text/style/CharacterStyle.html">the CharacterStyle documentation</a>
for more details.</p>
<p>Also note that a <a href="http://developer.android.com/reference/android/text/SpannableStringBuilder.html">SpannableStringBuilder</a>
is provided for building large spannables from smaller pieces.</p>
http://www.chrisumbel.com/article/android_textview_rich_text_spannablestringSat, 28 Aug 2010 18:25:02 GMT