Dependencies

Gzip (Installed by default on most *NIX systems. Pull from distro repos otherwise.)

Node.js (For testing stuff on your localhost. Not required if you have/prefer some other server for delivering static content.)

Stomach for shell scripts

Prologue

I wanted to split my web application into two distinct components:

A client-side, JS-driven presentation layer.

A lightweight, REST-based backend.

I’ve had to sort out a lot of issues to get the two to cooperate while running on different servers under different domains, to use digest-based authentication instead of cookies (REST is stateless), and so on, but that’s another post. This one focuses on efficiently delivering the UI portion – HTML + CSS + JS + Media – which from a server POV is static content.

Preparing AngularJS Scripts for Minification

The AngularJS docs provide some information on how to prepare controllers for minification here. Quoting from the page:

Since angular infers the controller’s dependencies from the names of arguments to the controller’s constructor function, if you were to minify the JavaScript code for PhoneListCtrl controller, all of its function arguments would be minified as well, and the dependency injector would not be able to identify services correctly.

PhoneListCtrl is part of the angular-phonecat application, used for driving the on-site tutorial.

Basically, every controller defined by your application needs to be explicitly injected with whatever dependencies it has. For the example above, it looks something like:

PhoneListCtrl.$inject = ['$scope', '$http'];

There is one more way defined on the site, but I prefer the method above.

However, this is not enough to get minified scripts working right. YUI Compressor changes closure parameter names, and this doesn’t go down well with Angular. You need to use inline annotations in defining custom services. You can find a usage example here.

Additionally, you can collate all content from controllers.js, directives.js, services.js and filters.js into app.js to reduce the number of calls made to the server. Don’t forget to modify your index.html / index-async.html to reflect this change.
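One way to sketch that collation step (file names as in angular-seed; `collate_js` is a name made up for this example):

```shell
#!/bin/bash
# Concatenate the seed's script files into a single app.js. Order matters:
# app.js defines the Angular module, so it must come first; the others
# attach controllers, directives, filters and services to that module.
collate_js() {
  ( cd "$1" || return 1
    cat app.js controllers.js directives.js filters.js services.js > app.combined.js &&
    mv app.combined.js app.js )
}

# Demo on a scratch copy so no real files are overwritten:
demo=$(mktemp -d)
for f in app controllers directives filters services; do
  echo "// $f" > "$demo/$f.js"
done
collate_js "$demo"
head -1 "$demo/app.js"
```

Run against your real app/js folder only after checking the file list matches your project.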

The Build Script

If you’re sticking to the folder structure provided by angular-seed, you’ll have an app folder in your project root. Adjacent to this, create a build folder to contain the minified and compressed output files generated by the build script. You can tell git to ignore this folder by adding the following line to .gitignore:

/build/*

You can put your build script anywhere you like, and run it from anywhere in the project folder. I have put it inside the conveniently provided scripts folder.

Once you run this script, every app/file.[css | js] will have a working copy at build/file.min.[css | js]. Every other file in the app folder will be either:

compressed and copied (name unchanged) into the build folder if it is a text file, or

simply copied into the build folder if it is a binary file (like an image).
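The compress-or-copy half of that build step can be sketched as follows (minification via YUI Compressor is assumed to have produced the .min files already, and text vs. binary is judged by extension here, which is a simplification):

```shell
#!/bin/bash
# Mirror the source tree into the build tree: gzip text files under the
# same name, plain-copy everything else (images and other binaries).
build() {
  local src="$1" dst="$2"
  mkdir -p "$dst"
  find "$src" -type f | while read -r f; do
    out="$dst/${f#$src/}"
    mkdir -p "$(dirname "$out")"
    case "$f" in
      *.css|*.js|*.html|*.txt)
        gzip -c -9 "$f" > "$out" ;;   # compressed, name unchanged
      *)
        cp "$f" "$out" ;;             # binary: copied as-is
    esac
  done
}
```

Invoked as `build app build` from the project root (hypothetical layout).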

Your CSS and JS references need to be updated to their corresponding min versions in index.html / index-async.html.

Now that you’ve got a compressed, minified version of your app in the build folder, you can deploy it to any static server. But you do need to set your HTTP response headers properly, or the browser WILL show garbage. Most importantly, any compressed content must be served with the HTTP response header:

Content-Encoding: gzip

Additionally, for every file that is static content, it makes sense to set a far future date using an Expires header similar to the following:

Expires: Thu, 31 Dec 2037 20:00:00 GMT

The NodeJS web-server.js Script

The contents of the build folder are technically ready to be uploaded to any web server, but you will probably want to run the app from your localhost first to check that everything works fine. The built-in web-server.js is very useful for quickly launching and testing your app, but it needs a few mods in order to serve the compressed content from the build folder correctly. The Content-Encoding header is sufficient to render the page correctly, but if you’re a stickler for good YSlow grades even on your localhost, you will want to add the Expires headers as well. Search for the relevant response codes in your web-server.js and set both headers at those points.


On some networks, I need to connect to a (firewalled) intranet over wired ethernet, while general unrestricted network access is available over WiFi. Typically I need to stay connected to both networks so as to access machines on the LAN as well as the WWW. Trouble is (at least on my F17 machines) the system is configured to use the ethernet interface (if live) by default for all outbound requests, regardless of whether the WiFi is enabled or not.

This is not a convenient situation, as the LAN is often configured to block requests going outside the local subnet. This means that every time I want to go online, I first have to disable my ethernet interface! The source of this endless bother can be traced to the way the system has set up its routing. Just fire up a terminal and issue the following command to get your current routes. In one such run I get the following output:

This tells me that the default route for all outbound requests (those that do not specifically match any other rule) is through Iface p1p1 (ethernet or wired LAN). I need this to be set to wlan0 (WiFi) instead.

That is done (as root) by first deleting the existing default route, followed by adding a new rule to route default requests through WiFi:
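As a sketch, with hypothetical interface names and gateway address (substitute your own, taken from your routing table):

```shell
# Show the current routing table; the 0.0.0.0 / "default" line is the culprit.
route -n

# As root: remove the current default route, then send default traffic
# through WiFi. 192.168.1.1 stands in for your WiFi gateway's address.
route del default
route add default gw 192.168.1.1 dev wlan0
```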

The gateway IP for the default route should be the default gateway for your WiFi.

After these steps, the system will route requests within the LAN through p1p1 (note that this route was already configured for p1p1 in my case and, being stricter than the others, is the first to match) and outbound traffic to non-local addresses through wlan0.


I often develop sites/web applications that provide some common or core functionality at a top level domain and use sub-domains for hosting portals, micro-sites or related apps. While developing these apps on my local machine I might have a dozen or so portals running under sub-domains of a top level local domain. It’s possible to add an entry to the hosts file, one for the top domain and one for each sub-domain, but the number of sub-domains may quickly grow big enough to render this method way too cumbersome.

To get around this hurdle, I run a lightweight DNS server locally, that has support for wildcard sub-domains. It’s called dnsmasq and is available as a standard package on most Linux systems. It installs as a system service, and is configured through the file /etc/dnsmasq.conf (on most rpm-based systems).

Below is a quick round-up of the bare minimum settings you need to enable and configure in order to get up and running:

# Never forward plain names (without a dot or domain part)
domain-needed
# Never forward addresses in the non-routed address spaces.
bogus-priv
# This option only affects forwarding, SRV records originating for
# dnsmasq (via srv-host= lines) are not suppressed by it.
filterwin2k
# Add domains which you want to force to an IP address here.
# The example below sends any host in double-click.net to a local
# web-server.
address=/double-click.net/127.0.0.1
address=/localhost/127.0.0.1
# The IP address to listen on
listen-address=127.0.0.1
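Once dnsmasq is running and 127.0.0.1 is listed first in /etc/resolv.conf, you can check wildcard resolution with dig (the sub-domain names here are arbitrary):

```shell
# Any depth of sub-domain under localhost should resolve to 127.0.0.1,
# courtesy of the address=/localhost/127.0.0.1 line above.
dig +short myapp.localhost @127.0.0.1
dig +short portal.myapp.localhost @127.0.0.1
```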


This is the second post of the BIRT SSO Series wherein I describe the implementation of a single sign-on module for the Eclipse BIRT Report Viewer. This post gets straight into the details of server configuration. It is recommended that you first read the introduction in Part 1 to get acquainted with the background and the premises on which this solution is built.

Part 2: Server & Environment Configuration

I had noted in Part 1 that I hosted my report server under a sub-path of the top level domain. For this there needs to be a form of inter-process communication enabled via mod_jk in order for Apache to pipe requests and responses to and from Tomcat. mod_jk is easy to compile from source, if your particular Linux distribution does not happen to supply it from its package repository.

You’ll need the apxs tool in order to compile the extension. On a Fedora system, this is available in the httpd-devel package. Once you’ve downloaded and extracted the tomcat-connectors source bundle, cd into the native folder and issue the command:

$ ./configure --with-apxs=/usr/sbin/apxs
$ make

Then copy the apache-2.0/mod_jk.so file into /usr/lib[64]/httpd/modules. Edit your httpd.conf file and add the following lines:
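The additions look something like the following sketch (module and log paths vary across distros, and the worker name must match what JkMount references):

```
# httpd.conf
LoadModule jk_module modules/mod_jk.so
JkWorkersFile /etc/httpd/conf/workers.properties
JkLogFile /var/log/httpd/mod_jk.log
JkLogLevel info

# /etc/httpd/conf/workers.properties -- a minimal single-worker setup
worker.list=worker1
worker.worker1.type=ajp13
worker.worker1.host=localhost
worker.worker1.port=8009
```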

This configuration assumes that your Tomcat server is running on the same machine as Apache, but that is not a necessary condition. I’m running my Drupal application under a vhost, so the JkMount directive is placed inside the vhost block. If your application is not deployed under a vhost, the directive can go at the top level of httpd.conf instead.

ServerName yourdomain.com
...
...
JkMount /birt/* worker1

Your Tomcat CATALINA_BASE/server.xml file should contain the following lines:
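At minimum, the AJP connector must be enabled, on the same port your worker definition points at (8009 is the Tomcat default); a sketch:

```
<!-- CATALINA_BASE/conf/server.xml -->
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
```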

NOTE: Although the instructions tell you to install the jar files in JAVA_HOME/lib/security in case you’re running tomcat on a JDK, they must actually be put in JAVA_HOME/jre/lib/security. In case you’re running on a JRE directly, the instructions on the site should work.

This concludes the server and environment setup required for the module to work. Part 3 of this series delves into the details of the module implementation.

This continues from my previous post on the various online storage/sync solutions available today.

I’ve been a Dropbox (and Box, and Google Drive) user for a while now, and like it for its convenience. It is easy to use and setup, and lets you keep multiple devices in sync with next to no effort. However, I’ve always had some concerns over privacy and security issues. In light of the recent attack on the service provider, I started wondering how safe my files and accounts really are (not just with Dropbox, but actually with any online storage solution, including a home-brewed one).

I also have some concerns regarding the privacy of my documents. Say, I’ve got some sensitive data uploaded to an online storage service. Who’s to say these documents are safe from data mining, or (god forbid) human eyes? (I’m not pointing fingers at any individual storage provider here. Some may respect your privacy, others may not.) Many people would be extremely wary of the possibility of information harvesting (even if it is completely anonymized and automated) and/or leakage.

Then of course, there are some less critical, but nevertheless important limitations:

Only x GB of (free) storage space. One can always upgrade to a paid package, but I don’t want to pay for 50 GB of storage when I’m only going to use 10 GB in the foreseeable future. There are services that provide a large amount of storage space for free, but most of them still charge for bandwidth usage beyond a fraction of that amount.

No support for multiple profiles. You have to put EVERYTHING you want to sync under one single top-level folder. This may not be a suitable or acceptable restriction in all situations.

Lack of flexibility – you don’t get to move your repository around if you need to. Once you subscribe to a service, you’re locked into using their storage infrastructure exclusively.

Not all of the limitations described above are present in any single service, nor will each of them concern everybody. These are just a few issues that got me going on a personal quest to find a better alternative.

There are actually quite a few ways of setting up your own personal online storage and sync solution, whose security is limited only by your ability to configure it. But the most visible benefit over any existing service is the flexibility –

to use a storage infrastructure of your choice, and

to manage multiple profiles.

The rest of this post documents my experiments with one such solution, named bitpocket. It performs 2-way sync by using a wrapper script to run rsync twice (once on the master, once on the slave). It can also detect, and correctly propagate file deletions. It does have one limitation in that it doesn’t handle conflict resolution. You have been warned. (Unison is supposedly capable of this, but that is another post ;-).)

The basic setup instructions are right on the project landing page. Follow them and you’re all set. I’ll elaborate on two things here –

how to do a multi-profile setup, and

how to alleviate the problem of repeated remote lockouts when multiple slaves always try to sync at the same time.

Multiple profiles

I’ve got two folders on my laptop that I want to sync:

/home/aditya/scripts

/home/aditya/Documents

I want these two folder profiles to be self-contained, without requiring the tracking to be done at the common parent. Following the instructions on the project page, I did a bitpocket init inside each of the above folders. On the master side (I’m running an EC2 micro-instance on a 64-bit Amazon Linux AMI), I’ve got one folder: /home/ec2-user/syncroot where I want to track all synced profiles. So in the config file of the individual profile folders on the slave machine I set the REMOTE_PATH variable as follows:

For /home/aditya/scripts:

REMOTE_PATH="/home/ec2-user/syncroot/scripts"

For /home/aditya/Documents:

REMOTE_PATH="/home/ec2-user/syncroot/Documents"

That’s it! You can manage as many profiles as you want, with each slave deciding where to keep its local copy of each profile.

Preventing remote lockouts

Say, all your slaves are configured to sync their system clock over a network source. They are in sync with each other, often to the second (or finer). Now if all crons are configured to run at 5 minute intervals, then all the slaves attempt to connect to the master at exactly the same time. The first one to establish a connection starts syncing, and all the others get locked out. This happens on every cron run. The problem is further exacerbated by the fact that even blank syncing takes a few seconds at the very least, and the lockout is in force for that duration. We’re thus left with a very inefficient system which can sync ONLY one slave with every cron run. If one slave is on a network that enjoys consistently lower lag with the master than all the others, then the others basically never get a chance to connect! Even if that is not the case, the system overall always has a success rate of 1/N for N slaves, in each cron run. Not good.

One way to alleviate this (though not entirely) is to introduce a random delay (less than the cron interval) between when cron initiates and when the connection is actually attempted. Over several cron runs, this scheme spreads out the odds evenly (duh!), for each slave, of running into a remote lockout. Local lockouts are not a problem. Bitpocket uses a locking mechanism to prevent two local processes from syncing the same tracked directory at the same time. If a new process encounters a lock on a tracked directory, meaning the previously spawned process hasn’t finished syncing yet, it simply exits. The random delay is introduced as shown below (assuming a cron frequency of 5 min):
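A sketch of such crontab entries (hypothetical paths; note that cron runs /bin/sh, where $RANDOM may be undefined, so bash is invoked explicitly, and that % must be escaped inside a crontab line):

```
*/5 * * * * bash -c 'sleep $((RANDOM \% 240)); cd ~/Documents && bitpocket'
*/5 * * * * bash -c 'sleep $((RANDOM \% 240)); cd ~/scripts && bitpocket'
```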

EDIT: I ran into trouble with stale server-side locks preventing further syncs with any slave. This happens when a slave disconnects mid-sync for whatever reason. Lock cleanup is currently the responsibility of the slave process that created it. There is no mechanism on the server to detect and expire stale locks (See https://github.com/sickill/bitpocket/issues/16). This issue needs to be fixed before this syncing tool can be left to run indefinitely, without supervision.

EDIT #2: One quick way to dispose of stale master locks is by periodically running a little script on the server that checks each sync directory for any open files (i.e. some machine is currently running a sync). If none are found, it simply deletes the leftover lock files. The script and the corresponding crontab entries are as below:
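Such a cleanup can be sketched as below (the lock path is assumed from bitpocket's .bitpocket layout, and lsof stands in for "does anything have files open here"; both are assumptions, so adjust to your install):

```shell
#!/bin/bash
# cleanup_stale_locks: for each profile directory under the sync root,
# remove the bitpocket lock if nothing currently has files open inside
# the profile (i.e. no sync is in progress, so the lock must be stale).
cleanup_stale_locks() {
  local syncroot="${1:-/home/ec2-user/syncroot}" dir lock
  for dir in "$syncroot"/*/; do
    lock="$dir.bitpocket/tmp/lock"
    [ -e "$lock" ] || continue
    # If lsof reports open files under the profile, a sync is running: skip.
    if command -v lsof >/dev/null 2>&1 && lsof +D "$dir" >/dev/null 2>&1; then
      continue
    fi
    rm -rf "$lock"
  done
}
```

Scheduled from the master’s crontab every few minutes, this keeps an abandoned lock from blocking all slaves indefinitely.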


The apache configuration file (usually httpd.conf) that ships with most linux distros has some configuration settings that are sub-optimal for running PHP applications (many optimizations described below will benefit other applications as well). I’ve found, after a bit of trial and error, that the following settings will let you squeeze a little extra out of the server running on your development machine.

The settings below affect the overall memory usage (even when idle) of the server since they define the minimum number of httpd processes (or threads) that are spawned and maintained. On my laptop, with 4GB RAM, the prefork and mpm worker settings are as follows (apache only uses one of them, depending on how it is compiled):
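For reference, here is the kind of shape those settings take on a 4 GB machine; the numbers below are illustrative starting points, not prescriptions, so tune them to your workload:

```
<IfModule prefork.c>
    StartServers            2
    MinSpareServers         2
    MaxSpareServers         5
    MaxClients             50
    MaxRequestsPerChild  2000
</IfModule>

<IfModule worker.c>
    StartServers            1
    MaxClients             50
    MinSpareThreads         5
    MaxSpareThreads        10
    ThreadsPerChild        25
    MaxRequestsPerChild  2000
</IfModule>
```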

Other than tweaking the process/thread variables, there are a few more settings to look at:

KeepAlive On #Reduces page load time by reusing tcp connections
MaxKeepAliveRequests 100
KeepAliveTimeout 15 #Set this value higher for longer-lived connections between browser and server, but if there are many simultaneous requests, this can cause some of them to unnecessarily queue up.
FileETag None

Additionally, you can instruct apache to compress content sent over the network using gzip by adding the following lines (I’ve put them in a separate mod_deflate.conf under /etc/httpd/conf.d, but that’s not necessary):
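The file’s contents can be as small as this sketch (compressing text responses only; images and archives are already compressed and are left alone):

```
# /etc/httpd/conf.d/mod_deflate.conf
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/javascript application/x-javascript
```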

Assuming you’re doing PHP development, you might as well go ahead and enable APC. I’ve sometimes faced an issue with APC using stale opcode even when a newer version of the source file is available, but not very often. Some people prefer to keep APC disabled on development machines. In /etc/php.d/apc.ini (on Red Hat-like systems, including Fedora, CentOS, RHEL, Amazon Linux, etc.) enable the following settings:

apc.enabled=1
apc.shm_size=512M
; This is usually too much if you're running just one application.
; It's better to start with 128 MB and upscale only if this falls short.
; You can check APC memory usage using the apc.php file that ships with the default APC installation on most systems.
; (http://pecl.php.net/package/APC)

While we’re on the subject of optimization, we might as well tune our MySQL server to use its built-in query cache. In your /etc/my.cnf add the following line:
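A hedged example of the query-cache settings (the size is illustrative; watch the Qcache_% status counters, via SHOW STATUS LIKE 'Qcache%', before growing it):

```
# inside the [mysqld] section of /etc/my.cnf
query_cache_type = 1
query_cache_size = 64M
```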


First of all, here’s an excellent document explaining all about Access Control Lists (ACLs) in Linux. What follows here is a simple example of how you can put them to use.

I do PHP development on one of my Linux laptops. I go with the standard apache + mod_php setup and all works pretty well. But here’s the thing – I don’t want to have to copy over a file I’ve worked on, say from somewhere in my home directory to the docroot of the vhost where it actually needs to be deployed. That’s a pretty easily avoidable extra step.

The problem is with managing permissions. Typically, all files under the docroot should be owned by the apache (or equivalent) user. No other user should have write access to this folder. But then I don’t want to have to sudo copy a file, then change ownership back to apache every time. It is prone to oversights and can lead to weird side effects if not done diligently. This is where ACLs come to the rescue. The rationale is as follows:

On my personal dev machine, I am ok with allowing one user (the one I log in and work with) to have write access to the apache-owned folders, so as to ease development by directly editing/creating files in place. This user need not share any groups with apache. ACLs allow for precisely this kind of fine-grained access rules to be defined. (For the hawk-eyes among you, there is a way to retain ownership of new files with apache’s primary group, even if they were created by a different user. See here.)

The standard Linux permissions need not be opened up to all and sundry.

The server runs without any problems, since it owns all the files in docroot.

So here’s what I do – assuming your apache webroot is at /var/www/html (there could be, and usually are multiple docroots in here), a user named aditya needs to be given write access (recursively). That is done by running:

$ sudo setfacl -R -m u:aditya:rwx /var/www/html/

And that’s pretty much it! If you’ve set up your setgid bits by following the link above (in point #1), and the standard rwx ownership is still with apache, you’re all set!

1. I don’t need to enable barriers for a fixed disk on a battery-backed laptop. (See the mount manpage for how a barrier works. There is a small risk associated with disabling them.)

2. The data=writeback option results in only metadata being journaled, not actual file data. This does engender the risk of corrupting recently modified files in the event of a sudden failure. But if you’re willing to run with it, it provides a great boost to the filesystem’s performance.

3. noatime disables the atime (access timestamp) feature, which would otherwise cause additional writes to record a timestamp every time an inode is accessed. You can safely enable this.

4. nobh prevents the association of buffer heads with data pages (only works with data=writeback).

The big-impact options are nos. 2 and 3, whereas you can play it a bit safer by ignoring nos. 1 and 4.
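Put together, the corresponding /etc/fstab entry might look like the sketch below; the device and mount point are placeholders, option names can differ between ext3/ext4 and kernel versions (check `man mount`), and data=writeback on a root filesystem typically also needs the default set via tune2fs:

```
/dev/sda2  /home  ext4  defaults,noatime,nobh,data=writeback,barrier=0  0 2
```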


If you’re running a linux box, you’re probably running an ssh server on it. Highly secure, if you’ve configured it right, but there are a few things you can do to increase security even further. There’s a kind of attack called a Denial of Service (DOS) that basically just hammers the machine on a specified port repeatedly with requests (well formed or otherwise) in the hope that a buffer overflow or a brute force password attack will allow for a break-in.

This is where you need to configure your firewall, so that it bans a given IP from reaching the ssh server at all, if there are more than 3 (failed) connection attempts within a minute. The commands below are for the iptables firewall… very commonly found on most linux distros, but you will have to look for other means if your firewall is different.
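One way to sketch that rule set is with the iptables recent module (run as root; the 60-second window and the drop-after-3-attempts threshold mirror the policy described above):

```shell
# Record every new connection attempt to port 22 in a list named SSH.
iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --set --name SSH
# Drop the packet if the same source has hit 4+ times in the last 60 seconds.
iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --update --seconds 60 --hitcount 4 --name SSH -j DROP
```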

This does not necessarily secure you from a Distributed Denial of Service (DDOS) attack, and in no way does it ensure that your machine is completely hack-proof. (Is that even possible?) But it will (mostly) keep those pesky script kiddies at bay ;).

For more information on ssh and general system security, the following links are informative sources to start with:


So I am this long time Linux user – I started with my first linux installation some time in 2004. It was a Mandrake (now Mandriva) 7 or 8. I started out pretty clueless, but had the help of some geek friends when taking my first toddler steps into this fascinating world.

Now I’m at a stage where I’m more comfortable on Linux than on a Win/Mac. Does that mean I’m fairly comfortable with the shell? Yes. Does it mean I know each and every command under the sun for each and every task under the sun (Yes, you can do pretty much everything under the sun on a bash shell.)? Nope!

There is a HUGE number of commands and combinations, several meta-languages, regular expressions and built-in features, which together render the unassuming command shell not only incredibly powerful, but also (seemingly) incredibly difficult to learn and use (It isn’t.). There is a natural trend for most system configurations, software and scripts to rely heavily on the shell (many tools have feature-complete GUIs as well). Unfortunately, this also acts as a major deterrent for users coming from other platforms, wanting to migrate to a Linux environment. THE LINUX SHELL CAN BE INTIMIDATING!

Well, not anymore, thanks in no small part to these guys. Commandlinefu.com gives you ready-made, baked recipes for many things you can do with your shell, ranging from something as mundane as listing the open tcp ports on your machine, to the most eccentric (and dare I say, hare-brained) stuff, such as watching Star Wars episode IV in ASCII mode!! (I’m not kidding! Try

$ telnet towel.blinkenlights.nl

)

This is a boon, not only for newbies, but also for mid-level Linux enthusiasts like me, who haven’t quite reached elite geekdomness yet! So if I want to see the disk usage under a certain folder, to identify space hogs or possible bloated temporary disk caches, I just search ‘disk usage’ and here’s what I get:

$ du -cs * .[^\.]* | sort -n

Shows size of dirs and files, hidden or not, sorted.
Very useful when you need disk space. It calculates the disk usage of all files and dirs (descending them) located at the current directory (including hidden ones). Then sort puts them in order.

$ du | sort -gr > file_sizes

Outputs a sorted list of disk usage to a text file
Recursively searches current directory and outputs sorted list of each directory’s disk usage to a text file.

Those were just a couple of examples. The site lists many more. You’re almost 100% sure to find a command that suits your needs. But I’m not writing this post to tell you how to use a website. I’m going to show you how to search this website from the shell!

The following instructions assume that you have a ready and working bash shell at your disposal.