Friday, June 28, 2013

Varnish is a program that can greatly speed up a Web site while reducing
the load on the Web server. According to Varnish's official site, Varnish
is a "Web application accelerator also known as a caching HTTP reverse
proxy".
When you think about what a Web server does at a high level, it receives
HTTP requests and returns HTTP responses. In a perfect world, the server
would return a response immediately without having to do any real work. In
the real world, however, the server may have to do quite a bit of work
before returning a response to the client. Let's first look at how a
typical Web server handles this, and then see what Varnish does
to improve the situation.
Although every server is different, a typical Web server will go through a
potentially long sequence of steps to service each request it receives. It
may start by spawning a new process to handle the request. Then, it
may have to load script files from disk, launch an interpreter process
to interpret and compile those files into bytecode and then execute
that bytecode. Executing the code may result in additional work, such
as performing expensive database queries and retrieving more files from
disk. Multiply this by hundreds or thousands of requests, and you can see
how the server quickly can become overloaded, draining system resources
trying to fulfill requests. To make matters worse, many of the requests
are repeats of recent requests, but the server may not have a way to
remember the responses, so it's sentenced to repeating the same painful
process from the beginning for each request it encounters.
Things are a little different with Varnish in place. For starters,
the request is received by Varnish instead of the Web server. Varnish
then will look at what's being requested and forward the request to the
Web server (known as a back end to Varnish). The back-end server does
its regular work and returns a response to Varnish, which in turn gives
the response to the client that sent the original request.
If that's all
Varnish did, it wouldn't be much help. What gives us the performance gains
is that Varnish can store responses from the back end in its cache for
future use. Varnish quickly can serve the next response directly from its
cache without placing any needless load on the back-end server. The result
is that the load on the back end is reduced significantly, response times
improve, and more requests can be served per second. One of the things
that makes Varnish so fast is that it keeps its cache completely in memory
instead of on disk. This and other optimizations allow Varnish to process
requests at blinding speeds. However, because memory typically is more
limited than disk, you have to size your Varnish cache properly and take
measures not to cache duplicate objects that would waste valuable space.
Let's install Varnish. I'm going to explain how to install it from source, but you can
install it using your distribution's package manager. The latest version
of Varnish is 3.0.3, and that's the version I work with here. Be aware
that the 2.x versions of Varnish have some subtle differences in the
configuration syntax that could trip you up. Take a look at the Varnish
upgrade page on the Web site for a full list of the changes between
versions 2.x and 3.x.
Missing dependencies is one of the most common installation
problems. Check the Varnish installation page for the full list of build
dependencies.
Run the following commands as root to download and install the latest
version of Varnish:
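A typical source build of 3.0.3 follows the standard autotools steps; this is a sketch, and the download URL is an assumption (check the Varnish site for the current location):

```shell
# download, unpack, build and install Varnish 3.0.3 from source
wget http://repo.varnish-cache.org/source/varnish-3.0.3.tar.gz
tar xzf varnish-3.0.3.tar.gz
cd varnish-3.0.3
./configure
make
make install
```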

Varnish is now installed under the /usr/local directory. The full path to
the main binary is /usr/local/sbin/varnishd, and the default configuration
file is /usr/local/etc/varnish/default.vcl.
You can start Varnish by running the varnishd binary. Before you can do
that though, you have to tell Varnish which back-end server it's caching for. Let's
specify the back end in the default.vcl file. Edit the default.vcl file
as shown below, substituting the values for those of your Web server:
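A minimal sketch of the back-end definition, assuming your Web server is on the same machine listening on port 80 (adjust .host and .port to match your setup):

```vcl
backend default {
    .host = "127.0.0.1";
    .port = "80";
}
```

With the back end defined, you can start Varnish with a command along these lines; the option values match those discussed in the next section:

```shell
/usr/local/sbin/varnishd \
    -f /usr/local/etc/varnish/default.vcl \
    -a :6081 \
    -P /var/run/varnish.pid \
    -s malloc,256M
```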

This will run varnishd as a dæmon and return you to the command
prompt. One thing worth pointing out is that varnishd will launch two
processes. The first is the manager process, and the second is the
child worker process. If the child process dies for whatever reason,
the manager process will spawn a new process.

Varnishd Startup Options

The -f option tells Varnish where your configuration file lives.
The -a option is the address:port that Varnish will listen on for incoming HTTP
requests from clients.
The -P option is the path to the PID file, which will make it easier to stop
Varnish in a few moments.
The -s option configures where the cache is kept. In this case, we're using a 256MB
memory-resident cache.
If you installed Varnish from your package manager, it may be
running already. In that case, stop it first, then use the command above to
start it manually; otherwise, the options it was started with may differ
from those in this example. A quick way to see whether Varnish is running
and what options it was given is with the pgrep command:

/usr/bin/pgrep -lf varnish

Varnish now will relay any requests it receives to the back end you
specified, possibly cache the response, and deliver the response back
to the client. Let's submit some simple GET requests and see what Varnish
does. First, run these two commands on separate terminals:

/usr/local/bin/varnishlog
/usr/local/bin/varnishstat

The following GET command is part of the Perl www library (libwww-perl). I
use it so you can see the response headers you get back from Varnish. If
you don't have libwww-perl, you could use Firefox with the Live HTTP
Headers extension or another tool of your choice:

GET -Used http://localhost:6081/

Figure 1. Varnish Response Headers

The options given to the GET command aren't important here. The important
thing is that the URL points to the port on which varnishd is listening.
There are three response headers that were added by Varnish. They are
X-Varnish, Via and Age. These headers are useful once you know what they
are. The X-Varnish header will be followed by either one or two numbers. The
single-number version means the response was not in Varnish's cache
(miss), and the number shown is the ID Varnish assigned to the request. If
two numbers are shown, it means Varnish found a response in its cache
(hit). The first is the ID of the request, and the second is the ID of the
request from which the cached response was populated. The Via header
just shows that the request went through a proxy. The Age header tells
you how long the response has been cached by Varnish, in seconds. The
first response will have an Age of 0, and subsequent hits will have
an incrementing Age value. If subsequent responses to the same page
don't increment the Age header, that means Varnish is not caching
the response.
Now let's look at the varnishstat command launched earlier. You should
see something similar to Figure 2.
Figure 2. varnishstat Command
The important lines are cache_hit and cache_miss. cache_hit won't be
shown if you haven't had any hits yet. As more requests come in, the
counters are updated to reflect hits and misses.
Next, let's look at the varnishlog command launched earlier (Figure 3).
Figure 3. varnishlog Command
This shows you fairly verbose details of the requests and responses that
have gone through Varnish. The documentation on the Varnish Web site explains
the log output as follows:

The first column is an arbitrary number; it defines the request. Lines
with the same number are part of the same
HTTP transaction. The second column is the tag of the log message. All
log entries are tagged with a tag indicating what sort of activity is
being logged. Tags starting with Rx indicate Varnish is receiving data,
and Tx indicates sending data. The third column tells us whether the
data is coming from or going to the client (c), or to or from the back end
(b). The fourth column is the data being logged.

varnishlog has various filtering options to help you find what you're
looking for. I recommend playing around and getting comfortable with
varnishlog, because it will really help you debug Varnish. Read the
varnishlog(1) man page for all the details. Next are some simple examples
of how to filter with varnishlog.
To view communication between Varnish and the client (omitting the back end):

/usr/local/bin/varnishlog -c

To view communication between Varnish and the back end (omitting the client):

/usr/local/bin/varnishlog -b

To view the headers received by Varnish (both the client's request headers and the back end's response headers):

/usr/local/bin/varnishlog -i RxHeader

Same thing, but limited to just the client's request headers:

/usr/local/bin/varnishlog -c -i RxHeader

Same thing, but limited to just the back end's response headers:

/usr/local/bin/varnishlog -b -i RxHeader

To write all log messages to the /var/log/varnish.log file and dæmonize:

/usr/local/bin/varnishlog -Dw /var/log/varnish.log

To read and display all log messages from the /var/log/varnish.log file:

/usr/local/bin/varnishlog -r /var/log/varnish.log

The last two examples demonstrate storing your Varnish log to
disk. Varnish keeps a circular log in memory in order to stay fast,
but that means old log entries are lost unless saved to disk. The
last two examples above demonstrate how to save all log messages to a
file for later review.
If you wanted to stop Varnish, you could do so with this command:

kill `cat /var/run/varnish.pid`

This will send the TERM signal to the process whose PID is stored in the
/var/run/varnish.pid file. Because this is the varnishd manager process,
Varnish will shut down.
Now that you know how to start and stop Varnish, and examine cache hits
and misses, the natural question to ask is what does Varnish cache,
and for how long?
Varnish is conservative with what it will cache by default, but you can
change most of these defaults. It will consider caching only GET and HEAD
requests. It won't cache a request with either a Cookie or Authorization
header. It won't cache a response with either a Set-Cookie or Vary
header. One thing Varnish looks at is the Cache-Control header. This
header is optional, and it may be present in the request or the response. It
may contain a list of one or more comma-separated directives. This
header is meant to apply caching restrictions. However, Varnish won't
alter its caching behavior based on the Cache-Control header, with
the exception of the max-age directive. This directive looks like
Cache-Control: max-age=n, where n is a number. If Varnish receives
the max-age directive in the back end's response, it will use that value
to set the cached response's expiration (TTL), in seconds. Otherwise,
Varnish will set the cached response's TTL expiration to the value of
its default_ttl parameter, which defaults to 120 seconds.

Note:

Varnish has configuration parameters with sensible defaults. For
example, the default_ttl parameter defaults to 120 seconds. Configuration
parameters are fully explained in the varnishd(1) man page. You may want
to change some of the default parameter values. One way to do that is to
launch varnishd by using the -p option. This has the downside of having
to stop and restart Varnish, which will flush the cache. A better way
of changing parameters is by using what Varnish calls the management
interface. The management interface is available only if varnishd
was started with the -T option. It specifies on what port the management
interface should listen. You can connect to the management interface
with the varnishadm command. Once connected, you can query parameters
and change their values without having to restart Varnish.
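For example, if varnishd had been started with -T 127.0.0.1:6082 (the address and port here are assumptions), you could query and change default_ttl like this:

```shell
/usr/local/bin/varnishadm -T 127.0.0.1:6082 param.show default_ttl
/usr/local/bin/varnishadm -T 127.0.0.1:6082 param.set default_ttl 300
```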
To learn more, read the man pages for varnishd, varnishadm and varnish-cli.
You'll likely want to change what Varnish caches and how long it's cached
for—this is called your caching policy. You express your caching
policy in the default.vcl file by writing VCL. VCL stands for Varnish
Configuration Language, which is like a very simple scripting language
specific to Varnish. VCL is fully explained in the vcl(7) man page,
and I recommend reading it.
Before changing default.vcl, let's think about the process Varnish goes
through to fulfill an HTTP request. I call this the request/response cycle,
and it all starts when Varnish receives a request. Varnish will parse
the HTTP request and store the details in an object known to Varnish
simply as req. Now Varnish has a decision to make based entirely on the
req object—should it check its cache for a match or just forward the
request to the back end without caching the response? If it decides to
bypass its cache, the only thing left to do is forward the request
to the back end and then forward the response back to the client. However,
if it decides to check its cache, things get more interesting. This is
called a cache lookup, and the result will either be a hit or a miss. A
hit means that Varnish has a response in its cache for the client. A
miss means that Varnish doesn't have a cached response to send, so the
only logical thing to do is send the request to the back end and then
cache the response it gives before sending it back to the client.
Now that you have an idea of Varnish's request/response cycle, let's talk
about how to implement your caching policy by changing the decisions
Varnish makes in the process. Varnish has a set of subroutines that carry
out the process described above. Each of these subroutines performs a
different part of the process, and the return value from the subroutine is
how you tell Varnish what to do next. In addition to setting the return
values, you can inspect and make changes to various objects within the
subroutines. These objects represent things like the request and the
response. Each subroutine has a default behavior that can be seen in
default.vcl. You can redefine these subroutines to get Varnish to behave
how you want.

Varnish Subroutines

The Varnish subroutines have default definitions, which are shown
in default.vcl. Just because you redefine one of these subroutines
doesn't mean the default definition will not execute. In particular,
if you redefine one of the subroutines but don't return a value, Varnish
will proceed to execute the default subroutine. All the default Varnish
subroutines return a value, so it makes sense that Varnish uses them as
a fallback.
The first subroutine to look at is called vcl_recv. This gets executed
after receiving the full client request, which is available in the req
object. Here you can inspect and make changes to the original request
via the req object. You can use the value of req to decide how to
proceed. The return value is how you tell Varnish what to do. I'll put
the return values in parentheses as they are explained. Here you can tell
Varnish to bypass the cache and send the back end's response back to the
client (pass). You also can tell Varnish to check its cache for a match
(lookup).
Next is the vcl_pass subroutine. If you returned pass in
vcl_recv, this is where you'll be just before sending the request to
the back end. You can tell Varnish to continue as planned (pass) or to
restart the cycle at the vcl_recv subroutine (restart).
The vcl_miss and vcl_hit subroutines are executed depending on whether
Varnish found a suitable response in the cache. From vcl_miss, your
main options are to get a response from the back-end server and cache
it (fetch) or to get a response from the back end and not cache it
(pass). vcl_hit is where you'll be if Varnish successfully finds
a matching response in its cache. From vcl_hit, you have the cached
response available to you in the obj object. You can tell Varnish to send
the cached response to the client (deliver) or have Varnish ignore the
cached response and return a fresh response from the back end (pass).
The vcl_fetch subroutine is where you'll be after getting a fresh response
from the back end. The response will be available to you in the beresp
object. You either can tell Varnish to continue as planned (deliver)
or to start over (restart).
From vcl_deliver, you can finish the request/response cycle by delivering
the response to the client and possibly caching it as well (deliver),
or you can start over (restart).
As previously stated, you express your caching policy within the
subroutines in default.vcl. The return values tell Varnish what to do
next. You can base your return values on many things, including the
values held in the request (req) and response (resp) objects mentioned
earlier. In addition to req and resp, there also is a client object
representing the client, a server object and a beresp object representing
the back end's response. It's important to realize that not all objects
are available in all subroutines. It's also important to return one of
the allowed return values from subroutines. One of the hardest things to
remember when starting out with Varnish is which objects are available
in which subroutines, and what the legal return values are. To make it
easier, I've created a couple of reference tables. They will help you
get up to speed quickly by not having to memorize everything up front
or dig through the documentation every time you make a change.

Table 1. This table shows which objects are available in each of the
subroutines.

              client  server  req  bereq  beresp  resp  obj
vcl_recv        X       X      X
vcl_pass        X       X      X     X
vcl_miss        X       X      X     X
vcl_hit         X       X      X                         X
vcl_fetch       X       X      X     X      X
vcl_deliver     X       X      X                   X

Table 2. This table shows valid return values for each of the
subroutines.

              pass  lookup  error  restart  deliver  fetch  pipe  hit_for_pass
vcl_recv       X      X      X                               X
vcl_pass       X             X       X
vcl_miss       X             X                        X
vcl_hit        X             X       X        X
vcl_fetch                    X       X        X                        X
vcl_deliver                  X       X        X

Tip:

Be sure to read the full explanation of VCL, available subroutines, return
values and objects in the vcl(7) man page.
Let's put it all together by looking at some examples.
Normalizing the request's Host header:
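A sketch of what such a rule might look like, assuming the site answers as both www.example.com and example.com:

```vcl
sub vcl_recv {
    if (req.http.host ~ "^(www\.)?example\.com$") {
        set req.http.host = "example.com";
    }
}
```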

Notice you access the request's host header by using req.http.host. You
have full access to all of the request's headers by putting the header
name after req.http. The ~ operator is the match operator. That is
followed by a regular expression. If you match, you then use the set
keyword and the assignment operator (=) to normalize the hostname to
simply "example.com". A really good reason to normalize the hostname is
to keep Varnish from caching duplicate responses. Varnish looks at the
hostname and the URL to determine if there's a match, so the hostnames
should be normalized if possible.
Here's a snippet from the default vcl_recv subroutine:
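The relevant lines of the Varnish 3 default vcl_recv look roughly like this:

```vcl
sub vcl_recv {
    if (req.request != "GET" && req.request != "HEAD") {
        /* We only deal with GET and HEAD by default */
        return (pass);
    }
    return (lookup);
}
```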

You can see that
if it's not a GET or HEAD request, Varnish returns pass and won't cache
the response. If it is a GET or HEAD request, it looks it up in the cache.
Removing request's Cookies if the URL matches:

sub vcl_recv {
    if (req.url ~ "^/images") {
        unset req.http.cookie;
    }
}

That's an example from the Varnish Web site. It removes cookies from the
request if the URL starts with "/images". This makes sense
when you recall
that Varnish won't cache a request with a cookie. By removing the cookie,
you allow Varnish to cache the response.
Removing response cookies for image files:
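The example is along these lines; the set of image extensions is illustrative:

```vcl
sub vcl_fetch {
    if (req.url ~ "\.(png|gif|jpg)$") {
        unset beresp.http.set-cookie;
        set beresp.ttl = 1h;
    }
}
```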

That's another example from Varnish's Web site. Here you're in the vcl_fetch
subroutine, which happens after fetching a fresh response from the
back end. Recall that the response is held in the beresp object. Notice
that here you're accessing both the request (req) and the response
(beresp). If the request is for an image, you remove the Set-Cookie header
set by the server and override the cached response's TTL to one
hour. Again, you do this because Varnish won't cache responses with the
Set-Cookie header.

Now, let's say you want to add a header to the response called X-Hit. The
value should be 1 for a cache hit and 0 for a miss. The easiest way
to detect a hit is from within the vcl_hit subroutine. Recall that
vcl_hit will be executed only when a cache hit occurs. Ideally, you'd
set the response header from within vcl_hit, but looking at Table 1
in this article, you see that neither of the response objects (beresp and resp)
are available within vcl_hit. One way around this is to set a temporary
header in the request, then later set the response header. Let's take
a look at how to solve this.
Adding an X-Hit response header:
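A sketch of this approach; the name tempheader is arbitrary:

```vcl
sub vcl_hit {
    set req.http.tempheader = "1";
}

sub vcl_miss {
    set req.http.tempheader = "0";
}

sub vcl_deliver {
    /* default to a miss, then copy the hit/miss flag if one was set */
    set resp.http.X-Hit = "0";
    if (req.http.tempheader) {
        set resp.http.X-Hit = req.http.tempheader;
        unset req.http.tempheader;
    }
}
```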

The code in vcl_hit and vcl_miss is straightforward—set a value in a
temporary request header to indicate a cache hit or miss. The interesting
bit is in vcl_deliver. First, I set a default value for X-Hit to 0,
indicating a miss. Next, I detect whether the request's tempheader was set,
and if so, set the response's X-Hit header to match the temporary header
set earlier. I then delete the tempheader to keep things tidy, and
I'm all done. The reason I chose the vcl_deliver subroutine is because
the response object that will be sent back to the client (resp) is
available only within vcl_deliver.
Let's explore a similar solution that doesn't work as expected.
Adding an X-Hit response header—the wrong way:
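A sketch of the flawed version, which sets the header on the back end's response (beresp) instead:

```vcl
sub vcl_hit {
    set req.http.tempheader = "1";
}

sub vcl_fetch {
    /* bug: this only runs on a miss, so the cached copy keeps X-Hit: 0 */
    set beresp.http.X-Hit = "0";
    if (req.http.tempheader) {
        set beresp.http.X-Hit = "1";
    }
}
```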

Notice that within vcl_fetch, I'm now altering the back end's response
(beresp), not the final response sent to the client. This code appears
to work as expected, but it has a major bug. What happens is that the
first request is a miss and is fetched from the back end; that response
has X-Hit set to "0", and then it's cached. Subsequent requests result in a
cache hit and never enter the vcl_fetch subroutine. The result is that
all cache hits continue to have X-Hit set to "0". These are the types of
mistakes to look out for when working with Varnish.
The easiest way to avoid these mistakes is to keep those reference tables
handy; remember when each subroutine is executed in Varnish's workflow,
and always test the results.
Let's look at a simple way to tell Varnish to cache everything for one
hour. This is shown only as an example and isn't recommended for a
real server.
Cache all responses for one hour:
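A minimal sketch of such a policy:

```vcl
sub vcl_recv {
    return (lookup);
}

sub vcl_fetch {
    set beresp.ttl = 1h;
    return (deliver);
}
```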

Here, I'm overriding two default subroutines with my own. If I hadn't
returned "deliver" from vcl_fetch, Varnish still would have executed
its default vcl_fetch subroutine looking for a return value, and this
would not have worked as expected.
Once you get Varnish to implement your caching policy, you should run
some benchmarks to see if there is any improvement. The benchmarking
tool
I use here is the Apache benchmark tool, known as ab. You can install
this tool as part of the Apache Web server or as a separate
package—depending on your system's package manager. You can read about
the various
options available to ab in either the man page or at the Apache Web site.
In the benchmark examples below, I have a stock Apache 2.2 installation
listening on port 80, and Varnish listening on port 6081. The page
I'm testing is a very basic Perl CGI script I wrote that just outputs
a one-liner HTML page. It's important to benchmark the same URL against
both the Web server and Varnish so you can make a direct comparison. I run
the benchmark from the same machine that Apache and Varnish are running
on in order to eliminate the network as a factor. The ab options I use
are fairly straightforward. Feel free to experiment with different ab
options and see what happens.
Let's start with 1000 total requests (-n 1000) and a concurrency of 1 (-c 1).
Benchmarking Apache with ab:
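The commands look like this; the CGI script's path is a placeholder:

```shell
# Apache directly, on port 80
ab -n 1000 -c 1 http://localhost/cgi-bin/test.pl

# the same page through Varnish, on port 6081
ab -n 1000 -c 1 http://localhost:6081/cgi-bin/test.pl
```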

Figure 5. Output from ab Command (Varnish)
As you can see, the ab command provides a lot of useful output. The
metrics I'm looking at here are "Time per request" and "Requests per
second" (rps). You can see that Apache came in at just over 1ms per request
(780 rps), while Varnish came in at 0.1ms (7336 rps)—nearly ten times
faster than Apache. This shows that Varnish is faster, at least based on
the current setup and isolated testing. It's a good idea to run ab with
various options to get a feel for performance—particularly by
changing
the concurrency values and seeing what impact that has on your system.

System Load and %iowait

System load is a measure of how much load is being placed on your CPU(s). As a
general rule, you want the number to stay below 1.0 per CPU or core on your
system. That means if you have a four-core system as in the machine I'm
benchmarking here, you want your system's load to stay below 4.0.
%iowait is a measure of the percentage of CPU time spent waiting on
input/output. A high %iowait indicates your system is disk-bound,
performing many disk I/O operations that slow the system down. For
example, if your server had to retrieve 100 or more files for each
request, the %iowait time likely would climb very high, indicating that
the disk is a bottleneck.
The goal is to not only improve response times, but also to do so with
as little impact on system resources as possible. Let's compare how a
prolonged traffic surge affects system resources. Two good measures
of system performance are the load average and the %iowait. The load
average can be seen with the top utility, and the %iowait can be seen
with the iostat command. You're going to want to keep an eye on both top
and iostat during the prolonged load test to see how the numbers change.
Let's fire up top and iostat, each on separate terminals.
Starting iostat with a two-second update interval:

iostat -c 2

Starting top:

/usr/bin/top

Now you're ready to run the benchmark. You want ab to run long enough
to see the impact on system performance. This typically means anywhere
from one minute to ten minutes. Let's re-run ab with a lot more total
requests and a higher concurrency.
Load testing Apache with ab:
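Something like the following; the request count and concurrency are illustrative, and the CGI path is again a placeholder:

```shell
# sustained load against Apache, then against Varnish
ab -n 100000 -c 50 http://localhost/cgi-bin/test.pl
ab -n 100000 -c 50 http://localhost:6081/cgi-bin/test.pl
```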

Figure 7. System Load Impact of Traffic Surge on Varnish
First let's compare response times. Although you can't see it in the
screenshots, which were taken just before ab finished, Apache came in at
23ms per request (2097 rps), and Varnish clocked in at 4ms per request
(12099 rps). The most drastic difference can be seen in the load
averages in top. While Apache brought the system load all the way up
to 12, Varnish kept the system load near zero, at 0.4. I did have to wait
several minutes for the machine's load averages to go back down after
the Apache load test before load testing Varnish. It's also best to run
these tests on a non-production system that is mostly idle.
Although everyone's servers and Web sites have different requirements and
configurations, Varnish may be able to improve your site's
performance drastically while simultaneously reducing the load on the server.

Thursday, June 27, 2013

Raw tcp sockets in C

Raw sockets can be used to construct a packet manually inside an
application. With normal sockets, when any data is sent over the network,
the kernel of the operating system adds some headers to it, such as the IP
header and TCP header. So an application only needs to take care of what
data it is sending and what reply it is expecting.

But there are other cases when an application needs to set its own
headers. Raw sockets are used in security-related applications such as
nmap, packet sniffers, etc. In this article, we are going to program raw
sockets on Linux using native sockets. Windows, for example, does not
support raw socket programming directly; to program raw sockets on
Windows, a packet-crafting library such as WinPcap has to be used.
Here we are going to do some raw socket programming by
constructing a raw TCP packet and sending it over the network. Before
programming raw sockets, it is recommended that you learn about the
basics of socket programming in C.

Raw TCP packets

A TCP packet is constructed like this:
Packet = IP Header + TCP Header + Data
The plus means to attach the binary data side by side. So when making
a raw TCP packet, we need to know how to construct the headers properly.
The structures of all headers are established standards, described
in RFCs.

IP header

The structure of the IP header, as given by RFC 791:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The "Source Address" field stores the IP address of the system
sending the packet, and the "Destination Address" contains the IP address
of the destination system. IP addresses are stored as long (32-bit)
numbers. The "Protocol" field stores a number that indicates the
transport protocol, which is TCP in this case.

Structure of TCP header

The structure of the TCP header, as given by RFC 793:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |        Urgent Pointer         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

So we need to construct the headers according to the formats specified above.

Raw TCP sockets

Create a raw socket like this:

int s = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);

The above function call creates a raw socket of protocol TCP. This
means that we have to provide the TCP header along with the data; the
kernel, or the network stack of Linux, will provide the IP header.
If we want to provide the IP header as well, there are two ways of doing this:
1. Use protocol IPPROTO_RAW. This allows us to specify the IP header and everything else contained in the packet:

int s = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);

2. Set the IP_HDRINCL socket option to 1. This has the same effect; it is just another way of doing it.

When IP_HDRINCL is in use, the protocol passed to the socket function is effectively ignored.

In this example, we are creating raw sockets
where we specify the IP header and the TCP header. The packet that moves out
of the machine actually has one more header attached to it, called the
Ethernet header, so the actual packet structure is somewhat like this:
Packet = Ethernet header + IP header + TCP header + Data
Take a look at the packets sniffed by Wireshark to understand this
better. It is important to note here that the Ethernet header is
provided by the OS kernel, and we do not have to construct it. It is
possible to make raw packets where we specify even the
Ethernet header, but we shall look into those in a separate article.

Below is an example program that constructs a raw TCP packet with some data.
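The original listing was not preserved here, so this is a sketch of such a program. The source and destination addresses, ports and payload are placeholders, and the TCP checksum (which must be computed over a pseudo-header plus the TCP segment) is left out to keep it short; many sniffers will still show the packet.

```c
/* raw_socket.c - build and send a raw TCP packet (sketch; run as root) */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

/* Standard Internet checksum (RFC 1071) over a buffer of 16-bit words */
static unsigned short csum(unsigned short *buf, int nwords)
{
    unsigned long sum = 0;
    while (nwords-- > 0)
        sum += *buf++;
    sum = (sum >> 16) + (sum & 0xffff);
    sum += (sum >> 16);
    return (unsigned short)~sum;
}

int main(void)
{
    /* IPPROTO_RAW implies IP_HDRINCL, so we supply the IP header too */
    int s = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
    if (s < 0) {
        perror("socket");   /* raw sockets require root */
        return 1;
    }

    char packet[4096];
    memset(packet, 0, sizeof(packet));

    struct iphdr  *ip   = (struct iphdr *)packet;
    struct tcphdr *tcp  = (struct tcphdr *)(packet + sizeof(struct iphdr));
    char          *data = packet + sizeof(struct iphdr) + sizeof(struct tcphdr);
    strcpy(data, "HELLO");  /* the payload */

    int packet_len = sizeof(struct iphdr) + sizeof(struct tcphdr)
                     + (int)strlen(data);

    /* Fill the IP header; the addresses are placeholders */
    ip->ihl      = 5;
    ip->version  = 4;
    ip->tot_len  = htons(packet_len);
    ip->id       = htons(54321);
    ip->ttl      = 64;
    ip->protocol = IPPROTO_TCP;
    ip->saddr    = inet_addr("192.168.1.2");
    ip->daddr    = inet_addr("192.168.1.1");
    ip->check    = csum((unsigned short *)ip, sizeof(struct iphdr) / 2);

    /* Fill the TCP header: a SYN from port 12345 to port 80 */
    tcp->source = htons(12345);
    tcp->dest   = htons(80);
    tcp->seq    = htonl(0);
    tcp->doff   = 5;
    tcp->syn    = 1;
    tcp->window = htons(65535);
    /* tcp->check is left zero; see the note about the pseudo-header */

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family      = AF_INET;
    dst.sin_addr.s_addr = ip->daddr;

    /* Testing loop: this floods the target; remove it to send one packet */
    while (1) {
        if (sendto(s, packet, packet_len, 0,
                   (struct sockaddr *)&dst, sizeof(dst)) < 0) {
            perror("sendto");
            break;
        }
        printf("Sent one raw TCP packet\n");
    }

    close(s);
    return 0;
}
```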

Compile and Run

Compile the program by running gcc raw_socket.c at the terminal.
Remember to run the program with root privileges; raw sockets require
root privileges. Note the while loop in the program: it has been put
there for testing purposes and should be removed if you don't intend to
flood the target.
Use a packet sniffer like Wireshark to check the output and verify
that the packets have actually been generated and sent over the network.
Also note that if some kind of firewall, such as Firestarter, is running,
it might block raw packets.

Wise men say that you should never choose the
easy path but, instead, live life fully. But when it comes to moving
around the Unix file system, easy is good. And bash's builtin shopt command can make maneuvering even the most complicated file system paths easier.

Some of us have spent decades moving around our Unix file systems with the cd
command, maybe even using file name completion so that we don't have
to type every letter in every directory name. Even so, there may be
some bash options that you're not aware of, and the shopt command has a few that might surprise you.

The shopt built-in provides numerous options for changing optional shell behavior. To view these options, just type shopt
on the command line. Oh, and don't mentally parse that as "shop t",
but "sh opt" as in "shell options". That should make it easier to
remember.
Depending on the version of bash that you are using, you will see a varying list of options that you can set and unset with the shopt -s (set) and shopt -u (unset) commands. Some of these can be used to change bash's behavior when you issue cd commands.

$ shopt
cdable_vars off
cdspell off
checkhash off
checkwinsize on
cmdhist on
dotglob off
execfail off
expand_aliases on
extdebug off
extglob off
extquote on
failglob off
force_fignore on
gnu_errfmt off
histappend off
histreedit off
histverify off
hostcomplete on
huponexit off
interactive_comments on
lithist off
login_shell on
mailwarn off
no_empty_cmd_completion off
nocaseglob off
nocasematch off
nullglob off
progcomp on
promptvars on
restricted_shell off
shift_verbose off
sourcepath on
xpg_echo off

Notice that this list also tells you if the option in question is enabled (on) or disabled (off).
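Setting, checking, and unsetting an option looks like this (using cdspell as the example; shopt -q is handy in scripts because it reports an option's state through its exit status alone):

```shell
# Turn the cdspell option on, then confirm its state;
# shopt prints the option name followed by "on" or "off"
shopt -s cdspell
shopt cdspell

# -q is silent and returns an exit status instead -- useful in scripts
if shopt -q cdspell; then
    echo "cdspell is enabled"
fi

# Turn it back off
shopt -u cdspell
```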
If autocd is in your list (bash 4.0 and newer), you can use this option to make it possible to cd to a directory without typing the cd
command. Instead, you just type a directory name. The shell then sees
that the "command" you entered is not a command but the name of a
directory, and moves you there.

$ Lab12
$ pwd
/home/unixdweeb/Lab12

A similar option, available in older versions of bash as well, is cdable_vars.
This one is easier to parse than "shopt". Think of it as "cd-able
variables": if a variable contains the name of a
directory, you can move into that directory by using the variable name with
the cd command.

$ shopt -s cdable_vars
$ lab=Lab12
$ cd lab
$ pwd
/home/unixdweeb/Lab12

By the way, this doesn't work if your variable is set to something like
~/Lab12 or ~jdoe/Lab12; the tilde isn't expanded when the variable's
value is used this way.

But $HOME/Lab12 works as expected.
The cdable_vars option essentially provides a way for you to set up shortcuts to directories without having to create symbolic links.
Another useful option that helps with directory changes is cdspell. This one allows the shell to compensate for minor typos in the paths you type. Meant to go to /tmp but typed cd /tpm? No problem!

$ shopt -s cdspell
$ cd /tpm
/tmp

The force_fignore option, set by default, uses whatever value you have
assigned to $FIGNORE to exclude matching names from path completion. If,
for example, you have two directories -- ignore.yes and ignore.no
-- and you set FIGNORE to "yes", you will be able to use path
completion for ignore.no, but the shell will act as if ignore.yes
doesn't exist -- for path completion, anyway. You have to type each
letter to cd into it.

The force_fignore option can be used when there are particular
directories that you don't want your users to enter accidentally when
using path name completion. The FIGNORE variable can be set to a
complete directory name or, as in the example, to a suffix of the
file name.
Choose your own path, but let shopt make it easier.

Wednesday, June 26, 2013

Messaging servers facilitate the exchange of
binary or text information between remote systems, and are useful in any
distributed system where one node needs to share information with
another. They have a lot to do: They must deliver messages, monitor
client connections, and store and resend undelivered messages when a
destination system comes back online.
You don't need a message server if guaranteed message delivery isn't
important, but for some types of applications, such as financial
transactions, it is essential that messages arrive at their destination
in the right order and be stored in the case of a network failure.
Apache ActiveMQ, based on the Java Message Service
1.1 specification, is a mature open source message server with a range
of advanced features. It supports a wide range of client libraries,
which allows programs written in languages such as C/C++, .NET, PHP, and
Ruby to access and use the message server. It is also frequently tested
with popular JEE servers such as JBoss, GlassFish,
and WebLogic. If you're writing an application that encompasses
distributed systems, you can use ActiveMQ as your application's
communication channel.
To see how such a messaging server works, you can start by
implementing a simple remote system logger that logs events, warnings,
and errors that occur during a program's execution and sends these log
messages over the network to a client that reads and optionally stores
or processes the messages. For our purposes, we'll say that multiple log
readers must be able to receive the log messages, and if a log reader
disconnects, it must be able to catch up with log messages that were
issued while it was offline.
You can read more about the basics of ActiveMQ in How to Get Started with ActiveMQ. Install the program, change to the apache-activemq-5.8.0/bin directory, and run ./activemq start. You should then be able to access ActiveMQ's admin page from a browser at http://localhost:8161.

Queues vs. topics

Before we start coding, let's talk about some key components. At the
highest level a message producer, which can be any type of software
including a server, a database, or an application, wants to send a
message to a remote system. The remote system is a message consumer, and
there can be more than one of them. The producer submits the message to
the message server, which processes it and sends it to the message
consumer. The message producer doesn't communicate directly with the
consumer, but rather uses the message server as a broker.
ActiveMQ (and JMS) has two models for processing messages: a
publish-and-subscribe model and a load-balancer model. The former is
implemented as topics in ActiveMQ, while the latter is implemented as
queues.
With topics, the producer submits a message to the message server,
which sends it to all the consumers that have registered to receive the
messages.
With queues, the message server sends any one message to exactly one
consumer. If there are two or more consumers, they will receive messages
alternately as determined by a load balancer built into the message
server.
For our purposes, then, queues are not desirable, as each log reader
would get only a few of the messages. If we use topics, every log reader
that subscribes to receive the messages will get its own copy.
Topics can be either durable or non-durable. Durable topics are
stored by the message server until delivered, while non-durable topics
have no persistency and are discarded if the consumer is not online to
receive the message. For our remote system logger, a log reader needs to
register itself with the message server as durable to ensure that the
messages are stored and later delivered if the log reader disconnects
and then later reconnects.

The logger and its reader

On our demo logging system, the server part – the message producer –
connects to ActiveMQ and starts sending the log messages it is
generating. One or more log readers – the message consumers – connect to
ActiveMQ to receive those messages. Because we will use topics with
durability, once a log reader has registered an interest it can recover
from network disconnects without losing any messages.
We have to take into consideration that until the first log reader
has registered its interest with ActiveMQ, there will be no consumers
for the messages logged by the server component, and they will be lost.
Also, ActiveMQ identifies the set of log readers that want to receive
log messages by using a combination of the client ID and subscriber
name, which means that the client ID needs to be unique for each log
reader that connects. After a connection drop, a log reader must use the
same client ID to ensure that it can pick up where it left off.
Depending on the network topology, you might use information such as
IP address, hostname, or MAC address to create a unique client ID.
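In code, that registration might look like the following sketch (the client ID string and subscriber name here are placeholders, and the connection, session, and topic objects are assumed to have been created as the article describes):

```java
// Log reader side: the client ID must be set before the connection
// is started, and must stay the same across reconnects so ActiveMQ
// can match the reader to its stored messages
connection.setClientID("logreader-host1");   // placeholder ID

// Client ID + subscriber name together identify this durable subscription
TopicSubscriber consumer =
    session.createDurableSubscriber(topic, "logReader");
```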

First contact

In a real-world example the logger would be part of a bigger program
which would call it to send the log message, but we've coded it as a
program of its own for illustration purposes. The log reader would
either be a standalone program or part of a larger monitoring solution.
The initial ActiveMQ API calls for the logger start by making a
connection to ActiveMQ. Here is that code in Java, but you can use the
ActiveMQ API in other languages:
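A minimal sketch of that initialization, assuming the standard JMS API and the ActiveMQ client library on the classpath (the broker URL and topic name are placeholders):

```java
import javax.jms.Connection;
import javax.jms.Session;
import javax.jms.Topic;
import org.apache.activemq.ActiveMQConnectionFactory;

public class Logger {
    public static void main(String[] args) throws Exception {
        // tcp://localhost:61616 is ActiveMQ's default transport address
        ActiveMQConnectionFactory factory =
            new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();

        // A non-transacted session with automatic acknowledgement
        Session session =
            connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Both the logger and the log readers agree on this topic name
        Topic topic = session.createTopic("logTopic");

        /* a MessageProducer would be created and used here */

        connection.close();
    }
}
```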

For the logger, the final initialization step is to create a message
producer object, which is the top-level object used for sending
messages, and ensure that ActiveMQ writes the messages to disk before
delivering them, so that messages won't be lost if the system reboots.
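A sketch of that step, assuming the session and topic objects from the initialization just described (the fragment also needs javax.jms.MessageProducer and javax.jms.DeliveryMode imported):

```java
// Top-level object for sending messages to the topic
MessageProducer producer = session.createProducer(topic);

// PERSISTENT delivery makes ActiveMQ write each message to disk
// before delivering it, so messages survive a broker restart
producer.setDeliveryMode(DeliveryMode.PERSISTENT);
```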

Sending and receiving

At this point in the code, both the logger and the log reader have
established a link with ActiveMQ. To send messages, the logger calls producer.send(), and to receive messages, the log reader calls consumer.receive().
We can add a logIt() function to the logger code that
prepends the date to the log message and wraps it up in a TextMessage
object for sending to the log reader. A TextMessage object is the
easiest way to send a message that contains a string, which means it can
also send XML. The TextMessage object has a getText() method that the log reader can use to extract the string from the received message.
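A sketch of what such a logIt() helper might look like, assuming the session and producer objects created earlier (the exact code is in the downloadable archive):

```java
// Prepend the current date and send the text as a TextMessage
void logIt(String msg) throws JMSException {
    String line = new java.util.Date() + " " + msg;
    TextMessage message = session.createTextMessage(line);
    producer.send(message);
}
```

On the receiving side, ((TextMessage) consumer.receive()).getText() returns the string.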

The downloadable zip archive of the project contains three files. One, build.xml, lets the Apache Ant
tool build and execute the logger and reader programs. The other two
are the source files Logger.java and LogReader.java. Unzip the archive
in the ActiveMQ installation directory (apache-activemq-5.8.0/) and
change directory to logger. You can run the logger with the command ant logger, and the log reader with ant logreader.
If you want to see whether the code works as it should, try stopping
the log reader and starting it again after a few seconds. ActiveMQ
should deliver all the log messages posted while the log reader was offline
once it reconnects.

Conclusion

As you can see, programming for ActiveMQ is straightforward. Now that
you know how to create a logger, try modifying the programs to use
queues rather than topics – call session.createQueue() instead of session.createTopic() – and observe the difference in behavior. You can also try creating a non-durable subscriber – use session.createSubscriber() rather than session.createDurableSubscriber() in the log reader – and see what happens to messages posted while the log reader is offline.
To learn more about ActiveMQ, it is also worth looking at the various article resources on the ActiveMQ website.