Wednesday, February 28, 2007

Image spammers doing the twist

It's been quite a while since I last blogged about ever-changing image spam. Anna Vlasova wakens me from my unblogging slumber with some great samples of recent image spam where the spammer has decided to rotate the entire image to try to avoid detection. Take a look at this first one:

The spammer has really gone to town here:

There's random speckling all over the images to upset hashing and OCR techniques

There's no URL in the message itself (it's in the image)

The entire image has been rotated to the left to obscure the text

And, of course, they are not going to be content with just one rotation and can randomize the angle per message:

And they've gone even further by slicing the image up, randomizing the angle and overlaying the elements using animation.

Monday, February 19, 2007

Jack Bauer's Management Secrets #1: I need it!

This is part one of a series of posts unlocking the valuable management secrets and strategies of 24's best agent: Jack Bauer. What is it that makes Jack successful? Sure, he's a great shot, he's been trained in all sorts of combat, sometimes he's lucky, clearly he's very driven.

But what really makes Jack a winner are his management skills. Jack successfully motivates and manages, he handles superiors and subordinates, he gains people's trust, he has high integrity, he's a team player and ultimately he helps his team win time and again.

These posts look into Jack's management secrets. In part one I look at how Jack creates a sense of urgency while at the same time binding his team together towards a common goal. And he does all of that with a simple phrase: 'I need it!'.

I need it!

Jack doesn't say "I want this done" or "You must do this", he tells his team members (especially, Chloe) "I need it!". Why's that so important?

Firstly, by saying "I need it!" Jack makes his request personal. It's not a simple matter of something having to be done, it's that Jack (in his position of authority) needs the task completed for his own success. That's an important technique because it binds the manager's success along with the subordinate's success. If the person Jack is asking successfully completes the task they can see clearly that they've been successful personally, made the manager successful and by extension made the entire team more successful.

It also gives the employee a sense of empowerment and reason for their work. Since they know the manager needs the task completed, instead of merely wanting it done, the employee feels that the task is worth accomplishing.

Contrast that with a classic pointy-haired manager who spits out orders without being personally involved. An employee just sees the manager as a conduit for orders that probably came from above them. There's no sense that the manager is in any way involved in the decision making process, or even cares about the outcome.

Of course, it's not enough for Jack to say that he needs it, he needs to provide justification and clarity. He does this in two ways: he briefly explains why, and he sets a clear time frame. Because he's built trust with the team (a subject of a separate post) he can briefly explain his reasons without fuss, and tell the employee when the task must be completed.

Here's Jack speaking to Chloe in Season 5:

Chloe, listen to me, I have a thumb drive that's going to help us find the sentox. I need you to data mine the files.

He makes a clear request 'data mine the files', he gives a clear reason 'going to help us find the sentox' and states that he needs it. In this case he doesn't state when, because it's clear from the context that 'when' is 'asap'.

So, how can you apply Jack's "I need it!" technique in the workplace? You just need to remember the three-step C T U plan whenever setting a task:

C is for Context: explain the context of the request so it's clear why you are asking for a particular task to be completed.

T is for Timeframe: make sure that the expected deadline or timeframe is clearly stated.

U is for Urgency: communicate your sense of urgency by specifying that this is something you need.

Trusted Email Connection Signing (rev 0.2)

The motivation behind TECS (Trusted Email Connection Signing) is that what managers of MX servers on the public Internet really care about is the ability to distinguish a good connection (coming from a legitimate sender and which will be used to send wanted email) from a bad connection (coming from a spammer). If you can identify a bad connection (today, you do that using an RBL or other reputation service based on the IP address of the sender) you can tarpit or drop it, or subject the mails sent on the connection to extra scrutiny. If you can identify a good connection it can bypass spam checks and help reduce the overall false positive rate.

If you are a legitimate bulk mailer (an email marketer, for example) then you care deeply that your reputation be recognizable and that mail sent from you be delivered. Currently, you have to carefully tend your IP addresses to make sure that they don't appear on blacklists, and you have to ensure that new IP addresses are clean.

If you are running a large email service (e.g. Yahoo! Mail) then you are currently trying to build whitelists and blacklists of IP addresses, when what you really want are whitelists and blacklists of entities.

Currently, the options used to identify a bad connection are rather limited (RBLs, paid reputation services and grey listing), and good connections are hard to manage (whitelists on a per-recipient basis, or pay-per-mail services). What's needed is a different approach.

The idea is to identify and determine the reputation of the entity connecting to a mail server in real-time without resorting to a blacklist or whitelist. This is done by signing the connection itself. With the signature on a per-connection basis a mail server is able to determine who is responsible for the connection, and then look up that entity's reputation in a database.

Current reputation databases are based on IP addresses. This is a very inflexible system: IP addresses must be added to blacklists very fast as spammers churn through zombie machines, and any legitimate emailer needs to make sure their mail servers are whitelisted with multiple email providers (e.g. Yahoo!, Gmail, Brightmail, ...) to ensure delivery. And if a legitimate mailer wants to bring on line new servers with new IP addresses, they have to run through the entire whitelisting process again.

This is inefficient. The mapping between IP addresses and entities (e.g. knowing that Google's Gmail service uses a specific set of IP addresses) is unwieldy to manage and the wrong level of granularity. Google should be free to add and remove email servers at will, while carrying their good reputation with them.

That's what TECS gives you.

Connection Signing

TECS is an extension to the existing SMTP AUTH mechanism (see RFC 2554) and implements an authentication mechanism that I'll refer to as TECS-1 (the 1 here acts as a version number on the protocol). TECS-1 would need to be registered as a SASL (see RFC 2222) authentication mechanism.

When a mail sender wishing to sign its connection connects to an SMTP server, it issues the EHLO command. If the SMTP server is capable of handling AUTH, the mail sender then signs the connection using the AUTH command with the TECS-1 mechanism, followed by an initial response (which contains the TECS signature) as defined in RFC 2554.

The 'initial response' section of the AUTH command is a base-64 encoded string containing the following structure (this is deliberately similar to the DKIM fields):

a=rsa-sha256; q=dns; d=jgc.org; b=oU0Nnbmh1YWVlMDljNDBhZjJiO==

a= is the cryptographic method used (the default would be RSA/SHA-256 with suitable padding as described in PKCS#1 version 1.5; see RFC 3447).

d= is the name of the domain signing the connection. In the example above I am showing a connection that is being signed (and hence claimed by) jgc.org.

q= is a query type with the default being the use of a DNS TXT record. This query method is used to obtain the public key associated with the signing domain. The public key would be obtained by looking up _tecs.jgc.org and getting the associated TXT record.

b= is the binary signature for the connection generated using the method in a= by the d= domain.
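As a rough sketch of the structure described above, the following Python builds and parses a TECS-1 initial response. The function names are hypothetical, and the choice to base-64 encode the b= value (as DKIM does) is an assumption; the proposal only says that b= carries the binary signature.

```python
import base64
from typing import Dict


def encode_initial_response(domain: str, signature: bytes,
                            algorithm: str = "rsa-sha256",
                            query: str = "dns") -> str:
    """Build the base-64 'initial response' for AUTH TECS-1."""
    # Assumption: the binary signature is itself base-64 encoded in b=.
    b_value = base64.b64encode(signature).decode("ascii")
    fields = f"a={algorithm}; q={query}; d={domain}; b={b_value}"
    return base64.b64encode(fields.encode("ascii")).decode("ascii")


def decode_initial_response(initial_response: str) -> Dict[str, str]:
    """Decode an initial response back into its tag=value fields."""
    text = base64.b64decode(initial_response).decode("ascii")
    fields: Dict[str, str] = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the first '=' only: base-64 padding may contain '='.
        tag, _, value = part.partition("=")
        fields[tag.strip()] = value.strip()
    # Defaults named in the text: RSA/SHA-256 and a DNS TXT lookup.
    fields.setdefault("a", "rsa-sha256")
    fields.setdefault("q", "dns")
    return fields


def key_record_name(domain: str) -> str:
    """DNS name whose TXT record holds the signing domain's public key."""
    return f"_tecs.{domain}"
```

A receiving server would decode the response, read d=, and fetch the TXT record at key_record_name(d) to obtain the public key before checking b=.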

The connecting server signs the tuple consisting of ( destination IP/port, source IP/port and epoch ); that way they sign the current connection and verify that they are responsible for the mail sent across it.

Each entity has an RSA public/private key pair. When signing a connection the entity generates a SHA-256 hash of the tuple. The destination IP/port pair is the IP address and port on the mail server that the mail sender is currently connected to; similarly the source IP/port pair is the IP address and port of the connection being used by the mail sender. The epoch is the standard Unix epoch rounded to the nearest 30 seconds.

The entity making the connection then encrypts the hash with their private key.
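The hashing step can be sketched in Python as follows. The proposal does not define how the tuple is serialized before hashing, so the "ip:port|ip:port|epoch" layout here is purely an assumption, as are the function names. The RSA step is left out because it needs a cryptography library outside the standard library.

```python
import hashlib
import time
from typing import Optional

EPOCH_WINDOW = 30  # seconds, as stated in the proposal


def rounded_epoch(now: Optional[float] = None) -> int:
    """Unix epoch rounded to the nearest 30 seconds."""
    if now is None:
        now = time.time()
    # Adding half a window before integer division rounds to nearest.
    return ((int(now) + EPOCH_WINDOW // 2) // EPOCH_WINDOW) * EPOCH_WINDOW


def connection_digest(dst_ip: str, dst_port: int,
                      src_ip: str, src_port: int, epoch: int) -> bytes:
    """SHA-256 hash of (destination IP/port, source IP/port, epoch)."""
    # Assumed serialization: the post does not specify one.
    material = f"{dst_ip}:{dst_port}|{src_ip}:{src_port}|{epoch}"
    return hashlib.sha256(material.encode("ascii")).digest()
```

Both ends compute the same digest from what they can observe about the connection; the sender signs it with its private key and the receiver checks the signature with the public key.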

Update (February 8, 2007): A number of people have suggested getting the public key from the _domainkey (i.e. DKIM) label of the d= domain. This seems like a good idea since there's no need to reinvent the wheel.

Update (February 9, 2007). A few people pointed me in the direction of the MARID CSV proposal (see CSV). I've addressed this below.

I'm a big ISP and I don't want to sign as myself, can I sign as a customer?

Sure. For example, jgc.org's mail is actually handled by lists.herald.co.uk. When that server is sending mail for jgc.org it could sign as jgc.org as long as it has access to jgc.org's private key. It would simply specify d=jgc.org in the TECS data.

Why don't you just use STARTTLS with certificates?

Because that's a very heavyweight system, designed for something else. A SASL extension using SMTP AUTH is simple and clean.

Why do you think connection signing is useful?

Because SMTP server resources are precious. Being able to make a decision about a connection before any mail is delivered is very useful. An SMTP server owner could use reputation data from a public or private source to decide whether to accept or reject a connection, slow down a connection, apply less or extra scrutiny to a connection, etc. Being able to do this before receiving a ton of mail and tying up a server is very valuable.

Won't spammers just sign their connections?

No doubt, but that's hardly a worry. Being able to identify the good senders fast is the most important goal.

Why don't you just sign the destination IP/port pair? That's known before the connection is made and avoids problems with NAT.

TECS could just sign the tuple ( destination IP, port, epoch ) but I think it's a bit weaker than my proposal. Since the destination IP and port are fixed for a given MTA, the signature is really a signature on the time. An eavesdropper could replay the signature within 30 seconds (or whatever timeout applies to the epoch value) and get an authenticated connection from any source IP address.
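To make the replay concern concrete, here is a small Python illustration. The serialization and the example addresses (taken from the documentation ranges) are assumptions, not part of the proposal. With the weaker tuple the signed data contains nothing about the client, so a captured signature verifies for any source; with the full tuple the attacker's connection yields different data.

```python
import hashlib


def _sha256(*parts: object) -> bytes:
    # Assumed serialization: fields joined with '|'.
    joined = "|".join(str(p) for p in parts)
    return hashlib.sha256(joined.encode("ascii")).digest()


def weak_digest(dst_ip: str, dst_port: int, epoch: int) -> bytes:
    """Data signed when only ( destination IP, port, epoch ) is covered."""
    return _sha256(dst_ip, dst_port, epoch)


def full_digest(dst_ip: str, dst_port: int,
                src_ip: str, src_port: int, epoch: int) -> bytes:
    """Data signed when the source IP/port pair is covered as well."""
    return _sha256(dst_ip, dst_port, src_ip, src_port, epoch)


epoch = 1170000000
mta = ("192.0.2.1", 25)
sender = ("198.51.100.7", 40000)    # legitimate signer
attacker = ("203.0.113.9", 51234)   # eavesdropper replaying the signature

# Weak tuple: one value per time window, regardless of who connects.
replayable = weak_digest(*mta, epoch)

# Full tuple: the attacker's connection hashes to something else, so the
# captured signature no longer matches.
legit = full_digest(*mta, *sender, epoch)
forged = full_digest(*mta, *attacker, epoch)
```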

What about the MARID CSV proposal?

The CSV proposal is a lightweight (DNS-based), non-cryptographic method of establishing whether a host claiming a certain domain name in HELO/EHLO is authorized to be an SMTP client. Clearly, CSV aims to provide a simple method of determining whether a connecting SMTP client is authorized to be an SMTP client with the claimed name. This seems like a useful extension, but is very different from TECS. TECS operates at the level of a specific connection, and with an entity that is distinct from the domain of the SMTP client. This is valuable for two reasons: it allows the identity to be moved from SMTP service provider to provider, and it means that shared SMTP servers can operate claiming different 'responsible parties' for each connection. This latter point is important for ISPs that provide SMTP services to email marketers where the same SMTP server may be shared across many clients. Without it, a clean emailer can end up blacklisted because the IP of the shared server was blacklisted for some other, unrelated misbehaviour.

Whilst CSV is a useful extension which would help with the zombie problem, it does not address the needs at the connection level where I believe the problem needs to be addressed.

CSV also provides specific services for checking domain names against accreditation services. That is outside the scope of TECS, although the assumption is that such services would exist for TECS signed connections against the domain name claiming responsibility. The bottom line is that TECS deals with the party responsible for a connection, CSV the party responsible for the server.

What about mailing lists that forward mail?

By signing their connections they take responsibility for the mails they are sending. So mailing lists would need to have appropriate email policies in place for unsubscriptions, and deal themselves with spam to the list. Since the connection is signed, any concern about munging of From: addresses for VERP handling, or adding headers/footers to email, is irrelevant.

Is this compatible with SPF, Sender-ID, DomainKeys?

They are orthogonal. There's no direct interaction, although it might be sensible to use the _domainkey record from DKIM to obtain a public key, thus sharing the same key between DKIM and TECS.

Will this reduce spam?

I'm not going to make any predictions. The goal would be to build a database that makes it easier to recognize someone who is legitimate, and scrutinize those who abuse the system or who choose not to sign.

What about anonymity?

Anonymous remailers are unaffected. They could sign their outbound connections with the system, but that would not affect any changes they make to anonymize messages, since it's the connection, not the message content, that's signed.

What if I change the mail servers or IP addresses I am using?

There's no effect. Keep signing the connections and you can take responsibility for any IP address you want to.

I think you are wrong, right, stupid, a genius.

Please comment here, or write to me directly.

Many thanks to all members of the REDACTED discussion forum, and to Toby DiPasquale.

SocksFox rocks

It's not often that I post a personal experience on this blog (it's usually a bunch of dry stuff about anti-spam or GNU Make), but today's an exception.

I had a great on line shopping experience with a UK-based company called SocksFox.

I wanted to buy some of these Falke socks for padding around at home in. Since I work at home I spend almost all my time shoeless, and regular socks were getting a little chilly during the winter months.

Unable to find a retailer where I live, I went on line and found SocksFox. What a great experience! Here's the chronology in French time.

Day 1

1300 I place order on the SocksFox web site using PayPal. Not only do they accept credit cards, but you can pay by PayPal. That's great for small purchases since PayPal does not charge lots of money for different currency orders.

1303 Receive PayPal receipt for payment to SocksFox.

1304 Receive email from SocksFox acknowledging order

1458 Receive email from SocksFox indicating that the socks have been dispatched by regular mail from the UK.

Day 2

1100 Receive socks

Clearly, some thanks need to go to the French and British postal services, but just look at the speed of that. In 22 hours I'd gone from order to delivery, including the transportation of the socks from the UK to the South of France.

Monday, February 05, 2007

A strange PHPism from the O'Reilly book

I've been working on a project where the front end is written in PHP and decided that I really needed to sit down and learn something about the language instead of just hacking my way through the existing scripts. I got a copy of Programming PHP and have been working my way page-by-page through the language.

On Page 54 I came across an example that I just can't figure out. I quote:

The do/while statement is sometimes used to break out of a block of code when an error condition occurs. For example:

    do {
        // do some stuff
        if ($error_condition) break;
        // do some other stuff
    } while (false);

Because the condition for the loop is false, the loop is executed only once, regardless of what happens inside the loop. However, if an error occurs, the code after the break is not evaluated.

Now, can someone in PHP land explain to me why you would do that when the following is much clearer and simpler:

    // do some stuff
    if (!$error_condition) {
        // do some other stuff
    }

And while I'm ranting, WTF is the deal with variables essentially having no scope (a variable defined inside a block is global: either global to the script or 'global' to the function definition the block is within)?

Thursday, February 01, 2007

Proposal for connection signing reputation system for email: TECS

IMPORTANT: This blog post is deprecated. Please read Trusted Email Connection Signing (rev 0.2) instead.

The motivation behind TECS (Trusted Email Connection Signing) is that what managers of MX servers on the public Internet really care about is the ability to distinguish a good connection (coming from a legitimate sender and which will be used to send wanted email) from a bad connection (coming from a spammer). If you can identify a bad connection (today, you do that using an RBL or other reputation service based on the IP address of the sender) you can tarpit or drop it, or subject the mails sent on the connection to extra scrutiny. If you can identify a good connection it can bypass spam checks and help reduce the overall false positive rate.

Currently, the options used to identify a bad connection are rather limited (RBLs, paid reputation services and grey listing), and good connections are hard to manage (whitelists on a per-recipient basis, or pay-per-mail services). What's needed is a different approach.

There are also ideas like SPF, Sender-ID and DomainKeys which all attack the problem of protecting the integrity of the From: portion of a message.

TECS is different. The idea is to identify and determine the reputation of the entity connecting to a mail server in real-time without resorting to a blacklist or whitelist. This is done by signing the connection itself. With the signature on a per-connection basis a mail server is able to determine who is responsible for the connection, and then look up that entity's reputation in a database.

Current reputation databases are based on IP addresses. This is a very inflexible system: IP addresses must be added to blacklists very fast as spammers churn through zombie machines, and any legitimate emailer needs to make sure their mail servers are whitelisted with multiple email providers (e.g. Yahoo!, Gmail, Brightmail, ...) to ensure delivery. And if a legitimate mailer wants to bring on line new servers with new IP addresses, they have to run through the entire whitelisting process again.

This is inefficient. The mapping between IP addresses and entities (e.g. knowing that Google's Gmail service uses a specific set of IP addresses) is unwieldy to manage and the wrong level of granularity. Google should be free to add and remove email servers at will, while carrying their good reputation with them.

That's what TECS gives you.

Now for the how. To work TECS requires two things: a reputation authority and an algorithm. Let's start with the second.

Connection Signing

When a mail sender wishing to sign its connection connects to an SMTP server, it issues the EHLO command, and if that SMTP server is capable, a new extension command, TECS, will be available. After the EHLO the mail sender then signs the connection using the TECS command.

The TECS command has two parts: an identifier (this is the unique identifier of the entity signing the connection, and thus taking responsibility for the messages sent across the connection) and a signature.

Each entity has an RSA public/private key pair. When signing a connection the entity generates a SHA-256 hash of the tuple ( destination IP/port, source IP/port, epoch ). The destination IP/port pair is the IP address and port on the mail server that the mail sender is currently connected to; similarly the source IP/port pair is the IP address and port of the connection being used by the mail sender. The epoch is the standard Unix epoch rounded to the nearest 30 seconds.

The entity making the connection then encrypts the hash with their private key, turns that into a hex string and uses that string as the second parameter to the new SMTP TECS command.

For example, an entity with the unique identifier 1b46ef3d might sign a particular connection like this:

TECS 1b46ef3d 5dde82a341863c87be1258c02ce7f80bf214192b

to which the receiving server could reply 200 OK if the signature is good (which they verify by generating the same hash and decrypting using the entity's public key), or with an error if the signature is bad (and they should probably drop the connection).
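The wire format of this (now deprecated) TECS command can be sketched in Python as below. The helper names are hypothetical, and the signature bytes are treated as opaque here; producing them requires the RSA step described above.

```python
from typing import Tuple


def format_tecs_command(entity_id: str, signature: bytes) -> str:
    """Render 'TECS <identifier> <signature as hex string>'."""
    return f"TECS {entity_id} {signature.hex()}"


def parse_tecs_command(line: str) -> Tuple[str, bytes]:
    """Split a TECS command into (entity identifier, signature bytes)."""
    parts = line.strip().split()
    if len(parts) != 3 or parts[0].upper() != "TECS":
        raise ValueError("not a well-formed TECS command")
    _, entity_id, signature_hex = parts
    # bytes.fromhex raises ValueError if the string is not valid hex.
    return entity_id, bytes.fromhex(signature_hex)
```

A receiving server would parse the command, fetch the public key for the identifier, and compare the decrypted signature against the hash it computes itself for the connection.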

To get the entity's public key the receiving server needs to query the reputation authority.

Reputation Authority

The TECS reputation authority would be a non-profit organization that sells public/private key pairs and allocates entity IDs to verified entities. Money gathered from selling keys would be used to maintain the database of reputation information for each entity, and to ensure that only reputable entities can obtain keys.

In the example above the receiving server would query the DNS TXT record of the domain name produced by concatenating the identifier given in the TECS command with the name of the authority. Suppose that the authority was tecs.jgc.org; then a DNS TXT query would go to 1b46ef3d.tecs.jgc.org.

The reply would consist of the ascii-armored public key for that entity and a reputation measure indicating the reliability of that user. The reputation measure would take one of 4 states: unknown (a recently issued key would not have any reputation), good (only a small number of complaints against this ID), medium (some complaints), bad (large number of complaints, probable spam source). The receiving server can verify the signature and use the reputation information to decide on the handling of the connection.
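A minimal Python sketch of the lookup name and the four reputation states follows. The mapping from state to action is only an illustrative policy (the post leaves the handling to the receiving server), and the names used are hypothetical.

```python
from enum import Enum


class Reputation(Enum):
    UNKNOWN = "unknown"  # recently issued key, no history yet
    GOOD = "good"        # only a small number of complaints
    MEDIUM = "medium"    # some complaints
    BAD = "bad"          # many complaints, probable spam source


def reputation_query_name(entity_id: str,
                          authority: str = "tecs.jgc.org") -> str:
    """DNS name to query for the entity's TXT record."""
    return f"{entity_id}.{authority}"


def handle_connection(signature_ok: bool, reputation: Reputation) -> str:
    """Pick an action for the connection (assumed example policy)."""
    if not signature_ok:
        return "drop"
    policy = {
        Reputation.GOOD: "bypass-spam-checks",
        Reputation.MEDIUM: "normal-checks",
        Reputation.UNKNOWN: "normal-checks",
        Reputation.BAD: "tarpit",
    }
    return policy[reputation]
```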

The authority would accept ARF formatted complaints consisting of abusive messages giving connection information, and the full text of the TECS command. They would then investigate to ensure that the reputation database contained up to date and useful information.

How much is a key pair going to cost?

I think it should be cheap for individuals ($25?), fairly cheap for non-profits and charities ($100?), and then a sliding scale for for-profit companies based on size (say $100 for a small company, $1000 for a big one?). The goal would be to make enough money to run the list.

What about mailing lists that forward mail?

By signing their connections they take responsibility for the mails they are sending. So mailing lists would need to have appropriate email policies in place for unsubscriptions, and deal themselves with spam to the list. Since the connection is signed, any concern about munging of From: addresses for VERP handling, or adding headers/footers to email, is irrelevant.

Is this compatible with SPF, Sender-ID, DomainKeys?

They are orthogonal. There's no direct interaction.

Will this reduce spam?

I'm not going to make any predictions. The goal would be to build a database that makes it easier to recognize someone who is legitimate, and scrutinize those who abuse the system or who choose not to sign.

What about anonymity?

Anonymous remailers are unaffected. They could sign their outbound connections with the system, but that would not affect any changes they make to anonymize messages, since it's the connection, not the message content, that's signed.

What if I change the mail servers or IP addresses I am using?

There's no effect. Keep signing the connections and you can take responsibility for any IP address you want to.

I think you are wrong, right, stupid, a genius.

Please comment here, or write to me directly.
