The book is AMAZING! Taking a trip down memory lane to ’80s pop culture, video games, music & movies. A sci-fi futuristic book that online gamers are trying to solve puzzles on a easter egg hunt for the control of oasis, a virtual reality game.

2FA SSH aka OpenSSH OATH, Two-Factor Authentication

prologue

Good security is based on layers of protection. At some point the usability gets in the way. My thread model on accessing systems is to create a different ssh pair of keys (private/public) and only use them instead of a login password. I try to keep my digital assets separated and not all of them under the same basket. My laptop is encrypted and I dont run any services on it, but even then a bad actor can always find a way.

Back in the days, I was looking on Barada/Gort. Barada is an implementation of HOTP: An HMAC-Based One-Time Password Algorithm and Gort is the android app you can install to your mobile and connect to barada. Both of these application have not been updated since 2013/2014 and Gort is even removed from f-droid!

Talking with friends on our upcoming trip to 34C4, discussing some security subjects, I thought it was time to review my previous inquiry on ssh-2FA. Most of my friends are using yubikeys. I would love to try some, but at this time I dont have the time to order/test/apply to my machines. To reduce any risk, the idea of combining a bastion/jump-host server with 2FA seemed to be an easy and affordable solution.

OpenSSH with OATH

As ssh login is based on PAM (Pluggable Authentication Modules), we can use Gnu OATH Toolkit. OATH stands for Open Authentication and it is an open standard. In a nutshell, we add a new authorization step that we can verify our login via our mobile device.

Below are my personal notes on how to setup oath-toolkit, oath-pam and how to synchronize it against your android device. These are based on centos 6.9

OpenSSH

Verify PAM

SSHD-PAM

# cat /etc/pam.d/sshd

In modern systems, the sshd pam configuration file seems:

#%PAM-1.0
auth required pam_sepermit.so
auth include password-auth
account required pam_nologin.so
account include password-auth
password include password-auth
# pam_selinux.so close should be the first session rule
session required pam_selinux.so close
session required pam_loginuid.so
# pam_selinux.so open should only be followed by sessions to be executed in the user context
session required pam_selinux.so open env_params
session required pam_namespace.so
session optional pam_keyinit.so force revoke
session include password-auth

Depending on your policy and thread model, you can switch sufficient to requisite , you can remove debug option. In the third field, you can try typing just the pam_path.so without the full path and you can change the window to something else:

Restart sshd

In every change/test remember to restart your ssh daemon:

# service sshd restart

Stopping sshd: [ OK ]
Starting sshd: [ OK ]

SELINUX

If you are getting some weird messages, try to change the status of selinux to permissive and try again. If the selinux is the issue, you have to review selinux audit logs and add/fix any selinux policies/modules so that your system can work properly.

Post Scriptum

The idea of using OATH & FreeOTP can also be applied to login into your laptop as PAM is the basic authentication framework on a linux machine. You can use OATH in every service that can authenticate it self through PAM.

Signal

The year is 2144. A group of anti-patent scientists are working to reverse engineer drugs in free labs, for (poor) people to have access to them. Agents of International Property Coalition are trying to find the lead pirate-scientist and stop any patent violation by any means necessary. In this era, without a franchise (citizenship) autonomous robots and people are slaves. But only a few of the bots have are autonomous. Even then, can they be free ? Can androids choose their own gender identity ? Transhumanism and extension life drugs are helping people to live a longer and better life.

Postfix

In a nutshell, postfix will pass through (filter) an email using the milter protocol to another application before queuing it to one of postfix’s mail queues. Think milter as a bridge that connects two different applications.

Training

If you have already a spam or junk folder is really easy training the Bayesian classifier with rspamc.

I use Maildir, so for my setup the initial training is something like this:

# cd /storage/vmails/balaskas.gr/evaggelos/.Spam/cur/

# find . -type f -exec rspamc learn_spam {} \;

Auto-Training

I’ve read a lot of tutorials that suggest real-time training via dovecot plugins or something similar. I personally think that approach adds complexity and for small companies or personal setup, I prefer using Cron daemon:

Are you willing to walk-away without anything in the world to build a better world ?

In many companies, nagios is the de-facto monitoring tool. Even with new modern alternatives solutions, this opensource project, still, has a large amount of implementations in place. This guide is based on a “clean/fresh” CentOS 6.9 virtual machine.

Selinux

Every online manual, suggest to disable selinux with nagios. There is a reason for that ! but I will try my best to provide info on how to keep selinux enforced. To write our own nagios selinux policies the easy way, we need one more package:

# yum -y install policycoreutils-python

Starting nagios:

# /etc/init.d/nagios restart

will show us some initial errors in /var/log/audit/audit.log selinux log file

Selinux - Nagios package

So … there is an rpm package with the name: nagios-selinux on Version: 4.3.2
you can install it, but does not resolve all the selinux errors in audit file ….. so ….
I think my way is better, cause you can understand some things better and have more flexibility on defining your selinux policy

Nagios Plugins

Nagios is the core process, daemon. We need the nagios plugins - the checks !
You can do something like this:

and if everything is going as planned, you will see something like this:

PNP4Nagios

It is time, to add pnp4nagios a simple graphing tool and get read the nagios performance data and represent them to graphs.

# yum info pnp4nagios | grep Version
Version : 0.6.22

# yum -y install pnp4nagios

We must not forget to restart our web server:

# /etc/init.d/httpd restart

Bulk Mode with NPCD

I’ve spent toooooo much time to understand why the default Synchronous does not work properly with nagios v4x and pnp4nagios v0.6x
In the end … this is what it works - so try not to re-invent the wheel , as I tried to do and lost so many hours.

Performance Data

We need to tell nagios to gather performance data from their check:

# vim +/process_performance_data /etc/nagios/nagios.cfg

process_performance_data=1

We also need to tell nagios, what to do with this data:

nagios.cfg

# *** the template definition differs from the one in the original nagios.cfg
#
service_perfdata_file=/var/log/pnp4nagios/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATAtTIMET::$TIMET$tHOSTNAME::$HOSTNAME$tSERVICEDESC::$SERVICEDESC$tSERVICEPERFDATA::$SERVICEPERFDATA$tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$tHOSTSTATE::$HOSTSTATE$tHOSTSTATETYPE::$HOSTSTATETYPE$tSERVICESTATE::$SERVICESTATE$tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file
# *** the template definition differs from the one in the original nagios.cfg
#
host_perfdata_file=/var/log/pnp4nagios/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATAtTIMET::$TIMET$tHOSTNAME::$HOSTNAME$tHOSTPERFDATA::$HOSTPERFDATA$tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$tHOSTSTATE::$HOSTSTATE$tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file

Slack

Iterator

a few months ago, I wrote an article on RecursiveDirectoryIterator, you can find the article here: PHP Recursive Directory File Listing . If you run the code example, you ‘ll see that the output is not sorted.

Object

Recursive Iterator is actually an object, a special object that we can perform iterations on sequence (collection) of data. So it is a little difficult to sort them using known php functions. Let me give you an example:

Internet Answers

Unfortunately stackoverflow and other related online results provide the most complicated answers on this matter. Of course this is not stackoverflow’s error, and it is really a not easy subject to discuss or understand, but personally I dont get the extra fuzz (complexity) on some of the responses.

Back to basics

So let us go back a few steps and understand what an iterator really is. An iterator is an object that we can iterate! That means we can use a loop to walk through the data of an iterator. Reading the above output you can get (hopefully) a better idea.

Prologue

Part of my day job is to protect a large mail infrastructure. That means that on a daily basis we are fighting SPAM and try to protect our customers for any suspicious/malicious mail traffic. This is not an easy job. Actually globally is not a easy job. But we are trying and trying hard.

ReplyTo

The last couple months, I have started a project on gitlab gathering the malicious ReplyTo from already identified spam emails. I was looking for a pattern or something that I can feed our antispam engines with so that we can identify spam more accurately. It’s doesnt seem to work as i thought. Spammers can alter their ReplyTo in a matter of minutes!

TheList

Here is the list for the last couple months: ReplyTo
I will -from time to time- try to update it and hopefully someone can find it useful

Free domains

It’s not much yet, but even with this small sample you can see that ~ 50% of phishing goes back to gmail !

105 gmail.com
49 yahoo.com
18 hotmail.com
17 outlook.com

More Info

You can contact me with various ways if you are interested in more details.

Prologue

I should have written this post like a decade ago, but laziness got the better of me.

I use TLS with IMAP and SMTP mail server. That means I encrypt the connection by protocol against the mail server and not by port (ssl Vs tls). Although I do not accept any authentication before STARTTLS command is being provided (that means no cleartext passwords in authentication), I was leaving the PLAIN TEXT authentication mechanism in the configuration. That’s not an actual problem unless you are already on the server and you are trying to connect on localhost but I can do better.

LDAP

I use OpenLDAP as my backend authentication database. Before all, the ldap attribute password must be changed from cleartext to CRAM-MD5

Dovecot

Dovecot is not only the imap server but also the “Simple Authentication and Security Layer” aka SASL service. That means that imap & smtp are speaking with dovecot for authentication and dovecot uses ldap as the backend. To change AUTH=PLAIN to cram-md5 we need to do the below change:

file: 10-auth.conf

From:

auth_mechanisms = plain

To:

auth_mechanisms = cram-md5

Before restarting dovecot, we need to make one more change. This step took me a couple hours to figure it out! On our dovecot-ldap.conf.ext configuration file, we need to tell dovecot NOT to bind to ldap for authentication but let dovecot to handle the authentication process itself:

From:

# Enable Authentication Binds
# auth_bind = yes

To:

# Enable Authentication Binds
auth_bind = no

To guarantee that the entire connection is protected by TLS encryption, change in 10-ssl.conf the below setting:

From:

ssl = yes

To:

ssl = required

SSL/TLS is always required, even if non-plaintext authentication mechanisms are used. Any attempt to authenticate before SSL/TLS is enabled will cause an authentication failure.

When traveling, I make an effort to visit the local hackerspace. I understand that this is not normal behavior for many people, but for us (free / opensource advocates) is always a must.

This was my 4th week on Bratislava and for the first time, I had a couple free hours to visit ProgressBar HackerSpace.

For now, they are allocated in the middle of the historical city on the 2nd floor. The entrance is on a covered walkway (gallery) between two buildings. There is a bell to ring and automated (when members are already inside) the door is wide open for any visitor. No need to wait or explain why you are there!

Entering ProgressBar there is no doubt that you are entering a hackerspace.

Failures

Every SysAdmin, DevOp, SRE, Computer Engineer or even Developer knows that failures WILL occur. So you need to plan with that constant in mind. Failure causes can be present in hardware, power, operating system, networking, memory or even bugs in software. We often call them system failures but it is possible that a Human can be also the cause of such failure!

Listening to the stories on the latest episode of stack overflow podcast felt compelled to share my own oh-shit moment in recent history.

I am writing this article so others can learn from this story, as I did in the process.

Rolling upgrades

I am a really big fun of rolling upgrades.

I am used to work with server farms. In a nutshell that means a lot of servers connected to their own switch behind routers/load balancers. This architecture gives me a great opportunity when it is time to perform operations, like scheduling service updates in working hours.

eg. Update software version 1.2.3 to 1.2.4 on serverfarm001

The procedure is really easy:

From the load balancers, stop any new incoming traffic to one of the servers.

Monitor all processes on the server and wait for them to terminate.

When all connections hit zero, stop the service you want to upgrade.

Perform the service update

Testing

Monitor logs and possible alarms

Be ready to rollback if necessary

Send some low traffic and try to test service with internal users

When everything is OK, tell the load balancers to send more traffic

Wait, monitor everything, test, be sure

Revert changes on the load balancers so that the specific server can take equal traffic/connection as the others.

This procedure is well established in such environments, and gives us the benefit of working with the whole team in working hours without the need of scheduling a maintenance window in the middle of the night, when low customer traffic is reaching us. During the day, if something is not going as planned, we can also reach other departments and work with them, figuring out what is happening.

Configuration Management

We are using ansible as the main configuration management tool. Every file, playbook, role, task of ansible is under a version control system, so that we can review changes before applying them to production. Viewing diffs from a modern web tool can be a lifesaver in these days.

Virtualization

We also use docker images or virtual images as development machines, so that we can perform any new configuration, update/upgrade on those machines and test it there.

Ansible Inventory

To perform service updates with ansible on servers, we are using the ansible inventory to host some metadata (aka variables) for every machine in a serverfarm. Let me give you an example:

Rollback

When something is not going as planned, we revert the changes on ansible (version control) and re-push the previous changes on a system. Remember the system is not getting any traffic from the front-end routers.

The Update

I was ready to do the update. Nagios was opened, logs were tailed -f

and then:

~> ansible-playbook serverfarm001.yml -t update

The Mistake

I run the ansible-playbook without limiting the server I wanted to run the update !!!

So all new changes passed through all servers, at once!

On top of that, new configuration broke running software with previous version. When the restart notify of service occurred every server simple stopped!!!

Funny thing, the updated machine server04 worked perfectly, but no traffic was reaching through the load balancers to this server.

Activate Rollback

It was time to run the rollback procedure.

Reverting changes from version control is easy. Took me like a moment or something.
Running again:

~> ansible-playbook serverfarm001.yml

and …

Waiting for Nagios

In 2,5 minutes I had fixed the error and I was waiting for nagios to be green again.

Then … Nothing! Red alerts everywhere!

Oh-Shit Moment

It was time for me to inform everyone what I have done.
Explaining to my colleagues and manager the mistake and trying to figuring out what went wrong with the rollback procedure.

Collaboration

On this crucial moment everything else worked like clockwise.

My colleagues took every action to:

informing helpdesk

looking for errors

tailing logs

monitor graphs

viewing nagios

talking to other people

do the leg-work in general

and leaving me in piece with calm to figure out what went wrong.

I felt so proud to be part of the team at that specific moment.

If any of you reading this article: Truly thank all guys and gals .

Work-Around

I bypass ansible and copied the correct configuration to all servers via ssh.
My colleagues were telling me the good news and I was going through one by one of ~xx servers.
In 20minutes everything was back in normal.
And finally nagios was green again.

Blameless Post-Mortem

It was time for post-mortem and of course drafting the company’s incident report.

We already knew what happened and how, but nevertheless we need to write everything down and try to keep a good timeline of all steps.
This is not only for reporting but also for us. We need to figure out what happened exactly, do we need more monitoring tools?
Can we place any failsafes in our procedures? Also why the rollback procedure didnt work.

Fixing Rollback

I am writing this paragraph first, but to be honest with you, it took me some time getting to the bottom of this!

Rollback procedure actually is working as planned. I did a mistake with the version control system.

What we have done is to wrap ansible under another script so that we can select the version control revision number at runtime.
This is actually pretty neat, cause it gives us the ability to run ansible with previous versions of our configuration, without reverting in master branch.

The ansible wrapper asks for revision and by default we run it with [tip].

So the correct way to do rollbacks is:

eg.

~> ansible-playbook serverfarm001.yml -rev 238

At the time of problem, I didnt do that. I thought it was better to revert all changes and re-run ansible.
But ansible was running into default mode with tip revision !!

Although I manage pretty well on panic mode, that day my brain was frozen!

Re-Design Ansible

I wrap my head around and tried to find a better solution on performing service updates. I needed to change something that can run without the need of limit in ansible.

The answer has obvious in less than five minutes later:

files/serverfarm001/1.2.3
files/serverfarm001/1.2.4

I need to keep a separated configuration folder and run my ansible playbooks with variable instead of absolute paths.

Conclusion

Everyone makes mistakes. I know, I have some oh-shit moments in my career for sure. Try to learn from these failures and educate others. Keep notes and write everything down in a wiki or in whatever documentation tool you are using internally. Always keep your calm. Do not hide any errors from your team or your manager. Be the first person that informs everyone. If the working environment doesnt make you feel safe, making mistakes, perhaps you should think changing scenery. You will make a mistake, failures will occur. It is a well known fact and you have to be ready when the time is up. Do a blameless post-mortem. The only way a team can be better is via responsibility, not blame. You need to perform disaster-recovery scenarios from time to time and test your backup. And always -ALWAYS- use a proper configuration management tool for all changes on your infrastructure.

post scriptum

After writing this draft, I had a talk with some friends regarding the cloud industry and how this experience can be applied into such environment. The quick answer is you SHOULD NOT.

Working with cloud, means you are mostly using virtualization. Docker images or even Virtual Machines should be ephemeral. When it’s time to perform upgrades (system patching or software upgrades) you should be creating new virtual machines that will replace the old ones. There is no need to do it in any other way. You can rolling replacing the virtual machines (or docker images) without the need of stopping the service in a machine, do the upgrade, testing, put it back. Those ephemeral machines should not have any data or logs in the first place. Cloud means that you can (auto) scale as needed it without thinking where the data are.

Postfix

smtp Vs smtpd

postfix/smtp

The SMTP daemon is for sending emails to the Internet (outgoing mail server).

postfix/smtpd

The SMTP daemon is for receiving emails from the Internet (incoming mail server).

TLS

Encryption on mail transport is what we call: opportunistic. If both parties (sender’s outgoing mail server & recipient’s incoming mail server) agree to exchange encryption keys, then a secure connection may be used. Otherwise a plain connection will be established. Plain as in non-encrypted aka cleartext over the wire.

Postfix

Your mail server is asking for a certificate so that a trusted TLS connection can be established between outgoing and incoming mail server.
The servers must exchange certificates and of course, verify them!

Now, it’s time to present your own domain certificate to the world. Offering only your public certificate cert.pem isnt enough. You have to offer both your certificate and the intermediate’s certificate, so that the sender’s mail server can verify you, by checking the digital signatures on those certificates.