Monday, February 29, 2016

I was toying with vFRC (vSphere Flash Read Cache) in my lab, and when I was done, I deleted the volume from the vSphere Web Client. The local flash disk, however, retained its GPT partition table and was still claimed as a VMFS volume, so I couldn't use the disk for anything else.

Try deleting using the web client:

Select the host, go to the Manage tab, select the Storage option, then choose the Storage Devices entry. Select the disk, click the gear icon, and choose Erase Partitions. Make sure you select the right disk, because this wipes everything on it.

Via CLI: To delete the disk partition, first enable SSH on the host, then log in and list all disks: ls -l /vmfs/devices/disks/
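From there, partedUtil can inspect and wipe the partition table. A minimal sketch (the naa.xxx device name is a placeholder for your disk; double-check it, since this is destructive):

partedUtil getptbl /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx        (show the current label and partitions)
partedUtil delete /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx 1      (delete partition 1)
partedUtil mklabel /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx msdos (or re-label the disk to drop the GPT table entirely)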

Saturday, February 6, 2016

I want to move away from the bloated Apache web server, and NGINX meets my requirements. This time I also want SSL/TLS with signed certificates and the strongest ciphers that support Perfect Forward Secrecy, because why not?

Sadly, the information was scattered and not everything is there in the manuals, so this is a documentation of what I've found and done in my setup.

The Let's Encrypt project provides authenticated and validated domain certificates for free! The catch? They expire every 90 days, and the official client requires root access and extra dependencies, but you can auto-renew and avoid both. Read on to know more.

Article Updates

Mar 3rd

Corrected root's crontab entry.

Corrected headers' content and location.

Added more info about security and privacy headers.

Environment

My setup consists of the components below. This post assumes Debian and NGINX are already installed. In the steps below, a line starting with "#" is a command you should type; type it without the "#" character (not necessarily as root).

If you have an older version of OpenSSL or NGINX, you're likely to face problems and failures, since the newer ciphers were only introduced in recent OpenSSL releases (1.0.1h), and the same goes for NGINX's settings. Make sure your distro ships recent versions, otherwise you'll leave yourself and your visitors vulnerable.

Why acme-tiny?

The official letsencrypt client requires installing dependencies such as gcc (the GNU Compiler Collection) among other things, and it requires running as root, not just once but as a daemon or from a cronjob, since the certificate must be renewed every 90 days!

As much as I appreciate the Let's Encrypt initiative, I'm not granting their software root access to my machines, nor installing gcc on a production machine. That's where acme-tiny comes in: a small client that talks to the Let's Encrypt API, and since it's only about 200 lines of human-readable Python, you can (and should) audit its code before using it.

Configuring NGINX for TLS/PFS

SSL is dead. You should be using TLS only, and if you don't have to support old clients (Android 4.x, old IE versions, Windows XP), you should be using TLS v1.2 only, with a strict set of ciphers.

Perfect Forward Secrecy (PFS) is an old idea but wasn't widely adopted until Snowden revealed how much encrypted traffic is being stored for later decryption. With PFS, each session is protected by ephemeral keys, so even if captured traffic is later combined with the server's long-term private key, past sessions cannot be decrypted.

TLS Config

If you're going to configure a wildcard certificate, place the config in /etc/nginx/nginx.conf. Otherwise, if the certificate is unique to a specific domain/subdomain, place the config in that virtual host's config file.

In my case, I started with a wildcard, but it was a self-signed certificate and browsers rejected it, which is expected. Later, when I obtained a Let's Encrypt certificate, I moved the config to the specific subdomain.

Note: Let's Encrypt doesn't support wildcard certs as of this writing; however, a single certificate can cover up to 100 domains/subdomains.

Make sure you correct the line breaks if you paste: due to the styling of my blog, a single config line may spill across multiple lines here, and that will break your config.

Don't worry about non-existing directories. We'll come to those later as we finish the setup.
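For reference, here is a minimal sketch of the TLS settings described above. The paths match the directories we create below; the cipher list is just an example of PFS-capable (ECDHE/DHE) suites, so check current recommendations before copying, as these values age quickly:

    ssl_protocols TLSv1.2;
    ssl_prefer_server_ciphers on;
    ssl_ciphers 'ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384';
    ssl_dhparam /etc/nginx/ssl/dhparam.pem;
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 10m;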

Default Virtual Host Config

I don't use a default domain (www, for example) since mine are hidden from the public. If you're like me, this config fits you; otherwise move to the step below.

Edit /etc/nginx/sites-enabled/default

# Default server configuration
server {
    # change IP to match yours
    listen 127.0.0.1:80 default_server;
    # uncomment to enable IPv6
    #listen [::1]:80 default_server;
    # uncomment to enable ssl on IPv4
    listen 127.0.0.1:443 ssl default_server;
    # uncomment to enable ssl on IPv6
    #listen [::1]:443 ssl default_server;

    server_name _; # default server

    ssl_certificate /etc/nginx/ssl/default_wild.crt;
    ssl_certificate_key /etc/nginx/ssl/default_wild.key;

    root /var/www/html;

    # Add index.php to the list if you are using PHP
    #index index.html index.htm index.nginx-debian.html;
    index index.html;

    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;
        autoindex off;
    }
}

This config will load when someone visits the IP(s) NGINX is configured at.

Virtual Host Config

This is where your subdomain config goes. In my case, the certificate belongs to this specific subdomain, so the certificate lines are added here. If you were using a wildcard cert, you should move them to nginx.conf above.

Create a file for your subdomain /etc/nginx/sites-available/mysubdom

server {
    listen 127.0.0.1:80;
    # uncomment if you want IPv6
    #listen [::1]:80;
    #listen 127.0.0.1:443 ssl;
    #listen [::1]:443 ssl;

    server_name subdomain.domain.com;
    keepalive_timeout 70;

    # The certificate is for subdomain.domain.com only
    #ssl_certificate /var/www/challenge/subdomain_chained.crt;
    #ssl_certificate_key /etc/nginx/ssl/subdomain.key;

    root /var/www/subdomain;

    # Add index.php to the list if you are using PHP
    #index index.html index.htm index.nginx-debian.html;
    index index.html;

    # letsencrypt challenge directory to verify domain
    location /.well-known/acme-challenge/ {
        alias /var/www/challenge/;
        try_files $uri =404;
    }

    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;
        autoindex off; #enable if you want file listing
    }
}

Notice that listening for SSL/TLS is not enabled yet, and the ssl_certificate line and the one below it are commented out with a hash. This is required for the initial setup: we'll need to reload NGINX, and it would fail since those files don't exist yet. We'll enable these lines once everything is done.

To make this config file active by NGINX, you need to link it to sites-enabled:
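Assuming the file name used above (adjust the path to match yours):

# ln -s /etc/nginx/sites-available/mysubdom /etc/nginx/sites-enabled/mysubdom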

www-data is the user that NGINX runs as, as shown in the first NGINX config above. Don't worry about the challenge directory owner for now. It'll be taken care of later.

Private Keys and Certificates

The overall config is done. What's left is generating private keys, deriving a certificate for the subdomain, then finally working with Let's Encrypt client.

Create the directory /etc/nginx/ssl to place the subdomain private keys and other things in there:

# mkdir /etc/nginx/ssl

Modify its permissions to be restricted to root and only those who know exactly which file to use:

# chmod 751 /etc/nginx/ssl

Now, inside the ssl directory, generate a 4096-bit Diffie-Hellman parameters file (large primes) used by the DHE key exchange behind PFS (this will take a VERY long time):

# openssl dhparam -out dhparam.pem 4096

Generate a self-signed certificate to be used for the default virtual host (i.e., not the one you care about). This will be served to anyone accessing the IP or any subdomain other than the one you specifically define in the virtual host:
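A sketch using openssl (the file names match the default vhost config above; the subject is a throwaway placeholder):

# openssl req -x509 -nodes -newkey rsa:4096 -days 365 -subj "/CN=default" -keyout /etc/nginx/ssl/default_wild.key -out /etc/nginx/ssl/default_wild.crt

Then test the config and reload NGINX:

# nginx -t && service nginx reload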

If there are no errors here, it's all good, otherwise look into /var/log/nginx/error.log for hints.

Script Execution

Now that NGINX is serving on port 80, it will be used to verify ownership of the subdomain. acme-tiny talks to LetsEncrypt.org via its API; Let's Encrypt replies with a random challenge that gets written to the challenge directory (served by NGINX on port 80), and LetsEncrypt.org then checks that this challenge actually exists at the subdomain you supplied before it verifies you.
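A sketch of the signing run as the unprivileged "letsencrypt" user. The file names follow this post (private.key is the Let's Encrypt account key, subdomain.key is the one referenced in the NGINX config, generated the same way as root); the paths are assumptions:

# openssl genrsa 4096 > private.key
# openssl req -new -sha256 -key /etc/nginx/ssl/subdomain.key -subj "/CN=subdomain.domain.com" > subdomain.csr
# python acme_tiny.py --account-key ./private.key --csr ./subdomain.csr --acme-dir /var/www/challenge/ > ./signed.crt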

Everything should complete without errors. If there are any, verify directory paths and file and directory permissions, and make sure the user "letsencrypt" has access to private.key, subdomain.csr and the challenge directory.

NGINX requires concatenating the intermediate certificate to the freshly signed certificate from Let's Encrypt:
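Something like this; check Let's Encrypt's website for the current intermediate certificate, as the X3 file name below is an assumption:

# wget -O intermediate.pem https://letsencrypt.org/certs/lets-encrypt-x3-cross-signed.pem
# cat signed.crt intermediate.pem > /var/www/challenge/subdomain_chained.crt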

That's it! It should now work after enabling the SSL/TLS settings in NGINX.

Enable TLS in NGINX

Modify the file /etc/nginx/sites-enabled/mysubdom to make it look like this:

server {
    listen 127.0.0.1:80;
    # uncomment if you want IPv6
    #listen [::1]:80;

    server_name subdomain.domain.com;

    # force all traffic to go to HTTPS instead of HTTP
    return 301 https://subdomain.domain.com$request_uri;
}

server {
    listen 127.0.0.1:443 ssl;
    #listen [::1]:443 ssl;

    server_name subdomain.domain.com;
    keepalive_timeout 70;

    # The certificate is for subdomain.domain.com only
    ssl_certificate /var/www/challenge/subdomain_chained.crt;
    ssl_certificate_key /etc/nginx/ssl/subdomain.key;

    add_header Strict-Transport-Security "max-age=63072000; includeSubdomains; preload";
    add_header X-Frame-Options DENY; # or "SAMEORIGIN" always;
    add_header X-Content-Type-Options nosniff;
    add_header Content-Security-Policy 'default-src https://subdomain.domain.com:443';
    add_header X-Xss-Protection '1; mode=block';

    root /var/www/subdomain;

    # Add index.php to the list if you are using PHP
    #index index.html index.htm index.nginx-debian.html;
    index index.html;

    # letsencrypt challenge directory to verify domain
    location /.well-known/acme-challenge/ {
        alias /var/www/challenge/;
        try_files $uri =404;
    }

    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;
        autoindex off; #enable if you want file listing
    }
}

Notice how listening on port 80 (HTTP) has moved to its own server block that redirects everything, while the rest is served exclusively over HTTPS. Future certificate renewals can also go over HTTPS as long as the current certificate is still valid; if it isn't, revert the config to its initial state.

Reload NGINX to read the certificates and make the settings active:

# service nginx reload

Note: reload re-reads the settings without dropping connections, so it's the recommended option for live websites.

About Headers

Previously, I had the security headers in the main nginx.conf file, but that applies the same headers to all websites, which is neither scalable nor correct. Igor Sysoev (NGINX's creator) deliberately designed the configuration not to inherit, so that troubleshooting stays simple: duplicating config is fine because it makes it easy to find the problem when things go wrong. See his talk linked in the references below.

This means headers (and other configs) should be repeated for every virtual host you configure. If you set a header in the main block in nginx.conf and then define another (or a modified) header in the subdomain block, the latter takes over and the former is ignored.

About Security Headers

The SecurityHeaders service recommends using HTTP Public-Key Pinning (HPKP for short), but I have concerns with it: pinning publishes hashes of your certificate's public key(s) in a header, and browsers then refuse any other key for your site for a long period. That protects your visitors against man-in-the-middle attacks with forged certificates, but a lost key or an unplanned certificate change will lock returning visitors out of your site until the pins expire.

The headers also tell the browser to cache your public keys for a very long period (3+ months) to protect against forged certificates issued during that window; but since we're using Let's Encrypt certificates, which expire every 3 months, managing the headers, their aging, and other aspects becomes hectic.

With all these concerns, I decided against adding Public-Key Pinning headers to my config; it's up to you to evaluate your own case. See the references below for details about the available HPKP options, as well as the Content-Security-Policy and XSS protection policies, since they may affect your site when you load media external to your website.

Auto-Renewing The Certificate

Let's Encrypt issues certificates valid for only 90 days, to combat spam and fraudulent use of neglected domains. That means the certificate needs to be renewed before the 90 days are up.

As the user "letsencrypt", put the following in a shell script named letsencrypt_renew.sh:
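A minimal sketch of such a script, assuming acme-tiny lives in the letsencrypt user's home directory and using the file names from above:

#!/bin/sh
# renew the certificate with acme-tiny (paths are this post's assumptions)
cd /home/letsencrypt || exit 1
python acme_tiny.py --account-key ./private.key --csr ./subdomain.csr --acme-dir /var/www/challenge/ > ./signed.crt || exit 1
# re-attach the intermediate certificate (check Let's Encrypt for the current file)
wget -q -O intermediate.pem https://letsencrypt.org/certs/lets-encrypt-x3-cross-signed.pem
cat ./signed.crt intermediate.pem > /var/www/challenge/subdomain_chained.crt

Then, from root's crontab, run it every two months and reload NGINX on success (the schedule is an example):

0 4 1 */2 * su letsencrypt -c "/home/letsencrypt/letsencrypt_renew.sh" && service nginx reload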

Saturday, January 23, 2016

Introduction

A customer with an existing HP setup using HP-branded Brocade switches wanted to connect those switches to a newly acquired IBM setup (also using Brocade switches). The HP switches are 24-port 8 Gb models; the IBM ones are 48-port 16 Gb models. The end goal is to virtualize the HP storage behind the V7000 storage, but that won't be discussed in this post.

The HP SAN switches had existing configurations & were in production. The IBM switches also had configurations for an ongoing implementation.

To merge the SAN fabrics, there are 2 ways:

Wipe one of them (clear the config), disable it, then enable it. The other switch's config will then be written to the empty one.

Merge 2 different fabrics without wiping any data.

This post addresses option (2), because I didn't want to redo all the zoning from scratch; that's a waste of time. All steps are done in the command line (CLI), because I hate Java.

Why Write This Post?

I was reading Brocade's forums, and many posts talked about using fabric merge tools or insisted that the two fabrics must have different names; a lot of that information is wrong or outdated and no longer applies to the new Fabric OS 7.x (the new switch firmware).

Requirements

Fabric OS has to be 6.x or 7.x on all switches connecting to each other. The minor version ".x" does not have to match, but it's recommended to keep the switches on the same level, if possible.

Full Fabric license must be available on 24-port switches. It's available by default on 48-port switches.

Change the Domain ID from its default value to a unique value. The 2 switches connecting to each other must have different Domain IDs (see the example after this list).

Switch configuration names must be the same for the fabric to merge. If they are different, a "Zone Conflict" error will show on the secondary switch.

If you have a lot of traffic going from one switch to another switch, it's advised to purchase the "Trunking License" to allow aggregating multiple FC ports/links together.

Aliases and zone names must be unique before merging the fabric. If you have identical alias names on the 2 different switches, rename the aliases/zones on the secondary switch (the one that can be disabled to merge the fabric); see the rename example after this list.

Aliases that map the same WWN on both the secondary and primary switches must have the same name in both fabrics. This is a rare case, but possible if you're virtualizing your servers' WWNs.

Make sure switch date, timezone & time are all correct before you merge the switches. Changing the timezone requires a switch restart, so plan for the downtime.

Default user is 'admin' and default password is 'password'.

Do not connect any FC cables between the HP/IBM (different switches) until you're told to do so. Follow the steps exactly as shown below.
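Two of the requirements above involve commands, so here is a sketch of both (FOS 7.x syntax; all values are placeholders). The Domain ID is changed from the interactive configure menu while the switch is disabled, and zoneobjectrename renames an alias or zone in place:

# switchdisable
# configure
(in the interactive menu, under Fabric parameters, set "Domain" to a unique ID, then exit)
# switchenable

# zoneobjectrename "old_alias_name", "new_alias_name"
# cfgsave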

Steps

In the steps below, a line starting with "#" means it's a command you should type. Type the command without the "#" character.

Some steps require rebooting the switch. Some require disabling the switch more than once, which takes it offline and stops all storage traffic through it. It's better to move the server paths to the 2nd switch manually, or, if you're sure the multipath drivers are working properly, you can just disable the server ports.

The primary switch is the one that will remain operational. The secondary switch is the one where we are making all these changes & can afford downtime.

Disable Ports

It's better to disable the server ports, to prevent the multipath driver from re-using the paths once they come back online but before you finish your activity. Do this on ONE switch only! After you successfully merge the fabric on this switch, enable the ports, then move to the 2nd switch. Do NOT disable ports on both switches at the same time if you have active servers connected to the SAN switches.
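A sketch (port numbers are placeholders):

# portdisable <port number>

Once the server ports on this switch are down, connect the FC cable between the two switches.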

Wait 10-30 seconds before proceeding to give enough time for the link to establish and the 2 switches to talk.

Disable the secondary switch to make it the slave and to add the config from the primary:
# switchdisable

Enable the secondary switch: # switchenable

Wait 10-50 seconds, then check the switch: # switchshow
You should see something like this in the line of the port connecting the switches:

35  35  1f2300  id  8G  Online  FC  E-Port  10:00:00:xx:xx:xx:xx:xx "" (upstream)

Wait some time and the name of the primary switch will appear between the double quotes.

You should also see both switches in the same fabric now: # fabricshow
This should show the names of the primary & secondary switches.

If you type # cfgshow, it will list all zones and aliases from both switches, but only those from the primary are in the active config. Save this output to a text file; you'll need the zone names in the next step.

Enabling Zones of Secondary Switch

The fabrics are now merged, but the zones of the secondary switch are not in the active config yet. We need to add them to the config and enable the config.

Open the text file of the zone names (cfgshow output) from the previous step.

To add the zones, type the command: # cfgadd "<config name>", "zone1; zone2; zone3"
Note that it's a semicolon between the zone names, and you can add multiple zones to the active config at once.
If you're lazy and Java works for you, you can use the graphical interface to select the zones and add them to the config.
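Once the zones are added, save the config and enable it so the new zones become active (the config name is a placeholder):

# cfgsave
# cfgenable "<config name>"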

Enable Ports
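Re-enable the server ports you disabled earlier (a sketch; port numbers are placeholders):

# portenable <port number>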

You can now check your servers and storage and all links should be operational.

Congratulations! You're now done with the first switch connectivity. Make sure your links are stable, then move on to the remaining switches.

Errors

Zone Conflicts and Segmentation

For some reason, the switch showed "segmented" and "zone conflict" messages, and after a reboot all ports were disabled. Trying to enable a specific port gave the error: "Port 35: Port enable failed due to unknown system error"

I rebooted the SAN switch again and the ports (and the switch) came back online. It looks like it froze at some point and needed another reboot. If this happens often, upgrade FOS to the latest stable version; for me, it only happened once.

If you still get "zone conflict" after finishing all the steps, then you have an alias with the same WWN but different names. To fix it, rename the alias using the "zoneobjectrename" command as shown above.

Unstable Ports

I was unlucky enough to have unstable ports. The link kept flapping between online and offline, sometimes connecting at 16 Gbps and sometimes at 8 Gbps (before I fixed the speed at 8 Gbps), and this prevented the switches from forming a fabric connection.

First clear the stats so you don't carry any old data: # portstatsclear <port number>
Then check your port statistics with: # portshow <port number>

In the output, if you have very large numbers in any of these parameters:

Unknown

Parity_err

2_parity_err

Link_failure

Loss_of_sync

Loss_of_sig

Invalid_word

Invalid_crc

If any of these counters keep growing into large numbers, suspect the physical layer. In my case, I had to change 2 SFPs: one on the old HP SAN switch and one on the new IBM SAN switch. I also had to change the port slot on the old HP switch because the slot itself had problems. I'm glad the FC cable was good.

I only got 2 results online, and both pointed at changelogs mentioning that the issue had been fixed, but not how! I contacted a great person within Lenovo who checked internal documents; it turned out that this issue affects G8272 and EN4093R switches manufactured in December 2015 (specifically, the 12th week of 2015). (Thank you, Zeeshan!)

Cause

"The switch software uses it hardware serial number and the public keys on its kernel file system to generate a private key to decrypt the OS or Boot image being uploaded to it and then proceeds to install it. If the serial number of the switch is changed for some reason, the combination of the hardware serial number and the public keys will fail to generate the appropriate private key to decrypt the uploaded image and reports that the image has an invalid signature."

In my case, the switches were fresh and no one had changed any serial number, but they were still affected.

Fix

"In order to remedy this situation, the way out is to remove the public keys installed on the kernel file system and reboot the switch. During reboot, the switch will generate new set of public keys using the current serial number. With these newly generated public keys, the switch will be able to compute the proper private key to decrypt the uploaded images."

On a Flex chassis, you should enable Serial Over LAN (SOL) from the Chassis Management Module (CMM) to access the switches' serial ports. Use a UTP cable on the CMM port, not the switch.

I highly recommend configuring the management port (RJ45) and using it for firmware uploads: uploading one OS image takes 45 minutes over the serial console, versus about 1 minute over the Ethernet management port.

Note: the initial firmware (8.2.1.0) does not support SSH. However, SSH is enabled by default once you upgrade to 8.2.4.0. Make sure you disable HTTP & Telnet after the upgrade.

Procedure

Any line that starts with "#" is a command to be typed (without the "#" sign).

I actually used the passive DACs that I used for the Lenovo servers, and the cables worked just fine. The AIX team configured 2 Virtual I/O Servers (VIOS) on each POWER8 system, and each POWER8 system had 4 of these adapters. We also configured LACP for each VIOS, so the total bandwidth available to each VIOS was 40 Gb.

So even though the Redbook says that active DACs are required, the passive ones work just fine. The Redbook also only lists 1-, 3- and 5-meter cables (since they're active), with no mention of passive cables.