Production Installation

Installing production-ready DC/OS

This page outlines how to install DC/OS for production. Using this method, you can package the DC/OS distribution and connect to every node manually to run the DC/OS installation commands. This installation method is recommended if you want to integrate with an existing system or if you do not have SSH access to your cluster.

The DC/OS installation process requires a bootstrap node, master node, public agent node, and a private agent node. You can view the nodes documentation for more information.

Production Installation Process

The following steps are required to install DC/OS clusters:

Configure bootstrap node

Install DC/OS on master node

Install DC/OS on agent node

Figure 1. The production installation process

This installation method requires the following:

The bootstrap node must be network accessible from the cluster nodes.

The bootstrap node must have its HTTP(S) ports open to the cluster nodes.

The DC/OS installation creates the following folders:

Folder | Description
--- | ---
/opt/mesosphere | Contains the DC/OS binaries, libraries, and cluster configuration. Do not modify.
/etc/systemd/system/dcos.target.wants | Contains the systemd services that start the systemd components. They must be located outside of /opt/mesosphere because of systemd constraints.
/etc/systemd/system/dcos.<units> | Contains copies of the units in /etc/systemd/system/dcos.target.wants. They must be at the top folder as well as inside dcos.target.wants.

WARNING: Changes to /opt/mesosphere are unsupported. They can lead to unpredictable behavior in DC/OS and prevent upgrades.

Prerequisites

Before installing DC/OS, your cluster must meet the software and hardware requirements.

Configure your cluster

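Create a directory named genconf on your bootstrap node. The file paths in the following steps (for example, genconf/ip-detect) are relative to the directory that contains genconf.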

mkdir -p genconf

Store license file Enterprise

Create a license file containing the license text you received in the email sent by your Authorized Support Contact, and save it as genconf/license.txt.

Create an IP detection script

In this step, you create an IP detection script. This script reports the IP address of each node across the cluster. Each node in a DC/OS cluster has a unique IP address that is used to communicate between nodes in the cluster. The IP detection script prints the unique IPv4 address of a node to STDOUT each time DC/OS is started on the node.

NOTE: The IP address of a node must not change after DC/OS is installed on the node. For example, the IP address should not change when a node is rebooted or if the DHCP lease is renewed. If the IP address of a node does change, the node must be uninstalled.

NOTE: The script must return the same IP address as specified in the config.yaml. For example, if the private master IP is specified as 10.2.30.4 in the config.yaml, your script should return this same value when run on the master.

Create an IP detection script for your environment and save as genconf/ip-detect. This script needs to be UTF-8 encoded and have a valid shebang line. You can use the examples below.

Use the IP address of an existing interface

This method discovers the IP address of a particular interface of the node.

If your cluster spans multiple generations of hardware, the interface names can differ between hosts, and the IP detection script must account for those differences. The example script can also return the wrong address if you attach multiple IP addresses to a single interface or use complex Linux networking.
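As a minimal sketch of the interface-based approach, the commands below write a genconf/ip-detect script that prints the first IPv4 address of one interface. The interface name eth0 is an assumption, not something this page specifies; substitute the interface your nodes actually use.

```shell
#!/usr/bin/env bash
# Write a sketch of genconf/ip-detect that reads the address of one interface.
mkdir -p genconf
cat > genconf/ip-detect <<'EOF'
#!/usr/bin/env bash
set -o nounset -o errexit
# eth0 is an assumed interface name; replace it with the interface on your nodes.
# Print only the first IPv4 address found on that interface.
ip -4 addr show eth0 | grep -Eo 'inet [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -n 1 | cut -d ' ' -f 2
EOF
chmod +x genconf/ip-detect
```

Running ./genconf/ip-detect on a node should print exactly one address. If the interface carries several addresses, only the first one listed is reported, which may not be the one you want.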

Use the network route to the Mesos master

This method looks up the route to a Mesos master and reports the source IP address that the node uses to communicate with that master.

In this example, we assume that the Mesos master has an IP address of 172.28.128.3. You can use any language for this script. The shebang line must point to the correct interpreter for the language used, and the output must be the correct IP address.
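A simplified sketch of the route-based approach, using the 172.28.128.3 master address assumed above: it asks the kernel which route it would take to the master and prints the address that follows the src keyword. It relies on the src field being present in the output of ip route get, and it exits with a non-zero status when no source address is found.

```shell
#!/usr/bin/env bash
# Write a sketch of genconf/ip-detect that uses the route to a Mesos master.
mkdir -p genconf
cat > genconf/ip-detect <<'EOF'
#!/usr/bin/env bash
set -o nounset -o errexit -o pipefail
# Assumed master address, taken from the example in the text.
MASTER_IP=172.28.128.3
# Print the token that follows "src" in the route lookup; fail if there is none.
ip route get "$MASTER_IP" | awk '
  { for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); found = 1; exit } }
  END { exit !found }
'
EOF
chmod +x genconf/ip-detect
```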

Create a fault domain detection script Enterprise

By default, DC/OS clusters have fault domain awareness enabled, so no changes to your config.yaml are required to use this feature. However, you must include a fault domain detection script named fault-domain-detect in your ./genconf directory. To opt out of fault domain awareness, set the fault_domain_enabled parameter of your config.yaml file to false.

Create a fault domain detect script named fault-domain-detect to run on each node to detect the node’s fault domain. During installation, the output of this script is passed to Mesos.
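As a hedged sketch, the commands below write a fault-domain-detect script that prints a fixed region and zone as JSON. The JSON shape reflects the format described in the DC/OS fault domain documentation as best recalled here, so verify it against the documentation for your version. The names example-region and example-zone are hypothetical placeholders; a real script would normally query your cloud provider or inventory for these values.

```shell
#!/usr/bin/env bash
# Write a sketch of genconf/fault-domain-detect with hard-coded placeholder values.
mkdir -p genconf
cat > genconf/fault-domain-detect <<'EOF'
#!/usr/bin/env bash
set -o nounset -o errexit
# Hypothetical values; replace with a lookup that fits your environment.
REGION="example-region"
ZONE="example-zone"
printf '{"fault_domain":{"region":{"name":"%s"},"zone":{"name":"%s"}}}\n' "$REGION" "$ZONE"
EOF
chmod +x genconf/fault-domain-detect
```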

The following is example output from hashing a superuser password with the --hash-password option of the dcos_generate_config.ee.sh installer (Enterprise):

Extracting an image from this script and loading it into a docker daemon, can take a few minutes.
dcos-genconf.9eda4ae45de5488c0c-c40556fa73a00235f1.tar
Running mesosphere/dcos-genconf docker with BUILD_DIR set to /home/centos/genconf
00:42:10 dcos_installer.action_lib.prettyprint:: ====> HASHING PASSWORD TO SHA512
00:42:11 root:: Hashed password for 'password' key:
$6$rounds=656000$v55tdnlMGNoSEgYH$1JAznj58MR.Bft2wd05KviSUUfZe45nsYsjlEl84w34pp48A9U2GoKzlycm3g6MBmg4cQW9k7iY4tpZdkWy9t1

Create the configuration

Create a configuration file and save as genconf/config.yaml. You can use this template to get started.
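As an illustrative starting point rather than the official template, the commands below write a skeleton genconf/config.yaml for a statically configured cluster. Every value in angle brackets is a placeholder to replace before running the installer, and the resolver addresses are only examples. Enterprise clusters need additional parameters that are not shown here.

```shell
#!/usr/bin/env bash
# Write a skeleton genconf/config.yaml; all <...> values are placeholders.
mkdir -p genconf
cat > genconf/config.yaml <<'EOF'
# Placeholder values: replace every <...> before running the installer.
bootstrap_url: http://<bootstrap-ip>:<port>
cluster_name: <cluster-name>
exhibitor_storage_backend: static
master_discovery: static
ip_detect_public_filename: <relative-path-to-ip-detect-public>
master_list:
- <master-private-ip-1>
- <master-private-ip-2>
- <master-private-ip-3>
resolvers:
- 8.8.8.8
- 8.8.4.4
enable_ipv6: 'false'
EOF
```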

If your servers are installed with a domain name in your /etc/resolv.conf, add the dns_search parameter. For parameter descriptions and configuration examples, see the documentation.

NOTE: If Google DNS is not available in your country, you can replace the Google DNS servers 8.8.8.8 and 8.8.4.4 with your local DNS servers.

NOTE: If you specify master_discovery: static, you must also create a script to map internal IPs to public IPs on your bootstrap node (for example, genconf/ip-detect-public). This script is then referenced in ip_detect_public_filename: "relative-path-from-dcos-generate-config.sh".

NOTE: In AWS, or any other environment where you cannot control a node's IP address, master_discovery must be set to master_http_load_balancer, and a load balancer must be set up.

Install DC/OS

In this step, you create a custom DC/OS build file on your bootstrap node and then install DC/OS onto your cluster. With this method, you:

Package the DC/OS distribution yourself

Connect to every server manually

Run the commands

NOTE: Due to a cluster configuration issue with overlay networks, we recommend setting enable_ipv6 to false in config.yaml when upgrading or configuring a new cluster. If you have already upgraded to DC/OS 1.12.x without configuring enable_ipv6, or if enable_ipv6 is set to true in config.yaml, do not add new nodes.

You can find additional information and a more detailed remediation procedure in our latest critical product advisory. Enterprise

IMPORTANT: Do not install DC/OS until you have these items working: ip-detect script, DNS, and NTP on all DC/OS nodes with time synchronized. See troubleshooting for more information.

NOTE: If something goes wrong and you want to rerun your setup, use the cluster uninstall instructions.
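One way to spot-check the ip-detect requirement before installing is to confirm that the script prints exactly one well-formed IPv4 address. The helper below is a sketch; is_ipv4 is a hypothetical name, not part of DC/OS, and it only validates the format, not whether the address is the right one for the node.

```shell
#!/usr/bin/env bash
# is_ipv4 VALUE: succeed only if VALUE is a single dotted-quad IPv4 address.
is_ipv4() {
  local ip="$1" octet
  # Four groups of 1-3 digits, and nothing else (no extra lines or spaces).
  [[ "$ip" =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
  for octet in "${BASH_REMATCH[@]:1}"; do
    # 10# forces base 10 so that a leading zero is not read as octal.
    (( 10#$octet <= 255 )) || return 1
  done
}

# Example use on a node (not executed here):
#   is_ipv4 "$(./genconf/ip-detect)" && echo "ip-detect output looks valid"
```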

Prerequisites

A genconf/config.yaml file that is optimized for manual distribution of DC/OS across your nodes.

The term dcos_generate_config file refers to either a dcos_generate_config.ee.sh file or dcos_generate_config.sh file, based on whether you are using the Enterprise or Open Source version of DC/OS.

Download and save the dcos_generate_config file to your bootstrap node. This file is used to create your customized DC/OS build file. Contact your sales representative or sales@mesosphere.com for access to this file. Enterprise

OR

Download and save the dcos_generate_config file to your bootstrap node. This file is used to create your customized DC/OS build file. Open Source

curl -O https://downloads.dcos.io/dcos/stable/dcos_generate_config.sh

From the bootstrap node, run the DC/OS installer shell script to generate a customized DC/OS build file. The setup script extracts a Docker container that uses the generic DC/OS install files to create customized DC/OS build files for your cluster. The build files are output to ./genconf/serve/.
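Before generating the build files, it can help to confirm that the inputs described earlier are in place. The sketch below checks that genconf contains a non-empty config.yaml and an ip-detect script that starts with a shebang line. check_genconf is a hypothetical helper name, and the installer invocation in the final comment assumes the Open Source file name.

```shell
#!/usr/bin/env bash
# check_genconf DIR: verify the installer inputs in DIR; return non-zero on any problem.
check_genconf() {
  local dir="$1" f problems=0
  for f in config.yaml ip-detect; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      problems=1
    fi
  done
  # The IP detection script needs a valid shebang line.
  if [ -s "$dir/ip-detect" ] && [ "$(head -c 2 "$dir/ip-detect")" != '#!' ]; then
    echo "no shebang line in $dir/ip-detect" >&2
    problems=1
  fi
  return "$problems"
}

# Typical use on the bootstrap node (not executed here):
#   check_genconf genconf && sudo bash dcos_generate_config.sh
```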

You can view all of the automated command line installer options by running the dcos_generate_config file with the --help flag.

NOTE: If you encounter errors such as Time is marked as bad, adjtimex, or Time not in sync in journald, verify that Network Time Protocol (NTP) is enabled on all nodes. For more information, see the system requirements documentation.

Monitor Exhibitor and wait for it to converge at http://<master-ip>:8181/exhibitor/v1/ui/index.html.

NOTE: This process can take about 10 minutes. During this time, you will see the Master nodes become visible on the Exhibitor consoles and come online, eventually showing a green light.

Figure 2. Exhibitor for ZooKeeper
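While waiting, a small polling loop can tell you when the Exhibitor page starts responding. The sketch below only checks that the URL is reachable; it does not confirm that the ensemble has converged, so you still need to check the status icons. wait_for_url is a hypothetical helper, and the default of 60 attempts at 10-second intervals is chosen to match the roughly 10 minutes mentioned above.

```shell
#!/usr/bin/env bash
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]: poll URL until it answers successfully.
wait_for_url() {
  local url="$1" attempts="${2:-60}" delay="${3:-10}" i
  for (( i = 1; i <= attempts; i++ )); do
    # -f makes curl fail on HTTP errors; the response body is discarded.
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "reachable after $i attempt(s): $url"
      return 0
    fi
    sleep "$delay"
  done
  echo "gave up after $attempts attempt(s): $url" >&2
  return 1
}

# Example (replace <master-ip> with a real master address; not executed here):
#   wait_for_url "http://<master-ip>:8181/exhibitor/v1/ui/index.html"
```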

When the status icons are green, you can access the DC/OS web interface.

Launch the DC/OS web interface at: http://<master-node-public-ip>/. If this doesn’t work, take a look at the troubleshooting documentation.

NOTE: After clicking Log In To DC/OS, your browser may show a warning that your connection is not secure. This is because DC/OS uses self-signed certificates. You can ignore this error and click to proceed.

Enter your administrator username and password.

Figure 3. Sign in dialogue

You are done! The UI dashboard will now be displayed.

Figure 4. DC/OS UI dashboard

NOTE: You can also use Universal Installer to deploy DC/OS on AWS, Azure, or GCP in production.