Installing CARTO on AWS

Author: Michael Minn (www.michaelminn.com)

21 August 2018

This tutorial describes how to install CARTO on an Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance.

These instructions cover installing CARTO version v4.20.0-46-gf4b7df4 on an EC2 instance built from an Ubuntu Server 16.04 LTS AWS machine image. They largely follow the official installation instructions, with augmentations and customizations specific to AWS.

This tutorial assumes some familiarity with issuing commands using the Linux Bash console and text editors like vi or emacs.

This is an insecure installation (HTTP) using a fixed IP address and no domain or subdomain names. This was deemed adequate for an installation used for student exercises where confidentiality was not an issue. If you are dealing with mission-critical or sensitive information, you will want to adapt these instructions to use a domain name and HTTPS.

CARTO Architecture

CARTO consists of two conceptual components: CARTO Builder and CARTO Engine.

While CARTO Builder is available as a subscription service and is also available free to students as part of the GitHub Student Developer Pack, installation of CARTO on your own servers can be a more-flexible, cost-efficient, or educational experience than use of the subscription service.

These instructions cover installation of three components of CARTO Engine that work together to make CARTO Builder available from an AWS EC2 instance:

CARTO Architecture

Amazon EC2

The Amazon Elastic Compute Cloud (Amazon EC2) provides the ability to deploy virtual cloud servers (instances) on demand, and adjust the memory, storage, or processing power of those instances as needed. You pay for instances at an hourly rate based on those parameters, with additional charges for bandwidth and other (optional) features.

Get an AWS Account

To use EC2 instances, you set up an account at https://aws.amazon.com/.

AWS Account Signup

You launch a new EC2 instance from the AWS EC2 Management Console.

EC2 Management Console

Step 1: Choose an Amazon Machine Image (AMI)

An EC2 instance is based on a machine image, which includes the operating system and associated software. For this CARTO deployment we use an Ubuntu Server 16.04 LTS (HVM), SSD Volume Type - ami-759bc50a.

Machine Image Selection

Step 2: Choose an Instance Type

For this fairly low-demand installation (20 maximum simultaneous users with data sets of limited size) a general purpose t2.medium (2-core + 4 GB memory + 8 GB storage) was adequate. You can dynamically resize instances if your needs dictate. As of this writing, the hourly charge for a t2.medium instance is $0.0464/hour, which works out to around $117 for a 15-week semester.
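The semester estimate above is simple arithmetic on the hourly rate, so it is easy to redo for other rates or term lengths. The following is a minimal sketch; the function name semester_cost is made up for this illustration, and it assumes the instance runs around the clock for the whole term.

```shell
#!/bin/bash
# Estimate the cost of running an instance 24 hours a day, 7 days a week.
# Arguments: hourly rate in dollars, number of weeks.
# (semester_cost is a hypothetical helper, not part of any AWS tool.)
semester_cost() {
    awk -v rate="$1" -v weeks="$2" \
        'BEGIN { printf "%.2f\n", rate * 24 * 7 * weeks }'
}

# t2.medium rate quoted above, 15-week semester: prints 116.93
semester_cost 0.0464 15
```

Stopping the instance when it is not in use lowers the total, since the hourly charge applies only while the instance is running.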

Step 3: Configure Instance Details

You can probably accept the default instance details, although you might want to turn on Protect against accidental termination. Terminating an instance will cause you to lose all data and configuration in an instance, so requiring multiple steps to confirm termination is a helpful safeguard.

Configure Instance Details

Step 4: Add Storage

The standard t2.medium instance comes with 8 GB of general purpose elastic block storage (EBS) by default. Depending on what kind of data you are storing, that may not be enough, but you can increase this later if needed.

Storage

Step 5: Add Tags

Tags are helpful for organizing large numbers of instances. For this particular situation, there is only one instance so tags are unnecessary.

Add Tags

Step 6: Configure Security Group

A security group is a named set of firewall settings that help protect your instance from intrusion.

You should give your security group a meaningful name associating it with CARTO.

While we will need to add a number of rules to permit access to the various ports used by CARTO, to start with we will only allow in port 22, which is used to access the instance using the secure shell (SSH).

As a protection in this case, the Source is limited to a set of IP addresses associated with the ISP that services my home (Spectrum / Charter / Road Runner). Your particular network will vary, and 0.0.0.0/0 (everyone) is an option if security is not a major concern for you.

Security Group

Step 7: Review Instance Launch

Review the instance settings and launch:

Review Instance

Select an existing key pair or create a new key pair

To access your instance through SSH, you will need a public/private key pair that is used to encrypt login information (2048-bit SSH-2 RSA keys). Upon launching your instance you will be prompted to create a new key pair and download a .pem file you can use with your SSH program to log in to your instance.

You should keep this .pem file in a safe, memorable place. While it is possible to generate new key pairs through the AWS console, it will probably be easiest to just reuse the same .pem file.

Key Pair

Launch Status

A new instance will take a minute or two to initialize. You can go to the EC2 Instance panel (Services, EC2 Dashboard, Instances) to see and change the status of your new and existing instances.

EC2 Instance Panel

Allocate Elastic IP Address

By default, your instance will get a different IP address from a pool of IP addresses every time you restart your instance. This IP address is listed under IPv4 Public IP and the public domain name based on that IP address is listed under Public DNS (IPv4).

You will likely need to stop and restart your instance at some point, such as when you need to add storage or if you just want to save hourly charges when the server is not being used. Since these instructions require a domain name or fixed IP address to correctly configure the CARTO components, you will need a fixed IP address.

AWS provides Elastic IP addresses that are free when associated with a running instance and that have only a minimal charge when the instance is stopped (to discourage neglect).

In the left-side panel menu, select Elastic IPs and Allocate new address:

Elastic IPs

The console also shows an equivalent AWS command line interface command, but you will rarely need it, so just select Allocate:

Allocate

You will be given the new IP address:

New Address

You then need to Associate that address with your instance:

Associate Address
Select the Image

You will then see that IPv4 Public IP listed in the panel below the instance.

Configuring the Security Group

The CARTO Builder web app will need to be able to access three ports on the instance: 3000 (CARTO Builder), 8080 (SQL API), and 8181 (Maps API).

Edit Inbound Rules for your security group to allow access to these ports. The simplest and least-secure configuration is to allow access to everyone on the internet (0.0.0.0/0). The one restriction here is that SSH access to the instance is limited to my home ISP (98.0.0.0/12).

Unrestricted Access

However, if you know your server will be accessed from a limited number of networks (such as your home + your campus computer labs), you can add more-restrictive rules to block everyone else out. For example, this configuration limits access to my home ISP (98.0.0.0/12) and to the server's network itself (18.214.0.0/15) for testing.

Restrictive access
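Source ranges such as 98.0.0.0/12 and 18.214.0.0/15 are written in CIDR notation, where the number after the slash is how many leading bits of the address must match. The sketch below shows one way to test whether an address falls inside a range before relying on a rule; the function names ip_to_int and ip_in_cidr are invented for this example, and it handles IPv4 only.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed (exit status 0) if address $1 falls inside CIDR block $2.
ip_in_cidr() {
    local ip net bits mask
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")      # network part, before the slash
    bits=${2#*/}                    # prefix length, after the slash
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# The example server address is inside the server's network range.
ip_in_cidr 18.215.23.17 18.214.0.0/15 && echo "18.215.23.17 is inside 18.214.0.0/15"
# An address just past the end of the /12 block is rejected.
ip_in_cidr 98.16.0.1 98.0.0.0/12 || echo "98.16.0.1 is outside 98.0.0.0/12"
```

A /12 block covers 98.0.0.0 through 98.15.255.255, which is a very large range; a narrower block is safer if you know which addresses your ISP actually assigns to you.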

Logging in to your Instance Using SSH

On a Linux box (or Mac) you can use the ssh command to access your instance using the key-pair .pem file and the public IPv4 address of the instance.

You should use the -X option to forward X11 messages (so you can use graphical programs like browsers on the server) and the -C option to compress those messages (to improve speed).

The user name for a server based on an Ubuntu machine image is ubuntu.

ssh -X -C -i CartoKeyPair.pem ubuntu@18.215.23.17

If you get a warning about the .pem file being publicly readable, you can change its permission to be only accessible to you:

chmod 0400 CartoKeyPair.pem
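If you log in often, the options from the ssh command above can be stored in your SSH client configuration so they do not have to be retyped. The sketch below prints a stanza you could add to ~/.ssh/config; the helper name ssh_config_entry and the host alias carto are made up for this example, and the key location ~/CartoKeyPair.pem is an assumption about where you saved the file.

```shell
#!/bin/bash
# Print an ssh_config stanza for the CARTO instance.
# Arguments: host alias, public IP address, path to the key-pair .pem file.
# ForwardX11 and Compression correspond to the -X and -C options.
ssh_config_entry() {
    cat <<EOF
Host $1
    HostName $2
    User ubuntu
    IdentityFile $3
    ForwardX11 yes
    Compression yes
EOF
}

# Review the output, then append it to ~/.ssh/config (for example with >>).
ssh_config_entry carto 18.215.23.17 ~/CartoKeyPair.pem
```

With the stanza in place, ssh carto should behave like the longer command shown above.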

Once you get into your instance, you may want to install all the current Ubuntu packages to make sure you do not have out-of-date/insecure software running on your server.

The sudo command runs commands as superuser, which is necessary when modifying system software.

The apt-get command installs, upgrades, or removes software packages from an Ubuntu software repository. update checks the repository for the latest software versions and upgrade upgrades any installed software with newer versions from the repository.

sudo apt-get update
sudo apt-get upgrade

Installing CARTO

These instructions are based on the official CARTO install instructions.

System Locales

Installations assume you use UTF-8 character encoding, which is the default with the Ubuntu machine image. You can verify this with the locale command:

locale

The output should look something like this:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
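Rather than reading the locale output by eye, the same check can be scripted. This is a small sketch that only looks at the LANG variable; the function name is_utf8_locale is hypothetical, and a fuller check would also inspect LC_ALL and LC_CTYPE.

```shell
#!/bin/bash
# Succeed if the given locale name uses UTF-8 encoding.
# (is_utf8_locale is an illustrative helper, not a system command.)
is_utf8_locale() {
    case "$1" in
        *.UTF-8|*.utf8) return 0 ;;
        *)              return 1 ;;
    esac
}

if is_utf8_locale "${LANG:-}"; then
    echo "LANG=${LANG} uses UTF-8"
else
    echo "LANG=${LANG:-unset} does not look like a UTF-8 locale"
fi
```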

Development Tools

The following installs development tools needed at various points in the installation:

sudo apt-get install make pkg-config

Git

Git is a popular, free and open source distributed version control system used by developers to track and distribute changes to large projects. In these installation instructions it is used to download the current versions of the CARTO software. Git is included with the Ubuntu machine image, but you should run an install to make sure you have the current version:

sudo apt-get install git

PostgreSQL

PostgreSQL is a popular open-source relational database that is used by CARTO to store data.

CARTO requires PostgreSQL version 10+.

The PPA repository provides additional patches to PostgreSQL, which are not needed but help improve performance in production environments.

sudo add-apt-repository ppa:cartodb/postgresql-10
sudo apt-get update
sudo apt-get install postgresql-10 postgresql-plpython-10 postgresql-server-dev-10

PostgreSQL access authorization is managed through the pg_hba.conf configuration file, which is normally in /etc/postgresql/10/main/pg_hba.conf.

To simplify installation and operation, connections to PostgreSQL should be configured to allow access from the local machine (localhost) without authentication.

Edit /etc/postgresql/10/main/pg_hba.conf, modifying the existing lines to use trust authentication (no password access from localhost):

local   all             postgres                                trust
local   all             all                                     trust
host    all             all             127.0.0.1/32            trust

Restart PostgreSQL with the systemctl command to apply these changes:

sudo systemctl restart postgresql
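Because pg_hba.conf is mostly comments, it can be hard to see which rules are actually in effect after editing. The sketch below strips comments and blank lines so that only the active rules remain; the function name active_hba_rules is invented for this example, and the path is the one given above.

```shell
#!/bin/bash
# Print only the active (non-comment, non-blank) lines of a pg_hba.conf file.
# (active_hba_rules is a hypothetical helper name.)
active_hba_rules() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

HBA=/etc/postgresql/10/main/pg_hba.conf
if [ -r "$HBA" ]; then
    # The three edited rules should show "trust" in the last column.
    active_hba_rules "$HBA"
else
    echo "cannot read $HBA (try running with sudo)"
fi
```

If the edit worked, a command such as psql -U postgres -c 'SELECT 1;' should connect without asking for a password.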

Create the CARTO PostgreSQL users with the createuser command:

sudo createuser publicuser --no-createrole --no-createdb --no-superuser -U postgres
sudo createuser tileuser --no-createrole --no-createdb --no-superuser -U postgres

Download the CartoDB postgresql extensions, which contain functions that are used by different parts of the CartoDB platform:

git clone https://github.com/CartoDB/cartodb-postgresql.git
cd cartodb-postgresql

Find the tag for the downloaded version.

git describe --tags

In this case, the displayed version tag was 0.23.2-9-g0c86469. This tag will vary depending on the current version when you perform the download. Check out that version and build it:

git checkout 0.23.2-9-g0c86469
make all
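The string printed by git describe --tags packs three pieces of information together: the most recent tag, the number of commits made since that tag, and the letter g followed by the abbreviated commit hash. The sketch below splits such a string apart with shell parameter expansion; parse_describe is a name made up for this illustration, and it would be fooled by unusual tag names that themselves contain "-g".

```shell
#!/bin/bash
# Split "git describe" output of the form <tag>-<count>-g<hash>.
# When the checkout sits exactly on a tag, git prints the tag alone.
# (parse_describe is a hypothetical helper, not a git command.)
parse_describe() {
    local desc=$1 rest
    case "$desc" in
        *-*-g*)
            rest=${desc%-g*}        # drop the trailing -g<hash>
            echo "tag=${rest%-*} commits_after=${rest##*-} commit=${desc##*-g}"
            ;;
        *)
            echo "tag=$desc commits_after=0"
            ;;
    esac
}

parse_describe 0.23.2-9-g0c86469
# prints: tag=0.23.2 commits_after=9 commit=0c86469
```

So 0.23.2-9-g0c86469 identifies the commit 0c86469, which is nine commits past the 0.23.2 tag.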

If the make completes with no errors, install:

sudo make install

GDAL

GDAL (Geospatial Data Abstraction Library) is a library for reading a wide variety of geospatial data formats.

Add the CARTO GIS repository, update to read the current version from the repository, and install GDAL:

sudo add-apt-repository ppa:cartodb/gis
sudo apt-get update
sudo apt-get install gdal-bin libgdal-dev

PostGIS

PostGIS is a set of extensions for PostgreSQL that adds support for SQL queries of geographic objects.

CARTO requires PostGIS 2.4, which is the package in the cartodb repository.

sudo apt-get install postgis

Create the PostGIS template database that CARTO uses when creating new spatial databases:

sudo createdb -T template0 -O postgres -U postgres -E UTF8 template_postgis
psql -U postgres template_postgis -c 'CREATE EXTENSION postgis;CREATE EXTENSION postgis_topology;'
sudo ldconfig

Run the install check. This prints a lot of information and may issue warnings, but should not fail with a fatal error.

cd
cd cartodb-postgresql
sudo PGUSER=postgres make installcheck

Redis

Redis is an open source, in-memory data structure store, used as a database, cache and message broker by CARTO.

CARTO requires Redis 4+, which is available in the cartodb repository:

sudo add-apt-repository ppa:cartodb/redis-next
sudo apt-get update
sudo apt-get install redis

Node.js

Node.js is a runtime environment used to build network applications written in JavaScript. Node.js is used to build the SQL API and Map API used in CARTO.

Add the repository and install:

sudo add-apt-repository ppa:cartodb/nodejs
sudo apt-get update
sudo apt-get install nodejs

npm is a JavaScript package manager used by CARTO that should be installed automatically along with Node.js.

CARTO uses Node.js v6.9.2 and npm 3.10.9. You can verify that the correct versions have been installed with:

nodejs -v
npm -v
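The version comparison can also be scripted so a mismatch stands out. This is a rough sketch using the version numbers quoted above; report_version is a hypothetical helper, and it does an exact string comparison rather than any kind of range check.

```shell
#!/bin/bash
# Compare a tool's reported version string with the expected one.
# Arguments: tool name, expected version, version actually reported.
# (report_version is an illustrative helper name.)
report_version() {
    local name=$1 expected=$2 actual=$3
    if [ "$actual" = "$expected" ]; then
        echo "$name $actual: OK"
    else
        echo "$name: expected $expected, found ${actual:-nothing}"
        return 1
    fi
}

# nodejs -v prints a leading "v"; npm -v does not.
report_version nodejs v6.9.2 "$(nodejs -v 2>/dev/null)" || true
report_version npm 3.10.9 "$(npm -v 2>/dev/null)" || true
```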

You also need to install some development libraries needed to build some of the CARTO Node.js modules:

sudo apt-get install libpixman-1-0 libpixman-1-dev
sudo apt-get install libcairo2-dev libjpeg-dev libgif-dev libpango1.0-dev

SQL API

The CARTO SQL API is the CARTO process written with Node.js that is used to access data from the PostGIS database.

Download the code:

cd
git clone https://github.com/CartoDB/CartoDB-SQL-API.git

Use npm to install with dependencies:

cd CartoDB-SQL-API
npm install

Create the configuration script from the example template:

cp config/environments/development.js.example config/environments/development.js

Maps API

The Maps API is the CARTO process written with Node.js that is used to create map tiles from geospatial data for display in a browser. This is an extension of CARTO's Windshaft library.

Download the code:

cd
git clone https://github.com/CartoDB/Windshaft-cartodb.git

Use the Yarn package manager to install the additional libraries needed by the maps API:

cd Windshaft-cartodb
sudo npm install -g yarn@0.27.5
yarn install

Create the configuration script from the example template:

cp config/environments/development.js.example config/environments/development.js
mkdir logs

Ruby

CARTO Builder is written in the Ruby programming language.

CARTO requires exactly Ruby 2.2.x. Older or newer versions won't work.

Brightbox provides Ruby packages optimized for Ubuntu. Add their repository and install Ruby 2.2:

sudo apt-add-repository ppa:brightbox/ruby-ng
sudo apt-get update
sudo apt-get install ruby2.2 ruby2.2-dev

Bundler is an app used by CARTO Builder to manage Ruby components (gems).

sudo apt-get install ruby-bundler

Compass is a stylesheet authoring framework used by CARTO Builder.

sudo gem install compass

CARTO Builder

Finally, we install the CARTO Builder software.

Note that CARTO requires Python 2.7+ and will not work with Python 3. Python 2.7.12 is installed by default in the AWS Ubuntu 16.04 machine image, but you should check the python version to make sure.

python -V

Pip is a Python package manager.

sudo apt-get install python-pip

Install other standard libraries used by CARTO Builder.

sudo apt-get install imagemagick unp zip libicu-dev

Download the CARTO Builder code.

cd
git clone --recursive https://github.com/CartoDB/cartodb.git

Install the Ruby components used by CARTO Builder.

cd cartodb
RAILS_ENV=development bundle install

Install the Python components used by CARTO Builder.

sudo CPLUS_INCLUDE_PATH=/usr/include/gdal \
C_INCLUDE_PATH=/usr/include/gdal \
PATH=$PATH:/usr/include/gdal \
pip install --no-use-wheel -r python_requirements.txt

Install the Node.js components used by CARTO Builder.

npm install

Compile the CARTO Builder static assets.

npm run carto-node

Create configuration files from the sample files.

cp config/app_config.yml.sample config/app_config.yml
cp config/database.yml.sample config/database.yml

Start the redis-server that allows access to the SQL and Maps APIs.

sudo systemctl start redis-server

Initialize the metadata database.

RAILS_ENV=development bundle exec rake db:create
RAILS_ENV=development bundle exec rake db:migrate

Create the First User

sh script/create_dev_user

The subdomain requested by the script is the login user name. The term is a holdover from when CARTO used separate subdomains for individual users.

CARTO Configuration

You will need to modify configuration files for the three components: Builder, the SQL API, and the Map API. These files are poorly documented and configuration problems result in cryptic error messages from the API that are difficult to diagnose.

These examples use 172.31.49.47 as the private IP address (internal to the AWS network) and 18.215.23.17 as the public IP address. Your values will be different: see Allocate Elastic IP Address above.

SQL API Configuration

The SQL API file is CartoDB-SQL-API/config/environments/development.js.

Change the module.exports.node_host from...

module.exports.node_host    = '127.0.0.1';

... to the private IP address for your EC2 instance.

module.exports.node_host    = '172.31.49.47';
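The same edit can be made non-interactively with sed instead of a text editor, which helps if you rebuild the instance more than once. This is a sketch only: set_node_host is a made-up function name, it assumes GNU sed (as on Ubuntu) and that the line has the form shown above, and the address is the example private IP from this tutorial.

```shell
#!/bin/bash
# Rewrite the node_host setting in a SQL API configuration file.
# Arguments: path to the config file, IP address to set.
# (set_node_host is a hypothetical helper; requires GNU sed for -i.)
set_node_host() {
    local file=$1 ip=$2
    sed -i -E \
        "s/^(module\.exports\.node_host[[:space:]]*=[[:space:]]*)'[^']*';/\1'${ip}';/" \
        "$file"
}

PRIVATE_IP=172.31.49.47   # example value; replace with your instance's private IP
CONFIG="${HOME:-.}/CartoDB-SQL-API/config/environments/development.js"
if [ -f "$CONFIG" ]; then
    set_node_host "$CONFIG" "$PRIVATE_IP"
    grep -n 'node_host' "$CONFIG"   # show the result for review
fi
```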

Map API Configuration

The Map API file is Windshaft-cartodb/config/environments/development.js.

Change the config.host (line 4) from...

,host: '127.0.0.1'

... to the private IP address for your EC2 instance.

,host: '172.31.49.47'

Change the analysis.batch.endpoint (line 263) from localhost...

endpoint: 'http://127.0.0.1:8080/api/v2/sql/job',

...to the PUBLIC IP address for your EC2 instance.

endpoint: 'http://18.215.23.17:8080/api/v2/sql/job',

Note that failure to configure the batch endpoint correctly will cause the app to freeze because it can't access the server. The server error message will be:

Template <name> of user <username> not found

CARTO Builder Configuration

The CARTO Builder configuration file is cartodb/config/app_config.yml. This file requires extensive modification.

Change the session_domain (line xxx) from...

session_domain:     '.localhost.lan'

...to:

session_domain:     ''

Enable subdomainless URLs by changing subdomainless_urls (line xxx) from...

subdomainless_urls: false

...to:

subdomainless_urls: true

Change the account_host (line xxx) from...

account_host:       'localhost.lan:3000'

...to:

account_host:       'localhost'

Change the vizjson_cache_domains (line xxx) from...

vizjson_cache_domains: ['.localhost.lan']

...to:

vizjson_cache_domains: ['localhost']

Change the map API hosts/domains (lines xxx - xxx) from...

  tiler:
    filter: 'mapnik'
    internal:
      protocol:      'http'
      domain:        'localhost.lan'
      port:          '8181'
      host:          '18.215.23.17'
      verifycert:     false
    private:
      protocol:      'http'
      domain:        'localhost.lan'
      port:          '8181'
      verifycert:     false
    public:
      protocol:      'http'
      domain:        'localhost.lan'
      port:          '8181'
      verifycert:     false

...to:

  tiler:
    filter: 'mapnik'
    internal:
      protocol:      'http'
      domain:        '18.215.23.17'
      port:          '8181'
      host:          '18.215.23.17'
      verifycert:     false
    private:
      protocol:      'http'
      domain:        '18.215.23.17'
      port:          '8181'
      verifycert:     false
    public:
      protocol:      'http'
      domain:        '18.215.23.17'
      port:          '8181'
      verifycert:     false

Change the SQL API hosts/domains (lines xxx - xxx) from...

  sql_api:
    private:
      protocol:   'http'
      domain:     'localhost.lan'
      endpoint:   '/api/v1/sql'
      port:       8080
    public:
      protocol:   'http'
      domain:     'localhost.lan'
      endpoint:   '/api/v2/sql'
      port:       8080

...to:

  sql_api:
    private:
      protocol:   'http'
      domain:     '18.215.23.17'
      endpoint:   '/api/v1/sql'
      port:       8080
    public:
      protocol:   'http'
      domain:     '18.215.23.17'
      endpoint:   '/api/v2/sql'
      port:       8080

Change the assets storage from S3 to local (lines xxx - xxx) from...

  aws:
    s3:
      access_key_id: "test"
      secret_access_key: "test"
      region: ''
  assets:
    s3_bucket_name: "tests"
    max_file_size: 5242880 # 5.megabytes
    region: ''

...to:

  #aws:
  #  s3:
  #    access_key_id: "test"
  #    secret_access_key: "test"
  #    region: ''
  assets:
    # s3_bucket_name: "tests"
    max_file_size: 5242880 # 5.megabytes
    # region: ''
    location: 'organization_assets'

This will configure CARTO to place assets like images in:

./cartodb/public/uploads/development/[username]/assets/[filename]
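Since app_config.yml is long and the edits are scattered, it can help to search the file afterwards for the sample domain. The sketch below lists any lines that still mention localhost.lan; leftover_defaults is a hypothetical helper name, and a match is only a prompt to double-check that setting, since not every occurrence necessarily needs to change.

```shell
#!/bin/bash
# List lines (with line numbers) that still contain the sample domain.
# Returns a non-zero status when nothing matches.
# (leftover_defaults is an illustrative helper name.)
leftover_defaults() {
    grep -n 'localhost\.lan' "$1"
}

CONFIG="${HOME:-.}/cartodb/config/app_config.yml"
if [ -f "$CONFIG" ]; then
    leftover_defaults "$CONFIG" || echo "no localhost.lan entries remain"
fi
```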

Starting the Processes

Execute the following code from your home directory, where the three repositories were cloned. You may want to put it in a script named carto-start.sh:

#!/bin/bash
set -x
sudo systemctl start redis-server
cd cartodb && bundle exec script/resque > /dev/null &
cd cartodb && bundle exec thin start --threaded -p 3000 --threadpool-size 5 &
cd CartoDB-SQL-API && node app.js development &
cd Windshaft-cartodb && node app.js development &

You should now be able to see the CARTO login screen at http://<ip_address>:3000 and log in using the username / password you created above.

CARTO Login Screen
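If the login screen does not appear, a quick first check is whether the three services are accepting connections on the ports used in the start script and configuration files (3000, 8080, and 8181). The sketch below uses Bash's built-in /dev/tcp redirection for this; port_open is a name invented for the example, and the default address is the example public IP from this tutorial.

```shell
#!/bin/bash
# Succeed if a TCP connection to host $1, port $2 can be opened
# within two seconds. Uses Bash's /dev/tcp feature, so no extra
# tools are needed beyond coreutils' timeout.
# (port_open is a hypothetical helper name.)
port_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

HOST=${1:-18.215.23.17}   # example address; pass your own as the first argument
for port in 3000 8080 8181; do
    if port_open "$HOST" "$port"; then
        echo "port $port: open"
    else
        echo "port $port: not reachable"
    fi
done
```

A port that is open from the server itself but not reachable from your own machine usually points to the security group rules rather than to CARTO.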

If you need to stop the servers, execute the following code. You may want to put it in a script named carto-stop.sh:

#!/bin/bash
set -x
killall node
killall ruby2.2

CARTO logs a significant amount of information and if you run CARTO for any significant amount of time, these logs will get quite large. The following script copies the logs for each of the three components to dated directories that you can delete when they are no longer needed.

#!/bin/bash
set -x
./carto-stop.sh
sleep 5
mv cartodb/log cartodb/log-`date +'%Y-%m-%d_%H-%M-%S'`
mkdir cartodb/log
mv CartoDB-SQL-API/logs CartoDB-SQL-API/logs-`date +'%Y-%m-%d_%H-%M-%S'`
mkdir CartoDB-SQL-API/logs
mv Windshaft-cartodb/logs Windshaft-cartodb/logs-`date +'%Y-%m-%d_%H-%M-%S'`
mkdir Windshaft-cartodb/logs
./carto-start.sh
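The dated directories created by the script above accumulate until you remove them. The following sketch lists the ones that have not been modified for a given number of days so you can review them before deleting anything; old_log_dirs is a made-up function name, and the 30-day cutoff is an arbitrary example.

```shell
#!/bin/bash
# List dated log directories (log-* or logs-*) one level below the
# component directories under $1 that are older than $2 days.
# (old_log_dirs is a hypothetical helper name.)
old_log_dirs() {
    local base=$1 days=$2
    find "$base" -mindepth 2 -maxdepth 2 -type d \
        \( -name 'log-*' -o -name 'logs-*' \) -mtime +"$days"
}

# Review this list before removing anything.
old_log_dirs "${HOME:-.}" 30
```

Once you are satisfied with the list, the same find expression can be combined with a removal command of your choice.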