Installing CARTO on AWS
Author: Michael Minn (www.michaelminn.com)
21 August 2018
This tutorial describes how to install CARTO on an Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance.
These instructions are for installing the v4.20.0-46-gf4b7df4 version of CARTO on an EC2 instance built with an Ubuntu Server 16.04 LTS AWS machine image. These instructions largely follow the official installation instructions, with augmentations and customizations specifically for AWS installation.
This tutorial assumes some familiarity with issuing commands using the Linux Bash console and text editors like vi or emacs.
This is an insecure installation (HTTP) using a fixed IP address and no domain or subdomain names. This was deemed adequate for an installation used for student exercises where confidentiality was not an issue. Obviously, if you are dealing with mission critical or sensitive information, you will want to adapt these instructions to use a domain name and HTTPS.
CARTO Architecture
CARTO consists of two conceptual components:
- CARTO Builder is a web app for mapping and analyzing geospatial data
- CARTO Engine is the web platform that supports CARTO Builder as well as application programmer interfaces (APIs) for building custom applications
While CARTO Builder is available as a subscription service, and is free to students as part of the GitHub Student Developer Pack, installing CARTO on your own server can be more flexible, more cost-efficient, or more educational than using the subscription service.
These instructions cover installation of three components of CARTO Engine that work together to make CARTO Builder available from an AWS EC2 instance:
- CARTO Builder (cartodb) is a process written in Ruby that serves the HTML / CSS / JavaScript for the CARTO Builder web app
- The SQL API (CartoDB-SQL-API) is a process written with Node.js that provides access to geospatial data stored in a PostgreSQL / PostGIS database
- The Maps API (Windshaft-cartodb) is a process written with Node.js that renders geospatial data as map tiles that can be displayed in a browser
Amazon EC2
The Amazon Elastic Compute Cloud (Amazon EC2) provides the ability to deploy virtual cloud servers (instances) on demand, and adjust the memory, storage, or processing power of those instances as needed. You pay for instances at an hourly rate based on those parameters, with additional charges for bandwidth and other (optional) features.
Get an AWS Account
To use EC2 instances, you set up an account at https://aws.amazon.com/.
You launch a new EC2 instance from the AWS EC2 Management Console.
Step 1: Choose an Amazon Machine Image (AMI)
An EC2 instance is based on a machine image which includes the operating system and associated software. For this CARTO deployment we use a Ubuntu Server 16.04 LTS (HVM), SSD Volume Type - ami-759bc50a.
Step 2: Choose an Instance Type
For this fairly low-demand installation (20 maximum simultaneous users with data sets of limited size) a general purpose t2.medium (2-core + 4 GB memory + 8 GB storage) was adequate. You can dynamically resize instances if your needs dictate. As of this writing, the hourly charge for a t2.medium instance is $0.0464/hour, which works out to around $117 for a 15-week semester.
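The semester estimate can be reproduced with a quick calculation. This is only a sketch of the arithmetic: it assumes the instance runs continuously for all 15 weeks and ignores storage, bandwidth, and Elastic IP charges. The variable names are illustrative.

```shell
#!/bin/sh
# Rough cost estimate for an always-on instance (assumption: never stopped).
hourly_rate=0.0464   # t2.medium hourly price quoted above, in USD
weeks=15             # length of the semester

# hours = weeks * 7 days * 24 hours; awk handles the floating-point math.
cost=$(awk -v r="$hourly_rate" -v w="$weeks" 'BEGIN { printf "%.2f", r * w * 7 * 24 }')
echo "Estimated instance cost for $weeks weeks: \$$cost"
```

Stopping the instance when it is not in use lowers the number of billed hours, so the figure is best read as an upper bound for the instance itself.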
Step 3: Configure Instance Details
You can probably accept the default instance details, although you might want to turn on Protect against accidental termination. Terminating an instance will cause you to lose all data and configuration in an instance, so requiring multiple steps to confirm termination is a helpful safeguard.
Step 4: Add Storage
The standard t2.medium instance comes with 8 GB of general purpose elastic block storage (EBS) by default. Depending on what kind of data you are storing, that may not be enough, but you can increase this later if needed.
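Once you are logged in to the instance, you can check how much of the volume is in use before deciding whether to add storage. A minimal sketch, assuming the root filesystem is the EBS volume in question; the 80 percent threshold is an arbitrary value chosen for illustration.

```shell
#!/bin/sh
# Sketch: warn when the root volume is getting full.
THRESHOLD=80   # percent; arbitrary value for illustration

# df -P prints one POSIX-format line per filesystem; field 5 is the
# capacity column (for example 42%), so strip the percent sign.
used=$(df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')

if [ "$used" -ge "$THRESHOLD" ]; then
    echo "Root volume is ${used}% full; consider adding storage."
else
    echo "Root volume is ${used}% full."
fi
```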
Step 5: Add Tags
Tags are helpful for organizing large numbers of instances. For this particular situation, there is only one instance so tags are unnecessary.
Step 6: Configure Security Group
A security group is a named set of firewall settings that help protect your instance from intrusion.
You should give your security group a meaningful name associating it with CARTO.
While we will need to add a number of rules to permit access to the various ports used by CARTO, to start with we will only allow in port 22, which is used to access the instance using the secure shell (SSH).
As a protection in this case, the Source is limited to a set of IP addresses associated with the ISP that services my home (Spectrum / Charter / Road Runner). Your particular network will vary, and 0.0.0.0/0 (everyone) is an option if security is not a major concern for you.
Step 7: Review Instance Launch
Review the instance settings and launch:
Select an existing key pair or create a new key pair
To access your instance through SSH, you will need a public/private key pair that is used to encrypt login information (2048-bit SSH-2 RSA keys). Upon launching your instance you will be prompted to create a new key pair and download a .pem file you can use with your SSH program to log in to your instance.
You should keep this .pem file in a safe, memorable place. While it is possible to generate new key pairs through the AWS console, it will probably be easiest to just reuse the same .pem file.
Launch Status
A new instance will take a minute or two to initialize. You can go to the EC2 Instance panel (Services, EC2 Dashboard, Instances) to see and change the status of your new and existing instances.
Allocate Elastic IP Address
By default, your instance will get a different IP address from a pool of IP addresses every time you restart your instance. This IP address is listed under IPv4 Public IP and the public domain name based on that IP address is listed under Public DNS (IPv4).
You will likely need to stop and restart your instance at some point, such as when you need to add storage or if you just want to save hourly charges when the server is not being used. Since these instructions require a domain name or fixed IP address to correctly configure the CARTO components, you will need a fixed IP address.
AWS provides Elastic IP addresses that are free when associated with a running instance and that have only a minimal charge when the instance is stopped (to discourage neglect).
In the left-side panel menu, select Elastic IPs and Allocate new address:
Although you are provided with an AWS command line interface, you will probably not need this except in exceptional situations, so just select Allocate:
You will be given the new IP address:
You then need to Associate that address with your instance:
You will then see that IPv4 Public IP listed in the panel below the instance.
Configuring the Security Group
The CARTO Builder web app will need to be able to access:
- TCP port 3000 (Builder),
- TCP port 8080 (the SQL API), and
- TCP port 8181 (the Maps API)
Edit Inbound Rules for your security group to allow access to these ports. The simplest and least-secure configuration is to allow access to everyone on the internet (0.0.0.0/0). The one restriction in this configuration limits SSH access to my home ISP (98.0.0.0/12).
However, if you know your server will be accessed from a limited number of networks (such as your home + your campus computer labs), you can add more-restrictive rules to block everyone else out. For example, this configuration limits access to my home ISP (98.0.0.0/12) and to the server's network itself (18.214.0.0/15) for testing.
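The same inbound rules can also be added with the AWS command line interface instead of the console. The following is a minimal sketch, not the procedure used for this installation: the security group ID sg-0123456789abcdef0 is a placeholder, and the script only prints the commands unless you set DRY_RUN=0 on a machine where the aws CLI is installed and configured.

```shell
#!/bin/sh
# Sketch: open the three CARTO ports with the AWS CLI.
GROUP_ID="sg-0123456789abcdef0"   # hypothetical; replace with your security group ID
CIDR="0.0.0.0/0"                  # everyone; narrow this range if you can
DRY_RUN="${DRY_RUN:-1}"           # 1 = print the commands without running them

open_port() {
    # $1 = TCP port number to allow
    set -- aws ec2 authorize-security-group-ingress \
        --group-id "$GROUP_ID" --protocol tcp --port "$1" --cidr "$CIDR"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

for port in 3000 8080 8181; do   # Builder, SQL API, Maps API
    open_port "$port"
done
```

To restrict access to known networks, change CIDR to a narrower range such as the ISP range mentioned above.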
Logging in to your Instance Using SSH
On a Linux box (or Mac) you can use the ssh command to access your instance using the key-pair .pem file and the public IPv4 address of the instance.
You should use the -X option to forward X11 messages (so you can use graphical programs like browsers on the server) and the -C option to compress those messages (to improve speed).
The user name for a server based on an Ubuntu machine image is ubuntu.
ssh -X -C -i CartoKeyPair.pem ubuntu@18.215.23.17
If you get a warning about the .pem file being publicly readable, you can change its permission to be only accessible to you:
chmod 0400 CartoKeyPair.pem
Once you get into your instance, you may want to install all the current Ubuntu packages to make sure you do not have out-of-date/insecure software running on your server.
The sudo command runs commands as superuser, which is necessary when modifying system software.
The apt-get command installs, upgrades, or removes software packages from an Ubuntu software repository. update checks the repository for the latest software versions and upgrade upgrades any installed software with newer versions from the repository.
sudo apt-get update
sudo apt-get upgrade
Installing CARTO
These instructions are based on the official CARTO install instructions.
System Locales
Installations assume you use UTF-8 character encoding, which is the default with the Ubuntu machine image. You can verify this with the locale command:
locale
The output should look something like this:
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Development Tools
The following installs development tools needed at various points in the installation:
sudo apt-get install make pkg-config
Git
Git is a popular, free and open source distributed version control system used by developers to track and distribute changes to large projects. For these installation instructions it is used to download the current versions of the CARTO software. Git is included with the Ubuntu machine image, but you should run an install to make sure you have the current version:
sudo apt-get install git
PostgreSQL
PostgreSQL is a popular open-source relational database that is used by CARTO to store data.
CARTO requires PostgreSQL version 10+.
The PPA repository provides additional patches to PostgreSQL, which are not needed but help improve performance in production environments.
sudo add-apt-repository ppa:cartodb/postgresql-10
sudo apt-get update
sudo apt-get install postgresql-10 postgresql-plpython-10 postgresql-server-dev-10
PostgreSQL access authorization is managed through the pg_hba.conf configuration file, which is normally in /etc/postgresql/10/main/pg_hba.conf.
To simplify installation and operation, connections to PostgreSQL should be configured to allow access from the local machine (localhost) without authentication.
Edit /etc/postgresql/10/main/pg_hba.conf, modifying the existing lines to use trust authentication (no password access from localhost):
local   all   postgres                 trust
local   all   all                      trust
host    all   all      127.0.0.1/32    trust
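If you prefer not to edit the file by hand, the change can be scripted with sed. The sketch below works on a temporary sample file so it is safe to try. The sample contents are an assumption about what a stock pg_hba.conf contains; on the server you would point CONF at /etc/postgresql/10/main/pg_hba.conf and run sed with sudo.

```shell
#!/bin/sh
# Sketch: switch local and 127.0.0.1 connections to trust authentication.
CONF=$(mktemp)   # stand-in for /etc/postgresql/10/main/pg_hba.conf

# Assumed default entries (your file may differ).
cat > "$CONF" <<'EOF'
local   all             postgres                                peer
local   all             all                                     peer
host    all             all             127.0.0.1/32            md5
host    all             all             ::1/128                 md5
EOF

# Keep a .bak copy, then replace the last field (the auth method) on the
# three lines named in the tutorial. The IPv6 line is left unchanged.
sed -E -i.bak \
    -e 's/^(local[[:space:]]+all[[:space:]]+(postgres|all)[[:space:]]+)[a-z0-9-]+$/\1trust/' \
    -e 's/^(host[[:space:]]+all[[:space:]]+all[[:space:]]+127\.0\.0\.1\/32[[:space:]]+)[a-z0-9-]+$/\1trust/' \
    "$CONF"

cat "$CONF"
```

Keeping the .bak copy makes it easy to restore the original authentication settings if you later want to tighten access.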
Restart PostgreSQL with the systemctl command to apply these changes:
sudo systemctl restart postgresql
Create the CARTO PostgreSQL users with the createuser command:
sudo createuser publicuser --no-createrole --no-createdb --no-superuser -U postgres
sudo createuser tileuser --no-createrole --no-createdb --no-superuser -U postgres
Download the CartoDB postgresql extensions, which contain functions that are used by different parts of the CartoDB platform:
git clone https://github.com/CartoDB/cartodb-postgresql.git
cd cartodb-postgresql
Find the tag for the downloaded version.
git describe --tags
In this case, the displayed version tag was 0.23.2-9-g0c86469. This tag will vary depending on the current version when you perform the download. Check out that version and make it:
git checkout 0.23.2-9-g0c86469
make all
If the make completes with no errors, install:
sudo make install
GDAL
GDAL (Geospatial Data Abstraction Library) is a library for reading a wide variety of geospatial data formats.
Add the CARTO GIS repository, update to read the current version from the repository, and install GDAL:
sudo add-apt-repository ppa:cartodb/gis
sudo apt-get update
sudo apt-get install gdal-bin libgdal-dev
PostGIS
PostGIS is a set of extensions for PostgreSQL that adds support for SQL queries of geographic objects.
CARTO requires PostGIS 2.4, which is the package in the cartodb repository.
sudo apt-get install postgis
Create the PostGIS template database that CARTO uses when creating new spatial databases:
sudo createdb -T template0 -O postgres -U postgres -E UTF8 template_postgis
psql -U postgres template_postgis -c 'CREATE EXTENSION postgis;CREATE EXTENSION postgis_topology;'
sudo ldconfig
Run the install check. This prints a lot of information and may issue warnings, but should not fail with a fatal error.
cd
cd cartodb-postgresql
sudo PGUSER=postgres make installcheck
Redis
Redis is an open source, in-memory data structure store, used as a database, cache and message broker by CARTO.
CARTO requires Redis 4+, which is available in the cartodb repository:
sudo add-apt-repository ppa:cartodb/redis-next
sudo apt-get update
sudo apt-get install redis
Node.js
Node.js is a runtime environment used to build network applications written in JavaScript. Node.js is used to build the SQL API and Map API used in CARTO.
Add the repository and install:
sudo add-apt-repository ppa:cartodb/nodejs
sudo apt-get update
sudo apt-get install nodejs
npm is a JavaScript package manager used by CARTO that should be installed automatically along with Node.js.
CARTO uses Node.js v6.9.2 and npm 3.10.9. You can verify that the correct versions have been installed with:
nodejs -v
npm -v
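A small helper can make the version check explicit. This is a sketch: the function name check_version is made up for illustration, and it simply compares strings, so it only reports whether the installed version is exactly the one listed above.

```shell
#!/bin/sh
# Sketch: compare an installed tool's version string with the expected one.
check_version() {
    # $1 = label, $2 = expected version, $3 = actual version string
    if [ "$2" = "$3" ]; then
        echo "$1: OK ($3)"
    else
        echo "$1: expected $2 but found ${3:-nothing}"
        return 1
    fi
}

# Only query tools that are actually installed.
if command -v nodejs >/dev/null 2>&1; then
    check_version "Node.js" "v6.9.2" "$(nodejs -v)" ||
        echo "Node.js differs from the version this tutorial was tested with."
fi
if command -v npm >/dev/null 2>&1; then
    check_version "npm" "3.10.9" "$(npm -v)" ||
        echo "npm differs from the version this tutorial was tested with."
fi
```

A mismatch is not necessarily fatal, but it is worth knowing about before building the Node.js components.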
You also need to install some development libraries needed to build some of the CARTO Node.js modules:
sudo apt-get install libpixman-1-0 libpixman-1-dev
sudo apt-get install libcairo2-dev libjpeg-dev libgif-dev libpango1.0-dev
SQL API
The CARTO SQL API is the CARTO process written with Node.js that is used to access data from the PostGIS database.
Download the code:
cd
git clone https://github.com/CartoDB/CartoDB-SQL-API.git
Use npm to install with dependencies:
cd CartoDB-SQL-API
npm install
Create the configuration script from the example template:
cp config/environments/development.js.example config/environments/development.js
Maps API
The Maps API is the CARTO process written with Node.js that is used to create map tiles from geospatial data for display in a browser. This is an extension of CARTO's Windshaft library.
Download the code:
cd
git clone https://github.com/CartoDB/Windshaft-cartodb.git
Use the Yarn package manager to install the additional libraries needed by the maps API:
cd Windshaft-cartodb
sudo npm install -g yarn@0.27.5
yarn install
Create the configuration script from the example template:
cp config/environments/development.js.example config/environments/development.js
mkdir logs
Ruby
CARTO Builder is written in the Ruby programming language.
CARTO requires exactly Ruby 2.2.x. Older or newer versions won't work.
Brightbox provides Ruby packages optimized for Ubuntu. Add their repository and install Ruby 2.2:
sudo apt-add-repository ppa:brightbox/ruby-ng
sudo apt-get update
sudo apt-get install ruby2.2 ruby2.2-dev
Bundler is an app used by CARTO Builder to manage Ruby components (gems).
sudo apt-get install ruby-bundler
Compass is a stylesheet authoring framework used by CARTO Builder.
sudo gem install compass
CARTO Builder
Finally, we install the CARTO Builder software.
Note that CARTO requires Python 2.7+ and will not work with Python 3. Python 2.7.12 is installed by default in the AWS Ubuntu 16.04 machine image, but you should check the python version to make sure.
python -V
Pip is a Python package manager.
sudo apt-get install python-pip
Install other standard libraries used by CARTO Builder.
sudo apt-get install imagemagick unp zip libicu-dev
Download the CARTO Builder code.
cd
git clone --recursive https://github.com/CartoDB/cartodb.git
Install the Ruby components used by CARTO Builder.
cd cartodb
RAILS_ENV=development bundle install
Install the Python components used by CARTO builder.
sudo CPLUS_INCLUDE_PATH=/usr/include/gdal \
     C_INCLUDE_PATH=/usr/include/gdal \
     PATH=$PATH:/usr/include/gdal \
     pip install --no-use-wheel -r python_requirements.txt
Install the Node.js components used by CARTO builder.
npm install
Compile the CARTO Builder static assets.
npm run carto-node
Create configuration files from the sample files.
cp config/app_config.yml.sample config/app_config.yml
cp config/database.yml.sample config/database.yml
Start the redis-server that allows access to the SQL and Maps APIs.
sudo systemctl start redis-server
Initialize the metadata database.
RAILS_ENV=development bundle exec rake db:create
RAILS_ENV=development bundle exec rake db:migrate
Create the First User
sh script/create_dev_user
The subdomain is the login user name. CARTO used to use separate subdomains for individual users.
CARTO Configuration
You will need to modify configuration files for the three components: Builder, the SQL API, and the Map API. These files are poorly documented and configuration problems result in cryptic error messages from the API that are difficult to diagnose.
These examples use 172.31.49.47 as the private IP address (internal to the AWS network) and 18.215.23.17 as the public IP address. Your values will be different: see Allocate Elastic IP Address above.
SQL API Configuration
The SQL API file is CartoDB-SQL-API/config/environments/development.js.
Change the module.exports.node_host from...
module.exports.node_host = '127.0.0.1';
... to the private IP address for your EC2 instance.
module.exports.node_host = '172.31.49.47';
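The same edit can be made non-interactively. The sketch below applies sed to a one-line sample file so it can be tried safely. PRIVATE_IP is the example address used in this tutorial, and on the server you would set CONFIG to CartoDB-SQL-API/config/environments/development.js instead. On an EC2 instance the private address can usually be read from the instance metadata service at http://169.254.169.254/latest/meta-data/local-ipv4.

```shell
#!/bin/sh
# Sketch: point the SQL API at the instance's private IP address.
PRIVATE_IP="172.31.49.47"   # example value from this tutorial; use your own
CONFIG=$(mktemp)            # stand-in for config/environments/development.js

# Sample line as it appears in the unmodified configuration file.
echo "module.exports.node_host = '127.0.0.1';" > "$CONFIG"

# Rewrite only the node_host assignment, keeping a .bak copy of the original.
sed -i.bak "s/^\(module\.exports\.node_host[[:space:]]*=[[:space:]]*\)'[^']*'/\1'$PRIVATE_IP'/" "$CONFIG"

cat "$CONFIG"
```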
Map API Configuration
The Map API file is Windshaft-cartodb/config/environments/development.js.
Change the config.host (line 4) from...
,host: '127.0.0.1'
... to the private IP address for your EC2 instance.
,host: '172.31.49.47'
Change the analysis.batch.endpoint (line 263) from localhost...
endpoint: 'http://127.0.0.1:8080/api/v2/sql/job',
...to the PUBLIC IP address for your EC2 instance.
endpoint: 'http://18.215.23.17:8080/api/v2/sql/job',
Note that failure to configure the batch endpoint correctly will cause the app to freeze because it can't access the server. The server error message will be:
Template <name> of user <username> not found
CARTO Builder Configuration
The CARTO Builder configuration file is cartodb/config/app_config.yml. This file requires extensive modification.
Change the session_domain (line xxx) from...
session_domain: '.localhost.lan'
...to:
session_domain: ''
Enable subdomainless URLs by changing subdomainless_urls (line xxx) from...
subdomainless_urls: false
...to:
subdomainless_urls: true
Change the account_host (line xxx) from...
account_host: 'localhost.lan:3000'
...to:
account_host: 'localhost'
Change the vizjson_cache_domains (line xxx) from...
vizjson_cache_domains: ['.localhost.lan']
...to:
vizjson_cache_domains: ['localhost']
Change the map API hosts/domains (lines xxx - xxx) from...
tiler:
  filter: 'mapnik'
  internal:
    protocol: 'http'
    domain: 'localhost.lan'
    port: '8181'
    host: '18.215.23.17'
    verifycert: false
  private:
    protocol: 'http'
    domain: 'localhost.lan'
    port: '8181'
    verifycert: false
  public:
    protocol: 'http'
    domain: 'localhost.lan'
    port: '8181'
    verifycert: false
...to:
tiler:
  filter: 'mapnik'
  internal:
    protocol: 'http'
    domain: '18.215.23.17'
    port: '8181'
    host: '18.215.23.17'
    verifycert: false
  private:
    protocol: 'http'
    domain: '18.215.23.17'
    port: '8181'
    verifycert: false
  public:
    protocol: 'http'
    domain: '18.215.23.17'
    port: '8181'
    verifycert: false
Change the SQL API hosts/domains (lines xxx - xxx) from...
sql_api:
  private:
    protocol: 'http'
    domain: 'localhost.lan'
    endpoint: '/api/v1/sql'
    port: 8080
  public:
    protocol: 'http'
    domain: 'localhost.lan'
    endpoint: '/api/v2/sql'
    port: 8080
...to:
sql_api:
  private:
    protocol: 'http'
    domain: '18.215.23.17'
    endpoint: '/api/v1/sql'
    port: 8080
  public:
    protocol: 'http'
    domain: '18.215.23.17'
    endpoint: '/api/v2/sql'
    port: 8080
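Because the tiler and sql_api blocks all use the same placeholder, the domain values can be replaced in one pass. This sketch runs against a small sample file rather than the real cartodb/config/app_config.yml, and PUBLIC_IP is the example address from this tutorial. The pattern matches only keys named exactly domain, so session_domain and vizjson_cache_domains are not touched.

```shell
#!/bin/sh
# Sketch: replace the placeholder domain in the tiler and sql_api sections.
PUBLIC_IP="18.215.23.17"   # example value from this tutorial; use your own
CONFIG=$(mktemp)           # stand-in for cartodb/config/app_config.yml

# Abbreviated sample of the relevant entries.
cat > "$CONFIG" <<'EOF'
  session_domain: '.localhost.lan'
  sql_api:
    private:
      protocol: 'http'
      domain: 'localhost.lan'
      port: 8080
    public:
      protocol: 'http'
      domain: 'localhost.lan'
      port: 8080
EOF

# Match lines whose key is exactly "domain" and whose value is the placeholder.
sed -E -i.bak "s/^([[:space:]]*domain:[[:space:]]*)'localhost\.lan'/\1'$PUBLIC_IP'/" "$CONFIG"

grep -n "domain" "$CONFIG"
```

The session_domain, account_host, and vizjson_cache_domains settings take different values, so they still need to be edited separately as described above.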
Change the assets storage from S3 to local (lines xxx - xxx) from...
aws:
  s3:
    access_key_id: "test"
    secret_access_key: "test"
    region: ''
assets:
  s3_bucket_name: "tests"
  max_file_size: 5242880 # 5.megabytes
  region: ''
...to:
#aws:
#  s3:
#    access_key_id: "test"
#    secret_access_key: "test"
#    region: ''
assets:
#  s3_bucket_name: "tests"
  max_file_size: 5242880 # 5.megabytes
#  region: ''
  location: 'organization_assets'
This will configure CARTO to place assets like images in:
./cartodb/public/uploads/development/[username]/assets/[filename]
Starting the Processes
Execute the following code. You may want to put it in a script named carto-start.sh:
#!/bin/bash
set -x
sudo systemctl start redis-server
# Each line below runs in a background subshell, so its cd does not affect the next line.
# Resque background job workers for Builder
cd cartodb && bundle exec script/resque > /dev/null &
# Builder web server on port 3000
cd cartodb && bundle exec thin start --threaded -p 3000 --threadpool-size 5 &
# SQL API
cd CartoDB-SQL-API && node app.js development &
# Maps API
cd Windshaft-cartodb && node app.js development &
You should now be able to see the CARTO login screen at http://<ip_address>:3000 and log in using the username / password you created above.
If you need to stop the servers, execute the following code. You may want to put it in a script named carto-stop.sh:
#!/bin/bash
set -x
killall node
killall ruby2.2
CARTO logs a significant amount of information and if you run CARTO for any significant amount of time, these logs will get quite large. The following script copies the logs for each of the three components to dated directories that you can delete when they are no longer needed.
#!/bin/bash
set -x
./carto-stop.sh
sleep 5
mv cartodb/log cartodb/log-`date +'%Y-%m-%d_%H-%M-%S'`
mkdir cartodb/log
mv CartoDB-SQL-API/logs CartoDB-SQL-API/logs-`date +'%Y-%m-%d_%H-%M-%S'`
mkdir CartoDB-SQL-API/logs
mv Windshaft-cartodb/logs Windshaft-cartodb/logs-`date +'%Y-%m-%d_%H-%M-%S'`
mkdir Windshaft-cartodb/logs
./carto-start.sh