Node Install - Docker Image

Sandfly nodes allow the system to connect to Linux hosts to do agentless investigations and forensic analysis. You need at least one node running at all times, but it is recommended you start multiple nodes for redundancy and performance.

Referring to the diagram below, the nodes are the workhorse of Sandfly. The nodes connect to the protected hosts over SSH to do investigations and report back the results to the server. Each node has 500 threads and can easily scan many times this number of hosts during operation.

Sandfly High-Level Overview

We recommend you start more than one node container for normal operation. Each new node container provides 500 more scanning threads so it is very easy to build massive capability with Sandfly to protect many hosts even in large network deployments. The nodes will connect to the server and handle scanning requests on demand with automatic load balancing. You do not need to do anything to the nodes except ensure they have SSH access to the hosts they are required to protect.

The containers can all run on the same Virtual Machine (VM), but we recommend that this VM not be the same one used to host the server for security reasons. The only limit to how many node containers you can run is the CPU and RAM of the VM hosting them.

The rest of these instructions will get the host VM ready to run the node containers.

Standard Security vs. Maximum Security Installation

The section on Standard Security vs. Maximum Security installation goes over the differences in how to deploy Sandfly for your environment. If you are running a very small deployment, or testing the product, you may want to use the Standard Security mode. For customers with resources to do so, we highly recommend the Maximum Security installation of running the server and nodes on separate VMs.

I Want to Use the Standard Security Install

If you are happy running the server and scanning node containers on the same VM, you can skip most of the instructions here. Simply go to the Start the Node section below to start a scanning node on the same system as the server and proceed to log in to begin using Sandfly.

I Want to Use the Maximum Security Install

If you want to use the recommended separate VMs for running the server and scanning nodes, you will need to do all of the steps outlined below.

Download Setup Scripts

The Sandfly setup scripts are located on GitHub. Please visit the URL below to obtain the latest version:

https://github.com/sandflysecurity/sandfly-setup/releases

The version format is X.X.X (e.g. 5.0.2). Substitute the version you want to install in the commands below:

wget https://github.com/sandflysecurity/sandfly-setup/releases/download/vX.X.X/sandfly-setup-X.X.X.tgz

tar -xzvf sandfly-setup-X.X.X.tgz

There should be a directory named sandfly-setup after you extract the archive. This is where all the operations below will take place.
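The download and extraction steps above can be parameterized on the release version. The sketch below is only an illustration: SANDFLY_VERSION and the helper sandfly_release_url are hypothetical names, and 5.0.2 is just the example version quoted above. It prints the commands rather than running them, so you can check the URL against the releases page first.

```shell
# Hypothetical variable: set this to the release you picked on the releases page.
SANDFLY_VERSION="${SANDFLY_VERSION:-5.0.2}"

# Hypothetical helper: build the tarball URL from a version string,
# following the vX.X.X/sandfly-setup-X.X.X.tgz pattern shown above.
sandfly_release_url() {
    printf 'https://github.com/sandflysecurity/sandfly-setup/releases/download/v%s/sandfly-setup-%s.tgz\n' "$1" "$1"
}

# Print the commands for review; run them by hand once the URL looks right.
echo "wget $(sandfly_release_url "$SANDFLY_VERSION")"
echo "tar -xzvf sandfly-setup-$SANDFLY_VERSION.tgz"
```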

Install Container Tool

Sandfly runs entirely inside containers for security and performance reasons, using either Docker or Podman. The Docker versions in the Ubuntu and CentOS repositories are often very old and are not compatible with Sandfly, and neither is the snap version of Docker on Ubuntu. Please use the appropriate install method below to install the latest supported version of Docker or Podman.

❗️

IMPORTANT: Do Not Install Alternate / Additional Versions of Docker

Some Linux distributions (such as Ubuntu, through snap and apt) or manual Docker installations allow two versions of the Docker packages to coexist without a warning. This can cause problems when starting Sandfly containers. Please ensure that only one supported version of Docker is installed on the host OS.
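One quick way to spot a duplicate installation is to look for more than one docker executable on the PATH. The sketch below is a rough check under that assumption, not a complete audit: list_docker_binaries is a hypothetical helper, and it will not notice packages that are installed but not on the PATH, so a package-level check with your distribution's tools is a sensible complement.

```shell
# Hypothetical helper: print each distinct docker executable found in a
# colon-separated directory list (defaults to the current PATH).
list_docker_binaries() {
    printf '%s\n' "${1:-$PATH}" | tr ':' '\n' | while IFS= read -r dir; do
        if [ -n "$dir" ] && [ -f "$dir/docker" ] && [ -x "$dir/docker" ]; then
            # Resolve symlinks so /bin and /usr/bin are not counted twice
            # on systems where one is a link to the other.
            readlink -f "$dir/docker"
        fi
    done | sort -u
}

# More than one line of output suggests two Docker installs are present.
list_docker_binaries
```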

❗️

IMPORTANT: CentOS Repositories Are Too Old for Docker

Some Linux distributions (such as CentOS) contain old versions of Docker that are not compatible with Sandfly. Please perform the installation via one of the provided scripts to ensure that a supported version of Docker is used.

For Podman installations, first complete the steps found in the Run Sandfly with Podman documentation, then return to this page and proceed to the Copy Over Config JSON from the Server to the Node section to continue the node installation.

For Docker installations, use one of the scripts below to install the latest version:

CentOS 7 Docker Install

~/sandfly-setup/setup/install_docker_centos7.sh

Ubuntu 18 Docker Install

~/sandfly-setup/setup/install_docker_ubuntu18.sh

Ubuntu 20 and Newer Docker Install

~/sandfly-setup/setup/install_docker_ubuntu20.sh

Debian 9 and Newer Docker Install

~/sandfly-setup/setup/install_docker_debian.sh

Start Docker

Make sure the Docker daemon starts automatically. You can also start it manually on Linux with the following command:

service docker start
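On systemd-based distributions, the daemon can be started and enabled at boot in one step. The sketch below assumes systemctl is how your host manages services and falls back to the command above otherwise; docker_start_command is a hypothetical helper that only prints the suggested command, which you would then run as root.

```shell
# Hypothetical helper: suggest a start command based on whether systemctl
# is available on this host.
docker_start_command() {
    if command -v systemctl >/dev/null 2>&1; then
        # Starts Docker now and enables it at boot.
        echo "systemctl enable --now docker"
    else
        # Fallback from the instructions above; starts Docker for this boot only.
        echo "service docker start"
    fi
}

echo "Run as root: $(docker_start_command)"
```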

Copy Over Config JSON from the Server to the Node

We now need to copy over the generated node config JSON file from the server. This file is populated with all cryptographic keys and related setup information for the node to automatically connect to the server and operate.

You will want to open two terminal windows. One will need to be connected to the server, and the other to the node. You could also use scp to copy the file or any other method you want as long as it is secure.
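If you prefer scp, the transfer can be run from the node in a single command. The sketch below is illustrative only: SERVER_HOST and SERVER_USER are placeholder values, node_config_scp_command is a hypothetical helper, and it assumes sandfly-setup lives in the home directory on both machines as in the paths used on this page. It prints the command so you can review it before running it.

```shell
# Placeholder values; replace with your own server address and account.
SERVER_HOST="${SERVER_HOST:-sandfly-server.example.com}"   # hypothetical hostname
SERVER_USER="${SERVER_USER:-root}"                         # hypothetical account
CONFIG_PATH="sandfly-setup/setup/setup_data/config.node.json"

# Hypothetical helper: build the scp command that pulls the node config
# from the server's home directory into the same location on the node.
node_config_scp_command() {
    printf 'scp %s@%s:%s %s\n' "$SERVER_USER" "$SERVER_HOST" "$CONFIG_PATH" "$HOME/$CONFIG_PATH"
}

# ON NODE: review the printed command, then run it.
node_config_scp_command
```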

Go to the setup_data directory on the server and copy the configuration text:

# ON SERVER

cd ~/sandfly-setup/setup/setup_data

cat config.node.json

<copy contents>

Go to the setup_data directory on the node and paste the configuration text into the file:

# ON NODE

cd ~/sandfly-setup/setup/setup_data

cat > config.node.json

<paste contents>


<CTRL-D>

🚧

CAUTION: It is possible to create an invalid configuration file

Copying and pasting the text between screens can introduce minor changes into the created config.node.json file that will cause an error later in the install process. This occurs most often when using the "cat" command. Pasting into a config.node.json file opened in your favorite text editor is less likely to cause this issue.

If you do use the paste method, we also recommend that you validate the JSON structure of the file before proceeding to the next step. The JSON can be validated with a method of your preference or, if python3 is installed, it can be quickly checked from the command line:

python3 -mjson.tool "./config.node.json" > /dev/null

The entire config.node.json file must be copied with all keys intact. Most of these values should not be altered unless advised to do so by Sandfly Security.
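Beyond checking that the JSON parses, you can confirm the copy is byte-for-byte identical by comparing a checksum on both machines. The sketch below assumes sha256sum is available on both hosts; config_fingerprint is a hypothetical helper, not part of the Sandfly scripts.

```shell
# Hypothetical helper: print the SHA-256 digest of a file.
config_fingerprint() {
    sha256sum "$1" | cut -d' ' -f1
}

# Run on both the server and the node; the two digests should be identical.
CONFIG="$HOME/sandfly-setup/setup/setup_data/config.node.json"
if [ -f "$CONFIG" ]; then
    config_fingerprint "$CONFIG"
else
    echo "not found: $CONFIG"
fi
```

A mismatch usually means the paste altered whitespace or dropped characters, in which case copying the file again is the simplest fix.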

Delete the Node Config File

Sandfly uses high performance elliptic curve cryptography to secure SSH keys in the server database. To ensure these SSH keys are safe in the event of server compromise, the secret keys used to decrypt them are only stored on the scanning nodes.

Because of the above, we do not want the server to have both public and private keys for the nodes. After you copy the node config JSON to your nodes, we want to remove it from the server.

Go into the server setup_data directory and delete the config.node.json file. The server only needs the config.server.json file present.

If available, you can use a secure delete on the node config file as shown below:

# ON SERVER:

shred -u ~/sandfly-setup/setup/setup_data/config.node.json

Or standard delete:

# ON SERVER:

rm ~/sandfly-setup/setup/setup_data/config.node.json
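The two delete options above can be combined so that shred is used when present and rm otherwise, followed by a check that the file is really gone. This is a minimal sketch; secure_remove is a hypothetical helper, and the call is left commented out because it must only ever be run on the server, never on a node.

```shell
# Hypothetical helper: delete a file with shred when available, otherwise
# with rm, then confirm that it no longer exists.
secure_remove() {
    target=$1
    if [ ! -e "$target" ]; then
        echo "already absent: $target"
        return 0
    fi
    if command -v shred >/dev/null 2>&1; then
        shred -u "$target"
    else
        rm -f "$target"
    fi
    if [ -e "$target" ]; then
        echo "still present: $target" >&2
        return 1
    fi
    echo "removed: $target"
}

# ON SERVER ONLY (the node needs to keep its copy):
#   secure_remove ~/sandfly-setup/setup/setup_data/config.node.json
```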

❗️

IMPORTANT: DELETE THE SECRET KEY

You must delete the node config (config.node.json) from the server to ensure full security of your SSH credentials with Sandfly.

Once the secret key has been deleted from the server, then you can start the node.

Start the Node

At this point you need to decide on how many node containers to start, but at least one needs to be running in order for Sandfly to function.

Start One Node Container

A single instance can be started with the following command:

~/sandfly-setup/start_scripts/start_node.sh

The Docker image will be pulled if it does not already exist locally, and the node will start if the keys from above were copied over correctly.

While we generally recommend running multiple node instances, there are a few reasons why you may want to start with just one. First, if you are installing Sandfly for the first time, having only one instance makes debugging and monitoring easier. Second, the host/VM running the node may not have sufficient CPU and RAM resources for more. One node is sufficient for initial testing or light use, but eventually it is advisable to run multiple containers for regular use.

Start Additional Containers

You can start multiple node containers on the same system to get more performance and redundancy by simply running the start_node.sh script repeatedly. Make sure your host instance has sufficient RAM to run multiple node containers before doing this.

root@example:~/sandfly-setup/start_scripts# ./start_node.sh
0106c87dbfd304b3f6fef847702a41f603eb5e625c7b6194ba5fd30019533421

root@example:~/sandfly-setup/start_scripts# ./start_node.sh
9ecc25cdaae72589d4792a01989ab73001bcf400da05cfd436a54e9defc38be9

root@example:~/sandfly-setup/start_scripts# ./start_node.sh
a8c3b80228c47a7feabf0dcbee89cbd6a2d5abbe80ec7b2a61fc86ed246bfbd7
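Running the script repeatedly can also be done in a loop. The sketch below is one way to do it; start_nodes is a hypothetical helper, and the default script path assumes sandfly-setup is in your home directory as in the commands above. The call is left commented out so nothing starts until you choose a count.

```shell
# Hypothetical helper: run the start script COUNT times, stopping at the
# first failure. The second argument overrides the script path.
start_nodes() {
    count=$1
    script=${2:-"$HOME/sandfly-setup/start_scripts/start_node.sh"}
    i=1
    while [ "$i" -le "$count" ]; do
        "$script" || { echo "node $i failed to start" >&2; return 1; }
        i=$((i + 1))
    done
}

# Example: start three node containers.
#   start_nodes 3
```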

👍

TIP: We Recommend Running Multiple Containers

We recommend you run multiple node containers. You can run multiple containers on a single host instance or on individual hosts. Running multiple containers provides much higher performance and redundancy if a container exits unexpectedly.

Each node container runs 500 scanning threads, so every node container you add to the system expands scanning capacity by 500 threads.

Running 5 nodes, for instance, gives you 2500 scanning threads. This means that you can scan 2500 hosts concurrently. It also means that if one container should die unexpectedly, you will still have capacity for scanning to continue uninterrupted.
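The capacity arithmetic above is simple enough to script when planning a deployment. The sketch below uses the 500 threads per node stated on this page; total_threads and nodes_needed are hypothetical helper names, and the result is a sizing estimate for concurrent scans, not a limit on how many hosts you can protect.

```shell
THREADS_PER_NODE=500   # per-node thread count stated above

# Total scanning threads for a given number of node containers.
total_threads() {
    echo $(( $1 * THREADS_PER_NODE ))
}

# Hypothetical helper: containers needed for a target number of concurrent
# scans, rounded up to the next whole container.
nodes_needed() {
    echo $(( ($1 + THREADS_PER_NODE - 1) / THREADS_PER_NODE ))
}

total_threads 5      # prints 2500
nodes_needed 2600    # prints 6
```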

You can run the following command to see all of the running node containers on a host:

docker ps

CONTAINER ID   IMAGE                           COMMAND                  CREATED          STATUS          PORTS                                                                            NAMES
865c0520124e   quay.io/sandfly/sandfly:5.0.2   "/opt/sandfly/start_…"   5 seconds ago    Up 3 seconds                                                                                     boring_jang
3b9a82546aae   quay.io/sandfly/sandfly:5.0.2   "/opt/sandfly/start_…"   7 seconds ago    Up 5 seconds                                                                                     clever_burnell
92b33fe63f33   quay.io/sandfly/sandfly:5.0.2   "/opt/sandfly/start_…"   8 seconds ago    Up 6 seconds                                                                                     goofy_blackwell
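To get a count rather than a table, the image column can be filtered and tallied. This sketch assumes the image name quay.io/sandfly/sandfly seen in the output above and that only node containers run on this host; count_sandfly_images is a hypothetical helper.

```shell
# Hypothetical helper: count lines on stdin that name a Sandfly image.
# grep -c exits non-zero when the count is 0, so that status is ignored.
count_sandfly_images() {
    grep -c '^quay\.io/sandfly/sandfly:' || true
}

# Only query Docker when the CLI is present. Errors (for example a stopped
# daemon) are silenced and simply produce a count of 0.
if command -v docker >/dev/null 2>&1; then
    docker ps --format '{{.Image}}' 2>/dev/null | count_sandfly_images || true
fi
```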

🚧

CAUTION: Node Container RAM and CPU

Make sure the host instance for the node containers has enough RAM, and at least a couple of CPUs, before running many containers so that there are no performance issues.

A 2GB instance can run 4 containers comfortably. A 4GB instance can run around 10 node containers or perhaps more.

If you want to run many node containers on a single instance you will need to scale up RAM and CPU accordingly.

If you want, you can view the log of the node to make sure it is connected and functioning properly. One way to do this is to find the path of the Docker log file for the container after you run the start script above.

Use the Docker name or container ID of the target container to find the unique log file used by that container instance:

docker inspect boring_jang | grep LogPath
        "LogPath": "/var/lib/docker/containers/865c0500124e4b119f36447a3556264a3996c5fd78eeee009e7fe10fbbe2e847/865c0500124e4b119f36447a3256264a3996c5fd78eeee009e7fe10fbbe2e847-json.log",

With the LogPath from the above command, the log can then be viewed. In the example below, the log is displayed with the tail command, and because of the -f option new log entries are appended to the output as they arrive:

tail -f /var/lib/docker/containers/865c0500124e4b119f36447a3556264a3996c5fd78eeee009e7fe10fbbe2e847/865c0500124e4b119f36447a3256264a3996c5fd78eeee009e7fe10fbbe2e847-json.log
{"log":"Setting fallback_directory to /dev/shm\n","stream":"stdout","time":"2024-02-02T15:31:03.680165719Z"}
{"log":"Concurrency set to 500\n","stream":"stdout","time":"2024-02-02T15:31:03.744270095Z"}
{"log":"Simulator multiplier set to 0\n","stream":"stdout","time":"2024-02-02T15:31:03.797034353Z"}
{"log":"Starting Node\n","stream":"stdout","time":"2024-02-02T15:31:03.959266465Z"}
{"log":"{\"time\":\"2024-02-02T15:31:03.964868301Z\",\"level\":\"INFO\",\"msg\":\"starting Sandfly node\",\"version\":\"5.0.2\",\"build_date\":\"2024-01-04T02:44:13Z\"}\n","stream":"stderr","time":"2024-02-02T15:31:03.972896661Z"}
{"log":"{\"time\":\"2024-02-02T15:31:03.964989048Z\",\"level\":\"INFO\",\"msg\":\"loading config file\",\"path\":\"conf/config.json\"}\n","stream":"stderr","time":"2024-02-02T15:31:03.972982467Z"}
{"log":"{\"time\":\"2024-02-02T15:31:04.00731022Z\",\"level\":\"INFO\",\"msg\":\"successfully loaded additional CA certificates from config\"}\n","stream":"stderr","time":"2024-02-02T15:31:04.007616874Z"}
{"log":"{\"time\":\"2024-02-02T15:31:04.007362626Z\",\"level\":\"INFO\",\"msg\":\"node thread limit\",\"threads\":500}\n","stream":"stderr","time":"2024-02-02T15:31:04.008122668Z"}
...

Leaving this example command running will allow you to continue to see node messages scroll by, oftentimes very quickly when scans are occurring. That example is not necessary for normal Sandfly operations, but it can be useful for debugging or monitoring should there be potential errors or performance issues.
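The two steps above (find the LogPath, then tail it) can be combined so the long path never has to be copied by hand. This is a minimal sketch: extract_log_path is a hypothetical helper that parses the inspect output shown earlier, boring_jang is only the example container name from this page, and reading files under /var/lib/docker normally requires root.

```shell
# Hypothetical helper: pull the LogPath value out of `docker inspect` output
# read from stdin.
extract_log_path() {
    sed -n 's/.*"LogPath": "\([^"]*\)".*/\1/p'
}

# Usage on a node host, with your own container name or ID:
#   tail -f "$(docker inspect boring_jang | extract_log_path)"
```

The docker logs -f command with the same container name is another route to this output without touching the log file directly.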

Alternatively, you can also use this formatted log viewing method:

How to get a complete log from a Sandfly docker container?

The installation section is almost complete. Please continue on to the next page.