Performance testing is a method of testing various aspects of Repose in order to collect performance data. Performance data can be used to determine the overhead of the aspects in question, both at the system-resource level and at the response-time level.

To performance test Repose and collect the resulting data for analysis, the Framework described below is used.

Methodology

In an attempt to provide the most meaningful data possible, the following test cases will be covered by independent tests:

  • The simplest (i.e., passthrough) Repose deployment.

  • Every Repose filter.

  • A relatively complex Repose deployment.

  • Common Repose deployments (e.g., RBAC).

Framework

Design

The overall design of the automatic performance testing framework can be outlined succinctly by the following diagram:

perf test design diagram

A code change on the master branch of the Repose GitHub repository will trigger a Jenkins job to run. The Jenkins job will run the Ansible playbooks for all defined test cases. The Ansible playbooks, in turn, will provision the necessary cloud resources and begin executing the Gatling simulations for each test case. As Gatling runs, it will pipe the performance data it captures to InfluxDB. Once Gatling execution completes, the cloud resources will be cleaned up, and the Ansible playbook will finish executing. Grafana will query InfluxDB to retrieve the performance data, which is made available in the Performance Tests dashboard.

Gatling

Data will be transported from Gatling to InfluxDB using the Graphite line protocol. The data being transferred will include numerous statistics calculated for each point in time. To get a better idea of the data in question, a sample has been provided below which captures the number of request/response pairs sent at time 1492629621, as well as various statistics calculated from the amount of time taken to service those requests. Note that the sample data only captures a single second of the Gatling simulation; much more data of the same form would be expected for every second that the Gatling simulation ran.

gatling.computerworld.getComputers.ok.count 2 1492629621
gatling.computerworld.getComputers.ok.min 192 1492629621
gatling.computerworld.getComputers.ok.max 219 1492629621
gatling.computerworld.getComputers.ok.mean 205 1492629621
gatling.computerworld.getComputers.ok.stdDev 13 1492629621
gatling.computerworld.getComputers.ok.percentiles50 192 1492629621
gatling.computerworld.getComputers.ok.percentiles75 219 1492629621
gatling.computerworld.getComputers.ok.percentiles95 219 1492629621
gatling.computerworld.getComputers.ok.percentiles99 219 1492629621
gatling.computerworld.getComputers.ko.count 0 1492629621
gatling.computerworld.getComputers.all.count 2 1492629621
gatling.computerworld.getComputers.all.min 192 1492629621
gatling.computerworld.getComputers.all.max 219 1492629621
gatling.computerworld.getComputers.all.mean 205 1492629621
gatling.computerworld.getComputers.all.stdDev 13 1492629621
gatling.computerworld.getComputers.all.percentiles50 192 1492629621
gatling.computerworld.getComputers.all.percentiles75 219 1492629621
gatling.computerworld.getComputers.all.percentiles95 219 1492629621
gatling.computerworld.getComputers.all.percentiles99 219 1492629621

Repose

System data will be transported from Repose hosts to InfluxDB using Telegraf. More specifically, Telegraf will collect system metrics like CPU usage, memory usage, and disk usage, and report those metrics to a UDP listener in InfluxDB. These metrics will provide better insight into the system constraints of hosts running Repose.
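
For illustration, a minimal Telegraf configuration along these lines is sketched below. The input plugins shown are standard Telegraf plugins, but the InfluxDB URL and database name are placeholders rather than the Framework's actual values.

# Collect basic system metrics.
[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

[[inputs.disk]]

# Ship collected metrics to InfluxDB over UDP.
# The URL and database below are placeholders.
[[outputs.influxdb]]
  urls = ["udp://influxdb.example.com:8089"]
  database = "telegraf"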

InfluxDB

InfluxDB will process the data it receives through its Graphite listener port by applying configured templates. These templates will translate the metric names from the Graphite line protocol into measurements, fields, tags, and values in the database.

For example, given the sample data provided in the Gatling section, the following template could be applied:

gatling.*.*.*.* measurement.simulation.request.status.field

Resulting in a database full of entries like the following:

Table 1. Measurement Name: gatling
Field/Tag Key    Field/Tag Value
time             1492629621000000000
count            2
max              219
mean             205
min              192
percentiles50    192
percentiles75    219
percentiles95    219
percentiles99    219
request          getComputers
simulation       computerworld
status           all
stdDev           13
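
For reference, templates like this are configured on InfluxDB's Graphite listener. A sketch of the relevant influxdb.conf section follows; the bind address and database name are illustrative, not the Framework's actual settings.

[[graphite]]
  enabled = true
  bind-address = ":2003"
  database = "gatling"
  protocol = "tcp"
  templates = [
    "gatling.*.*.*.* measurement.simulation.request.status.field"
  ]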

To learn more about how InfluxDB handles Graphite input, see Graphite Input.

Deployment

The following diagram illustrates the interactions between each component referenced in the Design.

Performance Test Framework

Creating a New Test

The directory structure of the performance tests is organized by category and test. That is, each file is located in a directory named after the specific test, which is itself located in a directory named after the category of test. For example, the Repose configuration files directory:

  • repose-aggregator/tests/performance-tests/src/performanceTest/resources/roles/repose/files/config/

contains the category sub-directories filters/, services/, and use-cases/.
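
As an illustration, the layout under that directory might look like the following (the test names shown are examples used elsewhere in this section):

files/config/
├── filters/
│   └── saml/
├── services/
└── use-cases/
    └── simple/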

Armed with this information, these are the basic steps to create a new test:

  1. Create a vars file for the new test.

    1. This needs to be located at:

      • repose-aggregator/tests/performance-tests/src/performanceTest/resources/test_vars/{category}/{test}.yaml

    2. At a minimum, the vars file should contain the following (a sketch of a complete vars file appears after these steps):

      • dir_prefix

        The dir_prefix is the {category}/{test} used to create the new vars file (e.g., filters/saml).

      • required filter chain

      • Gatling simulation class package

        The package is similarly created using {category}.{test} (e.g., filters.saml).

      • Gatling simulation class name

        The name should follow the CamelCase pattern {Test}{Category}Simulation (e.g., SamlFilterSimulation, ComplexUseCaseSimulation).

  2. If necessary, add Repose configuration files.

    1. Template files should be placed in the directory:

      • repose-aggregator/tests/performance-tests/src/performanceTest/resources/roles/repose/templates/config/{dir_prefix}/

        The Framework uses Ansible, which supports Jinja2 templates. Templates allow dynamic content, such as Ansible variables, to be rendered into files. Jinja2 template files must have a .j2 extension appended to the file name; the extension is removed as part of the templating process.

    2. Configuration files should be placed in the directory:

      • repose-aggregator/tests/performance-tests/src/performanceTest/resources/roles/repose/files/config/{dir_prefix}/

  3. Create a Gatling simulation.

    1. The Gatling simulation, which is responsible for generating test traffic and performance test data, should be created in the file:

      • repose-aggregator/tests/performance-tests/src/performanceTest/scala/{package}/{name}.scala

    2. The Gatling simulation should, in most cases, extend the AbstractReposeSimulation class provided by the Repose performance-test-library (see Performance Test Library below for a sketch).

  4. If necessary, add mock services.

    1. Create any necessary mocks in the directory:

      • repose-aggregator/tests/performance-tests/src/performanceTest/resources/roles/repose/files/mocks/{dir_prefix}/
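
Putting these steps together, a vars file for the SAML filter test might look like the following sketch. The dir_prefix key and the nesting style are taken from the branch example later in this section, but the key names used for the filter chain and the Gatling simulation settings are hypothetical placeholders, not the Framework's actual variable names.

{
  dir_prefix: "filters/saml",
  repose: {
    filters: ["saml-policy"] (1)
  },
  gatling: {
    simulation: { (2)
      package: "filters.saml",
      name: "SamlFilterSimulation"
    }
  }
}
1 Hypothetical key for the required filter chain; the filter name is illustrative as well.
2 Hypothetical keys for the Gatling simulation class package and name.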

Running a Test Automatically

A Jenkins job is triggered any time code on the master branch of the Repose GitHub repository changes. The Jenkins job runs the performance tests and publishes the results automatically. The Design section describes the full automated process in more detail.

Running a Test Manually

If you have the appropriate permissions on the Repose Jenkins instance, you can run the performance-test Jenkins job to manually start a single performance test.

  1. In the performance-test Jenkins job click on the Build with Parameters link.

  2. Populate the build parameters for the test you want to run.

  3. Click Build.

Table 2. performance-test build parameters
Name           Required  Description                                             Example
perf_test      Yes       name of the performance test you wish to perform        use-cases/simple
extra_options  No        extra options to pass to the ansible-playbook command   -e '{gatling: {test: {params: {duration_in_min: 1}}}}'
branch         Yes       the branch to build from (Default: master)              REP-0000_NewPerfTest

Running a Test Locally

  1. Set up your environment.

    We use Puppet to manage our performance test environment. The details of our environment can be found in our Puppet manifest. If you have Puppet installed, you can download our manifest and re-create our environment by running puppet apply directly on the downloaded file.

  2. Clone the Repose repository from GitHub.
    git clone https://github.com/rackerlabs/repose.git

  3. Navigate to the performance test directory in the project.
    cd repose/repose-aggregator/tests/performance-tests

  4. If the test is not on the master branch, indicate the correct branch in the test’s vars file:

    {
      dir_prefix: "filter/saml",
      repose: {
        install: {
          git: {
            branch: "REP-0000_NewPerfTest" (1)
          }
        },
        ...
      },
      ...
    }
    1 This indicates the desired branch.
  5. Run the start_test.yml Ansible playbook.
    This playbook will create the necessary Cloud resources, provision the servers, and execute the Gatling simulation. When running the playbook, you may specify which test to run and what configuration to override for that test.

    ansible-playbook start_test.yml \ (1)
      -e '{perf_test: filters/saml, (2)
           gatling: {test: {params: {duration_in_min: 15, saml: {num_of_issuers: 2}}}}}' \ (3)
      -vvvv (4)
    1 Run the start_test.yml playbook.
    2 Set the config value for perf_test to filters/saml to run the SAML test.
    3 Override the values for the two config parameters using YAML syntax.
    4 Provide very verbose output.
  6. View the Results.

  7. Clean up cloud resources.
    ansible-playbook stop_test.yml -e '{perf_test: filters/saml}' -vvvv

Results

Grafana Dashboard

Grafana serves as the user interface to the performance test data, which can be viewed in the Performance Tests dashboard.
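
For illustration, a dashboard panel charting mean response time might issue an InfluxQL query like the following against the schema shown in Table 1. This is a sketch, not a query taken from the actual dashboard; $timeFilter and $__interval are Grafana macro variables for InfluxDB data sources.

SELECT mean("mean") FROM "gatling"
WHERE "simulation" = 'computerworld'
  AND "request" = 'getComputers'
  AND "status" = 'all'
  AND $timeFilter
GROUP BY time($__interval)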

HTML Report

After test execution has completed, Gatling will generate an HTML report. The report can be found inside an archive residing in the home directory with a file name like SamlFilterSimulationRepose-1496319945650.tar.gz. The exact file name will be included in the output of the Fetch the Gatling results task which will resemble the following snippet:

changed: [perf-filters-saml-testsuite01] => {
    "changed": true,
    "checksum": "9bf6327e21d9029870e81451287aef61428329bf",
    "dest": "/home/perftester/SamlFilterSimulationRepose-1496319945650.tar.gz", (1)
    "md5sum": "aeace1aecfb4949b0e6bbd104e87b7a0",
    "remote_checksum": "9bf6327e21d9029870e81451287aef61428329bf",
    "remote_md5sum": null
}
1 This is the file name.

After the report archive is located, the contents must be extracted before the report can be viewed. That can be done with a command like the following:
tar xvf /home/perftester/SamlFilterSimulationRepose-1496319945650.tar.gz

Once the report has been extracted, it can be viewed by opening the index.html file with a web browser.

Debugging

Running Gatling Manually

If you need to manually run Gatling, you can do so from the Gatling host.

export JAVA_OPTS="-Dtest.base_url=162.200.100.100:7070" (1)
export GATLING_HOME="/root/gatling/current" (2)
cd $GATLING_HOME

gatling -s filters.saml.SamlFilterSimulation (3)
1 The target host and port need to be specified for all of the simulations. These JAVA_OPTS are only for Gatling and do not apply to Repose.
2 GATLING_HOME needs to be set for Gatling to run.
3 filters.saml.SamlFilterSimulation is the package-qualified class name of the simulation to run; replace it with the fully qualified class name of the simulation you want to run.

Configuring Gatling Manually

Files and directories of interest:

  • conf/logback.xml - Lets you turn on debug logging, which is useful for seeing the request and response data (a sketch follows this list).

  • conf/application.conf - The configuration file read by the simulations.

  • results/ - Contains all of the results.

  • user-files/simulations/ - Contains the test simulations written in Scala and organized by package.
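
For example, to dump request and response data, you might raise the level of Gatling's HTTP logger in conf/logback.xml. The following is a sketch; the exact logger name varies by Gatling version, so treat it as illustrative.

<!-- Raise to DEBUG (or TRACE) to log request/response data.
     The logger name below is illustrative and varies by Gatling version. -->
<logger name="io.gatling.http" level="DEBUG" />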

Additional Information

Test Preparation

The Ansible directory is prepared by running the Gradle task buildAnsibleDirectory. The base Ansible files are processed using Ant-based templating, which allows build-time values (such as the Gatling version to use) to be filled in. The simulations are copied in from the source directory, and any dependencies of the simulations are copied into place for inclusion in the Gatling lib directory.
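
For example, assuming the task is reachable from the top of the build (an assumption, since the project layout is not spelled out here), it could be invoked from the repository root with:

./gradlew buildAnsibleDirectory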

Performance Test Library

To reduce redundancy, the Repose team created the performance-test-framework for Gatling. This library includes the AbstractReposeSimulation, which is meant to serve as the basis for most, if not all, simulations.
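
As a sketch, a simulation built with the library might look like the following. Because AbstractReposeSimulation's API is not documented here, the sketch falls back to Gatling's plain Simulation class, and the endpoint, payload, and injection profile are illustrative.

package filters.saml

import scala.concurrent.duration._

import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Sketch only: in the Framework this would extend AbstractReposeSimulation
// rather than Simulation, but that class's API is not shown in this document.
class SamlFilterSimulation extends Simulation {
  // The target host and port are injected via JAVA_OPTS
  // (see "Running Gatling Manually" above).
  val baseUrl = System.getProperty("test.base_url", "localhost:7070")

  val httpConf = http.baseURL(s"http://$baseUrl")

  // A single request type, mirroring the naming seen in the Gatling sample data.
  val scn = scenario("saml")
    .exec(
      http("postSaml")
        .post("/")
        .body(StringBody("<!-- SAMLResponse payload (illustrative) -->")))

  setUp(scn.inject(constantUsersPerSec(10) during (15.minutes)))
    .protocols(httpConf)
}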

To utilize the performance-test-framework, its JAR file must be placed in the Gatling lib directory. This is handled along with any other dependencies by the Gradle task buildAnsibleDirectory. The lib directory is a sub-directory of the Gatling home directory.

In the context of the Framework, Jenkins will build the performance-test-framework and copy the resulting JAR into the Ansible directory structure where it will then be copied to the hosts running Gatling.

Warmup Duration

A warmup period can be specified with the warmup_duration_in_min parameter. For example, to run the simple use-case test for 15 minutes with a 5 minute warmup (20 minutes total):

ansible-playbook start_test.yml \
  -e '{perf_test: use-cases/simple,
       gatling: {test: {params: {duration_in_min: 15, warmup_duration_in_min: 5}}}}' \
  -vvvv

Skipping Repose or Origin Service Tests

By default, both the Repose test and the origin service test (i.e., Repose is bypassed) are run. You can use Ansible tags to specify that only one of those tests should be run.

ansible-playbook start_test.yml \
  -e '{perf_test: filters/saml}' \
  --tags "repose" \
  -vvvv
ansible-playbook start_test.yml \
  -e '{perf_test: filters/saml}' \
  --tags "origin" \
  -vvvv

Cloud Server Flavor

You can specify which cloud server flavor to use in the configuration overrides.

ansible-playbook start_test.yml \
  -e '{perf_test: filters/saml,
       cloud: {server: {repose: {flavor: performance1-4}}}}' \
  -vvvv

Increasing Timeout for Repose Startup

By default, the Repose start task will wait 5 minutes for Repose to start up. If you expect Repose to take longer to start (e.g., due to a large WADL), you can increase this timeout using a command like the following:

ansible-playbook start_test.yml \
  -e '{perf_test: filters/saml,
       repose: {service: {start_timeout_in_sec: 900}}}' \
  -vvvv

Saxon

To get Repose to use Saxon, add a saxon-license.lic file to the repose-aggregator/tests/performance-tests/roles/repose/files/ directory and pass in the following configuration override:

ansible-playbook start_test.yml \
  -e '{perf_test: filters/saml,
       repose: {systemd_opts: {use_saxon: true}}}' \
  -vvvv

JVM Options (Heap size, Garbage Collection, etc.)

You can set the JVM Options (JAVA_OPTS) used by Repose by setting the following configuration override:

ansible-playbook start_test.yml \
  -e '{perf_test: filters/saml,
       repose: {
         systemd_opts: {
           java_opts: ["-Xmx2g", "-Xms2g", "-XX:+UseG1GC", "-XX:+UseStringDeduplication"]}}}' \
  -vvvv

Automatic Cloud Resource Cleanup

Use the master playbook to run a performance test and immediately clean up afterwards without having to run the individual playbooks.

ansible-playbook site.yml \
  -e '{perf_test: filters/saml}' \
  -vvvv

Running a Test Again

You can re-run a test using the same cloud resources by simply running the start_test.yml playbook again. You can even specify different configuration overrides in subsequent runs, although there are some limitations. For example, you can enable Saxon for a subsequent run, but you can’t disable it afterwards. Also, if you don’t want the Repose JVM already warmed up from the previous run, you should have Ansible restart the Repose service. This feature is considered experimental.

ansible-playbook start_test.yml \
  -e '{perf_test: filters/saml,
       repose: {service: {state: restarted}, systemd_opts: {use_saxon: true}}}' \
  -vvvv

Gatling Output Directory Name

You can specify the base name of the directory containing the Gatling results in the configuration overrides. For example, if you wanted the base name custom-base-name (resulting in a directory name resembling custom-base-name-repose-1487812914968), you would run:

ansible-playbook start_test.yml \
  -e '{perf_test: filters/saml,
       gatling: {results: {output_dir: custom-base-name}}}' \
  -vvvv

Running Parallel Tests

Running a performance test with a unique naming prefix enables you to run a particular test multiple times simultaneously (each run requires a unique prefix):

ansible-playbook start_test.yml \
  -e '{perf_test: filters/saml,
       cloud: {naming_prefix: perf-saml-many-issuers},
       gatling: {test: {params: {saml: {num_of_issuers: 100}}}}}' \
  -vvvv

If you’re using the stop_test.yml playbook to clean up your cloud resources, you’ll need to include the unique prefix to ensure the correct resources are deleted. If the prefix is not specified, the wrong cloud resources or no cloud resources could end up being deleted, and in both cases, no error will be returned (due to idempotency).

ansible-playbook stop_test.yml \
  -e '{perf_test: filters/saml,
       cloud: {naming_prefix: perf-saml-many-issuers}}' \
  -vvvv

Scripting Filter Test

When running a Scripting filter test, the perf_test variable will look slightly different from how it does for other tests. The reason is that the scripting language to be used in the test is nested one level down in the perf_test variable value.

ansible-playbook start_test.yml \
  -e '{perf_test: filters/scripting/python}' \
  -vvvv