Performance testing is a method of testing various aspects of Repose in order to collect performance data. Performance data can be used to determine the overhead of the aspects in question, both at a system resource level as well as at a response time level.
To performance test Repose, and collect the resulting data for analysis, the Framework is used.
In an attempt to provide the most meaningful data possible, the following test cases will be covered by independent tests:
The simplest (i.e., passthrough) Repose deployment.
Every Repose filter.
A relatively complex Repose deployment.
Common Repose deployments (e.g., RBAC).
The overall design of the automatic performance testing framework can be outlined succinctly by the following diagram:
A code change in the Repose GitHub repository on the master branch will trigger a Jenkins job to run. The Jenkins job will run the Ansible playbooks for all defined test cases. The Ansible playbooks, in turn, will provision the necessary cloud resources and begin executing the Gatling simulations for each test case. As Gatling runs, it will pipe the performance data it captures to InfluxDB. Once Gatling execution completes, the cloud resources will be cleaned up, and the Ansible playbook will finish executing. Grafana will query InfluxDB to retrieve the performance data, which is then presented in the Performance Tests dashboard.
Data will be transported from Gatling to InfluxDB using the Graphite line protocol.
The data being transferred will include numerous statistics, each calculated for a distinct point in time.
To get a better idea of the data in question, a sample has been provided which captures the number of request/response pairs sent at time 1492629621
as well as various statistics calculated from the amount of time taken to service those requests.
Note that the sample data only captures a single second of the Gatling simulation.
Much more data of the same form would be expected for every second that the Gatling simulation ran.
gatling.computerworld.getComputers.ok.count 2 1492629621
gatling.computerworld.getComputers.ok.min 192 1492629621
gatling.computerworld.getComputers.ok.max 219 1492629621
gatling.computerworld.getComputers.ok.mean 205 1492629621
gatling.computerworld.getComputers.ok.stdDev 13 1492629621
gatling.computerworld.getComputers.ok.percentiles50 192 1492629621
gatling.computerworld.getComputers.ok.percentiles75 219 1492629621
gatling.computerworld.getComputers.ok.percentiles95 219 1492629621
gatling.computerworld.getComputers.ok.percentiles99 219 1492629621
gatling.computerworld.getComputers.ko.count 0 1492629621
gatling.computerworld.getComputers.all.count 2 1492629621
gatling.computerworld.getComputers.all.min 192 1492629621
gatling.computerworld.getComputers.all.max 219 1492629621
gatling.computerworld.getComputers.all.mean 205 1492629621
gatling.computerworld.getComputers.all.stdDev 13 1492629621
gatling.computerworld.getComputers.all.percentiles50 192 1492629621
gatling.computerworld.getComputers.all.percentiles75 219 1492629621
gatling.computerworld.getComputers.all.percentiles95 219 1492629621
gatling.computerworld.getComputers.all.percentiles99 219 1492629621
System data will be transported from Repose hosts to InfluxDB using Telegraf. More specifically, Telegraf will collect system metrics like CPU usage, memory usage, and disk usage, and report those metrics to a UDP listener in InfluxDB. These metrics will provide better insight into the system constraints of hosts running Repose.
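For illustration, a minimal Telegraf configuration matching that description might look like the following sketch; the InfluxDB host, port, and database name are assumptions, not the framework's actual values:
# Collect basic system metrics from the Repose host.
[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

[[inputs.disk]]

# Report the collected metrics to an InfluxDB UDP listener (host/port are placeholders).
[[outputs.influxdb]]
  urls = ["udp://influxdb.example.org:8089"]
  database = "telegraf"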
InfluxDB will process the data it receives through its Graphite listener port by applying configured templates. These templates will translate the metric names from the Graphite line protocol into measurements, fields, tags, and values in the database.
For example, given the sample data provided in the Gatling section, the following template could be applied:
gatling.*.*.*.* measurement.simulation.request.status.field
Resulting in a database full of entries like the following:
Field/Tag Key | Field/Tag Value |
---|---|
time | 1492629621000000000 |
count | 2 |
max | 219 |
mean | 205 |
min | 192 |
percentiles50 | 192 |
percentiles75 | 219 |
percentiles95 | 219 |
percentiles99 | 219 |
request | getComputers |
simulation | computerworld |
status | all |
stdDev | 13 |
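As a rough sketch, the template above would be configured in the [[graphite]] section of influxdb.conf roughly as follows; the listener port and database name shown here are assumptions:
[[graphite]]
  enabled = true
  bind-address = ":2003"   # Graphite listener port (assumed)
  database = "gatling"     # target database (assumed)
  protocol = "tcp"
  templates = [
    "gatling.*.*.*.* measurement.simulation.request.status.field"
  ]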
To learn more about how InfluxDB handles Graphite input, see Graphite Input.
The following diagram illustrates the interactions between each component referenced in the Design.
The directory structure of the performance tests is organized by category and test.
That is, each file is located in a directory named after the specific test, which is itself located in a directory named after the category of test.
For example, the Repose configuration files directory:
repose-aggregator/tests/performance-tests/roles/repose/files/config/
contains the category sub-directories filters/, services/, and use-cases/.
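As an illustration only (using tests referenced later in this document), the layout resembles:
repose-aggregator/tests/performance-tests/roles/repose/files/config/
    filters/
        saml/
    services/
    use-cases/
        simple/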
Armed with this information, these are the basic steps to create a new test:
Create a vars file for the new test.
This needs to be located at:
repose-aggregator/tests/performance-tests/test_vars/{category}/{test}.yaml
At a minimum, the vars file should contain:
The dir_prefix
The required filter chain
The Gatling simulation class package
The Gatling simulation class name
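A minimal sketch of such a vars file is shown below. Aside from dir_prefix (which also appears in the branch override example later in this section), the key names are assumptions for illustration only; copy the exact structure from an existing test's vars file instead.
{
  dir_prefix: "filters/mytest",     # hypothetical test directory
  repose: {
    filters: ["my-filter"]          # required filter chain (key name assumed)
  },
  gatling: {
    simulation: {
      package: "filters.mytest",    # Gatling simulation class package (key name assumed)
      name: "MyTestSimulation"      # Gatling simulation class name (key name assumed)
    }
  }
}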
If necessary, add Repose configuration files.
Template files should be placed in the directory:
repose-aggregator/tests/performance-tests/roles/repose/templates/config/{dir_prefix}/
The Framework uses Ansible, which supports Jinja2 templates.
These templates allow users to add dynamic content to files using Ansible variables.
Jinja2 templated files must have a .j2 file extension.
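For example, a templated Repose configuration file might inject an Ansible variable like the following fragment; the file name and the repose_port variable are illustrative only:
<!-- hypothetical fragment of system-model.cfg.xml.j2 -->
<node id="repose" hostname="{{ ansible_default_ipv4.address }}" http-port="{{ repose_port | default(8080) }}"/>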
Configuration files should be placed in the directory:
repose-aggregator/tests/performance-tests/roles/repose/files/config/{dir_prefix}/
Create a Gatling simulation.
The Gatling simulation which will be responsible for generating test traffic and performance test data, should be created in the file:
repose-aggregator/tests/performance-tests/roles/gatling/files/simulations/{package}/{name}.scala
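A bare-bones, Gatling 2-style sketch of such a simulation follows; the package, class name, request, and injection profile are placeholders, and the real simulations in this repository also read their parameters (such as duration) from conf/application.conf, which this sketch omits:
package filters.mytest

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

// Hypothetical simulation class matching {package}/{name}.scala
class MyTestSimulation extends Simulation {

  // The target host:port is passed in via JAVA_OPTS (-Dtest.base_url=...), as shown later in this document.
  val baseUrl = sys.props.getOrElse("test.base_url", "localhost:8080")

  val httpConf = http.baseURL(s"http://$baseUrl")

  val scn = scenario("mytest")
    .exec(http("get resource").get("/").check(status.is(200)))

  setUp(
    scn.inject(constantUsersPerSec(10) during (1 minute))
  ).protocols(httpConf)
}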
If necessary, add mock services.
Create any necessary mocks in the directory:
repose-aggregator/tests/performance-tests/roles/repose/files/mocks/{dir_prefix}/
A Jenkins job is triggered any time that code on the master branch of the Repose Git Repository changes. The Jenkins job will run the performance tests and publish the results automatically. Design describes the full automated process in more detail.
If you have the appropriate permissions on the Repose Jenkins instance, you can run the performance-test Jenkins job to manually start a single performance test.
In the performance-test Jenkins job click on the Build with Parameters link.
Populate the build parameters for the test you want to run.
Click Build.
Name | Required | Description | Example |
---|---|---|---|
perf_test | Yes | name of the performance test you wish to perform | |
extra_options | No | extra options to pass to the ansible-playbook command | |
branch | Yes | the branch to build from (Default: master) | |
Set up your environment.
We use Puppet to manage our performance test environment.
The details of our environment can be found in our Puppet manifest.
If you have Puppet installed, you can download our manifest and re-create our environment by applying that manifest with Puppet.
Clone the Repose GitHub repository from Github.
git clone https://github.com/rackerlabs/repose.git
Navigate to the performance test directory in the project.
cd repose/repose-aggregator/tests/performance-tests
If the test is not on the master branch, then you simply indicate the correct branch in the test’s vars file:
{
dir_prefix: "filters/saml",
repose: {
install: {
git: {
branch: "REP-0000_NewPerfTest" (1)
}
},
...
},
...
}
1 | This indicates the desired branch. |
Run the start_test.yml
Ansible playbook.
This playbook will create the necessary Cloud resources, provision the servers, and execute the Gatling simulation.
When running the playbook, you may specify which test to run and what configuration to override for that test.
ansible-playbook start_test.yml \ (1)
-e '{perf_test: filters/saml, (2)
gatling: {test: {params: {duration_in_min: 15, saml: {num_of_issuers: 2}}}}}' \ (3)
-vvvv (4)
1 | Run the start_test.yml playbook. |
2 | Set the config value for perf_test to filters/saml to run the SAML test. |
3 | Override the values for the two config parameters using YAML syntax. |
4 | Provide very verbose output. |
View the Results.
Clean up cloud resources.
ansible-playbook stop_test.yml -e '{perf_test: filters/saml}' -vvvv
TODO
Grafana serves as the user interface to performance test data, which can be viewed in the Performance Tests dashboard.
After test execution has completed, Gatling will generate an HTML report.
The report can be found inside an archive residing in the home directory, with a file name like SamlFilterSimulationRepose-1496319945650.tar.gz.
The exact file name will be included in the output of the Fetch the Gatling results task, which will resemble the following snippet:
changed: [perf-filters-saml-testsuite01] => {
    "changed": true,
    "checksum": "9bf6327e21d9029870e81451287aef61428329bf",
    "dest": "/home/perftester/SamlFilterSimulationRepose-1496319945650.tar.gz", (1)
    "md5sum": "aeace1aecfb4949b0e6bbd104e87b7a0",
    "remote_checksum": "9bf6327e21d9029870e81451287aef61428329bf",
    "remote_md5sum": null
}
1 | This is the file name. |
After the report archive is located, the contents must be extracted before the report can be viewed.
That can be done with a command like the following:
tar xvf /home/perftester/SamlFilterSimulationRepose-1496319945650.tar.gz
Once the report has been extracted, it can be viewed by opening the index.html
file with a web browser.
If you need to manually run Gatling, you can do so from the Gatling host.
export JAVA_OPTS="-Dtest.base_url=162.200.100.100:7070" (1)
export GATLING_HOME="/root/gatling/current" (2)
cd $GATLING_HOME
gatling -s filters.saml.SamlFilterSimulation (3)
1 | The target host and port need to be specified for all of the simulations. These JAVA_OPTS are only for Gatling and do not apply to Repose. |
2 | GATLING_HOME needs to be set for Gatling to run. |
3 | filters.saml.SamlFilterSimulation is the package-qualified class name of the simulation to run. It should be replaced with the FQCN of the simulation you want to run. |
Files and directories of interest:
conf/logback.xml - Lets you turn on debug logging; useful to see the request and response data.
conf/application.conf - The configuration file read by the simulations.
results/ - Contains all of the results.
user-files/simulations/ - Contains the test simulations written in Scala and organized by package.
Run the simple use-case test for 15 minutes with a 5 minute warmup (20 minutes total):
ansible-playbook start_test.yml \
-e '{perf_test: use-cases/simple,
gatling: {test: {params: {duration_in_min: 15, warmup_duration_in_min: 5}}}}' \
-vvvv
By default, both the Repose test and the origin service test (i.e., Repose is bypassed) are run. You can use Ansible tags to specify that only one of those tests should be run.
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml}' \
--tags "repose" \
-vvvv
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml}' \
--tags "origin" \
-vvvv
You can specify which cloud server flavor to use in the configuration overrides.
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml,
cloud: {server: {repose: {flavor: performance1-4}}}}' \
-vvvv
By default, the Repose start task will wait 5 minutes for Repose to start up. If you expect Repose to take longer to start (e.g., due to a large WADL), you can increase this timeout using a command like the following:
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml,
repose: {service: {start_timeout_in_sec: 900}}}' \
-vvvv
To get Repose to use Saxon, add a saxon-license.lic
file to the repose-aggregator/tests/performance-tests/roles/repose/files/
directory and pass in the following configuration override:
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml,
repose: {systemd_opts: {use_saxon: true}}}' \
-vvvv
You can set the JVM options (JAVA_OPTS) used by Repose by setting the following configuration override:
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml,
repose: {
systemd_opts: {
java_opts: "-Xmx2g -Xms2g -XX:+UseG1GC -XX:+UseStringDeduplication"}}}' \
-vvvv
Use the master playbook to run a performance test and immediately clean up afterwards, without having to run the individual playbooks.
ansible-playbook site.yml \
-e '{perf_test: filters/saml}' \
-vvvv
You can re-run a test using the same cloud resources by simply running the start_test.yml
playbook again.
You can even specify different configuration overrides in subsequent runs, although there are some limitations.
For example, you can enable Saxon for a subsequent run, but you can’t disable it afterwards.
Also, if you don’t want the Repose JVM already warmed up from the previous run, you should have Ansible restart the Repose service.
This feature is considered experimental.
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml,
repose: {service: {state: restarted}, systemd_opts: {use_saxon: true}}}' \
-vvvv
You can specify the base name of the directory containing the Gatling results in the configuration overrides.
For example, if you wanted the base name custom-base-name (resulting in a directory name resembling custom-base-name-repose-1487812914968), you would run:
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml,
gatling: {results: {output_dir: custom-base-name}}}' \
-vvvv
Running a performance test with a unique naming prefix enables you to run a particular test multiple times simultaneously (each run requires a unique prefix):
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml,
cloud: {naming_prefix: perf-saml-many-issuers},
gatling: {test: {params: {saml: {num_of_issuers: 100}}}}}' \
-vvvv
If you’re using the stop_test.yml
playbook to clean up your cloud resources, you’ll need to include the unique prefix to ensure the correct resources are deleted.
If the prefix is not specified, the wrong cloud resources or no cloud resources could end up being deleted, and in both cases, no error will be returned (due to idempotency).
ansible-playbook stop_test.yml \
-e '{perf_test: filters/saml,
cloud: {naming_prefix: perf-saml-many-issuers}}' \
-vvvv
When running a Scripting filter test, the perf_test variable will look slightly different than it does for other tests.
The reason is that the scripting language to be used in the test is nested one level down in the perf_test variable value.
ansible-playbook start_test.yml \
-e '{perf_test: filters/scripting/python}' \
-vvvv