Performance testing is a method of testing various aspects of Repose in order to collect performance data. Performance data can be used to determine the overhead of the aspects in question, both at the system resource level and at the response time level.
To performance test Repose and collect the resulting data for analysis, the Framework is used.
In an attempt to provide the most meaningful data possible, the following test cases will be covered by independent tests:
The simplest (i.e., passthrough) Repose deployment.
Every Repose filter.
A relatively complex Repose deployment.
Common Repose deployments (e.g., RBAC).
The overall design of the automatic performance testing framework can be outlined succinctly by the following diagram:
A code change in the Repose GitHub repository on the master branch will trigger a Jenkins job to run. The Jenkins job will run the Ansible playbooks for all defined test cases. The Ansible playbooks, in turn, will provision the necessary cloud resources and begin executing the Gatling simulations for each test case. As Gatling runs, it will pipe the performance data it captures to InfluxDB. Once Gatling execution completes, the cloud resources will be cleaned up, and the Ansible playbook will finish executing. Grafana will query InfluxDB to retrieve the performance data which is available in the Performance Tests dashboard.
Data will be transported from Gatling to InfluxDB using the Graphite line protocol.
The data being transferred includes numerous statistics calculated for each distinct point in time.
To get a better idea of the data in question, a sample has been provided which captures the number of request/response pairs sent at time 1492629621
as well as various statistics calculated from the amount of time taken to service those requests.
Note that the sample data only captures a single second of the Gatling simulation.
Much more data of the same form would be expected for every second that the Gatling simulation ran.
gatling.computerworld.getComputers.ok.count 2 1492629621
gatling.computerworld.getComputers.ok.min 192 1492629621
gatling.computerworld.getComputers.ok.max 219 1492629621
gatling.computerworld.getComputers.ok.mean 205 1492629621
gatling.computerworld.getComputers.ok.stdDev 13 1492629621
gatling.computerworld.getComputers.ok.percentiles50 192 1492629621
gatling.computerworld.getComputers.ok.percentiles75 219 1492629621
gatling.computerworld.getComputers.ok.percentiles95 219 1492629621
gatling.computerworld.getComputers.ok.percentiles99 219 1492629621
gatling.computerworld.getComputers.ko.count 0 1492629621
gatling.computerworld.getComputers.all.count 2 1492629621
gatling.computerworld.getComputers.all.min 192 1492629621
gatling.computerworld.getComputers.all.max 219 1492629621
gatling.computerworld.getComputers.all.mean 205 1492629621
gatling.computerworld.getComputers.all.stdDev 13 1492629621
gatling.computerworld.getComputers.all.percentiles50 192 1492629621
gatling.computerworld.getComputers.all.percentiles75 219 1492629621
gatling.computerworld.getComputers.all.percentiles95 219 1492629621
gatling.computerworld.getComputers.all.percentiles99 219 1492629621
System data will be transported from Repose hosts to InfluxDB using Telegraf. More specifically, Telegraf will collect system metrics like CPU usage, memory usage, and disk usage, and report those metrics to a UDP listener in InfluxDB. These metrics will provide better insight into the system constraints of hosts running Repose.
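For illustration only, a Telegraf configuration along the following lines could collect those metrics and ship them to InfluxDB over UDP. The listener port, database name, and collection interval shown here are assumptions, not the Framework's actual settings:
[agent]
  interval = "10s"            # how often to sample the system metrics (assumed)

[[outputs.influxdb]]
  urls = ["udp://influxdb.example.com:8089"]   # hypothetical InfluxDB UDP listener
  database = "telegraf"                        # assumed database name

[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.mem]]

[[inputs.disk]]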
InfluxDB will process the data it receives through its Graphite listener port by applying configured templates. These templates will translate the metric names from the Graphite line protocol into measurements, fields, tags, and values in the database.
For example, given the sample data provided in the Gatling section, the following template could be applied:
gatling.*.*.*.* measurement.simulation.request.status.field
Resulting in a database full of entries like the following:
Field/Tag Key | Field/Tag Value |
---|---|
time | 1492629621000000000 |
count | 2 |
max | 219 |
mean | 205 |
min | 192 |
percentiles50 | 192 |
percentiles75 | 219 |
percentiles95 | 219 |
percentiles99 | 219 |
request | getComputers |
simulation | computerworld |
status | all |
stdDev | 13 |
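For reference, a template like the one above would be registered under the Graphite service section of the InfluxDB configuration file. The following is only a sketch; the listener port, protocol, and database name are assumptions rather than the Framework's actual settings:
[[graphite]]
  enabled = true
  bind-address = ":2003"      # Graphite line protocol listener (assumed port)
  protocol = "tcp"            # assumed protocol
  database = "gatling"        # assumed database name
  templates = [
    "gatling.*.*.*.* measurement.simulation.request.status.field"
  ]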
To learn more about how InfluxDB handles Graphite input, see Graphite Input.
The following diagram illustrates the interactions between each component referenced in the Design.
The directory structure of the performance tests is organized by category and test.
That is, each file is located in a directory named after the specific test, which is itself located in a directory named after the category of test.
For example, the repose configuration files directory:
repose-aggregator/tests/performance-tests/src/performanceTest/resources/roles/repose/files/config/
contains the category sub-directories filters/, services/, and use-cases/.
Armed with this information, these are the basic steps to create a new test:
Create a vars file for the new test.
This needs to be located at:
repose-aggregator/tests/performance-tests/src/performanceTest/resources/test_vars/{category}/{test}.yaml
At a minimum, the vars file should contain the dir_prefix, the required filter chain, the Gatling simulation class package, and the Gatling simulation class name.
If necessary, add Repose configuration files.
Template files should be placed in the directory:
repose-aggregator/tests/performance-tests/src/performanceTest/resources/roles/repose/templates/config/{dir_prefix}/
The Framework uses Ansible, which supports Jinja2 templates.
These templates allow dynamic content to be added to files using Ansible variables.
Jinja2 templated files must have a .j2 file extension.
Configuration files should be placed in the directory:
repose-aggregator/tests/performance-tests/src/performanceTest/resources/roles/repose/files/config/{dir_prefix}/
Create a Gatling simulation.
The Gatling simulation, which will be responsible for generating test traffic and performance test data, should be created in the file:
repose-aggregator/tests/performance-tests/src/performanceTest/scala/{package}/{name}.scala
The Gatling simulation should, in most cases, extend the AbstractReposeSimulation class provided by the Repose performance-test-framework; a minimal sketch is provided after these steps.
If necessary, add mock services.
Create any necessary mocks in the directory:
repose-aggregator/tests/performance-tests/src/performanceTest/resources/roles/repose/files/mocks/{dir_prefix}/
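To make the "Create a Gatling simulation" step more concrete, the following is a minimal sketch written against the plain Gatling 2 DSL. The package, class name, endpoint, and injection profile are placeholders, and a real test would extend AbstractReposeSimulation (whose members are not shown here) rather than Simulation so that the Framework controls the warmup and duration.
package usecases.simple

import io.gatling.core.Predef._
import io.gatling.http.Predef._

import scala.concurrent.duration._

// Illustrative sketch only: a bare Gatling simulation pointed at a Repose node.
// A real performance test would extend AbstractReposeSimulation so that the
// warmup and duration parameters supplied by the Framework are honored.
class SimpleUseCaseSimulation extends Simulation {

  // The target host and port are passed in via JAVA_OPTS (see the manual Gatling run section).
  val baseUrl = sys.props.getOrElse("test.base_url", "localhost:8080")

  val httpConf = http.baseURL(s"http://$baseUrl")

  // A single request issued by every virtual user.
  val scn = scenario("simple passthrough")
    .exec(
      http("get root")
        .get("/")
        .check(status.is(200)))

  // Placeholder injection profile: 10 new users per second for one minute.
  setUp(
    scn.inject(constantUsersPerSec(10) during (1.minute))
  ).protocols(httpConf)
}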
A Jenkins job is triggered any time that code on the master branch of the Repose Git Repository changes. The Jenkins job will run the performance tests and publish the results automatically. Design describes the full automated process in more detail.
If you have the appropriate permissions on the Repose Jenkins instance, you can run the performance-test Jenkins job to manually start a single performance test.
In the performance-test Jenkins job, click the Build with Parameters link.
Populate the build parameters for the test you want to run.
Click Build.
Name | Required | Description | Example |
---|---|---|---|
perf_test | Yes | The name of the performance test you wish to perform. | |
extra_options | No | Extra options to pass to the ansible-playbook command. | |
branch | Yes | The branch to build from (Default: master). | |
Set up your environment.
We use Puppet to manage our performance test environment.
The details of our environment can be found in our Puppet manifest.
If you have Puppet installed, you can download our manifest and re-create our environment by running puppet apply against it.
Clone the Repose GitHub repository from Github.
git clone https://github.com/rackerlabs/repose.git
Navigate to the performance test directory in the project.
cd repose/repose-aggregator/tests/performance-tests
If the test is not on the master
branch, then you simply indicate the correct branch in the test’s vars
file:
{
dir_prefix: "filter/saml",
repose: {
install: {
git: {
branch: "REP-0000_NewPerfTest" (1)
}
},
...
},
...
}
1 | This indicates the desired branch. |
Run the start_test.yml
Ansible playbook.
This playbook will create the necessary Cloud resources, provision the servers, and execute the Gatling simulation.
When running the playbook, you may specify which test to run and what configuration to override for that test.
ansible-playbook start_test.yml \ (1)
-e '{perf_test: filters/saml, (2)
gatling: {test: {params: {duration_in_min: 15, saml: {num_of_issuers: 2}}}}}' \ (3)
-vvvv (4)
1 | Run the start_test.yml playbook. |
2 | Set the config value for perf_test to filters/saml to run the SAML test. |
3 | Override the values for the two config parameters using YAML syntax. |
4 | Provide very verbose output. |
View the Results.
Clean up cloud resources.
ansible-playbook stop_test.yml -e '{perf_test: filters/saml}' -vvvv
Grafana serves as the user interface to performance test data which can be seen in the Performance Tests dashboard.
After test execution has completed, Gatling will generate an HTML report.
The report can be found inside an archive residing in the home directory with a file name like SamlFilterSimulationRepose-1496319945650.tar.gz.
The exact file name will be included in the output of the Fetch the Gatling results
task which will resemble the following snippet:
changed: [perf-filters-saml-testsuite01] => {
    "changed": true,
    "checksum": "9bf6327e21d9029870e81451287aef61428329bf",
    "dest": "/home/perftester/SamlFilterSimulationRepose-1496319945650.tar.gz", (1)
    "md5sum": "aeace1aecfb4949b0e6bbd104e87b7a0",
    "remote_checksum": "9bf6327e21d9029870e81451287aef61428329bf",
    "remote_md5sum": null
}
1 | This is the file name. |
After the report archive is located, the contents must be extracted before the report can be viewed.
That can be done with a command like the following:
tar xvf /home/perftester/SamlFilterSimulationRepose-1496319945650.tar.gz
Once the report has been extracted, it can be viewed by opening the index.html
file with a web browser.
If you need to manually run Gatling, you can do so from the Gatling host.
export JAVA_OPTS="-Dtest.base_url=162.200.100.100:7070" (1)
export GATLING_HOME="/root/gatling/current" (2)
cd $GATLING_HOME
gatling -s filters.saml.SamlFilterSimulation (3)
1 | The target host and port need to be specified for all of the simulations. These JAVA_OPTS are only for Gatling and do not apply to Repose. |
2 | GATLING_HOME needs to be set for Gatling to run. |
3 | filters.saml.SamlFilterSimulation is the package-qualified class name of the simulation to run.
It should be replaced with the FQCN of the simulation that should be run. |
Files and directories of interest:
conf/logback.xml
- Lets you turn on debug logging; useful to see the request and response data.
conf/application.conf
- The configuration file read by the simulations.
results/
- Contains all of the results.
user-files/simulations/
- Contains the test simulations written in Scala and organized by package.
The Ansible directory is prepared by running the Gradle task buildAnsibleDirectory.
The base Ansible files are processed using Ant-based templating, which allows build-time values (such as the Gatling version to use) to be filled in.
The simulations are copied in from the source directory.
Any dependencies of the simulations are also copied so that they can be placed in the Gatling lib directory.
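Assuming the standard Gradle wrapper at the repository root is used, the task can be invoked by name (Gradle runs it in every subproject that defines it); the exact invocation may differ:
./gradlew buildAnsibleDirectory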
To reduce redundancy, the Repose team created the performance-test-framework
for Gatling.
This library includes the AbstractReposeSimulation, which is meant to serve as the basis for most, if not all, simulations.
To utilize the performance-test-framework, its JAR file must be placed in the Gatling lib directory.
This is handled, along with any other dependencies, by the Gradle task buildAnsibleDirectory.
The lib directory is a sub-directory of the Gatling home directory.
In the context of the Framework, Jenkins will build the performance-test-framework and copy the resulting JAR into the Ansible directory structure, where it will then be copied to the hosts running Gatling.
Run the simple use-case test for 15 minutes with a 5 minute warmup (20 minutes total):
ansible-playbook start_test.yml \
-e '{perf_test: use-cases/simple,
gatling: {test: {params: {duration_in_min: 15, warmup_duration_in_min: 5}}}}' \
-vvvv
By default, both the Repose test and the origin service test (i.e., Repose is bypassed) are run. You can use Ansible tags to specify that only one of those tests should be run.
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml}' \
--tags "repose" \
-vvvv
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml}' \
--tags "origin" \
-vvvv
You can specify which cloud server flavor to use in the configuration overrides.
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml,
cloud: {server: {repose: {flavor: performance1-4}}}}' \
-vvvv
By default, the Repose start task will wait 5 minutes for Repose to start up. If you expect Repose to take longer to start (e.g., due to a large WADL), you can increase this timeout using a command like the following:
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml,
repose: {service: {start_timeout_in_sec: 900}}}' \
-vvvv
To get Repose to use Saxon, add a saxon-license.lic
file to the repose-aggregator/tests/performance-tests/roles/repose/files/
directory and pass in the following configuration override:
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml,
repose: {systemd_opts: {use_saxon: true}}}' \
-vvvv
You can set the JVM Options (JAVA_OPTS
) used by Repose by setting the following configuration override:
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml,
repose: {
systemd_opts: {
java_opts: ["-Xmx2g", "-Xms2g", "-XX:+UseG1GC", "-XX:+UseStringDeduplication"]}}}' \
-vvvv
Use the master playbook to run a performance test and immediately clean up afterwards without having to run the individual playbooks.
ansible-playbook site.yml \
-e '{perf_test: filters/saml}' \
-vvvv
You can re-run a test using the same cloud resources by simply running the start_test.yml
playbook again.
You can even specify different configuration overrides in subsequent runs, although there are some limitations.
For example, you can enable Saxon for a subsequent run, but you can’t disable it afterwards.
Also, if you don’t want the Repose JVM already warmed up from the previous run, you should have Ansible restart the Repose service.
This feature is considered experimental.
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml,
repose: {service: {state: restarted}, systemd_opts: {use_saxon: true}}}' \
-vvvv
You can specify the base name of the directory containing the Gatling results in the configuration overrides.
For example, if you wanted the base name custom-base-name
(resulting in a directory name resembling custom-base-name-repose-1487812914968
), you would run:
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml,
gatling: {results: {output_dir: custom-base-name}}}' \
-vvvv
Running a performance test with a unique naming prefix enables you to run a particular test multiple times simultaneously (each run requires a unique prefix):
ansible-playbook start_test.yml \
-e '{perf_test: filters/saml,
cloud: {naming_prefix: perf-saml-many-issuers},
gatling: {test: {params: {saml: {num_of_issuers: 100}}}}}' \
-vvvv
If you’re using the stop_test.yml
playbook to clean up your cloud resources, you’ll need to include the unique prefix to ensure the correct resources are deleted.
If the prefix is not specified, the wrong cloud resources or no cloud resources could end up being deleted, and in both cases, no error will be returned (due to idempotency).
ansible-playbook stop_test.yml \
-e '{perf_test: filters/saml,
cloud: {naming_prefix: perf-saml-many-issuers}}' \
-vvvv
When running a Scripting filter test, the perf_test
variable will look slightly different than it does for other tests.
The reason is that the scripting language to be used in the test is nested one level down in the perf_test
variable value.
ansible-playbook start_test.yml \
-e '{perf_test: filters/scripting/python}' \
-vvvv