AYNTKA Performance testing

Performance and capacity is linked together and are two fundamental quality characteristics for systems. Quality assurance of performance easily get quite technical. This article intend to summarize some of the concepts and terminology used.

Introduction

Performance testing starts with understanding the system under test. Degrading performance, or poor performance, could originate from a thousande different parameters. For example these include:

Poorly written code
Bad SQL
Un-optimized database indexes
Bad architecture with the system
Infrastructure limitations (firewalls, IPS'es, switches, reverse proxies)
Data structure and data volumes
Location to access from
Load pattern
Concurrent batch schedules
Caching policy setups
Hardware
OS and middleware settings
Component configuration (version of dependencies, drivers, runtimes)
...and so on...

Types of performance tests

Load, stress, robustness, long term

APM (Application Performance Monitoring

If you are lucky the organization has invested in APM. These include for example:

DynaTrace
NewRelic
AppDynamics

The basic idea of these tools is to tag incoming transactions and follow them through the system to pinpoint where the most time is spent and what code get used the most times. These tools are really good, but require agents to be installed on a lot of system components for full use, and soon can become quite expensive. Generally the use is often leveraged with a capability for A/B testing for risk mitigation.

Load generation procedure

When APM is out of scope, load generation based performance testing most often is the best way to feel safe about system reliabilty and performance.

Different types of communication methods come with different challenges to performance and different ways of assessing the performance/capacity of a software system. Generally the following way of working is applied:

Understand if the test system differs from the production system in any way.
Get a grasp of the expected load distribution upon the system.
Find a way to generate applicable load upon the system under test.
Set up relevant monitoring of the system resource utilization during the test.
Perform the test while monitoring.
Analyze the results.
Summarize and document the test so a comparitive test may be ran at a later time.

Understand differences between test environment and production

A performance risk assessment workshop is often a good way to start performance tests. This is an excellent opportunity to efficiently get a lot of relevant information prior to a performance test.

You can find a checklist for performance risk assessment (in Swedish) here: http://damberg.one/TestAutomationCourses/performanceriskassessment.html.

Understand the expected load upon the system

If you are really lucky you can get hold of performance requirements. If they exist (they are rare), they are often documented as non-functional requirements. Even if they exist you almost always have to interprete the requirements anyway.
For example you may have to treat the absolute response times given as 95 percentile, or assume the data volumes in the system.

Often you can base your load on historic transaction/request volumes. The marketing team might use Google Analytics, or the backend logs give a clue of how much load to expect. You may have to adjust these numbers for marketing campains, expected increasing usage due to the changes introduced in the system, limitations of the test environment and so forth.

Generate the load

Sometimes you don't even need a special tool for this. If your architecture is transaction based or event based you may for example halt an MQ broker to queue up a lot of messages, and when you start the broker again the messages will be a load of its own.

There are litteraly a thousend tools to generate a load upon a system. Some of the most widely used tools include:

Apache JMeter
Neoload
Microfocus LoadRunner
Locust
LoadImpact
LoadImpress
Grinder
ParallelJUnit
ParallelMsTest

In these tools you create the load as it would be made up from one single user/transaction and then make this run in parallel threads at load time. Some of the tools also include monotoring capabilities and tools to guide in the analysis of performance metrics.

Most types of loads consist of a series of transactions. A transaction is based upon the Request->Response pattern. These series are then ran consecutively, and in parallel threads based on load size, until the test is considered done.

In a sense the performance tests share a lot of similarities with API testing, since they too work on the protocol level. It could be good to refresh the following chapters to fully understand the implications on performance testing:

Parameterization - variations in the scripts

To avoid false sense of security due to the caches taking all the load, or the same single user account being used in testing, the scripts produced should include some variations. This is achieved by parameterizing the load scripts. For example you make the scripts use different search words, different user accounts, or different types of records.

Parameterization is achieved by substituting request values at runtime. Most tools support this from Excel like tables of data rows and rules for how to treat the rows (random, sequential, looping conditions).

Correlation - capturing and re-using data from server responses

If you log in to a system you probably get some kind of login token from the server. In order for your subsequent server communication to work you might need to send this token with each request. To make this happen you need to capture the session token from the server response after login, and apply this to any following request to the server. This is called correlation.

Correlation is applied in a lot of use cases. If your script create a new issue/record/object in the database it probably receives a referrer to the server side object (issue id or similar) to be able to continue working with this entity. You will need to correlate the returned referrer/id.

The major tools all have mechanism to help with correlation.

Monitoring

Applying the load

Before applying the load remember to:

Notify any stakeholders in the environment
Make sure relevant staff is present
The environment is backed-up if it breaks under load

Analyze results

Document the test

Documenting the test is most often done in a document that is created at the begining of test preparations and remains open and continuously edited throuhout the duration of test.

Do not forget that the report shouldn't only tell about the findings of the test. It should also include information enough for future re-tests (even with other tools) could be performed and compared to the results from the current test.

The report should also contain a section of notes to ease any re-test. Among other things this could include information about data usage, code snippets for performed correlations, where to find parameterization data, and session management clues.

Kapacitet och prestanda är två viktiga kvalitetegenskaper. Det kan vara svårt att få kontroll så att man känner sig trygg med prestanda. Denna checklista är framtagen för att, i workshop-form ledd av en erfaren prestandaspecialist, identifiera hur man bäst ska förhålla sig till prestandarisker.

Happy learning, or happy confirmation that you are already skilled.

Introduktion till domänområdet och till workshopen

Upplägg för riskworkshop

Ett systems prestanda kan stjälpas av tusentals olika orsaker. Att få kontroll på alla dessa så att man känner sig trygg kan vara en utmaning. Denna utmaning blir ännu större när man går mot kontinuerliga produktionssättningar.

Som bäst brukar en sådan typ av workshop fungera om den leds av en erfaren prestandaspecialist och består av:

Systemägare eller någon som förstår hur systemanvändningsförändringar över tid sker.
Systemarkitekt eller senior utvecklare som förstår hur systemet fungerar tekniskt.
Eventuell testledare eller likande som förstår hur systemet egentligen fungerar utifrån sett och som vet vad som redan testats.
Eventuellt kan det vara bra med någon med budgetansvar (projektledare/systemansvarig) så att beslut kan fattas i slutet av mötet utan ytterligare förankring.

Effektiv prestandasäkring

Många gånger försöker man säkra prestandarisker så tidigt som det är lämpligt. Många typer av kodrelaterade problem går att lösa direkt i IDE:n, genom dess profilers.
Ytterligare andra, som samtidighetsbuggar, går att hitta i enhetstester som säkrar prestanda. Några kräver integrerade system med access till riktiga databaser snarare än de minnesdatabaser eller mockade databaser som är vanliga vid enhetstester.
Vissa typer av prestanaproblem, som onödiga full-table-scans i databaser, blir ett problem först vid produktionsvolymer av data - och ibland kanske först efter några års användning av systemet så att det har accumulerat tillräckligt mycket data för att det ska bli ett problem.

En del typer av prestandaproblem, som t.ex. suboptimalt konfigurerad middleware eller infrastruktur, eller dåligt indexerade databaser, eller IPS:er som är onödigt grundliga, kan behöva genomföra prestandatester på en högre nivå.
Vid denna typ av prestandatest skapar man skript som på protokollnivå simulerar en i väsentliga aspekter produktionslik last mot ett system. Under tiden lasten sakta ökas monitorerar man nyckelparametrar i resursanvändningen av systemet. Det kan röra sig om CPU, RAM, I/O, connection pool, kölängder, cache-nyttjande, svarstider, felfrekvens och mycket mer.

Begrepp

Begrepp	Betydelse
Korrelering	I en kommunikation mellan en klient och en server fångas ett datafält ur ett svar från en server och återanvänds i senare request.
Parameterisering	Om man använder samma data i många parallella trådar testar man bara cache-funktioner. Man brukar därför parameterisera, byta ut, t.ex. användarkonto och liknande i en konversation mellan en klient och en server.
Monitorering	Kontinuerlig övervakning av relevanta mätpunkter på server, applikation, middleware och infrastruktur under last.
Användarscenario	De transaktioner som en simulerad (virtuell) användare av en viss kategori genomför i systemt.
Lastscenario	Viktad last över en testomgång. Hur skalas användarscenarion upp och ner.
Synkroniseringspunkt	Punkt i ett lastscenario där de virtuella (simulerade) användarna väntar in varandra. Används mest i samtidighetstester.
Probing client	En manuellt, eller automatiserad, hanterad applikationsklient som utför aktiviteter i den faktiska applikationsklienten undertiden en last ligger på systemet. Den används för att dels känna av upplevd svarstid och dels rimlighetsbedöma effekten av den pålagda lasten.

Saker att tänka på

Prestandatester har en förmåga att konsumera enorma mängder testdata.
Det händer att man kör sönder systemen ordentligt i prestandatester - se till att backa upp miljön.
Tänk på att ett system ofta kan ge spill-laster till andra system under last. Förvarna dessa inför att man lägger på en last.
Var noga med vad du monitorerar under lastkörningen.
Det kan vara svårt att förstå vad som är rimlig och verklighetstypisk last att lägga på. Lägg möda på att analysera detta.

All you need to know about:

Performance testing

Table of contents