A lot of projects and companies nowadays no longer have dedicated testers. That doesn’t mean they no longer do testing; they simply share the responsibility of testing inside a development team. Testing becomes an activity that everyone in the team does, but there’s also a strong focus on automation and trying to create large regression suites that cover as much as possible from the overall functionality of the application.
I’ve also seen automated scripts created in several contexts where the people creating them were focused on solving the programming challenges, but they seemed to overlook one key element: how to make their tests powerful. There were lots of hours involved, lots of tools and frameworks, lots of lines of code, but there was little understanding of the application and only superficial interest in what the tests would find and cover. So the teams put a lot of effort into creating extensive automated test suites, but the question that remained was “Do they bring enough value?”
In contexts like these, it’s easy to forget that, even though you no longer have team members that call themselves testers and even though you no longer do exploratory testing but you focus only on automation, you still need to develop your testing skills.
One way to do that, for example, is through our BBST courses, which give you a chance to develop your skills by practicing them. Many of the concepts taught in the courses address automation-related challenges and help you come up with solutions for problems that are specific to automation and to designing good automated suites.
In this article, I want to give you an insight into some of these concepts and give you a glimpse of what you might learn and what skills you would be working on if you decide to take these courses.
1. Partial Oracles
In testing, an oracle is a mechanism that helps us recognize potential problems. It helps us determine if the product we are testing has passed or failed a test.
A survey of the oracle problem in software testing by Earl T. Barr, Mark Harman, Phil McMinn, Muzammil Shahbaz and Shin Yoo nicely categorizes oracles into four types:
- Specified Oracles. One of the most common ways of deciding if a behaviour is correct or not is by referring to a formal specification. As the survey authors explain, the specified oracles are not referring to textual documentation, but “model-based specification languages, state transition systems, assertions and contracts, and algebraic specifications.”
- Derived Oracles. When specified test oracles are not available, distinguishing between correct or incorrect behavior can be done using derived oracles: how do other similar systems work? How did the previous version of this product work? Is there a model that can be generated from the textual documentation?
- Implicit Oracles. These rely on general implicit knowledge, like: an application shouldn’t crash, a web page cannot be too slow, etc.
- The Human Oracle. The tester or observer uses their own knowledge and understanding as well as the knowledge of others around them to decide if the behaviour they are observing is correct or not.
While doing exploratory testing, our evaluation of the product comes from our general expectations and product-specific knowledge that we have. We use the human oracle a lot and we also use specified oracles as much as possible.
In automation, however, our ability to write good automated tests depends on our ability to choose and to use good test oracles, since we need to detect programmatically if the product passed or failed the test. Here are some examples of oracles that are addressed in the BBST courses that can be extremely useful when writing automated tests.
Regression Oracle
A “regression” is a problem in something that used to work as expected in a previous version of the product, and thus regression TESTING often involves re-running the same set of tests from one version to the next. This type of testing is often automated since the suite of tests tends to grow larger with each new version of the product. However, regression testing doesn’t necessarily use the regression ORACLE. In most cases, testers use the human oracle alongside specified (e.g. assertions) and other derived oracles (like textual documentation) when they are designing the tests that will be included in the regression suite.
When doing regression testing and using the regression ORACLE, the previous version of the product is the reference. Even though we might not know how the product should behave in a certain case and even though we don’t have a specified oracle to help us, we know it should work exactly the same way as it did before, in a previous version. If we have a set of tests that we know have passed on version x of the product, we know they should also pass on version x + 1, so we can use the results from version x as an oracle for version x + 1.
In typical regression testing suites, we have a set of test scenarios where we do something in the program and assert that this or that has happened afterwards, so we typically define in detail what the expected result is. This is often cumbersome (there are many aspects to verify and assert on to make sure that a page is correctly displayed, for example). Using the regression oracle instead can be very simple, as it relies on a “straightforward comparison”, as Cem Kaner describes in the BBST Foundations course.
Regression testing, in general, comes with a disadvantage: if the application changes, then the oracle will give false positives and it will identify as problems aspects that have been changed on purpose. To avoid this, the tests need to be continuously maintained. There are, however, situations when this oracle can prove to be a good option, especially if the comparison between results from different versions is trivial. If that’s not the case, you may be spending a lot of time counteracting the disadvantages, so use it wisely.
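The comparison at the heart of the regression oracle can be sketched in a few lines of Python. This is only an illustration: `price_v1` and `price_v2` are hypothetical stand-ins for version x and version x + 1 of a product, and the discount rule is invented for the example.

```python
def price_v1(quantity):
    """Hypothetical version x: total in cents, 10% discount from 10 items up."""
    total = quantity * 250
    return total * 90 // 100 if quantity >= 10 else total


def price_v2(quantity):
    """Hypothetical version x + 1: a refactoring that should behave identically."""
    discount = 90 if quantity > 10 else 100  # boundary slipped from >= to >
    return quantity * 250 * discount // 100


def record_baseline(func, inputs):
    """Run the reference version and keep its outputs as the oracle."""
    return {value: func(value) for value in inputs}


def regression_failures(func, baseline):
    """Return the inputs whose output differs from the recorded baseline."""
    return sorted(value for value, expected in baseline.items()
                  if func(value) != expected)


baseline = record_baseline(price_v1, range(0, 21))
failures = regression_failures(price_v2, baseline)
print(failures)  # the quantities where the two versions disagree
```

Note that the comparison never states what the correct price is. It only reports that the behaviour for a quantity of 10 changed, and someone still has to decide whether that change was intended.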
Self-Verifying Data
Self-verifying data is test data that embeds the expected output of a test. For example, when testing the checkout feature of a webshop, we could use a small database of testing credit cards, where:
- The name of the card owner tells us whether the payment with that card will succeed and, if not, what the error will be (e.g. first name: “Insufficient Funds”, last name: “Declined”).
- The first digits of the card number tell us the expected error code (e.g. card number: 431223456906789, expected error code for Declined: “431”).
An automated test would pick a credit card from the list of available ones, try to use it for payment, then check the result against the information contained in the credit card number and first and last name to see if it’s correct or not.
Self-verifying data is great in automation because it helps create better test data and stronger asserts.
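Here is a minimal sketch of the credit card example in Python. The encoding is an assumption made for illustration: the last name carries the expected outcome, the first three digits carry the expected error code, and “000” is taken to mean “no error”. `fake_gateway` is a hypothetical stand-in for the real payment call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardRecord:
    first_name: str  # assumed to describe the reason, e.g. "Insufficient Funds"
    last_name: str   # assumed to encode the expected outcome, e.g. "Declined"
    number: str      # first three digits encode the expected error code

    @property
    def expected_status(self):
        return self.last_name

    @property
    def expected_code(self):
        return self.number[:3]


CARDS = [
    CardRecord("Insufficient Funds", "Declined", "431223456906789"),
    CardRecord("Valid Card", "Approved", "000123456789012"),  # "000": no error (assumed)
]


def fake_gateway(card_number):
    """Hypothetical stand-in for the real payment call; returns (status, code)."""
    responses = {
        "431223456906789": ("Declined", "431"),
        "000123456789012": ("Approved", "000"),
    }
    return responses[card_number]


def payment_matches_card(card, pay):
    """Compare the actual result with the expectation embedded in the card itself."""
    status, code = pay(card.number)
    return status == card.expected_status and code == card.expected_code


results = [payment_matches_card(card, fake_gateway) for card in CARDS]
print(results)
```

The test logic never hard-codes an expected value. Adding a new scenario means adding a new card to the list, not writing a new assertion.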
State Model Oracle
This oracle is becoming more and more popular as people see its value for testing systems that can be treated as finite state machines. If your system is more complex than that, you may consider using statecharts instead.
Once you create a model of the SUT’s behaviour, you can use the model to test the SUT.
How does this work?
You create a diagram or matrix with the states the system can take and all the transitions to those states. Let’s assume we want to test an access control system based on access cards. I will oversimplify the example to keep it short and easy to follow. We have a card reader and a control panel that either keeps the door locked or unlocks it if it can find the card code in the access control list.
The states of the access control system are:
- LOCKED – default state, it will not allow the door to be opened
- UNLOCKED – for this, we need a card that has a code registered in the access control list. This state will transition back into LOCKED after a number of seconds
- FORCE LOCKED – during the weekend, no card should be able to open the door, even if the card is registered in the access control list
- FORCE UNLOCKED – in case of fire or other major incidents, the door should be unlocked automatically and not return to the LOCKED state until the emergency has ended
The other component of this model is the transition, which reflects the change from one state to another. One example is regular_unlock, which changes the state of the system from LOCKED to UNLOCKED.
Here’s an example of how you can model this system:
Now that you have the model, you know what you expect from the system in terms of transitions from one state to another. You can either test the system in an exploratory way, or you can create automated tests. If you use an algorithm that implements Markov chains, you can go through the graph and let it decide based on the assigned probability which transition it will take from one state to another. By adding this randomness, you can test a lot of scenarios/paths that would be hard to follow/script and run otherwise.
You can have two objectives:
- focus on the graph coverage by going through all the states and transitions.
- test the reliability of the system by running the tests for long time intervals.
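A random walk over the access control model could look roughly like the Python sketch below. Only the four states and the regular_unlock transition come from the description above. The other transition names, the probabilities and the `DoorController` class are hypothetical, and the model is simplified so that every special state is entered from LOCKED only.

```python
import random

# Model: state -> list of (transition, next state, probability of taking it).
MODEL = {
    "LOCKED": [
        ("regular_unlock", "UNLOCKED", 0.6),
        ("weekend_start", "FORCE_LOCKED", 0.2),
        ("emergency_start", "FORCE_UNLOCKED", 0.2),
    ],
    "UNLOCKED": [("timeout", "LOCKED", 1.0)],
    "FORCE_LOCKED": [("weekend_end", "LOCKED", 1.0)],
    "FORCE_UNLOCKED": [("emergency_end", "LOCKED", 1.0)],
}


class DoorController:
    """Hypothetical system under test; a real suite would drive the actual product."""

    _TABLE = {
        ("LOCKED", "regular_unlock"): "UNLOCKED",
        ("UNLOCKED", "timeout"): "LOCKED",
        ("LOCKED", "weekend_start"): "FORCE_LOCKED",
        ("FORCE_LOCKED", "weekend_end"): "LOCKED",
        ("LOCKED", "emergency_start"): "FORCE_UNLOCKED",
        ("FORCE_UNLOCKED", "emergency_end"): "LOCKED",
    }

    def __init__(self):
        self.state = "LOCKED"

    def apply(self, transition):
        # Transitions that are not valid in the current state leave it unchanged.
        self.state = self._TABLE.get((self.state, transition), self.state)


def random_walk(sut, steps, seed):
    """Walk the model at random and check the SUT against it after every step."""
    rng = random.Random(seed)  # fixed seed so that a failing walk can be replayed
    expected = "LOCKED"
    covered = set()
    for _step in range(steps):
        edges = MODEL[expected]
        weights = [probability for _name, _target, probability in edges]
        name, expected, _probability = rng.choices(edges, weights=weights)[0]
        sut.apply(name)
        assert sut.state == expected, f"after {name}: expected {expected}, got {sut.state}"
        covered.add(name)
    return covered


covered = random_walk(DoorController(), steps=500, seed=1)
all_transitions = {name for edges in MODEL.values() for name, _target, _p in edges}
print(sorted(all_transitions - covered))  # transitions the walk never exercised
```

The two objectives map onto two knobs: the set of covered transitions tells you about graph coverage, while the number of steps controls how long the reliability run lasts.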
The BBST Foundations course also discusses the idea that all of these oracles are incomplete. This is, perhaps, one of the most important concepts to keep in mind when using oracles in testing in general and in automation in particular:
All of the oracles that we use in testing only pay attention to a part of the result.
The regression oracle tells us how the program used to behave in version x, but knows nothing about all the new features that we have introduced in version x + 1 or about the bugs that we missed in version x.
Self-verifying data might help us evaluate the output of a function and decide if it’s correct or not, but it tells us nothing about how fast the function should return or what parts of the disk it should write the result to.
The state model oracle gives us a reference for the states of the system and how it should transition from one state to another, but doesn’t give us any information about the performance of the system.
In automation, the trick is to use not just one, but a combination of these programmable partial oracles. One oracle can look at how the product used to function in a previous version, while another will check that it does it in a reasonable amount of time and with a reasonable amount of resources. Another oracle might look at all the calculations the product does and check that they are mathematically correct, while another one might check the behavior of our product in comparison to that of a competing product. Besides these 3 oracles, there’s a list of about 10 more that are presented in the BBST Foundations class and that are all useful in automation.
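As a rough illustration of combining partial oracles, the Python sketch below evaluates a single call with three of them. The function under test, the baseline value and the half-second time budget are all assumptions chosen for the example.

```python
import math
import time


def sqrt_under_test(x):
    """Hypothetical function under test."""
    return math.sqrt(x)


def regression_oracle(x, result, baseline):
    """Partial oracle: same answer as the one recorded from the previous version."""
    return x not in baseline or math.isclose(result, baseline[x])


def math_oracle(x, result):
    """Partial oracle: squaring the result should give back the input."""
    return math.isclose(result * result, x, rel_tol=1e-9, abs_tol=1e-12)


def time_oracle(elapsed_seconds, budget_seconds=0.5):
    """Partial oracle: the call finished within an (assumed) time budget."""
    return elapsed_seconds <= budget_seconds


def evaluate(x, baseline):
    """Run the function once and collect the verdict of every partial oracle."""
    start = time.perf_counter()
    result = sqrt_under_test(x)
    elapsed = time.perf_counter() - start
    return {
        "regression": regression_oracle(x, result, baseline),
        "math": math_oracle(x, result),
        "time": time_oracle(elapsed),
    }


verdicts = evaluate(2.0, baseline={2.0: 1.4142135623730951})
print(verdicts)
```

Each oracle looks at a different part of the result, so a failure in one of them points to a different kind of problem than a failure in another.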
2. Risk-based testing
My experience with projects and teams that use agile methodologies is that some teams design tests based exclusively on the acceptance criteria, and on a narrow understanding of them. Acceptance criteria are abstract, high-level requirements, and one phrase can include a multitude of expected and unexpected behaviors. How do you get from an acceptance criterion to a set of powerful tests? One technique is to start with the risks. Think about these questions:
- In what ways can the program fail?
- What could go wrong?
Create a list with these risks and then design tests based on it. This will give you a lot of testing ideas that you probably wouldn’t have come up with if you only considered the acceptance criteria. You could take this idea further than what’s presented in the BBST Test Design course and prioritise the tests and add them to automation testing suites that are organized based on risks. You can run them selectively, based on the risk(s) you want to cover in a particular version of the application.
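One possible way to organize a suite by risk is sketched below in plain Python. The risk names, the `cart_total` function and the `covers_risk` decorator are all hypothetical. Test frameworks usually offer their own tagging mechanism, which would be the natural choice in a real project.

```python
RISK_REGISTRY = {}  # risk name -> list of test functions covering it


def covers_risk(*risks):
    """Decorator that files a test under every risk it is meant to cover."""
    def decorator(test_func):
        for risk in risks:
            RISK_REGISTRY.setdefault(risk, []).append(test_func)
        return test_func
    return decorator


def cart_total(prices):
    """Hypothetical function under test."""
    return sum(prices)


@covers_risk("wrong amount charged")
def test_total_adds_all_items():
    assert cart_total([250, 750]) == 1000


@covers_risk("wrong amount charged", "empty cart breaks checkout")
def test_empty_cart_totals_zero():
    assert cart_total([]) == 0


def select_tests(risks):
    """Return each test that covers at least one of the given risks, once."""
    selected = []
    for risk in risks:
        for test in RISK_REGISTRY.get(risk, []):
            if test not in selected:
                selected.append(test)
    return selected


# Run only the tests tied to the risk we care about in this version.
selected = select_tests(["empty cart breaks checkout"])
for test in selected:
    test()
print([test.__name__ for test in selected])
```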
3. “The most powerful test/value”
I attended a talk once where a developer was discussing ways to improve your unit tests. He showed a unit test for a function that multiplied a number by itself. The unit test gave the function the input 5 and checked that the output was 25. To improve this test, he showed how to use a library that would allow him to test all the numbers from 5 to 1000000. With very little effort, he explained, he already had 1 million tests instead of one.
One of the testers in the room asked: “But why would you test with those 1 million values? They don’t give us any new information, they are just as likely to fail or pass as 5 was.” The developer replied: “Because we can!” and, in that case, he was right: running the test with 1 million values didn’t seem to take any longer than running it with one value. But then, in real life, how often do we have to test a function that takes a number and multiplies it by itself?
In my experience, the automated tests that we have to write on today’s projects are far more complex than the one in the example and take a lot longer to execute. I’ve recently worked on a project where I had to write a BDD scenario describing the first time use for an online shop. The scenario was to visit the shop, choose a product, choose a size, add it to the cart, go to checkout and create an account to pay for the purchase. While designing these scenarios, I had to decide which product to add to cart. The 5th? Is that enough? I couldn’t test all of them, because there were thousands, and the tests would have taken way too long to complete.
The concept of the “power of a test” from domain testing helps with choosing these values. A test is more powerful than another one if it is more likely to discover a bug than the other one. Out of all the possible values for a variable, some of them are just as likely to find problems (they are equivalent) while others are more interesting.
The BBST Test Design and Domain Testing courses teach us that equivalence classes go beyond just the usual partitioning of a domain. We can have risk-based equivalence classes, and these are excellent when trying to prioritize what test data to cover in automated scripts.
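To make the idea concrete, here is a sketch of risk-based equivalence classes for the squaring function from the talk. It assumes, purely for illustration, that the implementation stores its result in a signed 32-bit integer. `square_int32` emulates that behaviour and is not taken from the talk or from the courses.

```python
INT32_MAX = 2**31 - 1


def square_int32(n):
    """Hypothetical implementation that keeps its result in a signed 32-bit integer."""
    result = (n * n) & 0xFFFFFFFF
    return result - 2**32 if result > INT32_MAX else result


# One representative value per risk-based equivalence class.
RISK_CLASSES = {
    "typical small value": 5,
    "zero": 0,
    "negative input": -5,
    "largest input whose square fits in 32 bits": 46340,
    "smallest input whose square overflows 32 bits": 46341,
}

failures = [name for name, value in RISK_CLASSES.items()
            if square_int32(value) != value * value]
print(failures)
```

Five values, each chosen because it represents a different way the function could fail, say more about this implementation than many values drawn from the same class as 5.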
4. Code coverage vs test coverage vs other types of coverage
Many programmers and testers work with the concept of code coverage, mostly measured from running automated unit and integration tests. I’ve worked with teams that had hard-to-attain code coverage goals to meet, and I’ve heard many testers assume that 100% code coverage means that “everything is tested”.
However, the BBST classes work on understanding that 100% structural code coverage does not mean complete testing. Even if every line and branch in the code is covered by at least one test, it doesn’t mean that we have tested, for example:
- Every possible input value for each variable
- Every possible combination of inputs
- Every possible sequence through the program
- Every possible interrupt, timeout or race condition
- Every possible configuration
- Every possible interference
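A small Python sketch shows how this plays out for the first item in the list. The function and its two tests are hypothetical. Together the tests execute every line and both branches, yet one input value still makes the function fail.

```python
def average_order_value(total_cents, order_count):
    """Hypothetical function under test."""
    if order_count < 0:
        raise ValueError("order_count must not be negative")
    return total_cents / order_count


def run_suite():
    """Two tests that together execute every line and both branches."""
    assert average_order_value(1000, 4) == 250.0
    try:
        average_order_value(1000, -1)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")


run_suite()  # passes with full line and branch coverage

# An input value that the suite never tried:
try:
    average_order_value(1000, 0)
    untested_input_result = "ok"
except ZeroDivisionError:
    untested_input_result = "ZeroDivisionError"
print(untested_input_result)
```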
Also, one common misconception about coverage in general that the BBST courses address is the idea that it’s focused on the code only. Coverage, however, means the extent of testing of a given type that has been completed, compared to the total number of possible tests of that specific type. We can talk about configuration testing and think of coverage from this point of view: how many configurations have we covered? Or we can think about software requirements, design tests for each of them and measure how many of them we have covered after running some of the tests.
Achieving complete coverage would be equivalent to achieving complete testing but, in practice, that is never achievable. It would mean running 100% of all the possible tests of all the possible types, and that is impossible.
5. Measurement dysfunction
“We have 10000 automated tests”, “Let’s measure the testers’ productivity by the number of bugs they find per sprint”, “We can write 10 automated tests per day”. Are any of these statements familiar to you?
People like numbers, and they will try to find correlations between them that make it easier to draw conclusions and make decisions. But to differentiate between valid measurements, where the correlation between the variables is real, and those that are just a set of coincidences, you need to understand some key concepts: what types of measurement exist, and which models make a measurement valid.
The BBST Foundations class will help you get a solid understanding of the measurement dysfunctions that are more common than you’d expect in the software world. Once you go through the material and discuss it with your peers and instructors, you will probably have a good understanding of which measurements you can rely on for good decisions and which ones you should take with a grain of salt. You wouldn’t want your whole team to get a false sense of safety and be surprised when users start to report unexpected behaviour.
It’s been almost 10 years since Alberto Savoia’s keynote at GTAC 2011, where he declared that “Test is dead!” – and what he meant by that was that traditional testing was transforming. He was worried that, along with their testers, companies were also getting rid of testing altogether. He saw that as a threat then, and it’s still a threat today: even if we don’t have testers, we still need testing. Savoia said “Test is dead”, and continued with “Don’t take that literally, take it seriously”. Especially if you are in automation, developing your testing skills is the key to avoiding that threat.
While this article lists some ideas and concepts that can help you towards developing these skills, the only way to really improve is by practicing them. The BBST courses offer this chance to practice them in-depth in a safe environment and to get valuable feedback from experienced instructors. Join us for the next one!