One of the most critical aspects of performing automated end-to-end testing is test data generation. Various combinations of test data are needed to ensure the system works as expected.
This includes “happy path” scenarios and edge cases. For example, if we test a login page, the test data could contain a mixture of valid and invalid username and password combinations. It could include blank fields, null values, Unicode characters and even unusual combinations of special characters.
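As a sketch, such a login data set might look like the following. The credentials, the toy `check_login` stand-in for the system under test, and the expected outcomes are all hypothetical, purely to illustrate the mix of happy-path and edge cases:

```python
# Hypothetical login test data mixing valid, invalid, blank, null,
# Unicode and special-character inputs. Nothing here is a real account.
LOGIN_TEST_DATA = [
    # (username, password, expected_success)
    ("alice@example.com", "C0rrect-h0rse!", True),   # valid credentials
    ("alice@example.com", "wrong-password", False),  # wrong password
    ("", "C0rrect-h0rse!", False),                   # blank username
    ("alice@example.com", "", False),                # blank password
    (None, None, False),                             # null values
    ("Ünïcødé@example.com", "pässwörd", False),      # Unicode input
    ("' OR 1=1 --", "x", False),                     # special characters
]

def check_login(username, password):
    """Toy stand-in for the system under test: accepts one known pair."""
    return username == "alice@example.com" and password == "C0rrect-h0rse!"

for username, password, expected in LOGIN_TEST_DATA:
    assert check_login(username, password) == expected
```

In a real suite each tuple would typically become one parametrized test case rather than a loop of assertions.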
Let’s take a look at some different strategies to efficiently handle test data to ensure it does not become a bottleneck for testing.
Select your test data generation strategy
There are two options for test data generation: manual and automated. There are pros and cons to both approaches.
Manual test data generation gives more control over the inputs used to test the application. Testers can apply their critical thinking and experience to carefully craft test data that supports both positive and negative testing. This data is predefined and static, and is fed into the automated tests at runtime.
The main downside of static test data is that the values can become stale over time if they are not maintained consistently. Also, if the data is changed to test one system, another system that shares the same test data can be affected as well. You may then see scenarios where some tests pass and others fail because of a change to shared static test data.
Automated test data generation gives more flexibility and avoids the staleness problems that come with static data. Test data is generated automatically at runtime without any manual intervention, which removes the need to continually maintain it.
One of the most significant disadvantages of this approach is that you have less control over the inputs used to validate the system, since they change dynamically every time you run the test. When a test fails, it is hard to compare against previous runs to figure out which test data combination broke the test. There could also be situations where the generated data does not cover the right combinations or does not closely resemble production data.
Which strategy is right for you? It depends on the context of the project and what you are trying to do.
Simulate production test data
Test data needs to mimic the production environment to a certain extent. There is no benefit in having test data that real users won’t use in production.
An excellent approach to ensure the test data is diversified is to take a copy of the production data, obscure sensitive parts, and use that in the QA environment to get the real behavior of the application. This way, the tests are using inputs similar to what real users are doing when they interact with the app.
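A minimal sketch of the scrubbing step might look like this, assuming a record with `name`, `email` and `ssn` fields; the field names and the hash-based masking scheme are illustrative choices, not a prescribed approach:

```python
import hashlib

# Fields we assume contain sensitive data in this hypothetical schema.
SENSITIVE_FIELDS = {"name", "email", "ssn"}

def scrub(record):
    """Replace sensitive values with deterministic, irreversible tokens,
    keeping non-sensitive fields so the data still behaves like production."""
    cleaned = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS and value is not None:
            digest = hashlib.sha256(str(value).encode()).hexdigest()[:12]
            cleaned[key] = f"{key}_{digest}"
        else:
            cleaned[key] = value
    return cleaned

prod_row = {"id": 42, "name": "Jane Doe", "email": "jane@real.com", "plan": "pro"}
qa_row = scrub(prod_row)
# qa_row keeps id and plan untouched but masks name and email
```

Deterministic tokens (rather than random ones) keep relationships intact: the same production email always scrubs to the same QA value across tables.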
The copying and scrubbing of production data can happen at runtime, or it can happen periodically with the help of batch jobs running in the background.
Use fake test data generators
If you decide to use an automated test data generation strategy, there is usually no need to write your own data-generation utilities. Many open-source libraries generate fake data, such as Faker, PHP Faker, Perl Faker, Ruby Faker, Java Faker, jfairy and fakeit. There are also many paid tools and libraries with more robust test data generation capabilities.
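To illustrate the idea without depending on any particular library, here is a minimal standard-library-only sketch of runtime data generation; a library such as Faker would supply much richer generators (names, addresses, emails). Seeding the generator is one way to address the reproducibility concern mentioned earlier, since a fixed seed reproduces the exact data set that broke a test:

```python
import random
import string

def make_user(rng):
    """Generate one hypothetical user record from a seeded RNG."""
    local = "".join(rng.choices(string.ascii_lowercase, k=8))
    return {
        "username": f"{local}@example.com",
        "password": "".join(
            rng.choices(string.ascii_letters + string.digits + "!@#", k=12)
        ),
    }

rng = random.Random(1234)  # fixed seed -> identical data on every run
users = [make_user(rng) for _ in range(3)]
```

Logging the seed alongside each test run makes failed runs repeatable with the same generated inputs.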
Have strategies to manage test data
Maintenance is the biggest issue with manual test data. This is especially true when the amount of test data grows exponentially with the complexity of the application.
To handle growing test data needs, manage the data in Excel files, config files or databases, and pass those files in at runtime when the automated tests execute. Also, keep backups of all the test data in case some of it becomes stale.
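As a sketch, here is one way to load test cases from a CSV file at runtime; the column names are assumptions, and an in-memory `StringIO` stands in for a real file on disk:

```python
import csv
import io

# In a real suite this content would live in a versioned CSV file;
# the columns (username, password, expected) are illustrative.
CSV_DATA = """username,password,expected
alice@example.com,secret123,pass
bob@example.com,,fail
,secret123,fail
"""

def load_test_cases(stream):
    """Turn each CSV row into a (username, password, should_pass) tuple."""
    return [
        (row["username"], row["password"], row["expected"] == "pass")
        for row in csv.DictReader(stream)
    ]

cases = load_test_cases(io.StringIO(CSV_DATA))
```

Keeping the data in a file separate from the test code lets non-programmers add cases without touching the test suite.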
One useful approach to prevent problems with test data is to have separate data for different environments, like development, QA and production. Using diverse data sets helps test the system and gives more test coverage.
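One possible way to wire up per-environment data sets is shown below; the environment names, file paths and `TEST_ENV` variable are all assumptions for illustration:

```python
import os

# Hypothetical mapping from environment name to its test data file.
TEST_DATA_SOURCES = {
    "dev": "data/dev_users.csv",
    "qa": "data/qa_users.csv",
    "prod": "data/prod_smoke_users.csv",
}

def data_source(env=None):
    """Pick the data file for the requested (or current) environment."""
    env = env or os.environ.get("TEST_ENV", "dev")
    try:
        return TEST_DATA_SOURCES[env]
    except KeyError:
        raise ValueError(f"Unknown environment: {env}")
```

Failing loudly on an unknown environment avoids silently running QA tests against the wrong data set.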