As coverage needs grow and call for more tests overall, it makes sense to find ways to write tests that can address a variety of scenarios. It’s also a good idea to minimize the need to update the tests or make radical changes to them. To that end, data-driven testing is a great option that fits into a variety of testing frameworks.
However, while data-driven testing is an excellent tool to have in our arsenal, it’s possible to have too much of a good thing. Over the past few months, I’ve been considering the way many organizations — mine included — work with data-driven testing, and I’ve come to the conclusion that perhaps we’re going about this the wrong way.
Defining data-driven testing
First off, let’s define what data-driven testing is. In a nutshell, it’s taking a test or series of tests and feeding it varying data values from a data source, such as a spreadsheet, text file or database query. We then run the tests with those data values and compare the results against the values we expect to see. From there, we can also use parameterization, where we create additional tests (or, more accurately, different examples of the same test) by adding new lines of data.
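The core idea can be sketched in a few lines: each row of data is one run of the same test logic. This is a minimal illustration, assuming a hypothetical `to_celsius` function as the code under test.

```python
# A minimal data-driven loop: each row supplies an input and the expected
# result, so adding a scenario means adding a row, not writing a new test.
# `to_celsius` is a hypothetical function standing in for real code under test.

def to_celsius(fahrenheit):
    return (fahrenheit - 32) * 5 / 9

# Each row is one test case: (input, expected output).
rows = [
    (32, 0.0),
    (212, 100.0),
    (-40, -40.0),
]

def run_data_driven(cases):
    # One result per row: did the actual output match the expected value?
    return [to_celsius(value) == expected for value, expected in cases]

print(run_data_driven(rows))
```

Test frameworks such as pytest offer the same pattern natively (via `@pytest.mark.parametrize`), but the mechanism is the same: the data rows drive the runs.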
The upside to this is that it is easy to add new tests by creating additional rows of data to process: one run for each row. By entering a new row of data that matches the variables we have chosen to work with, we can extend the tests as far as we want.
The downside is that this can make for lengthy test runs. Additionally, if the data entered is all valid (i.e., what the application expects to see in those fields), then the tests aren’t likely to tell us anything interesting beyond “OK, the system recognizes this type of data.”
With that in mind, what are some issues we may face, and how might we resolve them?
Creating wasteful tests
While data-driven testing can be automated to run a test multiple times with a variety of data, I often find that I create tests that are needlessly long and repetitive. As an example, I’ll share one of the most common interactions that might benefit from a data-driven test: user login. I can set up a test to log into a system, providing a variety of usernames and passwords. I could, potentially, log every user into the system to verify that everyone can log in, but that’s not really useful — or, should I say, not really useful beyond a few examples.
There are four examples that make sense to test a login process: the correct login, and the three examples of an incorrect login — wrong username, wrong password or a combination of both. Running this test for every user is wasteful and time-consuming, and it doesn’t provide additional information. If, however, the purpose of the test is to see if the system will crash when handling hundreds or thousands of logins in steady succession, then this could be a valuable test. It’s not one I’d want to run every single time I push a merge request or try to deploy a branch to a device, but it would be helpful at times.
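Those four permutations are all the data the login test really needs. Here is a sketch of that idea, using a hypothetical `attempt_login` stub in place of a real authentication system.

```python
# The four login permutations: the happy path plus the three
# incorrect combinations. `attempt_login` is a hypothetical stub.

VALID_USER, VALID_PASS = "alice", "s3cret"

def attempt_login(username, password):
    # Stand-in for a real login call against the system under test.
    return username == VALID_USER and password == VALID_PASS

# (username, password, expected_success)
login_cases = [
    (VALID_USER, VALID_PASS, True),   # correct login
    ("mallory", VALID_PASS, False),   # wrong username
    (VALID_USER, "wrong", False),     # wrong password
    ("mallory", "wrong", False),      # wrong username and password
]

def verify_logins(cases):
    # True only if every permutation behaves as expected.
    return all(attempt_login(u, p) == expected for u, p, expected in cases)

print(verify_logins(login_cases))
```

Four rows cover the behavior; hundreds of rows of valid credentials would only cover the same code path again, unless load is the thing being tested.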
Alternative method: testing the happy path and the errors
By setting up deliberate errors in my data (meaning the values I provide are a mix of valid and invalid input types), I can use a variety of data elements to exercise the happy path and also check that my error-handling code is working. A quick win when testing software is to see what error handling is in place and what triggers it. This lets me set up my tests to verify whether those errors can actually be raised based on the data being provided. With one test containing multiple permutations of good and bad data, the net result is a lot of potential coverage and less need to write multiple test cases.
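One way to structure this is to add a column to the data that records whether the row should succeed or trigger the error handling. The sketch below assumes a hypothetical `parse_age` validator as the code under test.

```python
# One test, mixed good and bad data: each row says whether the
# (hypothetical) validator should accept the value or raise an error.

def parse_age(raw):
    value = int(raw)  # raises ValueError for non-numeric input
    if not 0 <= value <= 150:
        raise ValueError("age out of range")
    return value

# (input, expect_error) — deliberate errors mixed in with the happy path.
cases = [
    ("42", False),    # valid
    ("abc", True),    # non-numeric: error handling should fire
    ("-5", True),     # out of range: error handling should fire
    ("150", False),   # boundary value, still valid
]

def check_error_handling(rows):
    outcomes = []
    for raw, expect_error in rows:
        try:
            parse_age(raw)
            outcomes.append(expect_error is False)  # succeeded as expected?
        except ValueError:
            outcomes.append(expect_error is True)   # failed as expected?
    return all(outcomes)

print(check_error_handling(cases))
```

The test now fails both when valid data is rejected and when invalid data slips through, so one permutation table covers both sides of the behavior.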
Relying on static data files
As I create data-driven tests, I start with a simple format such as a .csv file to keep my test data in a single place. The positives are that there is a known safe place for data and a specific format that is expected to be used and maintained. Adding new data is as simple as adding another row with data values. However, over time, if these data files get exceptionally large or there’s a new variable that we want to add to our tests, then it can be time-consuming to add a new column and populate those new values.
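Reading those rows is straightforward with the standard `csv` module. In this self-contained sketch, an in-memory string stands in for the actual .csv file on disk, and the column names are illustrative.

```python
import csv
import io

# Sketch: loading test rows from .csv data. StringIO stands in for a real
# file on disk so the example is self-contained; column names are illustrative.
CSV_DATA = """username,password,expected
alice,s3cret,pass
alice,wrong,fail
"""

def load_cases(text):
    # DictReader maps each row to a dict keyed by the header row,
    # so tests can refer to columns by name.
    return list(csv.DictReader(io.StringIO(text)))

cases = load_cases(CSV_DATA)
print(len(cases), cases[0]["username"])
```

The convenience is clear: a new test case is one new line. The maintenance cost shows up later, when a new column means revisiting every existing row.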
Alternative method: using a dedicated database
By contrast, with a database that can be queried, I can get the specific values in a similar format and feed those values to my script. Over time, as I add new variables, those values can likewise be queried from the database and used.
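A sketch of that approach, using an in-memory SQLite database in place of a dedicated test-data database; the table and column names are assumptions for illustration.

```python
import sqlite3

# Sketch: a queryable test-data source. An in-memory SQLite database stands
# in for a dedicated test database; schema and names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_cases (username TEXT, password TEXT, expected TEXT)"
)
conn.executemany(
    "INSERT INTO test_cases VALUES (?, ?, ?)",
    [("alice", "s3cret", "pass"), ("alice", "wrong", "fail")],
)

def fetch_cases(connection):
    # Adding a new variable later is a schema change plus a wider SELECT,
    # rather than hand-editing a column into every data file.
    return connection.execute(
        "SELECT username, password, expected FROM test_cases"
    ).fetchall()

print(fetch_cases(conn))
```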
Not having a known or reliable data source
The example I just gave of using a database query to populate our tests is cleaner than maintaining data files individually, but it has its own downside. The database itself can be updated while tests are running, and the tests themselves could change values in the database, leaving us with unreliable data.
Alternative method: creating a second database
A method to resolve this is having a cloned database that is never directly accessed by the application. Additionally, I could create a .csv file by setting up a query to retrieve the values important for my tests, and then destroy the file after the tests complete. This way, my data source starts from a known state, and even if it’s modified during a test run, it can easily be regenerated with the known values. It’s also possible to have two databases: one holding the specific test data values to be used, and the other being the active database in use by the application. By querying the first database for the values needed, I can populate the target database and examine those values to see if they are correct — or raise errors if they are not.
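The .csv-snapshot variant might look like the sketch below: export a known-good copy of the test data before a run, so the source of truth can be regenerated if the run mutates it. The in-memory database, table, and column names are all assumptions for illustration.

```python
import csv
import io
import sqlite3

# Sketch: snapshot known-good test data to .csv before a run, so the data
# source can be regenerated if tests mutate it. Names are illustrative.
source = sqlite3.connect(":memory:")  # stands in for the cloned database
source.execute("CREATE TABLE users (username TEXT, role TEXT)")
source.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "viewer")],
)

def snapshot_to_csv(connection):
    # Query the protected clone and write the rows out as .csv text;
    # a real run would write this to a temp file and delete it afterward.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["username", "role"])  # header row
    writer.writerows(connection.execute("SELECT username, role FROM users"))
    return buf.getvalue()

snapshot = snapshot_to_csv(source)
print(snapshot.splitlines()[0])
```

Because the snapshot is generated by a query rather than maintained by hand, “regenerate the data” becomes a one-step operation instead of a cleanup chore.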
Data-driven testing can be useful for examining an application with a variety of data points, but the approach has its challenges and pitfalls. Using the alternative methods outlined above can help ensure we get the full benefit of data-driven testing while avoiding the waste and unreliability that can come with it.