Organizations sometimes (too often!) decide, “our HTML isn’t really testable.” What they mean by this is generally a combination of, “an embarrassing error slipped through our test process once”, “a senior manager approved all the screenshots, but then complained about how the site looks on his or her BlackBerry/iPad/Internet Explorer 9/…”, “HTML isn’t a real programming language, so it must not be testable, right?”, and “The pixels keep moving.”
These are all real business situations. The best response to them, though, is not resignation to a conclusion that HTML (HyperText Markup Language) is beyond testing; instead, this article describes six specific actions a development team can take to bring its HTML under better control.
The “softest” part of HTML testability has more to do with attitude than code-and-tools: it is a commitment to the possibility of HTML testing, perseverance to make the on-going effort HTML requires, and clarity about goals. Consider the simplest possible example: a document such as:
<html>
  <body>
    <h1>An Example</h1>
    <p>This is only a test.</p>
  </body>
</html>
It’s crucial that the organization agree on what “test” means for such an example. At one extreme, a successful test requires that all pixels on a certain screen rendered by a specific browser match a specific design document from a defined organization process. At the other, “test” might be as relaxed as, “a human looked at the rendered screen and agreed that it’s readable.” A significant portion of the confusion around HTML testing results from a misapplication of tools or techniques appropriate for a different target or situation or criterion. Successful organizations must be clear about where they want their projects to arrive.
Clarity about goals isn’t enough by itself, though. Like any skill worth learning—bicycling, budgeting, employee retention, and so on—testing HTML is difficult enough that the first few trials are likely to have problems. Make a clear goal, then keep that goal in sight; observe ways testing fails, and correct them. Don’t expect perfection immediately, but do expect to approach perfection steadily.
Sometimes it’s best to approach goals incrementally. A goal of a totally-automated “continuous testing” system might be appropriate for a particular organization. At the same time, prototyping early versions of the system with a mix of automation and “manual” testing is likely to be more productive than deferring all testing until the ideal automation is in place.
Range of techniques
With a business-like attitude properly in place, what’s next? What specific tests should the organization develop first?
Look at history, that is, the business’s experience with its own customers. What HTML defects did your customers have to find for you? Did content in non-European languages display incorrectly? That might stem from an error outside the HTML itself; check with your developers about how best to correct it. Is the variation between renderings in different browsers too great? Static analysis might fix such a problem quickly and cheaply.
What is “static analysis” for HTML, and how does it address browser skew? Briefly, static analysis focuses on the syntactic correctness of an HTML instance. Browsers are notoriously tolerant of minor mistakes in HTML. For example, browsers generally interpret the first block of text below as the second, which is what was likely intended:
HTML as written
<p>First paragraph
<p>Second paragraph</li>
HTML as interpreted by a browser
<p>First paragraph</p>
<p>Second paragraph</p>
These blemishes sometimes accumulate to the point of confusing browsers, though, and in ways that vary between browsers. One inexpensive way to flush out an abundance of small inconsistencies is to scan HTML for syntactic correctness.
Several no- or low-cost tools are available for such scans, including the W3C Markup Validation Service from the organization responsible for the standards behind the World Wide Web. Experiment with a few of these to see which best fits your organization’s workflows, sanitize all your HTML so it conforms to standards, and see how few problems remain in your HTML afterward.
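The core idea behind such a scan can be sketched in a few lines. The toy checker below, built on Python’s standard-library html.parser, flags close tags that match nothing and open tags that are never closed; it is an illustration of what static analysis looks for, not a replacement for a real validator such as the W3C service.

```python
from html.parser import HTMLParser

# Void elements never take closing tags, so exclude them from matching.
VOID = {"br", "hr", "img", "input", "meta", "link"}

class TagChecker(HTMLParser):
    """Toy static check: report mismatched close tags and unclosed open tags."""
    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags
        self.problems = []   # accumulated findings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")

    def report(self):
        # Anything left on the stack was opened but never closed.
        return self.problems + [f"unclosed <{t}>" for t in self.stack]

# The malformed example from earlier in this article:
checker = TagChecker()
checker.feed("<p>First paragraph <p>Second paragraph</li>")
print(checker.report())
```

Run against the two-paragraph example above, the checker reports the stray </li> and both implicitly closed <p> tags; a browser silently repairs all three.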
HTML syntactic validation is relatively “lightweight”: low in dollar cost, and readily adaptable to many workflows. On the other end of HTML testing are techniques which compare rendered Web pages to reference “snapshots”. These techniques tend to combine fragility, expense, and difficulty in various ratios. For example, assume that a test standard looks like the first box below:
[Three boxes rendering the text “This is only a test.”: the first is the reference snapshot, while the second and third vary only in font and text color.]
Is a variation in font or text color a match? Useful answers take the organization back to the first step above: clarity of goal. It’s possible to construct tests for every combination of the images above; it’s a business decision, though, not a technical one, whether details of font and color deserve to be part of an HTML testing program. Do customers notice or care when pixels are slightly different? Does the organization’s brand and impact depend more on the details of visual design or content? Both?
One of the surest ways to improve quality in software is to reduce line count and complexity. Among the best ways to simplify HTML is to reduce its scope:
- As much as possible, style with CSS (Cascading Style Sheets) rather than HTML. The resulting HTML is simpler, and therefore easier to test.
- Take advantage of any templating or pre-processing facilities appropriate to the organization’s overall software technology to reduce the size and complexity of HTML even more.
The previous paragraph promoted simple HTML as advantageous and generally more testable. One prominent exception to this generalization exists, though: it’s often valuable that the individual tags of an HTML source have IDs. Compare the HTML which appeared at the beginning of this article with this variant:
<html>
  <body>
    <h1>An Example</h1>
    <p id="p1">This is only a test.</p>
  </body>
</html>
The two samples:
- look exactly the same when rendered in a conventional browser;
- differ only in the second’s id attribute, which makes it marginally more complex; and
- differ in testability: the ID p1 lets many testing tools reference the content easily.
These IDs are so valuable with certain testing tools that it can pay off to ID tags automatically. A scripting tool, for instance, generally has a way to express, “push the button with the id submit1” or “select the radio button IDed pay-one-thousand” as part of a sequence of actions which simulate end-user behavior.
While the payoff from IDs depends on the particular testing tools in use, it’s such an important exception to the usual desirability of simplicity that development teams need to talk about it ahead of time. It’s far better to build in IDs strategically, from the beginning of a project, than to have to do “surgery” only when a project reaches the Testing Department near the end of its span.
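How a testing tool exploits those IDs can be shown with the standard-library parser again. The extractor below collects the text of every tag that carries an id, so a test can assert on content by ID rather than by position; a browser-automation tool would do the same kind of lookup before clicking or reading an element.

```python
from html.parser import HTMLParser

class ByIdExtractor(HTMLParser):
    """Collect the text content of every tag that carries an id attribute."""
    def __init__(self):
        super().__init__()
        self._current = None   # id of the tag whose text we are inside
        self.by_id = {}

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "id" in attr_map:
            self._current = attr_map["id"]
            self.by_id[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.by_id[self._current] += data

    def handle_endtag(self, tag):
        self._current = None

# The ID-bearing variant of this article's example document:
page = ("<html><body><h1>An Example</h1>"
        "<p id='p1'>This is only a test.</p></body></html>")
extractor = ByIdExtractor()
extractor.feed(page)
print(extractor.by_id["p1"])
```

A test can now check `extractor.by_id["p1"]` directly; without the ID, the same assertion would need a brittle match on tag order or surrounding text.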
Finally, recognize that HTML testing might well require a combination of tools and workflows. A development team might rely on a static analyzer for automated verifications, along with fine-grained human inspection supported by, for instance, IE Tab. These two kinds of tests complement each other and generally isolate different kinds of HTML errors. Meanwhile, the QA department can use a tool like Ranorex Studio to validate the entire user interface.
Individual HTML tests also can require that multiple tools work together. Many organizations don’t write standard HTML; instead, their sources—even when named something like front_page.html—are actually HTML templates. Popular Web-building technologies including Flask, Rails, Velocity, and so on, take this approach. A basic static analysis tool expects to analyze HTML and invariably produces noisy results with HTML templates. In a case like this, the best results come from construction of a short pipeline: one automation or script renders the HTML template as pure HTML, which is then delivered to the static analyzer for a more useful correctness report.
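Such a pipeline can be demonstrated with Python’s string.Template standing in for the real template engine, and a trivial placeholder check standing in for the real static analyzer. Both substitutions are assumptions for the sake of a self-contained sketch; an actual project would invoke its own render step and validator here.

```python
from string import Template

# A hypothetical template: a file named something like front_page.html
# might contain placeholders rather than finished HTML.
template_source = """<html>
  <body>
    <h1>$title</h1>
    <p id="greeting">$greeting</p>
  </body>
</html>"""

def render(source, context):
    """Step 1 of the pipeline: turn the template into pure HTML."""
    return Template(source).substitute(context)

def static_check(html):
    """Step 2: stand-in for the real analyzer; here it only confirms
    that no template placeholders survived the render."""
    return "$" not in html

rendered = render(template_source, {"title": "An Example",
                                    "greeting": "This is only a test."})
print(static_check(rendered))
```

The essential point is the ordering: validate the rendered output, not the template source, so the analyzer sees only legitimate HTML and its report stays free of template noise.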
HTML has the reputation of being a simple language, and organizations too often conclude it’s not worth testing or even cannot be tested. HTML goes wrong in enough ways, though, that every organization ought to analyze the business case for its specific use of HTML and build an explicit plan to test at least one aspect of its HTML sources. Make the HTML simple by restricting it to the roles at which HTML is best, but give tags IDs where appropriate. Well-designed testing not only helps deliver trustworthy results to current readers or customers but also makes sources more consistent and thus easier to maintain over the life of the application or content.