How to perform testing of AI/ML-based systems in DesignWise

Learn which phases of the AI development lifecycle benefit the most from DesignWise optimization.

Note: the article assumes a fairly advanced understanding of DesignWise terms & concepts.

Artificial Intelligence is transforming the technology landscape of the digital age. Adoption of AI-powered smart systems is expected to grow rapidly over the next few years. Alongside these advancements, a key challenge is the testing of Artificial Intelligence/Machine Learning (AI/ML)-based systems.

There are 3 major challenges in testing AI systems:

DesignWise cannot do much about the first one, so we will talk primarily about the benefits related to the availability and quality of data. After all, data scientists reportedly spend about 80% of their time preparing the training dataset.

We will use the phase classification from Forbes:

DesignWise applicability to QA in different phases of AI development:

1. AI algorithm itself – Low
2. Hyperparameter configuration – Low
3. Training, validation, test data – Medium
4. Integration of the AI system with other workflow elements – High

The rest of the article covers phases 2-4 in more detail. In phase 1, significant customization of the algorithm code is uncommon and, to borrow a quote from Ron Schmelzer, “There’s just one way to do the math!” The core DesignWise value proposition of exploring possible combinations is therefore less relevant (i.e., low applicability due to the “linear” nature of the operations).

Phase 2

The general idea is to include each hyperparameter in the DesignWise model, breaking down the value lists based on the thresholds derived from theory or practical experience.


The specific ranges and value expansions in the screenshot are for illustration only, but they should sufficiently communicate the essence of the approach. Furthermore, constraints and risk-based algorithm settings can be used to control the desired interactions:

Or you could use the 4-way setting to get the full scope of possible combinations.
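DesignWise generates these n-way suites internally, but the underlying idea can be sketched with a small greedy pairwise (2-way) generator. The hyperparameter names and value lists below are hypothetical, chosen only to illustrate the combination-reduction effect:

```python
import itertools

# Hypothetical hyperparameter value lists (names and thresholds are
# illustrative only, not taken from any specific DesignWise model).
params = {
    "learning_rate": ["1e-4", "1e-3", "1e-2"],
    "batch_size": ["16", "64", "256"],
    "optimizer": ["sgd", "adam"],
    "dropout": ["0.0", "0.3", "0.5"],
}

def pairwise_suite(params):
    """Greedy 2-way covering array: every value pair of every two
    parameters appears in at least one generated configuration."""
    names = list(params)
    # All value pairs (across every pair of parameters) still to cover.
    uncovered = {
        ((a, va), (b, vb))
        for a, b in itertools.combinations(names, 2)
        for va in params[a] for vb in params[b]
    }
    # Candidate pool: the full cartesian product (fine at this scale).
    candidates = [dict(zip(names, combo))
                  for combo in itertools.product(*params.values())]
    suite = []
    while uncovered:
        # Pick the candidate covering the most still-uncovered pairs.
        def gain(c):
            return sum(1 for (a, va), (b, vb) in uncovered
                       if c[a] == va and c[b] == vb)
        best = max(candidates, key=gain)
        covered = {((a, va), (b, vb)) for (a, va), (b, vb) in uncovered
                   if best[a] == va and best[b] == vb}
        if not covered:
            break
        uncovered -= covered
        suite.append(best)
    return suite

suite = pairwise_suite(params)
print(len(suite), "configurations instead of", 3 * 3 * 2 * 3, "exhaustive ones")
```

A 3-way or 4-way version works the same way, with triples or quadruples in the uncovered set; the suite grows accordingly.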

Strength: Systematic approach to identifying relevant hyperparameter configuration profiles.

Weakness: May explore the profiles with too many changes at a time or require numerous constraints to limit the scope.

Phase 3

Robo-advisors are a popular application of AI/ML systems in finance. They use online questionnaires that obtain information about the clients’ degree of risk-aversion, financial status, and desired return on investment. For this example, we will use Fidelity GO.

To build the corresponding model in DesignWise, you will need to temporarily set aside some of the usual lessons about parameter & value definitions, because the objective here is different. Instead of optimizing the scenario count, the goal of this data set is to be a representative sample of the real world and to eliminate as much human bias as possible. This means not just data quality, but also completeness.

Such a model would not only include all parameters regardless of the impact on the business outcome but also utilize lengthy, highly detailed value lists (often more than 10 per parameter). To distinguish between the review and the “consumption” formats, value names or value expansions can be adjusted accordingly (i.e., value name can be “sell some” for communication to stakeholders while the expansion can be “3” given the data encoding).
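The name-versus-expansion split can be sketched as a simple translation step. The parameter and the surrounding value names below are hypothetical; only the “sell some” → “3” pair comes from the example above:

```python
# Hypothetical parameter with stakeholder-facing value names and
# machine-facing "expansions" (encodings). Only "sell some" -> 3 is
# taken from the article; the rest is illustrative.
encodings = {
    "rebalance_action": {
        "hold": 0,
        "sell all": 1,
        "buy some": 2,
        "sell some": 3,
        "buy all": 4,
    },
}

def to_consumption_format(row, encodings):
    """Translate a review-format scenario into its encoded form."""
    return {param: encodings[param][value] for param, value in row.items()}

scenario = {"rebalance_action": "sell some"}
encoded = to_consumption_format(scenario, encodings)
print(encoded)  # {'rebalance_action': 3}
```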

When it comes to the DesignWise algorithm strength selection, the highest available option is typically the most desired one (see the caveat in the “Weakness” below):

When this approach is used for generating the validation + test data sets, the DesignWise Analysis capabilities (in addition to standard statistical methods) can be used to evaluate the diversity of the split:
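Independent of the built-in Analysis views, a rough diversity check of a split can also be scripted by comparing which 2-way value combinations each subset covers. The scenarios below are hypothetical:

```python
import itertools

# Hypothetical generated scenarios (parameter names and values are
# illustrative only, not from an actual DesignWise model).
scenarios = [
    {"risk": "low", "horizon": "short", "balance": "small"},
    {"risk": "low", "horizon": "long", "balance": "large"},
    {"risk": "high", "horizon": "short", "balance": "large"},
    {"risk": "high", "horizon": "long", "balance": "small"},
]
validation, test = scenarios[:2], scenarios[2:]

def covered_pairs(rows):
    """Distinct 2-way value combinations present in a set of scenarios."""
    pairs = set()
    for row in rows:
        for pv1, pv2 in itertools.combinations(sorted(row.items()), 2):
            pairs.add((pv1, pv2))
    return pairs

v, t = covered_pairs(validation), covered_pairs(test)
print(f"validation covers {len(v)} pairs, test covers {len(t)}, "
      f"overlap {len(v & t)}")
```

A large overlap between the subsets would suggest the split is less diverse than the scenario counts alone imply.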
Strength: Data sets are intelligently built to cover all relevant permutations and combinations needed to evaluate the efficiency of trained models while minimizing bias. Furthermore, regenerating such data sets is much faster and easier.

Weakness:

The current scope limitation is 4,000 scenarios per DesignWise model, which may not be sufficient for the training or even validation purposes of some AI systems.

As a side note, while “all possible permutations” is a nice goal, it is often not the optimal one – even for representative purposes, having 289,700,167,680,000 scenarios (the possible total for the model above) would not be realistic to train on. So the “right” answer still requires balance and prioritization.

Despite certain workarounds, programmatic handling of complex expected results would likely require complementary manual effort.

The approach depends on the overall ability to leverage synthetic data instead of production copies which may or may not be feasible in your environment.
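To illustrate the scenario-volume point above: the exhaustive total for any model is simply the product of its value-list sizes, so each added parameter multiplies the count. The sizes below are hypothetical and do not reproduce the actual model from the article:

```python
import math

# Hypothetical value-list sizes for a ten-parameter model
# (illustrative only; not the actual model discussed above).
value_counts = [12, 10, 15, 8, 10, 11, 14, 9, 10, 13]

# Exhaustive scenario count = product of the value-list sizes.
total = math.prod(value_counts)
print(f"{total:,} exhaustive scenarios")
```

Even these modest per-parameter counts already yield tens of billions of combinations, which is why prioritization matters.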

Phase 4

This phase is the closest to DesignWise’s “bread and butter”. The model would serve a dual purpose: 1) smoke testing of the AI itself; and 2) integration testing of how it is operationalized.

Given the execution setup, you would likely have to keep all the factors consumed by the AI system but, for this phase, reduce the number of values based on their importance (both business- and algorithm-wise).

Scenario volume would still be largely driven by the “standard” integration priorities (i.e., key parameters affecting multiple systems) but the number of values and/or the average mixed-strength dropdown selection would be higher than typical.

Focusing on the “just right” level of detail for the high-significance factors will help produce an optimal dataset for sustainable AI testing.

Strength:

This is DesignWise at its best, with its thoroughness, speed, and efficiency benefits.

Ability to quickly reuse model elements from Phase 3 and plans related to other systems (e.g., the old version of the non-AI advisor for systems B and C [link to E2E Finance]).

Higher control over the variety of data at the integration points and over the workflow as a whole.

Weakness: Similar to Phase 3 but usually more manageable given the difference in goals (volume in P3 vs integration in P4).

Conclusion

To summarize, the applicability level by phase is repeated below:

1. AI algorithm itself – Low
2. Hyperparameter configuration – Low
3. Training, validation, test data – Medium
4. Integration of the AI system with other workflow elements – High

For another perspective, using this stage classification from Infosys, DesignWise can deliver the most significant benefits in the highlighted testing areas:

Given the typical scale of AI projects, the number of possible input and output combinations is practically unbounded. Moreover, the techniques used to implement self-learning elements are very complex.

Therefore, fully testing these kinds of applications is not feasible. To overcome this challenge, we need to think more critically about a systematic, risk-based test design approach, such as the one that DesignWise facilitates.