Evaluation Metrics to Determine AI Model Performance

If you ask anyone working in a technology company what one thing would help them grow faster and change the world, the answer would be data. It is the new currency. To analyze enormous data sets and find common patterns that would otherwise be hard for humans to recognize, companies are turning to AI.

AI-based systems can make decisions based on these data sets far faster than humans. But how do we know the systems are working correctly and won’t have harmful effects when released to end-users? AI-based systems, similar to other systems, have acceptance criteria in the form of evaluation metrics. These metrics determine whether the performance of an AI model is at an acceptable level.

There are three commonly used evaluation metrics:

  • Accuracy
  • Precision
  • Recall

Before training the AI model, the team collectively decides on acceptable values for these metrics to determine an AI model’s performance.

How to calculate evaluation metrics for an AI model

These metrics are calculated from four counts that compare the model’s predictions with the actual outcomes:

  • True positives (TP): The cases where we predicted YES and the actual output was also YES
  • True negatives (TN): The cases where we predicted NO and the actual output was NO
  • False positives (FP): The cases in which we predicted YES and the actual output was NO
  • False negatives (FN): The cases in which we predicted NO and the actual output was YES
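
From these four counts, the three metrics are calculated as follows:

  • Accuracy = (TP + TN) / (TP + TN + FP + FN): the share of all predictions that were correct
  • Precision = TP / (TP + FP): the share of YES predictions that were actually YES
  • Recall = TP / (TP + FN): the share of actual YES cases the model caught

As a minimal sketch, here is how those formulas look in Python. The function names are illustrative rather than taken from any particular library:

```python
# Illustrative helper functions for the three evaluation metrics.

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp: int, fp: int) -> float:
    """Share of YES predictions that were actually YES."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Share of actual YES cases that the model caught."""
    return tp / (tp + fn)
```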

Performance example

For example, say we build an AI model to determine whether a coffee mug has a crack. Let’s take three coffee mugs and see how we would evaluate the performance of this AI model. Here is the actual state of each mug:

Coffee mug 1: No crack (Actual output: NO)
Coffee mug 2: Has a crack (Actual output: YES)
Coffee mug 3: Has a design that looks like a crack but has no actual crack (Actual output: NO)

Our AI model analyzes the above coffee mugs and gives the following predictions:

Coffee mug 1: No crack (Prediction: NO)
Coffee mug 2: Has a crack (Prediction: YES)
Coffee mug 3: Has a crack (Prediction: YES)

In the last case, the coffee mug does not really have a crack, but its design looked like one and confused the AI model into making an incorrect prediction: a false positive.

Let’s apply the evaluation metrics to this example. The model produced one true positive (mug 2), one true negative (mug 1), one false positive (mug 3) and no false negatives, which gives:

  • Accuracy = (1 + 1) / 3 ≈ 67%
  • Precision = 1 / (1 + 1) = 50%
  • Recall = 1 / (1 + 0) = 100%

Before starting the AI model training, the team would have already decided on the acceptable value for each of these metrics.

Say the team decided that accuracy should be >90%, precision should be >90% and recall should be >85%. In that case the model meets the recall target but misses both the accuracy and precision targets, so it has not met two of the three acceptance criteria.
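
If your test harness runs on Python, you don’t have to hand-roll these calculations. The sketch below scores the three-mug example against the team’s thresholds using scikit-learn’s built-in metric functions; it assumes scikit-learn is installed and uses the label 1 for a cracked mug:

```python
# Sketch: scoring the coffee-mug example against the team's acceptance criteria.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0, 1, 0]  # ground truth: mug 1 no crack, mug 2 crack, mug 3 no crack
y_pred = [0, 1, 1]  # model output: mug 3 is a false positive

checks = {
    "accuracy":  (accuracy_score(y_true, y_pred),  0.90),
    "precision": (precision_score(y_true, y_pred), 0.90),
    "recall":    (recall_score(y_true, y_pred),    0.85),
}

for name, (value, threshold) in checks.items():
    status = "PASS" if value > threshold else "FAIL"
    print(f"{name}: {value:.0%} (target > {threshold:.0%}) -> {status}")

# Prints roughly: accuracy 67% FAIL, precision 50% FAIL, recall 100% PASS
```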

Conclusion

There are other evaluation metrics to determine AI model performance, such as the receiver operating characteristic (ROC) curve, the area under that curve (AUC) and the F-score. The right metrics depend on the type of AI model used, such as regression, classification, clustering or something else.
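
For instance, the F-score (commonly the F1-score) combines precision and recall into a single number: F1 = 2 × Precision × Recall / (Precision + Recall). A quick sketch, reusing the coffee mug labels from above and scikit-learn’s f1_score:

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 0]  # ground truth for the three mugs
y_pred = [0, 1, 1]  # model predictions

# With precision = 50% and recall = 100%, F1 works out to about 67%.
print(f"F1-score: {f1_score(y_true, y_pred):.0%}")
```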

Using AI doesn’t mean you simply feed data into a model and then have to accept whatever results come out. Testers can determine whether the AI model is working as expected, ensuring there are no surprise consequences when end-users rely on the system.
