We're Pioneering the Path to AAA Webpage Testing

A/B Testing is great, but what if you had a better way to determine the true likeliness of a webpage? Well, we're blazing the trail for that technology.

Posted by Myles Golden on Dec 8, 2016

One of the benefits of A/B Testing is that you are always gathering data and experimenting. Most modern technology companies have figured out that data is really their actual product. Whether it's a service being sold, a product, or content, what companies are really doing is creating value for their customer base. Every interaction a customer has with the brand carries a measurable amount of value.

Even the best data-driven businesses don’t just passively collect and analyze data; they actively generate actionable data by running experiments. The secret to getting value from data is testing, and if you’re looking to grow your online business, implementing well-executed, consistent UI testing is a necessity.

What is AAA Testing?

Let's begin by clearing the air. AAA (Triple A) Testing is a variation of testing that performs multiple iterations controlled by machine learning and user interaction. In short, Triple A testing can be thought of as "Individual Iterative A/B Testing". Triple A is different from A/B or Multi-armed bandit testing because it shifts traffic based on demographics first and not necessarily on performance. However, AAA can perform both normal A/B and Multi-armed bandit tests. In many ways, Triple A can be viewed as a superset of both of those testing mechanisms.

What Does A-A-A (AAA) in Triple A Testing stand for?

Triple A Testing in short stands for: Automated Analytical Assessment Testing (a concept developed by our own CTO Scott Wyatt).

Automated - The series of tests are run automatically and a TRUE winner is determined.
Analytical - Each series of tests has its own analytics, and decisions are made based on them.
Assessment - Machine Learning helps determine the best variation(s) given at least two variables and automatically implements these changes for a better end user experience.

The name AAA itself is a play on A/B but is still appropriate in this context. However, it can be a little deceiving because it’s not 3 tests being run at the same time. It can be hundreds of tests being run at the same time, or just one.

How Does AAA Testing Differ from A/B Testing?

Generally speaking, when you A/B test, you're testing a single variable on a page and finding out which version performs better in general. With AAA testing you can create multiple tests at once and hide variations from multiple demographics all at the same time. The outcome is the best experience for the individual audience vs the best experience for the masses.

What Makes AAA Testing better than A/B Testing?

With Triple A Testing, you can run multiple variations across an audience. What makes AAA testing better than A/B Testing is that you are solving for what is most positive vs what is least negative. Add some Machine Learning behind that, and Triple A can resolve this automatically and implement the next series of Automated Tests.

Why Would Someone Want to Implement AAA Testing?

Some huge problems with A/B testing are that only one test can be run at a time and that the human element is tedious. A/B Testing requires that Marketers and UX specialists have easy access to creating variations of views and running tests. Then they need to review the results and decide which one is better by scanning through tons of data. Then they go back to the drawing board and begin re-testing with a new iteration.

Machines should be able to make the educated decisions for us when it comes to implementing the testing, and that is where Multi-armed Bandit testing comes in as an alternative. However, neither of these testing mechanisms can understand patterns of individuals vs the masses. Triple A Testing solves these problems by focusing on what makes users unique vs what makes users the same, and it delivers appropriate results automatically. This saves time as well as boosts conversions.

How Does AAA Testing Handle all the Variables?

Where AAA Testing becomes powerful is in automating across tests of different variables and appealing to a specific demographic. This makes UI testing purely iterative and personal.

The reality of UI testing is that while a marketer may believe they are testing only one variable, they are actually testing many. A simple test, for example a button color, becomes an issue when it can't account for the audience in a scope.

For example, a Red "Buy Now" button may be less appealing in the morning, but may beat a Green "Buy Now" button overall. In A/B testing or Multi-armed bandit testing, one result will always win, but there is no true reason why. A winner is implied in the test, which makes the test useful overall but limited by the fact that there has to be a winner and it can never be the "best" winner.

On the other hand, Triple A testing embraces the unknown and allows for multiple winners in different areas. As not all variables can ever be accounted for, Triple A testing ignores what it can't understand and tests only what it can. In the above example of button color and time of day, AAA testing now has two clear variables. It can test the color at different times and then continue with the best result for that variable with respect to the time of day.
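The button color and time of day example can be sketched in a few lines. This is only an illustration, not how Proxy Engine is implemented: the observations are made-up numbers, and the function names conversion_rates and winners_by_segment are hypothetical. It shows how one color can win overall while a different color wins in the morning.

```python
from collections import defaultdict

# Made-up observations for illustration: (time_of_day, button_color, converted)
observations = [
    ("morning", "red", False),
    ("morning", "red", False),
    ("morning", "green", True),
    ("morning", "green", False),
    ("evening", "red", True),
    ("evening", "red", True),
    ("evening", "red", True),
    ("evening", "green", True),
    ("evening", "green", False),
    ("evening", "green", False),
]


def conversion_rates(rows, key):
    """Return {group: conversion rate}, where key(row) names the group."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for row in rows:
        group = key(row)
        totals[group] += 1
        wins[group] += int(row[2])
    return {group: wins[group] / totals[group] for group in totals}


def winners_by_segment(rows):
    """Pick the best color separately for each time of day."""
    rates = conversion_rates(rows, key=lambda r: (r[0], r[1]))
    best = {}
    for (segment, color), rate in rates.items():
        if segment not in best or rate > best[segment][1]:
            best[segment] = (color, rate)
    return {segment: color for segment, (color, _) in best.items()}


# A single overall winner, as a plain A/B test would report it.
overall = conversion_rates(observations, key=lambda r: r[1])
overall_winner = max(overall, key=overall.get)

# One winner per time of day, which is the AAA idea.
segment_winners = winners_by_segment(observations)

print(overall_winner)
print(segment_winners)
```

With this sample data, red wins overall, but green is the better choice for morning visitors.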

How does it actually work?

Triple A Testing utilizes a Series Document Model giving different states within an architecture. AAA testing can handle hundreds of different series tests at once for each view in a web app, and it can do it automatically.

Want to find the best UI/UX for a time of day and show it at the right time to the right demographic?
Want to find the best layouts for males or females or trans and show it to the corresponding audience automatically?

AAA Testing Terms in Action

Proxy Engine

Proxy Engine is a technology created by Cali Style Technologies that implements AAA Testing. The following definitions of its terms serve as an example.

Demographics

Demographics is a generic term because a demographic can be literally anything. For example, let's say we have a site whose homepage needs some simple testing. We have no information about our user, so we classify them as "Unknown", which is the default demographic. Now we can set up two series, a0 and b0, and split visits equally with a weight of 50 each. This gives us the minimum two variables required to perform a AAA test: "Unknown" and "Variation". When our user visits the home page, Proxy Router will send them a0 or b0 and track the view that was run. When the user then does something on the home page, that action should issue a Positive or Negative control to reinforce the test.
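As a rough sketch of the 50/50 split described above, the snippet below picks a series for a visitor using weights. The series_weights layout and the pick_series function are assumptions made for illustration; they are not Proxy Router's actual API.

```python
import random

# Hypothetical layout: demographic -> {series name: weight}
series_weights = {
    "Unknown": {"a0": 50, "b0": 50},
}


def pick_series(demographic, rng=random):
    """Choose a series for one visit, falling back to the default demographic."""
    weights = series_weights.get(demographic, series_weights["Unknown"])
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Simulate 1000 visits and track which view was run each time.
rng = random.Random(42)
view_counts = {"a0": 0, "b0": 0}
for _ in range(1000):
    view_counts[pick_series("Unknown", rng)] += 1

print(view_counts)
```

Changing the weights to, say, 80 and 20 would shift most of the traffic to one series without touching the rest of the code.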

Scores and Positive/Negative Controls

When a user does something we don't like on the page, we want to send a negative control back to Proxy Router. For example, if they leave the website, then we might send a negative control. If the user was visiting the a0 series of the homepage, then a0 would get a reduction in its overall score. If the user does something that we like on the page, for example clicks "Buy Now", then we might want to send a positive control that increases the score of a0. We continue this process of adding and deducting until the Baseline and Threshold are met.

Baseline and Threshold

Every page has a baseline. The baseline is the minimum number of times the page must be viewed before the threshold comes into effect. Imagine it as a survey, where you want 1,000 people to take the survey before you review the results. From the previous example between a0 and b0, let's say 1,000 people visit the home page. After those 1,000 people have visited, we should have some decent scores for a0 and b0; for example, a0 scored 0.89 and b0 scored 0.70. Proxy Router now examines the threshold and will predict that a0 is more productive than b0. If we set the threshold to 0.90, then we will stop testing between a0 and b0 when a0 reaches 0.90 and begin serving only a0 for the "Unknown" demographic.
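The baseline and threshold rule can be expressed as a small decision function. This is a minimal sketch under our reading of the description above; the Series class and the decide function are hypothetical names, and the real Proxy Router may weigh more factors.

```python
from dataclasses import dataclass


@dataclass
class Series:
    name: str
    score: float  # between 0.0 and 1.0


def decide(series, page_views, baseline=1000, threshold=0.90):
    """Return the winning series name, or None if testing should continue."""
    # The threshold is ignored until the page has been viewed enough times.
    if page_views < baseline:
        return None
    leader = max(series, key=lambda s: s.score)
    # Only stop testing once the leader's score reaches the threshold.
    return leader.name if leader.score >= threshold else None


# Baseline met, but a0 is still just under the threshold: keep testing.
print(decide([Series("a0", 0.89), Series("b0", 0.70)], page_views=1000))

# a0 reaches 0.90: serve only a0 from now on.
print(decide([Series("a0", 0.90), Series("b0", 0.70)], page_views=1200))
```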

Scoring

The maximum series score is 1.0 and the minimum is 0.0. This score is the result of positive/negative scores from some machine learning (nothing too fancy) and user interactions. When a user clicks on, let's say, a button, we can issue a click event with a score between 1 and 100 based on how important that is to us. For example, a page link might issue a score of 1, while a "Buy Now" might issue a score of 100. Proxy Router will take that event score and compare it to the previous score, the runs of the series, the threshold of the page, and the weight of the series distribution.
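The exact formula Proxy Router uses isn't spelled out here, so the following is only one plausible way to fold an event score of 1 to 100 into a series score between 0.0 and 1.0. It uses a simple running average as a stand-in for the machine learning step; update_score and its parameters are hypothetical.

```python
def clamp(value, low=0.0, high=1.0):
    """Keep a score inside the allowed range."""
    return max(low, min(high, value))


def update_score(prev_score, runs, event_score, positive):
    """Fold one control into a running average.

    runs is the number of controls already folded into prev_score.
    """
    if not 1 <= event_score <= 100:
        raise ValueError("event_score must be between 1 and 100")
    strength = event_score / 100
    # Assumption: a positive control pulls toward 1.0, a negative one toward 0.0,
    # and the event score decides how far from the neutral 0.5 the pull goes.
    outcome = 0.5 + 0.5 * strength if positive else 0.5 - 0.5 * strength
    new_score = prev_score + (outcome - prev_score) / (runs + 1)
    return clamp(new_score)


# A page link click (score 1) barely moves the series score...
link_click = update_score(0.5, runs=9, event_score=1, positive=True)
# ...while a "Buy Now" click (score 100) moves it noticeably.
buy_now = update_score(0.5, runs=9, event_score=100, positive=True)

print(link_click, buy_now)
```

Because the update divides by the number of runs, a series with many runs behind it changes more slowly than a brand new one.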

This isn't much different than a Multi-armed bandit test, so let's do a more complex test.

Let's say we have a homepage that has a hero banner depicting some models interacting with a product. For this example, we'll set up three series: a0's banner image shows female models, b0's shows male models, and c0's shows a mix of genders. Now, imagine that we have a following on Instagram that is mostly female, on Facebook we have mostly male followers, and on Twitter we have a mix of genders. We place an advertisement on each of these networks that links to the homepage. It's reasonable to believe that each audience will respond differently to each of our variations, and we have the minimum required 2 variables for a AAA test: "Audience" and "Banner Image".

It is now possible that one, two, or even all three of these banner images will be relevant to our audience. For this example, let's assume that they are and that after 1,000 views, Instagram loves a0, Facebook loves b0, and Twitter loves c0. If this were A/B or Multi-armed bandit testing, we would have just ignored 2/3 of our users and missed out on 2/3 of the possible conversions. Obviously we are making some serious assumptions in this test, but it is probable. You'll have to run a similar test to find out for yourself!
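Once each audience has its own winner, serving the right banner comes down to recognizing where the visitor came from. The sketch below assumes the audience can be inferred from the referrer URL and uses the winners assumed in the example above; the mapping tables and function names are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical mapping from referring host to audience.
AUDIENCE_BY_HOST = {
    "instagram.com": "Instagram",
    "facebook.com": "Facebook",
    "twitter.com": "Twitter",
}

# Winners assumed in the example above, one per audience.
WINNERS = {"Instagram": "a0", "Facebook": "b0", "Twitter": "c0"}


def audience_for(referrer):
    """Classify a visit by its referrer, defaulting to "Unknown"."""
    host = (urlparse(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return AUDIENCE_BY_HOST.get(host, "Unknown")


def series_for(referrer):
    """Return the winning series for this audience, or None if still testing."""
    return WINNERS.get(audience_for(referrer))


print(series_for("https://www.instagram.com/p/abc"))
print(series_for("https://facebook.com/ad"))
print(series_for(""))
```

A visit with no recognizable referrer falls into the "Unknown" demographic, which has no winner yet and would keep receiving test variations.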

Let's Triple A Test Your Next Project

Drop us a line and let's talk about your next project and how we can incorporate the latest Triple A web design trends into your project. It's as easy as clicking the icon on the bottom right of your screen and starting a conversation.
