UsabilityHub tests review
Many platforms offer cheap and easy access to remote usability testing. Solutions like Optimal Workshop, UsabilityHub or UserTesting.com allow anyone to quickly set up a test, and offer services ranging from providing user panels and recruiting participants to fully designing tests and delivering consolidated reports.
Aside from the biases introduced by remote usability testing itself, one concern lies in the fact that anyone can access these research tools regardless of their prior training and experience. To assess the quality of tests created on those platforms, regardless of their authors, we took and analysed 160 random tests on UsabilityHub for biases.
Out of the 120+ 5s tests we reviewed, 44% will give inappropriate, unexploitable or inaccurate results. The top 5 biases introduced by the tests’ design were mostly beginner errors.
1 – The answer is in the question or introduction context (20%)
Often, the introduction of the test presents a web search context which contains exactly the topic of the site. When asked later what the site was about, even if they don’t want to cheat, users won’t be able to un-know what the researcher told them before they even started.
The test:
“You are searching the web for baby toys to give as a present to your nephew. You find this site…”
“What was the site about?”
The user:
“A site that sells baby toys?”
This can also happen within the test, when the answer to one question was given in a previous one.
The test:
“What did you think of the colour of the contact form?”
The user:
“(There was a contact form?) No idea, orange?”
The test:
“How can you contact this company?”
The user:
“Augh, through a form?”
2 – Ask the user to reflect on their behavior or exploration process (7%)
Many of the questions asked after users have viewed the design require them to focus on how they visually explored it. While users may be able to say what they remember or looked at longest, this does not necessarily reflect their first focus point or visual exploration pattern.
3 – The test asks end users to answer questions that should really be directed at experts (4%)
Generally, this means the test asks users to review a specific aspect and give advice on it, or imagine what they think will work best to achieve a certain goal. While getting user attitudes may be helpful to design the right solutions, that is better done during focus groups or interviews. End users are not experts. They are not designers, and as such, knowing that they believe a colour is too saturated for the design to convey safe baby toys has little value compared to an actual designer’s constructive feedback, taking into account contrast and colour impacts on visibility, accessibility and the values induced by the design choices.
4 – Task is unclear, inappropriate, confusing or too long and distracts the user from looking at the design during the 5s (3%)
Once users validate the display of the design, they may have forgotten the question and read it again instead of looking at the image. As a result, many answers will be guesswork or invalid.
The test, before showing the design:
“What do you think of the colour scheme of the page?”
“What was the site about?”
The user, after the 5 seconds:
“(What was the question again?)”
5 – The question is biased towards a positive answer (2.5%)
The question contains one or several answer proposals that, as anyone who has studied psychology knows, will bias responses towards positive answers.
The test:
“Blabla is a national chain of businesses offering in-home care services. We tried to make a design that inspires trust and seriousness. Please look at the image and tell us whether you think the page conveys these qualities.”
“What did you think when looking at this: did you feel more like it was trustworthy, or thought it looked serious?”
The user:
“Whatever you say!”
Click tests review
43% of click tests will give inappropriate, unexploitable or inaccurate results.
Click tests are primarily used for two types of studies:
- 60% were pick-your-favorite studies. The test introduces two or more designs and asks the user to click on their favorite layout, colour scheme or version.
- 40% were task-based studies. The test presents the user with a task and asks them to click on the screen as if they were trying to perform it.
The format is clearly meant for the second type of study, which gives more insight into how to improve a design. Out of this subset of tasks, 30% propose a task that is confusing to users and may lower the quality of responses.
Pick-your-favorite tests are a functional way of bending the platform to gather slightly different data. Preference for, or feelings towards, a design are tricky to quantify and subject to many biases. The two top issues with this test design are:
1 – Confusing choice (10%)
The test proposes 2 choices, each composed of 5 images, and asks users to pick their favorite. However, it isn’t clear to the user whether there are 2 or 5 choices, and many clicks cannot be interpreted in the way the user meant them.
2 – Minimal difference between designs (10%)
The click test emulates a typical simple A/B test, where the only difference between the two designs is the absence of a small link in a menu or the colour of a button. The user plays “spot the difference” and makes a decision based on their findings, which may be incomplete. Their decision is then based on an opinion about what they would do, or what most users would do. As a result, there is no guarantee that implementing the winning design will actually have the desired effect in terms of behavior.
Nav Flow tests review
Our review included only 8 Nav Flow tests, so we don’t have enough material to judge their overall quality. 5 of them proposed an appropriately complex task, but 4 were solved in only one step, and the last one asked the user to perform six steps but ended after the second. We did not observe any major biases in any of the 8 Nav Flow tests, though.
Overall test quality
Globally, about half the tests we reviewed on UsabilityHub suffer from obvious biases that could be avoided even by novice researchers. This seems to confirm that, since the tests are very easy to create, anyone starts off easily, possibly unaware of the issues their results may suffer from, whether in terms of quality, relevance or analysis complexity.
Nav Flow tests, which are slightly more difficult to create, seem to suffer less from obvious beginner biases.
How can we increase the quality of these tests?
It is great that designers and stakeholders take an interest in testing their concepts. However, if tests are led with little care for getting accurate and valuable results, there is a risk that the errors of an intern or an improvised researcher affect the perception of UX research in general, even if only temporarily.
As researchers, we have a role to play in raising these questions and in evangelizing to raise the quality of results, by providing training to non-researchers and sharing the necessary knowledge. Testing platforms, however, bear a large responsibility for the quality of the questions proposed on them, and should provide appropriate training, explanations or services.
On UsabilityHub, the examples provided by the survey creation interface are actually one of the main sources of issues with the tests. The field used to introduce the design suggests giving context, which defeats the whole point of viewing a design and understanding what it is about, effectively putting the answer in the question and encouraging test creators to make this mistake.
In the sample post-5s questions, 3 of the 7 “good question” examples fall under biased in our analysis:
“Rate the quality of this page between 1 and 5.”
Which is the positive end? Which is the negative end? And since there is only one scale question, the scale should be larger than 5 points to get more reliable answers.
“Which element on the page did you focus on most?”
Users may not be aware of where their eyes really focused most, and their perception of time may be biased, even over 5 seconds. You might instead simply ask: “What did you see?”
“Did you notice the free shipping offer?”
If users did not see it, they can now simply say yes, or even become convinced they saw it when they didn’t and wouldn’t have mentioned it otherwise.
Platform-dependent issues: un-complicated should not be over-simplified
Though this cannot be confirmed without performing the same type of review on other platforms, it is possible that the quality of tests depends on the platform on which they are created. Providing adequate training and examples, and including in-depth explanations of the biases and limitations of the tests that can be performed, is essential to increase the quality of tests made by non-researchers.
Of course, this doesn’t mean the platform itself is bad, or that remote testing shouldn’t be accessible to larger audiences. I am actually a firm believer in the benefits of remote testing, and UsabilityHub does offer good-quality services that make user research and user-centered design easier to carry out, even for tiny projects, small budgets and large-scale quantitative evaluations.
I am also convinced that over-simplifying something complex in trying to make it un-complicated is counter-productive. Remote testing platforms are still young, however, and hopefully they will keep getting better as they mature, including for untrained researchers.