I'm finding that regardless of how solid my test scripts are, I still have tests that fail falsely, even though they passed just fine before. I'm not talking about markup changes (although that is another valid problem too): every time I run through my list of tests using NUnit, about 10% of them fail for no real reason.

I simply wanted to know if others have this issue or wanted to share details.

4 Answers

I think it's hard to make through-the-UI tests reliable. The challenge comes down to reliably controlling and observing the variables that matter to your tests. Whether this is worthwhile depends on your ability to make your test code more resilient, and on the value of being able to run the tests automatically.

Asynchrony. For web applications, a common source of trouble is asynchrony. You take an action, and at some later time the app produces a result. And the delay can vary. How do you account for the varying delay?

Fixed waits. Some people do this by naively adding fixed waits into their test code. Click the button, wait a fixed amount of time, and check whether the result is displayed. This is troublesome because one day you'll run the test and the fixed wait won't be long enough. So you increase the duration of the fixed wait. Over time, this makes your tests more reliable, but very slow. Don't use fixed waits unless you absolutely can't use either of the following two techniques.
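For illustration, the anti-pattern looks like this in C# with NUnit and Selenium WebDriver; the URL and element ids are invented for the example:

using System;
using System.Threading;
using NUnit.Framework;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

[TestFixture]
public class FixedWaitAntiPattern
{
    [Test]
    public void SubmitEventuallyShowsResult()
    {
        using var driver = new ChromeDriver();
        driver.Navigate().GoToUrl("https://app.example/form"); // invented URL
        driver.FindElement(By.Id("submit")).Click();

        Thread.Sleep(5000); // hope 5 seconds is enough -- one day it won't be

        Assert.IsTrue(driver.FindElement(By.Id("result")).Displayed);
    }
}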

Latch. A better technique is to have the application use a latch, a variable that indicates whether some asynchronous operation has completed. When the operation starts, the application sets the latch variable to false. When the operation completes, the application sets the latch variable to true. Sometimes your tests can inject code into the application to implement the latches. Your injected code registers to be notified when the operation starts and stops, and sets the value of the latch accordingly. Then write your test code to watch for the latch variable to indicate that the operation is complete.
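A minimal sketch of an injected latch in C#, assuming the application uses jQuery for its AJAX calls; the window.__latch flag, the submit id, and the jQuery hook are all illustrative assumptions, not details from the question:

using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

// Assumes an initialized IWebDriver named "driver".
// Inject the latch: false while the operation runs, true when it completes.
var js = (IJavaScriptExecutor)driver;
js.ExecuteScript(
    "window.__latch = false;" +
    "jQuery(document).ajaxStop(function () { window.__latch = true; });");

driver.FindElement(By.Id("submit")).Click();

// Watch the latch instead of guessing at a delay.
var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(30));
wait.Until(d => (bool)((IJavaScriptExecutor)d)
    .ExecuteScript("return window.__latch === true;"));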

Polling. Another technique is for your test code to poll for the desired result. You click the button, then repeatedly check whether the result is displayed, pausing briefly in between checks. You put a maximum polling duration on your polling code, so that if the item isn't displayed within, say, 30 seconds, you fail the test. I usually end up creating a DSL to hide the details of the polling code and make it expressive, like this:

assertThat(submitButton, eventually(), is(present()));
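In Selenium's C# binding (a natural fit for the NUnit setup in the question), an explicit wait gives you this polling loop out of the box. A sketch; the 30-second cap and 250 ms interval are arbitrary example values:

using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI; // WebDriverWait, from the Selenium.Support package

static class Poll
{
    // Poll for the element every 250 ms, failing the wait after 30 seconds.
    public static IWebElement WaitForPresent(IWebDriver driver, By locator)
    {
        var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(30))
        {
            PollingInterval = TimeSpan.FromMilliseconds(250)
        };
        wait.Until(d => d.FindElements(locator).Count > 0);
        return driver.FindElement(locator);
    }
}

Calling Poll.WaitForPresent(driver, By.Id("result")) is the moral equivalent of the eventually() assertion above.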

Identifying elements. A second common challenge is how to identify the HTML/XML elements you want to observe and manipulate.

Meaningful ids and classes. It's best if the application is designed to make it easy to identify elements, by adding well-named id attributes to unique elements, and well-named class attributes for repeated structures (like lists or tables). These attributes make it easier to refer directly to the elements you're interested in. If your application doesn't have these, befriend a developer and beg for them. Otherwise, you'll have to identify elements by their relationships to other elements, which makes your tests depend on the current structure of the markup code. When the structure changes, your locators become invalid.

Simplify locators. A related challenge is to write locators that are resilient to changes in the markup. I often see people using inspection tools (Selenium IDE, Firebug, etc.) to generate locators. Often the monstrous locators created by these tools include the complete path to the element, including indexes. If you use these locators, your tests become dependent on that full path, and when the structure changes, your locators become invalid. It's better to examine each tool-generated locator, pick out the essential parts, and create a locator that relies on only those parts. (Then get a developer to add meaningful id and class attributes.)
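To make that concrete, here is an invented example in C#; the first locator is the kind a recording tool emits, the others rely only on the parts that actually identify the element:

using OpenQA.Selenium;

// Tool-generated: encodes the full path, indexes and all. Any structural
// change anywhere along the chain invalidates it.
var brittle = By.XPath("/html/body/div[2]/div/table/tbody/tr[3]/td[1]/span");

// Resilient: relies only on essential, well-named attributes (invented here).
var byId    = By.Id("order-total");
var byClass = By.CssSelector("table.orders td.total");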

I +1'd Dale's answer, however I wanted to add a few additional things.

Dale is absolutely correct that a large part of the fragility of automated UI tests comes from timing issues, and this can often be fixed by using polling (which Selenium supports directly through its explicit waits, and in cruder form through implicit waits) or latches.
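For reference, both flavors in recent versions of the C# binding; the timeouts and the status id are arbitrary examples, and driver is assumed to be an initialized IWebDriver:

using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

// Implicit wait: every FindElement call now retries for up to 10 seconds
// before throwing, instead of failing on the first miss.
driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(10);

// Explicit wait: poll one specific condition until it holds or times out.
var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(30));
wait.Until(d => d.FindElement(By.Id("status")).Displayed);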

I also agree with his assertions on identifying elements; depending on the project and how much code churn there is, this can make a huge difference. One additional point I would like to make: if elements and pages are changing, the test can sometimes be dismissed as "flaky", but it often points to automated tests that need to be refactored or removed anyway because of those changes. Another question to ask here is why the element or its identifier changed. Was the change necessary for the functional changes being made to the web page? Often some developer education is needed to help the team understand how these seemingly minor changes can have a large impact on the automation, and can many times be avoided.

Other possibilities for flakiness include intermittent issues in your test environment or in the product under test. It is VERY important to treat these differently from test flakiness, as they can point to serious issues that should be addressed in the product or environment. Many times I have seen automation failures lumped into "flaky automation" when they were actually discovering bugs in the product that could have been avoided.

One technique that I like for identifying "flaky" test cases is what I have referred to as "soak testing": take your test suite and run it over and over again (100 times is a common goal) against the same environment and the same product build (to rule out product or environment changes as causes of failure). There are 3 possible results per test case:

- 100% pass: this is a stable test case.
- 100% fail: this is also a stable test case, and can point to either a product bug or an automation defect.
- Any other pass/fail ratio: this points to an unreliable test OR an unreliable product feature or environment, and should be investigated.
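A minimal sketch of the bookkeeping in C#, assuming you have recorded one pass/fail flag per run for each test (the shape of the data is an assumption, not part of any test framework):

using System.Collections.Generic;
using System.Linq;

static class SoakReport
{
    // Classify one test from its results across N identical soak runs.
    public static string Classify(IReadOnlyList<bool> runs)
    {
        int passes = runs.Count(passed => passed);
        if (passes == runs.Count) return "stable: always passes";
        if (passes == 0) return "stable: always fails (product bug or automation defect)";
        return $"FLAKY: {passes}/{runs.Count} passes -- investigate test, product, and environment";
    }
}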

Among the unstable tests, the best scenario is a test that passes/fails inconsistently but always fails in the same place when it does fail. That is easy to track down. You will sometimes find tests that fail in multiple places; these require a bit more work to track down and fix at the root cause, but that effort is WELL worth it in my opinion.

NONONONONO. All the above posters have it wrong. Polling and latching and whatnot do not account for INHERENT RANDOMNESS or SELENIUM BUGGINESS. Why does a test fail and turn red in Jenkins or some other automated build? Here are the past 5 Selenium failures on our build:

An element doesn't have type="file". Selenium chokes when sending input keys to it for a file upload, yet there is no problem at all when using the application manually.

Network latency/throughput. Backbone congested? Is the Internet just slow? These aren't bugs. This is just the application running on the Internet.

Mysterious redirect errors in the application that cannot be reproduced manually and only happen when the browser is driven through WebDriver.

Someone added text-transform: uppercase to the CSS to change the capitalization of a label, and now the build is blocked. Hurray.

Element polling doesn't work over the Internet. Want to wait for something to go away before your test proceeds? The element has already gone away by the time the driver receives your next Selenium command, which, coincidentally, is the command to wait for that very thing to happen. Asynchrony issues will be there no matter what approach you use. This is probably the worst Selenium bug of all: by the time you start waiting for a condition, the condition is already invalid, and so is your wait. It's as if they built a car that, by nature, crashes randomly. RemoteWebDriver can't handle these asynchrony issues.
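To make the race concrete, a sketch with an invented spinner id: the test wants to see a transient spinner appear and then disappear, but over a slow connection the spinner can come and go before the first command even reaches the browser, so the wait for its appearance times out even though the application behaved correctly:

using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

// Assumes an initialized IWebDriver named "driver".
var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));

// By the time this reaches the browser, the spinner may already be gone,
// so the wait times out even though nothing actually went wrong.
wait.Until(d => d.FindElements(By.Id("spinner")).Count > 0);

// And this second wait would then have succeeded trivially.
wait.Until(d => d.FindElements(By.Id("spinner")).Count == 0);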

Having worked with Selenium for a year now, with ThoughtWorks consulting guidance and Sauce Labs support, I can tell you: it's a clusterYouKnowWhat. These problems aren't solved. So OP, the answer to your question at this current point in time is most probably "no".

The file upload issue is not an example of flakiness or randomness. You cannot control a native file-picker dialog via JavaScript.
– user246 Jun 18 '14 at 1:13


Your answer feels a bit like a rant, because these things are out of your control. In case 4, your developers should update the tests after they change something; waiting for a failure is just bad practice and has nothing to do with Selenium itself. Also, finding a label by its text sounds like bad practice. Give it an ID or some other way to identify it reliably.
– Niels van Reijmersdal Jun 18 '14 at 6:57

Case 3: who put a non-working test in your suite? You do test your upgrades, don't you? With SauceLabs you can even keep testing against a known-working version of the browsers. Yes, sometimes Selenium and browser upgrades break the test suite, but you can work around this with a bit of process and by not using the latest versions unless necessary.
– Niels van Reijmersdal Jun 18 '14 at 7:06