There seem to be some foolish souls out there who think that “if I don’t ever test my code, then I will find no bugs”. Their way of doing software development with poor test coverage might help them to meet their deadlines in the short term, but they are most definitely heading for a complete disaster in the long term.
Software constantly changes and evolves with each engineer’s code submission to the code repository. But each of these changes brings a risk. Changes in one class, function, or component can easily break something somewhere else in the code.
The best and pretty much the only way to protect against this risk is software testing.
Without proper testing, these bugs can go unnoticed and accumulate. Then they will start causing problems. If you are lucky, your build is going to break when you try to compile your binaries, and you will spend some time debugging the root cause of the breakage. If you are particularly unlucky, your server is going to fail at 3 am on a Saturday while you are on call, and that is when you will be doing your debugging.
Without proper testing, software is inevitably going to decay.
Here is another way of looking at it: Either you do your testing properly at the very beginning as you’re developing your software, or your customers are going to be doing the testing of your software as they use your services or products. Believe me, you want to go with option 1 rather than option 2. Your company’s reputation depends on it.
Never think it is a good idea to impress your customer by skipping or scaling back the testing effort in order to deliver early. Customers would rather use a product that works properly than receive a faulty one a few weeks or months sooner.
Manual vs Automated Testing
Most companies have QA teams dedicated to the testing of their products. Some software testing teams do manual testing. If the product is a web service, they navigate to the newly developed web page and walk through the use case scenarios: they enter the necessary fields on the page, click the appropriate buttons, and make sure the expected results show up on the page. This process is pretty much the same for mobile development, except it is the mobile app that is being tested instead of a web page. If the product is a network system product (e.g. a network router), the testers may have to manually connect its cables, turn switches on and off, enter commands in a separate client terminal, and observe the results on the same terminal.
Manual testing is better than no testing at all. However, the best kind of testing is automated testing. And most - if not all - of these tests can be automated.
There are software tools like Selenium WebDriver that enable test programs to enter values into fields on a web page, press buttons, and analyze the resulting changes on the page. One can replace the manual testing of a web page with a fully automated Selenium WebDriver test suite that runs the same checks against that page. Likewise, there are software tools for running automated tests on mobile apps. There are even hardware/firmware tools that can automate the testing of a network router: the router can be connected to a switchboard with a controller that toggles its connections via software commands, automating most testing scenarios such as disconnecting cables and turning switches on or off.
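To make this concrete, here is a minimal sketch of what such an automated browser test might look like in Java, using Selenium WebDriver together with JUnit. The URL, element IDs, and expected banner text are hypothetical placeholders, and the test assumes a local ChromeDriver installation.

```java
import org.junit.jupiter.api.*;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

class LoginPageTest {
    private WebDriver driver;

    @BeforeEach
    void setUp() {
        driver = new ChromeDriver(); // assumes ChromeDriver is installed locally
    }

    @Test
    void successfulLoginShowsWelcomeMessage() {
        // Hypothetical URL and element IDs -- replace with your page's real ones.
        driver.get("https://example.com/login");
        driver.findElement(By.id("username")).sendKeys("test-user");
        driver.findElement(By.id("password")).sendKeys("test-password");
        driver.findElement(By.id("login-button")).click();

        String banner = driver.findElement(By.id("welcome-banner")).getText();
        Assertions.assertTrue(banner.contains("Welcome"));
    }

    @AfterEach
    void tearDown() {
        driver.quit();
    }
}
```

A test suite made of cases like this can replay the same scenarios a manual tester would click through, but in seconds instead of hours.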
Manual testing is a slow and tedious process. It is usually done only before the software product or service is released to production, with the QA test engineers manually running through each of the test scenarios. In some cases, going through the entire test suite can take days. This has the adverse side effect of slowing down the release frequency: it is hard to release your changes to production every week or two when the test process alone can take multiple days. There are teams that can only release their products once every couple of months because of lengthy, mostly manual release processes.
Automated test suites, on the other hand, run quickly once initiated, at the speed of running software. They can be set to run after each code change is submitted to the repository, not just right before the entire service is released to production. These are also known as Automated Regression Tests: regression tests ensure that the previously written code still works with the most recent changes. When they are fully automated, issues are caught and fixed much earlier. They also enable the team to release the service to production more frequently, on the order of weeks instead of months. I have even heard of teams that claimed to do production releases every day or two. This is only possible with automated testing.
Engineers Are Responsible For Writing Tests
Who is ultimately responsible for developing the automated test suite that tests the software? Is it the QA team?
No. It is the software engineers themselves who have this ultimate responsibility.
Software engineers are responsible for the proper test coverage of all the features they implement. For each production code change request they implement, they also need to implement the proper unit and integration test code that goes along with it at the same time. If they do it right, the change request is most likely going to contain more lines of testing code than actual production code. The test code should be reviewed by the peer reviewers of the change request just as carefully as the production code itself. Before the change request is merged with the main development branch (however your code repository is set up), the test code in the change request should be run by the CI/CD pipeline tools, along with all the preexisting tests in the other parts of the codebase that could be affected by this particular change request. This automated regression testing process should be a standard part of every development effort.
Does this mean there is no place for QA teams in an organization? No, I believe that organizations could always use QA teams with particular skill sets and roles, which I will cover below. But first, I need to explain a few important concepts regarding software testing.
Unit Tests vs Integration Tests vs End-to-End System Tests
You can test the functionality of your code using these different types of tests: Unit, Integration, and End-to-End System tests.
Unit Tests
Let’s start with unit tests. These are used to test a single function on a single class. They are the smallest type of test that involves very few moving pieces. Therefore, they are also the fastest to run.
Developing unit tests is usually pretty straightforward. There are established framework libraries for unit testing in each programming language, for example JUnit for Java. In the unit test, you just instantiate the class you want to test, call its methods with the appropriate test data, and see if you are getting the output you expected.
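For example, a minimal JUnit 5 test might look like the sketch below. The ShoppingCart class and its methods are hypothetical stand-ins for whatever class you are actually testing.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class ShoppingCartTest {

    @Test
    void totalPriceSumsAllItems() {
        // Instantiate the class under test and feed it test data.
        ShoppingCart cart = new ShoppingCart();
        cart.addItem("book", 12.50);
        cart.addItem("pen", 2.00);

        // Verify that the output matches what we expect.
        assertEquals(14.50, cart.totalPrice(), 0.001);
    }
}
```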
If the class is interacting with another class during the unit test, then things get a little more interesting. You usually use what is called a mock object which mocks that other class in the unit test. Let’s say you want to test a method on class A. But let’s say this class A method implementation ends up making a method call to another class B instance. You want to run your unit test on a single instance of class A without involving the instances of any other classes. What if you use the actual instance of class B in the test, and then class B ends up making yet another method call to an instance of a totally different class C? You would have to instantiate the entire chain of classes involved in this test scenario then. Therefore in this case, when you’re unit testing class A, you use a mock of class B. You can do this by defining a BMock class that inherits from class B, for instance. Then you implement whatever method on BMock that class A is supposed to call, and make sure it returns what class A would expect in that particular unit test. You can hardcode the values returned by the mocked methods in the various test scenarios. So, BMock is not an accurate implementation of the class B, but just its mock. When running the unit test, you pass the instance of BMock to the instance of class A, and make sure class A ends up using BMock instead of the actual class B.
There are many mocking frameworks in many different languages that make it very easy to implement mocks. In the example above, you don’t actually have to define a BMock class that inherits from class B. You just use a mocking framework library that mocks class B, and use that instance in your unit test. The mocking library also enables you to define the behavior of the method on class B that class A calls in the unit test.
Mocks can also help you test for adverse conditions in your software. Let's say that when the class A instance calls the class B instance, the method on class B might fail and return some kind of error to class A. How could you test this and make sure class A handles the error correctly? You could implement the same method on your BMock class to fail with the expected error, and unit test class A with it. Then you could ensure that class A handles this error the way you want it to.
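As a rough sketch, with a mocking framework such as Mockito the two scenarios above (a normal return and an error) might look like this. Classes A and B and their methods (fetchValue, compute) are hypothetical, just as in the discussion above, and the fallback value is an assumption about how A handles failures.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.*;

class ATest {

    @Test
    void computeUsesValueReturnedByB() {
        // Mock class B instead of instantiating the real chain of dependencies.
        B mockB = mock(B.class);
        when(mockB.fetchValue()).thenReturn(42);

        A a = new A(mockB);  // pass the mock to the class under test
        assertEquals(42, a.compute());
    }

    @Test
    void computeHandlesFailureFromB() {
        // Make the mocked dependency fail, to exercise A's error handling.
        B mockB = mock(B.class);
        when(mockB.fetchValue()).thenThrow(new IllegalStateException("backend down"));

        A a = new A(mockB);
        // Hypothetical expectation: A falls back to -1 when B fails.
        assertEquals(-1, a.compute());
    }
}
```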
Unit tests (and integration tests to some extent) have this advantage where you could use mocks and test fakes to simulate adverse conditions like errors, bad user input, network failures, etc. This enables you to test and make sure your software is well prepared for all those worst-case scenarios. It is more difficult to do this with the end-to-end system tests which don’t contain many mocks or test fakes by their very nature.
As a side note, test fakes are similar to mocks, but they involve a bit more custom-coded implementation: a fake is a simplified but genuinely working implementation of the real class, whereas a mock just returns simple predefined behaviors for each specific test scenario. The lines between them do get a bit blurry.
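For instance, a fake might be a small in-memory implementation of a real interface. The sketch below assumes a hypothetical PaymentGateway interface, purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// A test fake: a simplified but working implementation of the hypothetical
// PaymentGateway interface, backed by in-memory state instead of a real
// payment provider.
class FakePaymentGateway implements PaymentGateway {
    private final Map<String, Double> charges = new HashMap<>();

    @Override
    public String charge(String customerId, double amount) {
        String transactionId = "txn-" + charges.size();
        charges.put(transactionId, amount);
        return transactionId;
    }

    @Override
    public boolean refund(String transactionId) {
        return charges.remove(transactionId) != null;
    }

    // Test-only helper so assertions can inspect what was charged.
    double totalCharged() {
        return charges.values().stream().mapToDouble(Double::doubleValue).sum();
    }
}
```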
There is a small but important point I need to mention here: Do not get too carried away with mocking everything. Like all things in life, there is a balance to this. For example, if a unit test uses a data object, it is better to use the real thing (the actual data object) rather than its mock. A data object is an instance of a very simple class that exists solely to transport data between the different classes or components of the system. For instance, it could be an Address class that contains string fields for street name, unit number, city, state, zip code, etc. This Address instance could be used to store the address data in a database. Let's say you are developing an AddressValidator class that validates the correctness of an address entered by the user. The AddressValidator obviously operates on Address data objects. When you are writing the unit tests for the AddressValidator, there is no need to mock out the getters and setters on the Address class. Just use actual Address instances in the unit tests of the AddressValidator, populate the Address fields with appropriate test data, and run the unit tests with those.
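A minimal sketch of such a test, assuming hypothetical Address and AddressValidator classes along the lines described above:

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

class AddressValidatorTest {

    @Test
    void acceptsWellFormedAddress() {
        // Use a real Address data object -- no need to mock simple getters/setters.
        Address address = new Address("123 Main St", "Apt 4", "Springfield", "IL", "62704");

        AddressValidator validator = new AddressValidator();
        assertTrue(validator.isValid(address));
    }

    @Test
    void rejectsMalformedZipCode() {
        Address address = new Address("123 Main St", "Apt 4", "Springfield", "IL", "not-a-zip");

        AddressValidator validator = new AddressValidator();
        assertFalse(validator.isValid(address));
    }
}
```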
Using the standard unit test frameworks and mocking libraries makes it pretty straightforward for software engineers to implement their unit tests. Engineers can implement the various test cases and test scenarios without much help or tooling support from the QA team. When it comes to the integration or end-to-end (e2e) system tests, on the other hand, the picture is a bit different, as you'll now see.
Integration and End-to-End System Tests
Integration tests are where multiple classes, modules, or software components are tested together. Their interactions with the rest of the system can be mocked out with custom-written test fakes or similar mocking techniques that are also used for the unit tests. End-to-end (e2e) system tests are where the entire software system developed by the team is tested from end to end, hence the name.
Integration and e2e system tests give you a bit more assurance about how your system would behave compared to the unit tests, since all the various pieces of your actual software are tested together. By "actual software", I mean the software that you're going to deploy to production once it passes all the tests. If there are any issues with the interactions of your software components, they would be apparent in these system integration tests. However, compared to the smaller unit tests, these tests are slower to bring up and run. Integration and e2e system tests contain a lot more moving parts, which contributes to their relative slowness.
These tests are also trickier to implement. They may require a test framework that sets up a test environment and brings up the relevant servers and subsystems under test. This becomes even more difficult when your system is composed of multiple microservices working together instead of a single monolithic server: you have to bring up all of these servers in your test environment, in the correct order (which sometimes matters), and make sure they connect to each other correctly. In a system test, you also usually have to bring up an appropriate database that your servers can use while being tested. It is highly advisable not to use the actual production database when running these tests, as they might make unwanted changes to the database tables, or simply fill your production database with unwanted test data. This means you need to set up a test database or use an in-memory version of your database, using database-faking technologies that in some cases you may have to develop yourself.
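As one small illustration, an integration-style test can point the code under test at an in-memory database instead of the production one. The sketch below assumes plain JDBC with the H2 in-memory database and a hypothetical OrderRepository class; the table layout and data are made up for the example.

```java
import org.junit.jupiter.api.*;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import static org.junit.jupiter.api.Assertions.assertEquals;

class OrderRepositoryIntegrationTest {
    private Connection connection;

    @BeforeEach
    void setUp() throws Exception {
        // An in-memory H2 database stands in for the real production database.
        connection = DriverManager.getConnection("jdbc:h2:mem:testdb");
        try (Statement stmt = connection.createStatement()) {
            stmt.execute("CREATE TABLE orders (id INT PRIMARY KEY, customer VARCHAR(64))");
            stmt.execute("INSERT INTO orders VALUES (1, 'alice'), (2, 'bob')");
        }
    }

    @Test
    void findsOrdersForCustomer() throws Exception {
        // OrderRepository is a hypothetical class under test that runs SQL queries.
        OrderRepository repository = new OrderRepository(connection);
        assertEquals(1, repository.countOrdersFor("alice"));
    }

    @AfterEach
    void tearDown() throws Exception {
        connection.close();
    }
}
```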
This is where a QA team or a test tools team might come in handy.
The Roles of a QA or a Test Tools Team
A test tools team can implement the framework for system integration tests that sets up the test environment and brings up the servers and components to be tested. They can build stubs, mocks, and fakes for the external services your servers would normally connect to, by mocking the APIs of those external services. They can implement a test database that stores the necessary data while these tests run. They can write scripts and programs that run at the beginning of a test to populate this test database with appropriate initial data. Such a test tools team can even implement a few test cases that set an example for the rest of the software engineering team of how this new test framework can be used. But, as I said before, the software engineers themselves should be ultimately responsible for developing the actual test cases that cover all the features and use case scenarios they develop. And to reiterate a previous point, the software engineers can completely take care of the unit testing effort themselves, since a variety of standard unit testing and mocking frameworks already exist in most programming languages.
A test tools team needs to consist of software engineers itself, because implementing a test framework requires software development skills. In some companies today, there may be separate test tools teams and QA teams. Some QA teams are responsible for a lot of the manual testing work and may not possess many development skills. As you may have noticed, I am of the opinion that as many test cases as possible should be automated. I also believe that QA teams can consist of individuals with software development skills who can contribute to the automation of test cases. Companies could also train their QA engineers in the software development skills needed for this purpose. There is nothing like a company that truly invests in its people.
Moreover, the QA team can act as consultants to the software engineering teams that are responsible for feature development. They can come up with an initial test plan to set an example for such an engineering team. Over the course of the iterative development cycles, the engineers are most likely going to come up with even more test cases, and the overall test plan is going to evolve further. During these cycles, the QA team can periodically audit the engineers' test code and make sure proper test coverage has been implemented. At the very least, they can meet with the engineers every couple of weeks and discuss the state of the test coverage. The QA team can be there to advise the engineers on what kinds of testing and test scenarios are necessary, how those can be properly implemented, and how the necessary testing infrastructure can be developed.
The Test Pyramid - Not So Wondrous
As I’ve implied, there is a tradeoff between the smaller unit tests and larger system integration tests. Unit tests are much easier to implement as they don’t require a huge infrastructure investment to build. Plus, they are much faster to run. A typical unit test suite usually takes a couple of seconds to run. If it’s taking more than a minute or so to run to completion, that means there is something very wrong with this particular unit test suite. Running a system integration suite on the other hand can take multiple minutes. I have even heard of end-to-end system test suites running for multiple hours, although that shouldn’t be very typical. Tests that run for multiple hours should be avoided if possible.
While unit tests are easier to implement and faster to run, system integration tests provide much better assurance for the proper functioning of the system, as they test multiple components working together. The end-to-end system tests test all of the system components together by their very definition. This is an excellent measure of the functionality of your system before it’s deployed to production and used by your actual customers.
Among many software engineers, there is this concept of a Test Pyramid. According to this conventional wisdom, it is better to implement a lot of small unit tests (forming the base of the pyramid), a smaller number of medium-sized integration tests (the mid-section of the pyramid), and only a few large system tests (the top of the pyramid). The size of a test is determined by how long it takes to bring up the test environment and run the test case. System integration tests that bring up multiple servers and run the tests using remote procedure calls (RPCs) between those servers are naturally considered large.
Here is what I have to say about all this: Yes, the system integration tests are harder to implement and slower to run. However, they shouldn't be ignored and simply relegated to the "top of a test pyramid" in small numbers. What ends up happening is that some teams implement tons of unit tests but neglect their integration test coverage, implementing only a handful of system integration tests.
When a software system fails, it is usually due to the interaction of its multiple subcomponents. Some remote procedure calls or RESTful API calls between the servers fail due to networking errors, which results in cascading failures throughout the system. Or multiple servers are required to make writes to their respective databases, but one of their transactions fails for some reason while the other server's transaction goes through. The system then ends up with a data inconsistency, which keeps haunting it, causing mysterious issues and server failures.
The only proper way to test all this is with system integration tests. While unit tests are also necessary and shouldn't be ignored either, proper investment needs to be made into developing the system integration tests as well. A system integration test suite that tests multiple scenarios, including the failure scenarios between the various system components, is invaluable in assuring the quality of the system to be delivered to production.
This does require a lot of investment. Not only are you going to have to implement a proper system integration test infrastructure that brings up all the necessary components, services, and test databases, you are also going to have to implement additional components that enable testing of the various adversarial scenarios. To illustrate, you may have to implement proxy components between the servers that can simulate a network failure while one server is making a remote procedure call or a RESTful API call to another, and use these proxies in various adversarial test scenarios.
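To make that concrete, here is a minimal sketch of what such a fault-injecting proxy could look like in Java. It is an illustrative toy, not production-grade infrastructure, and the class and method names are hypothetical. A test would point one server at the proxy's listen port instead of directly at the other server, then flip setFailConnections(true) to simulate a network outage.

```java
import java.io.*;
import java.net.*;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy TCP proxy that forwards traffic to a target server and can be told
// to drop new connections, simulating a network failure between two servers.
public class FaultInjectingProxy implements Closeable {
    private final ServerSocket listener;
    private final String targetHost;
    private final int targetPort;
    private final AtomicBoolean failConnections = new AtomicBoolean(false);
    private volatile boolean running = true;

    public FaultInjectingProxy(int listenPort, String targetHost, int targetPort) throws IOException {
        this.listener = new ServerSocket(listenPort);
        this.targetHost = targetHost;
        this.targetPort = targetPort;
        new Thread(this::acceptLoop).start();
    }

    /** Flip this in a test to simulate a network outage for new connections. */
    public void setFailConnections(boolean fail) { failConnections.set(fail); }

    private void acceptLoop() {
        while (running) {
            try {
                Socket client = listener.accept();
                if (failConnections.get()) { client.close(); continue; } // drop the connection
                Socket target = new Socket(targetHost, targetPort);
                pipe(client, target);  // client -> target
                pipe(target, client);  // target -> client
            } catch (IOException e) {
                // Listener closed or target unreachable; keep looping until stopped.
            }
        }
    }

    private void pipe(Socket from, Socket to) {
        new Thread(() -> {
            try (InputStream in = from.getInputStream(); OutputStream out = to.getOutputStream()) {
                in.transferTo(out);
            } catch (IOException ignored) {
                // Connection torn down; nothing to do in this toy example.
            }
        }).start();
    }

    @Override
    public void close() throws IOException {
        running = false;
        listener.close();
    }
}
```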
While you can anticipate some of these failure scenarios in advance and test for them, others you may only find out about after your server has crashed in production. In those cases, you will at least know which parts of your system to focus your testing efforts on, and you can develop the proper test coverage to prevent further such issues in the future.
Software engineering is a lot of things: Fun, fulfilling, creative, lucrative… But it is never ever easy.
Non-Functional Tests
All of the types of tests I’ve mentioned so far (unit tests, system integration tests, etc.) have been examples of functional tests. They are meant to test whether your system, subsystem, class, or function is functioning correctly: Sending certain inputs to it, testing whether these inputs are processed the correct way and whether the expected outputs are produced.
However, there are also non-functional tests, which test beyond the simple functionality of a system.
There are Performance Tests that test the speed and reliability of a system: For example, how fast your program runs, and whether your latest code changes have caused any slowdown in your overall runtime.
For APIs, and for web services in particular, there are Load Tests that check how the service behaves under the regular, expected load of requests. There are also Stress Tests, with which you can test your system under extreme loads, using a very high QPS (queries per second) of simulated user or client requests.
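As a rough illustration, a very basic load test can be written with the JDK's built-in HTTP client and a thread pool. Real load testing is usually done with dedicated tools, and the target URL, thread count, and request count below are made-up placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SimpleLoadTest {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Hypothetical endpoint -- replace with the service under test.
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/api/health")).GET().build();

        int totalRequests = 1_000;
        ExecutorService pool = Executors.newFixedThreadPool(50);  // 50 concurrent "users"
        AtomicLong failures = new AtomicLong();
        AtomicLong totalLatencyMillis = new AtomicLong();

        for (int i = 0; i < totalRequests; i++) {
            pool.submit(() -> {
                long start = System.nanoTime();
                try {
                    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
                    if (response.statusCode() >= 500) failures.incrementAndGet();
                } catch (Exception e) {
                    failures.incrementAndGet();
                }
                totalLatencyMillis.addAndGet((System.nanoTime() - start) / 1_000_000);
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);

        System.out.printf("requests=%d failures=%d avg latency=%d ms%n",
                totalRequests, failures.get(), totalLatencyMillis.get() / totalRequests);
    }
}
```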
There are Security Tests, which test the security defense mechanisms of your system. They check whether your system can withstand various attacks, such as cross-site scripting attacks, database injection attacks, and so forth.
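For example, a small security-oriented test might verify that user-supplied content is escaped before being rendered into a page. The CommentRenderer class below is a hypothetical stand-in for whatever component does that job in your system.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertFalse;

class CommentRendererSecurityTest {

    @Test
    void scriptTagsAreNotRenderedVerbatim() {
        // A classic cross-site scripting payload supplied as user input.
        String malicious = "<script>alert('xss')</script>";

        CommentRenderer renderer = new CommentRenderer();
        String html = renderer.render(malicious);

        // The raw script tag must not survive into the rendered HTML.
        assertFalse(html.contains("<script>"));
    }
}
```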
The non-functional tests are as important as the functional tests. Whether you are releasing a new software product/service to your clients, or whether you are making updates to an existing product/service, the non-functional tests should be a part of your automated testing suite as much as the functional tests.
There is one suggestion I want to make here: run these non-functional tests in a test environment that resembles your actual production environment as closely as possible. For instance, if you're running a load test, make sure the servers under test use a copy of the actual database you would be using in production (e.g. MySQL, Postgres, etc.), not an in-memory test database. And make sure the types of requests you're sending to your server closely resemble the requests it would normally receive in production. If your server normally receives a lot of requests that result in database writes, but your load test is just sending a bunch of database read requests, then that is not a very valuable load test.
Testing and the Development Velocity
It might seem counterintuitive for companies to invest so much time and so many resources in testing. I make the claim that, under normal circumstances, most of the code produced by software engineers should be test code, not production code. One might think this would slow down the development velocity, i.e. how fast teams can implement new software features. However, according to my own observations, and according to a lot of software engineers wiser than me, the opposite is true: more testing increases the overall development velocity.
The more time and resources are invested in proper software testing practices from the very beginning of the development cycle, the less time is spent fighting bugs and weird issues in the long term. The software teams get to spend more of their time developing new features instead of trying to debug mysterious issues that constantly plague the system.
As one old saying goes: Testing rocks, debugging sucks.
Without proper testing practices, in the long term, the quality of your company’s product or service is going to be affected adversely. Your company’s reputation can take a huge hit if a low quality product or service is released to your customers.
Reputation is everything. Once it’s gone, it is very hard to bring it back.