There are existing answers to the question of how to test classes that connect to a database, e.g. “Should service test classes connect …” and “Unit testing – Database coupled app”.
So, in short: let’s assume you have a class A that needs to connect to a database. Instead of letting A actually connect, you give A an interface that A may use to connect. For testing, you implement this interface with a fake – without connecting, of course. But if class B instantiates A, B has to pass a “real” database connection to A, which means B opens a database connection. So to test B, you inject the connection into B instead. But B is instantiated in class C, and so forth.
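In code, the chain I mean looks roughly like this (the names are just placeholders):

```java
// Placeholder sketch of the injection chain described above.
interface DbGateway {
    String fetchName(int id);            // whatever A needs from the database
}

class A {
    private final DbGateway db;
    A(DbGateway db) { this.db = db; }    // A never opens the connection itself
    String greet(int id) { return "Hello, " + db.fetchName(id); }
}

class B {
    private final A a;
    B(A a) { this.a = a; }               // whoever builds B must supply A, and therefore
                                         // a real DbGateway, and so on up the chain
    String welcome(int id) { return a.greet(id) + "!"; }
}
```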
So at which point do I have to say “here I fetch data from a database and I will not write a unit test for this piece of code”?
In other words: somewhere in the code, some class must call sqlDB.connect() or something similar. How do I test that class?
And is it the same with code that has to deal with a GUI or a file system?
I want to do unit tests; any other kind of test is not related to my question. I know that a unit test will only test one class (I very much agree with you, Kilian). Now, some class has to connect to a DB. If I want to test this class and ask “how do I do this?”, many say: “Use dependency injection!” But that only shifts the problem to another class, doesn’t it? So I ask: how do I test the class that really, really establishes the connection?
Bonus question: some answers here boil down to “use mock objects!” What does that mean? I mock the classes that the class under test depends upon. Should I now mock the class under test itself and actually test the mock (which comes close to the idea of using the Template Method pattern, see below)?
The point of a unit test is to test one class (in fact, it should usually test one method).
This means that when you test class A, you inject a test database into it – something self-written, or a lightning-fast in-memory database, whatever gets the job done.

However, if you test class B, which is a client of A, then usually you mock the entire A object with something else – presumably something that does its job in a primitive, pre-programmed way, without using an actual A object and certainly without using a database (unless A passes the entire database connection back to its caller, but that’s so horrible I don’t want to think about it). Likewise, when you write a unit test for class C, which is a client of B, you would mock something that takes the role of B and forget about A altogether.
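As a rough illustration (the class and method names are made up, not from the question), a hand-written fake of A lets you test B without any database:

```java
// Hypothetical sketch: to unit-test B, replace A with a primitive,
// pre-programmed fake so no database is involved.
class A {
    String loadTitle(int id) {
        // in production this would query the database
        throw new UnsupportedOperationException("real A needs a database");
    }
}

class B {
    private final A a;
    B(A a) { this.a = a; }
    String shout(int id) { return a.loadTitle(id).toUpperCase(); }
}

class FakeA extends A {
    @Override
    String loadTitle(int id) { return "hello"; }   // canned answer, no database
}

// In a JUnit test:
// assertEquals("HELLO", new B(new FakeA()).shout(42));
```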
If you don’t do that, it’s no longer a unit test but a system or integration test. Those are very important as well, but a whole different kettle of fish. To begin with, they are usually more effort to set up and run, and it isn’t practical to demand that they pass as a precondition for every check-in.
Performing unit tests against a database connection is perfectly normal, and a common practice. It’s simply not possible to create a purist approach where everything in your system is dependency injectable.
The key here is to test against a temporary or testing-only database, and to have the lightest possible start-up process for building that test database.
For unit testing in CakePHP there are things called fixtures. Fixtures are temporary database tables created on the fly for a unit test. The fixture has convenience methods for creating them: they can recreate a schema from a production database inside the testing database, or you can define the schema using a simple notation.
The key to success here is not to reproduce the entire business database, but to focus only on the aspect of the code you are testing. If you have a unit test that verifies a data model reads only published documents, then the table schema for that test only needs the fields required by that code. You don’t have to re-implement an entire content-management database just to test that code.
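The same idea works outside CakePHP too. Here is a hedged sketch in Java using an in-memory H2 database (assuming the H2 driver is on the classpath; the table and column names are invented for the example):

```java
// Sketch: build only the table the test needs, in a throwaway
// in-memory database, rather than the whole production schema.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PublishedDocsTest {
    public static void main(String[] args) throws Exception {
        try (Connection c = DriverManager.getConnection("jdbc:h2:mem:test");
             Statement s = c.createStatement()) {
            // Only the columns this test actually touches.
            s.execute("CREATE TABLE documents (id INT, published BOOLEAN)");
            s.execute("INSERT INTO documents VALUES (1, TRUE), (2, FALSE)");

            ResultSet rs = s.executeQuery(
                "SELECT COUNT(*) FROM documents WHERE published = TRUE");
            rs.next();
            System.out.println(rs.getInt(1));   // expect 1: only the published row
        }
    }
}
```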
Some additional references.
http://en.wikipedia.org/wiki/Test_fixture
http://phpunit.de/manual/3.7/en/database.html
http://book.cakephp.org/2.0/en/development/testing.html#fixtures
There is, somewhere in your codebase, a line of code that performs the actual act of connecting to the remote DB. Nine times out of ten, this line is a call to a “built-in” method provided by the runtime libraries of your language and environment. As such, it’s not “your” code, and you don’t need to test it; for the purposes of a unit test, you can trust that this call will perform correctly. What you can, and should, still test in your unit test suite is that the parameters used for that call are what you expect them to be: the connection string, the SQL statement, the stored procedure name.
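As a sketch of that (all names here are hypothetical), you can hide the driver call behind a thin wrapper and assert on the SQL and parameters your code hands to it, without ever connecting:

```java
// Hypothetical sketch: unit-test the parameters handed to the driver,
// not the driver itself.
import java.util.ArrayList;
import java.util.List;

interface QueryRunner {
    void run(String sql, Object... params);   // thin wrapper over the real driver call
}

class OrderRepository {
    private final QueryRunner runner;
    OrderRepository(QueryRunner runner) { this.runner = runner; }
    void cancel(int orderId) {
        runner.run("UPDATE orders SET status = ? WHERE id = ?", "CANCELLED", orderId);
    }
}

class RecordingRunner implements QueryRunner {
    final List<String> statements = new ArrayList<>();
    public void run(String sql, Object... params) { statements.add(sql); }
}

// In a test:
// RecordingRunner spy = new RecordingRunner();
// new OrderRepository(spy).cancel(7);
// assertEquals("UPDATE orders SET status = ? WHERE id = ?", spy.statements.get(0));
```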
This is one of the purposes behind the restriction that unit tests should not leave their runtime “sandbox” and be dependent upon external state. It’s actually quite practical; the purpose of a unit test is to verify that the code you wrote (or are about to write, in TDD) behaves the way you thought it would. Code that you didn’t write, such as the library you are using to perform your database operations, shouldn’t be part of the scope of any unit test, for the very simple reason that you didn’t write it.
In your integration test suite, these restrictions are relaxed. Now you can design tests that touch the database, to make sure that the code you did write plays nicely with code you didn’t. These two test suites should remain segregated, however, because your unit test suite is more effective the faster it runs (so you can quickly verify that all the assertions made by developers about their code still hold), and almost by definition, an integration test is slower by orders of magnitude because of the added dependencies on external resources. Let the build-bot handle running your full integration suite every few hours, executing the tests that lock up external resources, so that the developers aren’t stepping on each other’s toes by running these same tests locally. And if the build breaks, so what? A lot more importance is placed on ensuring the build-bot never fails a build than probably should be.
Now, how strictly you can adhere to this depends on your exact strategy for connecting to and querying the database. In many cases where you must use a “bare-bones” data access framework, such as ADO.NET’s SqlConnection and SqlCommand objects, an entire method you develop may consist of built-in method calls and other code that depends on having a database connection, so the best you can do in this situation is mock the entire function and trust your integration test suites. It also depends on how willing you are to design your classes so that specific lines of code can be replaced for testing purposes (such as Tobi’s suggestion of the Template Method pattern, which is a good one because it allows “partial mocks” that exercise some methods of a real class while overriding others that have side effects).
If your data persistence model relies on code in your data layer (such as triggers, stored procs, etc) then there simply is no other way to exercise code you yourself are writing than to develop tests that either live inside the data layer or cross the boundary between your application runtime and the DBMS. A purist would say this pattern, for this reason, is to be avoided in favor of something like an ORM. I don’t think I’d go quite that far; even in the age of language-integrated queries and other compiler-checked, domain-dependent persistence operations, I see the value in locking the database down to only the operations exposed via stored procedure, and of course such stored procedures must be verified using automated tests. But, such tests are not unit tests. They are integration tests.
If you have a problem with this distinction, it’s usually based on a high importance placed on complete “code coverage” aka “unit test coverage”. You want to ensure every line of your code is covered by a unit test. A noble goal on its face, but I say hogwash; that mentality lends itself to anti-patterns stretching far beyond this specific case, such as writing assertionless tests that execute but do not exercise your code. These types of end-runs solely for the sake of coverage numbers are more harmful than relaxing your minimum coverage. If you want to ensure that every line of your codebase is executed by some automated test, then that’s easy; when computing code coverage metrics, include the integration tests. You could even go one step further and isolate these disputed “Itino” tests (“Integration in name only”), and between your unit test suite and this sub-category of integration tests (which should still run reasonably fast) you should get darn near close to full coverage.
Unit tests should never connect to a database. By definition, they should test a single unit of code each (a method) in total isolation from the rest of your system. If they don’t, then they are not a unit test.
Semantics aside, there are a myriad of reasons why this is beneficial:
- Tests run orders of magnitude faster
- Feedback loop becomes instant (<1s feedback for TDD, as an example)
- Tests can be run in parallel for build/deploy systems
- Tests don’t need a database to be running (makes build much easier, or at least quicker)
Unit tests are a way to check your work. They should outline all of the scenarios for a given method, which typically means all of the different paths through a method. It is your specification that you are building to, similar to double-entry bookkeeping.
What you are describing is another type of automated test: an integration test. While those are also very important, ideally you will have far fewer of them. They should verify that a group of units integrates correctly with each other.
So how do you test things with database access? All of your data-access code should live in a specific layer, so your application code can interact with mockable services instead of the actual database. It shouldn’t care whether those services are backed by some kind of SQL database, in-memory test data, or even a remote web service; that is not its concern.
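A minimal sketch of such a layer, with made-up names, might look like this:

```java
// Hypothetical sketch of a data-access layer the application code can mock.
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

interface UserRepository {
    Optional<String> findPasswordHash(String username);
}

// A production implementation would wrap JDBC or an ORM; tests use this instead.
class InMemoryUserRepository implements UserRepository {
    private final Map<String, String> hashes = new HashMap<>();
    void add(String username, String hash) { hashes.put(username, hash); }
    public Optional<String> findPasswordHash(String username) {
        return Optional.ofNullable(hashes.get(username));
    }
}

class LoginService {
    private final UserRepository users;
    LoginService(UserRepository users) { this.users = users; }
    boolean canLogIn(String username, String hash) {
        return users.findPasswordHash(username).map(hash::equals).orElse(false);
    }
}

// In a test:
// InMemoryUserRepository repo = new InMemoryUserRepository();
// repo.add("alice", "abc123");
// assertTrue(new LoginService(repo).canLogIn("alice", "abc123"));
```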
Ideally (and this is very subjective), you want the bulk of your code covered by unit tests. This gives you confidence that each piece works independently. Example: when I hash the user’s password, I should get this exact output.

Once the pieces are built, you need to put them together. Let’s say each component is made up of roughly five classes – you’d want to test the points of failure within them. That equates to far fewer tests, just enough to ensure everything is wired up properly. Example: I can find the user in the database given a username/password.
Finally, you want some acceptance tests to ensure you are actually meeting the business objectives. There are even fewer of these; they may simply check that the application runs and does what it was built to do. Example: given this test data, I should be able to log in.
Think of these three types of tests as a pyramid. You need a lot of unit tests to support everything, and then you work your way up from there.
Testing with external data is an integration test. A unit test means you are testing the unit only, which mostly means your business logic. To make your code unit-testable you have to follow some guidelines, e.g. keep each unit independent from the other parts of your code. If a unit needs data during a test, you inject that data (or a stub that supplies it) via dependency injection. There are mocking and stubbing frameworks out there for exactly this (sketched below).
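For example, a hedged sketch using Mockito (the PriceSource interface is invented for the illustration):

```java
// Hypothetical sketch using Mockito to stub out a data dependency.
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

interface PriceSource {
    double priceOf(String sku);        // would hit the database in production
}

class Basket {
    private final PriceSource prices;
    Basket(PriceSource prices) { this.prices = prices; }
    double total(String... skus) {
        double sum = 0;
        for (String sku : skus) sum += prices.priceOf(sku);
        return sum;
    }
}

// In a JUnit test:
// PriceSource stub = mock(PriceSource.class);
// when(stub.priceOf("apple")).thenReturn(2.0);
// when(stub.priceOf("pear")).thenReturn(3.0);
// assertEquals(5.0, new Basket(stub).total("apple", "pear"), 0.001);
```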
The Template Method pattern might help.

You wrap the calls to the database in protected methods. To test this class, you actually test a fake object that inherits from the real database-connection class and overrides those protected methods.

This way the actual calls to the database are never under unit test, that’s right. But it’s only these few lines of code, and that is acceptable.
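A minimal sketch of that idea, with invented names:

```java
// Hypothetical sketch of the Template Method approach: the database call
// lives in a protected method, which the test subclass overrides.
class DocumentReader {
    public String readTitle(int id) {
        String raw = fetchTitleFromDb(id);       // the only line that touches the DB
        return raw.trim().toUpperCase();
    }

    protected String fetchTitleFromDb(int id) {
        // real implementation: open connection, run query, etc.
        throw new UnsupportedOperationException("not under unit test");
    }
}

class FakeDocumentReader extends DocumentReader {
    @Override
    protected String fetchTitleFromDb(int id) {
        return "  hello world ";                 // canned value, no database
    }
}

// In a test:
// assertEquals("HELLO WORLD", new FakeDocumentReader().readTitle(1));
```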