The company I currently work for has a very large Utilities library. It was written a few years back (long before I joined) and has grown organically over time so that it now does everything:
- Sending Emails
- Exporting files in different formats, e.g. CSV, tab-delimited
- Receiving data from 3rd party providers, e.g. Bloomberg, FactSet
- Sending files to SFTP sites
- Generic database connection
The list goes on and it is pretty ugly really.
The original developer here used it in every application that he wrote, for convenience’s sake, and added to it as necessary. The Utilities library itself references 3rd party APIs to receive data from Bloomberg and FactSet. This means that any project that wants to reference the Utilities library also needs to carry the Bloomberg and FactSet APIs with it. The list is growing, and it is becoming unmaintainable: all these 3rd party DLLs have to accompany the Utilities project because they are required to compile it, whether or not the referencing application needs them.
I want to split this entire library into more manageable libraries, e.g.:
- CompanyName.Email
- CompanyName.FileExporter
- CompanyName.Bloomberg
Each project will then reference only the libraries it needs to get its job done, e.g. reference CompanyName.Email to send emails.
I have no issues doing that, but the complication facing me is that some of the new libraries will require their own database connection and queries.
Take the following example:
[HolidayChecker]
[HolidayChecker.DAL]
We store a list of holiday days in the database. What I would want the HolidayChecker project to do is expose a method (amongst others) to check if a particular date is a holiday.
using System;

public interface IHolidayChecker
{
    bool IsHoliday(DateTime dateToCheck);
}

public class HolidayChecker : IHolidayChecker
{
    public bool IsHoliday(DateTime dateToCheck)
    {
        // Query database for date logic...
        throw new NotImplementedException(); // placeholder so the sketch compiles
    }
}
Many applications will then reference this HolidayChecker project and use the IsHoliday method. These applications will also have their own connection and queries to the database to perform their own tasks.
[GenericApplication (References HolidayChecker)]
[GenericApplication.BusinessLogic]
[GenericApplication.DAL]
If the application and the HolidayChecker project both access the database and query the data, this strikes me as a strong code smell. Is anyone aware of a tidier way of achieving this, if there is one? Or is this something I will just have to live with?
EDIT:
Thinking about this overnight, I don’t think it is as bad as I first thought. Projects like ELMAH are standalone and also connect to the database and run their own queries. As @Allan pointed out, it is just a matter of designing my HolidayChecker project correctly.
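One way such a design could look, as a minimal sketch only: the checker depends on a small storage abstraction, so the library owns its holiday query while the host application decides how the connection is supplied. `IHolidayRepository` and `InMemoryHolidayRepository` are hypothetical names introduced for illustration; a real HolidayChecker.DAL implementation would query the holiday table instead of an in-memory set.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction: the only thing HolidayChecker knows about storage.
public interface IHolidayRepository
{
    bool Exists(DateTime date);
}

public interface IHolidayChecker
{
    bool IsHoliday(DateTime dateToCheck);
}

public class HolidayChecker : IHolidayChecker
{
    private readonly IHolidayRepository _repository;

    public HolidayChecker(IHolidayRepository repository)
    {
        if (repository == null) throw new ArgumentNullException(nameof(repository));
        _repository = repository;
    }

    public bool IsHoliday(DateTime dateToCheck)
    {
        // Only the calendar date matters; the time of day is ignored.
        return _repository.Exists(dateToCheck.Date);
    }
}

// Stand-in for HolidayChecker.DAL so the sketch runs without a database.
public class InMemoryHolidayRepository : IHolidayRepository
{
    private readonly HashSet<DateTime> _holidays = new HashSet<DateTime>();

    public InMemoryHolidayRepository(IEnumerable<DateTime> holidays)
    {
        foreach (var holiday in holidays)
        {
            _holidays.Add(holiday.Date);
        }
    }

    public bool Exists(DateTime date)
    {
        return _holidays.Contains(date.Date);
    }
}
```

With this shape, GenericApplication keeps its own DAL for its own queries and only hands HolidayChecker a repository, so neither side needs to know about the other's SQL.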
0
I would suggest that organization may be more important than separation. Splitting things into separate DLLs could be useful when a client library needs only part of the total functionality.
My suggestion would be to follow the DRY principle: don’t duplicate code. Shared code can be put into a base library that is shared among the more specialized ones. Just be careful that you don’t get yourself into a dependency/DLL hell where you have to mess with an entire dependency tree…
To get around the DLL hell, I would recommend creating a single solution (.sln) file for the whole thing and setting up the separate DLLs as projects within it. That way you can rebuild everything at once, but still have some flexibility.
Another thought: as you work through your code base, you may find that organizing by business use case might not be as effective as by type of functionality or something else.
2
Wait. Do you have an issue in the first place? What are you trying to solve?
Too large
You say you have a library which is too large. What does that mean? It may mean that:
- The binary is so huge that it is hardly possible to distribute it. Nobody wants to include (and so distribute) a 100 MB DLL file in order to use only a few of its methods.
  Is this your case? How much space does it take? 100 MB? 1 GB? I’ve rarely seen .NET libraries larger than a few megabytes. Given current average connection speeds, the price per GB of memory and disk space, etc., one megabyte more or less doesn’t matter, and obviously isn’t worth wasting days, weeks or months of your time.
- The library contains so many methods that you can’t find the one you need.
  This is not a problem of size, but rather an issue of organization. Create additional namespaces and, if OOP wasn’t understood by your predecessor, refactor heavily, but don’t split a large library into multiple smaller ones just because the code is a mess: it probably won’t solve the problem.
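As a rough illustration of organizing by namespace inside one assembly, the sketch below keeps email and export helpers in a single library but under separate namespaces. The namespace and class names (`CompanyName.Utilities.Email`, `EmailSubject`, `CsvLine`) are hypothetical, not taken from the actual Utilities library.

```csharp
using System;
using System.Linq;

namespace CompanyName.Utilities.Email
{
    public static class EmailSubject
    {
        // Prepends a bracketed tag, e.g. "[PROD] Nightly export finished".
        public static string WithPrefix(string prefix, string subject)
        {
            return "[" + prefix + "] " + subject;
        }
    }
}

namespace CompanyName.Utilities.Export
{
    public static class CsvLine
    {
        // Joins fields with commas, quoting any field that contains
        // a comma, a double quote or a line break.
        public static string Join(params string[] fields)
        {
            return string.Join(",", fields.Select(Escape));
        }

        private static string Escape(string field)
        {
            bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
            return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
        }
    }
}
```

A caller then writes `using CompanyName.Utilities.Export;` and only sees the export types in IntelliSense, which addresses the discoverability problem without adding a second DLL.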
Too dependent
You say any project which references the utilities library should also reference the different APIs. How are those APIs implemented? What makes it mandatory to reference them? Ordinarily, in .NET Framework, when project A references project B, which references project C, A should reference C if and only if A uses C through B. So to solve this particular problem, just fix the way those APIs are referenced.
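A sketch of what "A does not use C through B" can mean in practice: the vendor type is kept out of every public signature of the utilities library, so callers see only built-in .NET types. `Vendor.VendorSession` is a hypothetical stand-in that returns a dummy value; it is not the real Bloomberg or FactSet API, and `PriceReader` is likewise an invented name.

```csharp
using System;

// Project C: hypothetical stand-in for a third-party vendor assembly.
namespace Vendor
{
    public class VendorSession
    {
        // Dummy value so the sketch runs; a real API would call the provider.
        public decimal RequestLastPrice(string ticker)
        {
            return ticker.Length * 10m;
        }
    }
}

// Project B: the utilities library.
namespace CompanyName.Utilities
{
    public class PriceReader
    {
        // The vendor type appears only in private members,
        // never in a public signature or base type.
        private readonly Vendor.VendorSession _session = new Vendor.VendorSession();

        public decimal GetLastPrice(string ticker)
        {
            if (string.IsNullOrWhiteSpace(ticker))
            {
                throw new ArgumentException("Ticker is required.", nameof(ticker));
            }
            return _session.RequestLastPrice(ticker);
        }
    }
}
```

Project A calls `PriceReader.GetLastPrice` and deals only with `string` and `decimal`. Under the rule above it would typically not need its own compile-time reference to the vendor assembly, although the vendor DLL still has to be deployed alongside the application at runtime.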
Too slow?
You might also say that a large library is slower to load. Don’t. Use a profiler, gather metrics, and if and only if those metrics show that there is a huge gap between loading a 1 MB library and loading, say, four 150 KB libraries, you may optimize. Without precise profiler results, talk about performance remains pure speculation.
So, one library for everything?
There are reasons to split a large library into several smaller ones.
- Compile times.
  On a not-very-fast PC, a large library will take too long to build. If you’re actively working on one part of the library while other parts remain pretty stable, it may be annoying to be forced to rebuild everything at every change. Putting the code you’re working on into a dedicated library may make local compilation faster.
  I know, compiling a library, even a large one, is pretty fast on any decent configuration. Agreed, but what about Code Contracts, for example? On my machine, a project may compile within a few seconds, then take minutes to do the static checking (even with caching enabled, very surprisingly). A smaller library means faster results, and so shorter cycles.
- Security.
  Redistributing a library means it may be subject to reverse engineering. If this is a concern, the less code you ship, the lower the risk of it being stolen and reused.
- Layers.
  You may need to be sure that some parts of the library are not using other parts. For example, in an ordinary web application, the data access layer is not expected to use the presentation layer.
  To enforce those rules, you’ll ordinarily use a Layer Diagram in Visual Studio, specifying which libraries can reference which others. If all the code is in one library, there is no easy way to check automatically that one part of the library is not using another.
- Frequent updates.
  If the library is referenced by an application which is used by hundreds of thousands of people and the project is updated nearly every day, reducing the library by 100 KB means saving 20 GB of bandwidth daily (200,000 downloads × 100 KB), which is not so bad.
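The bandwidth figure can be checked with a few lines of arithmetic. The sketch below assumes 200,000 users (one reading of "hundreds of thousands"), one update per day, and decimal units (1 GB = 1,000,000 KB); `BandwidthEstimate` is a hypothetical helper written only for this check.

```csharp
public static class BandwidthEstimate
{
    // Decimal units: 1 GB = 1,000,000 KB.
    private const double KbPerGb = 1000000.0;

    // Daily bandwidth saved, in GB, when every user downloads each update.
    public static double DailySavingsGb(long users, double savedKbPerDownload, int updatesPerDay)
    {
        return users * savedKbPerDownload * updatesPerDay / KbPerGb;
    }
}
```

For example, `BandwidthEstimate.DailySavingsGb(200000, 100, 1)` evaluates to 20 GB per day; with only 10,000 users the same 100 KB reduction saves about 1 GB, which is why this argument applies only at large scale.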
Conclusion
No, a large library is not a code smell and has nothing to do with code smells. It may have some drawbacks, but only in a few particular circumstances.
I would be more concerned if there were, say, fifty small libraries. They are difficult to organize and difficult to deal with, and more importantly, developers are less inclined to move code from one library to another than from one namespace to another, which would sooner or later lead to duplication of code.
3
If there are a lot of heavy shared functions, I would move as much of the implementation as possible into a REST-based web service. Then all the library assemblies could be made into lightweight wrappers using only the built-in .NET libraries.
If you can get away from distributing client API DLLs (e.g. Bloomberg) and keep them in one place, it becomes much easier to swap them out or replace them in the future.
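A lightweight wrapper of that kind might look like the sketch below. Everything specific here is an assumption made for illustration: the `holidays/{date}` route, the plain-text `true`/`false` response body, and the `HolidayServiceClient` name. The only dependency is `HttpClient` from the .NET base class library.

```csharp
using System;
using System.Globalization;
using System.Net.Http;
using System.Threading.Tasks;

public class HolidayServiceClient
{
    private readonly HttpClient _http;
    private readonly Uri _baseAddress;

    // baseAddress should end with a slash so that relative paths append to it.
    public HolidayServiceClient(HttpClient http, Uri baseAddress)
    {
        if (http == null) throw new ArgumentNullException(nameof(http));
        if (baseAddress == null) throw new ArgumentNullException(nameof(baseAddress));
        _http = http;
        _baseAddress = baseAddress;
    }

    // Builds the request URI for the assumed route, e.g. {base}holidays/2024-12-25.
    public Uri BuildUri(DateTime date)
    {
        string day = date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        return new Uri(_baseAddress, "holidays/" + day);
    }

    // Assumes the service answers with the text "true" or "false".
    public async Task<bool> IsHolidayAsync(DateTime date)
    {
        string body = await _http.GetStringAsync(BuildUri(date)).ConfigureAwait(false);
        return bool.Parse(body.Trim());
    }
}
```

The database query and any vendor DLLs then live only on the server that hosts the service, and client applications ship nothing beyond this wrapper.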
1
If I need to deal with the database, I would like to have a single library that provides all the functionality for that database. It’s OK if you have a large database with lots of operations; the key is to divide the operations into logical namespaces and classes.
You just want to make sure you are not using the database for a lot of unrelated things. That would be like having a class with low cohesion (a weak connection between its public methods). If this is the case, split it into two databases, and then you can have one library for each.