I’ve inherited a .NET project that has close to 2,000 clients out in the field that need to push data periodically up to a central repository. The clients wake up and attempt to push the data up via a series of WCF web services, passing each Entity Framework entity as a parameter. Once the service receives an object, it performs some business logic on the data and then sticks it in its own database, which mirrors the database on the client machines.
The trick is that this data is being transmitted over a metered connection, which is very expensive, so optimizing the data is a serious priority. We are already using a custom encoder that compresses the data while it is being transmitted (and decompresses it on the other end), and this does reduce the data footprint. However, the amount of data the clients are using seems ridiculously large given the amount of information that is actually being transmitted.
It seems to me that Entity Framework itself may be to blame. I suspect the objects are very large when serialized to be sent over the wire, with a lot of context information and who knows what else, when what we really need is just the ‘new’ inserts.
Is using Entity Framework and WCF services as we have done so far the correct way, architecturally, of approaching this n-tiered, asynchronous, push-only problem? Or is there a different approach that could optimize the data use?
There’s no doubt you could optimize this application – any application can be optimized. But before you dive in, are you sure you need to? Is there a problem with the current process – is it too slow, too expensive, is someone complaining? If you’re just doing this as an iterative improvement and imagine you’ll be a hero if you can reduce data transfer by 30%, the risks are far bigger than the benefits. Rewriting your service contracts will mean adding transformation code at each end, which means rehydrating EF objects and ensuring they have the correct state to reattach to the data context. It sounds easy, but it’s a big change.
You should definitely profile how expensive the EF objects are compared to equivalent DTOs. You’re running on a hunch at the moment. How much data will you save by making this change?
Are there simpler, more obvious improvements? When optimizing WCF services in the past I’ve found that a huge part of the request size was Windows Authentication headers – enormous security tokens being passed between client and server – which can be replaced with a much smaller certificate. Is all the data being sent completely necessary? I assume you’re sending binary (net.tcp) rather than text (http), but if you’re not, that’s an obvious improvement.
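For reference, here is a minimal sketch of what switching the client channel from a text HTTP binding to binary net.tcp can look like – the contract name, operation and addresses are placeholders, not from your project:

    using System.ServiceModel;

    // Hypothetical service contract standing in for the real push service.
    [ServiceContract]
    public interface IDataPushService
    {
        [OperationContract]
        void PushData(byte[] payload);
    }

    public static class ClientFactory
    {
        public static IDataPushService CreateBinaryClient()
        {
            // NetTcpBinding uses a binary message encoding by default,
            // which is considerably more compact than text/XML over HTTP.
            var binding = new NetTcpBinding(SecurityMode.Transport);
            var address = new EndpointAddress("net.tcp://central-repo/DataPushService");
            return new ChannelFactory<IDataPushService>(binding, address).CreateChannel();
        }

        public static IDataPushService CreateTextClient()
        {
            // The equivalent text-based HTTP endpoint, for comparison.
            var binding = new BasicHttpBinding(BasicHttpSecurityMode.Transport);
            var address = new EndpointAddress("https://central-repo/DataPushService");
            return new ChannelFactory<IDataPushService>(binding, address).CreateChannel();
        }
    }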
DTOs are a useful pattern, and they are championed heavily by the MVC guys, but this isn’t because of any data-saving concern – it’s because they provide a service interface, an abstraction from the database. Without a DTO you add a dependency on your database model. This doesn’t really apply in your case, because it seems you have the same database model on both ends. The simplest approach is to send the EF objects over the wire and insert them directly, just as you’re doing now.
Reducing data traffic will save some money. How much money? Enough to warrant the time you’ll spend developing this solution, plus the additional maintenance time due to increased application complexity?
You can use Data Transfer Objects (DTOs) purely to transfer objects from the clients to the server and the other way around. Obviously the structure of these lightweight objects has to be shared across the system (clients) so that they can be serialized and deserialized. This approach is commonly used to reduce the amount of data transferred and to avoid exposing the domain.
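As a rough illustration (the type and property names here are made up, not from your project), a shared DTO for WCF could look something like this:

    using System;
    using System.Runtime.Serialization;

    // Lightweight contract shared by client and server assemblies.
    [DataContract]
    public class MeterReadingDto
    {
        [DataMember(Order = 1)]
        public int DeviceId { get; set; }

        [DataMember(Order = 2)]
        public DateTime ReadAtUtc { get; set; }

        [DataMember(Order = 3)]
        public decimal Value { get; set; }

        // Only the fields the server actually needs - no EF change tracking,
        // no navigation properties, no context state.
    }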
I guess your current implementation already works this way, in that the domain objects are used across the system and all the clients are able to understand the objects they receive/send. It would be the same concept with DTOs.
This process of converting domain models to DTOs will also have an impact on performance; however, it is much smaller and probably not a problem in your context.
In the application layer that is responsible for mapping domain models to DTOs (and vice versa) you can use AutoMapper, which is a very nice library for this kind of work.
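A minimal sketch of what the AutoMapper wiring might look like, reusing the illustrative MeterReadingDto above and a made-up MeterReading entity (older AutoMapper versions use the static Mapper.CreateMap API instead, but the idea is the same):

    using System;
    using AutoMapper;

    // Minimal stand-in for the EF entity (illustrative only).
    public class MeterReading
    {
        public int DeviceId { get; set; }
        public DateTime ReadAtUtc { get; set; }
        public decimal Value { get; set; }
    }

    public static class DtoMapping
    {
        private static readonly IMapper Mapper = new MapperConfiguration(cfg =>
        {
            // Properties with matching names are mapped by convention.
            cfg.CreateMap<MeterReading, MeterReadingDto>();
            cfg.CreateMap<MeterReadingDto, MeterReading>();
        }).CreateMapper();

        public static MeterReadingDto ToDto(MeterReading entity)
        {
            return Mapper.Map<MeterReadingDto>(entity);
        }

        public static MeterReading ToEntity(MeterReadingDto dto)
        {
            return Mapper.Map<MeterReading>(dto);
        }
    }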
I would say that transferring Entity Framework objects over WCF is a bad idea. Any time you do any kind of operation on an EF object, it will generally result in database calls. This can be bad enough for performance when it is all done in the one service, and adding an extra tier over a network connection will really slow things down!
There is a concept/pattern called the DTO (Data Transfer Object) pattern that involves creating new, simpler types that contain just the information the client will require, and passing those instead.
You then create a DTO assembler which is responsible for building DTOs from your Entity Framework models.
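Here is a rough sketch of the idea, using made-up Reading/ReadingDto types in place of your real entities:

    using System;
    using System.Runtime.Serialization;

    [DataContract]
    public class ReadingDto
    {
        [DataMember] public int DeviceId { get; set; }
        [DataMember] public DateTime TakenAtUtc { get; set; }
        [DataMember] public double Value { get; set; }
    }

    // Minimal stand-in for the EF entity (illustrative only).
    public class Reading
    {
        public int DeviceId { get; set; }
        public DateTime TakenAtUtc { get; set; }
        public double Value { get; set; }
    }

    public static class ReadingAssembler
    {
        // Copies only what the service needs - no EF proxy, no tracking state.
        public static ReadingDto ToDto(Reading entity)
        {
            return new ReadingDto
            {
                DeviceId = entity.DeviceId,
                TakenAtUtc = entity.TakenAtUtc,
                Value = entity.Value
            };
        }

        public static Reading ToEntity(ReadingDto dto)
        {
            return new Reading
            {
                DeviceId = dto.DeviceId,
                TakenAtUtc = dto.TakenAtUtc,
                Value = dto.Value
            };
        }
    }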
Googling DTOs will hopefully be a good starting point for you. Here is an example link on MSDN
It seems you are using self-tracking entities, and indeed these objects will be large when serialised – not something I would recommend sending over WCF in your scenario. A DTO, as others have mentioned, would also be my preferred option. The next question is what wire-level serialisation will give the smallest size. One option is protocol buffers, which I haven’t used personally but is worth looking into.
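As an illustration only, a protocol-buffers round trip with the protobuf-net library might look like the following, with ReadingDto standing in for whatever DTO shape you end up with (the type is made up for this sketch):

    using System;
    using System.IO;
    using ProtoBuf;

    [ProtoContract]
    public class ReadingDto
    {
        [ProtoMember(1)] public int DeviceId { get; set; }
        [ProtoMember(2)] public DateTime TakenAtUtc { get; set; }
        [ProtoMember(3)] public double Value { get; set; }
    }

    public static class ProtoExample
    {
        public static byte[] Serialize(ReadingDto dto)
        {
            using (var stream = new MemoryStream())
            {
                Serializer.Serialize(stream, dto);   // compact binary output
                return stream.ToArray();
            }
        }

        public static ReadingDto Deserialize(byte[] data)
        {
            using (var stream = new MemoryStream(data))
            {
                return Serializer.Deserialize<ReadingDto>(stream);
            }
        }
    }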
But the one I will suggest is JSON, as JSON is significantly smaller than equivalent XML. You can use WCF ([WebGet], [WebInvoke] attributes) or, better yet, the ASP.NET Web API. Enabling gzip compression in IIS should also significantly reduce the payload size.
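If you want to see the difference for yourself, here is a quick sketch comparing the serialised size of the same object as XML and JSON using the built-in serialisers (call it with something like the illustrative ReadingDto from the sketch above):

    using System;
    using System.IO;
    using System.Runtime.Serialization;
    using System.Runtime.Serialization.Json;

    public static class SizeComparison
    {
        // Serialises the same object twice and reports the byte counts.
        public static void Compare<T>(T dto)
        {
            long xmlBytes, jsonBytes;

            using (var ms = new MemoryStream())
            {
                new DataContractSerializer(typeof(T)).WriteObject(ms, dto);
                xmlBytes = ms.Length;
            }

            using (var ms = new MemoryStream())
            {
                new DataContractJsonSerializer(typeof(T)).WriteObject(ms, dto);
                jsonBytes = ms.Length;
            }

            Console.WriteLine("XML: {0} bytes, JSON: {1} bytes", xmlBytes, jsonBytes);
        }
    }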