I have to design and build an import script (in C#) that can handle the following:
- read data from various sources (XML, XLSX, CSV)
- verify data
- write the data to various object types (customer, address)
The data will come from a number of sources, but each source will always have exactly one import format (CSV, XML, or XLSX). Import formats can vary from source to source, and new import formats may be added in the future.
The destination object types are always the same (customer, address, and some more).
I’ve been thinking about using generics, and I’ve read something about the factory pattern, but I’m a pretty big noob in this area, so any advice is more than welcome.
What is an appropriate design pattern to solve this problem?
You are going overboard with fancy concepts too soon. Generics: use them when you see a clear case for them, but otherwise don’t worry. Factory pattern: way too much flexibility (and added confusion) for this yet.
Keep it simple. Use fundamental practices.
-
Try to imagine the common things between doing a read for XML, a read for CSV, and so on: things like "next record" or "next line". Since new formats may be added, try to imagine the commonality that a yet-to-be-determined format would share with the known ones. Use this commonality to define an interface, or contract, that all formats must adhere to. Though they adhere to the common ground, each may have its own format-specific internal rules.
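To make that concrete, here is one possible shape for such a contract (a sketch only; the interface and class names are illustrative, not prescribed by the question):

```csharp
using System.Collections.Generic;

// One contract that every source format adheres to.
public interface IRecordReader
{
    // Advances to the next record; returns false when the source is exhausted.
    bool ReadNextRecord();

    // Fields of the current record, keyed by column/element name.
    IDictionary<string, string> CurrentRecord { get; }
}

// Example implementation for CSV; an XmlRecordReader, XlsxRecordReader, etc.
// would implement the same interface with their own internal parsing rules.
public class CsvRecordReader : IRecordReader
{
    private readonly IEnumerator<string> _lines;
    private readonly string[] _headers;

    public CsvRecordReader(IEnumerable<string> lines)
    {
        _lines = lines.GetEnumerator();
        _lines.MoveNext();                       // first line holds the headers
        _headers = _lines.Current.Split(',');    // naive split; real CSV needs quoting rules
    }

    public IDictionary<string, string> CurrentRecord { get; private set; }

    public bool ReadNextRecord()
    {
        if (!_lines.MoveNext()) return false;
        var values = _lines.Current.Split(',');
        var record = new Dictionary<string, string>();
        for (int i = 0; i < _headers.Length; i++)
            record[_headers[i]] = values[i];
        CurrentRecord = record;
        return true;
    }
}
```

The calling code only ever sees IRecordReader, so adding a new format later means adding one class, not touching the import loop.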
-
For validating the data, try to provide a way to easily plug in new or different validator code blocks. So again, define an interface where each validator, responsible for a particular kind of data, adheres to a contract.
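A minimal sketch of such a pluggable validator contract (again, names are illustrative assumptions, not from the question):

```csharp
using System.Collections.Generic;

// Each validator checks one kind of rule and can be plugged in independently.
public interface IRecordValidator
{
    // Returns one error message per problem found; an empty sequence means valid.
    IEnumerable<string> Validate(IDictionary<string, string> record);
}

// Example: a validator that checks a single required field.
public class RequiredFieldValidator : IRecordValidator
{
    private readonly string _field;

    public RequiredFieldValidator(string field)
    {
        _field = field;
    }

    public IEnumerable<string> Validate(IDictionary<string, string> record)
    {
        if (!record.TryGetValue(_field, out var value) || string.IsNullOrWhiteSpace(value))
            yield return $"Missing required field '{_field}'.";
    }
}
```

Running every plugged-in validator over a record is then a one-liner, e.g. `validators.SelectMany(v => v.Validate(record)).ToList()`.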
-
For creating the data constructions you will probably be constrained by whoever designs the suggested output objects more than anything. Try to figure out what the next step for the data objects is, and whether there are any optimizations you can make by knowing the final use. For example, if you know the objects are going to be used in an interactive application, you could help the developer of that app by providing summations or counts of the objects, or other kinds of derived information.
I’d say most of these are Template Method or Strategy patterns. The whole project would be an Adapter pattern.
The obvious thing is to apply the Strategy pattern. Have a generic base class ReadStrategy and, for each input format, a subclass like XmlReadStrategy, CSVReadStrategy, etc. This will allow you to change the import processing independently from the verification processing and the output processing.
Depending on the details it may be also possible to keep most parts of the import generic and exchange only parts of the input processing (for example, reading of one record). This may lead you to Template Method pattern.
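A sketch of that Template Method variant, where the import loop is generic and only the per-record reading is exchanged (the Customer class and member names here are stand-ins invented for the example):

```csharp
using System.Collections.Generic;
using System.IO;

// Stand-in for one of the destination object types from the question.
public class Customer
{
    public string Name { get; set; }
}

// Template Method: the overall import loop lives in the base class;
// subclasses supply only the format-specific steps.
public abstract class ReadStrategy
{
    public IEnumerable<Customer> Import(string path)
    {
        Open(path);
        try
        {
            Customer record;
            while ((record = ReadOneRecord()) != null)
                yield return record;
        }
        finally
        {
            Close();
        }
    }

    protected abstract void Open(string path);
    protected abstract Customer ReadOneRecord(); // null when no records remain
    protected abstract void Close();
}

// One concrete strategy; XmlReadStrategy etc. would follow the same shape.
public class CSVReadStrategy : ReadStrategy
{
    private StreamReader _reader;

    protected override void Open(string path) => _reader = new StreamReader(path);

    protected override Customer ReadOneRecord()
    {
        var line = _reader.ReadLine();
        if (line == null) return null;
        // Naive split; a real importer would handle quoting and escaping.
        return new Customer { Name = line.Split(',')[0] };
    }

    protected override void Close() => _reader.Dispose();
}
```

Verification and output code only ever deal with the Customer objects coming out of Import, so they stay untouched when a new format is added.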
A suitable pattern for an importing utility that you may need to extend in the future would be MEF. You can keep memory use low by loading the converter you need on the fly from a lazy list, create MEF imports decorated with attributes that help select the right converter for the import you are trying to perform, and get an easy way of separating the different importing classes out.
Each MEF part can be built to satisfy an importing interface with some standard methods that convert a row of the import file to your output data or override a base class with the basic functionality.
MEF is a framework for creating a plug-in architecture; it’s how Outlook and Visual Studio are built, and all those lovely extensions in VS are MEF parts.
To build a MEF (Managed Extensibility Framework) app, start by including a reference to System.ComponentModel.Composition.
Define interfaces to spec out what the converter will do
public interface IImportConverter
{
int UserId { set; }
bool Validate(byte[] fileData, string fileName, ImportType importType);
ImportResult ImportData(byte[] fileData, string fileName, ImportType importType);
}
This can be used for all the file types you want to import.
Add attributes to a new class that define what the class will “Export”
[Export(typeof(IImportConverter))]
[MyImport(ImportType.Address, ImportFileType.CSV, "4eca4a5f-74e0")]
public class ImportCSVFormat1 : ImportCSV, IImportConverter
{
...interface methods...
}
This defines a class that will import CSV files (of a particular format: Format1) and has a custom attribute that sets MEF Export attribute metadata. You’d repeat this for each format or file type you want to import. You can define the custom attribute with a class like:
[MetadataAttribute]
[AttributeUsage(AttributeTargets.All, AllowMultiple = false)]
public class ImportAttribute : ExportAttribute
{
public ImportAttribute(ImportType importType, ImportFileType fileType, string customerUID)
: base(typeof(IImportConverter))
{
ImportType = importType;
FileType = fileType;
CustomerUID = customerUID;
}
public ImportType ImportType { get; set; }
public ImportFileType FileType { get; set; }
public string CustomerUID { get; set; }
}
To actually use the MEF converters you need to import the MEF parts you created when your converting code runs:
[ImportMany(AllowRecomposition = true)]
protected internal Lazy<IImportConverter, IImportMetadata>[] converters { get; set; }
AggregateCatalog catalog = new AggregateCatalog();
catalog.Catalogs.Add(new DirectoryCatalog(AppDomain.CurrentDomain.BaseDirectory));
var container = new CompositionContainer(catalog);
container.ComposeParts(this); // populates the converters list
catalog collects the parts from a folder (by default the app location), and converters is a lazy list of the imported MEF parts.
Then when you know what sort of file you want to convert (importFileType and importType), get a converter from the list of imported parts in converters:
var tmpConverter = (from x in converters
where x.Metadata.FileType == importFileType
&& x.Metadata.ImportType == importType
&& (x.Metadata.CustomerUID == import.ImportDataCustomer.CustomerUID)
select x).OrderByDescending(x => x.Metadata.CustomerUID).FirstOrDefault();
if (tmpConverter != null)
{
var converter = (IImportConverter)tmpConverter.Value;
result = converter.ImportData(import.ImportDataFile, import.ImportDataFileName, importType);
....
}
The call to converter.ImportData will use the code in the imported class.
It might seem like a lot of code, and it can take a while to get your head round what’s going on, but it’s extremely flexible when it comes to adding new converter types and can even allow you to add new ones at runtime.
What is an appropriate design pattern to solve this problem?
C# idioms involve using the built-in serialization framework to do this. You annotate the objects with metadata, and then instantiate different serializers that use those annotations to rip out data and put it into the right form, or vice versa.
XML, JSON, and binary forms are the most common, but I would not be surprised if others already exist in a nicely packaged form for you to consume.
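For instance, with the built-in XmlSerializer (the Customer shape and element names here are made up for illustration):

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Annotations tell the serializer how XML maps onto the object.
public class Customer
{
    [XmlElement("name")]
    public string Name { get; set; }

    [XmlElement("city")]
    public string City { get; set; }
}

public static class Demo
{
    public static void Main()
    {
        var xml = "<Customer><name>Ada</name><city>London</city></Customer>";
        var serializer = new XmlSerializer(typeof(Customer));
        using (var reader = new StringReader(xml))
        {
            // Deserialize reads the annotated fields straight into the object.
            var customer = (Customer)serializer.Deserialize(reader);
            Console.WriteLine(customer.Name); // prints "Ada"
        }
    }
}
```

The same annotated class can then be fed to a different serializer (e.g. a JSON one) without changing the object model.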