Decoupling when constructors have non-contractual parameters

By non-contractual parameters, I mean parameters that are not interfaces or service dependencies, something like class Person(string name).

I am writing a webpage scraping application, and so far I’ve written it in the wrong order (hence the creation of this question). I made a class to parse HTML documents (from string format) and give me all the urls and images in it.

That class has the following description:

public class PageParser
{
    private readonly string _html;

    public PageParser(string html) {
        _html = html;
    }

    public IEnumerable<string> GetImages() {
        /* not important */
    }

    public IEnumerable<string> GetLinks() {
        /* not important */
    }   
}

This code works great, unit-tested, 100% coverage, etc… The problem is now how do I write unit-tests for code that uses this class without worry about the constructor of this class. I only need this new class to work with the behavior of the PageParser.

The pseudo code for this new functionality is as follows:

public Report CreateReport(string url)
{
    var html = _webClient.DownloadString(url); 

    var parser = new PageParser(html);  

    var images = parser.GetImages();
    var links = parser.GetLinks();

    var relevantLinks = links.Where(l => l.Contains('something'));

    return _reportBuilder.Create(images, relevantLinks);
}

The problem that I have is I don’t want to have a bunch of unit-tests that have epically sized HTML documents for all my various test cases the CreateReport method can have.

The options I have come up with are:

1) Leave everything as is

Leave the class structure as is, just live with having tests with large setup variables

2) Expose the page parser (+interface) from a Factory

Create an interface for PageParser, and create a factory with corresponding interface.

interface IPageParserFactory
{
    IPageParser Create(string html);
}

class PageParserFactory : IPageParserFactory
{
    IPageParser Create(string html)
    {
        return new PageParser(html);
    }
}

Then the resulting pseudo code would look something as follows.

// ... download string

var pageParser = _pageParserFactory.Create(html);

// ... create report

3) Move the parameter html to each method and create an interface (so now the class has no construction dependencies)

interface IPageParser
{
    IEnumerable<string> GetImages(string html);
    IEnumerable<string> GetLinks(string html);
}

4) Something else entirely

If there’s another pattern that solves this problem I would like to know it.

From a quick glance, it seems that your CreateReport method is violating the single-responsibility principle, as it does a lot of things:

it reads HTML from URL via DownloadString
it creates the PageParser and retrieves its parsing results
it filters the result based on some relevance criterion
it does the actual creation of a report

Given the method name, one would expect that method to only have that last responsibility. What you have given here instead is something that is highly coupled together with the data retrieval, parsing and filtering logic.

The idea of creating a report is usually based on some sort of already prepared data. So one already starts wondering, why does your method take a URL as input?

From your proposed solutions 1) to 3), none addresses these issues really. When implementing either of these, you will still need to create local files with more-or-less complete HTML contents for your unit tests.

Note that some people even go as far as claiming that no unit test should access the filesystem, because that is generally too slow. Of course, one may disagree with this, but it should at least make you feel a little bit guilty, such that you only do that if you have a very good reason to do so.. and you don’t have one in this case.

So let’s go with option 4) something else entirely. Taking a different look at your problem, we see that it’s really all about dealing with some sort of document data. Your PageParser is the first one to create such data, so what about changing it to not return individual IEnumerables, but instead a single PageData object, which in turn is able to provide the IEnumerables.

Continuing with this line of though, a report generation really only needs a PageData object. Your _reportBuilder.Create(images, relevantLinks) becomes a call to _reportBuilder.Create(pageData) and your CreateReport takes a PageData instead of an URL as input. Actually, CreateReport may get shortened to that one Create line. Everything else was not its responsibility anyways.

Outside of either of these classes (PageParser, PageData, report generator class) you can define an independent PageDataFilter interface and implementation classes. One of them will be the filter you had in your CreateReport. Note that the interface is a simple one-method filtering of the sort PageData ApplyFilter(PageData pageData). Dealing with filters like that gives you the advantages of easy testability of individual filters, easy composition of filters and it decouples the filters from everything else. The report generator need not care if its given PageData was filtered in one way or the other. (Note that for the implementation you have to decide if PageData is immutable, or can be modified by filters in-place. The first option is mathematically sound, easy to reason about, better suitable for parallelization, etc., whereas the latter is more efficient in terms of memory and cpu.)

At this point, you can easily test your PageParser, your report generator, and your link filter. On top of that you can do all of that testing by simply preparing one straightforward instance of a required class in memory. No more URL loading, no more involved HTML stuff outside of PageParser tests, no more file system accesses, and no more coupling that needs to be taken care of in your tests.

Addendum: What the above text is missing is how these components are wired together. Let’s say you now have a PageParser that creates PageData which you can filter via PageDataFilter instances and create reports from via ReportCreator. What you still need is the instantiation and instrumentation of these classes – somewhere further outside. It depends on your application’s design as to where a proper place would be for that. But for simplicity’s sake, you could assume an Application class with a main method. In that method you would do the following:

Download HTML URL to a String
Instantiate a PageParser with this string as constructor argument
Instantiate the desired PageDataFilter objects
Apply the PageDataFilter objects to the PageData delivered by the PageParser
Instantiate the ReportCreator and give it the filtered PageData for creating a report.

Again, it depends on your overall design and architecture, if these things are manually instantiated, dependency-injected, or what-have-you. The major point though is: keep your classes small and limited to their one and only responsibility, then instrument them from the outside. If the instrumentation code gets larger and does too many things, then recursively apply the above ideas to that code as well.

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị

Filed under: softwareengineering - @ 00:51

Thẻ: class-design, constructors, dependency-injection, object-oriented

Thiết kế website giá rẻ

Danh mục

Decoupling when constructors have non-contractual parameters