I’m really new to TDD, so I guess this question is pretty basic.
We’re building a website, and part of the functionality is generating some files (binary files: Excel, PDF, whatever). How should I test this feature?
I’ve thought about creating some static files and comparing them with the generated ones, but a binary comparison isn’t reliable (the files can have the same content but different checksums), and, if I understand TDD correctly, a logic comparison isn’t a good idea either, since it would use basically the same algorithm I use to generate the files, so I wouldn’t really be testing anything.
How is this kind of thing usually dealt with?
I’ve done this by using a library to parse the files. It sounds like you already have one available, so you should use it.
Even if you could compare the entirety of the binary files, I wouldn’t recommend it. Let’s say you’ve got 100 tests that check various things by comparing the entire binary output. Then a new requirement comes in that the title of each file should be changed from “Foo” to “Bar”. Assuming all your output files had a title of “Foo”, you’ve now got 100 broken tests to fix.
A better way is to test only one thing per test, without overlapping the other tests. In this context, only a few tests should be responsible for checking the title. Then, when that same requirement comes in, only those few tests need to change.
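For example, here is a minimal sketch of that approach in Python, assuming openpyxl for parsing and a hypothetical `generate_report` function as the code under test:

```python
import io

import openpyxl  # third-party library for reading .xlsx files

from myapp.reports import generate_report  # hypothetical code under test


def test_report_title_is_bar():
    # generate_report is assumed to return the .xlsx content as bytes
    data = generate_report(customer_id=42)

    # Parse the binary output instead of comparing it byte-for-byte
    workbook = openpyxl.load_workbook(io.BytesIO(data))
    sheet = workbook.active

    # This test checks exactly one thing: the title cell
    assert sheet["A1"].value == "Bar"
```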
> How is this kind of thing usually dealt with?
By not unit testing the files.
Generally, there’s some intermediary format that defines the content of the file. You would test that the intermediary content you expect is being sent off to be PDF-ified. Basically, isolate everything except the actual encoding of your output to PDF. In an ideal world, that work is done by some library, so you don’t need to test it.
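A sketch of what that isolation might look like, assuming the PDF encoding sits behind an injected renderer object (all names here are hypothetical):

```python
from unittest.mock import Mock

from myapp.invoices import build_invoice  # hypothetical code under test


def test_invoice_content_is_sent_to_renderer():
    renderer = Mock()  # stands in for the real PDF library

    build_invoice(order_id=7, renderer=renderer)

    # Assert on the intermediary content, not on the PDF bytes:
    # the encoding itself is the library's job, not ours.
    renderer.render.assert_called_once()
    (document,), _ = renderer.render.call_args
    assert document.title == "Invoice #7"
```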
In a realistic world, you need humans to eyeball the PDFs to make sure that they’re well formed and “look right” anyways (and load properly in Acrobat on different platforms, etc.), so have them check the content too. Not everything is a good fit for TDD.
Testing output files is always difficult; the same goes for testing file downloads from the web or output to the console.
One question you should ask yourself is: “How far can I test before I need the file?” Most of the logic can be tested by using some kind of replacement code or a simple text-file generator.
Looking at your question, you can generate multiple kinds of files, so I guess you already have some separation between the code that provides the data and the code that generates the file. My answer is therefore based on the assumption that there is a kind of factory that generates the actual file.
A question you can ask yourself is: how complex is the factory? Does it contain lots of extra logic, or does it only call functions from another library that creates the files for you?
If the factory does have lots of logic, is that logic the same for all kinds of output or different for each output? When it is the same, you should think about refactoring it out of the factory, or about adding an output format that is easy to test (like a plain text file).
When the factory only calls a library, do you really have to test it automatically? Otherwise you would effectively be testing the library you are using.
So basically, I would test as much as possible without having to create a complex file. Plain-text files, including HTML and JSON, are relatively easy to test because they are readable and comparable.
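For instance, if the factory can also emit JSON, a test can compare structured data directly (the factory and its parameters here are hypothetical):

```python
import json

from myapp.reports import report_factory  # hypothetical factory


def test_json_output_contains_expected_rows():
    output = report_factory.generate("json", month="2024-01")

    # Readable and comparable: parse and compare as plain data
    assert json.loads(output) == {
        "title": "Monthly report",
        "rows": [{"item": "Widget", "total": 3}],
    }
```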
To test the binary outputs, I would generate a simple sample file to test against, and write a test that uses the factory to generate the same file and compares it at byte level. Generating with the same parameters should produce the same file over and over again.
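A sketch of such a byte-level test, assuming a hypothetical factory that takes all its inputs as parameters, plus a manually verified sample file checked in next to the tests:

```python
from pathlib import Path

from myapp.reports import report_factory  # hypothetical factory

GOLDEN = Path(__file__).parent / "golden" / "sample_report.pdf"


def test_pdf_output_matches_golden_sample():
    # Same parameters in, same bytes out: the generation must be
    # deterministic for this comparison to be meaningful.
    output = report_factory.generate("pdf", month="2024-01")

    assert output == GOLDEN.read_bytes()
```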
File output usually belongs to integration tests (several components working together), not to unit tests (one component tested in isolation).
If your PDF generation is implemented in such a way that the same input always produces the same output, you can try ApprovalTests, which does a binary comparison against the result of the previous run. If there is no previous result, or the binary comparison fails, a GUI asks you whether the old and new versions are the same, showing you both PDF files in Acrobat Reader.
This way you are informed every time the output changes, which may or may not be OK.
Note: if the PDF generation inserts the current date into the output, then the output will be different on every call. If you provide the date as a parameter to the PDF API, you can generate identical output.
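A sketch of that pattern, with hypothetical names: pass the date in from the test instead of reading the clock inside the generator, so two runs with the same input produce identical bytes:

```python
from datetime import date


def generate_pdf(data: dict, created: date) -> bytes:
    # Stand-in for the real generator: the point is that the creation
    # date is an explicit parameter, never datetime.now(), so the same
    # input always yields the same bytes.
    header = f"%Created: {created.isoformat()}\n".encode()
    body = f"Title: {data['title']}\n".encode()
    return header + body


def test_same_input_gives_same_output():
    fixed_day = date(2024, 1, 15)

    first = generate_pdf({"title": "Foo"}, created=fixed_day)
    second = generate_pdf({"title": "Foo"}, created=fixed_day)

    assert first == second  # safe to binary-compare or approve
```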
There are a few different ways of testing files:
- Binary comparison. Output a “golden” result file for each test; manually confirm it’s right; store it and compare it against future file outputs. Pros: Simple. Cons: Fragile; prone to false negatives (especially if files contain metadata, like a ‘creation date’, that changes even when the ‘payload’ data does not).
- Parsing the file. Use a library to parse the (Excel, PDF, text, …) file, then run assertions on the parsed data. Pros: Less fragile than binary comparison; can easily avoid metadata flutter. Cons: More difficult and complex to code. Not all file formats are readily parsed, nor are the interesting features of the output always amenable to a concise, descriptive set of test assertions.
- Rendered comparison. Output the file; use a rendering engine to convert it to a more comparable representation, such as a lossless image file (see the sketch after this list). Pros: Can be less fragile than pure binary comparison, and easier to code than parsing plus assertions; focuses on the ‘payload’ rather than the metadata. Cons: Dependent on the renderer and rendering environment (including the specific renderer version). Requires the same “golden image” files as straightforward binary comparison. May require cropping, filtering, and sorting operations to align and compare only the relevant parts.
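A rough sketch of the rendered comparison, assuming pdf2image (which needs poppler installed) and Pillow are available, with a hypothetical golden image that was rendered and verified beforehand:

```python
from pdf2image import convert_from_path  # requires poppler
from PIL import Image, ImageChops


def test_rendered_pdf_matches_golden_image():
    # Render the first page of the generated PDF to a bitmap.
    # Pinning the DPI (and the renderer version) keeps this stable.
    page = convert_from_path("out/report.pdf", dpi=150)[0].convert("RGB")

    golden = Image.open("tests/golden/report_page1.png").convert("RGB")

    # difference(...).getbbox() is None when the images are identical
    # (both images must have the same size and mode)
    diff = ImageChops.difference(page, golden)
    assert diff.getbbox() is None
```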