At what point/range is a code file too big?

I’m finding lots of 2-3k line files, and it doesn’t really feel like they should be that big.

What is a good criterion for objectively calling a source code file “too big”? Is there such a thing as a maximum number of lines a source code file should have?

As an ideal model I use the following criteria (with a similar rationale to what Martin Beckett suggested, i.e. to think in terms of logical structure and not in terms of lines of code):

Rule 1

One class per file (in C++: one class -> one header and one implementation file).

Rule 2

Seven is considered the number of items that our brain can observe at the same time without getting confused. Above seven, we find it difficult to keep an overview of what we see. Therefore, each class should have no more than 7-10 methods. A class that has more than 10 methods is probably too complex, and you should try to split it. Splitting is very effective because every time you split a class, you reduce the complexity of each individual class by at least a factor of 2.
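Rule 2 is easy to check mechanically. Below is a minimal sketch that assumes Python source files and uses the standard-library `ast` module. The function names `methods_per_class` and `oversized_classes` are hypothetical, and the default limit of 10 simply mirrors the rule above:

```python
import ast
from typing import Dict


def methods_per_class(source: str) -> Dict[str, int]:
    """Count the functions defined directly in each class body."""
    counts: Dict[str, int] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # Only direct children of the class count; nested helpers do not.
            # Note: classes with the same name in different scopes overwrite each other.
            counts[node.name] = sum(
                isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                for item in node.body
            )
    return counts


def oversized_classes(source: str, limit: int = 10) -> Dict[str, int]:
    """Classes with more than `limit` methods (Rule 2's rough threshold)."""
    return {name: n for name, n in methods_per_class(source).items() if n > limit}


example = "class Small:\n    def a(self): pass\n\nclass Big:\n" + "".join(
    f"    def m{i}(self): pass\n" for i in range(11)
)
print(oversized_classes(example))  # {'Big': 11}
```

The count is only a signal for a closer look. Some classes legitimately have many methods, so a flagged class is not automatically wrong.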

Rule 3

A method body that does not fit in one or two screens is too big (I assume that a screen / editor window is about 50 lines). Ideally, you can see the whole method in one window. If this is not the case, you only need to scroll up and down a bit, without forgetting the part of the method that gets hidden. So, if you have to scroll more than one screen up or down to read the whole method body, your method is probably too big and you can easily lose the overview.

Again, splitting methods into private helper methods can reduce method complexity very quickly (at every split, the complexity is at least halved). If you introduce too many private helper methods, you can consider creating a separate class to collect them (if you have more private methods than public ones, maybe a second class is hiding inside your main class).
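Rule 3 and the private-helper hint can also be checked mechanically. This is another minimal sketch for Python sources. It assumes Python 3.8+ (for `end_lineno`) and the usual leading-underscore convention for private methods. The names `long_functions` and `private_heavy_classes` are hypothetical, and the 100-line default comes from this answer's rough estimates:

```python
import ast
from typing import List, Tuple


def long_functions(source: str, max_lines: int = 100) -> List[Tuple[str, int]]:
    """(name, length) of every function spanning more than `max_lines` lines,
    counted from the `def` line to the last line of its body."""
    result = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1  # end_lineno needs Python 3.8+
            if length > max_lines:
                result.append((node.name, length))
    return result


def private_heavy_classes(source: str) -> List[str]:
    """Classes with more private (_name) methods than public ones, which may
    hint that a second class is hiding inside."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        names = [item.name for item in node.body
                 if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))]
        # Dunder methods such as __init__ are treated as public.
        private = sum(1 for n in names
                      if n.startswith("_") and not (n.startswith("__") and n.endswith("__")))
        if private > len(names) - private:
            flagged.append(node.name)
    return flagged
```

Both checks only point at places worth reading. Whether to split is still a judgment call about the logical structure.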

Putting together these very rough estimates:

At most one class per source file.
At most 10 public methods per class.
At most 10 private methods per class.
At most 100 lines per method.

So a source file with more than 2000 lines (1 class × 20 methods × 100 lines) is probably too large and is starting to get too messy.
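The budget arithmetic can be written down as a quick check. This is a minimal Python sketch. The constant names and `looks_too_large` are hypothetical, and the thresholds are this answer's rough estimates rather than any standard:

```python
# Rough per-file limits from the list above (estimates, not a standard).
MAX_CLASSES_PER_FILE = 1
MAX_PUBLIC_METHODS = 10
MAX_PRIVATE_METHODS = 10
MAX_LINES_PER_METHOD = 100


def line_budget() -> int:
    """Lines implied by the limits: classes x methods per class x lines per method."""
    methods = MAX_PUBLIC_METHODS + MAX_PRIVATE_METHODS
    return MAX_CLASSES_PER_FILE * methods * MAX_LINES_PER_METHOD


def looks_too_large(source: str) -> bool:
    """True when the file has more lines than the budget allows."""
    return len(source.splitlines()) > line_budget()


print(line_budget())  # 2000
```

Like the estimates themselves, this only flags candidates for a closer look.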

This is really a very rough estimate and I do not follow these criteria systematically (especially because there is not always enough time to do proper refactoring). Also, as Martin Beckett suggested, there are situations in which a class is a large collection of methods and it does not make sense to split them in some artificial way just to make the class smaller.

Anyway, in my experience, a file starts to get unreadable when one of the above parameters is not respected (e.g. a 300-line method body that spans six screens, or a source file with 5000 lines of code).

No, not in terms of lines of code. The driver should be logical grouping. There certainly shouldn’t be multiple classes in one large file, for example.

If you had a class that legitimately had a few hundred methods (not impossible in, say, 3D modelling), it would be a lot less convenient to split it into arbitrary files. We used to have to do this when memory was scarcer and processors slower, and it was a pain, constantly searching for function definitions.

When the code in it becomes unmaintainable, i.e., when you can’t tell just by looking at the code whether the method/class/function you’re looking for (and have to edit/debug) is in there and, if so, where it is.

Your choice of IDE/editor and its features will influence where this upper limit actually lies, though. Code folding, function/method listings, and symbol lookup will postpone the moment this scenario arises.

But when it does, it’s time to split it.

Consider this metaphor. When it comes to code length, I think we should consider the following:

The Cat in The Hat (50 pp.)

and

Lord of The Rings (1,178 pp.)

There is nothing wrong with Lord of the Rings. It’s a fabulous book. The Cat in the Hat is also a great book. Both can be understood by a 5-year-old, but only one is better suited to them because of its content.

To my point, code should make sense to a 5-year-old whenever possible. Cyclomatic complexity is an important concept that many developers should consider as they write code. Use and create libraries to enhance functionality and code reusability as much as possible. This way, our code can say far more than the lines we actually see written.

Most of us are not writing assembly code, but the root of our code is assembly. Searching through 10,000 lines of assembly is harder than searching through 10,000 lines of Python, if it is done correctly.

But some work requires writing 500 to 1000 lines. Our goal with code should be to write 300 lines of clean code.

As developers, we want to write “Lord of The Rings”. Until we get a bug and wish we were writing “Cat in the Hat”. Don’t make coding a measure of ego. Just make things work in a simple fashion.

Developers don’t want to document code (personally, I love documented code; I’m not that selfish). So don’t write code that only you can understand. Write Cat in the Hat code.

We all know you are J.R.R. Tolkien (in your head). Remember, you will have nothing to prove if your code is bug-free.

Another reason for the metaphor:

Don’t overwhelm the reader; spread the wealth. If you work with a group of people and all of them have to change the same large file, you will probably put yourselves into git merge hell.

Everyone loves rebasing.

-> Said no one ever!

TL;DR
Focus on readability. Spread your code and helpers over multiple lines and files as much as you can. Don’t throw 8 or 9 classes into a single file; it makes the code hard to read and harder to maintain. If you have a large conditional or loop, consider changing it to lambdas if the language supports them. Utility functions are a great avenue for increasing code readability. Avoid heavy nesting.
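One common way to avoid heavy nesting uses guard clauses. This is a related technique, not the lambdas mentioned above: you handle the rejecting cases first and return early, so the main path stays unindented. Below is a minimal Python sketch. The `Order` type and both functions are hypothetical, and the two functions behave identically:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Order:
    paid: bool
    items: List[str] = field(default_factory=list)
    address: Optional[str] = None


def ship_nested(order: Order) -> str:
    # Each check adds a level of indentation; the real work ends up deepest.
    if order.paid:
        if order.items:
            if order.address:
                return f"shipping {len(order.items)} item(s) to {order.address}"
            else:
                return "missing address"
        else:
            return "empty order"
    else:
        return "not paid"


def ship_flat(order: Order) -> str:
    # Guard clauses: reject bad input early, keep the main path unindented.
    if not order.paid:
        return "not paid"
    if not order.items:
        return "empty order"
    if not order.address:
        return "missing address"
    return f"shipping {len(order.items)} item(s) to {order.address}"


print(ship_flat(Order(paid=True, items=["book"], address="12 Main St")))
# shipping 1 item(s) to 12 Main St
```

The flat version reads top to bottom as a list of preconditions followed by the actual work. That is usually easier to review than tracking which `else` belongs to which `if`.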

Here is an alternative view: you are asking how to limit file size. My opinion is that there are many factors that make large code files very problematic. Sometimes a code file is huge, but its contents are well clustered and extremely clean, so the size does not cause any significant problems. I have seen lots of files that are very readable despite a high LOC count.

Instead of relying on the LOC metric, I would rather use history data to understand how often the code in those large files gets broken. Usually the reason is that developers do not have the time or patience to check the other relevant places in the same file, so they make the change with a “quick fix” mentality and without enough understanding.

The bigger danger is the presence of copy-paste code. Copy-paste coding naturally also speeds up LOC growth. I think eliminating copy-paste is even more important than keeping LOC below some magic number. In addition to pure copy-paste, there is a second danger in big files: overlapping functionality. The bigger the file is, the more likely you are to end up reimplementing a snippet that already exists in another section of the same file.
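A crude way to spot copy-paste within one file is to look for runs of identical lines that occur more than once. Below is a minimal Python sketch. It assumes that stripping surrounding whitespace is enough normalization, and `find_duplicate_windows` is a hypothetical name. Dedicated copy/paste detectors (for example PMD's CPD) compare tokens and are far more robust:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_duplicate_windows(source: str, window: int = 4) -> Dict[Tuple[str, ...], List[int]]:
    """Map each run of `window` consecutive non-blank lines that appears more
    than once to the 1-based line numbers where its copies start."""
    # Drop blank lines and surrounding whitespace, but remember original line numbers.
    lines = [(no, text.strip())
             for no, text in enumerate(source.splitlines(), start=1)
             if text.strip()]
    starts = defaultdict(list)
    for i in range(len(lines) - window + 1):
        chunk = tuple(text for _, text in lines[i:i + window])
        starts[chunk].append(lines[i][0])
    return {chunk: where for chunk, where in starts.items() if len(where) > 1}


snippet = "a = 1\nb = 2\nc = a + b\nprint(c)\nx = 0\na = 1\nb = 2\nc = a + b\nprint(c)\n"
print(find_duplicate_windows(snippet))
# {('a = 1', 'b = 2', 'c = a + b', 'print(c)'): [1, 6]}
```

Because the windows overlap, a long duplicated block is reported several times, once per starting line. The window size trades noise against missed matches.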

So, as long as the bug-fix ratio (the ratio of bug-fix commits to all commits) is low for the larger files, the situation is tolerable. Try git log on those files and skim how many of the commits are related to errors, or use a tool that can analyze and visualize this automatically, e.g. Softagram.
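The bug-fix ratio can be approximated from commit subject lines. Below is a minimal Python sketch. It assumes that bug-fix commits can be recognized by keywords such as "fix" or "bug" in the subject, a heuristic that depends on your team's commit conventions. `bugfix_ratio`, `commit_subjects`, and the keyword list are hypothetical, while `git log --format=%s -- <path>` is standard git:

```python
import re
import subprocess
from typing import Iterable, List

# Assumption: commits whose subject contains one of these words are bug fixes.
BUGFIX_PATTERN = re.compile(r"\b(fix(es|ed|ing)?|bugs?|hotfix|regression)\b", re.IGNORECASE)


def bugfix_ratio(subjects: Iterable[str]) -> float:
    """Fraction of commit subjects that look like bug fixes (0.0 if none)."""
    subjects = list(subjects)
    if not subjects:
        return 0.0
    fixes = sum(1 for s in subjects if BUGFIX_PATTERN.search(s))
    return fixes / len(subjects)


def commit_subjects(path: str, repo: str = ".") -> List[str]:
    """Subject lines of every commit that touched `path` (needs git on PATH)."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%s", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


print(bugfix_ratio(["Fix null check in parser", "Add CSV export",
                    "Refactor prefix handling", "hotfix: crash on empty input"]))  # 0.5
```

In a real repository you would call `bugfix_ratio(commit_subjects("some/large_file.c"))` for each large file and compare the results, keeping in mind that keyword matching will miss some fixes and misclassify others.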

I don’t really see any limit beyond practicalities imposed by version control, compilers, IDEs, language features, etc. Yet those practicalities do matter.

So, as I see it, it’s relative to those practicalities. I remember being infuriated some decades back looking at codebases that frequently had C source files with over 20,000 LOC. In hindsight, though, the biggest source of my frustration was that they often defined static file-scope variables that were visible to all 20,000+ lines of code. It was this extremely wide variable scope that made debugging such code, testing it, and reasoning about it so frustrating. The other source was the constant toe-stepping caused by the version control tools we were using (SVN back then), which made merge conflicts especially painful when multiple developers modified the same source file. If we hadn’t used such file-scope globals, and had used different version control tools, I wouldn’t have found huge source files nearly as problematic. IDEs were already good enough back in the 90s to quickly find relevant symbols without scrolling through these monstrous source files.

It’s all relative as I see it. I’ve grown increasingly pragmatic over the years (at least in relative terms), and I recommend people do the same. There might be real problems behind a big source file, but narrow down what they actually are rather than just saying, “big source file is bad!” At the very least, we become more persuasive with our team if we can pinpoint what’s truly causing us grief. You know, I heard a wise quote one time in Star Trek that only a Sith lord deals in absolutes (JK, I know it’s Star Wars, but I take a slightly sadistic enjoyment in making some programmer types uncomfortable with blasphemous technical inaccuracies).

Another consideration, at least in languages like C and C++ with their compilation and linkage models, is that the cheapest way to reduce build times tends to be reducing the number of compilation units involved. Of course, we still probably want more than one for a large codebase to allow parallel builds. With such compilers, however, there is at least an argument for somewhat larger source files in the name of reduced build times and improved productivity, especially if we can avoid some of the issues I’ve found problematic above, like file-scope statics. Those are the kinds of practicalities to consider, depending on your tools and practices, when deciding whether larger or smaller source files are desirable. In C++, I’ve found that even with extensive use of precompiled headers (which do help a lot), it can help just as much to avoid a style that leads to thousands upon thousands of compilation units by, say, putting every single class implementation in a separate pair of header and source files. But again, that’s very tailored to our use cases, tools, and processes, and to what we find beneficial, and it may not apply to everyone else. It’s all relative at the end of the day as I see it. We’re not dealing with absolutes here.

It is definitely easier to deal with absolutes, especially when starting out. “i” before “e” except after “c”, even if that sometimes produces ‘wierd’ results. But it’s all relative as I see it, and there are countless exceptions in the real world. It tends to help to focus on the most painful and immediate problems at hand as specifically as possible, in the same way that anyone tackling performance-critical code benefits from making a profiler their best friend and relying on the specifics it provides. Appreciating nuance is the key to appreciating the benefits of favoring relatives over absolutes.

Filed under: softwareengineering - @ 16:41

Tags: code-quality, code-smell
