Suppose I have a function that does things with a text file – for example reads from it and removes the word ‘a’. I could either pass it a filename and handle the opening/closing in the function, or I could pass it the opened file and expect that whoever calls it would deal with closing it.
The first way seems like a better way to guarantee no files are left open, but it prevents me from using things like StringIO objects.
The second way could be a little dangerous: there is no way of knowing whether the file will be closed, but I would be able to use file-like objects.
def ver_1(filename):
    with open(filename, 'r') as f:
        return do_stuff(f)

def ver_2(open_file):
    return do_stuff(open_file)

print(ver_1('my_file.txt'))

with open('my_file.txt', 'r') as f:
    print(ver_2(f))
Is one of these generally preferred? Is it generally expected that a function will behave in one of these two ways? Or should it just be well documented such that the programmer can use the function as appropriate?
Convenient interfaces are nice, and sometimes the way to go. However, most of the time good composability is more important than convenience, as a composable abstraction allows us to implement other functionality (incl. convenience wrappers) on top of it.
The most general way for your function to use files is to take an open file handle as parameter, as this allows it to also use file handles that are not part of the filesystem (e.g. pipes, sockets, …):
def your_function(open_file):
return do_stuff(open_file)
If spelling out

with open(filename, 'r') as f:
    result = your_function(f)

is too much to ask of your users, you could choose one of the following solutions:
- your_function takes an open file or a file name as parameter. If it is a filename, the file is opened and closed, and exceptions propagated. There is a bit of an ambiguity issue here, which could be worked around using named arguments.
- Offer a simple wrapper that takes care of opening the file, e.g.

  def your_function_filename(file):
      with open(file, 'r') as f:
          return your_function(f)

  I generally perceive such functions as API bloat, but if they provide commonly used functionality, the gained convenience is a sufficiently strong argument.
- Wrap the with open functionality in another composable function:

  def with_file(filename, callback):
      with open(filename, 'r') as f:
          return callback(f)

  used as with_file(name, your_function) or, in more complicated cases, with_file(name, lambda f: some_function(1, 2, f, named=4))
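The first option above could be sketched like this (a minimal sketch; `do_stuff` is a stand-in for the question's processing, and `isinstance` is just one way to resolve the filename-vs-file ambiguity):

```python
def do_stuff(f):
    # Stand-in for the question's processing: remove the word 'a'.
    return f.read().replace(' a ', ' ')

def your_function(file_or_name):
    # Accept either a filename or an open file-like object.
    # Named arguments (filename=... vs open_file=...) would be
    # another way to make the two cases explicit.
    if isinstance(file_or_name, str):
        with open(file_or_name, 'r') as f:
            return do_stuff(f)
    return do_stuff(file_or_name)
```
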
The real question is one of completeness. Is your file processing function the complete processing of the file, or is it just one piece in a chain of processing steps? If it is complete in and of its own, then feel free to encapsulate all file access within a function.
def ver(filepath):
    with open(filepath, "r") as f:
        # do processing steps on f
        return result
This has the very nice property of finalizing the resource (closing the file) at the end of the with statement.
If, however, there is possibly a need for processing an already-open file, then the distinction of your ver_1 and ver_2 makes more sense. For example:
def _ver_file(f):
    # do processing steps on f
    return result

def ver(fileobj):
    if isinstance(fileobj, str):
        with open(fileobj, 'r') as f:
            return _ver_file(f)
    else:
        return _ver_file(fileobj)
This kind of explicit type testing is often frowned upon, especially in languages like Java, Julia, and Go where type- or interface-based dispatching is directly supported. In Python, however, there is no language support for type-based dispatching. You may occasionally see criticism of direct type-testing in Python, but in practice it's both extremely common and quite effective. It enables a function to have a high degree of generality, handling whatever datatypes are likely to come its way, aka "duck typing." Note the leading underscore on _ver_file; that is a conventional way of designating a "private" function (or method). While it can technically be called directly, it suggests that the function is not intended for direct external consumption.
2019 update: Given recent updates in Python 3, for example that paths are now potentially stored as pathlib.Path objects, not just str or bytes (3.4+), and that type hinting has gone from esoteric to mainstream (circa 3.6+, though still actively evolving), here's updated code that takes these advances into account:
from pathlib import Path
from typing import IO, Any, AnyStr, Union

Pathish = Union[AnyStr, Path]  # in lieu of yet-unimplemented PEP 519
FileSpec = Union[IO, Pathish]

def _ver_file(f: IO) -> Any:
    "Process file f"
    ...
    return result

def ver(fileobj: FileSpec) -> Any:
    "Process file (or file path) f"
    if isinstance(fileobj, (str, bytes, Path)):
        with open(fileobj, 'r') as f:
            return _ver_file(f)
    else:
        return _ver_file(fileobj)
If you pass the file name around instead of the file handle, then there is no guarantee that the second file is the same file as the first one when it is opened; this can lead to correctness bugs and security holes (a time-of-check to time-of-use, or TOCTOU, race).
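This divergence between a name and a handle can be demonstrated directly (a sketch assuming POSIX semantics, where an open handle keeps a deleted file's contents alive):

```python
import os
import tempfile

# The same *name* can refer to different files over time,
# while a *handle* pins down exactly one file.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "data.txt")

with open(path, "w") as f:
    f.write("original")

handle = open(path, "r")   # a handle to the current file

os.remove(path)            # someone replaces the file behind the name
with open(path, "w") as f:
    f.write("replaced")

by_name = open(path).read()    # re-opening by name finds the new file
by_handle = handle.read()      # the old handle still reads the original
handle.close()
```
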
This is about ownership and the responsibility to close the file. You can pass on a stream or file handle or whatever thingy that should be closed/disposed at some point to another method, as long as you make sure it is clear who owns it and certain it will be closed by the owner when you are done. This typically involves a try-finally construct or the disposable pattern.
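As a sketch of that ownership rule (the function names here are illustrative): the function that opens the handle is the one responsible for closing it, typically via try-finally or a with block, while functions that merely receive the handle must not close it:

```python
def write_report(out_file):
    # Does NOT own out_file: it may write to it, but must not close it.
    out_file.write("report\n")

def save_report(path):
    # Owns the handle: it opens the file and guarantees it is closed,
    # even if write_report raises.
    f = open(path, "w")
    try:
        write_report(f)
    finally:
        f.close()
```

With a with statement, save_report collapses to two lines; the ownership stays the same.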
One aspect the other answers haven’t pointed out is a “capabilities” approach. This is usually applied in the context of security, but can also help with design and development (e.g. defensive coding, debugging, testability, etc.).
Roughly: a “capability” is the information required to perform some action. If we have a capability, we can perform its corresponding actions; we can’t perform actions without a corresponding capability. Examples of capabilities include URLs, API keys, username/password pairs, etc. Capabilities are useful since they focus on what could happen rather than what should happen.
In your case it seems like filenames and handles are roughly equivalent: they let us read files. Yet things aren’t that simple, and capabilities give us a way to think about the differences.
Our code should read data from a file, so it needs the capability (AKA information required) to do so. Does a filename give us this capability? Not quite. In particular:
- There might be no file with the given filename. In this case we can’t read a file, so (according to my rough definition above) such filenames are not the capabilities we need.
- If there is a file with the given filename, our program’s user account might not have permission to read it. Again, from my rough definition, such filenames are not the capabilities we need: we would need some extra information to access them, like the login details of a permitted user.
Opened files (AKA handles or ‘file-like objects’) don’t have these problems: permission or file-not-found errors will be thrown before our function gets called; and presumably (via the single-responsibility principle) the code that’s trying to open those files is better placed to handle those errors or pass them on.
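To illustrate (a minimal sketch; `count_lines` is a made-up example): a function that accepts a handle never has to deal with open-time failures, because those surface at the caller that opens the file:

```python
import io

def count_lines(open_file):
    # Needs only the "read" capability; it never sees a filename,
    # so it cannot hit FileNotFoundError or PermissionError itself.
    return sum(1 for _ in open_file)

# Open-time errors belong to the caller, which holds the filename:
try:
    with open("no_such_file.txt") as f:
        n = count_lines(f)
except FileNotFoundError:
    n = 0  # the caller decides how to handle a missing file
```
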
So filenames aren't always the capabilities we need. They might also give us extra capabilities we don't want! For example, given a filename we can (try to) delete it, rename it, move it, etc.; whilst we can't do that using a handle (at least, not as easily). If we write our code using a more restricted set of capabilities, it's less likely that we'll trigger some unwanted action by mistake; so this is another reason to accept handles instead of filenames. I would also argue that it makes understanding and debugging easier, since we can guess from its signature whether a function might be the source of a problem (like, say, files getting accidentally deleted).

This way of thinking can also influence our design and architecture, since an important aspect is making sure each component has everything it needs to do its job (i.e. the right capabilities are available), whilst encapsulating/modularising/protecting other parts of the system from interference (i.e. restricting those capabilities).
Caveat: If we're relying on a capability model for security, it's important that capabilities are kept secret; are unguessable (e.g. we shouldn't be able to increment one valid ID to get another); and undiscoverable (e.g. the ability to list filenames should be restricted). We must assume that if something is plausible, then malicious actors will exploit it (e.g. digging deep into the attributes of a file handle to figure out the filename). If we're not concerned about malicious actors, I tend to assume that developers (including me) are mostly lazy: if there's a lazy solution and a tricky solution (e.g. reading from a given handle vs. figuring out its filename and opening it), then we can usually ignore the tricky approach when we're designing, since it's unlikely to be (ab)used.
If you choose to pass open files, you can do something like the following, but note that you have no access to the filename in the function that writes into the file.
I would do this if I wanted a class that was 100% responsible for file/stream operations, with other classes or functions that would be naive and not expected to open or close said files/streams.
Remember that context managers work like having a finally clause.
So if an exception is thrown in the writer function the file is going to be closed no matter what.
import contextlib

class FileOpener:
    def __init__(self, path_to_file):
        self.path_to_file = path_to_file

    @contextlib.contextmanager
    def open_write(self):
        # Here you could add code to create the directory that will
        # hold the file, or check that the file does not already exist
        # and raise FileExistsError.
        file = None
        try:
            with open(self.path_to_file, "w") as file:
                print(f"open_write: has opened the file with id:{id(file)}")
                yield file
        finally:
            # The try/finally is not mandatory (unless you want to manage
            # exceptions in another way): file objects have predefined
            # cleanup actions, and when used with a 'with' statement (a
            # context manager, not the decorator in this example) they are
            # closed even if an error occurs. The finally clause here just
            # demonstrates that the file really was closed. The None guard
            # avoids a NameError if open() itself fails.
            if file is not None:
                print(f"open_write: has closed the file with id:{id(file)} - {file.closed}")

def writer(file_open, data, raise_exc):
    with file_open() as file:
        print("writer: started writing data.")
        file.write(data)
        if raise_exc:
            raise IOError("I am a broken data cable in your server!")
        print("writer: wrote data.")
        print("writer: finished.")

if __name__ == "__main__":
    fo = FileOpener('./my_test_file.txt')
    data = "Hello!"
    raise_exc = False  # change me to True and see that the file is closed even if an Exception is raised
    writer(fo.open_write, data, raise_exc)