Suppose I have a function that does things with a text file – for example reads from it and removes the word ‘a’. I could either pass it a filename and handle the opening/closing in the function, or I could pass it the opened file and expect that whoever calls it would deal with closing it.
The first way seems like a better way to guarantee no files are left open, but it prevents me from using things like StringIO objects.
The second way could be a little dangerous: there is no way of knowing whether the file will be closed, but I would be able to use file-like objects.
def ver_1(filename):
    with open(filename, 'r') as f:
        return do_stuff(f)

def ver_2(open_file):
    return do_stuff(open_file)

print(ver_1('my_file.txt'))

with open('my_file.txt', 'r') as f:
    print(ver_2(f))
Is one of these generally preferred? Is it generally expected that a function will behave in one of these two ways? Or should it just be well documented such that the programmer can use the function as appropriate?
Convenient interfaces are nice, and sometimes the way to go. However, most of the time good composability is more important than convenience, as a composable abstraction allows us to implement other functionality (incl. convenience wrappers) on top of it.
The most general way for your function to use files is to take an open file handle as parameter, as this allows it to also use file handles that are not part of the filesystem (e.g. pipes, sockets, …):
def your_function(open_file):
return do_stuff(open_file)
If spelling out

with open(filename, 'r') as f:
    result = your_function(f)

is too much to ask of your users, you could choose one of the following solutions:
- your_function takes an open file or a file name as parameter. If it is a filename, the file is opened and closed, and exceptions propagated. There is a bit of an ambiguity issue here, which could be worked around using named arguments.
- Offer a simple wrapper that takes care of opening the file, e.g.

  def your_function_filename(file):
      with open(file, 'r') as f:
          return your_function(f)

  I generally perceive such functions as API bloat, but if they provide commonly used functionality, the gained convenience is a sufficiently strong argument.
- Wrap the with open functionality in another composable function:

  def with_file(filename, callback):
      with open(filename, 'r') as f:
          return callback(f)

  used as with_file(name, your_function) or, in more complicated cases, with_file(name, lambda f: some_function(1, 2, f, named=4))
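The first option above could be sketched like this (a minimal sketch; `do_stuff` is a stand-in for the question's processing, and `isinstance` is just one way to resolve the filename-vs-file ambiguity):

```python
def do_stuff(f):
    # Stand-in for the question's processing: remove the word 'a'.
    return f.read().replace(' a ', ' ')

def your_function(file_or_name):
    # Accept either a filename or an open file-like object.
    # Named arguments (filename=... vs open_file=...) would be
    # another way to make the two cases explicit.
    if isinstance(file_or_name, str):
        with open(file_or_name, 'r') as f:
            return do_stuff(f)
    return do_stuff(file_or_name)
```
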
The real question is one of completeness. Is your file processing function the complete processing of the file, or is it just one piece in a chain of processing steps? If it is complete in and of its own, then feel free to encapsulate all file access within a function.
def ver(filepath):
    with open(filepath, "r") as f:
        # do processing steps on f
        return result
This has the very nice property of finalizing the resource (closing the file) at the end of the with statement.
If, however, there is possibly a need for processing an already-open file, then the distinction of your ver_1 and ver_2 makes more sense. For example:
def _ver_file(f):
    # do processing steps on f
    return result

def ver(fileobj):
    if isinstance(fileobj, str):
        with open(fileobj, 'r') as f:
            return _ver_file(f)
    else:
        return _ver_file(fileobj)
This kind of explicit type testing is often frowned upon, especially in languages like Java, Julia, and Go where type- or interface-based dispatching is directly supported. In Python, however, there is no language support for type-based dispatching. You may occasionally see criticism of direct type-testing in Python, but in practice it's both extremely common and quite effective. It enables a function to have a high degree of generality, handling whatever datatypes are likely to come its way, aka "duck typing." Note the leading underscore on _ver_file; that is a conventional way of designating a "private" function (or method). While it can technically be called directly, it suggests that the function is not intended for direct external consumption.
2019 update: Given recent updates in Python 3, for example that paths are now potentially stored as pathlib.Path objects, not just str or bytes (3.4+), and that type hinting has gone from esoteric to mainstream (circa 3.6+, though still actively evolving), here's updated code that takes these advances into account:
from pathlib import Path
from typing import IO, Any, AnyStr, Union

Pathish = Union[AnyStr, Path]  # in lieu of yet-unimplemented PEP 519
FileSpec = Union[IO, Pathish]

def _ver_file(f: IO) -> Any:
    "Process file f"
    ...
    return result

def ver(fileobj: FileSpec) -> Any:
    "Process file (or file path) f"
    if isinstance(fileobj, (str, bytes, Path)):
        with open(fileobj, 'r') as f:
            return _ver_file(f)
    else:
        return _ver_file(fileobj)
If you pass the file name around instead of the file handle, then there is no guarantee that the second file is the same file as the first one when it is opened; this can lead to correctness bugs and security holes (a time-of-check to time-of-use, or TOCTOU, race).
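This divergence between a name and a handle can be demonstrated directly (a sketch assuming POSIX semantics, where an open handle keeps a deleted file's contents alive):

```python
import os
import tempfile

# The same *name* can refer to different files over time,
# while a *handle* pins down exactly one file.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "data.txt")

with open(path, "w") as f:
    f.write("original")

handle = open(path, "r")   # a handle to the current file

os.remove(path)            # someone replaces the file behind the name
with open(path, "w") as f:
    f.write("replaced")

by_name = open(path).read()    # re-opening by name finds the new file
by_handle = handle.read()      # the old handle still reads the original
handle.close()
```
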
This is about ownership and the responsibility to close the file. You can pass on a stream or file handle or whatever thingy that should be closed/disposed at some point to another method, as long as you make sure it is clear who owns it and certain it will be closed by the owner when you are done. This typically involves a try-finally construct or the disposable pattern.
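As a sketch of that ownership rule (the function names here are illustrative): the function that opens the handle is the one responsible for closing it, typically via try-finally or a with block, while functions that merely receive the handle must not close it:

```python
def write_report(out_file):
    # Does NOT own out_file: it may write to it, but must not close it.
    out_file.write("report\n")

def save_report(path):
    # Owns the handle: it opens the file and guarantees it is closed,
    # even if write_report raises.
    f = open(path, "w")
    try:
        write_report(f)
    finally:
        f.close()
```

With a with statement, save_report collapses to two lines; the ownership stays the same.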
One aspect the other answers haven’t pointed out is a “capabilities” approach. This is usually applied in the context of security, but can also help with design and development (e.g. defensive coding, debugging, testability, etc.).
Roughly: a “capability” is the information required to perform some action. If we have a capability, we can perform its corresponding actions; we can’t perform actions without a corresponding capability. Examples of capabilities include URLs, API keys, username/password pairs, etc. Capabilities are useful since they focus on what could happen rather than what should happen.
In your case it seems like filenames and handles are roughly equivalent: they let us read files. Yet things aren’t that simple, and capabilities give us a way to think about the differences.
Our code should read data from a file, so it needs the capability (AKA information required) to do so. Does a filename give us this capability? Not quite. In particular:
- There might be no file with the given filename. In this case we can’t read a file, so (according to my rough definition above) such filenames are not the capabilities we need.
- If there is a file with the given filename, our program’s user account might not have permission to read it. Again, from my rough definition, such filenames are not the capabilities we need: we would need some extra information to access them, like the login details of a permitted user.
Opened files (AKA handles or ‘file-like objects’) don’t have these problems: permission or file-not-found errors will be thrown before our function gets called; and presumably (via the single-responsibility principle) the code that’s trying to open those files is better placed to handle those errors or pass them on.
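To illustrate (a minimal sketch; `count_lines` is a made-up example): a function that accepts a handle never has to deal with open-time failures, because those surface at the caller that opens the file:

```python
import io

def count_lines(open_file):
    # Needs only the "read" capability; it never sees a filename,
    # so it cannot hit FileNotFoundError or PermissionError itself.
    return sum(1 for _ in open_file)

# Open-time errors belong to the caller, which holds the filename:
try:
    with open("no_such_file.txt") as f:
        n = count_lines(f)
except FileNotFoundError:
    n = 0  # the caller decides how to handle a missing file
```
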
So filenames aren't always the capabilities we need. They might also give us extra capabilities we don't want! For example, given a filename we can (try to) delete it, rename it, move it, etc.; whilst we can't do that using a handle (at least, not as easily). If we write our code using a more restricted set of capabilities, it's less likely that we'll trigger some unwanted action by mistake; so this is another reason to accept handles instead of filenames. I would also argue that it makes understanding and debugging easier, since we can guess from its signature whether a function might be the source of a problem (like, say, files getting accidentally deleted).

This way of thinking can also influence our design and architecture, since an important aspect is making sure each component has everything it needs to do its job (i.e. the right capabilities are available), whilst encapsulating/modularising/protecting other parts of the system from interference (i.e. restricting those capabilities).
Caveat: If we're relying on a capability model for security, it's important that capabilities are kept secret; are unguessable (e.g. we shouldn't be able to increment one valid ID to get another); and undiscoverable (e.g. the ability to list filenames should be restricted). We must assume that if something is plausible, then malicious actors will exploit it (e.g. digging deep into the attributes of a file handle to figure out the filename). If we're not concerned about malicious actors, I tend to assume that developers (including me) are mostly lazy: if there's a lazy solution and a tricky solution (e.g. reading from a given handle vs. figuring out its filename and opening it), then we can usually ignore the tricky approach when we're designing, since it's unlikely to be (ab)used.
If you choose to pass open files, you can do something like the following, but note that you have no access to the filename in the function that writes into the file.
I would do this if I wanted a class that was 100% responsible for file/stream operations, with other classes or functions that would be naive and not expected to open or close said files/streams.
Remember that context managers work like having a finally clause.
So if an exception is thrown in the writer function the file is going to be closed no matter what.
import contextlib

class FileOpener:
    def __init__(self, path_to_file):
        self.path_to_file = path_to_file

    @contextlib.contextmanager
    def open_write(self):
        # Here you could add code to create the directory that will
        # hold the file, or check that the file does not already exist
        # and raise FileExistsError.
        file = None
        try:
            with open(self.path_to_file, "w") as file:
                print(f"open_write: has opened the file with id:{id(file)}")
                yield file
        finally:
            # The try/finally is not mandatory (unless you want to manage
            # exceptions in another way): file objects have predefined
            # cleanup actions, and when used with a 'with' statement (a
            # context manager, not the decorator in this example) they are
            # closed even if an error occurs. The finally clause here just
            # demonstrates that the file really was closed. The None guard
            # avoids a NameError if open() itself fails.
            if file is not None:
                print(f"open_write: has closed the file with id:{id(file)} - {file.closed}")

def writer(file_open, data, raise_exc):
    with file_open() as file:
        print("writer: started writing data.")
        file.write(data)
        if raise_exc:
            raise IOError("I am a broken data cable in your server!")
        print("writer: wrote data.")
        print("writer: finished.")

if __name__ == "__main__":
    fo = FileOpener('./my_test_file.txt')
    data = "Hello!"
    raise_exc = False  # change me to True and see that the file is closed even if an Exception is raised
    writer(fo.open_write, data, raise_exc)