I am getting a subprocess.CalledProcessError when I run an HDFS cp command through subprocess.check_output. Below is a sample of my program.
import subprocess

command = "hdfs dfs -cp -f /hdfs/path/temp/file.csv/part-* /hdfs/path/file.csv"
try:
    result = subprocess.check_output(command.split(), stderr=subprocess.STDOUT)
except subprocess.CalledProcessError as e:
    # log() is our application's logging helper
    log("An error occurred: {}".format(e.output.decode()))
The same command runs fine when I type it in a terminal, but it raises this exception when executed through subprocess.
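To show what the failing call actually reports, here is the diagnostic I can run in place of check_output; a minimal sketch using subprocess.Popen to capture the exit code and the two output streams separately (the command string is the same as above, and logging stands in for my log() helper):

import logging
import subprocess

logging.basicConfig(level=logging.INFO)

command = "hdfs dfs -cp -f /hdfs/path/temp/file.csv/part-* /hdfs/path/file.csv"
# Popen exposes the exit code and keeps stdout and stderr separate,
# which check_output with stderr=STDOUT does not
proc = subprocess.Popen(command.split(),
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
out, err = proc.communicate()
logging.info("exit code: %s", proc.returncode)
logging.info("stdout: %s", out)
logging.info("stderr: %s", err)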
Please help me resolve this issue, or suggest an alternative way to execute the above HDFS command from Python (for example, something along the lines of the sketch below?).
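For instance, would running the command through the shell be a reasonable alternative? A minimal sketch, assuming the problem is that the non-interactive process has a different environment (e.g. PATH) than my terminal:

import subprocess

command = "hdfs dfs -cp -f /hdfs/path/temp/file.csv/part-* /hdfs/path/file.csv"
try:
    # shell=True hands the whole string to /bin/sh, so hdfs is resolved
    # via the shell's PATH just as it would be in a terminal; the part-*
    # pattern has no local match, so the shell passes it through unchanged
    # and Hadoop expands it against HDFS
    result = subprocess.check_output(command, shell=True,
                                     stderr=subprocess.STDOUT)
except subprocess.CalledProcessError as e:
    print("exit code {}: {}".format(e.returncode, e.output))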
For reference, versions: Python 2.7, Spark 2.4.