I am trying to write a Spark program that reads airport data from an airports.text file, finds all the airports located in the United States, and writes each airport's name and city name to an out/airports_in_usa.text file, but I get an error when calling .saveAsTextFile.
I created a .py file containing a Utils module that defines COMMA_DELIMITER, and I added the folder containing this .py file to my PYTHONPATH so that I can import the module.
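For reference, since COMMA_DELIMITER is used via .split() below, Utils.py is roughly the following: a compiled regex that splits on commas only when they are outside double quotes (the exact pattern here is my reconstruction, so it may differ slightly from my actual file). The sample line is a made-up row in the airports.text format:

```python
import re

# CSV-aware delimiter: match a comma only if an even number of
# double quotes remains ahead of it, i.e. the comma is outside quotes.
COMMA_DELIMITER = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

# Hypothetical sample line in the airports.text format
line = '1,"Goroka Airport","Goroka, Eastern Highlands","Papua New Guinea"'
splits = COMMA_DELIMITER.split(line)
# The comma inside "Goroka, Eastern Highlands" is NOT split on:
# splits[1] -> '"Goroka Airport"', splits[3] -> '"Papua New Guinea"'
```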
My code and error are below:
<code>
from pyspark import SparkContext, SparkConf
import Utils  # this is the module in the .py file

# Create a function that keeps only the name and city fields
def splitComma(line: str):
    splits = Utils.COMMA_DELIMITER.split(line)
    return "{}, {}".format(splits[1], splits[2])

conf = SparkConf().setAppName("airports").setMaster("local[*]")
sc = SparkContext(conf=conf)

# Import the dataset, creating an RDD
airports = sc.textFile("airports.text")

# Filter the data to find all airports in the USA using a lambda function
airportsInUSA = airports.filter(lambda line: Utils.COMMA_DELIMITER.split(line)[3] == "\"United States\"")

airportsNameAndCityNames = airportsInUSA.map(splitComma)
airportsNameAndCityNames.saveAsTextFile("out/airports_in_usa.text")
</code>
The traceback:
<code>
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
c:\Users\user\Documents\Python Programming\PySpark_for_Big_Data_3.ipynb Cell 18 line 2
      1 # Save output in a new text file
----> 2 airportsNameAndCityNames.saveAsTextFile("out/airports_in_usa.text")
      3 # displays an error

File C:\spark\spark-3.4.2-bin-hadoop3\python\pyspark\rdd.py:3406, in RDD.saveAsTextFile(self, path, compressionCodecClass)
   3404     keyed._jrdd.map(self.ctx._jvm.BytesToString()).saveAsTextFile(path, compressionCodec)
   3405 else:
-> 3406     keyed._jrdd.map(self.ctx._jvm.BytesToString()).saveAsTextFile(path)

File C:\spark\spark-3.4.2-bin-hadoop3\python\lib\py4j-0.10.9.7-src.zip\py4j\java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File C:\spark\spark-3.4.2-bin-hadoop3\python\lib\py4j-0.10.9.7-src.zip\py4j\protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
...
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:842)

Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...
</code>
Help is urgently needed.
My software specs are:
Python 3.11.8
PySpark 3.4.2
Spark 3.4.2
Java 17
I use the VS Code IDE.