I am trying to integrate URLFrontier into a simple crawler, but I am running into issues.
I have started URLFrontier, which listens on port 9100.
My Python client is:
client.py
import requests

class URLFrontierClient:
    def __init__(self, base_url):
        self.base_url = base_url

    def get_next_url(self):
        # Ask the frontier for the next URL to crawl
        response = requests.get(f"{self.base_url}/next_url")
        print(f"{self.base_url}/next_url")
        if response.status_code == 200:
            return response.json().get('url')
        return None
# Example usage:
if __name__ == "__main__":
    frontier = URLFrontierClient(base_url='http://localhost:9100')
    # Fetch the next URL
    next_url = frontier.get_next_url()
    if next_url:
        print(f"Next URL to crawl: {next_url}")
    # Add new URLs to the frontier
    new_urls = ["http://example.com/page1", "http://example.com/page2"]
    frontier.add_urls(new_urls)
When I execute client.py, I get the following error:
  File "/Users/tvganesh/backup-mini/software/prorata/crawler1/crawler3/URLFrontierClient.py", line 29, in <module>
    next_url = frontier.get_next_url()
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tvganesh/backup-mini/software/prorata/crawler1/crawler3/URLFrontierClient.py", line 11, in get_next_url
    return response.json().get('url')
When I tried:
curl http://localhost:9100/next_url
I get this strange dump:
# HELP jvm_buffer_pool_used_bytes Used bytes of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_bytes gauge
jvm_buffer_pool_used_bytes{pool="mapped",} 0.0
jvm_buffer_pool_used_bytes{pool="direct",} 658695.0
# HELP jvm_buffer_pool_capacity_bytes Bytes capacity of a given JVM buffer pool.
# TYPE jvm_buffer_pool_capacity_bytes gauge
jvm_buffer_pool_capacity_bytes{pool="mapped",} 0.0
jvm_buffer_pool_capacity_bytes{pool="direct",} 658694.0
# HELP jvm_buffer_pool_used_buffers Used buffers of a given JVM buffer pool.
# TYPE jvm_buffer_pool_used_buffers gauge
jvm_buffer_pool_used_buffers{pool="mapped",} 0.0
jvm_buffer_pool_used_buffers{pool="direct",} 17.0
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="G1 Young Generation",} 0.0
jvm_gc_collection_seconds_sum{gc="G1 Young Generation",} 0.0
jvm_gc_collection_seconds_count{gc="G1 Old Generation",} 0.0
….
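The dump looks like Prometheus text-format metrics rather than JSON, which would explain why response.json() fails in get_next_url. Below is a minimal sketch of a guard that only parses the body when it really is JSON. The helper name extract_url is hypothetical, and the 'url' field is an assumption carried over from my client, not a documented URLFrontier API; the sample Content-Type and bodies are made up for illustration.

```python
import json


def extract_url(status_code, content_type, body):
    """Return the 'url' field of a JSON response body, or None.

    Hypothetical helper: the 'url' field and the JSON shape are
    assumptions taken from the client, not a documented API.
    """
    if status_code != 200:
        return None
    # Only try to parse when the server declares a JSON body.
    if "application/json" not in (content_type or "").lower():
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Declared as JSON but not parseable: treat as "no URL".
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get("url")


# Sample inputs (illustrative only).
metrics_body = "# HELP jvm_buffer_pool_used_bytes Used bytes of a given JVM buffer pool.\n"
print(extract_url(200, "text/plain; version=0.0.4", metrics_body))
print(extract_url(200, "application/json", '{"url": "http://example.com/page1"}'))
```

With requests, this could be called as extract_url(response.status_code, response.headers.get("Content-Type"), response.text), so a non-JSON reply returns None instead of raising.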
Any help will be appreciated.