I’m using curl in a Bash script to fetch a web page and extract listing data from it. Here is the script:
#!/bin/bash
# Define the variables used to build the URL
sGCitta="fucecchio"
sGTypo="sale-houses"
sGDomain="real-estate"
url="https://www.$sGDomain.it/$sGTypo/$sGCitta"
# Get HTML content of the page
html_content=$(curl -s -L "$url")
# Use html-xml-utils to extract announcements
announcements=$(echo "$html_content" | hxselect 'li.nd-list__item.in-searchLayoutListItem')
# Path to the SQLite database file
db_file="immo.db"
# Initialize arrays for prices, links, descriptions, sizes, and auction
prices=()
links=()
descriptions=()
sizes=()
auctions=()
# Loop through the announcements
while IFS= read -r announcement; do
    # Extract price
    price=$(echo "$announcement" | hxselect 'div.in-listingCardPrice span' -c | grep -oP '(?<=€ )[0-9,.]+')
    # Extract link
    link=$(echo "$announcement" | hxselect 'a.in-listingCardTitle' -s '\n' | grep -o 'href="[^"]*"' | sed 's/^href="//' | sed 's/"$//')
    # Extract description
    description=$(echo "$announcement" | hxselect 'a.in-listingCardTitle' -s '\n' | grep -oP '(?<=title=")[^"]+')
    # Extract size
    size=$(echo "$announcement" | hxselect 'div.in-listingCardFeatureList__item:nth-of-type(2) span' -c | grep -oP '[0-9]+')
    # Check if the description contains the word "auction"
    if [[ "$description" =~ "auction" ]]; then
        auction=1
    else
        auction=0
    fi
    # Add data to the arrays
    prices+=("$price")
    links+=("$link")
    descriptions+=("$description")
    sizes+=("$size")
    auctions+=("$auction")
done <<< "$announcements"
# Print the sizes of the arrays
echo "Size of the prices array: ${#prices[@]}"
echo "Size of the links array: ${#links[@]}"
echo "Size of the descriptions array: ${#descriptions[@]}"
echo "Size of the sizes array: ${#sizes[@]}"
echo "Size of the auctions array: ${#auctions[@]}"
# Insert data into the SQLite database
for ((i = 0; i < ${#prices[@]}; i++)); do
    sqlite3 "$db_file" "INSERT INTO $sGDomain (price, link, description, size, auction) VALUES ('${prices[i]}', '${links[i]}', '${descriptions[i]}', '${sizes[i]}', '${auctions[i]}')"
done
Currently, all the prices end up in a single array element. I’d like the first price stored in prices[0], the second in prices[1], and so on (Bash arrays are zero-indexed), but I’m not sure how to accomplish this. Can anyone give me a hint?
I use hxselect to extract the data from the page and store it in arrays. I expected each value to land in its own array index, matching the position of its listing on the page, but instead every value was stored in the same element.
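To make the goal concrete, here is a minimal sketch of the splitting I have in mind. It uses hard-coded sample data rather than the real page, so the `sample` variable, its markup, and the simple number pattern are all assumptions for illustration. It also assumes the stream fed to the loop holds exactly one `<li>` element per line, which in the real script might be arranged with something like `hxnormalize -x` followed by `hxselect -s '\n'` on the outer selector. I have not verified that against the actual site.

```shell
#!/bin/bash
# Hypothetical stand-in for the hxselect output: one <li> element per line.
sample='<li><span>€ 120,000</span></li>
<li><span>€ 95,500</span></li>
<li><span>€ 240,000</span></li>'

prices=()
# Illustrative pattern: take the first number found in each item.
# The real page would likely need a more specific match.
re='([0-9][0-9.,]*)'

while IFS= read -r item; do
    # Skip blank lines so they do not create empty array elements
    [[ -z "$item" ]] && continue
    if [[ "$item" =~ $re ]]; then
        prices+=("${BASH_REMATCH[1]}")
    else
        # Keep the arrays aligned even when an item has no price
        prices+=("")
    fi
done <<< "$sample"

echo "Size of the prices array: ${#prices[@]}"
for i in "${!prices[@]}"; do
    echo "prices[$i] = ${prices[i]}"
done
```

With this sample the loop runs once per line, so each price gets its own index starting at 0. If the whole input arrives as a single line, the loop runs only once and everything lands in one element, which looks like what I am seeing.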