Examples of boundary characters: "", '', (), space, and ^$ (start and end of the line, used if no other boundary characters are specified explicitly). Boundary characters should be easily configurable.
The "", '', and () boundary characters should have higher priority than the space boundary character.
If no boundary characters are found in the line, ^$ (start and end of the line) should be treated as the boundary characters.
The text inside boundary characters is a URI (directory path, file path, HTTP address, etc.), but the URI does not need to be validated against any pattern. Let's assume that any text can appear inside boundary characters.
Redundant space characters should be trimmed.
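To make the rules above concrete, here is a minimal sketch of how they could be expressed with a configurable list of boundary pairs. The names BOUNDARY_PAIRS and extract_uri are hypothetical and are not part of my current script. The sketch assumes a 0-based cursor position between characters, pairs each opening character with the next closing character while scanning left to right, and does not handle nested pairs.

```bash
#!/usr/bin/env bash
# Sketch only: BOUNDARY_PAIRS and extract_uri are hypothetical names.
# Each entry is "<open><close>"; earlier entries have higher priority.
BOUNDARY_PAIRS=("''" '""' '()')

# extract_uri LINE CURSOR
# CURSOR is a 0-based position between characters (0 .. length of LINE).
extract_uri() {
    local line=$1 cursor=$2 len=${#1}
    local pair open close ch i start

    # 1. Paired boundary characters, in priority order.
    for pair in "${BOUNDARY_PAIRS[@]}"; do
        open=${pair:0:1}
        close=${pair:1:1}
        start=-1
        for (( i = 0; i < len; i++ )); do
            ch=${line:i:1}
            if (( start < 0 )); then
                if [[ $ch == "$open" ]]; then start=$i; fi
            elif [[ $ch == "$close" ]]; then
                # The segment occupies indexes start..i; the cursor may also
                # touch it from the outside (just before or just after).
                if (( cursor >= start && cursor <= i + 1 )); then
                    printf '%s\n' "${line:start+1:i-start-1}"
                    return 0
                fi
                start=-1
            fi
        done
    done

    # 2. Space boundary: the word under the cursor.
    local left=$cursor right=$cursor
    while (( left > 0 )) && [[ ${line:left-1:1} != ' ' ]]; do left=$(( left - 1 )); done
    while (( right < len )) && [[ ${line:right:1} != ' ' ]]; do right=$(( right + 1 )); done
    if (( right > left )); then
        printf '%s\n' "${line:left:right-left}"
        return 0
    fi

    # 3. ^$ fallback: the whole line with surrounding whitespace trimmed.
    line="${line#"${line%%[![:space:]]*}"}"
    line="${line%"${line##*[![:space:]]}"}"
    printf '%s\n' "$line"
}
```

For example, extract_uri "\"~/te st1\" '~/te st2'" 11 prints ~/te st2. Adding a new pair would only require another entry in BOUNDARY_PAIRS, at the cost of one scan of the line per configured pair.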
Below are examples of lines of text with a cursor position and boundary characters, together with what should be extracted.
The symbol | in the examples shows the current cursor position. It is not present in the actual lines; only its position number is known. I use the | character in the examples for visual clarity, and show it only where it affects the result (otherwise the cursor can be anywhere in the line).
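Since the real input is a line plus a position number, a small helper can convert an example written with the | marker into that form. This is only a sketch for testing purposes; parse_example, EX_LINE and EX_CURSOR are hypothetical names, and it assumes at most one | per example and a 0-based cursor position.

```bash
#!/usr/bin/env bash
# parse_example TEXT
# Sets the globals EX_LINE (TEXT without the | marker) and EX_CURSOR
# (0-based position where the marker was). Hypothetical helper names.
parse_example() {
    local text=$1 before
    if [[ $text == *'|'* ]]; then
        before=${text%%|*}            # everything before the first |
        EX_CURSOR=${#before}
        EX_LINE=$before${text#*|}     # line with the first | removed
    else
        # No marker: the cursor position does not matter, use 0.
        EX_CURSOR=0
        EX_LINE=$text
    fi
}
```

After parse_example "\"~/te st1\"| '~/te st2'", EX_CURSOR is 10 and EX_LINE is the same text without the marker.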
Examples of lines with cursor positions and boundary characters, the text that should be retrieved from them (after ->), and a comment (after #):
(line with cursor -> extracted text # comment)
~ -> ~ # cursor can be anywhere in the line, result should be the same, "~" extracted
~ -> ~ # the same but with trimmed spaces, "~" extracted
~/.local/share/applications -> ~/.local/share/applications # spaces trimmed, "~/.local/share/applications" extracted
/home -> /home
'~/te st' -> ~/te st # result should be the same no matter where the cursor is located
|"~/te st1" '~/te st2' -> ~/te st1 # border case: cursor touches boundary character, "~/te st1" extracted
"~/te st1"| '~/te st2' -> ~/te st1 # border case, same result
"~/te st1" |'~/te st2' -> ~/te st2 # border case, "~/te st2" extracted
"~/te st1" '|~/te st2' -> ~/te st2
"~/te st1" '~/te st2|' -> ~/te st2
"~/te st1" '~/te st2'| -> ~/te st2 # border case, "~/te st2" extracted
/ -> /
https://stackoverflow.com -> https://stackoverflow.com
'https://stackoverflow.com' -> https://stackoverflow.com
~ /etc 1 /etc 2 '/home' 3 /etc 4 '/home' 5 "/media" 6 "~/te st" 7 ~ "$HO|ME" ~ -> $HOME # many URIs in one line, "$HOME" extracted
Later in processing, the tilde sign ~ and environment variables like $HOME will be expanded, but this can be excluded from the scope of the current question. Trimming of spaces can also be excluded to keep the question simpler.
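For completeness, the expansion step could look roughly like the sketch below. It is a pure-bash variant and not what my script does (the script uses envsubst); expand_uri is a hypothetical name. It assumes that only a leading ~ or ~/ is expanded and that variables are written as $NAME or ${NAME}. It stops at the first $ that is not followed by a valid name and leaves the rest of the text unchanged.

```bash
#!/usr/bin/env bash
# expand_uri TEXT: expands a leading ~ and $NAME / ${NAME} references.
# Hypothetical helper; unset variables expand to an empty string.
expand_uri() {
    local uri=$1 out='' name
    # Group 1: text before the first $, group 3: braced name,
    # group 4: bare name, group 5: the remainder of the text.
    local re='^([^$]*)\$(\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))(.*)$'

    # Expand only "~" on its own or a leading "~/".
    if [[ $uri == '~' || $uri == '~/'* ]]; then
        uri=$HOME${uri:1}
    fi

    # Consume one variable reference per iteration.
    while [[ $uri =~ $re ]]; do
        name=${BASH_REMATCH[3]}${BASH_REMATCH[4]}
        out+=${BASH_REMATCH[1]}${!name:-}
        uri=${BASH_REMATCH[5]}
    done
    printf '%s\n' "$out$uri"
}
```

Unlike envsubst, this only sees variables of the current shell, whether exported or not, and needs no external program.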
Below is my own script with my current solution. I do not like it because it duplicates the code for processing each set of boundary characters and, as a result, is not easily configurable. Nevertheless, it works as described above (except that the () boundary characters are not supported yet), and performance is good since the line is traversed only once.
It would probably be better to rewrite it using function calls.
But maybe a better solution already exists? As I understand it, this must be a very typical task for text editors and text processing.
extract-url-and-xdg-open.sh
#!/bin/bash
# extracts URI based on cursor position from strings like:
# 1 "/media" 2 '/home' 3 /etc 4 '/home' 5 "/media" 6
start_time=$(date +%s%3N)
tmp_dir=""
virtual_tmp_dir="/mnt/tmpfs"
if [ -d "$virtual_tmp_dir" ]; then
tmp_dir="$virtual_tmp_dir"
else
tmp_dir="/tmp"
fi
log_file="$tmp_dir/extract-url-and-xdg-open.log"
log_enabled=0 # 0 is disabled, 1 is enabled
log(){
if [ "$log_enabled" -eq 1 ]; then
msg="$(date +%Y-%m-%dT%H:%M:%S): $1"
#echo $msg
# printf '%s' "$msg" >> "$logfile"
echo -e "$msg" >> "$log_file"
fi
}
filepath="$1"
linenumber="$2"
cursorposition="$3"
log "cursorposition: $cursorposition"
linenumber=$((linenumber+1))
log "filepath: $filepath"
log "linenumber: $linenumber"
line=$(sed "${linenumber}q;d" "$filepath")
log "line: $line"
length=${#line}
log "line length: $length"
###########################################
if [ "$cursorposition" -eq 0 ]; then # when cursor is at the beginning of line: |'/home'
first_char="${line:0:1}"
log "first_char: $first_char"
if [ "$first_char" = '"' ] || [ "$first_char" = "'" ]; then
(( cursorposition++ )) # after cursor moved: '|/home'
log "cursor position moved 1 char forward because the line begins with \" or '"
fi
elif [ "$cursorposition" -eq $length ]; then # when cursor is in the end of line: '/home'|
last_char="${line:$length-1:1}"
log "last_char: $last_char"
if [ "$last_char" = '"' ] || [ "$last_char" = "'" ]; then
(( cursorposition-- )) # after cursor moved: '/home|'
log "cursor position moved 1 char backward because the line ends with \" or '"
fi
else
char_at_cursor="${line:$cursorposition:1}"
log "char at current cursor position: $char_at_cursor"
if [ "$char_at_cursor" = '"' ] || [ "$char_at_cursor" = "'" ] ; then # start |'/home' end
previous_char="${line:$cursorposition-1:1}"
log "previous_char: $previous_char"
if [ "$previous_char" = " " ]; then
(( cursorposition++ )) # after moving cursor: start '|/home' end
log "cursor position moved 1 char forward because it points at \" or ' and the previous char is a space"
fi
elif [ "$char_at_cursor" = " " ]; then # start '/home'| end
previous_char="${line:$cursorposition-1:1}"
log "previous_char: $previous_char"
if [ "$previous_char" = '"' ] || [ "$previous_char" = "'" ]; then
(( cursorposition-- )) # after moving cursor: start '/home|' end
log "cursor position moved 1 char backward because it points at a space and the previous char is \" or '"
fi
fi
fi
###########################################
left_singlequote_position=-1
left_doublequote_position=-1
left_space_position=-1
#left direction: search in range [current-position-1 .. 0]
for (( i=$cursorposition-1; i>=0; i-- )); do
currchar=${line:$i:1}
log "[left direction] current index: $i ; char: $currchar"
if [ "$currchar" = '"' ] && [ "$left_doublequote_position" -eq -1 ]; then
left_doublequote_position=$i
log "found \""
fi
if [ "$currchar" = "'" ] && [ "$left_singlequote_position" -eq -1 ]; then
left_singlequote_position=$i
log "found '"
fi
if [ "$currchar" = " " ] && [ "$left_space_position" -eq -1 ]; then
left_space_position=$i
log "found space"
fi
done
log "left_doublequote_position $left_doublequote_position"
log "left_singlequote_position: $left_singlequote_position"
log "left_space_position: $left_space_position"
###########################################
right_singlequote_position=-1
right_doublequote_position=-1
right_space_position=-1
#right direction: search in range [current-position .. length-1]
for (( i=$cursorposition; i<$length; i++ )); do
currchar=${line:$i:1}
log "[right direction] current index: $i ; char: $currchar"
if [ "$currchar" = '"' ] && [ "$right_doublequote_position" -eq -1 ]; then
right_doublequote_position=$i
log "found \""
fi
if [ "$currchar" = "'" ] && [ "$right_singlequote_position" -eq -1 ]; then
right_singlequote_position=$i
log "found '"
fi
if [ "$currchar" = " " ] && [ "$right_space_position" -eq -1 ]; then
right_space_position=$i
log "found space"
fi
done
log "right_doublequote_position $right_doublequote_position"
log "right_singlequote_position: $right_singlequote_position"
log "right_space_position: $right_space_position"
###########################################
if [ "$left_doublequote_position" -ne -1 ] && [ "$right_doublequote_position" -ne -1 ]; then
((length_between_doublequotes=right_doublequote_position-left_doublequote_position-1))
log "length_between_doublequotes: $length_between_doublequotes"
uri_between_doublequotes=${line:$left_doublequote_position+1:$length_between_doublequotes}
log "uri_between_doublequotes: $uri_between_doublequotes"
first_letter=${uri_between_doublequotes:0:1}
last_letter=${uri_between_doublequotes:$length_between_doublequotes-1:1}
log "first_letter: $first_letter"
log "last_letter: $last_letter"
if [ "$length_between_doublequotes" -eq 2 ] || [ "$first_letter" = " " ] || [ "$last_letter" = " " ]; then
log "uri_between_doublequotes is not valid"
uri_between_doublequotes=""
fi
fi
if [ "$left_singlequote_position" -ne -1 ] && [ "$right_singlequote_position" -ne -1 ]; then
((length_between_singlequotes=right_singlequote_position-left_singlequote_position-1))
log "length_between_singlequotes: $length_between_singlequotes"
uri_between_singlequotes=${line:$left_singlequote_position+1:$length_between_singlequotes}
log "uri_between_singlequotes: $uri_between_singlequotes"
first_letter=${uri_between_singlequotes:0:1}
last_letter=${uri_between_singlequotes:$length_between_singlequotes-1:1}
log "first_letter: $first_letter"
log "last_letter: $last_letter"
if [ "$length_between_singlequotes" -eq 2 ] || [ "$first_letter" = " " ] || [ "$last_letter" = " " ]; then
log "uri_between_singlequotes is not valid"
uri_between_singlequotes=""
fi
fi
if [ "$left_space_position" -eq -1 ]; then
left_space_position=-1
log "left_space_position is set as the most left position of the line -1"
fi
if [ "$right_space_position" -eq -1 ]; then
right_space_position="$length"
log "right_space_position is set as the most right position of the line +1"
fi
#if [ "$left_space_position" -ne -1 ] && [ "$right_space_position" -ne -1 ]; then
((length_between_spaces=right_space_position-left_space_position-1))
log "length_between_spaces: $length_between_spaces"
uri_between_spaces=${line:$left_space_position+1:$length_between_spaces}
log "uri_between_spaces: $uri_between_spaces"
###########################################
if [ -n "$uri_between_singlequotes" ]; then
uri=$uri_between_singlequotes
elif [ -n "$uri_between_doublequotes" ]; then
uri=$uri_between_doublequotes
elif [ -n "$uri_between_spaces" ]; then
uri=$uri_between_spaces
fi
log "uri: $uri"
if [ -z "$uri" ]; then # for case with spaces at the beginning and end: /etc
trimmed_input=$line
trimmed_input="${trimmed_input#"${trimmed_input%%[![:space:]]*}"}" # remove leading whitespace characters
uri="${trimmed_input%"${trimmed_input##*[![:space:]]}"}" # remove trailing whitespace characters
log "trimmed the whole line: $uri"
fi
uri_with_tilde_expanded="${uri/#\~/$HOME}" # escape ~ so it is matched literally instead of being tilde-expanded in the pattern
log "uri_with_tilde_expanded: $uri_with_tilde_expanded"
uri_with_env_vars_expanded=$(printf '%s\n' "$uri_with_tilde_expanded" | envsubst)
log "uri_with_env_vars_expanded: $uri_with_env_vars_expanded"
xdg-open "$uri_with_env_vars_expanded"
###########################################
end_time=$(date +%s%3N)
duration_ms=$((end_time - start_time))
log "Execution time in ms: $duration_ms"
###########################################