I am currently busy with a sensory substitution project. It is an extension of a previous project, and I don't have the time to convert it to Python 3, so I have continued working in Python 2.7 on Ubuntu 16.04 with ROS Kinetic. It is a visual-to-audio system; the previous system took color and converted it to sound. I am getting input from a depth camera, so I have a color stream and a depth stream and use a combination of the two to produce sound. I have essentially just edited the previous code to detect people and play a sound when a person is detected. It does play a sound, but the sound always comes from the centre, so a user of the system cannot tell where the person is.
This is the format of the images passed to the sound generator: the top is the normal image in which a human has been detected, and the bottom is the depth image. I have converted both images to a 5×10 array of pixels. When a human is recognized, a black pixel is placed at their torso.
[image: downsampled color image (top) and depth image (bottom)]
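For context, the marking step is roughly equivalent to the sketch below (simplified; the actual person detector is omitted, and encode_color_image / torso_xy are hypothetical stand-ins for what my detection code does):

import cv2
import numpy as np

GRID_WIDTH = 10
GRID_HEIGHT = 5

def encode_color_image(color_image, torso_xy=None):
    # Downsample the full-resolution camera image to the 5x10 grid
    small = cv2.resize(color_image, (GRID_WIDTH, GRID_HEIGHT),
                       interpolation=cv2.INTER_AREA)
    if torso_xy is not None:
        # torso_xy is the (x, y) pixel of the detected torso in the full image;
        # map it into grid coordinates and mark that cell black
        grid_x = int(torso_xy[0] * GRID_WIDTH / color_image.shape[1])
        grid_y = int(torso_xy[1] * GRID_HEIGHT / color_image.shape[0])
        small[grid_y, grid_x] = [0, 0, 0]
    return small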
This is the code that deals with the sound generation:
def sound_generator_algorithm(retinal_encoded_image_cv2_format):
    global color_image_cv2_format
    if color_image_cv2_format is None:
        rospy.logwarn("Color image format is None, skipping this frame.")
        return

    retinal_encoded_image_width = len(retinal_encoded_image_cv2_format[0])
    retinal_encoded_image_height = len(retinal_encoded_image_cv2_format)

    if not is_setup:
        setup(retinal_encoded_image_cv2_format)

    for row in range(retinal_encoded_image_height):
        for column in range(retinal_encoded_image_width):
            # Obtaining depth value
            depth = retinal_encoded_image_cv2_format[row][column]
            # Obtaining color value
            color_value = color_image_cv2_format[row][column]

            # Find the correct color key
            color_key = None
            for color in colors:
                if (colors[color] == color_value).all():
                    color_key = color
                    break

            # If no color key is found, continue to the next pixel
            if color_key is None:
                continue

            # Muting all other colored sounds (except current)
            for color in sound_sources[row][column]:
                if color != color_key:
                    sound_sources[row][column][color].gain = 0.0

            if np.isnan(depth) or (depth == 0.0) or (color_key != 'black'):
                sound_sources[row][column][color_key].gain = 0.0
            else:
                if color_key == 'black':
                    sound_sources[row][column][color_key].gain = gain_scaled * 2
                else:
                    sound_sources[row][column][color_key].gain = gain_scaled / 2

                # Update pitch based on row
                if row == 0:
                    sound_sources[row][column][color_key].pitch = 1.7
                elif row == 1:
                    sound_sources[row][column][color_key].pitch = 1.3
                elif row == 2:
                    sound_sources[row][column][color_key].pitch = 1.0
                elif row == 3:
                    sound_sources[row][column][color_key].pitch = 0.7
                elif row == 4:
                    sound_sources[row][column][color_key].pitch = 0.3

                projected_min_depth = ssf_core.projected_pixel(unit_vector_map,
                                                               column,
                                                               row,
                                                               depth_camera_min_depth)[2]
                x_scale = 4
                y_scale = 1.0
                z_scale = 1.3
                z_power_scale = 2.0
                depth = (projected_min_depth * z_scale) + (
                    ((depth - projected_min_depth) * z_scale) ** (z_power_scale * 1.0))
                projected_pixel = ssf_core.projected_pixel(unit_vector_map,
                                                           column,
                                                           row,
                                                           depth)
                # Update the sound source's position based on the projected pixel
                sound_sources[row][column][color_key].position = [projected_pixel[0] * x_scale,
                                                                  projected_pixel[1],
                                                                  -projected_pixel[2]]
    soundsink.update()
The setup method just places a sound source on each pixel for each available sound; in this case there are only two, 'black' and 'background', and 'background' is always muted.
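For reference, setup is roughly along these lines (a simplified sketch, not the exact code; the WAV file names are placeholders, and the PyAL loader call is my assumption of how the sounds get loaded):

from openal.audio import SoundSink, SoundSource
from openal.loaders import load_wav_file

def setup(retinal_encoded_image):
    global is_setup, sound_sources, soundsink
    height = len(retinal_encoded_image)
    width = len(retinal_encoded_image[0])

    soundsink = SoundSink()
    soundsink.activate()

    sound_sources = []
    for row in range(height):
        sound_sources.append([])
        for column in range(width):
            pixel_sources = {}
            for color in ['black', 'background']:
                source = SoundSource(position=[0, 0, 0])
                source.looping = True
                # placeholder file name; one WAV per color in the real code
                source.queue(load_wav_file(color + '.wav'))
                source.gain = 0.0  # everything starts muted
                soundsink.play(source)
                pixel_sources[color] = source
            sound_sources[row].append(pixel_sources)
    is_setup = True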
This is the original code that I made my changes to; sound localization actually works with it:
def sound_generator_algorithm(retinal_encoded_image_cv2_format):
    retinal_encoded_image_width = len(retinal_encoded_image_cv2_format[0])
    retinal_encoded_image_height = len(retinal_encoded_image_cv2_format)

    # NOTE: PyAL uses the RHS coordinate system.
    # Hence, the horizontal extent of the monitor represents the x-axis,
    # with right being positive. The vertical extent of the monitor represents
    # the y-axis, with up being positive; and from one's eyes going into the
    # monitor represents the positive z-axis.

    if not is_setup:
        setup(retinal_encoded_image_cv2_format)

    for row in xrange(retinal_encoded_image_height):
        for column in xrange(retinal_encoded_image_width):
            # Obtaining depth value
            depth = retinal_encoded_image_cv2_format[row][column]
            # Obtaining color value
            color_value = color_image_cv2_format[row][column]
            color_key = None

            # Muting all other colored sounds (except current)
            for color in sound_sources[row][column]:
                if (colors[color] != color_value).any():
                    sound_sources[row][column][color].gain = 0.0
                else:
                    color_key = color

            if np.isnan(depth) or (depth == 0.0) or (depth >= depth_camera_max_depth) or color_key == 'background':
                sound_sources[row][column][color_key].gain = 0.0
            else:
                if color_key == 'background':
                    sound_sources[row][column][color_key].gain = gain_scaled / 2
                else:
                    sound_sources[row][column][color_key].gain = gain_scaled * 2

                # If the sound isn't muted, also update its pitch
                # dependent on its y value (i.e. row).
                # NOTE: Setting the pitch stretches or compresses the sound
                #       by the given value. For example, if the pitch of a
                #       440Hz tone is set to 2.0, the tone played would be
                #       2 * 440Hz = 880Hz
                if row == 0:
                    sound_sources[row][column][color_key].pitch = 1.7
                elif row == 1:
                    sound_sources[row][column][color_key].pitch = 1.3
                elif row == 2:
                    sound_sources[row][column][color_key].pitch = 1.0
                elif row == 3:
                    sound_sources[row][column][color_key].pitch = 0.7
                elif row == 4:
                    sound_sources[row][column][color_key].pitch = 0.3

                projected_min_depth = ssf_core.projected_pixel(unit_vector_map,
                                                               column,
                                                               row,
                                                               depth_camera_min_depth)[2]
                x_scale = 4  # 2.5
                y_scale = 1.0
                z_scale = 1.3
                # scales anything beyond the projected_min_depth,
                # scaling it along the ray
                z_power_scale = 2.0
                # NOTE: only the depth is scaled, and then x, y and z are projected
                #       according to that depth.
                depth = (projected_min_depth * z_scale) + (
                    ((depth - projected_min_depth) * z_scale) ** (z_power_scale * 1.0))
                projected_pixel = ssf_core.projected_pixel(unit_vector_map,
                                                           column,
                                                           row,
                                                           depth)
                # Update the sound source's position based on the projected pixel
                sound_sources[row][column][color_key].position = [projected_pixel[0] * x_scale,
                                                                  projected_pixel[1],
                                                                  -projected_pixel[2]]
    soundsink.update()
I have attempted redoing it with minimal changes apart from the human detection, to see if I accidentally altered something I was not supposed to, but I still get the same result: the sound only ever comes from the centre.
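To rule out the audio side, a stand-alone check of PyAL spatialization would look roughly like the snippet below (beep.wav is a placeholder for whatever sample is used; note that OpenAL only spatializes mono buffers, so a stereo file would always play centred):

import time
from openal.audio import SoundSink, SoundSource
from openal.loaders import load_wav_file

soundsink = SoundSink()
soundsink.activate()

source = SoundSource(position=[0, 0, 0])
source.looping = True
source.queue(load_wav_file("beep.wav"))  # placeholder file; must be mono to be spatialized
soundsink.play(source)

# Sweep the source from left to right; the panning should be audible
for x in range(-5, 6):
    source.position = [x, 0, -2]
    soundsink.update()
    time.sleep(0.5)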