EDIT:
I’m… not too sure what I did, but I accidentally solved my issue?
While I was working on adding comma support, I decided to extract the conditions from the function below and put them in another one.
Also, because of the CRLF line endings, I had to convert the text file from which the hashmap is built from DOS to Unix format (the file was originally downloaded on Windows).
So maybe there was an issue with how the lines were read?
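One possible reading of this fix, offered as an assumption rather than a confirmed diagnosis: when a CRLF file is read on Unix, the line reader stops at LF, so each line, and therefore each dictionary value, keeps a trailing carriage return (Wide_Character'Val (13)). Instead of converting the file, Init_Cmudict could strip that character from every line it reads. The sketch below shows such a helper; Strip_CR is a hypothetical name, and the real contents of Init_Cmudict are not shown in the question.

```ada
with Ada.Wide_Text_IO;

procedure Strip_CR_Demo is
   --  Hypothetical helper: drop a trailing carriage return (code 13) that a
   --  DOS/Windows (CRLF) file can leave at the end of each line read on Unix.
   function Strip_CR (Line : Wide_String) return Wide_String is
      CR : constant Wide_Character := Wide_Character'Val (13);
   begin
      if Line'Length > 0 and then Line (Line'Last) = CR then
         return Line (Line'First .. Line'Last - 1);
      else
         return Line;
      end if;
   end Strip_CR;

   --  Simulated line as it might come out of a CRLF file.
   Raw : constant Wide_String := "ll aa" & Wide_Character'Val (13);
begin
   --  Self-checks (active when compiled with -gnata).
   pragma Assert (Strip_CR (Raw) = "ll aa");
   pragma Assert (Strip_CR ("ll aa") = "ll aa");
   Ada.Wide_Text_IO.Put_Line (Strip_CR (Raw));
end Strip_CR_Demo;
```

Applying Strip_CR to each key and value before inserting them into the map would make the result independent of the file's line-ending format.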
—
Let’s say you have the sentence "I have apples" and that you want to create its phonetic version. Here, that would be "ay | h aah v | aah p uh l z".
To do so, one could create a hashmap that stores every possible word together with its phonetic transcription. Then, to build the new sentence, you iterate through each word of the original sentence, look up its phonetics in the hashmap, and concatenate the phonetics of each word into a new string.
Of note, I’m using Wide_String because of the possibility of French accents.
This is what I’m trying to do. However, the step of concatenating "each phonetic back into a new string" doesn’t work. Instead of getting the final string concatenated as I want it to be, it seems that when I add phonetics to the Unbounded_Wide_String, they are added starting from the beginning of the string instead of its end, effectively overwriting the previous characters.
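A minimal sketch of this approach, assuming a toy in-memory dictionary instead of the text file used in the question and using " | " as the word separator to match the example above. Phonetic_Demo, Phonetic_Maps and the three entries are illustrative only.

```ada
with Ada.Containers.Indefinite_Hashed_Maps;
with Ada.Strings;
with Ada.Strings.Wide_Hash;
with Ada.Strings.Wide_Maps;
with Ada.Strings.Wide_Unbounded;
with Ada.Wide_Text_IO;

procedure Phonetic_Demo is
   package S_WU renames Ada.Strings.Wide_Unbounded;
   package S_WM renames Ada.Strings.Wide_Maps;

   --  Word -> phonetic transcription, like the question's Cmudict.
   package Phonetic_Maps is new Ada.Containers.Indefinite_Hashed_Maps
     (Key_Type        => Wide_String,
      Element_Type    => Wide_String,
      Hash            => Ada.Strings.Wide_Hash,
      Equivalent_Keys => "=");

   Dict       : Phonetic_Maps.Map;
   Sentence   : constant S_WU.Unbounded_Wide_String :=
     S_WU.To_Unbounded_Wide_String ("I have apples");
   Whitespace : constant S_WM.Wide_Character_Set := S_WM.To_Set (' ');
   Result     : S_WU.Unbounded_Wide_String;
   Index      : Natural := 1;
   First      : Positive;
   Last       : Natural;
begin
   --  Toy entries (illustrative); the question loads these from a file.
   Dict.Insert ("I", "ay");
   Dict.Insert ("have", "h aah v");
   Dict.Insert ("apples", "aah p uh l z");

   while Index in 1 .. S_WU.Length (Sentence) loop
      --  Find the next run of characters that are not spaces.
      S_WU.Find_Token
        (Source => Sentence, Set => Whitespace, From => Index,
         Test   => Ada.Strings.Outside, First => First, Last => Last);
      exit when Last = 0;
      declare
         Word : constant Wide_String := S_WU.Slice (Sentence, First, Last);
      begin
         if Dict.Contains (Word) then
            --  Separate words with " | " after the first one.
            if S_WU.Length (Result) > 0 then
               S_WU.Append (Result, Wide_String'(" | "));
            end if;
            --  Append adds at the end of Result; nothing is overwritten.
            S_WU.Append (Result, Dict.Element (Word));
         end if;
      end;
      Index := Last + 1;
   end loop;

   --  Self-check (active when compiled with -gnata).
   pragma Assert (S_WU.To_Wide_String (Result) = "ay | h aah v | aah p uh l z");
   Ada.Wide_Text_IO.Put_Line (S_WU.To_Wide_String (Result));
end Phonetic_Demo;
```

Dict.Element (Word) returns the value as a plain Wide_String; the Dict (Word) form used in the question goes through the map's Constant_Indexing aspect and should denote the same value.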
My code is as follows.
The hashmap definition:
package Cmudict is new Ada.Containers.Indefinite_Hashed_Maps
  (Key_Type => Wide_String, Element_Type => Wide_String,
   Hash => Ada.Strings.Wide_Hash, Equivalent_Keys => "=");
The function:
function To_Phonems
  (Sentence : S_WU.Unbounded_Wide_String) return Wide_String
is
   dict            : Cmudict.Map;
   Phonems_Version : S_WU.Unbounded_Wide_String;
   Index           : Natural := 1;
   F               : Positive;
   L               : Natural;
   Whitespace      : constant S_WM.Wide_Character_Set := S_WM.To_Set (' ');
begin
   Init_Cmudict (dict); -- add <Wide_String, Wide_String> pairs from a text file.
   while Index in 1 .. S_WU.Length (Sentence) loop
      S_WU.Find_Token
        (Source => Sentence, Set => Whitespace, From => Index,
         Test   => Str.Outside, First => F, Last => L);
      exit when L = 0;
      declare
         word : constant Wide_String := S_WU.Slice (Sentence, F, L);
      begin
         if dict.Contains (word) then
            S_WU.Append (Phonems_Version, dict (word)); -- this is the problematic line.
         else
            W_IO.Put_Line
              ("Warning : '" & S_WU.Slice (Sentence, F, L) &
               "' is not in the dictionnary. Consider adding it.");
         end if;
      end;
      Index := L + 1;
   end loop;
   -- ...
end;
With my test sentence "la science nous contraint à exploser le soleil !" [1], instead of obtaining "ll aa | ss yy an ss | nn ou | kk on tt rr in | aa | ai kk ss pp ll oo zz ei | ll ee | ss oo ll ai yy !" (I added breaks so it’s easier to see each word), I get "!s oo ll ai yy oo zz ei", which is exactly what you obtain if each word is overwritten by the next one, starting from the beginning of the string.
I tried to check whether the issue came from the Append function by appending word instead of dict (word), and that works (it shows "lasciencenouscontraintàexploserlesoleil!").
I tried to see if I was adding the wrong phonetics, but printing dict (word) shows no problem.
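The output above is consistent with each dictionary value still ending in a carriage return, which fits the EDIT at the top but is an assumption, not something the question confirms: CR moves the terminal cursor back to column 1, so each word is drawn over the previous ones while the string itself stays intact. The sketch below appends the last four phonetic words of the test sentence, each followed by CR, and checks that every character is really stored (CR_Overwrite_Demo is an illustrative name).

```ada
with Ada.Strings.Wide_Unbounded;
with Ada.Wide_Text_IO;

procedure CR_Overwrite_Demo is
   package S_WU renames Ada.Strings.Wide_Unbounded;
   CR     : constant Wide_Character := Wide_Character'Val (13);
   Result : S_WU.Unbounded_Wide_String;
begin
   --  Simulated dictionary values that still end in CR (assumption).
   S_WU.Append (Result, "ai kk ss pp ll oo zz ei" & CR);  --  exploser
   S_WU.Append (Result, "ll ee" & CR);                    --  le
   S_WU.Append (Result, "ss oo ll ai yy" & CR);           --  soleil
   S_WU.Append (Result, "!" & CR);                        --  !

   --  Every character is appended at the end; nothing is lost.
   pragma Assert (S_WU.Length (Result) = (23 + 1) + (5 + 1) + (14 + 1) + (1 + 1));

   --  On a typical terminal each CR returns to column 1, so this line is
   --  displayed as "!s oo ll ai yy oo zz ei".
   Ada.Wide_Text_IO.Put_Line (S_WU.To_Wide_String (Result));
end CR_Overwrite_Demo;
```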
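Printing a value cannot reveal a trailing CR, since that character is invisible. A hypothetical Dump_Codes helper that prints the code point of each character would make it visible; called as Dump_Codes (dict (word)) inside the loop, a trailing 13 would point to the CRLF issue mentioned in the EDIT. This is a debugging sketch, not part of the original code.

```ada
with Ada.Wide_Text_IO;

procedure Dump_Codes_Demo is
   --  Hypothetical debugging helper: print the code point of every
   --  character so that invisible ones such as CR (13) show up.
   procedure Dump_Codes (S : Wide_String) is
   begin
      for C of S loop
         Ada.Wide_Text_IO.Put (Integer'Wide_Image (Wide_Character'Pos (C)));
      end loop;
      Ada.Wide_Text_IO.New_Line;
   end Dump_Codes;

   Value : constant Wide_String := "aa" & Wide_Character'Val (13);
begin
   pragma Assert (Value (Value'Last) = Wide_Character'Val (13));
   Ada.Wide_Text_IO.Put_Line (Value);  --  looks like a plain "aa"
   Dump_Codes (Value);                  --  prints " 97 97 13"
end Dump_Codes_Demo;
```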
As Append works if I give it a Wide_String, it would seem that dict (word) is not of type Wide_String. And indeed, when trying to use Ada.Characters.Conversions.Is_Wide_String, I get a compilation error because of its Ada.Containers.Indefinite_Hashed_Maps.Constant_Reference_Type type. However, trying to cast it using Wide_String' (a qualified expression) does not change anything.
I also thought that the call itself, dict (word), might be the issue, and tried several other methods (storing dict (word) in a vector and appending from there, using Iterate), but they all failed too.
Finally, I tried looking around, but a similar question didn’t seem to have been asked. It may be something trivial, but for the life of me, I can’t see what I’m doing wrong.
[1] In English: "Science compels us to explode the sun!"