Hangman is a word-guessing game where the player tries to guess a hidden word letter by letter. Each correct guess reveals all instances of the letter in the word, while incorrect guesses reduce the player’s remaining attempts. The goal is to guess the word before running out of attempts. The length of the hidden word is variable and the maximum number of attempts is 5.
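To make the setup concrete, here is a minimal sketch of the evaluation loop I'm simulating; `guess_letter` is a placeholder for whatever model is being tested, and it receives the current masked word plus the set of letters already tried:

```python
def play_hangman(word, guess_letter, max_attempts=5):
    """Simulate one game; return True if the word is fully revealed
    before the attempts run out."""
    revealed = ["_"] * len(word)
    guessed = set()
    attempts = max_attempts
    while attempts > 0 and "_" in revealed:
        letter = guess_letter("".join(revealed), guessed)
        guessed.add(letter)
        if letter in word:
            # A correct guess reveals every occurrence of the letter
            # and costs nothing.
            for i, ch in enumerate(word):
                if ch == letter:
                    revealed[i] = ch
        else:
            attempts -= 1
    return "_" not in revealed
```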
I have to come up with a model/algorithm that guesses a letter for a hidden string (e.g., if the hidden word is "mathematics" and the input is _ a t h e _ a t i _ s, the model should ideally guess 'm' or 'c'). Testing is done by simulating hangman games starting from a completely hidden word (only underscores).
The training data is a list of words from the English language and the testing data will not include any of these given words.
Machine learning approaches are preferred but not compulsory.
I tried to train a BiLSTM model on data I generated by randomly dropping instances of letters from the given words, and then output a softmax over 27 classes (one for each letter of the alphabet and one for the padding token) at each position of the input string (including positions whose letters have already been revealed). Since the maximum word length in the training set was 32 characters, I padded the input strings to 32 characters. This did not give satisfactory results.
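The masking scheme I used for generating training examples looks roughly like this (the pad/mask token ids and the choice of how many letters to drop are simplifications of what I actually did):

```python
import random
import string

PAD, MASK = 0, 27  # hypothetical ids; 1..26 stand for 'a'..'z'
MAX_LEN = 32       # longest word in the training set

def char_to_id(ch):
    return string.ascii_lowercase.index(ch) + 1

def make_example(word, rng=random):
    """Hide every occurrence of a random subset of the word's
    distinct letters; targets are the per-position letter ids."""
    distinct = list(set(word))
    k = rng.randint(1, len(distinct))
    hidden = set(rng.sample(distinct, k))
    x = [MASK if ch in hidden else char_to_id(ch) for ch in word]
    y = [char_to_id(ch) for ch in word]
    pad = MAX_LEN - len(word)
    return x + [PAD] * pad, y + [PAD] * pad
```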
I also tried Byte-Pair Encoding (BPE) to extract relevant subwords from the training vocabulary, and then tried to guess words from the testing set as a combination of up to three subwords from the BPE vocabulary.
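The composition step I had in mind is roughly the following brute-force sketch (`bpe_vocab` is a hypothetical list of subwords extracted by BPE; the search is exponential in the number of pieces, which is part of the problem):

```python
import re
from itertools import product

def candidate_letters(mask, bpe_vocab, max_pieces=3):
    """mask: e.g. '_athemati_s', with '_' for hidden letters.
    Build candidate words from at most `max_pieces` subwords that fit
    the mask, and rank letters by how often they fill the blanks."""
    pattern = re.compile("^" + mask.replace("_", "[a-z]") + "$")
    counts = {}
    for n in range(1, max_pieces + 1):
        for pieces in product(bpe_vocab, repeat=n):
            word = "".join(pieces)
            if len(word) != len(mask) or not pattern.match(word):
                continue
            for m_ch, w_ch in zip(mask, word):
                if m_ch == "_":
                    counts[w_ch] = counts.get(w_ch, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)
```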
What else can I try? What kind of problem would this come under? Where can I read more about such problems?