I have several text documents and want to build a list of the words they contain. I also need to record several properties of each word:
- The index of the word within all the words in the documents (Integer)
- The word itself (String)
- The document that the word is in (Integer)
- The topic value associated with this word (Integer)
I can think of two ways of doing this. The first is to create a list of tuples of the form (word, doc, topic), where the word index is given by the position of the tuple in the list. The second is to create a word class with the given properties as member variables, and then build a list of objects of that class.
So my question is: which is the better solution, the list of tuples or the list of word objects? And, relatedly, in what situations is each approach preferable?
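To make the comparison concrete, here is a minimal sketch of both ideas. The question does not name a language, so Python is an assumption, and the sample data and the names `Word` and `WordRecord` are hypothetical. A named tuple is included as a possible middle ground between the two options.

```python
from dataclasses import dataclass
from typing import NamedTuple

# Hypothetical sample data in the form (word, doc, topic).
# The word index is simply the position in the list.
word_tuples = [("apple", 0, 2), ("banana", 0, 1), ("apple", 1, 2)]

# Approach 1: plain tuples, fields accessed by position.
word, doc, topic = word_tuples[2]

# Approach 2: a small class with named attributes.
@dataclass
class Word:
    text: str
    doc: int
    topic: int

word_objects = [Word(*t) for t in word_tuples]

# Possible middle ground: a named tuple is still a tuple
# (immutable, unpackable) but its fields also have names.
class WordRecord(NamedTuple):
    text: str
    doc: int
    topic: int

word_records = [WordRecord(*t) for t in word_tuples]

# Named access reads more clearly than positional access.
topics_in_doc_0 = [w.topic for w in word_objects if w.doc == 0]
```

With plain tuples the reader has to remember that position 1 is the document and position 2 is the topic. The class and named-tuple versions spell that out, and the class version also leaves room to add methods or mutable fields later.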
The answer also depends on how the words will be searched. Will the search key be a whole word or can it be a partial word? Will you search by document? And so on.
Also, how are you planning to use the words and their properties?