I have written a series of macros to extract the sentences and words in a novel. The occurrences of each word are counted, and for each word an example sentence containing it is provided. Sentence variety is optimized: whenever a word is found again, if the sentence currently associated with that word has a frequency (its total within the corpus) of n, the new sentence replaces it only if the new sentence's frequency in the corpus is less than n.
The problem I am having is with fragments (text that is not a proper sentence, such as titles).
What I want to do is count the words in fragments (skipping ALL-CAPS words, which are simply removed) and merge the fragments with the rest of the corpus.
Fragments are artificially assigned frequencies starting from 10001 (instead of 1) so that real sentences replace them as example sentences as quickly as possible.
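The selection rule and the fragment offset described above can be sketched in Python as follows. This is not the VBA macro, only an illustration of the intended logic; `pick_examples` and `FRAGMENT_OFFSET` are hypothetical names, and the sentence frequencies are assumed to be already computed (with the offset applied to fragments).

```python
FRAGMENT_OFFSET = 10000  # fragments are counted from 10001 instead of 1


def pick_examples(occurrences, frequency):
    """Choose one example sentence per word.

    occurrences: (word, sentence) pairs in corpus order.
    frequency: dict mapping each sentence to its corpus frequency
               (assumed precomputed; fragments already carry the offset).
    """
    examples = {}
    for word, sentence in occurrences:
        current = examples.get(word)
        # Keep the first sentence seen; replace it only with a strictly
        # less frequent sentence.
        if current is None or frequency[sentence] < frequency[current]:
            examples[word] = sentence
    return examples


freq = {"The Beginning": 1 + FRAGMENT_OFFSET, "The dog ran.": 3}
occ = [
    ("the", "The Beginning"),
    ("beginning", "The Beginning"),
    ("the", "The dog ran."),
    ("dog", "The dog ran."),
]
examples = pick_examples(occ, freq)
print(examples["the"])  # The dog ran.
```

Because the comparison is strict, a word that only ever appears in fragments keeps its first fragment, while any non-fragment sentence with a frequency below 10000 displaces a fragment as soon as it is seen.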
I have gone around in circles for a couple of days. I can get the non-fragments counted correctly, but when I try to add in the fragments, some of them are not flagged as fragments (their frequencies are below 10000), or their frequencies are not counted correctly.
I’m sorry if this description is confusing. It confuses even me. Please ask for clarification if needed.
Here is the code. I would really appreciate any help you can provide. I am aware that my code is not as streamlined as it could be. This is the only version that I could get to (nearly) work.
The code reads a sheet named “Raw_Data”. Column A holds the word; it is blank if the word is in all caps. Column B holds the sentence.
Column B is never empty within the data, so the macro stops at the first blank cell in column B.
Column C contains “Fragment” if the column B content is a fragment; otherwise it is blank.
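A Python sketch of a counting pass over this layout is below. It is not the VBA code; `count_rows` is a hypothetical name, and it assumes (this is not stated above) that a sentence's frequency is the number of Raw_Data rows carrying that exact sentence text, and that a sentence counts as a fragment if any of its rows is flagged. The offset is applied once per fragment after all rows are counted; adding it per row, or before counting is finished, is one possible way fragment frequencies could come out wrong.

```python
from collections import Counter

FRAGMENT_OFFSET = 10000  # fragments are counted from 10001 instead of 1


def count_rows(rows):
    """Count words and sentence frequencies from Raw_Data-style rows.

    rows: (word, sentence, flag) tuples mirroring columns A, B and C.
    Returns (word_counts, sentence_freq).
    """
    word_counts = Counter()
    sentence_counts = Counter()
    fragments = set()
    for word, sentence, flag in rows:
        if not sentence:  # a blank column B ends the data, as in the macro
            break
        sentence_counts[sentence] += 1
        if flag == "Fragment":
            fragments.add(sentence)
        if word:  # column A is blank for ALL-CAPS words, which are skipped
            word_counts[word] += 1
    # Apply the offset once per fragment, after counting, so every
    # fragment ends up above 10000 regardless of how many rows it has.
    sentence_freq = {
        s: (n + FRAGMENT_OFFSET if s in fragments else n)
        for s, n in sentence_counts.items()
    }
    return word_counts, sentence_freq


rows = [
    ("the", "The Beginning", "Fragment"),
    ("beginning", "The Beginning", "Fragment"),
    ("", "THE END", "Fragment"),
    ("the", "The dog ran.", ""),
    ("dog", "The dog ran.", ""),
    ("", "", ""),  # blank column B: stop here
    ("ghost", "Never read.", ""),
]
counts, freq = count_rows(rows)
print(freq["The Beginning"])  # 10002
```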
Any help would be greatly appreciated. Thank you very much! I'm a newbie coder, so apologies in advance.