I am using RStudio and working with a data set on blue and red historical plaques in London. Some interesting columns I would like to look at include “title”, “gender” (options are male, female, object, place, etc.), “erection” (the year the plaque was erected), “subject_lead_primary_role” (the subject's job title, usually 1-5 words) and “inscription” (the inscription on the plaque, multiple words).
I am interested in creating a subset containing all writers from the original set. Using columns like “subject_lead_primary_role” and “inscription”, I want to keep all rows that contain one or more of the following words: “author”, “writer”, “novelist”, “essayist”, “poet”, “playwright”, “journalist”, “dramatist”, and “diarist”. I want this subset to be free of NAs.
- I am importing the data from a CSV file and have set my working directory to the folder that contains the file. Is that enough for the import to work?
- How do I create a subset that shows only the rows that contain one or more of the words above?
- How do I tell R to filter for these words and mark all rows that contain one or more of them?
- How can I tell R to count how often each individual word occurs (e.g. how often is the noun “writer” used on a plaque compared to “author”)?
- What are some good ways to visualise this? What are some good packages to have?
I am very new to R. I am able to import a dataset and perform basic visualisations on numerical data, but have not worked with strings before.
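For concreteness, here is a minimal sketch of one possible approach to the points above, using base R only. It is not a definitive solution: the toy data frame is invented for illustration, the file name `plaques.csv` in the comment is hypothetical, and helper names such as `blank_na`, `is_writer` and `word_counts` are my own. Packages such as stringr, dplyr and ggplot2 are commonly used for the same steps.

```r
# Toy stand-in for the plaque data; these rows are invented for illustration.
# With the real file (name is hypothetical) and the working directory set:
# plaques <- read.csv("plaques.csv", stringsAsFactors = FALSE)
plaques <- data.frame(
  title = c("Plaque A", "Plaque B", "Plaque C", "Plaque D"),
  subject_lead_primary_role = c("Novelist", "Engineer", NA, "Poet and playwright"),
  inscription = c("Novelist and writer lived here", "Engineer worked here",
                  "Author of many essays", NA),
  stringsAsFactors = FALSE
)

words <- c("author", "writer", "novelist", "essayist", "poet",
           "playwright", "journalist", "dramatist", "diarist")

# One pattern with word boundaries: "\\b(author|writer|...)\\b"
pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")

# Replace NA with an empty string so missing text never matches
blank_na <- function(x) ifelse(is.na(x), "", x)

# Mark rows where either column contains at least one of the words
text <- paste(blank_na(plaques$subject_lead_primary_role),
              blank_na(plaques$inscription))
plaques$is_writer <- grepl(pattern, text, ignore.case = TRUE, perl = TRUE)

# Subset of marked rows
writers <- plaques[plaques$is_writer, ]
# writers <- na.omit(writers)  # optional: also drop rows with NA in any column

# Count how often each word occurs in the inscriptions (zero counts kept)
inscription <- tolower(blank_na(plaques$inscription))
matches <- regmatches(inscription, gregexpr(pattern, inscription, perl = TRUE))
word_counts <- table(factor(unlist(matches), levels = words))

# Simple bar chart of the counts
barplot(sort(word_counts, decreasing = TRUE), las = 2, ylab = "Occurrences")
```

Note that `na.omit()` drops a row if any column is NA, which may be stricter than intended; the marking step above already ignores NA values in the two text columns.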