Suppose that we have a string of text or a file.
We want to print only the lines that contain a four-digit year.
How do we do this?
RETAIN | The Prince and the Pauper (1990) |
DISCARD | Lorem ipsum dolor sit amet, consectetur adipiscing elit. |
RETAIN | Duck Tales the Movie: Treasure of the Lost Lamp (1990) |
DISCARD | Fun Fact of the Day: "ALCASEAR" means "to the king and sire" |
DISCARD | SML SML BIG BIG |
RETAIN | Shipwrecked (1990) |
DISCARD | Fun Fact of the Day: "HECHO" means "made an echo or made a copy" (When you delete the "H" from "HECHO" you get "ECHO") |
DISCARD | BIG BIG SML BIG BIG SML BIG SML SML BIG BIG |
RETAIN | Beauty and the Beast (1991) |
DISCARD | Who's there? |
DISCARD | Nobel. |
DISCARD | Nobel who? |
DISCARD | Nobel…that’s why I knocked! |
DISCARD | BIG BIG SML BIG SML BIG BIG BIG |
RETAIN | White Fang (1991) |
RETAIN | The Rocketeer (1991) |
DISCARD | BIG BIG SML BIG BIG SML BIG BIG BIG BIG BIG BIG |
RETAIN | Wild Hearts Can't Be Broken(1991) |
DISCARD | BIG BIG |
DISCARD | BIG BIG BIG BIG BIG BIG BIG BIG BIG BIG BIG BIG SML BIG BIG SML BIG BIG |
RETAIN | Robin Williams,Jonathan Freeman,Gilbert Gottfried, Linda Larkin, Douglas Seale, Scott Weinger, and Frank Welker used to be famous actors, but anyone born after the year 2020 is unlikely to recognize their names. |
RETAIN | Aladdin (1992) |
DISCARD | BIG BIG |
DISCARD | BIG BIG BIG BIG BIG BIG BIG BIG BIG BIG BIG BIG SML BIG BIG SML BIG BIG |
DISCARD | Names of formerly famous people include Christian Bale, Marty Belafsky, Max Casella, Aaron Lohr, Dominic Maldonado,David Moscow,and Trey Parkerin. |
RETAIN | Newsies (1992) |
DISCARD | BIG BIG SML BIG BIG BIG BIG |
RETAIN | "The MightyDucks" (1992) was a fun movie to watch. |
DISCARD | BIG BIG |
DISCARD | BIG BIG BIG BIG BIG BIG SML BIG BIG SML BIG BIG BIG BIG BIG |
RETAIN | The Muppet Christmas Carol (1992) |
The following is a regular expression for a four-digit year: four digits, the first being 1 or 2, with no other digit immediately before or after them.
(?:^|[^\d])([12][0-9][0-9][0-9])(?:$|[^\d])
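The expression can be easier to read when it is spelled out piece by piece. The sketch below uses Python's `re.VERBOSE` flag for that, and assumes the character class is meant to be `[^\d]` (any character that is not a digit). The names `YEAR_PATTERN` and `find_years` are made up for illustration.

```python
import re

# The same pattern, written with re.VERBOSE so each part can carry a comment.
YEAR_PATTERN = re.compile(r"""
    (?:^|[^\d])              # start of the string, or one non-digit character
    ([12][0-9][0-9][0-9])    # captured: four digits, the first being 1 or 2
    (?:$|[^\d])              # end of the string, or one non-digit character
""", re.VERBOSE)


def find_years(line):
    """Return the four-digit years captured in `line` (hypothetical helper)."""
    return YEAR_PATTERN.findall(line)


print(find_years("Aladdin (1992)"))        # ['1992']
print(find_years("SML SML BIG BIG"))       # []
print(find_years("serial number 123456"))  # []
```

The last call returns nothing because every run of four digits in `123456` touches another digit, which is what the two outer groups are there to rule out.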
The following Python script tests the expression and works well enough.
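import re
# `re` is Python's regular expression library

text = """
APPLE
2024
KIWI
1995
ORANGE
1987
"""

# A raw string keeps the backslash in `\d` intact for the regex engine.
regex_for_year = r"(?:^|[^\d])([12][0-9][0-9][0-9])(?:$|[^\d])"

# Check each line on its own and print the ones that contain a year.
for line in text.split('\n'):
    result = re.findall(regex_for_year, line)
    has_year = len(result) > 0
    if has_year:
        print(line)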
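The question also mentions files, so here is a minimal sketch of the same filter applied to any iterable of lines, such as an open file. It is a variation, not the script above: it swaps the outer groups for lookarounds, `(?<!\d)` and `(?!\d)`, which check the neighbouring characters without consuming them. One consequence is that two years separated by a single character, as in `1990 1991`, are both found, whereas a pattern that consumes the separator finds only the first. The names `YEAR` and `lines_with_year` are hypothetical, and `io.StringIO` stands in for a real file.

```python
import io
import re

# Assumption: lookarounds replace the outer groups of the original expression.
# (?<!\d) = not preceded by a digit, (?!\d) = not followed by a digit.
YEAR = re.compile(r"(?<!\d)[12]\d{3}(?!\d)")


def lines_with_year(lines):
    """Yield each line, without its trailing newline, that contains a year."""
    for line in lines:
        line = line.rstrip("\n")
        if YEAR.search(line):
            yield line


# io.StringIO behaves like an open text file for the purposes of this sketch.
sample = io.StringIO(
    "The Prince and the Pauper (1990)\n"
    "SML SML BIG BIG\n"
    "Wild Hearts Can't Be Broken(1991)\n"
    "Nobel who?\n"
)

for kept in lines_with_year(sample):
    print(kept)
```

Because only a yes/no answer is needed per line, `search` is enough here; `findall` is useful when the years themselves are wanted.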