I’m trying to use REGEXP
in an MariaDB 11.0.2 query to extract all of the image source links within the HTML and text of the field post_content
that is in each row in wp_posts
. I’ve looked at related questions but still can’t get this to work.
My Regex works in a fiddle: https://regex101.com/r/CyuWhi/1
But using it in a query in Adminer 4.8.1:
SELECT post_content
FROM wp_posts
WHERE post_content REGEXP 'src=["|'](.*?)["|']';
returns the src links, but also other HTML and text. What I want is just a list of the image source links.
So, either my REGEXP
is wrong for Adminer, or the structure of my overall query is not working.
Any ideas?
Edit: Barmar makes the good point that I should use REGEXP_SUBSTR
. How would I do that?
The regex101 fiddle has sample row data and the regex. But I have thousands of rows in wp_posts
and each has a post_content
column that I want to search.
So would I use REGEXP_SUBSTR() with table data? I.e., example #4:
But as in the example, all I need are the src URLs in a list, not the year.
The screenshot is of Adminer; the regex selects the row, but not just the image source URLs. (Image links are out of sight to the right.)
7
Get rid of the ?
lazy modifier, which isn’t supported in MySQL 5.x regular expressions. It’s also not needed when just testing for a regexp match, it’s only needed when you need to determine the matching portion (as you might do with REGEXP_SUBSTR()
or REGEXP_REPLACE()
in MySQL 8.x).
SELECT post_content
FROM wp_posts
WHERE post_content REGEXP 'src=["'].*["']';
Note that MySQL doesn’t use the PCRE library, so selecting that engine at regex101 may produce misleading results.
4