Collectives™ on Stack Overflow

Find centralized, trusted content and collaborate around the technologies you use most.

Learn more about Collectives

Teams

Q&A for work

Connect and share knowledge within a single location that is structured and easy to search.

Learn more about Teams

I have a large database which contains records that have <a> tags in them and I would like to remove them. Of course there is the method where I create a PHP script that selects all, uses strip_tags and updates the database, but this takes a long time. So how can I do this with a simple (or complicated) MySQL query?

MySQL >= 5.5 provides XML functions to solve your issue:

SELECT ExtractValue(field, '//text()') FROM table;

Reference: https://dev.mysql.com/doc/refman/5.5/en/xml-functions.html

This is the best answer for less antiquated MySQL/MariaDB versions. I would like to note that mariadb supports this as well. mariadb.com/kb/en/library/extractvalue – domdambrogia Aug 15, 2019 at 19:05 Will not work on non-html text, unlike PHP's strip_tags. Also gives warnings for badly formated html. – Laloutre Oct 22, 2019 at 3:19 I'd love to use this, but it doesn't work for any text that contains HTML tags that don't terminate, like <br> or <img src="" alt="">. – Ty. Jun 4, 2021 at 20:46
CREATE FUNCTION `strip_tags`($str text) RETURNS text
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
        SET $start = LOCATE("<", $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        SET $end = LOCATE(">", $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, "");
    END LOOP;

I made sure it removes mismatched opening brackets because they're dangerous, though it ignores any unpaired closing brackets because they're harmless.

mysql> select strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.');
+----------------------------------------------------------------------+
| strip_tags('<span>hel<b>lo <a href="world">wo<>rld</a> <<x>again<.') |
+----------------------------------------------------------------------+
| hello world again.                                                   |
+----------------------------------------------------------------------+
1 row in set
                didnt work for me;  MySQL said: Documentation #1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 3
– mahen3d
                Nov 8, 2014 at 10:26
                delimiter //    CREATE FUNCTION strip_tags($str text) RETURNS text    BEGIN        DECLARE $start, $end INT DEFAULT 1;        LOOP            SET $start = LOCATE("<", $str, $start);            IF (!$start) THEN RETURN $str; END IF;            SET $end = LOCATE(">", $str, $start);            IF (!$end) THEN SET $end = $start; END IF;            SET $str = INSERT($str, $start, $end - $start + 1, "");        END LOOP;    END//    delimiter ;
– nzn
                Jan 5, 2015 at 20:06
                DROP FUNCTION IF EXISTS strip_tags;     DELIMITER |     CREATE FUNCTION strip_tags($str text) RETURNS text     BEGIN         DECLARE $start, $end INT DEFAULT 1;         LOOP             SET $start = LOCATE("<", $str, $start);             IF (!$start) THEN RETURN $str; END IF;             SET $end = LOCATE(">", $str, $start);             IF (!$end) THEN SET $end = $start; END IF;             SET $str = INSERT($str, $start, $end - $start + 1, "");         END LOOP;     END;     |     DELIMITER ;
– IRvanFauziE
                May 17, 2015 at 0:15
                even if I declare RETURNS text CHARSET utf8 the function returns question marks instead of accented characters, any clues please?
– ptica
                Jun 25, 2015 at 20:05

I don't believe there's any efficient way to do this in MySQL alone.

MySQL does have a REPLACE() function, but it can only replace constant strings, not patterns. You could possibly write a MySQL stored function to search for and replace tags, but at that point you're probably better off writing a PHP script to do the job. It might not be quite as fast, but it will probably be faster to write.

I am passing this code on, seems very similar to the above. Worked for me, hope it helps.

BEGIN
  DECLARE iStart, iEnd, iLength   INT;
  WHILE locate('<', Dirty) > 0 AND locate('>', Dirty, locate('<', Dirty)) > 0
    BEGIN
      SET iStart = locate('<', Dirty), iEnd = locate('>', Dirty, locate('<', Dirty));
      SET iLength = (iEnd - iStart) + 1;
      IF iLength > 0 THEN
        BEGIN
          SET Dirty = insert(Dirty, iStart, iLength, '');
      END IF;
  END WHILE;
  RETURN Dirty;
                I did a small rough benchmark on 5000 (~20mb) various plain text / html samples (scraped job descriptions). Output of your example is exactly the same as of Boann's, however your code took ~32s to process and Boann's just 7s making Boann's solution 4.5x faster. I'm just putting this here for future reference if someone will face the same dilemma as I did. Thanks to both of you guys.
– Dušan Brejka
                May 27, 2015 at 12:04

I just extended the answer @boann to allow targetting of any specific tag so that we can replace out the tags one by one with each function call. You just need pass the tag parameter, e.g. 'a' to replace out all opening/closing anchor tags. This answers the question asked by OP, unlike the accepted answer, which strips out ALL tags.

# MySQL function to programmatically replace out specified html tags from text/html fields
# run this to drop/update the stored function
DROP FUNCTION IF EXISTS `strip_tags`;
DELIMITER |
# function to nuke all opening and closing tags of type specified in argument 2
CREATE FUNCTION `strip_tags`($str text, $tag text) RETURNS text
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    SET $str = COALESCE($str, '');
        SET $start = LOCATE(CONCAT('<', $tag), $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        SET $end = LOCATE('>', $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, '');
        SET $str = REPLACE($str, CONCAT('</', $tag, '>'), '');
    END LOOP;
| DELIMITER ;
# test select to nuke all opening <a> tags
SELECT 
    STRIP_TAGS(description, 'a') AS stripped
    tmpcat;
# run update query to replace out all <a> tags
UPDATE tmpcat
    description = STRIP_TAGS(description, 'a');
                Error in query (1418): This function has none of DETERMINISTIC, NO SQL, or READS SQL DATA in its declaration and binary logging is enabled (you might want to use the less safe log_bin_trust_function_creators variable)
– kklepper
                Sep 26, 2021 at 21:13
  

Also to note, you may want to put a SET $str = COALESCE($str, ''); just before the loop otherwise null values may cause a crash/never ending query. – Tom C Aug 17 at 9:51

I'm using the lib_mysqludf_preg library for this and a regex like this:

SELECT PREG_REPLACE('#<[^>]+>#',' ',cell) FROM table;

Also did it like this for rows which with encoded html entities:

SELECT PREG_REPLACE('#&lt;.+?&gt;#',' ',cell) FROM table;

There are probably cases where these might fail but I haven't encountered any and they're reasonably fast.

...and the not so subtle: (Converting strings into slugs)

 LOWER(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TRIM(node.title), ':', ''), 'é', 'e'), ')', ''), '(', ''), ',', ''), '\\', ''), '\/', ''), '\"', ''), '?', ''), '\'', ''), '&', ''), '!', ''), '.', ''), '–', ''), ' ', '-'), '--', '-'), '--', '-'), '’', '')) as `post_name`
        

Thanks for contributing an answer to Stack Overflow!

  • Please be sure to answer the question. Provide details and share your research!

But avoid

  • Asking for help, clarification, or responding to other answers.
  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.