How to Remove non-printable and non-ASCII characters from WordPress posts

Having non-printable and non-ASCII characters in your WordPress blog posts or Woocommerce product pages may present a serious problem. Your blog may generate a garbled output, certain PHP functions may not work properly. Also, operations like caching or static page export may not generate your pages correctly. In addition, if you need to export your statically generated pages to online cloud storage like AWS S3, the upload may generate errors because S3 does not accept non-ASCII characters in filenames. In order to remove non-printable and non-ACSII characters from your WordPress blog post titles, descriptions and excerpts we will access PhpMyAdmin utility and run several MySQL commands on WordPress tables. So let us get started.

1. Install and launch PhpMYAdmin utility

If you do not have PhpMyAdmin already install, run this command at your Linux prompt:

sudo apt-get install -y phpmyadmin 

You will be prompted to choose apache2 vs lighttpd, configure database, pick and confirm a password. Once installed PhpMyAdmin utility will be accessible at yourdomain.com/phpmyadmin

2. Access your WordPress database and locate table called wp_posts

Login yourdomain.com/phpmyadmin. Click on your WordPress Database on the left side of the screen and click on table wp_posts on the right side of the screen:

3. Run MySQL query to find all posts containing Non-ASCII characters

Click on wp_posts table and then click on SQL tab on your PhpMyAdmin screen. You will be presented with a ” Run SQL query/queries on table…” text window. Paste the following command and hit “Go”.

SELECT * FROM wp_posts WHERE NOT HEX(post_title) REGEXP '^([0-7][0-9A-F])*$'

This command will find all your posts with non-ASCII characters in their titles. If you would like to check the descriptions and excerpts as well, then run these two commands.

SELECT * FROM wp_posts WHERE NOT HEX(post_content) REGEXP '^([0-7][0-9A-F])*$'
SELECT * FROM wp_posts WHERE NOT HEX(post_excerpt) REGEXP '^([0-7][0-9A-F])*$'

These queries should be run separately and each one will return all posts with non-printable characters in posts/product descriptions AND excerpts respectively. Once you have an idea of how many problematic posts you have, you can move to the cleanup stage.

3. Cleanup non-ASCII characters from posts or product descriptions

Run the following three lines of code separately in MySQL query window for your WordPress database.

This code will cleanup non-ASCII characters from all your post titles or product names:

SET post_title = CONVERT(post_title USING ASCII);

This code will cleanup non-ASCII characters from all your post articles or product descriptions:

SET post_content = CONVERT(post_content USING ASCII);

This code will cleanup non-ASCII characters from all your post excerpts:

SET post_excerpt = CONVERT(post_excerpt USING ASCII);

Once the cleanup is completed, you may want to refresh your WordPress blog cache and/or re-generate static pages in order to see the results.