Migrating from Drupal 6 to WordPress: Part 2 – Content, Tags, Links and Images
This is second in a series of posts about migrating from Drupal 6 to WordPress. The first post tackled installing WordPress and some basic security options.
This time it’s a gnarly bunch of SQL for converting content out of Drupal 6 into WordPress version 3.7. It should also work with the latest version (at time of writing WordPress 3.8 Parker).
Thankfully, despite the scant notes on WordPress’ Codex, there’s some great reference material for the Drupal to WordPress migration. Huge thank to Scott Anderson, Mike Smullin and Nile (see references).
Convert from Drupal to WordPress
I used Scott Anderson’s SQL as the basis for the transfer of content from Drupal to WordPress.
First up, some assumptions:
- Some basic SQL knowledge will be a big help, some of the lines below will need to be updated to fit your site (there’s a great primer here).
- The WordPress database is called wordpress with a table prefix of wp_ (think these are the defaults)
- The Drupal database is called drupal with no table prefix
- Both databases are on the same server and accessible using the same database user
and the big assumption:
This is a fresh install and you’re moving all of the content from the original Drupal site to the new WordPress one. Merging content is a whole different beast and well outside the scope of this post.
Clear Out WordPress
Assuming that it’s a case of moving everything lock, stock and barrel. Any content in the WordPress database needs to be zapped:
# Empty previous content from WordPress database. TRUNCATE TABLE wordpress.wp_comments; TRUNCATE TABLE wordpress.wp_links; TRUNCATE TABLE wordpress.wp_postmeta; TRUNCATE TABLE wordpress.wp_posts; TRUNCATE TABLE wordpress.wp_term_relationships; TRUNCATE TABLE wordpress.wp_term_taxonomy; TRUNCATE TABLE wordpress.wp_terms;
In this case there’s only one author, but if migrating multiple authors, there are options in the original script to handle this.
Next, it’s time to migrate the tags from Drupal, using the SQL below. WordPress is a bit fussier than Drupal about tags, so they need to be forced to lowercase and spaces
stores tags in the database in lower case, Drupal doesn’t care, so some munging is required.
# Using REPLACE prevents script from breaking if Drupal contains duplicate terms. REPLACE INTO wordpress.wp_terms (term_id, `name`, slug, term_group) SELECT DISTINCT d.tid, d.name, REPLACE(LOWER(d.name), ' ', '_'), 0 FROM drupal.term_data d INNER JOIN drupal.term_hierarchy h USING(tid) INNER JOIN drupal.term_node n USING(tid) WHERE (1 # This helps eliminate spam tags from import; uncomment if necessary. # AND LENGTH(d.name) < 50 ) ; INSERT INTO wordpress.wp_term_taxonomy (term_id, taxonomy, description, parent) SELECT DISTINCT d.tid `term_id`, 'post_tag' `taxonomy`, d.description `description`, h.parent `parent` FROM drupal.term_data d INNER JOIN drupal.term_hierarchy h USING(tid) INNER JOIN drupal.term_node n USING(tid) WHERE (1 # This helps eliminate spam tags from import; uncomment if necessary. # AND LENGTH(d.name) < 50 ) ;
Posts, Pages & Tags
The next part is properly magic, transforming Drupal content into posts that WordPress can handle. Watch out for the line:
WHERE n.type IN ('post', 'page', 'blog')
Add in any other content types that are used in the Drupal installation. For example, older installations, often use a story content type. If so, alter the above line to:
WHERE n.type IN ('post', 'page', 'blog', 'story')
the main code below does the magic and also keeps any unpublished posts hidden.
# Keeps private posts hidden. INSERT INTO wordpress.wp_posts (id, post_author, post_date, post_content, post_title, post_excerpt, post_name, post_modified, post_type, `post_status`) SELECT DISTINCT n.nid `id`, n.uid `post_author`, FROM_UNIXTIME(n.created) `post_date`, r.body `post_content`, n.title `post_title`, r.teaser `post_excerpt`, IF(SUBSTR(a.dst, 11, 1) = '/', SUBSTR(a.dst, 12), a.dst) `post_name`, FROM_UNIXTIME(n.changed) `post_modified`, n.type `post_type`, IF(n.status = 1, 'publish', 'private') `post_status` FROM drupal.node n INNER JOIN drupal.node_revisions r USING(vid) LEFT OUTER JOIN drupal.url_alias a ON a.src = CONCAT('node/', n.nid) # Add more Drupal content types below if applicable. WHERE n.type IN ('post', 'page', 'blog') ;
The next part sorts out all the posts in WordPress. N.B. Don’t stick pages in here, WordPress and Drupal both post/content types of pages. Mapping more complex post types is well outside the scope of this post, sorry.
# Fix post type; http://www.mikesmullin.com/development/migrate-convert-import-drupal-5-to-wordpress-27/#comment-17826 # Add more Drupal content types below if applicable. UPDATE wordpress.wp_posts SET post_type = 'post' WHERE post_type IN ('blog') ;
The next chunk of code sets all the pages to pending – if the page structure is staying the same between the old and new sites, this can safely be ignored.
# If you're keeping the same page structure from Drupal, comment out this query # and the new page INSERT at the end of this script. UPDATE wordpress.wp_posts SET post_status = 'pending' WHERE post_type = 'page';
Next up, the tags and posts that have been imported need to be connected:
# POST/TAG RELATIONSHIPS INSERT INTO wordpress.wp_term_relationships (object_id, term_taxonomy_id) SELECT DISTINCT nid, tid FROM drupal.term_node ; # Update tag counts. UPDATE wp_term_taxonomy tt SET `count` = ( SELECT COUNT(tr.object_id) FROM wp_term_relationships tr WHERE tr.term_taxonomy_id = tt.term_taxonomy_id ) ; # Fix taxonomy; http://www.mikesmullin.com/development/migrate-convert-import-drupal-5-to-wordpress-27/#comment-27140 UPDATE IGNORE wordpress.wp_term_relationships, wordpress.wp_term_taxonomy SET wordpress.wp_term_relationships.term_taxonomy_id = wordpress.wp_term_taxonomy.term_taxonomy_id WHERE wordpress.wp_term_relationships.term_taxonomy_id = wordpress.wp_term_taxonomy.term_id ;
If the Drupal installation has content types that include image or video fields, they’ll need to be handled separately. Scott Anderson’s SQL has code that’ll handle this. Files and images that are linked into the posts are covered below.
Next, a little more magic to transfer the comments to WordPress. Important note: with popularity comes downsides, especially comment spammers. Make sure an anti-spam tool like Akismet is installed!
# Keeps unapproved comments hidden. # Incorporates change noted here: http://www.mikesmullin.com/development/migrate-convert-import-drupal-5-to-wordpress-27/#comment-32169 INSERT INTO wordpress.wp_comments (comment_post_ID, comment_date, comment_content, comment_parent, comment_author, comment_author_email, comment_author_url, comment_approved) SELECT DISTINCT nid, FROM_UNIXTIME(timestamp), comment, thread, name, mail, homepage, ((status + 1) % 2) FROM drupal.comments ; # Update comments count on wp_posts table. UPDATE wordpress.wp_posts SET `comment_count` = ( SELECT COUNT(`comment_post_id`) FROM wordpress.wp_comments WHERE wordpress.wp_posts.`id` = wordpress.wp_comments.`comment_post_id` ) ;
Files & Images
By default, Drupal and WordPress store images in different places. Be default Drupal sticks them in a /files directory whilst WordPress uses wp-content/uploads with sub-folders for years and months.
The easiest solution is to:
- Copy all the Drupal files directory into the wp-content/uploads directory, assuming there’s no conflicts
- Add an alias to your web server configuration so that the images/files still work (see example below)
Alias /drupal-files /path/to/wordpress-install/wp-content/drupal-files
If you don’t want to tinker with the web server and update the links in the posts/pages directly, you can use the following SQL to achieve the same effect:
# Fix images in post content; uncomment if you're moving files from "files" to "wp-content/uploads". # UPDATE wordpress.wp_posts SET post_content = REPLACE(post_content, '"/files/', '"/wp-content/uploads/files/');
Drupal’s super flexible approach to tags means this part of the migration may need some jiggery pokery which’ll have to be done by hand. WordPress stores all of its tags and categories in the same database table wp_terms, then uses another table wp_term_taxonomy to define whether it’s a tag or category.
So far, the code above just dumps all of the tags out of Drupal into WordPress, it doesn’t care about how they were organised. If you’re anything like me, chances are one of the taxonomies you created was used to categories blog posts.
Now it’s time to pull out tags to use as categories. Here’s the process:
- Manually create the categories in WordPress (can be done in SQL – see below – or using the web interface: Admin > Posts > Categories)
- Update any tags that already exist, WordPress usually capitalises the first letter of a category
- Set specific tags as categories
- Associate blog posts and other content with these categories – this will differ significantly from site-to-site, but the basic logic is listed below
Manually Creating WordPress Categories
Firstly, choose which tags are going to be your categories. The SQL below will insert them, but ignore any that already exist. In this case I’m just using 3 as examples, you can add as many as you’d like:
# Manually create categories in WordPress INSERT IGNORE INTO wordpress.wp_terms (name, slug) VALUES ('Marketing', 'marketing'), ('Social Media', 'social-media'), ('Geekery', 'geekery') ;
Remember, the slug is the name that WordPress will use when creating the URL for the categories. It mustn’t contain spaces or / characters.
Update Existing Tags
Let’s say we have a couple of existing tags that we’d like to turn into categories, by convention WordPress capitalises the first letter of the tag (although it’s not compulsory). This SQL tidies up the existing tags – you’ll need to edit this to include your own:
# Update tags from Drupal into WordPress categories UPDATE wordpress.wp_terms SET name = 'Internal Communications' WHERE name = 'internal comms'; UPDATE wordpress.wp_terms SET name = 'Extreme Baking' WHERE name = 'extreme baking'; UPDATE wordpress.wp_terms SET name = 'Rock & Roll Accountancy' WHERE name = 'grooving numbers';
Turn Tags Into Categories
Next, we need to set these as categories in WordPress. The SQL below uses the slug (the machine readable name, remember no spaces or /) and includes the new categories and the ones we’ve just updated:
# Add categories to taxonomy in WordPress INSERT INTO wordpress.wp_term_taxonomy (term_id, taxonomy) VALUES ((SELECT term_id FROM wp_terms WHERE slug = 'marketing'), 'category'), ((SELECT term_id FROM wp_terms WHERE slug = 'social-media'), 'category'), ((SELECT term_id FROM wp_terms WHERE slug = 'geekery'), 'category'), ((SELECT term_id FROM wp_terms WHERE slug = 'internal-comms'), 'category'), ((SELECT term_id FROM wp_terms WHERE slug = 'extreme-baking'), 'category'), ((SELECT term_id FROM wp_terms WHERE slug = 'grooving-numbers'), 'category') ;
Add Content to Categories
And now the tricky part – assigning the imported content to those categories.
The SQL uses two pieces of logic to handle this:
- content contains a specific word or phrase
- and/or that it was tagged in Drupal with one or more terms
# Auto-assign posts to category. # You'll need to work out your own logic to determine strings/terms to match. # Repeat this block as needed for each category you're creating. INSERT IGNORE INTO wordpress.wp_term_relationships (object_id, term_taxonomy_id) SELECT DISTINCT p.ID AS object_id, (SELECT tt.term_taxonomy_id FROM wordpress.wp_term_taxonomy tt INNER JOIN wordpress.wp_terms t USING (term_id) WHERE t.slug = 'enter-category-slug-here' AND tt.taxonomy = 'category') AS term_taxonomy_id FROM wordpress.wp_posts p WHERE p.post_content LIKE '%enter string to match here%' OR p.ID IN ( SELECT tr.object_id FROM wordpress.wp_term_taxonomy tt INNER JOIN wordpress.wp_terms t USING (term_id) INNER JOIN wordpress.wp_term_relationships tr USING (term_taxonomy_id) WHERE t.slug IN ('enter','terms','to','match','here') AND tt.taxonomy = 'post_tag' ) ;
Pay paticular attention to the highlighted lines (9, 12, 18):
- Line 9 – add the WordPress slug for the category defined above e.g. geekery
- Line 12 – change to any text you’d like to match from the posts e.g. Drupal setup. N.B. This can also be set to ‘%%’ which will match everything
- Line 18 – one or more Drupal tags to match e.g. if we used geekery in the original Drupal site, add ‘geekery’ here as a tag, remember this is the tag from Drupal not WordPress
Here’s the generic code from above modified with those settings:
# Auto-assign posts to category. # You'll need to work out your own logic to determine strings/terms to match. # Repeat this block as needed for each category you're creating. INSERT IGNORE INTO wordpress.wp_term_relationships (object_id, term_taxonomy_id) SELECT DISTINCT p.ID AS object_id, (SELECT tt.term_taxonomy_id FROM wordpress.wp_term_taxonomy tt INNER JOIN wordpress.wp_terms t USING (term_id) WHERE t.slug = 'geekery' AND tt.taxonomy = 'category') AS term_taxonomy_id FROM wordpress.wp_posts p WHERE p.post_content LIKE '%Drupal setup%' OR p.ID IN ( SELECT tr.object_id FROM wordpress.wp_term_taxonomy tt INNER JOIN wordpress.wp_terms t USING (term_id) INNER JOIN wordpress.wp_term_relationships tr USING (term_taxonomy_id) WHERE t.slug IN ('geekery') AND tt.taxonomy = 'post_tag' ) ;
This will need to be repeated for each of the categories created in WordPress.
WARNING: This is pretty hard to undo if you get the logic wrong – I ended up starting over twice until I realised you can easily test your logic by deleting the first line (starting INSERT) and running the SQL. This’ll show which posts will be affected.
Finally for the tags, the category counts need to be updated.
# Update category counts. UPDATE wp_term_taxonomy tt SET `count` = ( SELECT COUNT(tr.object_id) FROM wp_term_relationships tr WHERE tr.term_taxonomy_id = tt.term_taxonomy_id ) ;
And breathe. The tags are probably the fiddliest bit of the whole transfer.
Creating Redirect for Old Posts to New URLs
WordPress and Drupal use different patterns for permalinks and part of the migration process will have altered the permalinks that WordPress uses to show the posts. Fortunately the node id from Drupal is also the post id in WordPress.
Redirects for Posts
Using a bit of SQL, we can build a list of redirects that can be added to the web server configuration. For search engine optimisation this is important otherwise all the work done getting your posts good profile on Google will start again from scratch.
First, make sure that no / are left in the URLs otherwise they won’t show up in WordPress at all. This should have been covered in the main import of posts above, but a few slipped through when I tried it.
# Double check all slashes are removed from permalinks UPDATE wordpress.wp_posts SET post_name = REPLACE(post_name, ‘/’, ‘-’);
The SQL below builds the redirect statements which’ll then need to be added to the web server configuration:
# Create Apache redirects SELECT CONCAT ("Redirect permanent /", dst, " /?p=", RIGHT(src, LENGTH(src)-5)) FROM `url_alias` WHERE src REGEXP '^node' AND RIGHT(src, LENGTH(src)-5) IN ( SELECT nid FROM node WHERE type IN ('story', 'blog') );
The Drupal content types can be updated or changed by altering line 8 in the code above.
Redirects for Tags
Tags in Drupal use – as a separator for spaces, WordPress uses _, so we’ll also need to create a bunch of redirects for the tag pages. This may need to be adjusted depending on the structure of the Drupal site.
In this instance:
- old site was: /tags/tag-name
- new site should be: /tag/tag_name
This SQL will do the magic:
# Create Apache redirects for tags SELECT DISTINCT CONCAT("Redirect permanent /tags/", REPLACE(LOWER(d.name), ' ', '-'), " /tag/", REPLACE(LOWER(d.name), ' ', '_')) FROM drupal.term_data d INNER JOIN drupal.term_hierarchy h USING(tid) INNER JOIN drupal.term_node n USING(tid) ;
If you have different URL structures for your tag links, adjust lines 3 and 5 accordingly.
The lines generated from the two SQL statements above can be added to the top of the .htaccess file and bingo, the search engines should pick start picking up the new links and indexing them.
Job done…That wasn’t so bad, was it? OK, it’s pretty industrial but hopefully everything worked out OK. Now it’s time to add some plugins to make it work better!