Pouring Iron by zman z28

Migrating from Drupal 6 to WordPress: Part 2 – Content, Tags, Links and Images

This is second in a series of posts about migrating from Drupal 6 to WordPress. The first post tackled installing WordPress and some basic security options.

This time it’s a gnarly bunch of SQL for converting content out of Drupal 6 into WordPress version 3.7. It should also work with the latest version (at time of writing WordPress 3.8 Parker).

Thankfully, despite the scant notes on WordPress’ Codex, there’s some great reference material for the Drupal to WordPress migration. Huge thank to Scott AndersonMike Smullin and Nile (see references).

Convert from Drupal to WordPress

I used Scott Anderson’s SQL as the basis for the transfer of content from Drupal to WordPress.

First up, some assumptions:

  • Some basic SQL knowledge will be a big help, some of the lines below will need to be updated to fit your site (there’s a great primer here).
  • The WordPress database is called wordpress with a table prefix of wp_ (think these are the defaults)
  • The Drupal database is called drupal with no table prefix
  • Both databases are on the same server and accessible using the same database user

and the big assumption:

This is a fresh install and you’re moving all of the content from the original Drupal site to the new WordPress one. Merging content is a whole different beast and well outside the scope of this post.

Clear Out WordPress

Assuming that it’s a case of moving everything lock, stock and barrel. Any content in the WordPress database needs to be zapped:

# Empty previous content from WordPress database.
TRUNCATE TABLE wordpress.wp_comments;
TRUNCATE TABLE wordpress.wp_links;
TRUNCATE TABLE wordpress.wp_postmeta;
TRUNCATE TABLE wordpress.wp_posts;
TRUNCATE TABLE wordpress.wp_term_relationships;
TRUNCATE TABLE wordpress.wp_term_taxonomy;
TRUNCATE TABLE wordpress.wp_terms;

In this case there’s only one author, but if migrating multiple authors, there are options in the original script to handle this.

Tags

Next, it’s time to migrate the tags from Drupal, using the SQL below. WordPress is a bit fussier than Drupal about tags, so they need to be forced to lowercase and spaces

stores tags in the database in lower case, Drupal doesn’t care, so some munging is required.

# Using REPLACE prevents script from breaking if Drupal contains duplicate terms.

REPLACE INTO wordpress.wp_terms
   (term_id, `name`, slug, term_group)
   SELECT DISTINCT
       d.tid, d.name, REPLACE(LOWER(d.name), ' ', '_'), 0
   FROM drupal.term_data d
   INNER JOIN drupal.term_hierarchy h
       USING(tid)
   INNER JOIN drupal.term_node n
       USING(tid)
   WHERE (1
        # This helps eliminate spam tags from import; uncomment if necessary.
        # AND LENGTH(d.name) < 50
   )
;

INSERT INTO wordpress.wp_term_taxonomy
   (term_id, taxonomy, description, parent)
   SELECT DISTINCT
       d.tid `term_id`,
       'post_tag' `taxonomy`,
       d.description `description`,
       h.parent `parent`
   FROM drupal.term_data d
   INNER JOIN drupal.term_hierarchy h
       USING(tid)
   INNER JOIN drupal.term_node n
       USING(tid)
   WHERE (1
        # This helps eliminate spam tags from import; uncomment if necessary.
        # AND LENGTH(d.name) < 50
   )
;

Posts, Pages & Tags

The next part is properly magic, transforming Drupal content into posts that WordPress can handle. Watch out for the line:

WHERE n.type IN ('post', 'page', 'blog')

Add in any other content types that are used in the Drupal installation. For example, older installations, often use a story content type. If so, alter the above line to:

WHERE n.type IN ('post', 'page', 'blog', 'story')

the main code below does the magic and also keeps any unpublished posts hidden.

# Keeps private posts hidden.
INSERT INTO wordpress.wp_posts
   (id, post_author, post_date, post_content, post_title, post_excerpt,
   post_name, post_modified, post_type, `post_status`)
   SELECT DISTINCT
       n.nid `id`,
       n.uid `post_author`,
       FROM_UNIXTIME(n.created) `post_date`,
       r.body `post_content`,
       n.title `post_title`,
       r.teaser `post_excerpt`,
       IF(SUBSTR(a.dst, 11, 1) = '/', SUBSTR(a.dst, 12), a.dst) `post_name`,
       FROM_UNIXTIME(n.changed) `post_modified`,
       n.type `post_type`,
       IF(n.status = 1, 'publish', 'private') `post_status`
   FROM drupal.node n
   INNER JOIN drupal.node_revisions r
       USING(vid)
   LEFT OUTER JOIN drupal.url_alias a
       ON a.src = CONCAT('node/', n.nid)
   # Add more Drupal content types below if applicable.
   WHERE n.type IN ('post', 'page', 'blog')
;

The next part sorts out all the posts in WordPress. N.B. Don’t stick pages in here, WordPress and Drupal both post/content types of pages. Mapping more complex post types is well outside the scope of this post, sorry.

# Fix post type; http://www.mikesmullin.com/development/migrate-convert-import-drupal-5-to-wordpress-27/#comment-17826

# Add more Drupal content types below if applicable.
UPDATE wordpress.wp_posts
   SET post_type = 'post'
   WHERE post_type IN ('blog')
;

The next chunk of code sets all the pages to pending – if the page structure is staying the same between the old and new sites, this can safely be ignored.

# If you're keeping the same page structure from Drupal, comment out this query
# and the new page INSERT at the end of this script.
UPDATE wordpress.wp_posts SET post_status = 'pending' WHERE post_type = 'page';

Next up, the tags and posts that have been imported need to be connected:

# POST/TAG RELATIONSHIPS
INSERT INTO wordpress.wp_term_relationships (object_id, term_taxonomy_id)
   SELECT DISTINCT nid, tid FROM drupal.term_node
;
# Update tag counts.
UPDATE wp_term_taxonomy tt
   SET `count` = (
       SELECT COUNT(tr.object_id)
       FROM wp_term_relationships tr
       WHERE tr.term_taxonomy_id = tt.term_taxonomy_id
   )
;
# Fix taxonomy; http://www.mikesmullin.com/development/migrate-convert-import-drupal-5-to-wordpress-27/#comment-27140
UPDATE IGNORE wordpress.wp_term_relationships, wordpress.wp_term_taxonomy
   SET wordpress.wp_term_relationships.term_taxonomy_id = wordpress.wp_term_taxonomy.term_taxonomy_id
   WHERE wordpress.wp_term_relationships.term_taxonomy_id = wordpress.wp_term_taxonomy.term_id
;

If the Drupal installation has content types that include image or video fields, they’ll need to be handled separately. Scott Anderson’s SQL has code that’ll handle this. Files and images that are linked into the posts are covered below.

Comments

Next, a little more magic to transfer the comments to WordPress. Important note: with popularity comes downsides, especially comment spammers. Make sure an anti-spam tool like Akismet is installed!

# Keeps unapproved comments hidden.
# Incorporates change noted here: http://www.mikesmullin.com/development/migrate-convert-import-drupal-5-to-wordpress-27/#comment-32169
INSERT INTO wordpress.wp_comments
   (comment_post_ID, comment_date, comment_content, comment_parent, comment_author,
   comment_author_email, comment_author_url, comment_approved)
   SELECT DISTINCT
       nid, FROM_UNIXTIME(timestamp), comment, thread, name,
       mail, homepage, ((status + 1) % 2)
   FROM drupal.comments
;
# Update comments count on wp_posts table.
UPDATE wordpress.wp_posts
   SET `comment_count` = (
       SELECT COUNT(`comment_post_id`)
       FROM wordpress.wp_comments
       WHERE wordpress.wp_posts.`id` = wordpress.wp_comments.`comment_post_id`
   )
;

Files & Images

By default, Drupal and WordPress store images in different places. Be default Drupal sticks them in a /files directory whilst WordPress uses wp-content/uploads with sub-folders for years and months.

The easiest solution is to:

  • Copy all the Drupal files directory into the wp-content/uploads directory, assuming there’s no conflicts
  • Add an alias to your web server configuration so that the images/files still work (see example below)
Alias /drupal-files /path/to/wordpress-install/wp-content/drupal-files

If you don’t want to tinker with the web server and update the links in the posts/pages directly, you can use the following SQL to achieve the same effect:

# Fix images in post content; uncomment if you're moving files from "files" to "wp-content/uploads".
# UPDATE wordpress.wp_posts SET post_content = REPLACE(post_content, '"/files/', '"/wp-content/uploads/files/');

Categories

Drupal’s super flexible approach to tags means this part of the migration may need some jiggery pokery which’ll have to be done by hand. WordPress stores all of its tags and categories in the same database table wp_terms, then uses another table wp_term_taxonomy to define whether it’s a tag or category.

So far, the code above just dumps all of the tags out of Drupal into WordPress, it doesn’t care about how they were organised. If you’re anything like me, chances are one of the taxonomies you created was used to categories blog posts.

Now it’s time to pull out tags to use as categories. Here’s the process:

  1. Manually create the categories in WordPress (can be done in SQL – see below – or using the web interface: Admin > Posts > Categories)
  2. Update any tags that already exist, WordPress usually capitalises the first letter of a category
  3. Set specific tags as categories
  4. Associate blog posts and other content with these categories – this will differ significantly from site-to-site, but the basic logic is listed below

Manually Creating WordPress Categories

Firstly, choose which tags are going to be your categories. The SQL below will insert them, but ignore any that already exist. In this case I’m just using 3 as examples, you can add as many as you’d like:

# Manually create categories in WordPress
INSERT IGNORE INTO wordpress.wp_terms (name, slug)
   VALUES
   ('Marketing', 'marketing'),
   ('Social Media', 'social-media'),
   ('Geekery', 'geekery')
;

Remember, the slug is the name that WordPress will use when creating the URL for the categories. It mustn’t contain spaces or / characters.

Update Existing Tags

Let’s say we have a couple of existing tags that we’d like to turn into categories, by convention WordPress capitalises the first letter of the tag (although it’s not compulsory). This SQL tidies up the existing tags – you’ll need to edit this to include your own:

# Update tags from Drupal into WordPress categories
UPDATE wordpress.wp_terms SET name = 'Internal Communications' WHERE name = 'internal comms';
UPDATE wordpress.wp_terms SET name = 'Extreme Baking' WHERE name = 'extreme baking';
UPDATE wordpress.wp_terms SET name = 'Rock & Roll Accountancy' WHERE name = 'grooving numbers';

Turn Tags Into Categories

Next, we need to set these as categories in WordPress. The SQL below uses the slug (the machine readable name, remember no spaces or /) and includes the new categories and the ones we’ve just updated:

# Add categories to taxonomy in WordPress
INSERT INTO wordpress.wp_term_taxonomy (term_id, taxonomy)
	VALUES
	((SELECT term_id FROM wp_terms WHERE slug = 'marketing'), 'category'),
	((SELECT term_id FROM wp_terms WHERE slug = 'social-media'), 'category'),
	((SELECT term_id FROM wp_terms WHERE slug = 'geekery'), 'category'),
	((SELECT term_id FROM wp_terms WHERE slug = 'internal-comms'), 'category'),
	((SELECT term_id FROM wp_terms WHERE slug = 'extreme-baking'), 'category'),
	((SELECT term_id FROM wp_terms WHERE slug = 'grooving-numbers'), 'category')
;

Add Content to Categories

And now the tricky part – assigning the imported content to those categories.

The SQL uses two pieces of logic to handle this:

  1. content contains a specific word or phrase
  2. and/or that it was tagged in Drupal with one or more terms
# Auto-assign posts to category.
# You'll need to work out your own logic to determine strings/terms to match.
# Repeat this block as needed for each category you're creating.
INSERT IGNORE INTO wordpress.wp_term_relationships (object_id, term_taxonomy_id)
     SELECT DISTINCT p.ID AS object_id,
          (SELECT tt.term_taxonomy_id
          FROM wordpress.wp_term_taxonomy tt
          INNER JOIN wordpress.wp_terms t USING (term_id)
          WHERE t.slug = 'enter-category-slug-here'
          AND tt.taxonomy = 'category') AS term_taxonomy_id
     FROM wordpress.wp_posts p
     WHERE p.post_content LIKE '%enter string to match here%'
     OR p.ID IN (
          SELECT tr.object_id
          FROM wordpress.wp_term_taxonomy tt
          INNER JOIN wordpress.wp_terms t USING (term_id)
          INNER JOIN wordpress.wp_term_relationships tr USING (term_taxonomy_id)
          WHERE t.slug IN ('enter','terms','to','match','here')
          AND tt.taxonomy = 'post_tag'
     )
;

Pay paticular attention to the highlighted lines (9, 12, 18):

  • Line 9 – add the WordPress slug for the category defined above e.g. geekery
  • Line 12 – change to any text you’d like to match from the posts e.g. Drupal setup. N.B. This can also be set to ‘%%’ which will match everything
  • Line 18 – one or more Drupal tags to match e.g. if we used geekery in the original Drupal site, add ‘geekery’ here as a tag, remember this is the tag from Drupal not WordPress

Here’s the generic code from above modified with those settings:

# Auto-assign posts to category.
# You'll need to work out your own logic to determine strings/terms to match.
# Repeat this block as needed for each category you're creating.
INSERT IGNORE INTO wordpress.wp_term_relationships (object_id, term_taxonomy_id)
     SELECT DISTINCT p.ID AS object_id,
          (SELECT tt.term_taxonomy_id
          FROM wordpress.wp_term_taxonomy tt
          INNER JOIN wordpress.wp_terms t USING (term_id)
          WHERE t.slug = 'geekery'
          AND tt.taxonomy = 'category') AS term_taxonomy_id
     FROM wordpress.wp_posts p
     WHERE p.post_content LIKE '%Drupal setup%'
     OR p.ID IN (
          SELECT tr.object_id
          FROM wordpress.wp_term_taxonomy tt
          INNER JOIN wordpress.wp_terms t USING (term_id)
          INNER JOIN wordpress.wp_term_relationships tr USING (term_taxonomy_id)
          WHERE t.slug IN ('geekery')
          AND tt.taxonomy = 'post_tag'
     )
;

This will need to be repeated for each of the categories created in WordPress. 

WARNING: This is pretty hard to undo if you get the logic wrong – I ended up starting over twice until I realised you can easily test your logic by deleting the first line (starting INSERT) and running the SQL. This’ll show which posts will be affected.

Finally for the tags, the category counts need to be updated.

# Update category counts.
UPDATE wp_term_taxonomy tt
     SET `count` = (
          SELECT COUNT(tr.object_id)
          FROM wp_term_relationships tr
          WHERE tr.term_taxonomy_id = tt.term_taxonomy_id
     )
;

And breathe. The tags are probably the fiddliest bit of the whole transfer.

Creating Redirect for Old Posts to New URLs

WordPress and Drupal use different patterns for permalinks and part of the migration process will have altered the permalinks that WordPress uses to show the posts. Fortunately the node id from Drupal is also the post id in WordPress.

Redirects for Posts

Using a bit of SQL, we can build a list of redirects that can be added to the web server configuration. For search engine optimisation this is important otherwise all the work done getting your posts good profile on Google will start again from scratch.

First, make sure that no / are left in the URLs otherwise they won’t show up in WordPress at all. This should have been covered in the main import of posts above, but a few slipped through when I tried it.

# Double check all slashes are removed from permalinks
UPDATE wordpress.wp_posts
   SET post_name = REPLACE(post_name, ‘/’, ‘-’);

The SQL below builds the redirect statements which’ll then need to be added to the web server configuration:

# Create Apache redirects
SELECT CONCAT ("Redirect permanent /", dst, " /?p=", RIGHT(src, LENGTH(src)-5)) 	
	FROM `url_alias`
WHERE
	src REGEXP '^node'
	AND
	RIGHT(src, LENGTH(src)-5) IN (
		SELECT nid FROM node WHERE type IN ('story', 'blog')
	);

The Drupal content types can be updated or changed by altering line 8 in the code above.

Redirects for Tags

Tags in Drupal use – as a separator for spaces, WordPress uses _, so we’ll also need to create a bunch of redirects for the tag pages. This may need to be adjusted depending on the structure of the Drupal site.

In this instance:

  • old site was: /tags/tag-name
  • new site should be: /tag/tag_name

This SQL will do the magic:

# Create Apache redirects for tags
SELECT DISTINCT
	CONCAT("Redirect permanent /tags/",
		REPLACE(LOWER(d.name), ' ', '-'),
		" /tag/",
		REPLACE(LOWER(d.name), ' ', '_'))
   FROM drupal.term_data d
	   INNER JOIN drupal.term_hierarchy h
    	   USING(tid)
	   INNER JOIN drupal.term_node n
    	   USING(tid)
;

If you have different URL structures for your tag links, adjust lines 3 and 5 accordingly.

The lines generated from the two SQL statements above can be added to the top of the .htaccess file and bingo, the search engines should pick start picking up the new links and indexing them.

Job done…That wasn’t so bad, was it? OK, it’s pretty industrial but hopefully everything worked out OK. Now it’s time to add some plugins to make it work better!

Photo (cc) zman z28 on Flickr.

About 

Inquisitive. Hopeful. Jovial. Cantankerous. Digital marketer. Event organiser. Long-time fan of tech, collaboration and innovation. Exploring digital, social, business, technology, society, psychology & startups. Founder Chinwag, Digital Mission, Pitch NYC, ChinwagPsych. Former Exec Dir, Social Media Week London. More short stuff @toodlepip on Twitter.

  • facebook
  • linkedin
  • twitter
  • flickr
  • googleplus