Importing and Exporting Nodes, Taxonomy and CCK Fields

How to merge content from two Drupal sites into one site

I just battled through two days of moving content from two unique Drupal installations into one, this is my experience.

Call it what you want to, merging, copying, moving data from one drupal site to another — I've done it many times, and there are many methods which currently exist to accomplish this feat, one thing that they have in common (IMHO) is 1) It takes a freaking long time, 2) You need to document each step very carefully and 3) Trial and error is a necessity.

Methods for importing/exporting nodes and other content into a Drupal site

The method that I chose for my recent project was to work within the mysql databases using primarily phpmyadmin, for reasons that I will get to further down, if this method doesn't suit you, take a look at the Node Export module (http://drupal.org/project/node_export module).  You can use this module in conjunction with Node Import which allows you to map content from a CSV (Comma Separated Values) document into node fields.  This method works with fields created using the CCK module, Taxonomy and Location.  It probably works with many other contributed modules as well.

On a past project I used the Node Import module to populate a Drupal site with hundreds of pages of content from a previously hand coded project.  This method turned out to be incredibly tedious since I was dealing with content that was inconsistent, didn't validate and the Node Import module turned out to be a bit buggy.  My saving grace was carefully documenting each step in the process and a ridiculous amount of patience.

Another method worth exploring is using the FeedAPI module.  When used in conjunction with other modules it's possible to ingest data from a CSV. I'm currently using FeedAPI to import and update the vehicle inventory of a car dealership.  Because of the immensity of data it takes several cron runs scheduled each night to completely update the inventory — roughly 4-6 cars at a time, including pictures — but it works excellently.

UPDATE:  I've recently begun exploring the Deployment module.  It's promising, but I'm not sure if it's ready for prime-time as it's under heavy development.  What makes something like Deployment such a major project is the vast number of content module which would require individual support.

Copying Nodes, Taxonomy, Locations, etc. using MySQL

When I finally got up the courage to begin tackling this one I was convinced that it would be an enormous challenge, but after completion, I would have to say that it was a lot easier than using the methods described above.

My strength after years of working with the Drupal CMS is figuring out how to use the contributed modules to accomplish any given task as elegantly as possible, and built excellent sites in the process — my weak spot is programming, which is why I usually stick with methods of working with data in ways that I am familiar.  Having a lot of experience using the other methods described above I realized that for this project they were not feasible.  Here are a couple of reasons why:

  1. I was working with 800+ nodes, node import would potentially take many hours to batch import all of the data, and I would undoubtedly need to import the data more than once
  2. Taxonomy, an nasty nested hierarchy that doesn't work well on a flat spreadsheet
  3. Locations and CCK — yes, it's possible to import, but again, it's slow going
  4. I had to move more than one content type, which means additional steps for importing content using Node Import

 

But I also had one big advantage, which was that all of my node id's started somewhere around 15000 — far above that of the content in the second site.  Since this is the primary key in most tables, knowing that there would be not conflicts here was a huge relief.

Helpful tools that you are going to need

Terminal and Shell access to your server

If you're not familiar with using a terminal (the black and white screen where you type codes and it does stuff), I would highly suggest that you fine a buddy who can get you started, get yourself a server to test your skills on and start hacking away.

Using the terminal makes it very easy to execute those MySQL dumps that you can make in phpmyadmin and shows you where the errors are.

phpMyAdmin

This is an awesome tool that is most often pre-installed on your server.  It will allow you to import, export and modify your Drupal databases and tables.  You will need this tool to copy your nodes, taxonomy terms, etc, using the method I'm describing here. You can also easily copy an entire database to create a backup under the 'Operations' tab for a given database.

A Text Editor

Find and Find and Replace are super helpful when you need to modify one or two lines from your MySQL export.  You'll need the text editor to keep track of the errors you experience and the steps to merging your content as you go along.

Skip here to begin the process

If you just landed here from Google, you're probably just looking for the answer, this is the closest you're going to get to a quick/easy answer:

The Concept

What makes this method easy, is that you can quickly export the tables that contain the content you need and just as quickly import them into your second site.  What makes this method difficult is that you need to make sure that you'll have no duplicate PRIMARY KEYs when you import your tables.

Exporting and Importing MySQL exports using phpMyAdmin

Using phpMyAdmin, you can export and import data fairly quickly by using the "Export" and "Import" tabs.  When exporting data, you won't need to change the main settings, but you will want to select which tables to Export.

Exporting:

  1. From the drop-down on the left frame of phpMyAdmin select the database you wish to export
  2. In the right frame, click on the Export tab.
  3. Select the tables you want to Export
  4. Scroll to the bottom and check the 'Save as file' checkbox
  5. Choose your prefered form of compression (I like to gzip it)
  6. Click GO


Importing:

My prefered method of importing is to open mysql in the terminal and execute the file from there.  Using this method shows me real time the progress and errors that have occured.

mysql> \. /folder/folder/myexport.sql

Using phpMyAdmin you can import your tables using this method:

  1. Select your database from the drop-down in the left pane (make sure you've created a backup, never work on your original)
  2. From the tabs along the top, click on 'Import'
  3. Select your file and import it.

 

Changing the Primary Key for your Nodes

In phpMyAdmin, you can look at the table structure and see which rows contain the Primary Key. In the case of Nodes and Taxonomy these keys are integers in rows titled 'nid' and 'tid'.  The Node keys (nid) are used everywhere by other modules to associate other content with that module.  To avoid making duplicates you can increment your nid's by a number higher than the highest nid in your second site db.  If you're using other modules that enhance the data on your nodes such as location, cck, url_alias or taxonomy, to name a few, look for their associated tables and increment the 'nid' value by the same amount.

phpMyAdmin will allow you to run snippets of code.  Choose the table you want to update, then click the "Query" tab.  The snippet below will increase every 'nid' in the 'node' table by 1500.

UPDATE node SET nid = nid + 1500

Changing the Primary Key for your Taxonomy

The same approach for avoiding overlap can be applied to your Taxonomy tables.  Look for tables that contain the vocabulary id and the term id.  Make sure to increase them by a value higher than those in your second site db.

Changing the Primary Key for CCK Fields

If you look at the table structure for CCK Fields you'll notice that the Primary Key is the id you gave to the field when it was first created with "field" tacked on the front, such as "field_phone" or "field_sales_contact".  In my experience, if the none of the fields you are importing have matching id's to the CCK Fields in your second site, you're good to go.

If you do have matching Primary Keys here that's OK but you'll still need to be careful that you do not cause errors on import.

If you have Primary Key's that match but the field or entire content type is exactly the same, you may only need to import the matching table for the field. If you decide to change the id of the Primary field make sure that you change it on every table including content_type_[nodetype], content_node_field_instance, content_node_field, content_group, content_field_[field_id] and on the table heading of the fields table as well.

I would recommend that if you are moving content from a new content type, that you use the CCK import/export features before merging your content.  In my experience this works well.

If you've used the filefield or imagefield modules with CCK you'll also need to update the Primary Key in the 'files' table.  You can use something like this:

UPDATE files SET fid = fid + 15000

Once you've done that, you'll need to update the matching fid's in your matching CCK table with something like this:

UPDATE content_type_business SET field_catalog_photo_fid = field_catalog_photo_fid + 15000 WHERE field_catalog_photo_fid IS NOT NULL

What happens if I have duplicate Key's when I import the content?

If you are importing several tables at once via an sql import, the sql import will contain several queries.  If one of the queries fails due to an error that data is not imported and sql will move on to perform the next query.  If you know where you went wrong, you can go back, fix the appropriate table and export/import that particular table again.

Here's a list of the tables I needed to copy, it may be helpful to you:

  • content_field_company_brands
  • content_field_email_dir (I had duplicate keys here, so I added "dir" to the end)
  • content_field_phone_dir (Same here)
  • content_field_product_images
  • content_field_sales_contact
  • content_group (If you used CCK import/export you won't need to import this)
  • content_group_fields (or this)
  • content_node_field (or this)
  • content_node_field_instance (or this)
  • content_type_business (or this)
  • content_type_distributors (or this)
  • files (Yes, you'll most likely also have to fix your key's here too)
  • imagecache_action (There is currently no import/export function for image_cache)
  • imagecache_preset (Again, watch for duplicate keys)
  • location (I was using the location module)
  • location_instance
  • node
  • node_revisions (Don't skip this one, it contains data too)
  • node_type
  • term_data
  • term_hierarchy
  • term_node
  • term_relation
  • term_synonym
  • url_alias (You can skip this if you rebuild your alias later, but I had hundreds of aliases which I wanted to keep)
  • vocabulary
  • vocabulary_node_types
  • zipcodes

 

A Step By Step Process

  1. Update both sites so you have matching module versions.  Make sure to run update.php
  2. Copy your content types using CCK Import/Export
  3. Create a backup of db 2 (your final db)
  4. Create a duplicate of db 1 (you'll be making changes to this content so that it will merge with db 2 without errors)
  5. Guess which primary keys are going to overlap
  6. TEST: Export your content tables (nodes, cck, taxonomy, location, etc.) from db 1 and import them into db 2
  7. Take note of all your errors.  These are most likely your duplicate Primary Key's
  8. Go back and modify the Primary Keys in db 1 to avoid duplication errors
  9. TEST 2: Export your content tables from db 1 and import them into a fresh copy of db 2
  10. Take note of your errors, if none go test your site.  Make sure that all of your data has been imported correctly, otherwise jump back to step 8.


Stuff to watch out for:

  • If you haven't already, go ahead and install the Admin Menu module.  The drop-down menu has an option to 'clear all caches'.  Every time you make a change to the database you'll need to clear your caches. If you don't, you may not be able to see new content.
  • If you're working with huge files your import may fail, or be incomplete.  This is why I prefer to use the terminal, it shows me the import progress in real time, the errors that occur and it won't time out or fail due to memory limits.
  • Never make changes to your original database.  Alwasy create a backup, and just in case, export both databases and put them somewhere safe.
  • Make sure that every node has a 'uid', I experienced a bug in which content without a uid would not show.  The uid is the author of the node.  Update the revisions table too:

 

UPDATE node SET uid = 24 WHERE uid = 0