Over the weekend I migrated my blog from a Domino-based blog running v3.0.2 of the BlogSphere template to WordPress. This is the first of a few posts about the migration process; this one covers the tools I used to make the move.
The process I am describing worked well for me, but your mileage may vary. The tools being made available (see the bottom of the post for download details) are provided as is, with no guarantees.
The biggest challenge was getting the data out of the BlogSphere template and into a format that WordPress could read and import. Matt White was kind enough to share an agent he wrote to export blog posts and comments into an XML file. After importing the agent into the template, run Export to WP from the action menu and watch the status bar at the bottom for progress.
This will generate an XML file. By default the agent dumps the file into the root of C:\; if you want it somewhere else, you will have to change the path in the agent.
The next step is to fix the formatting of the file. The exported XML is valid, but WordPress is a little picky about how it wants to see it. The easiest way to do this is to open the file in Eclipse and press CTRL-SHIFT-F; depending on the size of your file, it could take a few minutes to format.
The next step is to fix the header. The exported file header will look like this
You need to replace it with the header in header.xml (available for download, see the end of the post) so that WordPress can import the file.
At this point the file is technically ready for import into WordPress, but there are a couple of other issues you may want to address.
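Before going further, it can be worth confirming that the edited file still parses as XML after the header swap. The following is a minimal sketch of such a check in Python, not part of the tools shared in this post; the function name check_well_formed is made up for illustration. It only tests that the XML is well-formed, which does not guarantee that WordPress will accept the file.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_data):
    """Return (True, "ok") if xml_data parses as XML, else (False, reason).

    xml_data may be a str or bytes. This is a well-formedness check only;
    it knows nothing about what the WordPress importer expects.
    """
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as err:
        # The parser message includes the line and column of the problem.
        return False, str(err)
    return True, "ok"
```

To check a file, read it in binary mode and pass the bytes to the function, for example check_well_formed(open(path, "rb").read()), where path is wherever you saved the export.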
The first issue is images in blog posts. One of the nice features of blogging in Notes is the ability to simply drag and drop an image into a post. These images are all exported as HTML, but they point to your domain, so if you import them as is and then change your domain to point to your new blog, your images will all be broken. To solve this, before importing I replaced “img src="http://www.curiousmitch.com/” with “img src="http://www.oldblog.curiousmitch.com/”. For the time being I have left my old blog running, resolved to http://oldblog.curiousmitch.com, so the images continue to work. If I ever decide to remove the old blog completely I will have to revisit this, but for now it works.
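Any editor's find and replace will do for this step, but the same replacement can be scripted. Here is a minimal Python sketch of the idea; the function name repoint_images is made up for illustration, and it assumes the image tags in the export are written exactly as img src="..." with a straight double quote.

```python
def repoint_images(xml_text, old_base, new_base):
    """Point image URLs that start with old_base at new_base instead.

    Only text of the form img src="<old_base>... is changed, so ordinary
    links back to the blog are left alone.
    """
    old = 'img src="' + old_base
    new = 'img src="' + new_base
    return xml_text.replace(old, new)
```

Matching on the img src=" prefix rather than on the bare domain is deliberate: links in your posts keep pointing at the main domain, and only the embedded images are redirected to the old blog.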
The second issue is that the formatting of the exported posts is not perfect. For some reason carriage returns and line feeds are inserted at random points, and when imported into WordPress they were translated into line breaks (“<br>”). This meant two things: 1. some posts are not formatted correctly when imported, a minor issue that you could fix manually in the posts that really bother you; 2. some of the “<br>” tags were inserted in the middle of HTML elements, such as images, causing the images not to render and the RSS feed to be invalid. This was a tricky one, as you cannot simply remove all of the carriage returns from the file: some of them are needed for the XML to remain valid. A colleague of mine was kind enough to write, and allow me to share, a small utility that removes all of the carriage returns and line feeds from the post sections of the XML only. The result is valid XML and cleaner imported posts. The utility is named ExportXMLCleaner.exe and is simply run from a command line
which will produce a new file named output.xml. This is the file you want to import into WordPress. If it is not already included in your WordPress install, you will need the WordPress Importer plugin to import the file.
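If you cannot run a Windows executable, the same idea can be sketched in a few lines of Python. This is not the code of ExportXMLCleaner.exe, just an illustration of the approach. It assumes the post bodies sit inside <content:encoded> elements, as they do in WordPress's own export format; check your export file and adjust the pattern if the agent uses a different element. The function name strip_newlines_in_posts is made up for illustration.

```python
import re

# Assumption: each post body is wrapped in <content:encoded>...</content:encoded>.
POST_BODY = re.compile(r"(<content:encoded>)(.*?)(</content:encoded>)", re.DOTALL)


def strip_newlines_in_posts(xml_text):
    """Remove CR and LF characters inside post bodies only.

    Line breaks elsewhere in the file are left untouched, so the overall
    layout of the XML stays as it was.
    """
    def clean(match):
        body = match.group(2).replace("\r", "").replace("\n", "")
        return match.group(1) + body + match.group(3)

    return POST_BODY.sub(clean, xml_text)
```

Note that deleting line breaks outright will join two words if a break was the only thing separating them; replacing each break with a single space instead is a reasonable variation if you see that in your posts.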
exporttowordpress.lss – Export to WP LotusScript agent (shared with Matt’s permission)
ExportXMLCleaner.exe – Utility to clean CR/LF from export file as described above
header.xml – header to use in the export file