23 March 2017
Confluence Migration

As a fun discovery project we spent a little time prototyping a small application to migrate data from MediaWiki to our Confluence service. This article sheds some light on the details.

Atlassian in the digital world

Atlassian products have become well known in recent years. The digital world has turned into a fast-moving, volatile environment that takes adaptive tools to keep under control. Teams and individuals need faster communication, with features that can support the required information bandwidth. Atlassian addresses this with products specialized in project, code, team, event and account management, among others.

One of them is Confluence, which is built for team collaboration. Our team has been using Confluence for a long time on various projects with great success.

Data migrations

For any product carrying data it's important to provide an interface for importing external content and, for that matter, for exporting its own state. Confluence provides several ways to import content:

  • Between other Confluence instances

  • From files (Word document, XML, HTML)

  • Via its REST API

We took a quick look at the existing import solutions. Although there are a few, they are either very specific, meaning they import strictly from one type of system only, or obsolete. This was one of the main reasons that led us to write a flexible solution.

On the other side there was MediaWiki, a mature and widely popular content management system. Its API is fairly flexible, although it appears old and rigid in some cases. Content can be exported from its admin page as XML files and can be collected through its REST API as well.

The main issue is that the content consists of "wiki"-formatted pages. This means there are templates (reusable within MediaWiki), cross-links and media asset references, to mention just a few.

Organisation-wise, MediaWiki has the content as the main building block. It's grouped into a limited set of namespaces, although so-called "special" namespaces can be used as categories.

In Confluence there are spaces (the equivalent of namespaces) and labels for tagging content. Apart from these building blocks, Confluence also has rich content features, such as integrations with services and permissions.
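To make these elements concrete, here is a minimal sketch of how templates, cross-links and media references could be spotted in a page. It is written in Python purely for illustration and is not taken from the prototype. The regular expressions and the `scan_wikitext` helper are assumptions, and they only cover simple, non-nested markup.

```python
import re

# Hypothetical patterns: they match only simple, non-nested wiki markup.
TEMPLATE_RE = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\|[^{}]*)?\}\}")
LINK_RE = re.compile(r"\[\[\s*([^\[\]|]+?)\s*(?:\|[^\[\]]*)?\]\]")
# Link targets with these prefixes are treated as media assets.
MEDIA_PREFIXES = ("File:", "Image:", "Media:")


def scan_wikitext(text):
    """Return the templates, cross-links and media references found in text."""
    templates = TEMPLATE_RE.findall(text)
    links, media = [], []
    for target in LINK_RE.findall(text):
        if target.startswith(MEDIA_PREFIXES):
            media.append(target)
        else:
            links.append(target)
    return {"templates": templates, "links": links, "media": media}


sample = "{{Infobox|name=Demo}} See [[Main Page|home]] and [[File:Logo.png|thumb]]."
found = scan_wikitext(sample)
print(found)
```

Real pages nest templates inside each other and inside links, so a proper parser would likely be needed beyond a quick inventory like this one.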

Technicalities of our prototype

Since this was a quick session, we wanted a working prototype that we could customize as we liked, ship quickly and still build on a reasonably stable foundation. The idea was to have an app that can move content from a defined source to a defined destination without any restrictions. It needed to be configurable per migration, in case there is more than one.

We decided to go with PHP and well-known components as a base, installed via Composer, the PHP dependency manager. The core is the Console component from Symfony, since the app operates only on the command line. For persistence we use SQLite. Configuration happens via YAML (also supported by a Symfony component).

On the architecture side, most parts of the prototype are pure interfaces for easy extensibility. These are tied together by the migration engine, which includes a configuration interpreter and a migration mapper.
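As an illustration of what the SQLite persistence could look like, the sketch below records which source pages have already been migrated, so that a run could be resumed. This is an assumption about how such a store might be used, not a description of the prototype. The table name, column names and helper functions are hypothetical, and the code is Python rather than PHP to keep it short.

```python
import sqlite3


def open_state(path=":memory:"):
    """Open (or create) the hypothetical migration state database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS migrated_pages ("
        " source_id TEXT PRIMARY KEY,"
        " target_id TEXT NOT NULL)"
    )
    return conn


def mark_migrated(conn, source_id, target_id):
    """Remember that a source page now exists at the destination."""
    conn.execute(
        "INSERT OR REPLACE INTO migrated_pages (source_id, target_id) VALUES (?, ?)",
        (source_id, target_id),
    )
    conn.commit()


def is_migrated(conn, source_id):
    """Return True if the source page was already migrated."""
    row = conn.execute(
        "SELECT 1 FROM migrated_pages WHERE source_id = ?", (source_id,)
    ).fetchone()
    return row is not None


state = open_state()  # in-memory database, for demonstration only
mark_migrated(state, "Main_Page", "12345")
print(is_migrated(state, "Main_Page"), is_migrated(state, "Other_Page"))
```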
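The interface-driven layout can be sketched roughly as follows. This is an illustrative outline in Python, not the prototype's PHP code. The names `Source`, `Mapper`, `Destination` and `MigrationEngine`, as well as the in-memory stand-ins, are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class Source(ABC):
    @abstractmethod
    def pages(self):
        """Yield (title, body) tuples from the source system."""


class Mapper(ABC):
    @abstractmethod
    def convert(self, title, body):
        """Return the (title, body) pair in the destination's format."""


class Destination(ABC):
    @abstractmethod
    def store(self, title, body):
        """Write one page to the destination system."""


class MigrationEngine:
    """Ties the three interfaces together; knows nothing about concrete systems."""

    def __init__(self, source, mapper, destination):
        self.source = source
        self.mapper = mapper
        self.destination = destination

    def run(self):
        count = 0
        for title, body in self.source.pages():
            new_title, new_body = self.mapper.convert(title, body)
            self.destination.store(new_title, new_body)
            count += 1
        return count


# In-memory stand-ins, for demonstration only.
class ListSource(Source):
    def __init__(self, items):
        self.items = items

    def pages(self):
        return iter(self.items)


class TitleMapper(Mapper):
    def convert(self, title, body):
        # Replace underscores in titles with spaces; the body is left untouched.
        return title.replace("_", " "), body


class DictDestination(Destination):
    def __init__(self):
        self.stored = {}

    def store(self, title, body):
        self.stored[title] = body


destination = DictDestination()
engine = MigrationEngine(ListSource([("Main_Page", "Hello")]), TitleMapper(), destination)
migrated = engine.run()
print(migrated, destination.stored)
```

With this shape, supporting another source or destination would mean adding one more implementation of the relevant interface, while the engine stays unchanged.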

Conclusion

A few hours of work yielded a working initial migration, from scratch. Using the REST APIs on both sides was seamless.

Taking care of the wiki format, on the other hand, is still a challenge. Each template needs to be handled separately. References also require special care.

It occurred to us that the configuration also needs to hold a mapping from MediaWiki namespaces to Confluence spaces and labels.

It would be interesting to see whether existing solutions can be helpful in this use case, such as the (seemingly obsolete) Universal Wiki Converter for Confluence. However, that's for another day.
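Such a mapping could look roughly like the sketch below. The structure is shown as the Python dictionary a YAML file might be parsed into. The namespace names, space keys, labels and the `resolve_target` helper are all hypothetical, as is the idea of a fallback target for unmapped namespaces.

```python
# Hypothetical mapping from MediaWiki namespaces to Confluence spaces and labels.
NAMESPACE_MAP = {
    "Main": {"space": "DOCS", "labels": []},
    "Help": {"space": "DOCS", "labels": ["help"]},
    "Category:Projects": {"space": "PROJ", "labels": ["projects"]},
}

# Assumed fallback for namespaces that the configuration does not mention.
DEFAULT_TARGET = {"space": "ARCHIVE", "labels": ["unmapped"]}


def resolve_target(namespace):
    """Return the (space key, labels) pair a page from this namespace maps to."""
    target = NAMESPACE_MAP.get(namespace, DEFAULT_TARGET)
    return target["space"], list(target["labels"])


print(resolve_target("Help"))
print(resolve_target("Talk"))
```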