Building this site / Migrating from WordPress to Next.js

May 7, 2021

For some time now I've wanted to move my site away from WordPress, which I've been using for as long as I can remember, and into something less clunky and more modern. I had heard about the wonders of static site generators and thought they might be fun to explore. I was also intrigued by the idea of getting away from a CMS and writing in markdown directly from a project repository—I'm already used to private writing/journaling in plain text, and love speed and portability.

I didn't spend too long thinking about which particular technology to use; I've been learning React and Next.js lately, so that's what made the most sense to me. I did try Gatsby a few months earlier, toying around with many of the available starters, but at the time I didn't yet feel comfortable enough with the whole thing and promptly lost interest. But as it happens, as someone intimidated by modern web development, I find Next.js very intuitive and a pleasure to work with.

Here I'll outline some of my process to get this new site up and running with basic functionality. This is not meant to be a comprehensive technical document nor a tutorial—just notes about things I find most interesting, and to help me remember what I'm doing.

Converting WordPress posts to markdown

Going from WordPress to markdown seems a tried and true path nowadays, and there are many available tools that convert exported WordPress XML. A quick search yielded me the aptly named wordpress-export-to-markdown, which was perfect for my purposes. I simply cloned the repo, dropped the XML file into the same folder, and ran the script locally.

I chose to have the resulting .md files organized into folders by year, and each file name prefixed by the post's date (YYYY-MM-DD). The script also saves all images into a folder. I had a relatively small number of posts in total (about 40 or so), spanning only 2 years, and very few images, so everything worked swimmingly. In the end I had a file structure like this:

- 2020
  - 2020-MM-DD-file-name.md
  ...
- 2021
  - images
    - an-image.jpg
    ...
  - 2021-MM-DD-another-file.md
  ...

I even went into some of the files to manually edit tags, rename files, and whatnot—the speed and ease with which I was able to do this in my code editor was a breath of fresh air.

Creating a blog with Next.js

From there, I did the standard npx create-next-app, suffixing it with the name of my project, cerdenia.com. I also installed React Bootstrap to, well, bootstrap my UI as I'm already familiar with it and don't have much patience for styling components anyhow. With the foundations in place, I looked for a tutorial to help me set up a simple blog: I used an easy tutorial by one Sagar, with which I ended up with a home page that lists the titles of all posts; clicking on any of them takes you to a new page that shows the content of that post. Check out the tutorial to see what I mean.

Even with a pretty modest result, this way I was able to learn a lot about how Next.js works—in particular, working with the file system to reference markdown files (which will eventually be rendered into posts), and assigning them routes dynamically.

Two main takeaways:

  1. Firstly, with regard to the home page: the index page component is a function, IndexPage(props), which needs props to render. A method called getStaticProps is called at build time to retrieve the data that the component needs: in this case, a list of posts. Within that method, we specify the path within our root directory from which to read our markdown files. Each file is read, and the data (title, tags, content, etc.) is bundled into a return object with the property props. Here the name is important, as it tells Next.js that the value should be passed on to IndexPage.

  2. Secondly, the blueprint for an individual blog post page is named [slug].js. The brackets are especially important, as they represent a dynamic route, which we need assigned to each file so Next.js knows how to generate pages for all of them. This is all done with another method, getStaticPaths, which generates paths for all our blog posts at build time. Basically the method reads the file system and outputs an object into which all our markdown files are mapped as their "slugs." The resulting data is passed into getStaticProps, which then outputs a props object for our component BlogPostPage.

I'm not getting too deep into all of the above, as it's all in the tutorial anyway. Suffice it to say, it's not as complicated as it sounds. The post page's getStaticProps is highly similar to the index page's: it also involves going into the file system and extracting data, this time from a single file. The difference is that the resulting data is passed into a markdown parser. In the end we get a return object with the markdown file's content converted to an HTML string, ready to be rendered as a web page.
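
To make the hand-off between the two methods concrete, here is a minimal sketch of the shapes involved for a flat folder. It is not the tutorial's code: buildStaticPaths and propsForSlug are hypothetical names, and the file list is passed in as plain data so the example runs without Next.js or a real content folder.

```javascript
// Hypothetical helper: turn file names into the object that getStaticPaths returns.
function buildStaticPaths(fileNames) {
  return {
    paths: fileNames
      .filter((fn) => fn.endsWith('.md'))
      .map((fn) => ({ params: { slug: fn.replace(/\.md$/, '') } })),
    fallback: false, // any slug not listed here gets a 404
  };
}

// Hypothetical helper: what getStaticProps does with one entry from `paths`.
// Next.js calls getStaticProps once per path, with the params on its context.
function propsForSlug(context, fileNames) {
  const fileName = fileNames.find((fn) => fn === `${context.params.slug}.md`);
  return { props: { slug: context.params.slug, fileName } };
}

const files = ['hello-world.md', 'second-post.md', 'notes.txt'];
const staticPaths = buildStaticPaths(files);
const firstProps = propsForSlug(staticPaths.paths[0], files);
```

Each entry in paths carries a params object whose key matches the name in the brackets of [slug].js, which is how the slug finds its way back into getStaticProps.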

Problems to solve

All of the above gives us a perfectly serviceable if bare-bones site, but for real-world usage, one quickly realizes its limitations and several immediate areas for expansion. Without getting into adding non-essential features yet, these are some of the issues I found myself having to address straight away:

  • The above only works for a flat contents folder, that is, all markdown files are just dumped in there with no further organization of any kind. But my markdown files, as I've said, are organized by year—otherwise you can imagine how quickly it would become a headache.

  • All is fine if we want any given post's slug (or path) to be exactly its file name (in my case, with the publication date prefixed). But I don't want dates in my URLs. For example, a file named 2021-05-07-my-post.md should be routed to my-site.com/my-post.

  • Rather than listing all my posts in the home page, I'd like it to show only the most recent five posts or so, and have a separate blog page that lists all posts.

  • Drafts or posts in progress shouldn't be pulled for rendering.

  • Shared functionality between methods in different pages can be abstracted and modularized to make them more reusable.

Retrieving posts from the file system

The first thing I did was to abstract and refactor all the methods that had to do with retrieving and parsing markdown files into posts into a helpers/posts.js file. This way all the methods that have to do with getting post data are in one place and reusable by different components. Here's what the top of the file looks like:

import fs from 'fs';
import matter from 'gray-matter';

const options = { encoding: 'utf-8' };
const pathToPosts = `${process.cwd()}/content/posts`;

// get post folders arranged by year
const years = fs
  .readdirSync(`${pathToPosts}`, options)
  .filter((dir) => !dir.startsWith('.'));

// get list of file references as '[year]/[date]-[slug].md'
export function getMarkdownPaths() {
  const filePaths = [];
  years.forEach((year) => {
    const filesByYear = fs.readdirSync(`${pathToPosts}/${year}`, options);
    filesByYear.forEach((fn) => filePaths.push(`${year}/${fn}`));
  });

  return filePaths.filter((fn) => fn.endsWith('.md'));
}

Right away I create a reference to the specific path or folder in which all our post markdown files are located, pathToPosts, so I don't have to keep writing it over and over again. Then I read each folder in that location, collectively storing them in an array called years since my folders are arranged by year, while ignoring any hidden files (which start with a period).

The function getMarkdownPaths is meant to get us a list of references to all files in all subfolders, each including its year folder. So a file named 2021-05-07-my-post.md would be returned as 2021/2021-05-07-my-post.md. The year folder prefix is arguably not necessary, as the file name itself includes a reference to the year anyway, but I don't want to depend on the file names themselves for locating my files within subfolders. This will come in handy if I decide to change my file naming convention and remove the dates, which I probably will at some point, as frankly I find them a little ugly.
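
As a way to try this out in isolation, here is a sketch of a variant that takes the root folder as an argument and runs against a throwaway folder tree. The name listMarkdownPaths and the sample file names are made up for illustration. The explicit sort is my own addition, on the assumption that the order of a directory listing is not something to rely on.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Variant of getMarkdownPaths that takes the root folder as an argument
// (an assumption for this sketch; the original reads a module-level constant).
function listMarkdownPaths(root) {
  return fs
    .readdirSync(root, { withFileTypes: true })
    .filter((entry) => entry.isDirectory() && !entry.name.startsWith('.'))
    .flatMap((dir) =>
      fs
        .readdirSync(path.join(root, dir.name))
        .filter((fn) => fn.endsWith('.md'))
        .map((fn) => `${dir.name}/${fn}`)
    )
    .sort(); // explicit sort: don't rely on directory listing order
}

// Build a throwaway folder tree to try it on.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'posts-'));
fs.mkdirSync(path.join(root, '2021', 'images'), { recursive: true });
fs.mkdirSync(path.join(root, '2020'));
fs.writeFileSync(path.join(root, '2021', '2021-05-07-my-post.md'), '');
fs.writeFileSync(path.join(root, '2020', '2020-11-02-older-post.md'), '');
fs.writeFileSync(path.join(root, '2021', 'images', 'an-image.jpg'), '');

const markdownPaths = listMarkdownPaths(root);
```

The images folder and its contents drop out because only entries ending in .md are kept.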

Getting post metadata

Then we have the methods for extracting data from posts, both individually and collectively:

export function getPostsMetadata() {
  return getMarkdownPaths()
    .map((fn) => getPostMetadata(fn))
    .filter((post) => !post.isDraft); // return only public posts
}

function getPostMetadata(filePath) {
  const path = `${pathToPosts}/${filePath}`;
  const rawContent = fs.readFileSync(path, options);
  const { data } = matter(rawContent); // front matter only; the body is ignored here

  // Missing fields fall back to null rather than undefined, since Next.js
  // cannot serialize undefined values returned from getStaticProps.
  return {
    title: data.title,
    date: data.date,
    tags: data.tags || null,
    isDraft: data.isDraft || null,
    slug: slugify(filePath),
  };
}

export function slugify(fn) {
  // remove the 16-character 'YYYY/YYYY-MM-DD-' prefix and the file type suffix
  return fn.slice(16).replace('.md', '');
}

The function getPostsMetadata is meant to return to us all our posts with only metadata (i.e., not the content) included—this would be useful to any component that wishes to display a list of posts by title, date, etc. I abstracted the code that actually extracts relevant data from each post into a getPostMetadata function in case I need it in the future. Anyhow, there are a couple of things going on here:

  1. I included an isDraft property to determine whether a post is public or not. If a given markdown file has no such data, then it is simply treated as a public post. But when writing any new posts in markdown, I include isDraft: true in the front matter at the top of the file to exclude it from getPostsMetadata.

  2. The slug property is simply the name of the markdown file itself without the date. I created a slugify function to make things easy.
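
Since slice(16) assumes every path starts with a year folder and a date, here is a sketch of a slugify that does not depend on a fixed prefix length. This is one possible alternative, not what the site currently uses, and the name slugifyPath is hypothetical.

```javascript
// Sketch of a slugify that does not depend on a fixed prefix length.
// It works on 'YYYY/YYYY-MM-DD-name.md' as well as on a bare 'name.md'.
function slugifyPath(filePath) {
  const fileName = filePath.split('/').pop(); // drop any folder prefix
  return fileName
    .replace(/\.md$/, '') // strip the extension only at the end
    .replace(/^\d{4}-\d{2}-\d{2}-/, ''); // strip a leading date, if present
}

const withDate = slugifyPath('2021/2021-05-07-my-post.md');
const withoutDate = slugifyPath('2021/my-post.md');
```

Both calls give the same slug, so a version like this would keep working if the date prefixes were ever dropped from the file names.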

Parsing markdown content

The final method within helpers/posts.js is for actually parsing a markdown file's contents. Abstracting this is useful for when we need to parse any markdown content anywhere else in the site that isn't a post, such as special pages. In the future I should probably put this function in a file of its own, but for now it's fine where it is.

export async function parseMarkdown(path) {
  const html = require("remark-html");
  const highlight = require("remark-highlight.js");
  const unified = require("unified");
  const markdown = require("remark-parse");
  
  const rawContent = fs.readFileSync(path, options);
  const { data, content } = matter(rawContent); 

  const result = await unified()
    .use(markdown)
    .use(highlight) // highlight code block
    .use(html)
    .process(content); // pass content to process

  return {
    ...data,
    content: result.toString()
  } 
}

In action!

From here, getting the needed data to our site components is greatly simplified. In our home page index.js, for example, in which I want a list of only the most recent posts:

export default function Home(props) {
  // Do whatever with props
  ...
}

// props is passed to Home component
export async function getStaticProps() {
  // get first 5 posts reverse chronologically
  const posts = getPostsMetadata().reverse().slice(0, 5);
  return {
    props: { posts },
  };
}
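
The reverse() above depends on the files having been read in chronological order. A sketch of a version that sorts on the date field instead, assuming each post's date is a string or Date that new Date() can parse (latestPosts is a hypothetical helper, not part of helpers/posts.js):

```javascript
// Hypothetical helper: newest-first, limited to `count` posts.
function latestPosts(posts, count) {
  return [...posts] // copy so the caller's array is not reordered
    .sort((a, b) => new Date(b.date) - new Date(a.date))
    .slice(0, count);
}

const sample = [
  { title: 'Older', date: '2020-11-02' },
  { title: 'Newest', date: '2021-05-07' },
  { title: 'Middle', date: '2021-01-15' },
];
const latestTitles = latestPosts(sample, 2).map((post) => post.title);
```

In getStaticProps this would be called as latestPosts(getPostsMetadata(), 5).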

And in an individual post page, [slug].js, this is how things look:

export default function BlogPostPage(props) {
  // Do whatever with props
  ...
}

// pass props to BlogPostPage component
export async function getStaticProps(context) {
  const slug = context.params.slug;
  const files = getMarkdownPaths();
  const i = files.findIndex((fn) => slugify(fn) === slug);
  const path = `${process.cwd()}/content/posts/${files[i]}`;
  const post = await parseMarkdown(path);

  const nextSlug = (i + 1 < files.length) ? slugify(files[i + 1]) : null;
  const prevSlug = (i - 1 >= 0) ? slugify(files[i - 1]) : null;

  return {
    props: { post, nextSlug, prevSlug }
  }
}

// generate HTML paths at build time
export async function getStaticPaths(context) {
  const files = getMarkdownPaths();
  return {
    paths: files.map((fn) => {
      return {
        params: { slug: slugify(fn) }
      }
    }),
    fallback: false
  }
}

More things going on here: first, getStaticPaths gets us a list of all our markdown files as slugs, which Next.js uses to route our posts dynamically. For each post, getStaticProps obtains a reference to the post's slug through context.params. Then I use that slug to go into the file system again to find that file's specific path, passing it into our parseMarkdown method, which handles the rest. The parsed markdown is passed into the component's props.

Another thing to note here is that I included the properties nextSlug and prevSlug as well, which get passed into our component. This is so the component has a reference to the next and previous posts chronologically. This makes navigation between posts smoother, rather than having to go back to a master list of posts each time after reading. There's a problem, though, which I haven't solved yet: drafts of posts still get included in all this, even though they don't show up in said master list. That's an issue for another time.
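
One possible way to keep drafts out of the next/previous links is to filter them before looking up neighbors. This is a sketch rather than a fix I've actually applied. It assumes a list of post metadata in chronological order, each with slug and isDraft, and findNeighbors is a hypothetical name.

```javascript
// Hypothetical helper: find the previous and next public posts for a slug.
// `posts` is assumed to be in chronological order, each with { slug, isDraft }.
function findNeighbors(posts, slug) {
  const publicPosts = posts.filter((post) => !post.isDraft);
  const i = publicPosts.findIndex((post) => post.slug === slug);
  if (i === -1) return { prevSlug: null, nextSlug: null };
  return {
    prevSlug: i > 0 ? publicPosts[i - 1].slug : null,
    nextSlug: i + 1 < publicPosts.length ? publicPosts[i + 1].slug : null,
  };
}

const allPosts = [
  { slug: 'first', isDraft: null },
  { slug: 'unfinished', isDraft: true },
  { slug: 'second', isDraft: null },
];
const neighbors = findNeighbors(allPosts, 'first');
```

Here the draft in the middle is skipped, so 'first' links straight to 'second'. getStaticPaths would need the same filter if drafts shouldn't be built as pages at all.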

Deployment, and a fork in the road: The question of keeping content private

For deployment, I went with Vercel, which is incredibly easy. All updates are simply pushed to a remote repository on GitHub, and Vercel handles the rest.

I did come across a significant issue at this point: at first I had the entire project in a public GitHub repository, all code and posts included, which I wasn't totally comfortable with. I considered separating my posts into a Git submodule, which is a whole different beast that I don't really intend to tackle here. The short of it is that all content would exist in a private repo of its own, to be simply referenced by the parent repo. While it sounds great in theory, it includes the extra step of having to pull post updates from the private repo each time the content changes, since a submodule should never be worked on from its parent repo, but only directly in its own repo.

I tried the above briefly but did not appreciate the workflow, and ran into many annoying issues anyhow when pulling updates into my parent repo. Besides that, there was the even bigger problem of Vercel not being able to fetch content indirectly from a private submodule. There are various workarounds that apparently exist, but all things considered it all seemed more trouble for me than it's worth.

Another thing I briefly considered was using a headless CMS for writing and managing posts. Since I already did the work of abstracting all the methods for fetching posts, it would be easy to swap them with methods referring to an external API. I tried Ghost, which is fast and attractive, but in the end did not feel like having to write using a web-based interface.

Ultimately I settled on the simplest, most obvious solution: keeping the whole remote repo private.

Concluding thoughts... for now

Anyhow, these are some of the things that went into my thought process as I went about building this site in the last few days. Obviously it's still rather basic, and I haven't put too much thought yet into the visual side of things, but I'm having a lot of fun learning Next.js and having full control over my site's functionality. In the near future I plan to work on enabling tags (right now they appear, but link to nothing yet), as well as RSS. I'm pretty sure this will become another series of posts as I develop this site further and deal with new issues as they come up.


Comments are not enabled (yet?). Please email me if you see anything that interests you.