How to Make Blog Post Management Effortless with Python

Introduction

Hello everyone.

In this article, I will show you how to create an all-in-one blog manager system. We will use Python, a high-level dynamically typed language, and GitHub Actions, a continuous integration system developed by GitHub.

Setting the Stage

Imagine having to publish to multiple platforms. Sure, posting to the first is simple.

But, with some experience, I can say that the "Import post" features for these platforms aren't perfect.

Yet, if you persist, you will find yourself spending lots of time reformatting blocks of text, using some service to upload blocks of code (for example), and maybe even having to re-upload every single image.

Now do you see the problem?

So what better way to optimise your publishing than to create a GitHub-hosted CMS? Even better when automated by a Python-powered GitHub Actions script!

Understanding GitHub Actions

The essence of GitHub Actions is best explained in the official docs. There you will become familiar with workflows, actions, and runners, all of which are essential for our system to function.

System Overview

A user’s blog will be a GitHub repository with a workflow that makes use of our blog publishing action. The repo will contain a folder for each of the user’s posts. Each post folder will contain the article itself, as article.md, along with any images.

We will upload our images through the ImgBB API and use the resulting URLs on both publishing platforms, Hashnode and Medium.

A post’s cover image can also be included; it will be passed automatically to each publishing platform’s API.

But where does Python fit into all of this? Well, because Python isn’t officially supported as a language to create an action, we must use a composite action to run a Python script.

This does mean that there are no toolkit packages like those available for JavaScript, so we must communicate with the GitHub API directly from the Python code.

Preparing the Action Folder

Let’s start by creating a new Python project using Poetry. Poetry is a dependency management system for Python, similar to npm for Node.js:

poetry new blog-manager-action

This should create a folder named blog-manager-action. Within it will be a folder named blog_manager_action, which is where the Python code will live.

Now we can install all the required dependencies:

poetry add pygithub requests python-dotenv python-frontmatter

Building the Action

Our blog publisher action will require the platform integration tokens & API keys.

Now, we can begin to define the action file:

name: "Blog Manager Action"
description: "A GitHub Action that uploads articles to your blogging platform"
inputs:
  medium_integration_token:
    description: "Medium's Integration Token. Token can be retrieved at medium.com/me/settings/security, under 'Integration tokens'"
    required: true
  hashnode_integration_token:
    description: "Hashnode's Integration Token. Token can be retrieved at hashnode.com/settings/developer"
    required: true
  hashnode_hostname:
    description: "Hostname for Hashnode blog. e.g. cs310.hashnode.dev"
    required: true
  hashnode_publication_id:
    description: "Publication ID for Hashnode blog. Appears in your Blog Dashboard page URL"
    required: true
  github_token:
    description: "A GitHub PAT"
    required: true
  imgbb_api_key:
    description: "API Key for Imgbb CDN"
    required: true
outputs:
  hashnode_url:
    description: "URL of the Hashnode Post"
    value: ${{ steps.run-script.outputs.hashnode_url }}
  medium_url:
    description: "URL of the Medium Post"
    value: ${{ steps.run-script.outputs.medium_url }}
runs:
  using: "composite"
  steps:
    ...

Preventing Catastrophe

But before we go into the specific steps, I need to address an issue.

The action is meant to run when the repository has been pushed to the remote. But, the system must acknowledge that the user has published a post by updating the frontmatter. And, updating the frontmatter will lead to another push to the remote.

So, to prevent endless triggering of the action, we can stop the action from running if we know it was the action itself that made the push.

And how do we do that? We can skip the Python script whenever the commit message that triggered the push is CI_COMMIT_MESSAGE, the message the action uses for its own commits.

- name: Configures Auto-Commit message in environment
  run: echo "CI_COMMIT_MESSAGE=Apply output IDs to article files" >> $GITHUB_ENV
  shell: bash

- name: Set environment variable "is-auto-commit"
  if: github.event.commits[0].message == env.CI_COMMIT_MESSAGE
  run: echo "is-auto-commit=true" >> $GITHUB_ENV
  shell: bash

- name: Display Github event variable "github.event.commits[0].message"
  run: echo "last commit message = ${{ github.event.commits[0].message }}"
  shell: bash

- name: Display environment variable "is-auto-commit"
  run: echo "is-auto-commit=${{ env.is-auto-commit }}"
  shell: bash
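
The same guard could also live inside the Python script. The sketch below is an assumption rather than part of the action above: it reads the push event payload that GitHub Actions exposes through the GITHUB_EVENT_PATH file and inspects the head commit (the YAML steps check the first commit in the push instead). The helper names are hypothetical.

```python
import json
import os

# Must match the message the action uses for its own commits
CI_COMMIT_MESSAGE = "Apply output IDs to article files"


def is_auto_commit(event, marker=CI_COMMIT_MESSAGE):
    """Return True if the push's head commit carries the auto-commit message."""
    head_commit = event.get("head_commit") or {}
    return head_commit.get("message") == marker


def load_event(path=None):
    """Load the webhook payload that GitHub Actions stores at GITHUB_EVENT_PATH."""
    path = path or os.environ["GITHUB_EVENT_PATH"]
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

The script could then exit early with something like `if is_auto_commit(load_event()): sys.exit(0)`.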

Now we can install Python, Poetry, and the dependencies, and set up the Python project:

- name: Install Python
  id: setup-python
  uses: actions/setup-python@v4
  with:
    python-version: "3.9"
- name: Install Poetry
  uses: snok/install-poetry@v1.3.3
  with:
    virtualenvs-create: true
    virtualenvs-in-project: true
    installer-parallel: true
- name: Load cached venv
  id: cached-poetry-dependencies
  uses: actions/cache@v3
  with:
    path: .venv
    key: venv-${{ runner.os }}-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/poetry.lock') }}
- name: Install dependencies
  if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
  run: poetry install --no-interaction --no-root
  shell: bash
- name: Install project
  run: poetry install --no-interaction
  shell: bash

Another drawback of running Python through a composite action is that the action’s inputs are not exposed to the script automatically. So, we write them to the environment ourselves:

- name: Pass Inputs to Shell
  run: |
    echo "MEDIUM_INTEGRATION_TOKEN=${{ inputs.medium_integration_token }}" >> $GITHUB_ENV
    echo "HASHNODE_INTEGRATION_TOKEN=${{ inputs.hashnode_integration_token }}" >> $GITHUB_ENV
    echo "HASHNODE_HOSTNAME=${{ inputs.hashnode_hostname }}" >> $GITHUB_ENV
    echo "HASHNODE_PUBLICATION_ID=${{ inputs.hashnode_publication_id }}" >> $GITHUB_ENV
    echo "IMGBB_API_KEY=${{ inputs.imgbb_api_key }}" >> $GITHUB_ENV
    echo "GITHUB_TOKEN=${{ inputs.github_token }}" >> $GITHUB_ENV
    echo "GITHUB_REPOSITORY=${{ github.repository }}" >> $GITHUB_ENV
    echo "GITHUB_SHA=${{ github.sha }}" >> $GITHUB_ENV
  shell: bash
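
On the Python side, it may help to fail fast when one of these variables is missing. Here is a minimal sketch, assuming the variable names from the step above; `load_settings` is a hypothetical helper, not part of the original code.

```python
import os

# Names written to $GITHUB_ENV by the "Pass Inputs to Shell" step
REQUIRED_VARIABLES = [
    "MEDIUM_INTEGRATION_TOKEN",
    "HASHNODE_INTEGRATION_TOKEN",
    "HASHNODE_HOSTNAME",
    "HASHNODE_PUBLICATION_ID",
    "IMGBB_API_KEY",
    "GITHUB_TOKEN",
]


def load_settings(env=os.environ):
    """Return the required settings, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARIABLES}
```

Calling `load_settings()` at the top of main.py would surface a misconfigured workflow before any article is uploaded.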

Finally, we can conditionally run the script.

- name: run-script
  id: run-script
  if: env.is-auto-commit != 'true'
  run: |
    source .venv/bin/activate
    python blog_manager_action/main.py
  shell: bash

Python Scripting

Here’s the high-level overview of the Python code, in main.py:

from connections import get_last_commit
from extract_article_folders import extract_article_folders
from publish_article import publish_article

commit = get_last_commit()

# Find the root folder of each committed file
# i.e. the one containing an "article.md"
folders = extract_article_folders(commit.files)

for folder in folders:
    publish_article(folder)

Connections

We can include all of our shared variables for access to the GitHub repo in connections.py:

from github import Github
from github import Auth
import os

_auth = _gh = _repo = _last_commit = None

def get_auth():
    global _auth
    if not _auth:
        _auth = Auth.Token(os.environ.get("GITHUB_TOKEN"))
    return _auth

def get_github():
    global _gh
    if not _gh:
        _gh = Github(auth=get_auth())
    return _gh

def get_repo():
    global _repo
    if not _repo:
        _repo = get_github().get_repo(os.environ.get("GITHUB_REPOSITORY"))
    return _repo

def get_last_commit():
    global _last_commit
    if not _last_commit:
        _last_commit = get_repo().get_commit(sha=os.environ.get("GITHUB_SHA"))
    return _last_commit
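
All four getters follow the same create-once pattern. As an alternative sketch, `functools.lru_cache` can express it without module-level globals. Here `create_client` is a hypothetical stand-in for an expensive call such as `Github(auth=...)`.

```python
from functools import lru_cache

creations = []  # records how many times the client is actually built


def create_client():
    """Hypothetical stand-in for an expensive constructor."""
    creations.append("client")
    return object()


@lru_cache(maxsize=None)
def get_client():
    # The body runs on the first call only; later calls return the cached object
    return create_client()
```

The trade-off is that the cached value lives inside the decorator, so resetting it (in tests, for example) means calling `get_client.cache_clear()`.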

Extracting the Folder Containing an Article

Here we take each changed file and bubble up the directory until we meet an article.md to publish:

from connections import get_repo
from github import UnknownObjectException

def extract_article_folders(files):
    repo = get_repo()
    folders = []

    for file in files:
        parts = file.filename.split("/")
        for i in range(len(parts) - 2, -1, -1):
            # Reconstruct the folder path
            folder = "/".join(parts[0 : i + 1])

            try:
                dir_contents = repo.get_contents(folder)

                # If `article.md` exists within the directory
                if any(
                    file_content.name == "article.md" for file_content in dir_contents
                ):
                    # Several changed files may share a folder; publish it once
                    if folder not in folders:
                        folders.append(folder)
                    break
            except UnknownObjectException:
                pass

    return folders
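
To see the "bubble up" walk without any API calls, here is a minimal sketch that takes the set of folders known to contain an article.md as an argument. `find_article_folder` and the example paths are hypothetical.

```python
def find_article_folder(filename, article_folders):
    """Return the closest ancestor folder of `filename` that holds an article."""
    parts = filename.split("/")
    # Start at the file's own folder and walk towards the repository root
    for i in range(len(parts) - 1, 0, -1):
        folder = "/".join(parts[:i])
        if folder in article_folders:
            return folder
    return None


article_folders = {"posts/my-post"}
nested = find_article_folder("posts/my-post/images/diagram.png", article_folders)
```

Files at the repository root have no parent folder to check, so they are skipped, just as in the function above.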

Publishing An Article

We begin publish_article.py with the imports and constants:

from connections import get_repo
import requests
import os
import mimetypes
import frontmatter
from hashnode import publish_hashnode
from medium import publish_medium
import re

COVER_IMAGE_NAME = "cover.png"

Image Uploading

First, define the function get_image_links to parse the images within the folder:

def get_image_links(files):
    # Mapping of image file names to uploaded URLs
    urls = {}
    for file in files:
        # Filter out non-image filetypes
        file_type = mimetypes.guess_type(file.name)[0]
        if file_type is None or not file_type.startswith("image/"):
            continue

        # Upload file content to CDN
        res = requests.post(
            "https://api.imgbb.com/1/upload",
            {
                "key": os.environ.get("IMGBB_API_KEY"),
                "image": file.content,
            },
        )

        # Throw error if status code != 200
        res.raise_for_status()
        urls[file.name] = res.json()["data"]["url"]
    return urls
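
The image filter relies on `mimetypes.guess_type`, which infers a type from the file extension alone. A small sketch of that check in isolation, with `is_image` as a hypothetical helper name:

```python
import mimetypes


def is_image(filename):
    """Return True if the file extension maps to an image/* MIME type."""
    file_type = mimetypes.guess_type(filename)[0]
    return file_type is not None and file_type.startswith("image/")
```

Because only the name is inspected, a file without an extension is treated as a non-image and never uploaded.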

Including Uploaded Images

The function replace_image_links will search for any image link in the article markdown content, and replace it with the ImgBB URL:

def replace_image_links(markdown, images):
    new_content = markdown

    image_names = [re.escape(name) for name in images.keys()]

    # e.g. ![test image](image-file.png alt="Just an example")
    MARKDOWN_IMAGE = re.compile(
        rf'!\[[^\]]*\]\(({"|".join(image_names)})\s*((?:\w+=)?"(?:.*[^"])")?\s*\)'
    )

    re_match = MARKDOWN_IMAGE.search(new_content)

    while re_match is not None:
        # Replace URL
        new_content = (
            new_content[: re_match.start(1)]
            + images[re_match[1]]
            + new_content[re_match.end(1) :]
        )

        re_match = MARKDOWN_IMAGE.search(new_content)
    return new_content
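
As an alternative sketch, the same replacement can be done in a single pass with `re.sub` and a callback. This simplified version is an assumption, not the original implementation: it only handles an optional quoted title after the file name, and it returns early when there are no images so the pattern never has an empty alternation. The ImgBB URL in the example is made up.

```python
import re


def replace_image_links(markdown, images):
    """Swap local image file names in markdown image links for uploaded URLs."""
    if not images:
        return markdown

    names = "|".join(re.escape(name) for name in images)
    # Groups: 1 = "![alt](", 2 = file name, 3 = optional title and closing ")"
    pattern = re.compile(r'(!\[[^\]]*\]\()(' + names + r')(\s*(?:"[^"]*")?\s*\))')
    return pattern.sub(lambda m: m[1] + images[m[2]] + m[3], markdown)


markdown = '![diagram](diagram.png "A diagram") and ![other](other.png)'
images = {"diagram.png": "https://i.ibb.co/abc/diagram.png"}  # hypothetical URL
result = replace_image_links(markdown, images)
```

Only links whose target is a known file name are rewritten; the `other.png` link is left as it is.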

Wrapping It All Up

publish_article ties everything together.

We publish the formatted markdown content to each platform. Then, we add the published URLs to the action’s output.

Now is also when we use CI_COMMIT_MESSAGE to indicate that the article has been uploaded:

def publish_article(folder):
    repo = get_repo()
    contents = repo.get_contents(folder)

    article_file = next(file for file in contents if file.name == "article.md")
    article = article_file.decoded_content.decode()

    # Upload all the images within the folder to the CDN
    images = get_image_links(contents)

    # Replace all image references with their uploaded CDN URLs
    article = replace_image_links(article, images)

    # Tag `.metadata` onto article, using frontmatter
    article = frontmatter.loads(article)

    # Publish to blogging platforms
    cover_image_url = images.get(COVER_IMAGE_NAME)

    hashnode_url = publish_hashnode(article, cover_image_url)
    print(f"Published to Hashnode at {hashnode_url}")
    print(f"::set-output name=hashnode_url::{hashnode_url}")

    medium_url = publish_medium(article, cover_image_url)
    print(f"Published to Medium at {medium_url}")
    print(f"::set-output name=medium_url::{medium_url}")

    article["is_published"] = True

    result = repo.update_file(
        f"{folder}/article.md",
        os.environ.get("CI_COMMIT_MESSAGE"),
        frontmatter.dumps(article),
        article_file.sha,
    )

    new_commit = result["commit"].sha

    # Push to remote
    head = repo.get_git_ref("heads/main")
    head.edit(new_commit)
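
GitHub has deprecated the `::set-output` workflow command in favour of appending `name=value` lines to the file named by the GITHUB_OUTPUT environment variable. A minimal sketch of a helper that does this for single-line values follows; `set_output` is a hypothetical name, and the fallback to printing is an assumption for running the script locally.

```python
import os


def set_output(name, value):
    """Append a single-line step output to the GITHUB_OUTPUT file."""
    output_path = os.environ.get("GITHUB_OUTPUT")
    if output_path:
        with open(output_path, "a", encoding="utf-8") as f:
            f.write(f"{name}={value}\n")
    else:
        # Not running inside GitHub Actions: just show the value
        print(f"{name}={value}")
```

With this in place, `set_output("hashnode_url", hashnode_url)` could stand in for the corresponding `print` call.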

Using Hashnode

We use the Hashnode API to publish markdown content to Hashnode.

def publish_hashnode(article, cover_image_url=None):
    query = """mutation CreateStory($input: CreateStoryInput!) {
                    createStory(input: $input) {
                        code
                        success
                        message
                    }
            }"""

    # Send an UPDATE if the post has already been published
    if article.get("is_published") and "hashnode_id" in article.keys():
        query = query.replace(
            "createStory(", f'updateStory(postId: "{article["hashnode_id"]}", '
        )

    variables = {
        "input": {
            "title": article["title"],
            "slug": article.get("slug"),
            "contentMarkdown": article.content,
            "tags": article.get("hashnode_tags", []),
            "isPartOfPublication": { "publicationId": os.environ.get("HASHNODE_PUBLICATION_ID") }
        }
    }

    # Accept a cover image, if available
    if cover_image_url:
        variables["input"]["coverImageURL"] = cover_image_url

    if "canonical_url" in article.keys():
        variables["input"]["isRepublished"] = { "originalArticleURL": article["canonical_url"] }

    res = requests.post(
        "https://api.hashnode.com",
        json={
            "query": query,
            "variables": variables,
        },
        headers={"Authorization": os.environ.get("HASHNODE_INTEGRATION_TOKEN")},
    )

    json = res.json()

    if json.get("errors") and len(json["errors"]) > 0:
        exit(", ".join([e["message"] for e in json["errors"]]))

    return f'https://{os.environ.get("HASHNODE_HOSTNAME")}'
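
GraphQL responses report failures in a top-level `errors` list, which is why the function checks the body rather than the status code alone. The check can be pulled into a small helper, sketched here under the assumption that each error carries a `message` field, as in the code above. `graphql_error_message` is a hypothetical name.

```python
def graphql_error_message(payload):
    """Join all GraphQL error messages, or return "" when there are none."""
    errors = payload.get("errors") or []
    return ", ".join(error["message"] for error in errors)


failed = {"errors": [{"message": "Invalid token"}, {"message": "Slug taken"}]}
message = graphql_error_message(failed)
```

An empty string is falsy, so the caller can write `if message: exit(message)`.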

Using Medium

Start by importing the necessary dependencies at the top of medium.py:

import requests
import os

from connections import get_github
from urllib.parse import urlparse
from github import InputFileContent
from re import compile

Medium publishing is slightly different. From my experience, code blocks are not imported nicely. So we will use GitHub Gists to host the code blocks.

def gistify_code_blocks(markdown):
    gh = get_github()
    user = gh.get_user()
    new_content = markdown

    MARKDOWN_CODE_BLOCK = compile(r"```(?:.+)?#(.+)\n((?:.|\n)+?)\n+```")

    re_match = MARKDOWN_CODE_BLOCK.search(new_content)

    while re_match is not None:
        # Create a gist with the code block content and filename
        gist = user.create_gist(True, {
            re_match[1]: InputFileContent(re_match[2])
        })

        # Replace the code block in the markdown with the uploaded URL
        new_content = (
            new_content[: re_match.start()]
                + gist.html_url
                + new_content[re_match.end():]
        )

        re_match = MARKDOWN_CODE_BLOCK.search(new_content)
    return new_content
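
The pattern expects each code block's opening fence to end with `#` and a file name, for example a fence labelled `python#hello.py`. To try the matching without creating real gists, here is a sketch that takes the upload step as a callback. `replace_code_blocks` and `fake_upload` are hypothetical, and the gist URL is made up.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep this example readable

# Group 1 = file name after "#", group 2 = the code inside the block
CODE_BLOCK = re.compile(FENCE + r"(?:.+)?#(.+)\n((?:.|\n)+?)\n+" + FENCE)


def replace_code_blocks(markdown, upload):
    """Replace each named code block with the URL returned by `upload`."""
    return CODE_BLOCK.sub(lambda m: upload(m[1], m[2]), markdown)


def fake_upload(filename, code):
    # Stand-in for user.create_gist(...); returns a made-up URL
    return "https://gist.example/" + filename


markdown = "Intro\n" + FENCE + "python#hello.py\nprint('hi')\n" + FENCE + "\nOutro"
converted = replace_code_blocks(markdown, fake_upload)
```

Blocks whose fence has no `#filename` part do not match and are left in the markdown unchanged.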

Also, Medium articles display an “Originally published at …” notice at the end of the article, which includes the canonical URL (the original address).

Define a function create_canonical_reference that returns this notice in markdown format:

def create_canonical_reference(url):
    if url is None:
        return ""
    parsed_url = urlparse(url)
    base_url = f"{parsed_url.scheme}://{parsed_url.hostname}"

    # e.g. ... at [https://cs310.hashnode.dev](https://cs310.hashnode.dev/post-4)
    return f"\n\n---\n\n*Originally published at [{base_url}]({url}).*"
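
If the link text should show only the hostname, without the scheme, a small variant could look like the sketch below. `canonical_footer` is a hypothetical name, and the URL is just an example.

```python
from urllib.parse import urlparse


def canonical_footer(url):
    """Build the "Originally published at" notice with the hostname as link text."""
    if url is None:
        return ""
    hostname = urlparse(url).hostname
    return f"\n\n---\n\n*Originally published at [{hostname}]({url}).*"


footer = canonical_footer("https://cs310.hashnode.dev/post-4")
```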

Publishing

Now we can publish. Note that, unlike Hashnode, the Medium API requires the title to be part of the markdown content. So, when the post is being published for the first time, we prepend the article title to the content.

def publish_medium(article, cover_image_url=None):
    user_id = get_user_id()

    # Enrich the markdown content with title, cover image & canonical reference
    def transform_content(article):
        content = gistify_code_blocks(article.content)
        if not article.get("is_published"):
            content = f"# {article['title']}\n" + content

        if cover_image_url:
            content = f"![cover image]({cover_image_url})\n" + content

        if "canonical_url" in article.keys():
            content += create_canonical_reference(article["canonical_url"])

        return content

    res = requests.post(
        f"https://api.medium.com/v1/users/{user_id}/posts",
        headers={
            "Authorization": f"Bearer {os.environ.get('MEDIUM_INTEGRATION_TOKEN')}"
        },
        json={
            "title": article["title"],
            "contentFormat": "markdown",
            "content": transform_content(article),
            "tags": article.get("medium_tags", []),
            "canonicalUrl": article.get("canonical_url"),
            "publishStatus": "public",
        },
    )

    json = res.json()

    if json.get("errors") and len(json["errors"]) > 0:
        exit(json["errors"][0]["message"])

    return json["data"]["url"]
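
The helper get_user_id is not shown above. A minimal sketch of what it might look like, assuming it calls Medium's `/v1/me` endpoint, which returns the authenticated user's details under `data`. The `http_get` parameter is an addition for illustration so the function can be tried without network access.

```python
import os

MEDIUM_ME_URL = "https://api.medium.com/v1/me"


def get_user_id(http_get=None):
    """Fetch the ID of the user who owns the integration token."""
    if http_get is None:
        import requests  # imported lazily so a stub can be injected instead

        http_get = requests.get

    res = http_get(
        MEDIUM_ME_URL,
        headers={
            "Authorization": f"Bearer {os.environ.get('MEDIUM_INTEGRATION_TOKEN')}"
        },
    )
    # Raise on HTTP errors before reading the body
    res.raise_for_status()
    return res.json()["data"]["id"]
```

In the action itself, calling `get_user_id()` with no arguments would use `requests.get` directly.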

Sample Workflow

Here is a sample workflow that would use the action.

name: Test Action
on: push

jobs:
  run-script:
    runs-on: ubuntu-latest
    name: Returns the published URL
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Run publishing script
        id: execute
        uses: WoolDoughnut310/blog-manager-action@main
        with:
          medium_integration_token: ${{ secrets.medium_integration_token }}
          hashnode_integration_token: ${{ secrets.hashnode_integration_token }}
          hashnode_hostname: ${{ secrets.hashnode_hostname }}
          hashnode_publication_id: ${{ secrets.hashnode_publication_id }}
          github_token: ${{ secrets.token }}
          imgbb_api_key: ${{ secrets.imgbb_api_key }}

Note that each credential must be added to the repository’s secrets on GitHub. Also, the GitHub token must have read and write access to the repo’s contents, so that the updated article files can be pushed.

Bonus: Submodules

An additional feature would be for users to include their code repositories within this blog repository, so that the blog repository forms almost a mega repository.

Well, this isn’t so much a feature of our system as one built into Git.

Let's say the user has a code repo at REPO_URL and they feature snippets of it as part of their blog post.

They can include the repo into their larger repository under the folder name project with the following command:

git submodule add REPO_URL project

Conclusion

And that’s all for today’s project. I hope you’ve seen the potential GitHub Actions has for producing a comprehensive blog management system.

Perhaps it may be useful for those wanting to start their blogging journey.

If you liked this post, comment on your own experiences and stick around for more!

You can find the code for this article on GitHub.

References

GitHub Actions documentation - GitHub Docs

Most effective ways to push within GitHub Actions | Johtizen

Shipyard | Writing Your First Python GitHub Action

Git - Submodules (git-scm.com)

Medium/medium-api-docs: Documentation for Medium's OAuth2 API (github.com)

api.hashnode.com

PyGithub — PyGithub 2.1.0 documentation

What exactly is Frontmatter? (daily-dev-tips.com)

Author’s Note

I did get the infinite recursion of repo pushes and workflow runs!


One drawback of using Medium is that users can’t edit their articles through its API. Plus, the API itself has been deprecated for some time now.
