Jekyll: Import Disqus comments for Staticman

Due to privacy concerns with Disqus, I have exported old Disqus comments and integrated them directly in Jekyll.

For some years now, I have tried to make this website rely on fewer external resources and to avoid ad-powered services, in order to improve privacy for my dear readers.

Recently, I removed the comments provided by Disqus from this blog, because Disqus introduced too much data sharing with many third parties. Norway fined Disqus 2.5 million euros this year for tracking without a legal basis.

Below are some tips on how to export comments from Disqus and display them in a privacy-friendly way in your Jekyll blog.

Export Disqus Comments to JSON and YAML

  1. Disqus documents the export and export format at https://docs.disqus.com/developers/export/
  2. Navigate to http://disqus.com/admin/discussions/export/ to export your comments to XML format.

    The XML has three main parts: metadata, a list of webpages, and a list of comments. Each comment is linked to a webpage (via a Disqus identifier) and, if it is a reply, to a parent comment.

    For use within Jekyll, I need to restructure the data into a list of comments for each webpage, keyed by my own identifier (e.g. the post slug), and convert everything to a format that Jekyll can handle, i.e. YAML, JSON, CSV, or TSV. I chose YAML.

  3. Install the command-line tools xq and jq. xq converts XML to JSON and is basically a wrapper around jq. Note that this xq ships with the Python package yq; the separate PyPI package named xq is a different tool.

    pip install yq

    Download binaries of jq here: https://stedolan.github.io/jq/download/

  4. I then convert the Disqus XML export into a JSON file with the code in export-disqus-xml2json.sh

  5. Finally, I pipe the output through import-json-yaml.rb to split the list of comments into individual files for easy consumption by Jekyll.
# file: 'export-disqus-xml2json.sh'

#!/usr/bin/env sh

xq '
  .disqus
  | .thread as $threads
  | .post
  | map(select(.isDeleted == "false"))
  | map(
      .thread."@dsq:id" as $id
      | ($threads[] | select(."@dsq:id" == $id)) as $thread
      | {
          id: ("disqus-" + ."@dsq:id"),
          date: .createdAt,
          slug: ($thread.id | tostring | gsub("/$"; "") | split("/") | last),
          name: (if .author.name == "Robert" then "Robert Riemann" else .author.name end),
          avatar: .author | (if has("username") and .username != "rriemann" then "https://disqus.com/api/users/avatars/" + .username + ".jpg" else null end),
          email: .author | (if has("username") and .username == "rriemann" then "my@mail.com" else null end),
          message,
          origin: ($thread.link | tostring | gsub("^https://blog.riemann.cc"; "")),
          replying_to: (if has("parent") then ("disqus-" + .parent."@dsq:id") else null end)
        }
    )
' "$@"

Example comment from the JSON list:

{
  "id": "disqus-4145062197",
  "date": "2018-10-14T22:14:58Z",
  "slug": "versioning-of-openoffice-libreoffice-documents-using-git",
  "name": "Robert Riemann",
  "avatar": null,
  "email": "my@mail.com",
  "message": "<p>I agree, it is not perfect. I have no solution how to keep the noise out of git.</p>",
  "origin": "/2013/04/23/versioning-of-openoffice-libreoffice-documents-using-git/",
  "replying_to": "disqus-4136593561"
}
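
Before splitting the list into files, it can help to check that the export is consistent. The following is a minimal sketch and not part of the original scripts: it groups the comments by slug and lists replies whose parent comment is missing from the export (for example because the parent was deleted). The helper names and the sample data are made up for illustration; only the field names come from the JSON above.

```ruby
require 'json'

# Group comments by the slug of the post they belong to.
def comments_by_slug(comments)
  comments.group_by { |c| c['slug'] }
end

# Return the comments whose replying_to points at an id that is not in the list.
def orphaned_replies(comments)
  known_ids = comments.map { |c| c['id'] }
  comments.select { |c| c['replying_to'] && !known_ids.include?(c['replying_to']) }
end

# Made-up sample in the same shape as the exported JSON list.
SAMPLE_COMMENTS = JSON.parse(<<~SAMPLE)
  [
    {"id": "disqus-1", "slug": "post-a", "replying_to": null},
    {"id": "disqus-2", "slug": "post-a", "replying_to": "disqus-1"},
    {"id": "disqus-3", "slug": "post-b", "replying_to": "disqus-99"}
  ]
SAMPLE

comments_by_slug(SAMPLE_COMMENTS).each do |slug, list|
  puts "#{slug}: #{list.size} comment(s)"
end

orphaned_replies(SAMPLE_COMMENTS).each do |c|
  puts "#{c['id']} replies to missing comment #{c['replying_to']}"
end
```

With a real export, `SAMPLE_COMMENTS` would be replaced by the parsed output of export-disqus-xml2json.sh.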

The script import-json-yaml.rb takes each comment and writes it in YAML format to a file with a unique filename in a folder named after the slug.

# file: 'import-json-yaml.rb'
#!/usr/bin/env ruby

require 'json'
require 'yaml'
require 'fileutils'
require 'date'

data = if ARGV.length > 0 then
  JSON.load_file(ARGV[0])
else
  JSON.parse(ARGF.read)
end

data.each do |comment|
  FileUtils.mkdir_p comment['slug']
  File.write "#{comment['slug']}/#{comment['id']}-#{Date.parse(comment['date']).strftime('%s')}.yml", comment.to_yaml
end
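
The filename combines the comment id with a Unix timestamp. As a small illustration (the helper name `comment_path` is mine and does not appear in the script): `Date.parse` keeps only the calendar date, so the timestamp marks the start of that day rather than the exact time of the comment. This is why the timestamps in the listing below are all multiples of 86400.

```ruby
require 'date'

# Hypothetical helper mirroring the File.write call in import-json-yaml.rb.
def comment_path(comment)
  # Date (unlike DateTime) drops the time of day before formatting as epoch seconds.
  epoch = Date.parse(comment['date']).strftime('%s')
  "#{comment['slug']}/#{comment['id']}-#{epoch}.yml"
end

example = {
  'id'   => 'disqus-4145062197',
  'date' => '2018-10-14T22:14:58Z',
  'slug' => 'versioning-of-openoffice-libreoffice-documents-using-git'
}

puts comment_path(example)
```

If the exact time matters for ordering, `DateTime.parse` could be used instead, at the cost of different filenames than the ones shown here.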

The output, as shown by tree, looks like this:

_data
├── comments
│   ├── announcing-kubeplayer
│   │   ├── disqus-113988522-1292630400.yml
│   │   └── disqus-1858985256-1424044800.yml
│   ├── requires-owncloud-serverside-backend
│   │   ├── disqus-41270666-1269302400.yml
│   │   ├── disqus-41273219-1269302400.yml
...

Display Comments in Jekyll

These comments are accessible in Jekyll posts and pages via site.data.comments[page.slug].
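
Jekyll exposes each subfolder of _data/comments as a hash keyed by filename without extension, which is why the template below reads the comment fields from `comment[1]` after `sort`. The following is a rough Ruby model of that behaviour, assuming plain YAML files and ignoring everything else Jekyll's data reader does. The function `load_comments` and the sample files are hypothetical.

```ruby
require 'yaml'
require 'tmpdir'
require 'fileutils'

# Rough model of site.data.comments[slug]: file basename => parsed YAML.
def load_comments(dir)
  Dir.glob(File.join(dir, '*.yml')).to_h do |path|
    [File.basename(path, '.yml'), YAML.safe_load(File.read(path))]
  end
end

Dir.mktmpdir do |root|
  dir = File.join(root, 'announcing-kubeplayer')
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, 'disqus-2-1424044800.yml'), { 'name' => 'B' }.to_yaml)
  File.write(File.join(dir, 'disqus-1-1292630400.yml'), { 'name' => 'A' }.to_yaml)

  # Sorting a hash yields [key, value] pairs ordered by key (the filename),
  # so the comment data sits at index 1 of each pair.
  load_comments(dir).sort.each do |comment|
    puts "#{comment[0]} => #{comment[1]['name']}"
  end
end
```

Note that the order is the string order of the filenames, which is not necessarily chronological.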

The most helpful resource for integrating comments into Jekyll was the post https://mademistakes.com/mastering-jekyll/static-comments-improved/.

<!-- file: 'my-comments.html' -->
{% assign comments = site.data.comments[page.slug] | sort %}
{% for comment in comments %}
  {% assign index       = forloop.index %}
  {% assign replying_to = comment[1].replying_to %}
  {% assign avatar      = comment[1].avatar %}
  {% assign email       = comment[1].email %}
  {% assign name        = comment[1].name %}
  {% assign url         = comment[1].url %}
  {% assign date        = comment[1].date %}
  {% assign message     = comment[1].message %}
  {% include comment index=index replying_to=replying_to avatar=avatar email=email name=name url=url date=date message=message %}
{% endfor %}
<!-- file: 'comment' -->
<article id="comment{% unless include.r %}{{ index | prepend: '-' }}{% else %}{{ include.index | prepend: '-' }}{% endunless %}" class="js-comment comment {% if include.name == site.author.name %}admin{% endif %} {% if include.replying_to and include.replying_to != '' %}child{% endif %}">
  <div class="comment__avatar">
    {% if include.avatar %}
      <img src="{{ include.avatar }}" alt="{{ include.name | escape }}">
    {% elsif include.email %}
      <img src="https://www.gravatar.com/avatar/{{ include.email | md5 }}?d=mm&s=60" srcset="https://www.gravatar.com/avatar/{{ include.email | md5 }}?d=mm&s=120 2x" alt="{{ include.name | escape }}">
    {% else %}
      <img src="/assets/img/avatar-60.jpg" srcset="/assets/img/avatar-120.jpg 2x" alt="{{ include.name | escape }}">
    {% endif %}
  </div>
  <div class="comment__inner">
    <header>
      <p>
        <span class="comment__author-name">
          {% unless include.url == blank %}
            <a rel="external nofollow" href="{{ include.url }}">
              {{ include.name }}
            </a>
          {% else %}
            {{ include.name }}
          {% endunless %}
        </span>
        wrote on
        <span class="comment__timestamp">
          {% if include.date %}
            {% if include.index %}<a href="#comment{% if r %}{{ index | prepend: '-' }}{% else %}{{ include.index | prepend: '-' }}{% endif %}" title="link to this comment">{% endif %}
            <time datetime="{{ include.date | date_to_xmlschema }}">{{ include.date | date: '%B %d, %Y' }}</time>
            {% if include.index %}</a>{% endif %}
          {% endif %}
        </span>
      </p>
    </header>
    <div class="comment__content">
      {{ include.message | markdownify }}
    </div>
  </div>
</article>
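
The include above uses an `md5` filter to build the Gravatar URL. As far as I know, `md5` is not among the filters that Jekyll ships with, so a plugin has to provide it. A minimal sketch of such a plugin, assuming it is saved as `_plugins/md5_filter.rb` (the file and module names are my choice):

```ruby
require 'digest/md5'

# Hypothetical plugin file, e.g. _plugins/md5_filter.rb
module Md5Filter
  # Hash the trimmed, lower-cased input, as Gravatar expects for email addresses.
  def md5(input)
    Digest::MD5.hexdigest(input.to_s.strip.downcase)
  end
end

# Register the filter only when Liquid is loaded, as it is during a Jekyll build.
Liquid::Template.register_filter(Md5Filter) if defined?(Liquid::Template)
```

Custom plugins in `_plugins` are not loaded by the standard GitHub Pages build, so this only applies to sites that are built locally or in their own pipeline.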

Receiving New Comments

As explained in https://mademistakes.com/mastering-jekyll/static-comments/, the software https://staticman.net/ turns HTTP POST requests into GitHub or GitLab pull requests, so that comments can be added automatically. Of course, the website needs a rebuild after each new comment.

I had a lot of trouble setting up Staticman. Eventually, I decided to use a Ruby CGI program that emails me each new comment as an attachment. I like Ruby very much. :wink: Once I have figured out how to use the GitLab API wrapper, I may use pull requests instead of email attachments.

# file: 'index.rb'
#!/usr/bin/env ruby

Gem.paths = { 'GEM_PATH' => '/var/www/virtual/rriemann/gem' }

require 'cgi'
require 'uri' # provides URI::MailTo::EMAIL_REGEXP used in the validation below
require 'yaml'
require 'date'
require 'mail'

cgi = CGI.new

# rudimentary validation
unless ENV['HTTP_ORIGIN'] == 'https://blog.riemann.cc' and
       ENV['CONTENT_TYPE'] == 'application/x-www-form-urlencoded' and
       ENV['REQUEST_METHOD'] == 'POST' and
       cgi.params['email']&.first&.strip =~ URI::MailTo::EMAIL_REGEXP and
       cgi.params['age']&.first == '' then # age is a bot honeypot
  print cgi.http_header("status" => "FORBIDDEN")
  print "<p>Error: 403 Forbidden</p>"
  exit
end

output = Hash.new
date = DateTime.now

output['id'] = ENV['UNIQUE_ID']
output['date'] = date.iso8601
output['updated'] = date.iso8601
output['origin'] = cgi.params['origin']&.first
output['slug'] = cgi.params['slug']&.first&.gsub(/[^\w-]/, '') # some sanitizing

output['name'] = cgi.params['name']&.first
output['email'] = cgi.params['email']&.first&.downcase&.strip
output['url'] = cgi.params['url']&.first
output['message'] = cgi.params['message']&.join("\n").encode(universal_newline: true)

output['replying_to'] = cgi.params['replying_to']&.first

#Mail.defaults do
#  delivery_method :sendmail
#end

Mail.defaults do
  delivery_method :smtp, address: "smtp.domain", port: 587, user_name: "smtp_user", password: "smtp_password", enable_starttls_auto: true
end

mail = Mail.new do
  from    'no-reply@domain' # 'rriemann'
  to      'comments-recipient@domain'  # ENV['SERVER_ADMIN']
  reply_to output['email']
  header['X-Blog-Comment'] = output['slug']

  subject "New Comment from #{output['name']} for #{cgi.params['title']&.first}"
  body    <<~BODY
    Hi blog author,

    a new comment from #{output['name']} for https://blog.riemann.cc#{output['origin']}:

    #{output['message']}
  BODY
  add_file(filename: "#{output['id']}-#{date.strftime('%s')}.yml", content: output.to_yaml)
end

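# Mail#deliver raises on failure (raise_delivery_errors is true by default),
# so rescue the exception instead of checking mail.error_status, which only
# reports the status found in parsed bounce messages.
begin
  mail.deliver
  print cgi.http_header
  cgi.print <<~RESPONSE
    <p><b>Thank you</b> for your feedback! Your comment will be published after review.</p>
    <p><a href="#{CGI.escapeHTML(output['origin'].to_s)}">Back to the previous page</a></p>
  RESPONSE
rescue StandardError => e
  print cgi.http_header("status" => "SERVER_ERROR")
  cgi.print <<~RESPONSE
    <p><b>Error: </b> #{CGI.escapeHTML(e.message)}</p>

    <p>An error occurred. Please try again later.</p>
    <p><a href="javascript:history.back()">Go back</a></p>
  RESPONSE
end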
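
To try the script without the HTML form, a request with the same shape can be built by hand. This is a sketch for testing only: the endpoint URL and all field values are placeholders, and the helper `build_comment_request` is not part of index.rb. The field names and the Origin header follow the validation in the script above.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a POST request shaped like the comment form.
def build_comment_request(endpoint, fields)
  request = Net::HTTP::Post.new(URI(endpoint))
  request['Origin'] = 'https://blog.riemann.cc' # must match the HTTP_ORIGIN check
  request.set_form_data(fields)                 # sets the urlencoded Content-Type
  request
end

fields = {
  'origin'      => '/2013/04/23/some-post/',
  'slug'        => 'some-post',
  'title'       => 'Some Post',
  'name'        => 'Jane Doe',
  'email'       => 'jane@example.org',
  'url'         => '',
  'message'     => 'Nice post!',
  'replying_to' => '',
  'age'         => '' # honeypot field: must stay empty
}

request = build_comment_request('https://example.org/comments/index.rb', fields)
puts request.body

# To actually send it (placeholder host):
# uri = URI('https://example.org/comments/index.rb')
# Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| puts http.request(request).code }
```

Filling in the `age` field or changing the Origin header should result in the 403 response.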

To make it work with Apache, you may need to add these lines to the Apache configuration (could be a .htaccess file):

DirectoryIndex index.html index.rb
Options +ExecCGI
SetHandler cgi-script
AddHandler cgi-script .rb