mrhands

Or: How the left-pad incident happens

Last night, I was looking for a way to count words in my Markdown files, and I found this package called markdown-magic-wordcount. Looking closer, it's actually a plugin for markdown-magic, which does a whole suite of transformations on files written in Markdown.

Hmm, I figured, do I really want to pull in this whole dependency just for this one plugin? I'd better check out the source. And down the rabbit hole I went.


Friends, this is the entirety of the source code for the markdown-magic-wordcount plugin:

/* Custom Transform Plugin example */
const wordcount = require('wordcount')

module.exports = function WORDCOUNT(content, options, config) {
  const count = wordcount(config.outputContent);
  return count
}

Okay, that's hilarious. All this package does is pull in another package to call one function. It doesn't even do any of the work itself! Checking out the source code for the wordcount package reveals that it also just pulls in another package:

/*!
 * wordcount <https://github.com/jonschlinkert/wordcount>
 *
 * Copyright (c) 2014-2015 Jon Schlinkert.
 * Licensed under the MIT License
 */

'use strict';

var matches = require('match-words');

module.exports = function wordcount(str) {
  if (typeof str !== 'string') {
    throw new TypeError('expected a string');
  }
  var m = matches(str);
  if (!m) return 0;
  return m.length;
};

Surely, the magic is happening in the match-words package, right? Nope:

/*!
 * match-words <https://github.com/jonschlinkert/match-words>
 *
 * Copyright (c) 2015, Jon Schlinkert.
 * Licensed under the MIT License.
 */

'use strict';

var regex = require('word-regex');

module.exports = function(str) {
  if (typeof str !== 'string') {
    throw new TypeError('expected a string');
  }
  return str.match(regex());
};

Finally, we've arrived at word-regex, with a power so impressive that the only way to contain it was to put it into a separate package:

/*!
 * word-regex <https://github.com/jonschlinkert/word-regex>
 *
 * Copyright (c) 2015 Jon Schlinkert.
 * Licensed under the MIT license.
 */

'use strict';

// Modified from: https://github.com/lepture/editor/blob/master/src/intro.js#L343
module.exports = function () {
  return /[a-zA-Z0-9_'\u0392-\u03c9\u0400-\u04FF\u0027]+|[\u4E00-\u9FFF\u3400-\u4dbf\uf900-\ufaff\u3040-\u309f\uac00-\ud7af\u0400-\u04FF]+|[\u00E4\u00C4\u00E5\u00C5\u00F6\u00D6]+|[\u0531-\u0556\u0561-\u0586\u0559\u055A\u055B]+|\w+/g;
};

So here's what I did. I uninstalled all these dependencies and made a new TypeScript file called word-count.ts for my project:

/*!
 * Taken from word-regex <https://github.com/jonschlinkert/word-regex>
 *
 * Copyright (c) 2015 Jon Schlinkert.
 * Licensed under the MIT license.
 */

const MatchWords = /[a-zA-Z0-9_\u0392-\u03c9\u0400-\u04FF]+|[\u4E00-\u9FFF\u3400-\u4dbf\uf900-\ufaff\u3040-\u309f\uac00-\ud7af\u0400-\u04FF]+|[\u00E4\u00C4\u00E5\u00C5\u00F6\u00D6]+|\w+/g;

function WordCount(text: string) {
	return text.match(MatchWords)?.length ?? 0;
}

export { WordCount };
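
A quick way to sanity-check a file like this is a minimal usage sketch. The snippet below repeats both regexes so it stands alone; `trimmedRegex`, `originalRegex`, `countWith`, and the sample strings are hypothetical names chosen for illustration, not part of any of the packages. It also shows one visible difference: the trimmed regex leaves out the apostrophe (and the Armenian range) that word-regex includes, so contractions are counted differently.

```typescript
// Trimmed regex as used in word-count.ts (no apostrophe, no Armenian range).
const trimmedRegex = /[a-zA-Z0-9_\u0392-\u03c9\u0400-\u04FF]+|[\u4E00-\u9FFF\u3400-\u4dbf\uf900-\ufaff\u3040-\u309f\uac00-\ud7af\u0400-\u04FF]+|[\u00E4\u00C4\u00E5\u00C5\u00F6\u00D6]+|\w+/g;

// Regex as published in word-regex, kept here only for comparison.
const originalRegex = /[a-zA-Z0-9_'\u0392-\u03c9\u0400-\u04FF\u0027]+|[\u4E00-\u9FFF\u3400-\u4dbf\uf900-\ufaff\u3040-\u309f\uac00-\ud7af\u0400-\u04FF]+|[\u00E4\u00C4\u00E5\u00C5\u00F6\u00D6]+|[\u0531-\u0556\u0561-\u0586\u0559\u055A\u055B]+|\w+/g;

// Hypothetical helper: number of matches of a global regex in a string.
// String.prototype.match returns null when nothing matches, hence the fallback.
function countWith(regex: RegExp, text: string): number {
	return text.match(regex)?.length ?? 0;
}

const plain = countWith(trimmedRegex, "# Hello, Markdown world"); // 3
const empty = countWith(trimmedRegex, "");                        // 0

// "don't stop" splits into "don", "t", "stop" without the apostrophe...
const contractionTrimmed = countWith(trimmedRegex, "don't stop");   // 3
// ...but stays "don't", "stop" with the word-regex character class.
const contractionOriginal = countWith(originalRegex, "don't stop"); // 2

console.log({ plain, empty, contractionTrimmed, contractionOriginal });
```

Whether the apostrophe matters depends on the text being counted; for prose with many contractions, the two versions will drift apart.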

Thank you for your service, Mr. Jon Schlinkert.


in reply to @mrhands's post:

// Modified from: https://github.com/lepture/editor/blob/master/src/intro.js#L343

Curious, I decided to check out the source:

/* The right word count in respect for CJK. */
function wordCount(data) {
    //...

Apparently, along with the regex, it has an algorithm for handling CJK text that counts each CJK character as an individual "word" for the purposes of word counting. That logic is simply lost in the code quoted in the OP.

(I'm using this version specifically https://github.com/lepture/editor/blob/7cc8fe036ed0b6b90b8eb0894590fc15a6cb1abf/src/intro.js#L328 but I don't think it's ever been modified)

PS: I suspect the regex can be fixed by just not using + after the CJK class, but I haven't verified that.
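
The idea in the PS can be tried out with a small sketch. It assumes the only change is dropping the `+` after the CJK character class in the word-regex pattern; `perCharCjkWords` and `wordCountCjk` are hypothetical names, and the result has not been compared against the original wordCount in lepture/editor, so treat it as an experiment and not a port.

```typescript
// word-regex pattern with the "+" removed after the second (CJK) class,
// so every character in that class is its own match.
// Assumption: nothing else in the pattern is changed.
const perCharCjkWords = /[a-zA-Z0-9_'\u0392-\u03c9\u0400-\u04FF\u0027]+|[\u4E00-\u9FFF\u3400-\u4dbf\uf900-\ufaff\u3040-\u309f\uac00-\ud7af\u0400-\u04FF]|[\u00E4\u00C4\u00E5\u00C5\u00F6\u00D6]+|[\u0531-\u0556\u0561-\u0586\u0559\u055A\u055B]+|\w+/g;

// Hypothetical counter built on the modified pattern.
function wordCountCjk(text: string): number {
	return text.match(perCharCjkWords)?.length ?? 0;
}

const cjkOnly = wordCountCjk("你好世界");     // 4: one match per character
const mixed = wordCountCjk("hello 你好");     // 3: "hello", "你", "好"
const latinOnly = wordCountCjk("hello world"); // 2: Latin words are unaffected

console.log({ cjkOnly, mixed, latinOnly });
```

Because hiragana (\u3040-\u309f) and Hangul syllables (\uac00-\ud7af) sit in the same character class, they are also counted one character at a time with this change.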