When do words count?

There are a bunch of word counts in my stats. This post addresses a years-old “to do” in the FAQ at the bottom - to explain which words get included, and why, with some examples.

Plan: combine what used to be in the FAQ with an old forum post, and up my own word count with extra blether.

wanna give me the scoop

So… in a multimedia work with text that stretches ASCII conventions, the exact word count can always be debated.

But here’s my basic approach. Everything is counted by code, of course. The count of human English is fairly standard, plus I normalize a few MSPA quirks. As well as counting words in titles, pesterlogs and prose sections, I count text (and lyrics) that I’ve transcribed from media (image, Flash, etc.) panels.

DID I FORGET AN APOSTROPHE SOMEWHERE?

You’d think there were rigorous official rules about counting words in simple English. But how many are in the previous sentence, then? I say thirteen, but perhaps “you would” count fourteen?

On the internet I see contractions causing unresolved arguments amongst those that care. But all the definitions of the term “contraction” talk about making one word stand in for several, so I reckon once you’ve contracted it, it counts as one.
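As a rough illustration (a sketch, not the actual counting script), the "one contraction, one word" rule can be expressed as a regular expression that lets apostrophes sit inside a word. The pattern and the `count_words` name are assumptions made up for this example.

```python
import re

# Assumption: a word is a run of letters, optionally joined by apostrophes
# (straight or curly), so a contraction like "You’d" stays a single token.
WORD = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")

def count_words(text):
    """Count words, treating each contraction as one word."""
    return len(WORD.findall(text))

sentence = ("You’d think there were rigorous official rules "
            "about counting words in simple English.")
print(count_words(sentence))                           # 13
print(count_words(sentence.replace("You’d", "You would")))  # 14
```

Expanding the contraction by hand is the only way this sketch gets to fourteen.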

AND THAT’S YOUR CHOICE TOO. PAGE HUMAN ENGLISH.

Then there are compound words. My code couldn’t recognise portmanteaus anyway, but if the letters are all run together then it looks like one word to me too.

Once hyphens come into play, I usually honor those. So an “arm-swingy-dealy” has one more word than a “flippy-lever”. (Let’s not argue about how many letters are inside, mind!)

But for certain short prefixes I “re-equip” my common sense and count just the one word rather than two.
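The hyphen rule and its prefix exception could be sketched like this. It is only an illustration: the `SHORT_PREFIXES` set is a guess, since the exact list of prefixes isn't given here, and `count_hyphenated` is a hypothetical helper name.

```python
# Hypothetical prefix list; which prefixes really qualify is an assumption.
SHORT_PREFIXES = {"re", "un", "co", "pre", "non"}

def count_hyphenated(token):
    """Count the words in one hyphenated token."""
    parts = [part for part in token.split("-") if part]
    if not parts:
        return 0
    count = len(parts)
    # A leading short prefix merges with the part that follows it.
    if len(parts) > 1 and parts[0].lower() in SHORT_PREFIXES:
        count -= 1
    return count

print(count_hyphenated("arm-swingy-dealy"))  # 3
print(count_hyphenated("flippy-lever"))      # 2
print(count_hyphenated("re-equip"))          # 1
```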

You Said Some Of Those Words Two Or Three Times…

One very editorial decision is to avoid overlaps and repeats that I judge don’t function as part of the narrative. Especially in images, words get repeated panel to panel - but I’m trying to measure the size of this thing as a story.

Even so, some repeats get counted because they re-introduce context the reader hasn’t seen for a while. Also, occasionally I include repeats when they seem to me to be part of the art.

…And Most Of Them Werent Words

Moving on from standard English… Troll quirktuations g–Et smoot)(ed out. So do some other occasional non-Latin characters. Then I chose to count various flavors of textual arrow (“==>” and friends) and some other things like zodiac symbols as words, because… I guess they are used that way in the text?
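Here is a minimal sketch of that kind of smoothing, not the real rule set. The two substitutions are inferred only from the "g–Et smoot)(ed" example above, and the `QUIRK_FIXES`, `TOKEN`, `normalize` and `count_words` names are all made up for illustration.

```python
import re

# Illustrative substitutions only, guessed from the single example above.
QUIRK_FIXES = [
    (re.compile(r"\)\("), "h"),    # ")(" stands in for "h"
    (re.compile(r"[-–]+E"), "e"),  # "-E" or "–E" stands in for "e"
]

# A textual arrow such as "==>" counts as one word; otherwise match
# letter runs, keeping contractions together.
TOKEN = re.compile(r"=+>|[A-Za-z]+(?:['’][A-Za-z]+)*")

def normalize(text):
    """Smooth out quirk spellings before counting."""
    for pattern, replacement in QUIRK_FIXES:
        text = pattern.sub(replacement, text)
    return text

def count_words(text):
    return len(TOKEN.findall(normalize(text)))

print(normalize("g–Et smoot)(ed out"))    # get smoothed out
print(count_words("g–Et smoot)(ed out"))  # 3
print(count_words("==>"))                 # 1
```

Without the normalization step, the same pattern would split the quirked phrase into five fragments rather than three words.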

see if this dumbass code actually does the trick.

This all got decided by trial and error, coding up lots of minor variations, then looking at what difference they actually made (which “words” came and went) and choosing the rules that were doing more good than harm. I iterated towards satisfaction, ending up with:

image

Yeah.

Really though, most of the wrinkles don’t make a huge difference. Last time I checked, even the most controversial and impactful of the choices above wouldn’t shift the count by as much as a full percentage point.

A gr8 example of this can 8e o8served

Remember Intermission 2? Well, unless I broke something or changed the rules since this post, you’ll see that my stats claim it has twenty words.

Let’s use it as a worked example: it illustrates a bunch of things, including how much room for argument there is.

It has two pages. I count 15 words on the first and 5 on the second.

Starting with the easier second page, 006012, I count 1 word for the title “==>”, and 4 for the “END OF INTERMISSION 2.” text.

(I don’t count words in the “[S] ACT 6” link at the bottom, because they will be included as the title in the next page.)

Then on the first page, 006011, I count 4 for the title “[S] Begin intermission 2”. (I choose to include “[S]” as a word. I figure it’s kind of pronounced “sound”.)
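For the titles and the plain text, those numbers match what a simple whitespace split gives, with “[S]”, “==>” and a bare “2” each landing as one token. A small sketch to check that, purely as an illustration (`count_tokens` is a hypothetical helper, not the actual script):

```python
def count_tokens(text):
    """Count whitespace-separated tokens."""
    # str.split() with no arguments splits on any run of whitespace.
    return len(text.split())

for text in ["==>", "END OF INTERMISSION 2.", "[S] Begin intermission 2"]:
    print(repr(text), count_tokens(text))
```

This prints 1, 4 and 4 for the three strings.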

Then I count the words in the code that appears in the Flash:

image

I include this even though it repeats what we already saw on e.g. page 003991, because we haven’t seen the code for ages and I consider it a key part of the narrative in this scene.

image

I count 9 words in the Flash code: “import”, “universe”, “U”, “~ATH”, “u”, “EXECUTE”, the flashing pool ball, “THIS” and “DIE”. Keywords and symbols from code are of course weird edge cases! Especially the flashing pool ball - it certainly isn’t an identifier that I can pass as a parameter in my own coding. But I “transcribed” it, and so that’s what my script ended up counting here.

Lastly there are 2 “HONK”s.

image
image

So adding all that up gives a total of (1 + 4) + (4 + 9 + 2) = 20 words for HSI2.
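The same tally, written out as a quick sanity check. The per-item counts and page numbers are the ones given above; the dictionary keys are just labels invented for this sketch.

```python
# Counts per item, as listed in the worked example.
counts = {
    "006011": {"title": 4, "flash_code": 9, "honks": 2},
    "006012": {"title": 1, "text": 4},
}

# Sum the items on each page, then sum the pages.
page_totals = {page: sum(items.values()) for page, items in counts.items()}
total = sum(page_totals.values())

print(page_totals)  # {'006011': 15, '006012': 5}
print(total)        # 20
```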

Worth noting what I didn’t count: numbers on the two clock faces, and on the many many pool balls. There are places in the story where the pool ball numbers seem like maybe they are part of the narrative, but their flashiness makes them implausible to count, so I just always ignore them.

image
image

(I do count e.g. ticking numbers in countdown timers, though. Like I said: my rules are arbitrary!)
