5 Stars of Ambiguity

Originally posted at my Ludum Dare blog

Lots of blog posts this morning about ratings.

I wrote this post last week when I got started reviewing the LD24 entries. Mostly it was advice on how to make YOUR game if you want me to rate it higher. Partly it was advice on how to look at a game when giving it your rating.

I recently posted a comment on Oogby’s blog, where he was talking about his own feelings about rating LD games:

Ludum Dare ratings are highly subjective. It shouldn’t matter because the weight of averages should (in theory anyway) correct for bad judges, even if everyone is a bad judge. [This works better in theory than in practice, but generally speaking if you really could get all 1400 of us to rate every game, it would work pretty well in practice.]

It’s not a perfect system and it’s not intended to be. There’s almost nothing at stake beyond how much value you personally place on the rankings, so it’s not a big deal. It’s a pragmatic way to recognize the better efforts among the pack of entrants, and it generally works well. Sure, if you wanted to quibble about the #7 game being inferior to the #8 game in some category, I’m sure that happens. That’s why we all get to apply our own rating: we can rank however we want, secure in the knowledge that we know best. The overall ranking scores merely tell us what the participants as a whole felt about the games they rated.

Not everyone rates every single game (most, in fact, do not), and it’s easy to overlook great games. I didn’t even find most of the best games from LD23 until after the rankings were published.

It’s best not to take rating (either rating someone else’s game, or how others rate your own game) too seriously. They’re just opinions, after all. You can use them to guide how you think about the objective quality of your game if you want, or you can discard them entirely, or rage at them, or anything you want to.

I think that’s worth keeping in mind.
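The idea behind the “weight of averages” can be sketched in a few lines of Python. This is a toy model under loud assumptions: each hypothetical judge sees a game’s “true” quality plus a random error of up to 1.5 stars in either direction, centered on zero, and rounds to a whole star. Averaging cancels that kind of error as the number of judges grows, but it would not cancel a bias everyone shares, such as a collective reluctance to give 1 star.

```python
import random


def noisy_rating(true_quality: float, rng: random.Random, noise: float = 1.5) -> int:
    """One hypothetical 'bad judge': true quality plus zero-centered uniform error,
    rounded to a whole star and clamped to the 1..5 range."""
    raw = true_quality + rng.uniform(-noise, noise)
    return max(1, min(5, round(raw)))


def average_rating(true_quality: float, judges: int, seed: int) -> float:
    """Average of `judges` independent noisy ratings for one game."""
    rng = random.Random(seed)
    scores = [noisy_rating(true_quality, rng) for _ in range(judges)]
    return sum(scores) / len(scores)


# Compare a "true 4-star" game with a "true 3-star" game as the number of judges grows.
for judges in (3, 30, 1400):
    better = average_rating(4.0, judges, seed=1)
    worse = average_rating(3.0, judges, seed=2)
    print(f"{judges:>5} judges: 4-star game {better:.2f}, 3-star game {worse:.2f}")
```

With only three judges the two games can land close together, or even swap places, depending on the seed; with 1400 the averages settle near the true values.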

When I was younger, I would have felt very strongly that the ratings meant something objective and quantifiable. I now see that view as naive: the ratings are just a tool. An imperfect tool, not something written in stone, and not objective in any way. They’re the aggregated consensus of our individually limited opinions. We can be wrong or right about any particular factor that we base our ratings on. We can have different tastes. We can see things others miss, and miss things others see.

Nevertheless, when I give a star rating in some category for a game I’m rating, I need to know what that rating means. It may well mean something completely different to everyone else who rates the same game, but I need to know what I mean by it. I expect there’s some overlap, of course, but probably quite a bit of variance too.

It’d be controversial for me to say “you’re all rating wrong, here’s how to do it.” I’m not saying that at all. I just want to share my viewpoint and my thoughts. You may agree or not, but whether you agree matters less to me, and is less interesting, than whether you choose to engage me with your thoughts, so that I can be shaped in turn by what I think about them and how I choose to respond. I’m always looking for new ways to think about things, hopefully to improve myself, but also to understand others.

So, then: What the hell is a “star”?

Does “star” have an inherent meaning?

Stars sound like good things. We have five-star movies, hotels, and generals, and by and large we think highly of these things, or at least respect them.

But what about other symbols? We don’t use them, but I could conceive of a strange rating system that would let me grant all the Lucky Charms symbols, not just stars. Maybe a mix of stars and hearts, or maybe all the Zapf Dingbats. I kind of want to have a few airplanes and boat anchors in my ratings.

That’s absurd, isn’t it? My point is that these things are just tokens. They don’t stand for anything; they’re just things to be counted. Sure, we could invent meaning: the stars mean talent, the hearts mean passion, the boat anchors mean something else, I don’t know… but that isn’t how the rating system is set up, is it?

We keep it simple so that we can understand what a rating means quickly, at a glance. If we wanted to take a lot of time to mull over a complex rating, we might as well take that time to experience the game directly. Simple rating scores sacrifice nuance, detail, and precision, and that’s okay. It’s more than okay; it’s a feature.

We add complexity back in by having multiple rating categories. That way we can fairly assess specific aspects of our games. But we don’t want so many categories that the ratings become too complicated again.

Minimum rating?

Even a 1-star general is still a general. We don’t think they’re a “bad” general. They’re just a lower rank than a 2-star general.

But when we think of a 1-star hotel, restaurant, or movie, we think it’s to be avoided. A videogame is more like a movie than like a military officer, so it’s tempting to discard the notion that a 1-star rating could mean something is good. Some people really shy away from using the 1-star rating (or sometimes the 5, or both), preferring to reserve these for the ultra-rare games that “truly” deserve the highest or lowest rating. Effectively, they constrain themselves to a 4- or even 3-star system, which, once you realize it, is pretty silly.

Personally, I believe that unless you’re willing to use the full range of the rating system, you’re not using the system correctly. For me, that means I have no problem assigning 1 star to a game if I think it deserves 1 star. I don’t take into account whether the developer never programmed before, or whether this was their first game; in fact, I expect those entries to be of lower quality and to get a lower rating.

I hope this doesn’t discourage anyone from continuing to make games and trying to do better each time. We all start out sucking. We enjoy what we do, we have some success at it, we keep doing it, and we get better. A low ranking shouldn’t be interpreted as “give up, find something else”; it should mean “keep trying, learn, and get better.” Of course, some people may give up and find other things to do. But that decision should come from within, not be driven by what other people think.

I have a sense that some reviewers may be uncomfortable giving a “poor” rating to a game. I’m probably the opposite: I’m stingier with my high ratings. But I do use the full scale. I expect most games to be fairly low. After all, we only had 48 hours to develop them; how good can they really be? (Surprisingly good, in some cases, and these are the ones that get 4 and 5 stars.) But when I apply ratings, I rate the games as games, not as games that were developed in 48 hours. My thesis is that a game developed in just 48 hours can be awesome. There have been a few that I enjoyed every bit as much as the best games of all time. That’s really amazing when you think about it, but it’s true.

At its core, a 5-unit ranking system is just a 5-unit ranking system. How we choose to interpret the numbers can vary. Consider the following 5-value series:

  • 1 2 3 4 5
  • -2 -1 0 1 2
  • F D C B A

They mean different things, don’t they?
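Mechanically, though, the three series are interchangeable: each is the same five ordered positions under a different label, and converting between them is trivial. Only the connotation a rater attaches to each label differs. A minimal Python sketch; the function names are hypothetical, chosen for illustration:

```python
# The same five ordered positions, relabeled three ways.
LETTERS = "FDCBA"  # F is the lowest position, A the highest


def as_offset(stars: int) -> int:
    """Map 1..5 stars onto the centered -2..2 series (3 stars becomes 0)."""
    return stars - 3


def as_letter(stars: int) -> str:
    """Map 1..5 stars onto the F D C B A series."""
    return LETTERS[stars - 1]


for stars in range(1, 6):
    print(stars, as_offset(stars), as_letter(stars))
```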

Well, I tend to look at the rating system as 1, 2, 3, 4, 5.

A person who tends to avoid giving out one-star ratings probably interprets the system as -2 -1 0 1 2. To them, giving a -2 feels negative, and they don’t want to be negative and discourage someone who put their heart into their project, and it was the best they could do, just not very good. So they subconsciously weight the rating “relative to what I think the developer’s ability must have been”. And unfortunately, since we are comparing the games against each other, this only skews the rankings.

Someone looking at it like the American primary-school letter-grade system has yet another way of seeing it: A is 90-100%, B is 80-89%, C is 70-79%, D is 60-69%, and F is 59% or below. Or perhaps they “grade on a curve,” adjusting the rating they give each game relative to the ratings they’ve already given other games. They want to “normalize” the numbers so that the “average” score is a C, with the other letter grades spread around it in a bell curve (a normal distribution).
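For the letter-grade view, the percentage bands above can be written as a simple lookup. This is a hypothetical sketch: treating each band’s lower bound as an “at or above” threshold, and translating letters to stars with F = 1 up to A = 5, are assumptions about how such a rater might think, not anything Ludum Dare defines.

```python
# Letter-grade bands from the text: (lowest percentage for the grade, letter).
# Anything below 60% falls through to F.
BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]

# Hypothetical translation a grade-minded rater might use: F = 1 star ... A = 5 stars.
STARS = {"F": 1, "D": 2, "C": 3, "B": 4, "A": 5}


def letter_grade(percent: float) -> str:
    """Return the letter grade for a percentage score."""
    for cutoff, letter in BANDS:
        if percent >= cutoff:
            return letter
    return "F"


for percent in (95, 85, 72, 64, 59):
    grade = letter_grade(percent)
    print(percent, grade, STARS[grade])
```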

My point is that anyone who thinks about the 5-tier rating in one of these ways, or in yet another way, can be at least somewhat justified.

I would, however, strongly advise against trying to “grade on a curve” because it’s impossible to know what the curve should be until you’ve assessed every single game.
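Here is why the curve can’t be known up front: a curved grade is relative to every other score you’ve given, so the same game’s curved grade shifts as you rate more games. A hypothetical Python sketch, using a z-score (distance from your running average, measured in standard deviations) as the curve; the scores are made-up numbers for illustration, not real LD data.

```python
from statistics import mean, pstdev


def curved_score(score: float, scores_so_far: list) -> float:
    """Z-score of one raw score relative to every score the rater has given so far."""
    sd = pstdev(scores_so_far)
    return 0.0 if sd == 0 else (score - mean(scores_so_far)) / sd


game_x = 3.0
early = [2.0, 2.5, game_x]            # only three games reviewed so far
later = early + [4.5, 4.0, 5.0, 4.5]  # a few strong games turn up afterwards

print(round(curved_score(game_x, early), 2))  # game X looks above average...
print(round(curved_score(game_x, later), 2))  # ...then below average, with the same raw score
```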

Anyhow, based on my way of looking at it, you shouldn’t feel bad if you get a 1 or 2 rating. That’s still better than zero, right?

Zero Ratings

In some sense, 0 might be the lowest possible rating. But that’s not really true. 0 really means “not applicable”.

It’s hard to say someone did a bad job at [category] if they weren’t trying for that category.

But it’s still ambiguous. A zero could also mean “I don’t know” or “I forgot to rate this category.” So I don’t like to encourage thinking of 0 as “lowest”. I don’t know how they do the math when they calculate the rankings, but I hope that 0s don’t get counted against a game.
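To see what’s at stake, here is a hypothetical comparison of two ways a category average could treat zeros. This is not Ludum Dare’s actual formula, which I don’t know; it only shows the difference between skipping 0 as “not applicable” and counting it as the lowest score.

```python
from typing import List, Optional


def average_ignoring_zeros(ratings: List[int]) -> Optional[float]:
    """Treat 0 as 'not applicable': leave it out of the average entirely."""
    counted = [r for r in ratings if r != 0]
    return sum(counted) / len(counted) if counted else None


def average_counting_zeros(ratings: List[int]) -> float:
    """Treat 0 as the lowest score: it drags the average down."""
    return sum(ratings) / len(ratings)


audio = [4, 0, 5, 0, 4]  # made-up ratings; two raters left the category at 0
print(average_ignoring_zeros(audio), average_counting_zeros(audio))
```

Under the first approach the game keeps an average above 4 stars; under the second, two “not applicable” votes pull it down to 2.6.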

My 5-scale

Here’s how I do it:

0 = not applicable. You didn’t try to do anything for this category, and it wouldn’t be fair to rate the game poorly based on it not having [category].

1 = Well, you did the minimum. At least there’s something. OR, the game sorely lacks [category] and needs it. It’s one thing if you make a silent game because you are using silence as an element of the game. It’s another if you made a game that really should have sounds, but doesn’t, because you couldn’t fit the sound features into the time allotted. But probably all you managed here was a basic implementation, maybe some kind of placeholder content or “hello world” level implementation. Maybe it’s buggy, or maybe it’s just an idea that didn’t pan out, or maybe it’s something that had potential but needed lots more polish and balancing to make it work well.

2 = It needs something more to feel “finished” or “good”, but it’s more than just “placeholder content.” For a 48 hour project it’s enough to get by, and is probably all the average hobbyist developer can reasonably manage in that amount of time if they’re not a professional [programmer|designer|artist|musician|sound engineer|whatever]. Definitely don’t feel bad if you get a 2-star rating from me!

3 = Pretty solid, it’s evident that time was put into it, and you have some idea what you’re doing, and are talented or have a knack for good taste or good decision making. There may be flaws, and some additional polish would probably help, but as is, this is pretty good.

4 = Well conceived and well executed. Probably well balanced and consistently high quality across the whole of the game, too. Genuinely fun. At this point it no longer feels experimental.

5 = Amazing! Your game stands up well against anything I’ve ever played. Professional quality. If I’m rating graphics, this does NOT mean Crysis. Asteroids has 5-star graphics, too. It means that your graphical style works very well and as a cohesive whole makes your game look awesome. There are MANY ways to look awesome. Just as there are many ways to excel at all of the other categories.
