Tag: AI

ZORK|DALL-E

I decided to play Zork through DALL-E on Twitter, and created a new account for it.

Follow along, if you like!

Artist’s Statement

I’m using a walkthrough, because I can’t remember how to solve all the puzzles 42 years on, not that I ever DID solve all the puzzles back in the day anyway!

I was all of 8 or 10 years old when I first played Zork, I think, in 1982 or ’84, on a PC at a friend’s house that I didn’t have infinite time on, so give me a break. I also had a Choose Your Own Adventure-like book adaptation of the game, which I “played” through many times when I was the same age. It wasn’t published by the CYOA people, but it was the same concept — Choose Your Own Adventure stories are basically a print form of the text adventure computer game in many ways, although a bit more limited in the choices the player can “input”. Zork was one of the most popular home computer games in the early 80s, at a time when tabletop role-playing games like Advanced Dungeons & Dragons were at one of their early peaks in popularity. Together these games created the cornerstones of the geek subculture, a movement which has blossomed and thrived since then, particularly as the internet took off in the 1990s.

DALL-E is a text-to-image AI developed by OpenAI that uses natural language prompts to generate high quality images. It’s been growing in popularity in recent weeks as the internet has begun to discover and share the images it creates. The Twitter account @weirddalle is worth a follow, if you like that sort of thing. (Which, who possibly couldn’t?)

I’ll be feeding the text of Zork into DALL-E as input, and the resulting images are what I’ll tweet, along with the text from the walkthrough, on the project’s Twitter account.

I get rate limited to 50 generations every 23.5 hours on DALL-E, so each time I hit my limit, I’ll have to take a break. Accordingly, I expect this will take a few days or weeks to complete. It’s also possible that I could run afoul of DALL-E’s anti-abuse filters with some parts of the game, and if that happens I will fail over to Craiyon, the DALL-E Mini AI. It doesn’t generate images that are as good, but it’ll do as a backup.

I’m really pleased with this project. It’s simple and the execution is easy, but it’s fun, and I feel like a creative guy just for having the idea to do it. Simple ideas really make me happy.

Taking Zork, one of the earliest home computer text adventures, released some 42 years ago, feeding it into an AI-based text-to-image system, and seeing what it outputs for illustrations seems like the funnest, coolest thing you could do, and a great way to tie the cutting edge of technology back to some of its early roots.

Not all of the images DALL-E will generate will be accurate to the game, and that’s OK. It’s fun just to see what it comes up with, using the sparse descriptions that the game gives. Most of Zork took place in your imagination, and so we get to see what an AI might imagine.

The drawback of this process is that DALL-E doesn’t remember, from one run to the next, the context of previous events in the game, so in many cases it will forget things it should be aware of, resulting in some odd continuity. But that’s not the point, of course. The point is to do something fun with technology, playing with it to see what happens.

If you want to play Zork for yourself, you can do that! It’s free to play in your browser through an embedded DOSBox emulator.

DALL-E mini… legit or cheat?

DALL-E mini is a scaled down implementation of DALL-E, a neural net AI that can create novel images based on natural language instructions. I’ve been having fun trying it out.

I like pugs, so I have told DALL-E mini to make me a lot of pug pictures of various types. A lot of the results look at least halfway decent.

The Twitter account @weirddalle posts a lot of “good” (amusing) DALL-E results, most of which, I think it’s safe to say, are novel creations: not something you would expect to find many (if any) examples of in a Google Image search (although, who knows, the internet is a really big, really weird space).

And then I asked DALL-E mini for “a great big pug” and the mystique unraveled for me. I could recognize a lot of familiar pug photos from Google’s image search results page. I tried to go to Google to find them, but the results it gives me now are a bit different; I’d guess that the images in the DALL-E screen cap below would have been the top hits for “pug” in Google Image search several years ago.

The four in the upper right corner look especially familiar, as though I’ve seen those four images in that arrangement many times before (and I believe I have, from searching Google for pug pictures many times). I feel very confident that I have seen images very close to all nine of them before. Of course now, in 2022, if I search Google for images of pugs, I get different results. But I’ve been a frequent searcher of pug pictures for 20 years, and I’m pretty confident that, perhaps 5 or 10 years ago, most of these 9 images were first-page results in Google Images for “pug”.

So, this makes me wonder, is DALL-E merely a sophisticated plagiarist? Is it simply taking images from Google and running some filters on them? For a very generic, simple query, it seems like the answer might be “maybe.”

DALL-E mini’s source code is available on GitHub, which should make answering this question somewhat easy for someone with expertise in the programming language and in AI. But I probably don’t have much hope of understanding it myself if I try to read it. I can program a little bit, sure, but I have no experience writing neural net code.

My guess is that DALL-E does some natural language parsing to guess at the intent of the query, and tokenizes the full query to break it up into parts that it can use to search for images, very likely using Google Image search. Then it (randomly? algorithmically?) selects some of those images, and does some kind of edge detection to break down each image’s composition into recognizable objects. We’ve been training AI to do image recognition by solving captchas for a while, although most of that seems to be aimed at creating an AI that can drive a car. But DALL-E has to recognize which elements in the Google Image search results match the tokens in the query string. Once it does so, it “steals” those elements out of the original Google images, combines them with recognizable objects pulled from other images matching other parts of the tokenized query, composites them algorithmically into a new composition, and then, as a final touch, may apply one of several photoshop-style filters to give the result the appearance of a photograph, painting, drawing, or what have you.
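Just to make that guess concrete, here’s a rough sketch of the pipeline I’m imagining. To be clear, this is pure speculation on my part, not anything taken from DALL-E’s actual code, and every function in it is a made-up placeholder:

```python
# Purely speculative sketch of the "sophisticated plagiarist" pipeline
# described above. None of this is taken from DALL-E's real code; every
# helper here is a hypothetical placeholder.

def tokenize(query: str) -> list[str]:
    # Break the natural-language query into searchable chunks.
    return query.lower().split()

def search_images(token: str) -> list[str]:
    # Hypothetical: fetch candidate images for a token (e.g. via an image
    # search). Returns placeholder "image" identifiers here.
    return [f"{token}_result_{i}" for i in range(3)]

def extract_matching_element(image: str, token: str) -> str:
    # Hypothetical: edge-detect / segment the image and keep only the part
    # that looks like the token ("steal" the recognizable object).
    return f"{token}-shaped region of {image}"

def composite(elements: list[str]) -> str:
    # Hypothetical: arrange the stolen elements into one new composition.
    return " + ".join(elements)

def apply_style_filter(image: str, style: str = "photo") -> str:
    # Hypothetical: final pass to make it look like a photo, painting, etc.
    return f"{style}-filtered({image})"

def imagined_dalle(query: str) -> str:
    elements = []
    for token in tokenize(query):
        candidates = search_images(token)
        elements.append(extract_matching_element(candidates[0], token))
    return apply_style_filter(composite(elements))

print(imagined_dalle("a great big pug"))
```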

These results can be impressive, or they can be a total failure. Often they’re close to being “good”, suggesting a composition for an actually-good image that would satisfy the original query, but only if you don’t look at it too closely, because if you do, you just see a blurred mess. But perhaps if DALL-E were given more resources, more data, or more time, it might make these images cleaner and better than it does.

(Mind you, I’m not saying that using the set of data that is Google Image search results is the cheating part. Obviously, the AI needs data about the world to apply its neural net logic to. But there’s a difference between analyzing billions of images and using that analysis to come up with rules for creating new images from a natural language text query, versus simply selecting an image result that matches the query, applying a filter to disguise it as a new image, and calling it a day.)

So, when you give DALL-E very little to work with, such as a single keyword, does it just give you an entire, recognizable image from the Google Image search results, with a filter applied to it?

Is it all just smoke-and-mirrors?

I guess all of AI is, to a greater or lesser degree, “just smoke and mirrors”, depending on how well you understand smoke and mirrors. But the question I’m really trying to ask is “just how simple or sophisticated are these smoke and mirrors?” If it is easy to deduce the AI’s “reasoning method”, then maybe it’s too simple, and we might regard it as “phony” AI. But if, on the other hand, we can be fooled (“convinced”) that the AI did something that requires “real intelligence” to accomplish, then it is “sophisticated”.

I really enjoy playing around with DALL-E mini and seeing what it can do. It is delightful when it gives good results to a query.

For example:

DALL-E: “A group of pugs having a tea party”
DALL-E: “A world war one doughboy who is a capybara” didn’t quite work, but I’m giving it points for trying. I wanted an anthropomorphic capybara in a WWI uniform. Perhaps I should have asked more specifically.
DALL-E: “Pugs at the last supper”
DALL-E: “Pug Da Vinci”
I think DALL-E gets Da Vinci pretty well. The Puga Lisa?
I like these so much.
DALL-E doesn’t do a bad Klimt, either.
DALL-E: “A beautiful stained glass window of a pug”
DALL-E: “A mosaic tile floor depicting a pug”

I would proudly hang any of these cubist pug paintings in my house.

I think this puggy spilled the paint before he could get his portrait finished.

Thinking about a human-like AI for playing Scrabble

[I got into playing Words With Friends on Facebook and my mobile phone back in 2012, and started writing a lengthy article on designing an AI to play scrabble-like games in a manner that convincingly simulates a learning human. This weekend, several years later, I’m a spectator at a local Scrabble tournament, and decided to finally finish up my thoughts.]

Designing AI for Scrabble-like games

I’ve been playing the Zynga game Words with Friends with various people for a few weeks, and have gotten progressively better at the game. After looking back and reflecting on the evolution of my play, and the development of my strategy, I became inspired by the idea of a convincingly human-like AI that embodied the various stages of my development as a player.

While actually programming it is a little more effort than I want to put into it, even just thinking about the design for such an AI is interesting.
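Just to give a flavor of what I mean, here’s a toy sketch (something I’m making up purely for illustration, not a real Scrabble engine): the AI finds its candidate plays the same way at every level, but a “weaker” version of it only notices, and is only willing to take, a smaller and less ambitious slice of those plays.

```python
import random
from dataclasses import dataclass

@dataclass
class Move:
    word: str
    score: int

def choose_move(legal_moves: list[Move], skill: float, rng: random.Random) -> Move:
    """Pick a play the way a human of a given skill level might.

    skill runs from 0.0 (beginner) to 1.0 (expert). A beginner only
    "notices" a few of the available plays and rarely finds the best one;
    an expert sees most of the board and usually takes the top option.
    """
    # Weaker players effectively see fewer of the legal moves.
    visible_count = max(1, int(len(legal_moves) * (0.2 + 0.8 * skill)))
    visible = rng.sample(legal_moves, visible_count)
    # Sort what they noticed, best first, then occasionally settle for less.
    visible.sort(key=lambda m: m.score, reverse=True)
    if rng.random() < skill:
        return visible[0]          # plays the best move it found
    return rng.choice(visible)     # settles for something it merely spotted

moves = [Move("QI", 22), Move("JOKER", 48), Move("CAT", 9), Move("ZA", 31)]
rng = random.Random(1)
print(choose_move(moves, skill=0.2, rng=rng))  # beginner-ish
print(choose_move(moves, skill=0.9, rng=rng))  # expert-ish
```

Raising the skill parameter over the course of a match (or between matches) is the sort of thing that could simulate the learning curve I went through myself.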


AI_targeting debrief

First, who would have thought that one small component of AI behavior for my game would have taken so long to get working?

I was on a good roll, making steady progress on my project for most of December. Then the holidays hit and I couldn’t work on the project as much as I wanted. I had also just started to run into some stuff that was a little tricky (not that it was really hard, just that it was new to me) around this time, so the lack of putting time into it also made me feel nervous that I’d get stuck. There’s no way I’m ever giving up this project until I complete it, and that’s that, but I’ve run into problems in the past with projects where I get stuck, don’t know where to turn, and it sucks a lot. Oftentimes that puts the entire project at risk. But this is a project that I’ll never accept failure on — I’m working on an idea I had 30 years ago, and if it’s been in my head that long, and not gone away, it never will.

So, into January, I had less time than I’d hoped to get back into the project. When I did, I wanted to make the time productive, so I tended to pick things that I knew I could do, and that needed doing, but not necessarily the thing I’d gotten stuck on. That’s OK, but normally when you see that something is going to be hard to figure out, you should wade into it and tackle the problem. I didn’t do that; instead, I tried an idea a little bit, and when it didn’t do what I was expecting, I put it aside again and worked on something where I had more traction. I had a fatalistic sense of “When I am ready for this to make sense to me, it will.”

Also, during a lot of this time I was spending a lot of my project time on reading documentation, not coding. It was a struggle to make sense of what I was reading. My mind kept tripping up on something that didn’t make sense to me, and which in the end turned out to be inaccurate (unless I *still* misunderstand something, but I don’t think so). So that wasn’t too helpful.

In the reading that I did, I discovered a lot of things that merited further reading, and had to trace down a lot of avenues that potentially could have led to my solution, but didn’t. This wasn’t wasted time, though, because a lot of that stuff may end up becoming useful later, and having a clue that it’s out there is going to be helpful down the road.

Ultimately, I was able to prevail over my problem, get un-stuck, and deliver a working proof of concept. I need to do some further work to turn this proof of concept into an extension that I can import into any future Game Maker project that I work on, and from there I still need to bring it into my game project. But that’s all academic, and I have no doubt that I will get it done, and so I’m able to confidently declare victory at this point.

My initial attempts to implement the solution I was after focused on doing it directly in the current game project. I’ll call that a mistake now. For one, the existing game already has a lot of stuff in it, and that complexity makes it difficult to see (or think about) any new problem clearly. I had several false starts trying it this way, all of which ended up failing.

Eventually, I got to the point where I recognized that what I needed to be able to solve the problem was simplicity. So to get that, I started a new project, and threw into it just enough bare bones to provide me with the building blocks I needed to test out the AI code that I was trying to figure out how to write.

So I did that. Twice. The first time was almost right; the second time was right, at least as far as it went, and I’d figured out enough to know that what I’d built there would work for what I need, though I still need to do the rest of it back in the main project. The first attempt helped me figure out what I was doing wrong, or rather, what I needed to do.

So, that exercise was very beneficial. The second attempt only took me about 5-6 hours of hacking away at it to get it to work, which is about par for every other feature that I’ve committed in the project so far. So the fact that it took a few weeks of thinking, procrastinating, reading, and trying various things doesn’t worry me so much. I know the next time I get stuck with a problem like this, I’ll get to the solution that much sooner because I can take this general approach to it.

What was the most useful for me in solving this was the stuff I built into the project to provide me with feedback so I had something to diagnose. I strongly recommend building instrumentation and logging capabilities into whatever code you write. Otherwise, you’re only able to see what you can observe from the outside, which often ain’t much, and is apt to be very confusing when the application is behaving in some bizarre, unexpected way that you can’t figure out based on what you thought your instructions were saying to the compiler or interpreter.
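The project itself is in Game Maker, but the advice is language-agnostic. As a trivial illustration of what I mean (in Python, just as an example, with a made-up log file name), even something this simple makes a big difference when what’s happening on screen doesn’t match what you thought you told the machine to do:

```python
import logging

# Send debug output both to the console and to a log file you can read later.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(message)s",
    handlers=[logging.StreamHandler(), logging.FileHandler("ai_targeting.log")],
)
log = logging.getLogger("ai_targeting")

def update_target(enemy_pos, player_pos):
    dx = player_pos[0] - enemy_pos[0]
    dy = player_pos[1] - enemy_pos[1]
    # Log the intermediate values you *think* the code is computing, so you
    # can compare them against what actually happens on screen.
    log.debug("enemy=%s player=%s dx=%s dy=%s", enemy_pos, player_pos, dx, dy)
    return dx, dy

update_target((10, 20), (40, 5))
```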

2D Targeting for AI in Game Maker 8

After several weeks of effort, I have finally nailed an effective set of 2D targeting scripts for AI in Game Maker 8.

The story for this is worth telling sometime, but for now I’ll just be posting a video demo:

Source .gmk is available on Releases.

I’ll be refactoring this into a Game Maker Extension (.gex) soon as well, which will also be available along with full source.
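For anyone curious what “2D targeting” means in practice, one common formulation (sketched here in generic Python for readability, not the actual GML from the project) is to predict where a moving target will be and aim at the intercept point rather than at its current position:

```python
import math

def intercept_point(shooter, target, target_vel, projectile_speed):
    """Return the (x, y) point to aim at so a projectile fired from `shooter`
    meets a target moving at constant velocity, or None if it can't catch up.

    Solves |target + v*t - shooter| = projectile_speed * t for the smallest
    positive t (a quadratic in t), then returns the target's position at t.
    """
    rx, ry = target[0] - shooter[0], target[1] - shooter[1]
    vx, vy = target_vel
    a = vx * vx + vy * vy - projectile_speed ** 2
    b = 2 * (rx * vx + ry * vy)
    c = rx * rx + ry * ry

    if abs(a) < 1e-9:                     # projectile and target same speed
        if abs(b) < 1e-9:
            return None
        t = -c / b
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return None                   # projectile can never intercept
        roots = [(-b - math.sqrt(disc)) / (2 * a),
                 (-b + math.sqrt(disc)) / (2 * a)]
        positive = [t for t in roots if t > 0]
        if not positive:
            return None
        t = min(positive)

    if t <= 0:
        return None
    return (target[0] + vx * t, target[1] + vy * t)

# Example: shooter at the origin leading a target that is moving left.
print(intercept_point((0, 0), (100, 50), (-20, 0), 60))
```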