Friday, January 26, 2024

The Crescent Heart


Today, this 26th day of Childwinter 2024, is known as The Crescent Heart. I have no idea what the name means, although it does sound poetic. A Google search directed me toward recipes for some kind of pastry, which I'm pretty sure isn't what Angus MacLise was thinking. 

For this post, I tried to create an image of a crescent moon inside a human chest where the heart should be, but with little success. I tried several different AI models - DALL-E 3, Midjourney, Stable Diffusion XL, and Playground - and multiple prompts for each, but none generated an image even close to what I had in mind. I got crescents tattooed on chests, I got anatomical diagrams in front of crescent moons, I got crescent moons floating in front of humans with varying levels of internal organs revealed. But I couldn't get a crescent instead of a heart, only various combinations of the two.

It's hard to generate a specific image exactly as you imagined it, but that's part of the fun of using generative AI models. It's a collaborative effort between you, the user, and the model. Between man and machine, as it were. Generally speaking, the embellishments that the AI adds, like the ornamentation along the borders of the picture above, are welcome enhancements to the imagery. And if you don't like the embellishments, you can just try prompting again and see what comes up next.

Yesterday, the New York Times ran an article about AI image generators and copyright infringement. The Times has sued OpenAI for copyright infringement based on its alleged use of the paper's news content. OpenAI developed the DALL-E 3 image generator, which is accessible for free to all users through the OpenAI app and website. Microsoft has also licensed the model and makes it accessible on its Bing website.

But the Times article wasn't about OpenAI or the DALL-E 3 model; it was about Midjourney, separate software from a separate developer that also generates images based on prompts suggested by the user. According to the article, a movie concept artist based in Michigan and a professor at NYU were able to use Midjourney to create images that were very nearly identical to scenes from motion pictures, which of course are protected by copyright. The article claims the generated images are clear evidence of exploitation of intellectual property for which Midjourney is not licensed.

What bothers me is that the prompts used specifically requested a "screenshot from a movie" and a "popular movie screencap" and then they were shocked - shocked, I tell you - clutching their pearls when the AI gave them exactly what they were requesting. 

Putting aside the issue of fair use and whether this is copyright and IP infringement at all, the question is who's the bad actor here - the two who requested a copyrighted image or the tool they used to generate it? If I were to illegally take a photograph of a protected artwork, who is guilty - the photographer or the manufacturer of the camera?

Also, and this may not surprise you, but I used the exact same prompts identified in the article in Midjourney and got different, sometimes very different, results. I can only conclude that the two must have rolled the dice many, many times before they got the close matches shown in the article.  

There is no discussion in the article of how AI image generators actually work. Many online commenters on the article expressed outrage that the models can freely go online, "scrape" the protected content of websites to create a database of images, and then use the collected images to generate their own creations.

But that's not how generative AI works. There is no "database" of collected images, and the generators don't assemble bits and pieces of other images and "photoshop" them together like a collage. What they actually do is far more complex and, frankly, more amazing than that.

I'm no tech or AI expert, but even I know that the models don't "know" or "think about" how to fulfill a prompt, even though we often use those very words to describe the process. Instead, they translate the prompt words into mathematical terms - an embedding - and then perform a learned denoising process called diffusion. Starting from an image of pure random static, the model predicts the noise in the image and removes a little of it, guided by the probabilities it learned for the prompted words, and repeats this process over and over until a final image forms.
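That iterative denoising can be sketched in a few lines of code. To be clear, this is a toy illustration of the general idea, not the actual internals of DALL-E 3 or any real model: the `target` array here is a hypothetical stand-in for the direction a trained neural network would predict from the prompt's embedding, and the step size and loop count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "direction" the denoiser steers toward. In a real diffusion
# model this comes from a neural network conditioned on the text prompt's
# embedding - there is no stored picture being copied.
target = np.full((8, 8), 0.5)

# Start from pure random static, just as diffusion models do.
image = rng.normal(size=(8, 8))

# At each step, predict the noise and remove a little of it.
for step in range(50):
    predicted_noise = image - target  # toy stand-in for the network's noise estimate
    image = image - 0.1 * predicted_noise

# After many small steps, the static has resolved into a coherent image.
final_error = float(np.abs(image - target).max())
print(final_error)
```

The point of the sketch is that nothing is ever cut and pasted: the image emerges gradually out of noise, nudged at every step by what the model learned during training.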

The models "learn" by reviewing millions and millions of images on the internet. The "scraping" is fundamentally the same thing as Google and other search engines do with words.  A human looks at pictures and images and doesn't "steal" but learns millions of different lessons and techniques, which in turn influences (consciously or otherwise) their resulting style. All artists, without exception, learned and are influenced by the artists before them. It's the same with AI, and the data the models train on is so vast that any one artist's entire lifetime catalog of work wouldn't constitute even a tiny percent of the information in these models. 

But the Times article doesn't describe any of that and instead allows literally hundreds of commenters to express their outrage over a misunderstanding of the technology. The article doesn't explicitly promote any disinformation but comes very close to implying the models are illegally storing copyrighted material. It's tempting to think the Times, with its lawsuit against OpenAI, is deliberately trying to stir up public sentiment against AI to support its case.

It doesn't comfort me that I wrote a comment on the article saying the Times should clarify how the models actually work and be more transparent about its role in the lawsuit, and 24 hours later the comment is still "awaiting approval."
