AI-Generated Content is Vandalizing the Web - A Proposal for Resistance

Posted January 28, 2024

Earlier this week, I was asked to help a colleague replace the pneumatic cylinder in an office chair. I had never done this before, and it turned out to be quite a challenge. The cylinder is held in place by friction, and it is meant to be gripped tightly enough that one can’t simply pull it out. After pounding on it from various angles with a mallet, I decided it would be wise to consult the Internet for better advice. I typed “how to remove pneumatic cylinder from an office chair” into DuckDuckGo and got the expected page full of results. Unfortunately, it became clear that the majority of those results were poorly crafted pages full of AI-generated garbage text. Redundant details repeated over and over. For example:

First and foremost, you’ll need to gather the necessary tools and supplies for the job. This includes items like a replacement cylinder, a wrench, and possibly a rubber mallet.

Then one paragraph later:

To replace the pneumatic cylinder in your office chair, you’ll need a few tools and supplies.

A few paragraphs later:

To replace the pneumatic cylinder, you will need a few tools such as a wrench, pliers, and a rubber mallet.

Reading the entire instruction HERE, you will see that this article doesn’t actually describe how to correctly replace a pneumatic cylinder that is held in place with friction. It appears to describe replacing it with a cylinder that includes an entire new mounting bracket to attach to the bottom of the chair. This is my best guess anyway, as it tells you to remove bolts and references the cylinder’s “adjustment lever,” which is not a part of the actual cylinder itself. To my knowledge, replacement cylinders are not sold with the brackets, since these can vary wildly depending on the chair. Amusingly, the article includes a video (clearly made by another publisher) that shows the correct way to do this: twisting the cylinder out with a large wrench.

You may want to dismiss this as a petty complaint; the poor quality of the article is ultimately the responsibility of the human who chose to upload this junk without any editing to fix the redundancy or even basic research to verify whether the instructions are correct. But I say that responsibility was already abdicated when they chose to use an automated workflow to generate the content for the page. Look HERE and HERE to see exactly how these automation techniques are being taught and encouraged. It’s possible to generate huge amounts of content this way, with very little intervention or editing from humans. The goal is to reduce the need for human input to its barest minimum and, based on the results I see, I do not doubt that most people using this technique scarcely give the content a glance either before or after it has been posted.

But who cares, right? I still got the info I needed in the end, so what difference does it make? The Web has always been filled with junk, etc., etc. True, since the Internet explosion of the mid-1990s, it has been populated with countless websites of dubious quality and content. The difference now, and my chief criticism, is that junk generation has become so highly automated, and has accelerated so much, that it overloads search engine results with trash, and we waste time sifting through that trash trying to find something of value. It strikes me as an acute problem because this simply wasn’t the case 10 years ago. Remember when you could type a search into Google and actually get some good results right there on the first page? Then the first page became mostly ads. Now the first page is ads plus AI trash. Bing has decided to skip the middleman and just serve up some AI-generated nonsense of its own instead of actual links to pages, and Google is set to do the same, with Chrome soon to have powerful new AI content generation built in.

They seem to envision a future where the Internet is no longer an expanding web of static-yet-living sites, representing an ever-growing repository of information, but is instead something generated on demand. If you think about how AI-based generation works, this should stop you in your tracks. AI content generation doesn’t represent a growing body of human knowledge. It is generated from a static model that represents only a snapshot of a given moment in time. If a model was made on April 5, 2017, all the content it generates is locked to the world as it was at that moment. So, of course, new models must be generated continuously. But what happens when so much of the Internet’s content is AI-generated that it becomes incestuous and cannibalistic? We know the consequences when humans do that, and I think the analogy here is apt. This leads to the second big problem with AI models: the quality of the content that trains the models is absolutely critical, and so is the quality of the way the model is trained. This has been shown time and again with embarrassing episodes of AI generating racist and outright incorrect content.

They’ve been using the whimsical term “hallucinations” for this phenomenon, but this is misleading. A hallucination is caused by incorrect processing forming an illusion of reality. But these results aren’t an illusion from incorrect processing; the processing is performing exactly as it should. The problem is poor models from poor training. And as the available content degrades further and further, the models will only become worse. The first generations of models scraped all manner of data wholesale with no regard to ownership or copyright. This was done by academic researchers who only needed massive datasets to build their experimental models; they weren’t building a product to sell, yet. With that in mind, I personally don’t take issue with the use of copyrighted material for the purpose of research models. But the problem came when many of these models then got pressed into commercial service without being remade with consideration for intellectual property rights. I’m not going to dwell on the topic of IP rights and the creation of the large language models; that is a topic that deserves its own essay. What needs to be kept in mind is this: since the use of these IP-violating models has created such a pushback, it is inevitable that the quality of material used to train them will become poorer, since creators will now demand just compensation for the use of their work in training and will be keenly aware of the possibility of their work being used. The endgame of incestuous cannibalism may come much sooner than it would have through natural web scraping alone.

In response, I propose that we fight back against the AI-induced destruction of the Web by defiantly declaring our content to be generated solely by Humans. Resist any temptation to automate content generation in any way. Display the “Human-Made” label proudly and demand others do the same. Actively search for Human-Made websites and refuse to patronize those that are not. The next, more radical, step would be the creation of a new search engine that would show only Human-Made sites in its results.

Writing is magic. It allows us to transfer thoughts from our own minds directly into the minds of others; worlds that we conjured up from nothing suddenly exist, duplicated in the imagination of all who read the words. Unlike speaking, writing allows us to do this over unfathomable distances of time and space. It is the single most sacred art that has been cultivated by Humanity for millennia. We profane it at our own peril.

Author Martin C.
Categories AI, Essays

The Flashing Cursor

A Human-Made Website
