Why We Need Web 5.0 Before Web 4.0
A draft protocol aimed at saving our cognitive legacy and future from Generative AI
Web 3 represented a decentralized web (blockchain). Web 4, which is not here yet, might represent a better and more nuanced version of it.
As of today, Wikipedia does not define Web 5.0, so I use the term to represent my idea.
Having dealt with the nomenclature, let me expand upon an existential problem that Web 5.0 will try to mitigate: How to defend human-generated content from AI.
Content creation involves three distinct phases: Information gathering, analysis, and synthesis.
The conventional web solved the information-gathering problem. Anyone could access anything, provided they knew the right kind of search query.
However, analysis and synthesis (and their relative degrees) distinguished creators from consumers.
With generative AI, that distinction is being erased.
There are good uses for it, though. Frankly, being a programmer, I can’t appreciate it enough. So before you put me into one of those AI-opposing camps, let us begin to understand the most basic implications of generative-AI-led businesses.
Let’s say you start a recipe website:
In the times of the conventional web, you would at least have to be a foodie, if not a chef.
You would spend serious time templatizing what constitutes a good recipe. You might also spend some time scouring exotic recipes that your competitor websites don’t have.
Once a basic framework is laid out, you would outsource the mundane search-write-publish cycle and get your content mill running.
In times of Generative AI, you don’t need to be anything related to food-making. Passion for your creation was the most needed ingredient (pun intended), which is obliterated now. Prompting about a food recipe isn’t the same as writing it from scratch.
Without much knowledge about food, writing, or programming, you can have a recipe website up and running, in a day. You don’t need anyone to maintain it.
Some people will be jobless: Those who did the brainless work of publishing copy-paste recipes and maintaining a small business website. I am not worried about them, though. They will be forced to up their game and probably succeed in it.
My side is clear: Generative AI is the rightful winner here; it took the messy and meaningless out of the process.
Let’s say you are a poet:
This case is different. I will pick the side after the case is over.
With genuine poetry passion in your heart, let’s say you write mediocre poetry.
Your Twitter-freak friend who is farthest from literature decides to use ChatGPT for the same task.
With his street-smart instincts, he is able to instruct the AI to write poems in the style of Edgar Allan Poe. His poems possess dark shades of melancholy that Poe is known for.
You both run a popularity poll contest on Twitter.
Your friend’s poems win.
But as a genuinely ambitious poet, you take defeat as a challenge. After all, Poe’s style had been available all along. You have to give it to your friend for his creativity — combining unrelated ideas to construct a novel product. If you had thought about it, you could have done it better than AI.
Over the years, you toil hard to build your poetry portfolio and develop your own style. You gain several genuine followers on social media, too.
Your friend, who has no room for poetry in his heart, is smart enough to find the business value in your poetry. He hires a machine learning engineer to train an AI bot on your poetic style.
Within a few weeks, you see social media accounts that are serious contenders for your style of poetry. Yet, you cannot sue them, because there is no plagiarism. Even if you do, they can pivot to a different style and stay in business.
Your organic growth strategies go down a pipe carrying internet cable leading to an AI data center.
Verdict: Your friend wins the creation battle.
You also learn your lesson: without machines, you can’t take your emotions and expressions to the world.
You can’t outpace them, so you need to use them. In doing so, you are deviating your attention from pure art and craft to how the machine works best, and how your peers use them to combat you.
Unlike the recipe website example, I am not with AI in this matter. We need a way to distinguish AI-produced content from human-crafted content.
The fundamental question in distinguishing AI-produced content:
They say you could detect if a piece is produced by a human or AI.
Today, if you Google the term “AI content detector”, several entries pop up: Contentdetector.ai, Copyleaks, GPTRadar, and Contentatscale.ai.
They are the descendants of plagiarism checkers, and they are even easier to defeat.
OpenAI (maker of ChatGPT) has also made a text classifier. It can detect if a piece of text (minimum 1000 characters) is generated by ChatGPT or not.
All these tools:
- Take the final draft as input.
- Try to derive certain properties of the text.
- Evaluate these properties by assigning them numbers.
- Provide a probability value (% certainty) that says whether the text was AI-generated or human-written.
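The four steps above can be sketched in a few lines of Python. This is only a toy illustration of the general shape of such a tool, not how any of the named detectors actually work: the two features, the `WEIGHTS`, and the `BIAS` are all hypothetical and uncalibrated, chosen purely to make the pipeline visible.

```python
import math
import re
import statistics


def extract_features(text: str) -> dict[str, float]:
    """Step 2: derive a few simple properties from the finished draft."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    lengths = [len(s.split()) for s in sentences]
    return {
        # How much sentence length varies across the draft.
        "sentence_length_stdev": statistics.pstdev(lengths) if lengths else 0.0,
        # Share of distinct words among all words (vocabulary diversity).
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
    }


# Hypothetical, uncalibrated weights: a real tool would learn these from data.
WEIGHTS = {"sentence_length_stdev": -0.4, "type_token_ratio": -3.0}
BIAS = 3.0


def ai_probability(text: str) -> float:
    """Steps 3 and 4: turn the properties into one number between 0 and 1."""
    features = extract_features(text)
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    # The logistic function squashes the raw score into a probability-like value.
    return 1.0 / (1.0 + math.exp(-score))
```

Note that the only input is the final draft (step 1). Nothing in this pipeline knows how the text came into being, which is exactly the limitation discussed next.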
A notch above all of these is an open-source tool called GLTR. It was built by the MIT-IBM Watson AI Lab together with Harvard NLP, and because it is open source, anyone with the technical know-how can inspect or change the way it works. It can detect text generated by a wide variety of LLMs.
GLTR is widely used in academia to verify student essay submissions.
This doesn’t solve our root problem, though.
The difference between an AI-produced piece and a human-generated one lies in the process, not in the outcome.
It’s easy to catch someone who has plagiarised. It is possible to engineer a checker at the submission gateway that performs a basic originality check on every entry.
It’s also easy to catch someone who has utilized an AI tool as long as the submission is not modified after the AI phase. The AI tools I mentioned above are made exactly for that purpose.
However, AI detection is not easy when:
- Two people, one an expert in prompt engineering and the other in writing, collaborate to train an LLM and then run an unlimited number of prompts to generate pro-writer-like text.
- An expert writer goes meta: in one browser window, they use an AI bot to generate content (e.g. “Write me an Alice in Wonderland style opening to my newest children’s novel”). In another window, they edit the style and vocabulary (“Change the style of text to that of a 21st-century children’s author”). Yet another window to modify the dialogues to include modern slang? Sure. ChatGPT is quite extensive in its fiction-writing capability.
- Two AI-bots (Bard and ChatGPT) collaborate to create and edit a piece i.e. one bot’s output is another bot’s input, and so on.
All the AI-detection tools suffer from a common shortcoming: They evaluate the content, but completely neglect how the content traveled from the creator’s mind to the computer/paper i.e. transfer of content into the real world.
There is a solution if only the right people want to build it and sustain it.
Human-Generated content requires its separate web (Web 5.0):
The idea behind Web 5.0 is to segregate a portion of the web that will:
- Evaluate content on a human-generation likelihood scale, based on how the content was generated (aka human-content websites).
- Incentivize human-generated content, and block/punish AI-generated content (aka content beneficiary websites).
The good news is, it is possible to achieve the detection using the content-generation process.
The bad news isn’t that it is technically complex. With elaborate design & engineering, tools can detect how a given piece of text/image is created.
The bad news is that on top of the technical solution, it requires cooperation from the current beneficiaries of AI.
Here are the high-level strategy and the fundamental constraints that drive Web 5.0:
- Design a separate web (much akin to a blockchain) where only human-generated content-creation websites reside. They could be certified websites with a .human domain, for example.
- These .human websites host writing tools built as open-source software. (They can host drawing/video editing tools as well. For the sake of this article, let us stick to the simplest format: text.)
- The tools should be designed to detect human keystrokes and/or other input methods. They should also detect a computer user’s research habits/pattern.
- Based on the detected pattern, the tool would score the content on the produced-by-human scale. A dedicated browser designed to detect the effort that goes into producing an original piece would immensely boost the accuracy. In the future, brain-computer interfaces (e.g. Elon Musk’s Neuralink) could also detect brain activity while typing/speaking to determine whether the content was copied or original. (Remember that Elon is among those who advocated a full stop beyond GPT-4? I don’t know if that was a coincidence.)
- The tools must prevent the basic paste function. Voice input (with a filter to ignore machine voice, or disincentivize fluent reading from another source) should be allowed to provide accessibility. The paste operation can be selectively allowed from places like Wikipedia. For this to happen, internet/desktop/mobile device software standards must change drastically.
- An action (Submit button) would post the content to a content beneficiary that accepts and rewards exclusively human-generated content. Before this step, a smart AI detector can perform the final check on the content, and include its score in the submission package. This way, the content beneficiary has enough level of trust regarding the human touch involved.
- Content beneficiaries must be legally bound against using human-generated content to train AI. Or they must do so in a transparent manner, and the outcomes of that training must be sufficiently segregated.
- The solution can be hosted on a public blockchain, ensuring complete transparency, gated by a paywall controlled by the beneficiary website.
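To make the constraints above a little more concrete, here is a minimal sketch of what a .human writing tool might record and submit. Everything in it is an assumption made for illustration: the `InputEvent` record, the `ALLOWED_PASTE_SOURCES` allow-list, the scoring rule, and the fields of the submission package are hypothetical names, not part of any existing standard.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class InputEvent:
    kind: str         # "key", "voice", or "paste"
    chars: int        # number of characters this event added to the draft
    source: str = ""  # where pasted text came from, if anywhere


# Hypothetical allow-list: pasting is only tolerated from these origins.
ALLOWED_PASTE_SOURCES = {"wikipedia.org"}


def human_score(events: list[InputEvent]) -> float:
    """Fraction of the draft's characters that arrived through permitted input."""
    total = sum(e.chars for e in events)
    if total == 0:
        return 0.0
    permitted = sum(
        e.chars
        for e in events
        if e.kind in ("key", "voice")
        or (e.kind == "paste" and e.source in ALLOWED_PASTE_SOURCES)
    )
    return permitted / total


def build_submission(text: str, events: list[InputEvent], detector_score: float) -> dict:
    """Bundle the draft's fingerprint with its process score and the final AI check."""
    return {
        # A hash lets the beneficiary (or a public ledger) verify the text later.
        "content_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "human_score": round(human_score(events), 3),
        "detector_score": detector_score,
        "event_count": len(events),
    }
```

For example, a draft with 80 typed characters and 20 characters pasted from an unlisted site would get a `human_score` of 0.8. A real tool would need a far richer signal than character counts, but the point stands: the score comes from the process, and the package gives the content beneficiary something it can check.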
Over to my tech friends!
- As someone who has spent 23 years in tech, I find it impossible to refrain from saying that the road to hell is paved with good intentions. Although this is only a draft protocol for Web 5.0, I feel it is necessary to address some basic feasibility questions on a product engineering level.
- It would still be possible to generate AI content in one browser window and type it by hand on a .human website in another browser window. However, an in-depth study of writers’ typing patterns could eliminate this, and AI itself could come in handy in figuring this out.
- Another loophole: Someone in the US/UK could email an AI-produced piece to an Asian company, where a $3/hour typist could type it into a .human website. However, such efforts won’t scale, especially if certain access restrictions are applied. For example, an independent body could charge companies steep fees for allowing their IP addresses onto .human domains. Access can also be geographically tied to the legal creator (i.e. if the legal owner of the content is in the UK, their submission can’t be accepted from an IP address far from the UK). As content-at-scale becomes punitively expensive, content-mongers will be less likely to exploit it.
- I still hear many “That’s not how the web is designed” voices. Yes, you are correct. But this moment deserves serious consideration of a WWW redesign, as it is akin to the birth of the new web itself. Also, with advancing research in 5G and 6G, internet connectivity is becoming cheaper by the hour. We will keep witnessing newer webs of the existing scale in less and less time. If the AI threat requires redesigning existing models, so be it.
- Open-sourcing of generative AI models: I hear many advocates of this. Obviously, much of this is coming from competitors of OpenAI. While open-sourcing of generative AI is welcome, it is not enough. Most blockchains were open source, and that did little to de-risk the most susceptible masses. The field of machine learning is one level above general programming in its cognitive complexity. Even with the best educational outcomes, it would take at least a couple of generations before many of us can even appear to be in command of it. So yes, open-sourcing of generative AI is necessary, but light-years away from sufficient.
- Let us not forget that this is only a start. As more minds understand the spirit, we would be better armed to combat challenges.
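As a rough illustration of the typing-pattern idea raised above, the sketch below compares someone composing with someone retyping finished text. It rests on an untested assumption: that composing shows irregular pauses and frequent corrections, while transcription is steady and clean. The function names and both thresholds are hypothetical and would need real data before anyone relied on them.

```python
import statistics


def typing_profile(timestamps_ms: list[float], backspaces: int) -> dict[str, float]:
    """Summarize a typing session from keystroke timestamps and correction count."""
    intervals = [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
    if len(intervals) < 2:
        return {"pause_variability": 0.0, "revision_ratio": 0.0}
    mean = statistics.mean(intervals)
    return {
        # Coefficient of variation of the gaps between keystrokes.
        "pause_variability": statistics.pstdev(intervals) / mean if mean > 0 else 0.0,
        # Corrections per keystroke recorded.
        "revision_ratio": backspaces / len(timestamps_ms),
    }


def looks_transcribed(
    profile: dict[str, float],
    min_variability: float = 0.5,  # hypothetical threshold
    min_revision: float = 0.02,    # hypothetical threshold
) -> bool:
    """Flag sessions that are both unusually steady and unusually clean."""
    return (
        profile["pause_variability"] < min_variability
        and profile["revision_ratio"] < min_revision
    )
```

A determined typist could of course fake pauses and corrections, so a heuristic like this only raises the cost of cheating. It does not close the loophole.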
Is it worth it?
Would we allow robots to compete against humans in the Olympics or FIFA?
At this point, I have read at least 40 pieces that say a variation of this:
- Writers must try to enslave the AI to up their game
- There is no point in opposing an innovation because we will be left out.
While I agree with them in spirit (At best, they want to save present writers from fear of obsolescence), they are perhaps underestimating the size of the threat AI-generated content presents.
The invention of the Gutenberg press connected us to our collective past. Although it can be argued that it was an automation of the same kind as AI, it didn’t trespass on the cognitive territory associated with knowledge. It was closer to the distribution of content, and a predecessor to the present-day internet.
Both inventions ended up democratizing the intellect, not strangling it.
When AI-produced pieces swim in the same lake as original content (and they will surely become ever harder to tell apart from it):
- How will future generations distinguish what humans were capable of?
- How will we justify the importance of creation using gray cells?
- What will happen to our capacity to command machines to produce better and/or unique and/or life-altering content?
It took Leonardo da Vinci some 14 years to create the Mona Lisa.
When I buy a hand-painted copy of it from a street painter in Pisa for $50, I mentally split the price into two portions:
- Portion A: $30 for the beauty value it adds to my house — I know most of my guests won’t taunt me for my audacity to buy a cheap clone of a masterpiece
- Portion B: $20 for the duration and exertion it took for the painter who endeavored to clone it by hand.
Over time, our perception of portion A would only grow. And that is an area where AI will crush humans.
Shall we remember that we used to pay artists in proportion to Portion B, and to how much it affects Portion A?
Creative work is different. It cannot be put in the same box as creating boilerplate experiences, where innovation should be welcomed with open arms. If you are not with me on that, you would have to agree on allowing robots to compete against humans in the Olympics.
To save ourselves from cognitive extinction, a time must come when AI-produced content will saturate the web, but human-produced content will fetch premium prices because of its rarity and exclusivity.
It would serve as a reminder of what humans are capable of achieving.
To ensure that that time comes, the separation of the human web from the general web must be engineered.
The first Web 5.0 draft might have several technical loopholes. But so did Web 1.0, Web 2.0, and Web 3.0.
In its spirit, the concept isn’t different from organic farming. The intent is to incentivize something that’s created with first-hand research, iterations, and endless yet progressive drafts.
The intent is to empower the consumption of things created with blood, sweat, and dying and regenerating gray cells. The intent is to cultivate the will to pay for the exclusivity.
Technology isn’t the hard part. Intention is.