Openness, AI and Web3
History
The convergence of open-source principles, artificial intelligence, and Web3 has ignited a transformative shift in the technological landscape. This confluence is reshaping how we think about software development, data ownership, and the business models in the AI domain. But why is this intersection particularly compelling now, and what historical precedents have set the stage for this moment?
Historical Precedents:
The open-source movement began in earnest in the late 20th century, championing the idea that software should be free of cost, open to access, and free to modify. Projects like the GNU operating system and the Linux kernel exemplified this ethos, fostering collaborative innovation and democratising software development. More than a licensing model, it was a philosophical stance in favour of community-driven development and transparency.
Parallel to this, the late 20th and early 21st centuries saw the rise of artificial intelligence. Initially rooted in symbolic logic and expert systems, AI underwent a renaissance with the advent of machine learning and deep learning. Combined with vast amounts of data and computational power, these techniques led to breakthroughs in natural language processing and computer vision. However, as AI models grew in complexity and capability, so did concerns about their transparency, accessibility, and potential monopolisation by a few tech giants.
Enter blockchain, the foundation of Web3's decentralised vision of the internet. Web3 promises to return data ownership to users, emphasising decentralisation, trustless transactions, and token-based economies. This new paradigm offers an answer to AI's centralisation concerns and an alluring business model in which users are stakeholders and beneficiaries rather than mere data sources.
Why Now?
OpenAI's transition from an open-source approach to a more closed model underscores the tension between commercial interests and the open-source ethos in the AI domain. In January 2023, ChatGPT was reported to have reached 100 million users. Despite this reach, the code behind models like ChatGPT remains closed, a 'black box' that raises questions about transparency and accountability. In particular, the models are trained on publicly available information, licensed third-party data, and user-provided data, the details of which are not fully disclosed.
Furthermore, rapid advances in AI combined with the maturation of Web3 technologies make a world in which AI models are community-owned assets feasible. Web3's tokenisation primitives provide a mechanism to realise this vision, offering ways to incentivise, distribute, and monetise open-source AI projects.
Open is not Open-Source
The advent of artificial intelligence has reignited the question of what "Open" truly means. Historically, the Open Source Initiative (OSI) provided a structured definition of open source that fit the software landscape of its time. Despite its clarity, that definition struggles to capture the intricacies of the evolving AI realm.
Openness in AI manifests in multifaceted ways, encompassing a spectrum that ranges from API access with reuse licences to extensive transparency in model design, training datasets, and evaluation protocols. Understanding these nuances requires categorisation:
Transparency:
For AI to be termed 'open', it should not be an enigmatic black box. Access to its source code, thorough documentation, and insight into the data it is trained on are essential. This ensures that users and researchers can scrutinise its processes, behaviour, and inherent biases.
Reusability:
A pivotal facet of Open AI is the ability of third parties to understand, replicate, reuse, and retrain these models. This requires accommodating licences and minimal barriers to adaptation and implementation. Such reusability paves the way for many applications, from research innovations to industry deployments.
Extensibility:
Open AI models should not be static entities but foundations upon which further advancements can be sculpted. Extensibility denotes how easily pre-trained models can be fine-tuned, adapted, and improved for specific tasks or challenges. More extensibility accelerates progress; users can harness off-the-shelf solutions as starting points for niche innovations.
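To make extensibility tangible, the sketch below shows how an openly licensed checkpoint could be fine-tuned for a downstream task with the Hugging Face transformers and datasets libraries. The checkpoint, dataset and hyperparameters here are illustrative assumptions, not recommendations.

```python
# A minimal fine-tuning sketch, assuming the `transformers` and `datasets`
# libraries; the checkpoint, dataset and hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"  # any openly licensed base model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Adapt the pre-trained model to a niche task (here: sentiment classification).
dataset = load_dataset("imdb")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),  # small slice
)
trainer.train()
trainer.save_model("finetuned")  # the adapted model can itself be shared and reused
```

The adapted model can then be published and built upon in turn, which is exactly the compounding loop extensibility enables.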
While ‘Open AI’ might seem straightforward, it encapsulates complex expectations and capabilities. It’s a clarion call for enhanced transparency, seamless reusability, and adaptable extensibility, pushing the boundaries of what AI can achieve when democratised.
More Web3 for more openness in AI
Web3, representing a decentralised and trustless vision of the internet, inherently champions transparency, reusability, and extensibility — the core tenets of Open Artificial Intelligence. Decentralised platforms can act as public ledgers for AI models, ensuring verifiable transparency and fostering community-driven development. Additionally, the decentralised nature of Web3 can facilitate global collaboration, allowing AI models to be improved upon, fine-tuned, and repurposed democratically.
The Triad of Artificial Intelligence: Compute, Data, and Model
Artificial Intelligence, as we perceive and interact with it today, combines intricate components. Three primary elements stand out: Compute, Data, and Model.
- Compute represents the processing power, encompassing the hardware infrastructure and the computational capabilities that allow AI models to run, learn, and evolve.
- Data is the lifeblood of AI. It feeds into the system to train models, enabling them to recognise patterns, make predictions, and execute tasks.
- The Model is the brain, the neural architecture that learns from the data, adjusts its parameters, and produces an output based on its training.
We’ll explore the impact Web3 can have on these components below.
Crowdsourced Compute with Web3:
AI's growth is stymied by shortages of the GPUs that are critical for model training. Web3 can help by incentivising computational resource sharing: participants offer resources, such as GPUs, to AI initiatives and earn tokens in return. Crowdsourced compute can speed up AI workloads, and in the future decentralised data storage could be added to craft a decentralised, AI-optimised cloud. Akash is an example of a functioning marketplace for underutilised computing resources.
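To make the mechanics concrete, here is a toy sketch of how training jobs might be matched to token-earning GPU providers. The data structures, matching rule and settlement step are our own illustrative assumptions, not a description of Akash or any live protocol.

```python
# A toy sketch of token-incentivised compute matching, not any live protocol.
# Provider, Job, match_job and settle are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Provider:
    address: str            # provider's wallet address
    gpu_memory_gb: int      # advertised GPU capacity
    price_per_hour: float   # asking price in tokens
    token_balance: float = 0.0

@dataclass
class Job:
    min_gpu_memory_gb: int  # resource requirement of the training job
    max_price_per_hour: float
    hours: float

def match_job(job: Job, providers: list[Provider]) -> Provider | None:
    """Pick the cheapest provider that satisfies the job's requirements."""
    eligible = [p for p in providers
                if p.gpu_memory_gb >= job.min_gpu_memory_gb
                and p.price_per_hour <= job.max_price_per_hour]
    return min(eligible, key=lambda p: p.price_per_hour) if eligible else None

def settle(job: Job, provider: Provider) -> None:
    """Credit the provider in tokens once the work has been verified."""
    provider.token_balance += job.hours * provider.price_per_hour

providers = [Provider("0xA", 24, 1.2), Provider("0xB", 80, 3.5)]
job = Job(min_gpu_memory_gb=40, max_price_per_hour=4.0, hours=10)
winner = match_job(job, providers)
if winner:
    settle(job, winner)  # 35 tokens credited to 0xB in this toy example
```

In practice, a real network would also need to verify that the work was actually performed, which is where the cryptographic tools discussed further below come in.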
Technical Hurdles:
- Resource Management: In a decentralised setup, handling diverse GPU types and their specific capacities is tricky. AI models must be paired with suitable GPU resources. Dynamic allocation across this distributed setup introduces added intricacy, especially if real-time bidding is used.
- Latency Issues: Data transfer time lags within the decentralised network can hamper real-time AI tasks. While local GPU clusters or edge computing might offer a solution, they bring added resource management and data security challenges.
- Scalability Concerns: As the network grows, so do scalability challenges. Conventional blockchains already grapple with scalability. Introducing AI, which demands high data and computational power, only intensifies this. L2 solutions may address these scalability issues but introduce additional complexities with data consistency and transaction verification.
Reasons to Believe in Decentralised Compute:
To subscribe to the idea of decentralised AI computation, one must recognise specific evolving market dynamics:
- Price Sensitivity: Decentralised compute solutions can offer cost-effective alternatives to traditional cloud providers, making them attractive to price-sensitive customer segments (assuming roughly equivalent performance).
- Scale: Untapped computing capacity exists at scale outside the centralised players. As much as 30% of server capacity sits idle in many data centres.
- Customisation and Control: Many users, particularly tech-savvy ones, crave more control and the ability to customise their computational environments.
- Local Compute: In verticals such as healthcare, finance and defence, companies don’t want to rely on third parties for security and/or regulatory reasons.
- Access Issues: Access to established cloud platforms can be limited or non-existent in many emerging markets.
- Open Source Opportunities: Open-source models and datasets have already seen significant traction, and there are likely innovation opportunities at the compute layer too. Hugging Face had over 600m model downloads in August 2023 and more than 13,000 models submitted to its Open LLM leaderboard within a few months.
Enhanced Data Privacy and Control with Cryptography and ZKML
We've thought a lot, as an industry, about cryptographic verification, trustless consensus mechanisms and liveness guarantees. These learnings and best practices can be used to build a trustless compute network with economic incentives for participation and for verification that work was completed as promised.
Today, we expect compute protocols to utilise these cryptographic and game-theoretic mechanisms to connect and verify deep learning work trustlessly. In the future, Zero-Knowledge Machine Learning (ZKML) could be used instead, merging zero-knowledge proofs with machine learning to safeguard privacy even during computations (a conceptual sketch follows the list below). With ZKML, we expect:
- Guaranteed Data Privacy: While conventional machine learning methods require raw data access, ZKML will let models learn without accessing the data, a massive leap for privacy preservation.
- Secure Multi-Party Computation: Within decentralised frameworks, data for AI training can come from multiple sources. ZKML will ensure this data remains inscrutable, making it impossible for any party to reverse-engineer it.
- Staying Ahead of Regulations: The rise of data privacy laws like the GDPR demands more stringent measures. ZKML may assist in meeting these standards by guaranteeing personal data remains cloaked during computations.
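The conceptual sketch below illustrates the shape of this flow: a prover runs inference over private data and emits a proof, while a verifier checks it against a public commitment to the model without ever seeing the inputs. The function names and types are hypothetical placeholders, not the API of any real proving system.

```python
# Conceptual sketch of a ZKML workflow; `prove_inference` and `verify_proof`
# are hypothetical stand-ins for a real zero-knowledge proving system.
import hashlib
from dataclasses import dataclass

def commit_to_model(weight_bytes: bytes) -> str:
    """Publish a hash commitment to the model weights so verifiers know
    which model a proof refers to, without the weights being revealed."""
    return hashlib.sha256(weight_bytes).hexdigest()

@dataclass
class InferenceProof:
    model_commitment: str        # commitment to the weights that were used
    public_output: list[float]   # the inference result being attested to
    proof_bytes: bytes           # ZK proof that public_output = model(private_input)

def prove_inference(weight_bytes: bytes, private_input: list[float]) -> InferenceProof:
    """Prover side: run inference locally and produce a proof without ever
    revealing `private_input`. Placeholder for a real ZKML prover."""
    raise NotImplementedError("stand-in for a ZK proving backend")

def verify_proof(proof: InferenceProof, expected_commitment: str) -> bool:
    """Verifier side: accept the output only if the proof checks out against
    the committed model. Placeholder for a real ZK verifier."""
    raise NotImplementedError("stand-in for a ZK verifier")
```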
Data Collection:
Sufficiently large, labelled datasets are essential for training models. AI is an inefficient learner and requires vast amounts of data, while publicly available data sources are shrinking as content platforms realise their worth.
Similar to how DePIN networks use tokens as an incentive to bootstrap the supply sides of their marketplaces, tokens can be used to coordinate data collection within a data network. Tokens can crowdsource the collection of data that is not accessible on the publicly scrapable internet, ultimately culminating in a world of user-owned foundation models. Here, AI models are trained on users’ data directly and owned by the community whose data they are trained on. This private dataset would contain proprietary data on people’s communication from sources not typically available for AI model training.
Tokens can also be used where players want to tailor the data they collect and coordinate its collection. The early innings of this can be seen in the robotics vertical, where crypto incentives are used to collect real-world data to train AI for robots. To dive into one example, one of the biggest bottlenecks in building models suitable for autonomous sidewalk robots has been labelled data. Tesla has generated over one billion miles of labelled driving data, and only with such vast sets of human action labels can it even conceive of fully autonomous driving models. Games such as Frodobots allow gamers to pilot 4G-enabled sidewalk robots on real-world missions while earning points that reward them for the value of the data they generate and that will ultimately give them a stake in the overall network.
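As a simple illustration of how such incentives could be coordinated, the sketch below splits a fixed per-epoch token pool across contributors in proportion to the scored value of the data each one submitted. The pool size, scoring and addresses are invented for the example.

```python
# A toy sketch of distributing an epoch's token emission pro-rata to data
# contributors; the 1000-token pool and the scores are illustrative assumptions.
def distribute_rewards(contributions: dict[str, float],
                       epoch_pool: float = 1000.0) -> dict[str, float]:
    """Split epoch_pool across contributors in proportion to the scored
    value of the labelled data each one submitted."""
    total = sum(contributions.values())
    if total == 0:
        return {addr: 0.0 for addr in contributions}
    return {addr: epoch_pool * score / total for addr, score in contributions.items()}

# e.g. three robot pilots scored on miles of usable, labelled sidewalk data
print(distribute_rewards({"0xAlice": 12.0, "0xBob": 6.0, "0xCarol": 2.0}))
# {'0xAlice': 600.0, '0xBob': 300.0, '0xCarol': 100.0}
```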
Breaking down the AI product
Much of the discourse around decentralised models centres on blockchains' immutability and auditability, powering onchain verifiability. We're more excited about crypto's role in encouraging many ideas from many sources. As noted above, Hugging Face alone saw over 600m model downloads in August 2023 and more than 13,000 models submitted to its Open LLM leaderboard within a few months. Crypto has the potential to make it commercially viable for these models to remain open and allow users to iterate on them.
Specifically, building an end-to-end AI product can be broken into stages, with different players focusing on one part of the process. Crunch Labs, for example, provides a platform where companies can upload their data in an obfuscated form and access a community of thousands of data scientists. The data scientists submit models and are rewarded if those models fulfil the requirements of the company providing the data; here, crypto incentivises the data scientists who supply the models.
Challenges and Criticisms:
While the convergence of Web3 and AI offers numerous advantages, it comes with challenges. Scalability remains a critical hurdle, with blockchain networks struggling to meet the computational demands required by sophisticated AI algorithms. Regulatory frameworks, e.g. concerning data privacy and tokenisation, could introduce further complexity and slow progress in this quickly evolving space.
There are best practices and open questions from crypto's progress in protocol and blockchain design that can be applied to AI. Just as tokens have been used to bootstrap the supply side of DePIN networks, we look forward to exploring the novel ways they can bootstrap the supply side of data networks.
The early success of open-source AI platforms, such as Hugging Face, demonstrates the power of community-driven AI development. Web3 offers further opportunities to supercharge this with transparent, decentralised collaboration and equitable resource distribution through token-based economies.
As these technologies evolve, we stand at the cusp of a future where AI is no longer the domain of a few centralised entities but a shared resource driven by open collaboration and underpinned by solid privacy safeguards. Ultimately, the promise of open, democratised AI is within reach, and its potential far outweighs the obstacles we face today.
For more information on Fabric's portfolio, opportunities and our investment thesis, please visit our website and follow us on Twitter, LinkedIn & Farcaster.