What is Shinkai?

Shinkai is pioneering a decentralized data network designed to optimize and expand the capabilities of Large Language Models (LLMs) in the AI-driven digital era. Traditional LLMs are constrained by their static training data, limiting their ability to access and process dynamically updated information. Shinkai addresses this critical gap by creating a trustless network layer that enriches the internet with AI-friendly embeddings. It incorporates a novel file system tailored for AI data management, akin to traditional hierarchical file systems but specifically optimized for efficient storage and retrieval of embeddings.
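To make the file-system analogy concrete, here is a minimal sketch of a path-addressed embedding store. It assumes embeddings are plain lists of floats and that retrieval means ranking the entries under a folder by cosine similarity. The `EmbeddingFS` class, its methods, and the example paths are hypothetical illustrations, not Shinkai's actual design.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class EmbeddingFS:
    """Hypothetical path-addressed embedding store (an assumption, not Shinkai's API)."""
    entries: dict = field(default_factory=dict)  # path -> embedding vector

    def write(self, path, vector):
        """Store a copy of the vector under a hierarchical path."""
        self.entries[path] = list(vector)

    def search(self, folder, query, top_k=3):
        """Rank entries under `folder` by cosine similarity to `query`."""
        prefix = folder.rstrip("/") + "/"
        scored = [(cosine(vec, query), path)
                  for path, vec in self.entries.items()
                  if path.startswith(prefix)]
        scored.sort(reverse=True)
        return [path for _, path in scored[:top_k]]


fs = EmbeddingFS()
fs.write("/news/ai/item1", [1.0, 0.0, 0.0])
fs.write("/news/ai/item2", [0.0, 1.0, 0.0])
fs.write("/news/sports/item3", [1.0, 0.0, 0.0])
results = fs.search("/news/ai", [0.9, 0.1, 0.0], top_k=1)
```

Scoping a search to a folder is what the hierarchy adds over a flat vector index: the query is compared only against entries under the chosen path.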

Shinkai leverages a decentralized network of nodes, similar to BitTorrent, but with a more user-friendly approach and with content organized by topic and website. This structure gives access to a wide array of up-to-date data sources, improving an AI's ability to provide timely and contextually relevant responses. A distinctive aspect of Shinkai is its integration of a zero-knowledge multiparty computation (MPC) protocol, intended to verify the authenticity and accuracy of data sourced from the web and of the embeddings derived from it.

Central to Shinkai's architecture is the Shinkai Node, which manages data across the network and enables AI agents to plan and schedule tasks. Although still at an early stage of development, these agents help align tasks with user needs, an important step toward structured planning for LLMs. The integration of PDDL (Planning Domain Definition Language) handling, also in its early stages, sets the stage for more tailored AI-user interactions by aligning AI capabilities more closely with individual requirements.

With its innovative approach, Shinkai not only addresses the limitations of current AI technologies but also sets the stage for a more dynamic, responsive, and interconnected AI ecosystem. This whitepaper details the design, functionality, and potential applications of Shinkai, illustrating how it fundamentally transforms the AI landscape by infusing the internet with a decentralized, AI-compatible layer of intelligence and data accessibility.

Keywords: Blockchain, AI, Decentralized Networks, Large Language Models, Data Embeddings, Zero-knowledge Proofs, Multiparty Computation (MPC)

The Problem

When discussing AI's future, the conversation often veers towards the centralized versus decentralized debate. However, this binary classification is insufficient because it overlooks the practicality and privacy requirements of users. Personal AI is emerging as the sought-after solution, where computation for Large Language Models (LLMs) can be executed locally on personal devices. Local execution is preferred because it avoids the added costs and privacy risks of decentralized AI computation. Decentralization benefits blockchain applications such as smart contracts and gaming, but it may not match the daily needs of AI users.

Decentralized computation has been crucial for trustless systems where transactions and interactions need to be verified without central oversight. However, for AI, the interaction isn't just transactional; it's about knowledge processing. This is where the centralization versus decentralization framework fails to capture the nuances of AI's requirements. The true value in the AI landscape is the ongoing access to dynamic and current data, not merely the static datasets used during initial training.

LLMs can grow their knowledge base primarily through two methods: fine-tuning and Retrieval-Augmented Generation (RAG). Fine-tuning is more rigid, suitable for enhancing specific skills but less so for integrating a broad range of new knowledge. RAG excels at incorporating specific knowledge but is not designed to teach new complex skills. As LLM computation becomes more affordable due to the rise of open-source models, the focus shifts to the proprietary data that entities own, which is not only unique but also continuously updated.
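The retrieval step of RAG can be sketched in a few lines. The example below is illustrative only: it substitutes a toy bag-of-words vector for a learned embedding model, and the function names and sample documents are hypothetical. It ranks documents by similarity to the question and prepends the best match to the prompt, so new knowledge reaches the model without retraining.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a learned model."""
    return Counter(text.lower().split())


def similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_prompt(question, documents, top_k=1):
    """Retrieve the top_k most similar documents and prepend them as context."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: similarity(q, embed(d)), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical sample documents for illustration.
docs = [
    "The node release notes were updated this morning.",
    "Bananas are rich in potassium.",
]
prompt = build_prompt("What changed in the node release notes?", docs)
```

Updating what the model can draw on only requires changing `docs`, which is why RAG suits continuously updated data better than fine-tuning does.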

In the near future, LLMs will be widely available for local use or via inexpensive cloud services. The challenge lies in the ability of these LLMs to interact with and process the vast amounts of data being created constantly. Presently, the internet caters to human users through UIs and to search-engine bots through search engine optimization, but it lacks the infrastructure for efficient AI operations, which require embeddings and vector databases.

Shinkai is poised to fill this gap by establishing an off-chain peer-to-peer messaging network that connects LLMs such as ChatGPT with up-to-date knowledge and private information while preserving privacy and personal control. Current AI systems often fail to access current information because they rely on top search engine results, which can be outdated. Moreover, platforms like ChatGPT are not configured to perform personalized tasks based on highly sensitive data such as bank accounts, crypto access, emails, and medical records, due to the inherent risks and legal liabilities involved.

Shinkai, on the other hand, seeks to strike the right balance by connecting LLMs to a decentralized AI network that offers instant access to the latest data (embeddings) from the web. It provides a solution that runs on users' computers, granting them full control over their digital lives without compromising their data privacy. Shinkai's approach rests on three components: a new file system for AI embeddings, a decentralized network of nodes providing AI data, and a zero-knowledge multiparty computation (MPC) protocol to validate data sources. Together, these keep data confidential and under the user's control while allowing the AI's knowledge and skills to expand through subscriptions to nodes within the Shinkai Data AI Network.

Furthermore, Shinkai differentiates itself by aiming to be not just a better search engine but a personal system that offers tools for performing tasks. It enables LLMs to perform complex tasks by using external tools to interact with the computing stack, such as services, APIs, and web search. Drawing on programming language theory, Shinkai introduces typed tools with interactive plan generation, unlocking advanced multi-step capabilities. One example is creating a report on the last six months of discussions with a partner from all Slack, Discord, and email communications.
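One way to read "typed tools" is that each tool declares the type it consumes and the type it produces, so a multi-step plan can be checked before any step runs. The sketch below illustrates that idea under stated assumptions: the `Tool` schema, the type names, and the two stand-in tools are hypothetical and are not taken from Shinkai's implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    """A tool with declared input and output type names (hypothetical schema)."""
    name: str
    input_type: str
    output_type: str
    run: Callable


def check_plan(plan, start_type):
    """Return the final type if each step accepts the previous step's output."""
    current = start_type
    for tool in plan:
        if tool.input_type != current:
            raise TypeError(f"{tool.name} expects {tool.input_type}, got {current}")
        current = tool.output_type
    return current


def execute(plan, value, start_type):
    """Type-check the whole plan, then run its tools in order."""
    check_plan(plan, start_type)  # reject ill-typed plans before running anything
    for tool in plan:
        value = tool.run(value)
    return value


# Illustrative stand-ins for real integrations (Slack, email, and so on).
fetch = Tool("fetch_messages", "Query", "Messages",
             lambda q: [m for m in ["budget review", "lunch", "budget sign-off"] if q in m])
summarize = Tool("summarize", "Messages", "Report",
                 lambda msgs: f"{len(msgs)} messages: " + "; ".join(msgs))

report = execute([fetch, summarize], "budget", "Query")
```

Because the check happens up front, a plan that chains tools in the wrong order (for example, summarizing before fetching) fails with a `TypeError` instead of partway through execution.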

In essence, Shinkai offers a comprehensive and private AI system that empowers users to harness AI for a wide range of real-time applications, transcending the limitations of current AI platforms. It leverages cryptographic protocols optimized for practical use, ensuring a secure, efficient, and user-centric AI experience that aligns with the needs of the modern digital landscape.