
3. Technical Architecture

3.1 Basic Principles of Distributed Storage Computing

Distributed storage computing systems spread data storage and computation across multiple network nodes to improve the efficiency and reliability of data processing. The key advantages of this architecture include:

  • Scalability: The system can be easily expanded by adding more nodes to accommodate larger datasets and more complex computational tasks.
  • Fault Tolerance: Data and tasks are replicated and distributed across multiple nodes, so the system can continue to operate and maintain data integrity even if some nodes fail or go offline. It can also automatically coordinate the recovery of failed nodes.
  • Flexibility: The system can adapt to different workloads and data types, providing support for various application scenarios such as data mining, machine learning, and scientific computing. It can also dynamically adjust resource allocation and task scheduling according to demand.

3.2 Overview of VshareCloud's Technical Architecture

VshareCloud utilizes an advanced distributed storage computing architecture, specially designed to optimize data processing speed and system stability. Its main components include:

  • Data Layer: Uses IPFS to store large datasets, ensuring high availability and redundancy of data.
  • Computation Layer: Employs containerization technology and data scheduling algorithms to dynamically allocate computational resources, supporting elastic scaling and load balancing for various storage and computational tasks.
  • Management Layer: Features control nodes with full disaster recovery backup that handle system monitoring, task scheduling, and resource management for nodes worldwide, simplifying operational tasks.

3.3 Introduction to Core Components and Functions

VshareCloud consists of the following core components, each fulfilling specific functions:

3.3.1 CosmicCluster Distributed System Communication Framework

  • VshareCloud's proprietary CosmicCluster distributed computing communication framework enables efficient communication and data transfer between nodes, as well as unified and reliable task scheduling and resource management.
  • CosmicCluster employs secure communication mechanisms and advanced hot-pluggable designs for task applications, ensuring system stability and reliability.
  • CosmicCluster's communication methods support both centralized (HTTP, WebSocket) and decentralized (Pubsub, IPFS) protocols for fully compatible peer-to-peer communication, ensuring system scalability and flexibility.
  • CosmicCluster provides efficient and unified interfaces for different task applications ("Apps", each with its own task queue and resource pool), allowing diverse tasks to run efficiently within the same system.

3.3.2 CosmicCluster - Server

  • Monitors node status to inform task scheduling and ensure the healthy operation of nodes.
  • Serves as a secure bridge for communication between nodes and applications, ensuring the reliability and security of communications.
  • Offers unified application access APIs, allowing developers to focus on the logic of new distributed computing tasks and scheduling algorithms for each task type, without worrying about underlying communication and node authentication details.

3.3.3 CosmicCluster - Client

  • Acts as a node agent, communicating with the Server to receive task scheduling and data distribution instructions.
  • Collects node status information for reporting to the Server, facilitating health monitoring and task scheduling by the Server.
  • Collects and monitors routing information between nodes so that the Server can optimize task scheduling and data distribution, and to guard against falsely reported network conditions on some nodes.
  • Provides unified access APIs for applications, allowing them to communicate with the Server through a unified interface without concerning themselves with the details of underlying communications.

3.3.4 CosmicCluster - App

  • Each independent task type focuses on its own task scheduling and resource management, which it carries out by communicating through the Server and Client.
  • Applications can share key data through the CosmicCluster framework, providing top-down cooperation for overall system optimization.
  • Cosmic applications should include the following basic functions:
    • Node Manager: Monitors and manages the status of computing nodes to ensure their healthy operation.
    • Data Distributor: Efficiently distributes data to the appropriate nodes according to computational task requirements, reducing data transmission time and improving computational efficiency.
    • Task Scheduler: Intelligently assigns computational tasks to the most suitable nodes, taking into account the load and resource utilization of nodes to optimize task execution time.
    • Result Collector: Collects and summarizes the results of distributed computing to ensure data consistency and completeness.
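The four basic functions above can be pictured as one small interface. The following is a minimal sketch only: `Node`, `EchoApp`, and every method name are hypothetical, since this document does not describe CosmicCluster's actual App API. The "least-loaded healthy node" rule is likewise an assumed stand-in for a real scheduling algorithm.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a computing node as seen by an App."""
    node_id: str
    healthy: bool = True
    load: float = 0.0  # fraction of capacity in use, 0.0 to 1.0


class EchoApp:
    """Toy App covering the four basic functions (all names are illustrative)."""

    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}
        self.results = {}

    # Node Manager: track which nodes are fit to receive work.
    def healthy_nodes(self):
        return [n for n in self.nodes.values() if n.healthy]

    # Task Scheduler: pick the least-loaded healthy node (assumed policy).
    def schedule(self, task_id):
        candidates = self.healthy_nodes()
        if not candidates:
            raise RuntimeError("no healthy node available")
        return min(candidates, key=lambda n: n.load)

    # Data Distributor: a real App would ship the payload to the node;
    # this sketch only records the assignment and bumps the node's load.
    def distribute(self, task_id, payload):
        node = self.schedule(task_id)
        node.load += 0.1
        return node.node_id, payload

    # Result Collector: store per-task results and report completeness.
    def collect(self, task_id, result):
        self.results[task_id] = result

    def is_complete(self, expected_task_ids):
        return set(expected_task_ids) <= set(self.results)


app = EchoApp([Node("a", load=0.5), Node("b", load=0.2), Node("c", healthy=False)])
node_id, _ = app.distribute("t1", b"data")  # "b": healthy and least loaded
app.collect("t1", "ok")
```

Keeping the four roles as separate methods mirrors the list above and makes it possible to swap in a different scheduling policy without touching result collection.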

First Example Application - Vshare Storage

Basic Architecture:

  • Vshare Storage is the first application of VshareCloud, a distributed storage system based on the IPFS protocol that enables efficient data storage and distribution.

  • The main features of Vshare Storage include:

    • Data Storage: Data is distributed across multiple nodes and geographical locations, ensuring high availability and redundancy.
    • Data Distribution: Efficiently distributes data to appropriate nodes as per computational task requirements, reducing data transfer time and enhancing computational efficiency.
    • Data Retrieval: Facilitates efficient data retrieval and transfer through the IPFS protocol, supporting various data types and formats.
    • Data Backup: Automatically backs up data across multiple nodes to ensure data security and reliability.
  • The design and implementation of Vshare Storage provide the infrastructure and technical support for other VshareCloud applications, including data storage and distribution, task scheduling, and resource management.

  • Vshare Storage's Client Components:

    • Client Node: Manages communication and processing with the Server, receiving task scheduling and data distribution commands.
    • IPFS Repo Manager: Manages the local IPFS Repo, handling volume management, status monitoring, and service management for an IPFS Repo on a specific machine.
    • IPFS sPoST Manager: Manages local IPFS sPoST (Simple Proof of Space Time) tasks, processing PoST tasks from the Server, which determine the node's performance and health status.
    • IPFS Pin Manager: Manages local IPFS Pin tasks, handling Pin and Unpin commands from the Server.
    • Data Fixer: Addresses local data repair, monitoring erroneous data and submitting error reports and data repair requests to the Server, enabling other nodes to assist in data correction.
    • IPFS GC Scheduler: Manages local IPFS GC tasks, monitoring the status of the local IPFS Repo to ensure its performance and health.
  • Vshare Storage's Server Components:

    • Server: Manages communication with Clients, issuing task scheduling and data distribution commands.
    • Order Manager: Handles the creation and matching of orders, ensuring data availability and redundancy.
    • Node Status Monitor: Monitors and manages the status of computing nodes, ensuring their healthy operation.
    • API Service: Provides API services, offering an interactive interface for client applications for data storage and retrieval.
    • Financial System: Manages order payment and settlement, ensuring data security and reliability.
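As an illustration of how the Pin Manager and GC Scheduler on the Client side might act on Server commands, the sketch below maps a few command names to the HTTP RPC endpoints of a Kubo (go-ipfs) daemon. The endpoints `/api/v0/pin/add`, `/api/v0/pin/rm`, `/api/v0/repo/gc`, and `/api/v0/repo/stat` are part of Kubo's RPC API. The command names (`pin`, `unpin`, `gc`, `repo_stat`), the function `build_rpc_url`, and the assumption that Vshare Storage talks to a local Kubo daemon at all are hypothetical. The code only builds URLs and sends nothing.

```python
from urllib.parse import urlencode

# Kubo exposes its RPC API over HTTP (default 127.0.0.1:5001) and expects
# POST requests. The command names on the left are hypothetical.
KUBO_ENDPOINTS = {
    "pin": "/api/v0/pin/add",
    "unpin": "/api/v0/pin/rm",
    "gc": "/api/v0/repo/gc",
    "repo_stat": "/api/v0/repo/stat",
}


def build_rpc_url(command, cid=None, base="http://127.0.0.1:5001"):
    """Translate a Server command into the Kubo RPC URL to POST to."""
    if command not in KUBO_ENDPOINTS:
        raise ValueError(f"unknown command: {command}")
    url = base + KUBO_ENDPOINTS[command]
    if command in ("pin", "unpin"):
        if not cid:
            raise ValueError(f"{command} requires a CID")
        url += "?" + urlencode({"arg": cid})
    return url


pin_url = build_rpc_url("pin", "QmExample")
gc_url = build_rpc_url("gc")
```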

    sPoST (Simple Proof of Space Time) Design and Implementation:

    • What is sPoST (Simple Proof of Space Time)?
      • sPoST is a storage proof mechanism based on the IPFS protocol that verifies whether nodes have effectively stored specific data and assesses the performance of storage media.
      • For detailed implementation, see: IPFS_Raid_Chanllege_Demo (in Python)
    • Why is sPoST secure?
      • sPoST assesses the performance of storage media to check their reliability and stability. It uses the Merkle DAG feature of IPFS to challenge nodes and judge their data read performance, which provides a relatively secure verification of data reliability, consistency, and integrity.
      • Through sPoST, Vshare Storage not only verifies the validity of data but also assesses the performance of storage media, which feeds into the assessment of the system's overall performance and health.
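The general idea of a timed, Merkle-based storage challenge can be sketched as follows. This is not the sPoST implementation (that is in the demo referenced above). It is a generic challenge-response over a plain binary Merkle tree, which is simpler than an IPFS Merkle DAG. All function names are illustrative. The verifier knows only the root hash, asks for a randomly chosen chunk, checks the returned proof, and measures how long the response took.

```python
import hashlib
import secrets
import time


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(chunks):
    """Return all levels of a binary Merkle tree, leaves first."""
    level = [h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # pad odd levels by duplicating the last hash
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(chunks, levels, index):
    """Prover side: return the challenged chunk and its sibling hashes."""
    proof = []
    i = index
    for level in levels[:-1]:
        proof.append(level[i ^ 1])  # sibling sits next to i in the pair
        i //= 2
    return chunks[index], proof


def verify(root, index, chunk, proof):
    """Verifier side: recompute the path to the root from chunk and proof."""
    node = h(chunk)
    i = index
    for sibling in proof:
        node = h(node + sibling) if i % 2 == 0 else h(sibling + node)
        i //= 2
    return node == root


def challenge_round(root, n_chunks, prover):
    """Pick a random chunk, time the prover, and check its answer."""
    index = secrets.randbelow(n_chunks)
    start = time.perf_counter()
    chunk, proof = prover(index)
    elapsed = time.perf_counter() - start
    return verify(root, index, chunk, proof), elapsed


chunks = [bytes([i]) * 1024 for i in range(5)]
levels = build_levels(chunks)
root = levels[-1][0]
ok, seconds = challenge_round(root, len(chunks), lambda i: prove(chunks, levels, i))
```

Because the challenged index is unpredictable, a node that has discarded part of the data cannot answer every challenge, and a node with slow storage shows up in the measured time.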

    Definitions of Performance and Health

    • Performance: The time taken by a node to accurately complete an sPoST is considered its performance.
    • Health: The stability and accuracy rate of a node in completing sPoSTs are considered its health.
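One possible way to turn these two definitions into numbers is sketched below. The document does not give formulas, so the choices here are assumptions: performance is taken as the mean completion time of correct sPoSTs, accuracy as the fraction of correct sPoSTs, and stability as the spread (population standard deviation) of the correct completion times. The function name `summarize` and the result keys are illustrative.

```python
from statistics import mean, pstdev


def summarize(results):
    """results: list of (correct: bool, seconds: float) sPoST outcomes."""
    if not results:
        raise ValueError("no sPoST results")
    times = [seconds for correct, seconds in results if correct]
    return {
        # Performance: average time of accurately completed sPoSTs.
        "performance_s": mean(times) if times else None,
        # Health, part 1: share of sPoSTs completed accurately.
        "accuracy": len(times) / len(results),
        # Health, part 2: lower spread means more stable responses.
        "jitter_s": pstdev(times) if times else None,
    }


report = summarize([(True, 0.2), (True, 0.4), (False, 1.0), (True, 0.3)])
```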

    Data Region Division and Order Matching Rules

    • Data Region Division: Vshare Storage divides the global data region into four areas: Zenith, Galaxy, Premium, and Filecoin.

      • Zenith: Operated jointly by officials and partners, serving as the backbone infrastructure for storage in the network, ensuring high availability and efficient data retrieval, providing users with stable and fast data retrieval services.
      • Galaxy: Community-operated region, offering cost-effective storage and backup services, providing users with economical and reliable data storage services.
      • Premium: Exclusive region for premium orders, providing dedicated storage services to ensure the reliability and efficiency of premium order data storage.
      • Filecoin: Integrates with the Filecoin network for data exchange and storage, enhancing the real data storage rate of the Filecoin network and the reliability of customer data storage without being controlled by a single service provider.
    • Order Matching Rules: Vshare Storage's order matching rules match each user order with the best-suited nodes under the expected conditions, minimizing costs while improving data reliability and availability. Currently, the storage order matching rules cover three order types: cost priority, quality priority, and fixed-price orders.

      • For each valid node:

        • $R_{\text{reputation}}$: The ranking of the node's reputation in its region
        • $R_{\text{price}}$: The ranking of the node's price in its region
        • $R_{\text{performance}}$: The ranking of the node's performance in its region
        • Calculating reputation score: $S_{\text{reputation}} = \left(1 - \frac{R_{\text{reputation}} - 1}{N - 1}\right) \times 100$
        • Calculating price score: $S_{\text{price}} = \left(1 - \frac{R_{\text{price}} - 1}{N - 1}\right) \times 100$
        • Calculating performance score: $S_{\text{performance}} = \left(\frac{R_{\text{performance}} - 1}{N - 1}\right) \times 100$

        Vshare combines the reputation, price, and performance scores in a simple weighted average, with weights that depend on the order type, to balance low storage costs against high data quality for users.
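The rank-to-score formulas above can be checked with a short worked example. This sketch assumes that N is the number of valid nodes ranked in the region, which the text does not state explicitly. The case N = 1 is not defined by the formulas and is rejected here. The function names are illustrative.

```python
def _check(rank, n):
    # The formulas divide by (N - 1), so at least two ranked nodes are needed.
    if n < 2:
        raise ValueError("need at least two ranked nodes")
    if not 1 <= rank <= n:
        raise ValueError("rank must be between 1 and n")


def reputation_or_price_score(rank, n):
    """(1 - (R - 1) / (N - 1)) * 100: rank 1 scores 100, rank N scores 0."""
    _check(rank, n)
    return (1 - (rank - 1) / (n - 1)) * 100


def performance_score(rank, n):
    """((R - 1) / (N - 1)) * 100, exactly as the performance formula is written."""
    _check(rank, n)
    return (rank - 1) / (n - 1) * 100


# A node ranked 2nd of 5 for reputation and 2nd of 5 for performance:
s_reputation = reputation_or_price_score(2, 5)  # (1 - 1/4) * 100 = 75.0
s_performance = performance_score(2, 5)         # (1/4) * 100 = 25.0
```

Note that, as written, the performance formula gives the highest score to the largest rank number, whereas the reputation and price formulas give it to rank 1.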

      • Cost Priority: Prioritizes matching with the lowest cost nodes to ensure the lowest storage costs for users.

          1. Reputation Weight: 0.3
          2. Price Weight: 0.5
          3. Storage Medium Performance Weight: 0.2
        • Scoring Matching Rule: $S_{\text{overall}} = 0.3 \times S_{\text{reputation}} + 0.5 \times S_{\text{price}} + 0.2 \times S_{\text{performance}}$
      • Quality Priority: Prioritizes matching with the highest performance nodes to ensure the highest data storage quality for users.

          1. Reputation Weight: 0.4
          2. Price Weight: 0.2
          3. Storage Medium Performance Weight: 0.4
        • Scoring Matching Rule: $S_{\text{overall}} = 0.4 \times S_{\text{reputation}} + 0.2 \times S_{\text{price}} + 0.4 \times S_{\text{performance}}$

        For each order match, the node with the highest overall score is chosen for storage. If multiple nodes have the same overall score, a random value from drand, the public randomness beacon operated by the League of Entropy, is used to select among them, which keeps the selection fair.
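The two weighted rules and the tie-break can be sketched together. The weights come from the lists above. How a drand value is mapped to one of the tied nodes is not specified in this document, so the mapping used here (interpret the hex randomness as an integer and take it modulo the number of tied nodes, over a sorted list of node IDs) is an assumption, as are the names `PROFILES`, `overall_score`, and `pick_node`. The randomness is passed in as a plain hex string and nothing is fetched from the network.

```python
import math

# Weights as listed above for the two scored order types.
PROFILES = {
    "cost_priority": {"reputation": 0.3, "price": 0.5, "performance": 0.2},
    "quality_priority": {"reputation": 0.4, "price": 0.2, "performance": 0.4},
}


def overall_score(scores, profile):
    """Weighted sum of the three 0-100 scores for one node."""
    weights = PROFILES[profile]
    return sum(weights[k] * scores[k] for k in weights)


def pick_node(nodes, profile, randomness_hex):
    """nodes: {node_id: {"reputation": S, "price": S, "performance": S}}."""
    totals = {nid: overall_score(s, profile) for nid, s in nodes.items()}
    best = max(totals.values())
    # Sort the tied IDs so every party derives the same choice from the
    # same public randomness (assumed tie-break mapping).
    tied = sorted(nid for nid, t in totals.items()
                  if math.isclose(t, best, abs_tol=1e-9))
    return tied[int(randomness_hex, 16) % len(tied)]


nodes = {
    "a": {"reputation": 100, "price": 0, "performance": 50},
    "b": {"reputation": 50, "price": 100, "performance": 0},
    "c": {"reputation": 0, "price": 50, "performance": 100},
}
cost_winner = pick_node(nodes, "cost_priority", "00")        # "b" (40 / 65 / 45)
quality_winner = pick_node(nodes, "quality_priority", "00")  # "a" (60 / 40 / 50)
```

The same three nodes produce different winners under the two profiles, which shows how the weights shift the match between the cheapest node and the best-rated one.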

      • Fixed-Price Orders: Users can set a maximum price for storage orders to ensure storage costs do not exceed expectations.

        • Since the price is predefined by the user, the price factor's weight is not applicable in fixed-price orders. Thus, we focus on the following two factors:
          • Reputation Weight: 0.6
          • Storage Medium Performance Weight: 0.4
        • After selecting nodes with the closest prices, calculate the overall score for each qualifying node: $S_{\text{overall}} = 0.6 \times S_{\text{reputation}} + 0.4 \times S_{\text{performance}}$
        • The node with the highest overall score is chosen for matching.
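A fixed-price match can be sketched in two steps: filter by price, then rank by the two-factor score. The text does not define "closest prices" precisely. This sketch assumes that a node qualifies when its price is at or below the user's maximum. The function name and dictionary keys are illustrative.

```python
def match_fixed_price(nodes, max_price):
    """nodes: list of dicts with id, price, s_reputation, s_performance."""
    # Assumed qualification rule: price must not exceed the user's maximum.
    qualifying = [n for n in nodes if n["price"] <= max_price]
    if not qualifying:
        return None
    # Price carries no weight here; only reputation (0.6) and performance (0.4).
    return max(
        qualifying,
        key=lambda n: 0.6 * n["s_reputation"] + 0.4 * n["s_performance"],
    )


candidates = [
    {"id": "a", "price": 5, "s_reputation": 90, "s_performance": 50},    # 74
    {"id": "b", "price": 8, "s_reputation": 100, "s_performance": 100},  # too expensive
    {"id": "c", "price": 4, "s_reputation": 60, "s_performance": 100},   # 76
]
winner = match_fixed_price(candidates, max_price=6)
```

With a maximum price of 6, node "b" is excluded despite its perfect scores, and "c" narrowly beats "a".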

    Vshare Storage API

    • Vshare Storage offers extensive API support, enabling developers to easily integrate Vshare Storage into their existing workflows and systems, thereby enhancing work efficiency.
    • The APIs provided by Vshare Storage include:
      • Account Management: Supports the asset management of user accounts.
      • Order Initiation: Allows users to initiate storage and retrieval orders.
      • Existing Orders and Data Management: Enables users to manage their existing storage and retrieval orders.
      • Data Distribution: Supports cross-regional data scheduling for users to distribute data according to their business needs.
      • Integration with Applications: Integrates with IPFS ecosystem applications to offer users more interactive interfaces for data storage and retrieval.

    Native Applications of IPFS

    • Vshare Storage supports native applications of IPFS, including but not limited to:
      • PCDN
      • IPFS-Cluster
      • IPFS-Desktop
      • HLS-IPFS
      • Applications developed within the IPFS ecosystem

Technical Innovations

The technological innovations of VshareCloud in the field of distributed storage computing are mainly reflected in the following aspects:

  1. CosmicCluster Communication Framework: Through the self-developed CosmicCluster communication framework, VshareCloud achieves efficient, secure communication and data transfer between nodes. The zero-liveness hot-swappable design and peer-to-peer communication capabilities significantly enhance the system's stability, reliability, and scalability.

  2. Dynamic Resource Scheduling Algorithm: VshareCloud employs advanced data scheduling algorithms to dynamically allocate computing resources, supporting the elastic scaling and load balancing of data and computational tasks. This is crucial for handling large-scale distributed computing tasks, as it allows for the automatic adjustment of resource allocation based on real-time task demands, optimizing computational efficiency.

  3. sPoST Mechanism: VshareCloud adopts the sPoST (Simple Proof of Space Time) mechanism for storage verification, which ensures the security and integrity of data by verifying that a node has effectively stored specific data. Additionally, by evaluating the performance of nodes' storage media, this mechanism improves the overall system performance and health assessment.

  4. Data Regional Segmentation and Order Matching Rules: Through precise segmentation of global data regions and intelligent order matching rules, VshareCloud can intelligently match the optimal storage solution based on data storage needs and node performance. This not only ensures the efficiency and economy of data storage but also enhances the reliability and availability of the data.

  5. IPFS Integration and Support for Native Applications: VshareCloud deeply integrates IPFS technology, leveraging the powerful capabilities of IPFS in distributed storage, while also supporting various native applications and ecosystems of IPFS. This provides users with richer and more flexible solutions for data storage, retrieval, and distribution.

  6. Comprehensive API Support and Ecosystem Integration: VshareCloud offers a range of APIs, enabling developers and enterprise users to easily integrate VshareCloud services into their existing business processes. Moreover, by integrating with ecosystem applications such as IPFS, it further expands its application scenarios and functionalities, promoting the prosperous development of the ecosystem.

In summary, these innovations are intended to make VshareCloud an efficient, reliable, and flexible solution for distributed storage computing.