PropertyGPT

Official Website:
Social Profiles:

We've just announced IQ AI.

Check it out

PropertyGPT

PropertyGPT is an end-to-end system that leverages large language models (LLMs) to automate the generation of comprehensive properties for formal verification of . The whitepaper's authors are Ye Liu, Yue Xue, , Yuqiang Sun, Yi Li, Miaolei Shi, and Yang Liu. [1]

Overview

PropertyGPT addresses the challenges of security by automating the generation of verification rules using advanced natural language processing, particularly GPT models. This technology comprehends complex code features and automatically transfers knowledge to create efficient and widely applicable security rules. PropertyGPT reduces reliance on security experts by reducing costs and time associated with formulating these rules. This approach allows PropertyGPT to swiftly produce numerous invariant verification rules adaptable to diverse and intricate nature, ensuring they meet stringent security standards. Automating rule generation streamlines the security audit process and reduces overall costs, making it feasible for more projects to secure their early in development. [1][2]

The system has been implemented and evaluated using 623 human-written properties from 23 Certora projects, achieving an 80% recall. PropertyGPT has detected CVEs, attack incidents, and zero-day vulnerabilities within these projects, resulting in bug bounty rewards. It uses transfer learning, retrieval-augmented generation, and a property verification process involving symbolic execution, modular verification, and bounded model checking to detect a wide range of vulnerabilities. [1][2]

Technology

Vector Database

The vector database in PropertyGPT is a crucial component that facilitates the retrieval of similar codes and mapping to their original reference properties. It is constructed by embedding the critical code corresponding to the reference properties into numerical representations, often vectors.

  • Creation of the Vector Database: The vector database is formed by embedding essential parts of the code relevant to the reference properties. This process converts the code into numerical representations or vectors.
  • Retrieval of Similar Code: When PropertyGPT receives subject code for property generation, it queries the vector database to retrieve similar code within a certain threshold. The similarity is based on the numerical representations of the code in the database, with the threshold filtering out dissimilar code.
  • Mapping to Reference Properties: PropertyGPT maps each piece of code to its original reference properties after retrieving similar code from the vector database. This mapping ensures that generated properties are based on existing, relevant properties, leveraging the knowledge encoded in the reference properties for more accurate and context-specific property generation.

The vector database acts as a repository of code representations that capture the essence of the reference properties, enabling PropertyGPT to generate customized properties for unknown code by retrieving similar code and mapping it to reference properties. Specific details of the embedding and similarity calculation techniques depend on the methods and algorithms chosen by the researchers behind PropertyGPT. [1][2]

Iterative Property Generation

Iterative property generation in PropertyGPT involves creating and refining candidate properties for subject code to ensure their correctness and compilability. The process begins with retrieving reference properties similar to the subject code from the vector database. These reference properties serve as examples and provide a starting point for generating candidate properties. A generation prompt then guides the creation of these properties, focusing on structural and syntactic aspects rather than specific variable or function names. [1]

Once the candidate properties are generated, they are checked for compilation errors using a Property Specification Language (PSL) compiler. A property failing to compile indicates errors or inconsistencies in the generated code. PropertyGPT employs a revising prompt to guide the iterative revision process, instructing the language model to fix the rule code and correct any errors while maintaining structural and syntactic consistency. The language model analyzes the errors, identifies the necessary fixes, and generates revised versions of the properties. This iterative process continues until the properties are compilable or until a maximum number of revision attempts is reached. [1]

In addition to ensuring the syntactical correctness of the properties through compilation, PropertyGPT also performs additional checks to ensure the properties are meaningful and capture the intended behavior. For example, it verifies whether the properties include the target subject function and if they are functionally meaningful for testing the desired behavior. By iteratively generating and revising candidate properties, PropertyGPT aims to produce properties that are both syntactically correct and functionally meaningful for the subject code. This iterative approach allows for refinement and improvement, increasing the accuracy and effectiveness of the generated properties. [1]

Property Specification Language (PSL)

The Property Specification Language (PSL) is an intermediate language proposed in PropertyGPT for specifying properties. PSL extends the Solidity programming language, offering a structured and expressive way to define properties. It allows for the definition of invariants, which are properties that must always hold during execution. These invariants are defined over state variables, ensuring essential characteristics and constraints are maintained throughout the lifecycle. [1][3]

Function pre

  • and post-conditions are also supported using Hoare triples. Pre-conditions specify the requirements that must be met before a function is executed, while post-conditions define the expected outcomes or guarantees after the function execution. This helps ensure the correctness and functionality of individual functions. [1][3]

Scenario-based properties in PSL are defined on restricted environments or specific conditions and implemented as different rules through assumptions and assertions. Assumptions pertain to the state of the or conditions that need to be satisfied, while assertions check specific conditions or behaviors within the . PSL shares most of its syntax and structure with Solidity, making it easy to write contract specifications and avoid complex logic bindings. This simplicity enhances the ease of writing and maintaining specifications. [1][3]

PSL specifications are reusable and upgradable, as PSL is a customized variant of Solidity that allows for inheritance and modularization management. This facilitates the inclusion of other specifications, enabling reuse and extension. Additionally, PSL specifications can be directly automated from code, ensuring specification quality and reducing the chances of errors and inconsistencies. [1][3]

Formal Prover

The formal prover in PropertyGPT is a component designed to verify the correctness of properties generated for . It employs formal verification techniques to rigorously analyze the properties and the code, ensuring their accuracy under all possible execution scenarios. This involves applying logical reasoning and mathematical methods to confirm that a system meets specified properties or specifications. [1]

In PropertyGPT, the formal prover examines the contract code and properties specified in the Property Specification Language (PSL). It analyzes control flow, data flow, and state transitions to verify that the properties are met. The prover uses model checking, theorem proving, or symbolic execution. Model checking explores all system states and transitions, theorem proving uses mathematical logic to establish properties, and symbolic execution considers all possible inputs and paths to verify properties. [1]

During verification, the prover may identify counterexamples or scenarios where properties are violated, helping developers understand specific conditions leading to these violations and enabling them to address potential vulnerabilities in the code. By identifying potential issues and inconsistencies, the formal prover enhances the security and reliability of smart contracts, assuring that they function as intended and reducing the risk of exploits or failures. The specific techniques and algorithms used by the formal prover vary based on the complexity of the contract code, the properties being verified, and the desired level of assurance. [1]

Weighted Algorithm

The weighted algorithm in PropertyGPT assigns weights to generated properties based on their relevance and importance. The process starts with generating a set of candidate properties for the subject code using an iterative approach. Each candidate property is then assessed for its relevance by analyzing its similarity to the subject code and considering the context and purpose of the code. [1][2]

Following this, the importance of each candidate property is evaluated based on factors such as complexity, criticality, and impact. Based on these relevance and importance assessments, weights are assigned to each candidate property. The candidate properties are ranked according to their assigned weights, with higher weights indicating higher priority. [1][2]

The algorithm allows for iterative refinement of weights and rankings based on user feedback or domain-specific knowledge. This method ensures a targeted and efficient approach to property generation, prioritizing the most relevant and important properties for testing and verification. By considering multiple dimensions of similarity and assigning weights accordingly, the algorithm captures critical behaviors, ensures correctness, and improves the effectiveness of formal verification. [1][2]

See something wrong?

Average Rating

Based on over 1 ratings

How was your experience?

Give this wiki a quick rating to let us know!

Edited By

Profile picture of Anonymous uservzbrv

Edited On

July 8, 2024

REFERENCES

HomeCategoriesRankEventsGlossary