Groups identical or similar data points together and, at the end of each aggregation window, outputs the value corresponding to the heaviest cluster, provided it carries sufficient weight (x% of the total validators' stake), as sketched in the example below.
Default parameters:
Aggregation window of one block
Sufficient stake is 67% of the total validators' stake
Appropriate in cases where we expect the same result from all data validators, for example, fetching from a single stable source or performing the same deterministic computation
The final result is strictly backed by a majority of validators or stake weight (assuming the 67% requirement)
Given a relative majority submitting accurate reports, inaccurate reports do not affect the final result (deviant or malicious reports fall into smaller clusters)
Efficient online implementation with possible optimizations (e.g., occasionally pruning small clusters)
Storage inefficiency in worst-case scenarios (occurs when reports have high variance)
Limited effectiveness with volatile data—when values change rapidly, multiple small clusters form instead of a clear majority, making it difficult to reach consensus.
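For illustration, here is a minimal Python sketch of the clustering behaviour described above, assuming numeric reports arriving as (value, stake) pairs; the function name, the `tolerance` parameter, and the batch interface are assumptions for this example, not the library's actual API.

```python
def aggregate_clusters(reports, tolerance=0.0, threshold=0.67):
    """Group (value, stake) reports into clusters of near-equal values and
    return the heaviest cluster's value if it carries enough total stake.

    `reports` is the list of (value, stake) pairs collected during one
    aggregation window; `tolerance` and `threshold` (67% by default) are
    illustrative parameters, not the library's actual interface.
    """
    if not reports:
        return None
    clusters = []  # each cluster is [representative_value, accumulated_stake]
    for value, stake in reports:
        for cluster in clusters:
            if abs(value - cluster[0]) <= tolerance:
                cluster[1] += stake
                break
        else:  # no nearby cluster found: open a new one
            clusters.append([value, stake])

    total_stake = sum(stake for _, stake in reports)
    heaviest_value, heaviest_stake = max(clusters, key=lambda c: c[1])
    if heaviest_stake >= threshold * total_stake:
        return heaviest_value
    return None  # no cluster reached sufficient weight


# Example: validators holding 70% of the stake agree exactly, so their value wins.
print(aggregate_clusters([(100.0, 30), (100.0, 30), (100.0, 10), (99.0, 30)]))  # 100.0
```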
When protocols encounter tasks that cannot be solved on-chain (oracle tasks) and require a decentralized solution, it's crucial to distinguish between two fundamentally different purposes of decentralization:
Decentralization for Better Results - When there's no single "right" answer, decentralizing the methodology itself improves outcomes. This applies to scenarios where different approaches provide complementary insights or where aggregating multiple viewpoints improves accuracy.
Decentralization for Security - When the task is well-defined but requires protection against manipulation or failure. This applies to operations requiring high reliability and fault tolerance where a single point of failure must be avoided.
The distinction between these purposes is fundamental: the first seeks to improve quality through diverse methodologies, while the second ensures integrity through distributed execution of a single, well-defined process.
This document analyzes how to aggregate multiple data reports into a single reliable estimate. We review aggregation methods and their properties, focusing on resistance to errors and manipulation, to find an optimal strategy balancing accuracy and efficiency. We place special emphasis on the weighted median since it is the most common aggregation method for price feed oracles.
We first consider a simple setting where data originates from a single source.
Let $p_t$ be the relevant quantity at time $t$, e.g., the BTC/USD price. Notice that $p_t$ is unknown. Instead, we can access an estimate $\tilde{p}_t$ from a known and agreed-upon source, for instance, Interactive Brokers. A set of $N$ fetchers fetch $\tilde{p}_t$ for us, where $p_t^i$ denotes the quantity reported by fetcher $i$. The aggregate $\hat{p}_t$ is the number we forward as our estimate of $\tilde{p}_t$ (and essentially of $p_t$).
The question is how to aggregate $p_t^1,\dots,p_t^N$ into one number, $\hat{p}_t$, representing the price according to that source at time $t$.
Time-variance. Since time is continuous, fetchers access the source at slightly different times. We don't expect the time differences to be significant; in particular, they should not exceed one second, which is the rate of our blockchain I/O.
Simplicity. It is crucial both for runtime considerations and for explaining to users how our mechanism works (KISS/Occam's razor).
(Honest) mistakes. Although we have incentive mechanisms to punish/reward fetchers’ behavior, mistakes are unavoidable. For instance, downtime, latency, etc. These can happen even if fetchers are honest and thus should be accounted for.
Malicious behavior. Our solution should be as robust as possible to attacks. Namely, it should minimize the consequences of an attack and facilitate punishing attackers.
To quantify the last point, the breakdown point of an aggregate is the minimal ratio of reports by a malicious actor that allows it to achieve an arbitrary deviation from $\tilde{p}_t$.
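For example (with illustrative numbers): suppose three honest fetchers report $100$, $101$, and $102$, and one malicious fetcher reports $x$. The mean $\frac{303+x}{4}$ can be pushed arbitrarily far by choosing $x$, so a single report out of $n$ is enough to move the mean anywhere and its breakdown point tends to zero. By contrast, the median of the four reports stays between $100.5$ and $101.5$ regardless of $x$, and can only be moved arbitrarily once malicious reports account for at least half of the total.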
We review below several options for the aggregation. All of them are special cases of minimizing an objective function. While there are infinitely many such functions, our analysis focuses only on the median, the average, and their trimmed counterparts (a short numeric comparison follows the list below).
Simple Average (Mean)
Pros: Easy to calculate; treats all reports equally; good for consistent data without outliers.
Cons: Can be skewed by outliers; its breakdown point is zero.
Weighted Average
Pros: Accounts for the varying significance of each report (e.g., based on stake); more accurate if reports are not equally reliable.
Cons: More complex to calculate; can still be skewed by outliers.
Median
Pros: Less affected by outliers than the mean; simple to understand and calculate.
Cons: May not reflect the influence of large/small prices if they are outliers.
Mode (most common value)
Pros: Represents the most frequently occurring price; useful in markets with standard pricing.
Cons: Unreliable when prices vary widely or when there is no repeating value.
Trimmed Mean
Pros: Excludes outliers by trimming a specified percentage of the highest and lowest values before averaging; balances the influence of outliers.
Cons: Arbitrariness in deciding what percentage to trim; could exclude relevant data.
Quantile-based Aggregation
Pros: Can focus on a specific part of the distribution (e.g., median is the 50% quantile); useful for risk management strategies.
Cons: Not representative of the entire data set; can be as sensitive to outliers as the mean.
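To make these trade-offs concrete, the following minimal Python sketch (illustrative only, with made-up numbers) compares the mean, the median, and a trimmed mean on a small set of reports containing a single extreme outlier:

```python
import statistics

def trimmed_mean(values, trim_ratio=0.1):
    """Drop the lowest and highest `trim_ratio` fraction of values, then average.
    The 10% default is arbitrary, which is exactly the drawback noted above."""
    values = sorted(values)
    k = int(len(values) * trim_ratio)
    kept = values[k:len(values) - k] if k > 0 else values
    return sum(kept) / len(kept)

reports = [100.2, 100.4, 100.5, 100.6, 100.7, 100.9, 101.0, 101.1, 101.3, 5000.0]

print("mean        :", statistics.mean(reports))    # ~590.7, dragged up by the outlier
print("median      :", statistics.median(reports))  # 100.8, stays among honest values
print("trimmed mean:", trimmed_mean(reports))        # ~100.8, outlier removed first
```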
The weighted median enjoys the robustness of the median and, like the weighted average, the ability to account for different significance levels. Its breakdown point is 50% of the weight; below that, an adversary can only move the result within the range of correctly reported values (as we prove later on). The weighted median thus allows us to incorporate the stake of the different fetchers.
To demonstrate the robustness of the weighted median, we present the following theorem. It proves that as long as the majority of the stake belongs to honest fetchers, the aggregate will always be between the honest reports; namely, an attacker with a minority weight (stake) cannot shift the aggregate too much.
Recall that prices are susceptible to noise and volatility. Therefore, financial applications often average prices over time. Well-known methods include Moving Average, Exponential Smoothing, Time-weighted Average Price (TWAP), and Volume-weighted Average Price (VWAP).
Our current service does not implement such time averages. We allow our customers the flexibility of the computation at their end.
There are several ways by which we can set the weights.
Volume-Weighted Average Price (VWAP):
Description: VWAP is calculated by taking the total dollar value traded across all trading periods and dividing it by the total trading volume for the current day. In our case, it involves weighting each source's rate by its volume, giving more influence to sources with higher trading volumes (a minimal sketch follows this list).
Advantages: Reflects where the liquidity is and is a common benchmark used by traders. It gives a fair reflection of market conditions over the given period.
Disadvantages: More susceptible to volume spikes, which can distort the average price.
Liquidity-Adjusted Weighting:
Description: Here, the rate from each source is weighted based on its liquidity. This method requires a clear definition and measurement of liquidity, which can include factors like bid-ask spread, market depth, and the speed of price recovery after a trade.
Advantages: Provides a more realistic view of the market by acknowledging that more liquid markets better reflect the true market price.
Disadvantages: Liquidity can be harder to measure accurately and may vary quickly, making it challenging to maintain an accurate aggregate price in real-time.
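As an illustration of volume-based weighting, here is a minimal Python sketch; the function name and the (price, volume) input shape are assumptions for this example, not a library interface. Liquidity-adjusted weighting follows the same pattern, with the volume replaced by whatever liquidity score (bid-ask spread, depth, etc.) is chosen.

```python
def volume_weighted_price(quotes):
    """Aggregate per-source prices by weighting each one with its trading volume
    (VWAP-style). `quotes` is a list of (price, volume) pairs, one per source.
    """
    total_volume = sum(volume for _, volume in quotes)
    if total_volume == 0:
        raise ValueError("no volume reported by any source")
    return sum(price * volume for price, volume in quotes) / total_volume


# Example: the high-volume source dominates the aggregate.
quotes = [(100.0, 900.0), (101.0, 80.0), (99.0, 20.0)]
print(volume_weighted_price(quotes))  # 100.06
```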
This document explores single-source price aggregation in oracle systems, covering both result improvement and security aspects of decentralization. It analyzes various aggregation methods, focusing on weighted median for its manipulation resistance and stake-based weighting capabilities, while also examining time-based averaging and multi-source aggregation approaches.
The document highlights breakdown points in aggregation methods, demonstrates weighted median's security when honest fetchers hold majority stake, and evaluates trade-offs between different aggregation strategies.
The Aggregation Library is a collection of smart contracts that provides reliable on-chain data aggregation for various use cases. Each method addresses specific data scenarios to maintain accuracy and integrity in decentralized environments.
Considerations for choosing the proper aggregation:
Time-variance. Since time is continuous, fetchers access the source at slightly different times. We don't expect the time differences to be significant; in particular, they should not exceed one second, which is the rate of our blockchain I/O.
Simplicity. It is crucial both for runtime considerations and for explaining to users how our mechanism works (KISS/Occam's razor).
(Honest) mistakes. Although we have incentive mechanisms to punish/reward fetchers’ behavior, mistakes are unavoidable. For instance, downtime, latency, etc. These can happen even if fetchers are honest and thus should be accounted for.
Malicious behavior. Our solution should be as robust as possible to attacks. Namely, it should minimize the consequences of an attack and facilitate punishing attackers.
For detailed explanations, please refer to
For some short examples of aggregation methods, please refer to
Prices are susceptible to noise and volatility. Therefore, financial applications often average prices over time. Well-known methods include Moving Average, Exponential Smoothing, Time-weighted Average Price (TWAP), and Volume-weighted Average Price (VWAP).
TWAP (Time-Weighted Average Price) - calculates an average value over a specified time period, weighting each data point by the duration or frequency of updates.
TWAP is calculated as follows: $\mathrm{TWAP} = \frac{\sum_j p_j \cdot \Delta t_j}{\sum_j \Delta t_j}$, where $p_j$ is the value of the data at the $j$-th measurement and $\Delta t_j$ is the change in time since the previous measurement (a minimal sketch is given after the properties listed below).
Reduces the impact of rapid fluctuations in the data by spreading them out over time.
Incorporates time factors, making the final output reflective of trends rather than single-point events.
Offers smoother price data for algorithmic trading and liquidity management.
Less responsive to fast market shifts or abrupt changes - smoothing over time makes TWAP slower to react than other algorithms.
Requires additional storage for historical data.
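The following is a minimal Python sketch of the TWAP formula above, assuming observations arrive as (timestamp, value) pairs; it is an off-chain illustration, not the library's on-chain implementation.

```python
def twap(observations):
    """Time-weighted average over (timestamp, value) observations, matching the
    formula above: each value p_j is weighted by the time elapsed since the
    previous measurement. The input shape is an assumption for this example.
    """
    if len(observations) < 2:
        raise ValueError("need at least two observations")
    observations = sorted(observations)  # order by timestamp
    weighted_sum, total_time = 0.0, 0.0
    for (t_prev, _), (t_curr, value) in zip(observations, observations[1:]):
        dt = t_curr - t_prev
        weighted_sum += value * dt
        total_time += dt
    return weighted_sum / total_time


# Example: a brief 1-second spike to 105 barely moves the average.
print(twap([(0, 100.0), (60, 100.0), (61, 105.0), (121, 100.0)]))  # ~100.04
```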
The median algorithm returns the middle value from a sorted list of validator reports. It is most naturally used for numerical data but can be applied to any ordered set of data.
A possible variant is the weighted median - Let $w^i$ be the stake of data validator $i$, and assume that $\sum_{i=1}^{N} w^i = 1$. Also, for simplicity, assume that the reports $x^1 \le x^2 \le \dots \le x^N$ are sorted. The weighted median is an element $x^k$ such that $\sum_{i=1}^{k-1} w^i \le \frac{1}{2}$ and $\sum_{i=k+1}^{N} w^i \le \frac{1}{2}$ (a minimal sketch follows the list of properties below).
Resistant to extreme values and outliers, as they only affect the tails of the sorted list; its reliance on central values limits the impact of anomalous data points.
The aggregated value will always lie between the honest reports, assuming a majority of the stake belongs to honest validators (weighted median)
May not reflect the influence of large/small prices if they are outliers
Non-incremental - the aggregated value cannot be updated upon each report in an online manner (as opposed to the mean, for example, which supports $\bar{x}_n = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n}$, where $\bar{x}_n$ is the mean after $n$ reports and $x_n$ is the $n$-th report).
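As referenced above, here is a minimal Python sketch of the weighted median on (value, stake) reports; the function name and interface are assumptions for this example rather than the library's API.

```python
def weighted_median(reports):
    """Weighted median of (value, stake) reports, per the definition above:
    the stake of reports before the returned element in sorted order and the
    stake of reports after it are each at most half of the total.
    """
    if not reports:
        raise ValueError("no reports")
    reports = sorted(reports)  # order by value
    total_stake = sum(stake for _, stake in reports)
    cumulative = 0.0
    for value, stake in reports:
        cumulative += stake
        if cumulative >= total_stake / 2:
            return value
    return reports[-1][0]  # defensive fallback; unreachable for positive stakes


# Example: a 5%-stake outlier cannot move the result away from the majority.
reports = [(100.0, 0.40), (101.0, 0.30), (102.0, 0.25), (10_000.0, 0.05)]
print(weighted_median(reports))  # 101.0
```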
Mathematically, let $w^i$ be the (positive) stake of fetcher $i$, and assume that $\sum_{i=1}^{N} w^i = 1$. Also, for simplicity, assume that the reports $p_t^1 \le p_t^2 \le \dots \le p_t^N$ are sorted. The weighted median is an element $p_t^k$ such that $\sum_{i=1}^{k-1} w^i \le \frac{1}{2}$ and $\sum_{i=k+1}^{N} w^i \le \frac{1}{2}$.
Therefore, our aggregate is $\hat{p}_t = p_t^k$ for such a $k$.
Theorem: Let $H$ be the set of honest fetchers, for which $\sum_{i\in H}w^i > \frac{1}{2}$, and let $M$ be the set of malicious fetchers, for which $\sum_{i\in M}w^i < \frac{1}{2}$ and $H\cup M=[N]$.
Then, the weighted median aggregate always satisfies $\min_{i\in H} p_t^i \le \hat{p}_t \le \max_{i\in H} p_t^i$.
Recall that in the reasonable case we do not expect high variance among (honest) reports; thus, the interval $[\min_{i\in H} p_t^i,\ \max_{i\in H} p_t^i]$ will be small. This ensures that our aggregate is robust to manipulation.
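As a concrete (made-up) illustration of the theorem: suppose honest fetchers holding 60% of the stake report prices in $[100, 102]$, while a malicious coalition holding the remaining 40% reports $10^6$. Sorting the reports by value, the cumulative stake crosses $\frac{1}{2}$ while still inside the honest range, so the weighted median lands in $[100, 102]$; the attacker can at most push the aggregate toward the edge of the honest range, never beyond it.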
We now face a similar problem, but each quantity is given by a source (and not a fetcher). We have a set of sources $S$, where each source $s \in S$ has a price $p^s$. Along with the different prices, we have an additional weight $w^s$ per source. Weights capture our confidence in that source, its volume, liquidity, etc. The weight-adjusted price is given by $\hat{p} = \frac{\sum_{s\in S} w^s p^s}{\sum_{s\in S} w^s}$.
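For instance (with illustrative numbers), if source $s_1$ quotes $p^{s_1}=100$ with weight $w^{s_1}=0.8$ and source $s_2$ quotes $p^{s_2}=104$ with weight $w^{s_2}=0.2$, the weight-adjusted price is $\frac{0.8\cdot 100 + 0.2\cdot 104}{0.8+0.2}=100.8$, pulled toward the source we trust more.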