How On-Chain Reputation Scoring Actually Works
"Just analyze their on-chain history" sounds simple until you try to do it. Raw blockchain data is noisy, cross-chain identities are fragmented, and naive scoring approaches are trivially gameable. Here's how we built a scoring system that's actually useful.
The Naive Approach (and Why It Fails)
The simplest reputation metric: count transactions. More transactions = more reputable.
This fails immediately. A bot can generate thousands of transactions per day for pennies on L2s. Transaction count tells you nothing about reliability — it tells you about activity volume, which is easy to fake.
Slightly better: total value transacted. But this is also gameable via wash trading — send 100 ETH to yourself in a loop and you look like a whale.
Real reputation requires understanding the quality of on-chain behavior, not just the quantity.
Our Scoring Pipeline
CrowdProof's scoring engine processes raw blockchain data through four stages:
Stage 1: Data Ingestion
Indexers continuously scan supported chains (Ethereum, Polygon, Arbitrum, Optimism, Base) for relevant events:
- Lending: deposits, borrows, repayments, liquidations (Aave, Compound, MakerDAO)
- DEX: swaps, liquidity provision, positions (Uniswap, Curve, Balancer)
- Governance: votes, delegations, proposals (Governor contracts, Snapshot)
- NFT: mints, transfers, listings, sales (OpenSea, Blur, marketplace contracts)
Each event is normalized into a standard schema with timestamp, value, counterparty, and protocol metadata.
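The normalized schema can be sketched as a small record type. Field names here are illustrative (the article lists timestamp, value, counterparty, and protocol metadata but not exact names), assuming values are normalized to USD:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class NormalizedEvent:
    """One on-chain event in the common schema (field names illustrative)."""
    chain: str            # e.g. "ethereum", "arbitrum"
    protocol: str         # e.g. "aave", "uniswap"
    category: str         # "lending" | "dex" | "governance" | "nft"
    event_type: str       # e.g. "repayment", "swap", "vote"
    timestamp: datetime
    value_usd: float      # event value normalized to USD
    counterparty: str     # contract or wallet on the other side

# A raw indexer payload being normalized into the schema:
raw = {"chain": "ethereum", "protocol": "aave", "category": "lending",
       "event_type": "repayment", "ts": 1708948800, "value_usd": 1500.0,
       "counterparty": "0xAavePool"}
event = NormalizedEvent(
    chain=raw["chain"], protocol=raw["protocol"], category=raw["category"],
    event_type=raw["event_type"],
    timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    value_usd=raw["value_usd"], counterparty=raw["counterparty"],
)
```

A frozen dataclass keeps normalized events immutable once ingested, which makes downstream feature extraction easier to reason about.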
Stage 2: Feature Extraction
Raw events become features. For the DeFi Lending category, we extract:
| Feature | Description | Anti-Gaming |
|---|---|---|
| repayment_ratio | Loans repaid on time / total loans | Hard to fake — requires actual capital |
| collateral_health | Average health factor across positions | Sustained over time, expensive to manipulate |
| protocol_diversity | Number of distinct lending protocols used | Requires genuine multi-protocol activity |
| time_weighted_volume | TVL × duration, not just peak | Prevents flash-loan inflation |
| liquidation_rate | Liquidations / total positions | Negative signal, hard to avoid if genuinely risky |
| position_duration | Median time positions are held open | Rewards patience over quick flips |
The key insight: features that require sustained capital commitment are hard to game. Maintaining a healthy collateral ratio across 3 protocols for 6 months is expensive to fake.
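Several of these features reduce to simple aggregations over closed loan records. A minimal sketch, assuming each record carries illustrative keys (`repaid_on_time`, `protocol`, `value_usd`, `duration_days`, `liquidated`):

```python
from statistics import median

def lending_features(loans: list[dict]) -> dict:
    """Extract a few DeFi-lending features from closed loan records.
    Keys on each record are illustrative, not CrowdProof's actual schema."""
    n = len(loans)
    return {
        "repayment_ratio": sum(l["repaid_on_time"] for l in loans) / n,
        "protocol_diversity": len({l["protocol"] for l in loans}),
        # value × time held, so a one-block flash loan contributes ~nothing
        "time_weighted_volume": sum(l["value_usd"] * l["duration_days"] for l in loans),
        "liquidation_rate": sum(l["liquidated"] for l in loans) / n,
        "position_duration": median(l["duration_days"] for l in loans),
    }

loans = [
    {"repaid_on_time": True,  "protocol": "aave",     "value_usd": 1000, "duration_days": 30, "liquidated": False},
    {"repaid_on_time": True,  "protocol": "compound", "value_usd": 2000, "duration_days": 60, "liquidated": False},
    {"repaid_on_time": False, "protocol": "aave",     "value_usd": 500,  "duration_days": 10, "liquidated": True},
]
features = lending_features(loans)
```

Note how `time_weighted_volume` multiplies value by duration: a flash loan with a duration of effectively zero adds nothing, which is exactly the anti-gaming property the table describes.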
Stage 3: Model Inference
Each category has a dedicated ML model trained on labeled data. We use gradient-boosted decision trees (LightGBM) because:
- Interpretable — Feature importance is transparent, unlike neural networks
- Fast inference — Microsecond predictions, critical for API latency
- Handles missing data — New wallets with sparse history don't crash the model
- Resistant to overfitting — Built-in regularization for small training sets
The model outputs a raw score, which is then calibrated to the 0–1000 scale.
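The article doesn't specify the calibration function, so as a sketch: a logistic squash is one common way to map an unbounded raw model output onto a fixed scale (the `midpoint` and `steepness` parameters here are illustrative assumptions):

```python
import math

def calibrate(raw_score: float, midpoint: float = 0.0, steepness: float = 1.0) -> int:
    """Squash an unbounded raw model output onto the 0-1000 scale.
    A logistic mapping is an illustrative choice, not CrowdProof's
    documented calibration."""
    return round(1000 / (1 + math.exp(-steepness * (raw_score - midpoint))))

calibrate(0.0)   # a raw score at the midpoint maps to 500
calibrate(8.0)   # strongly positive raw scores saturate near 1000
```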
Stage 4: Confidence Calculation
A score without confidence is misleading. A "750" based on 2 transactions is very different from a "750" based on 200 transactions across 3 years.
confidence = f(data_volume, data_recency, cross_chain_coverage, data_consistency)
| Factor | Weight | Rationale |
|---|---|---|
| Data volume | 35% | More data points = more reliable estimate |
| Data recency | 25% | Recent activity is more predictive than old activity |
| Cross-chain coverage | 20% | Multi-chain users are harder to Sybil |
| Data consistency | 20% | Contradictory signals reduce confidence |
A confidence of 0.92 means "we're very confident in this score." A confidence of 0.3 means "we don't have enough data to be sure."
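The weighted combination above can be sketched directly from the table. A plain weighted average is the simplest reading of `f()`; the production function may well be nonlinear, and the per-factor inputs are assumed to be pre-normalized to [0, 1]:

```python
# Per-factor weights, taken from the table above.
WEIGHTS = {
    "data_volume": 0.35,
    "data_recency": 0.25,
    "cross_chain_coverage": 0.20,
    "data_consistency": 0.20,
}

def confidence(factors: dict[str, float]) -> float:
    """Combine factor scores (each pre-normalized to [0, 1]) into a
    single confidence value via a weighted average (an illustrative
    simplification of f())."""
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)

# A wallet with abundant, recent, consistent, multi-chain data:
confidence({
    "data_volume": 1.0,
    "data_recency": 0.9,
    "cross_chain_coverage": 0.8,
    "data_consistency": 1.0,
})  # ≈ 0.94
```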
Score Decay
Reputation isn't permanent. A wallet that was active 2 years ago but has been dormant since may no longer be a reliable indicator. We apply exponential decay:
decayed_score = base_score × e^(-0.001 × days_inactive)
| Inactive Period | Score Retention |
|---|---|
| 1 month | 97% |
| 3 months | 91% |
| 6 months | 84% |
| 1 year | 69% |
The decay rate (λ = 0.001) is a governance parameter — token holders can vote to adjust it if the community believes scores should decay faster or slower.
Decay resets the moment new on-chain activity is detected.
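The decay formula above translates directly to code, and reproduces the retention table:

```python
import math

LAMBDA = 0.001  # per-day decay rate; a governance parameter

def decayed_score(base_score: float, days_inactive: int) -> float:
    """Exponential score decay: decayed = base × e^(-λ × days_inactive).
    Resets to the base score when new activity is detected."""
    return base_score * math.exp(-LAMBDA * days_inactive)

decayed_score(1000, 30)    # ≈ 970, matching the 97% row
decayed_score(1000, 180)   # ≈ 835, matching the 84% row
```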
Anti-Gaming Measures
Beyond feature design, we employ several systemic anti-gaming measures:
Sybil resistance: Scores are per-address, but we detect related addresses through on-chain graph analysis (funding sources, interaction patterns). Splitting activity across multiple wallets doesn't multiply your reputation.
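One way to sketch the graph analysis: treat shared funding sources as edges and cluster addresses with union-find, so a cluster scores as one identity. This is a deliberate simplification; the production heuristics (interaction patterns, timing analysis) are richer than a single edge type:

```python
class AddressClusters:
    """Union-find over wallet addresses: link any two addresses that
    share a funding source, then treat each cluster as one identity.
    An illustrative simplification of on-chain graph analysis."""

    def __init__(self):
        self.parent: dict[str, str] = {}

    def find(self, addr: str) -> str:
        self.parent.setdefault(addr, addr)
        while self.parent[addr] != addr:
            self.parent[addr] = self.parent[self.parent[addr]]  # path halving
            addr = self.parent[addr]
        return addr

    def link(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)

clusters = AddressClusters()
clusters.link("0xwallet1", "0xfunder")  # wallet1 funded by funder
clusters.link("0xwallet2", "0xfunder")  # wallet2 funded by the same source
assert clusters.find("0xwallet1") == clusters.find("0xwallet2")  # one identity
```

Because both wallets collapse into one cluster, splitting activity between them earns the same reputation as keeping it in a single address.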
Time weighting: Recent behavior counts more, but not exclusively. This prevents "score boosting" where someone behaves perfectly for a week to get a high score, then defaults.
Cross-protocol validation: A high lending score on Aave alone is less convincing than consistent behavior across Aave, Compound, and MakerDAO.
Model versioning: When we update scoring models, both old and new versions run in parallel for 48 hours. Major model changes require DAO governance approval.
Using Scores in Your Protocol
Scores are available via the REST API with sub-100ms latency:
```shell
curl https://crowdproof-api.azurewebsites.net/api/v1/reputation/0x1234.../DEFI_LENDING
```

```json
{
  "address": "0x1234...",
  "category": "DEFI_LENDING",
  "score": 782,
  "confidence": 0.91,
  "tier": "Good",
  "calculatedAt": "2026-02-26T12:00:00Z",
  "modelVersion": "v2.1.0"
}
```
Common integration patterns:
- Dynamic collateral — Reduce collateral requirements for high-score borrowers
- Governance weight — Weight DAO votes by governance reputation
- Access gating — Require minimum scores for protocol features
- Risk pricing — Adjust interest rates based on credit history
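As a concrete example of the dynamic-collateral pattern, a lending protocol might map score and confidence to a required collateral ratio. The tiers, ratios, and thresholds below are illustrative assumptions, not CrowdProof recommendations:

```python
def collateral_ratio(score: int, confidence: float,
                     base_ratio: float = 1.5, floor_ratio: float = 1.1,
                     min_confidence: float = 0.7) -> float:
    """Linearly reduce required collateral from base_ratio at score 0
    to floor_ratio at score 1000. When confidence is too low to rely
    on, fall back to pricing the wallet as unknown. All parameters
    here are illustrative."""
    if confidence < min_confidence:
        return base_ratio  # not enough data: treat as an unknown wallet
    discount = (base_ratio - floor_ratio) * (score / 1000)
    return round(base_ratio - discount, 3)

collateral_ratio(782, 0.91)  # high score + high confidence: less collateral
collateral_ratio(782, 0.30)  # same score, low confidence: no discount
```

Gating the discount on confidence matters: as noted above, a 750 backed by 2 transactions should not unlock the same terms as a 750 backed by years of history.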
See the Reputation Scores guide for integration examples and the API reference for endpoint details.