
How On-Chain Reputation Scoring Actually Works

· 5 min read
CrowdProof Team
Protocol Engineering

"Just analyze their on-chain history" sounds simple until you try to do it. Raw blockchain data is noisy, cross-chain identities are fragmented, and naive scoring approaches are trivially gameable. Here's how we built a scoring system that's actually useful.

The Naive Approach (and Why It Fails)

The simplest reputation metric: count transactions. More transactions = more reputable.

This fails immediately. A bot can generate thousands of transactions per day for pennies on L2s. Transaction count tells you nothing about reliability — it tells you about activity volume, which is easy to fake.

Slightly better: total value transacted. But this is also gameable via wash trading — send 100 ETH to yourself in a loop and you look like a whale.

Real reputation requires understanding the quality of on-chain behavior, not just the quantity.

Our Scoring Pipeline

CrowdProof's scoring engine processes raw blockchain data through four stages:

Stage 1: Data Ingestion

Indexers continuously scan supported chains (Ethereum, Polygon, Arbitrum, Optimism, Base) for relevant events:

  • Lending: deposits, borrows, repayments, liquidations (Aave, Compound, MakerDAO)
  • DEX: swaps, liquidity provision, positions (Uniswap, Curve, Balancer)
  • Governance: votes, delegations, proposals (Governor contracts, Snapshot)
  • NFT: mints, transfers, listings, sales (OpenSea, Blur, marketplace contracts)

Each event is normalized into a standard schema with timestamp, value, counterparty, and protocol metadata.
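As a concrete illustration, the normalized schema might look like the following sketch. The field names and types here are assumptions for demonstration, not CrowdProof's actual internal format:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Illustrative normalized-event record; field names are assumptions,
# not the production schema.
@dataclass(frozen=True)
class NormalizedEvent:
    chain: str          # e.g. "ethereum", "arbitrum"
    protocol: str       # e.g. "aave", "uniswap"
    category: str       # "lending" | "dex" | "governance" | "nft"
    event_type: str     # e.g. "repay", "swap", "vote"
    timestamp: datetime
    value_usd: float    # value converted to USD at event time
    counterparty: str   # contract or wallet address involved

event = NormalizedEvent(
    chain="ethereum",
    protocol="aave",
    category="lending",
    event_type="repay",
    timestamp=datetime(2026, 2, 1, tzinfo=timezone.utc),
    value_usd=1500.0,
    counterparty="0xabc...",
)
```

A flat, chain-agnostic record like this is what makes the downstream stages chain-independent: feature extraction never needs to know which indexer produced an event.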

Stage 2: Feature Extraction

Raw events become features. For the DeFi Lending category, we extract:

| Feature | Description | Anti-Gaming Property |
| --- | --- | --- |
| `repayment_ratio` | Loans repaid on time / total loans | Hard to fake — requires actual capital |
| `collateral_health` | Average health factor across positions | Sustained over time, expensive to manipulate |
| `protocol_diversity` | Number of distinct lending protocols used | Requires genuine multi-protocol activity |
| `time_weighted_volume` | TVL × duration, not just peak | Prevents flash-loan inflation |
| `liquidation_rate` | Liquidations / total positions | Negative signal, hard to avoid if genuinely risky |
| `position_duration` | Median time positions are held open | Rewards patience over quick flips |

The key insight: features that require sustained capital commitment are hard to game. Maintaining a healthy collateral ratio across 3 protocols for 6 months is expensive to fake.
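A couple of these features can be sketched directly from normalized events. The event dicts and exact feature definitions below are illustrative assumptions, not the production extraction code:

```python
# Illustrative extraction of two lending features from normalized events.
def lending_features(events):
    loans = [e for e in events if e["type"] == "borrow"]
    repaid_on_time = [e for e in events if e["type"] == "repay" and e["on_time"]]
    protocols = {e["protocol"] for e in events}
    return {
        # Loans repaid on time divided by total loans (guard against 0 loans)
        "repayment_ratio": len(repaid_on_time) / max(len(loans), 1),
        # Number of distinct lending protocols touched
        "protocol_diversity": len(protocols),
    }

events = [
    {"type": "borrow", "protocol": "aave"},
    {"type": "repay", "on_time": True, "protocol": "aave"},
    {"type": "borrow", "protocol": "compound"},
    {"type": "repay", "on_time": True, "protocol": "compound"},
]
feats = lending_features(events)
# repayment_ratio = 2/2 = 1.0, protocol_diversity = 2
```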

Stage 3: Model Inference

Each category has a dedicated ML model trained on labeled data. We use gradient-boosted decision trees (LightGBM) because:

  1. Interpretable — Feature importance is transparent, unlike neural networks
  2. Fast inference — Microsecond predictions, critical for API latency
  3. Handles missing data — New wallets with sparse history don't crash the model
  4. Resistant to overfitting — Built-in regularization for small training sets

The model outputs a raw score which is then calibrated to the 0–1000 scale.
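The calibration step might look like the sketch below. The post doesn't specify the calibration curve, so a simple linear mapping from a probability-like raw output onto 0–1000 is an assumption here:

```python
# Hypothetical calibration: map a raw model output in [0, 1] onto the
# public 0-1000 scale. The actual calibration curve is not published;
# a clamped linear map is assumed for illustration.
def calibrate(raw: float) -> int:
    raw = min(max(raw, 0.0), 1.0)   # clamp defensively
    return round(raw * 1000)

score = calibrate(0.782)   # -> 782
```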

Stage 4: Confidence Calculation

A score without confidence is misleading. A "750" based on 2 transactions is very different from a "750" based on 200 transactions across 3 years.

```
confidence = f(data_volume, data_recency, cross_chain_coverage, data_consistency)
```

| Factor | Weight | Rationale |
| --- | --- | --- |
| Data volume | 35% | More data points = more reliable estimate |
| Data recency | 25% | Recent activity is more predictive than old |
| Cross-chain coverage | 20% | Multi-chain users are harder to Sybil |
| Data consistency | 20% | Contradictory signals reduce confidence |

A confidence of 0.92 means "we're very confident in this score." A confidence of 0.3 means "we don't have enough data to be sure."
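The combining function `f` isn't spelled out in the post; a linear blend of the four weighted factors (each normalized to [0, 1]) is a reasonable sketch:

```python
# Sketch of confidence as a weighted sum of the four factors, using the
# published weights. The linear combination itself is an assumption.
WEIGHTS = {
    "data_volume": 0.35,
    "data_recency": 0.25,
    "cross_chain_coverage": 0.20,
    "data_consistency": 0.20,
}

def confidence(factors: dict) -> float:
    return sum(WEIGHTS[k] * factors[k] for k in WEIGHTS)

# A wallet with abundant, recent, consistent, multi-chain data:
high_conf = confidence({
    "data_volume": 0.95,
    "data_recency": 0.90,
    "cross_chain_coverage": 0.85,
    "data_consistency": 0.95,
})
# ≈ 0.92 — the "very confident" end of the range
```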

Score Decay

Reputation isn't permanent. A wallet that was active 2 years ago but has been dormant since may no longer be a reliable indicator. We apply exponential decay:

```
decayed_score = base_score × e^(-0.001 × days_inactive)
```

| Inactive Period | Score Retention |
| --- | --- |
| 1 month | 97% |
| 3 months | 91% |
| 6 months | 84% |
| 1 year | 69% |

The decay rate (λ = 0.001) is a governance parameter — token holders can vote to adjust it if the community believes scores should decay faster or slower.

Decay resets the moment new on-chain activity is detected.
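The decay formula translates directly into code; only the function name is invented here:

```python
import math

# Direct implementation of the decay formula, with the governance
# parameter lambda = 0.001 per day as the default.
def decayed_score(base_score: float, days_inactive: int,
                  lam: float = 0.001) -> float:
    return base_score * math.exp(-lam * days_inactive)

retention = decayed_score(1.0, 180)   # ~0.835, matching the ~84% row above
```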

Anti-Gaming Measures

Beyond feature design, we employ several systemic anti-gaming measures:

Sybil resistance: Scores are per-address, but we detect related addresses through on-chain graph analysis (funding sources, interaction patterns). Splitting activity across multiple wallets doesn't multiply your reputation.
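A toy version of funding-source clustering: addresses first funded from the same source get grouped and scored as one identity. Real Sybil detection uses a much richer interaction graph; this grouping is only illustrative:

```python
# Toy funding-source clustering. Input maps each address to the source
# that first funded it; addresses sharing a funder form one group.
def cluster_by_funder(funding: dict) -> dict:
    groups = {}
    for addr, funder in funding.items():
        groups.setdefault(funder, []).append(addr)
    return groups

groups = cluster_by_funder({
    "0xaaa": "0xFUND",
    "0xbbb": "0xFUND",
    "0xccc": "0xOTHER",
})
# "0xaaa" and "0xbbb" share a funder, so their activity would be
# attributed to one identity rather than two.
```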

Time weighting: Recent behavior counts more, but not exclusively. This prevents "score boosting" where someone behaves perfectly for a week to get a high score, then defaults.

Cross-protocol validation: A high lending score on Aave alone is less convincing than consistent behavior across Aave, Compound, and MakerDAO.

Model versioning: When we update scoring models, both old and new versions run in parallel for 48 hours. Major model changes require DAO governance approval.

Using Scores in Your Protocol

Scores are available via the REST API with sub-100ms latency:

```bash
curl https://crowdproof-api.azurewebsites.net/api/v1/reputation/0x1234.../DEFI_LENDING
```

```json
{
  "address": "0x1234...",
  "category": "DEFI_LENDING",
  "score": 782,
  "confidence": 0.91,
  "tier": "Good",
  "calculatedAt": "2026-02-26T12:00:00Z",
  "modelVersion": "v2.1.0"
}
```

Common integration patterns:

  • Dynamic collateral — Reduce collateral requirements for high-score borrowers
  • Governance weight — Weight DAO votes by governance reputation
  • Access gating — Require minimum scores for protocol features
  • Risk pricing — Adjust interest rates based on credit history
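The dynamic collateral pattern, for example, might map score and confidence to a collateral ratio like this. The thresholds and ratios are made up for illustration; a real protocol would set them via governance:

```python
# Illustrative dynamic-collateral policy: better reputation earns a
# lower collateral requirement, but only when confidence is high enough
# to trust the score at all. All thresholds here are assumptions.
def collateral_ratio(score: int, confidence: float) -> float:
    if confidence < 0.5:
        return 1.50          # too little data: treat as an unknown wallet
    if score >= 750:
        return 1.10          # strong, well-evidenced history
    if score >= 500:
        return 1.25
    return 1.50

ratio = collateral_ratio(782, 0.91)   # the example response above -> 1.10
```

Note that confidence gates the whole policy: a high score with low confidence falls back to the default requirement rather than unlocking cheaper credit.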

See the Reputation Scores guide for integration examples and the API reference for endpoint details.