How On-Chain Reputation Scoring Actually Works
"Just analyze their on-chain history" sounds simple until you try to do it. Raw blockchain data is noisy, cross-chain identities are fragmented, and naive scoring approaches are trivially gameable. Here's how we built a scoring system that's actually useful.
The Naive Approach (and Why It Fails)
The simplest reputation metric: count transactions. More transactions = more reputable.
This fails immediately. A bot can generate thousands of transactions per day for pennies on L2s. Transaction count tells you nothing about reliability — it tells you about activity volume, which is easy to fake.
Slightly better: total value transacted. But this is also gameable via wash trading — send 100 ETH to yourself in a loop and you look like a whale.
Real reputation requires understanding the quality of on-chain behavior, not just the quantity.
Our Scoring Pipeline
CrowdProof's scoring engine processes raw blockchain data through four stages:
Stage 1: Data Ingestion
Indexers continuously scan supported chains (Ethereum, Polygon, Arbitrum, Optimism, Base) for relevant events:
- Lending: deposits, borrows, repayments, liquidations (Aave, Compound, MakerDAO)
- DEX: swaps, liquidity provision, positions (Uniswap, Curve, Balancer)
- Governance: votes, delegations, proposals (Governor contracts, Snapshot)
- NFT: mints, transfers, listings, sales (OpenSea, Blur, marketplace contracts)
Each event is normalized into a standard schema with timestamp, value, counterparty, and protocol metadata.
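The normalized schema can be sketched as a small record type. Field names here are illustrative (the article lists timestamp, value, counterparty, and protocol metadata but not exact names), assuming values are normalized to USD:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class NormalizedEvent:
    """One on-chain event in the common schema (field names illustrative)."""
    chain: str            # e.g. "ethereum", "arbitrum"
    protocol: str         # e.g. "aave", "uniswap"
    category: str         # "lending" | "dex" | "governance" | "nft"
    event_type: str       # e.g. "repayment", "swap", "vote"
    timestamp: datetime
    value_usd: float      # event value normalized to USD
    counterparty: str     # contract or wallet on the other side

# A raw indexer payload being normalized into the schema:
raw = {"chain": "ethereum", "protocol": "aave", "category": "lending",
       "event_type": "repayment", "ts": 1708948800, "value_usd": 1500.0,
       "counterparty": "0xAavePool"}
event = NormalizedEvent(
    chain=raw["chain"], protocol=raw["protocol"], category=raw["category"],
    event_type=raw["event_type"],
    timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    value_usd=raw["value_usd"], counterparty=raw["counterparty"],
)
```

A frozen dataclass keeps normalized events immutable once ingested, which makes downstream feature extraction easier to reason about.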
Stage 2: Feature Extraction
Raw events become features. For the DeFi Lending category, we extract:
| Feature | Description | Anti-Gaming |
|---|---|---|
| repayment_ratio | Loans repaid on time / total loans | Hard to fake — requires actual capital |
| collateral_health | Average health factor across positions | Sustained over time, expensive to manipulate |
| protocol_diversity | Number of distinct lending protocols used | Requires genuine multi-protocol activity |
| time_weighted_volume | TVL × duration, not just peak | Prevents flash-loan inflation |
| liquidation_rate | Liquidations / total positions | Negative signal, hard to avoid if genuinely risky |
| position_duration | Median time positions are held open | Rewards patience over quick flips |
The key insight: features that require sustained capital commitment are hard to game. Maintaining a healthy collateral ratio across 3 protocols for 6 months is expensive to fake.
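Several of these features reduce to simple aggregations over closed loan records. A minimal sketch, assuming each record carries illustrative keys (`repaid_on_time`, `protocol`, `value_usd`, `duration_days`, `liquidated`):

```python
from statistics import median

def lending_features(loans: list[dict]) -> dict:
    """Extract a few DeFi-lending features from closed loan records.
    Keys on each record are illustrative, not CrowdProof's actual schema."""
    n = len(loans)
    return {
        "repayment_ratio": sum(l["repaid_on_time"] for l in loans) / n,
        "protocol_diversity": len({l["protocol"] for l in loans}),
        # value × time held, so a one-block flash loan contributes ~nothing
        "time_weighted_volume": sum(l["value_usd"] * l["duration_days"] for l in loans),
        "liquidation_rate": sum(l["liquidated"] for l in loans) / n,
        "position_duration": median(l["duration_days"] for l in loans),
    }

loans = [
    {"repaid_on_time": True,  "protocol": "aave",     "value_usd": 1000, "duration_days": 30, "liquidated": False},
    {"repaid_on_time": True,  "protocol": "compound", "value_usd": 2000, "duration_days": 60, "liquidated": False},
    {"repaid_on_time": False, "protocol": "aave",     "value_usd": 500,  "duration_days": 10, "liquidated": True},
]
features = lending_features(loans)
```

Note how `time_weighted_volume` multiplies value by duration: a flash loan with a duration of effectively zero adds nothing, which is exactly the anti-gaming property the table describes.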
Stage 3: Model Inference
Each category has a dedicated ML model trained on labeled data. We use gradient-boosted decision trees (LightGBM) because:
- Interpretable — Feature importance is transparent, unlike neural networks
- Fast inference — Microsecond predictions, critical for API latency
- Handles missing data — New wallets with sparse history don't crash the model
- Resistant to overfitting — Built-in regularization for small training sets
The model outputs a raw score, which is then calibrated to the 0–1000 scale.
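The article doesn't specify the calibration function, so as a sketch: a logistic squash is one common way to map an unbounded raw model output onto a fixed scale (the `midpoint` and `steepness` parameters here are illustrative assumptions):

```python
import math

def calibrate(raw_score: float, midpoint: float = 0.0, steepness: float = 1.0) -> int:
    """Squash an unbounded raw model output onto the 0-1000 scale.
    A logistic mapping is an illustrative choice, not CrowdProof's
    documented calibration."""
    return round(1000 / (1 + math.exp(-steepness * (raw_score - midpoint))))

calibrate(0.0)   # a raw score at the midpoint maps to 500
calibrate(8.0)   # strongly positive raw scores saturate near 1000
```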
Stage 4: Confidence Calculation
A score without confidence is misleading. A "750" based on 2 transactions is very different from a "750" based on 200 transactions across 3 years.
confidence = f(data_volume, data_recency, cross_chain_coverage, data_consistency)
| Factor | Weight | Rationale |
|---|---|---|
| Data volume | 35% | More data points = more reliable estimate |
| Data recency | 25% | Recent activity is more predictive than old activity |
| Cross-chain coverage | 20% | Multi-chain users are harder to Sybil |
| Data consistency | 20% | Contradictory signals reduce confidence |
A confidence of 0.92 means "we're very confident in this score." A confidence of 0.3 means "we don't have enough data to be sure."
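The weighted combination above can be sketched directly from the table. A plain weighted average is the simplest reading of `f()`; the production function may well be nonlinear, and the per-factor inputs are assumed to be pre-normalized to [0, 1]:

```python
# Per-factor weights, taken from the table above.
WEIGHTS = {
    "data_volume": 0.35,
    "data_recency": 0.25,
    "cross_chain_coverage": 0.20,
    "data_consistency": 0.20,
}

def confidence(factors: dict[str, float]) -> float:
    """Combine factor scores (each pre-normalized to [0, 1]) into a
    single confidence value via a weighted average (an illustrative
    simplification of f())."""
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)

# A wallet with abundant, recent, consistent, multi-chain data:
confidence({
    "data_volume": 1.0,
    "data_recency": 0.9,
    "cross_chain_coverage": 0.8,
    "data_consistency": 1.0,
})  # ≈ 0.94
```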
Score Decay
Reputation isn't permanent. A wallet that was active 2 years ago but has been dormant since may no longer be a reliable indicator. We apply exponential decay:
decayed_score = base_score × e^(-0.001 × days_inactive)
| Inactive Period | Score Retention |
|---|---|
| 1 month | 97% |
| 3 months | 91% |
| 6 months | 84% |
| 1 year | 69% |
The decay rate (λ = 0.001) is a governance parameter — token holders can vote to adjust it if the community believes scores should decay faster or slower.
Decay resets the moment new on-chain activity is detected.
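The decay formula above translates directly to code, and reproduces the retention table:

```python
import math

LAMBDA = 0.001  # per-day decay rate; a governance parameter

def decayed_score(base_score: float, days_inactive: int) -> float:
    """Exponential score decay: decayed = base × e^(-λ × days_inactive).
    Resets to the base score when new activity is detected."""
    return base_score * math.exp(-LAMBDA * days_inactive)

decayed_score(1000, 30)    # ≈ 970, matching the 97% row
decayed_score(1000, 180)   # ≈ 835, matching the 84% row
```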
Anti-Gaming Measures
Beyond feature design, we employ several systemic anti-gaming measures:
Sybil resistance: Scores are per-address, but we detect related addresses through on-chain graph analysis (funding sources, interaction patterns). Splitting activity across multiple wallets doesn't multiply your reputation.
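One way to sketch the graph analysis: treat shared funding sources as edges and cluster addresses with union-find, so a cluster scores as one identity. This is a deliberate simplification; the production heuristics (interaction patterns, timing analysis) are richer than a single edge type:

```python
class AddressClusters:
    """Union-find over wallet addresses: link any two addresses that
    share a funding source, then treat each cluster as one identity.
    An illustrative simplification of on-chain graph analysis."""

    def __init__(self):
        self.parent: dict[str, str] = {}

    def find(self, addr: str) -> str:
        self.parent.setdefault(addr, addr)
        while self.parent[addr] != addr:
            self.parent[addr] = self.parent[self.parent[addr]]  # path halving
            addr = self.parent[addr]
        return addr

    def link(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)

clusters = AddressClusters()
clusters.link("0xwallet1", "0xfunder")  # wallet1 funded by funder
clusters.link("0xwallet2", "0xfunder")  # wallet2 funded by the same source
assert clusters.find("0xwallet1") == clusters.find("0xwallet2")  # one identity
```

Because both wallets collapse into one cluster, splitting activity between them earns the same reputation as keeping it in a single address.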
Time weighting: Recent behavior counts more, but not exclusively. This prevents "score boosting" where someone behaves perfectly for a week to get a high score, then defaults.
Cross-protocol validation: A high lending score on Aave alone is less convincing than consistent behavior across Aave, Compound, and MakerDAO.
Model versioning: When we update scoring models, both old and new versions run in parallel for 48 hours. Major model changes require DAO governance approval.
Using Scores in Your Protocol
Scores are available via the REST API with sub-100ms latency:
```shell
curl https://crowdproof-api.azurewebsites.net/api/v1/reputation/0x1234.../DEFI_LENDING
```

```json
{
  "address": "0x1234...",
  "category": "DEFI_LENDING",
  "score": 782,
  "confidence": 0.91,
  "tier": "Good",
  "calculatedAt": "2026-02-26T12:00:00Z",
  "modelVersion": "v2.1.0"
}
```
Common integration patterns:
- Dynamic collateral — Reduce collateral requirements for high-score borrowers
- Governance weight — Weight DAO votes by governance reputation
- Access gating — Require minimum scores for protocol features
- Risk pricing — Adjust interest rates based on credit history
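As a concrete example of the dynamic-collateral pattern, a lending protocol might map score and confidence to a required collateral ratio. The tiers, ratios, and thresholds below are illustrative assumptions, not CrowdProof recommendations:

```python
def collateral_ratio(score: int, confidence: float,
                     base_ratio: float = 1.5, floor_ratio: float = 1.1,
                     min_confidence: float = 0.7) -> float:
    """Linearly reduce required collateral from base_ratio at score 0
    to floor_ratio at score 1000. When confidence is too low to rely
    on, fall back to pricing the wallet as unknown. All parameters
    here are illustrative."""
    if confidence < min_confidence:
        return base_ratio  # not enough data: treat as an unknown wallet
    discount = (base_ratio - floor_ratio) * (score / 1000)
    return round(base_ratio - discount, 3)

collateral_ratio(782, 0.91)  # high score + high confidence: less collateral
collateral_ratio(782, 0.30)  # same score, low confidence: no discount
```

Gating the discount on confidence matters: as noted above, a 750 backed by 2 transactions should not unlock the same terms as a 750 backed by years of history.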
See the Reputation Scores guide for integration examples and the API reference for endpoint details.