How We Score AI Tools
The Tool Performance Score (TPS) uses a 6-judge AI jury to evaluate every tool across 5 dimensions on a 1-10 scale, producing transparent and reproducible scores.
The 6-Judge Jury
Each AI tool is independently evaluated by 6 frontier AI models from 6 different labs. No single model controls the outcome. Each judge scores the tool on all 5 dimensions using a 1-10 scale, providing both a numerical score and written reasoning.
We use Olympic-style trimmed mean scoring: for each dimension, the highest and lowest judge scores are dropped, and the remaining 4 scores are averaged. This blunts outliers and produces a robust consensus score that no single model can unduly influence.
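For clarity, here is that trimming step in code: a minimal Python sketch, not AIpulse's actual implementation. The function name and the hard-coded assumption of exactly 6 judges are illustrative.

```python
def trimmed_mean(scores: list[float]) -> float:
    """Olympic-style trim: drop the single highest and lowest of 6 judge scores."""
    if len(scores) != 6:
        raise ValueError("expected scores from all 6 judges")
    s = sorted(scores)
    return sum(s[1:-1]) / 4  # average the middle four

# One judge is a low outlier at 2.0; trimming discards it (and the 8.5 high).
print(trimmed_mean([7.5, 8.0, 2.0, 7.0, 8.5, 7.75]))  # -> 7.5625
```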
The final composite TPS is a weighted combination of the 5 trimmed dimension scores: capability, value, customer sentiment, and momentum each contribute 22.5%, while usability contributes 10%. Usability is weighted lower to avoid penalising complex but powerful tools; it's displayed separately as an "Ease" badge on each tool's profile. The result is a single number from 1.0 to 10.0 representing overall tool quality.
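The weighting step, under the same caveat: the weight table below is just the percentages above written as fractions (they sum to 1.0), and rounding to one decimal place is an assumption based on the 1.0-10.0 display format.

```python
# Dimension weights from the percentages above (0.225 * 4 + 0.10 = 1.0).
WEIGHTS = {
    "capability": 0.225,
    "value": 0.225,
    "customer_sentiment": 0.225,
    "momentum": 0.225,
    "usability": 0.10,
}

def composite_tps(trimmed: dict[str, float]) -> float:
    """Weighted sum of the 5 trimmed dimension scores."""
    return round(sum(trimmed[dim] * w for dim, w in WEIGHTS.items()), 1)

print(composite_tps({
    "capability": 8.2, "value": 7.0, "customer_sentiment": 7.5,
    "momentum": 6.8, "usability": 9.0,
}))  # -> 7.5
```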
5 Scoring Dimensions (Weighted)
Capability
How well does the tool perform its core function? Accuracy, feature depth, output quality, and benchmark results.
Usability
How easy is it to learn and use effectively? Interface design, documentation, onboarding, and workflow integration. Weighted lower in the composite to avoid penalising powerful tools that require expertise. Shown separately as an "Ease" badge on tool profiles.
Value
Is the tool worth the cost? Pricing fairness, feature-per-dollar, free tier generosity, and competitive positioning.
Customer Sentiment
What are users actually saying? Aggregated sentiment from reviews, forums, social media, and support channels.
Momentum
Is the tool gaining or losing ground? Update frequency, community growth, search trends, and market share movement.
Confidence System
Not all scores are created equal. The traffic-light confidence indicator shows how much the judges agreed. A high score with LOW confidence means the jury was divided; treat it with more caution than a HIGH confidence score.
- HIGH: All judges essentially agree. Score spread is narrow (≤ 1.0 points). Strong consensus across all 6 models.
- MEDIUM: Reasonable agreement with some variation. Score spread is moderate (1.0-3.0 points). Reliable but not unanimous.
- LOW: Judges disagree significantly. Score spread is wide (> 3.0 points). Users should not rely solely on this number.
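In code, the banding is a simple threshold test on the judge score spread. A Python sketch, assuming spread is measured as max minus min across all 6 raw judge scores for a dimension; the handling of the exact 1.0 and 3.0 boundaries is also an assumption:

```python
def confidence_band(judge_scores: list[float]) -> str:
    """Map the judge score spread (max - min) to a traffic-light label."""
    spread = max(judge_scores) - min(judge_scores)
    if spread <= 1.0:
        return "HIGH"
    if spread <= 3.0:
        return "MEDIUM"
    return "LOW"

print(confidence_band([7.0, 7.4, 7.9, 7.2, 7.6, 7.5]))  # spread 0.9 -> HIGH
```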
Qualification Rules
Not every tool gets scored. To qualify for TPS evaluation, a tool must meet minimum evidence thresholds:
- At least 25 mentions across tracked sources (reviews, forums, social, news)
- Presence on at least 3 platforms (e.g., Product Hunt + Reddit + YouTube)
- Active product (not discontinued, in beta, or pre-launch)
Tools below these thresholds display the V1 Pulse Score (0-100) instead, which requires less evidence.
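The qualification check itself is a plain threshold test. A minimal sketch with illustrative field names, not AIpulse's schema:

```python
def qualifies_for_tps(mentions: int, platforms: int, active: bool) -> bool:
    """Apply the minimum evidence thresholds listed above."""
    return mentions >= 25 and platforms >= 3 and active

# A beta tool falls back to the V1 Pulse Score even with ample mentions.
print(qualifies_for_tps(mentions=140, platforms=5, active=False))  # False
```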
Judge Providers
The jury consists of frontier models from 6 independent AI labs. Judge numbers (J1-J6) are intentionally not mapped to specific providers, both to prevent gaming and so that each judge's score is weighed on its merits rather than its provider's reputation.
Models are rotated to the latest versions as labs release updates. The specific model version used in each run is recorded in the scoring metadata.
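For illustration, that per-run metadata might be shaped roughly like this. The field names are assumptions for the sketch, not AIpulse's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ScoringRunMeta:
    """Hypothetical shape of per-run scoring metadata."""
    run_id: str
    run_date: str                         # ISO date of the scoring run
    judge_model_versions: dict[str, str]  # e.g. {"J1": "provider-model-vX", ...}
    evidence_hash: str                    # SHA-256 of the evidence snapshot
```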
Anti-Manipulation
The multi-model jury design makes manipulation extremely difficult:
- You'd need to fool 6 different AI models simultaneously
- Olympic scoring drops the highest and lowest scores — outliers are discarded
- Judge-to-model mapping is not publicly disclosed
- Evidence snapshots are hashed and timestamped — scores can be audited against the evidence that produced them
- All scoring runs include full reasoning traces from each judge
Evidence Hashing
Before each scoring run, all evidence (reviews, benchmarks, social mentions, documentation) is collected into a snapshot. This snapshot is hashed using SHA-256, producing a unique fingerprint. The hash is stored with the scoring run, so anyone can verify that the scores were generated from a specific, immutable evidence set.
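Here is what that fingerprinting looks like in practice. This Python sketch assumes the snapshot is serialised as canonical JSON (sorted keys, fixed separators) before hashing; the serialisation details are an assumption, but the property it illustrates holds regardless: identical evidence always yields an identical SHA-256 hash.

```python
import hashlib
import json

def snapshot_fingerprint(evidence: dict) -> str:
    """Hash a canonical JSON serialisation of an evidence snapshot."""
    canonical = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

snapshot = {
    "tool": "example-tool",  # hypothetical evidence snapshot
    "reviews": ["Great for drafts", "Pricey at scale"],
    "collected_at": "2025-01-01",
}
print(snapshot_fingerprint(snapshot))  # 64-char hex fingerprint, stored with the run
```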
Commercial Disclosures
AIpulse.is is an independent AI tool directory. We disclose the following:
- TPS scores are not influenced by commercial relationships
- Tool vendors cannot pay for higher scores
- AIpulse may earn affiliate commissions from some tools — this has zero effect on scoring
- Featured listings and sponsored placements are always labeled and do not affect TPS
Important Disclaimer
TPS is an AI-generated consensus score, not a statement of fact. It reflects the collective assessment of 6 AI models based on available evidence at the time of scoring. Scores may change weekly as new evidence is gathered. Always evaluate tools based on your own needs, use case, and requirements. TPS is one input to your decision — not the decision itself.
