Score and rank Amazon search results using a statistically valid Bayesian system that accounts for average rating, review count, and star distribution.
Use this skill when the user asks to score, rank, or compare Amazon products from a search results page open in their browser.
- Browser must be open with an Amazon search results page
- Chrome DevTools MCP server must be connected
The scoring system uses a Dirichlet-Multinomial Bayesian model, combining the two ideas behind Reddit's "best" ranking (a lower confidence bound) and IMDb's Top 250 (a Bayesian prior).
A single formula naturally handles three requirements:
- Average rating: higher average = higher score
- Number of reviews: more reviews = lower standard error = score stays close to the true mean
- Star distribution: tight/consistent distributions have lower variance = higher score; bimodal distributions (lots of 5s and 1s) get penalized
A Dirichlet prior of alpha=2 per star level adds 10 "phantom reviews" (2 at each star). With few real reviews, the prior dominates and pulls the score toward 3.0. With hundreds of reviews, the prior is negligible. This is why 2x 5-star reviews score much lower than 100x 5-star reviews.
For each product:
1. counts[s] = (histogram_pct[s] / 100) * total_reviews + alpha (for s = 1..5)
2. N_adj = sum(counts)
3. mean = sum(s * counts[s] / N_adj) (posterior mean)
4. variance = sum((s - mean)^2 * counts[s] / N_adj) (posterior variance)
5. SE = sqrt(variance / N_adj) (standard error of the mean)
6. SCORE = mean - 1.65 * SE (90% lower confidence bound)
Parameters:
- alpha = 2 (prior pseudo-counts per star level)
- z = 1.65 (90% one-tailed confidence z-score)
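The six steps above can be sketched as a standalone function (`scoreProduct` is a hypothetical helper name; the histogram is given as per-star percentages, as on Amazon product pages):

```javascript
// Minimal sketch of steps 1-6. `histPct` maps star level (1-5) to its
// percentage of total reviews; alpha and z match the parameters above.
function scoreProduct(histPct, totalReviews, alpha = 2, z = 1.65) {
  const counts = {};
  for (let s = 1; s <= 5; s++) counts[s] = (histPct[s] / 100) * totalReviews + alpha;
  const nAdj = Object.values(counts).reduce((a, b) => a + b, 0);
  let mean = 0;
  for (let s = 1; s <= 5; s++) mean += s * (counts[s] / nAdj);
  let variance = 0;
  for (let s = 1; s <= 5; s++) variance += (s - mean) ** 2 * (counts[s] / nAdj);
  const se = Math.sqrt(variance / nAdj);
  return mean - z * se; // 90% lower confidence bound
}

// The prior in action: identical all-5-star histograms, differing only in count.
const few = scoreProduct({1: 0, 2: 0, 3: 0, 4: 0, 5: 100}, 2);   // prior dominates
const many = scoreProduct({1: 0, 2: 0, 3: 0, 4: 0, 5: 100}, 100); // prior negligible
```

With 2 reviews the 10 phantom reviews pull the score below 3; with 100 reviews it stays near 5, which is exactly the behaviour the prior is there to produce.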
After collecting histograms, run a suspicion analysis using 7 statistical signals. Each signal adds to a cumulative suspicion score; products are then classified as HIGH (>=40), MEDIUM (>=20), or LOW (>0) risk.
- Missing middle — organic dissatisfaction spreads across 2, 3, and 4 stars. If those combined are under 10%, it suggests the middle has been artificially hollowed out. Weight: (10 - middle%) * 3
- Unnatural 5-star concentration — 95%+ five-star is almost never organic. 85%+ with under 100 reviews is also suspicious. Weight: 40 (>=95%) or 20 (>=85%, <100 reviews)
- Zero 1-star with 50+ reviews — statistically improbable; even great products attract the odd unhappy buyer. Weight: 15
- No 2-star or 3-star at all (20+ reviews) — real unhappy customers leave a range of negative ratings, not just 1-star. Weight: 20
- Low distribution entropy — Shannon entropy measures how spread out ratings are. Fake reviews cluster tightly (low entropy <0.8; organic typically >1.2; max is 2.32). Weight: 25 (<0.8) or 10 (<1.0 with 30+ reviews)
- 5-star rate far above category average — compute the weighted average 5-star % across all products in the search. Products 15+ points above that are unusual. Weight: deviation - 10
- 1-star cliff — high 1-star (>=8%) but almost no 2-star (<=1%) with 30+ reviews suggests competitor attack reviews rather than organic dissatisfaction. Weight: 15
- HIGH risk: likely manipulated — present these separately, do not recommend
- MEDIUM risk: worth scrutinising — flag in the ranking table but don't exclude
- LOW risk: minor anomaly — note but don't penalise
- CLEAN: no flags triggered
Note: low entropy can also indicate a genuinely excellent or genuinely terrible product. MEDIUM flags on high-volume products (500+ reviews) are less concerning than on low-volume ones. Always recommend the user manually check review text for high-scoring products that flag MEDIUM+.
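The entropy signal can be checked in isolation. This sketch (with illustrative histograms, not real data) computes Shannon entropy over the star distribution the same way the analysis script does:

```javascript
// Shannon entropy of a star-rating histogram given as percentages.
// Maximum is log2(5) ~= 2.32 (perfectly uniform); tight clusters fall below 0.8.
function histEntropy(histPct) {
  let entropy = 0;
  for (let s = 1; s <= 5; s++) {
    const x = histPct[s] / 100;
    if (x > 0) entropy -= x * Math.log2(x);
  }
  return entropy;
}

// Illustrative distributions (hypothetical, not real products):
const clustered = histEntropy({1: 0, 2: 0, 3: 1, 4: 2, 5: 97});  // ~0.22, would flag
const organic = histEntropy({1: 5, 2: 5, 3: 10, 4: 25, 5: 55});  // ~1.74, typical
```

Note how the 97%-five-star histogram sits far below the 0.8 threshold while a plausible organic spread clears 1.2 comfortably.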
Run this JavaScript via evaluate_script on the Amazon search results page:
() => {
const allDivs = document.querySelectorAll('div[data-asin]');
const seen = new Set();
const results = [];
allDivs.forEach((div) => {
const asin = div.getAttribute('data-asin');
if (!asin || asin === '' || seen.has(asin)) return;
seen.add(asin);
const titleEl = div.querySelector('h2');
const ratingEl = div.querySelector('span.a-icon-alt');
const reviewSpan = div.querySelector('span.a-size-mini.puis-normal-weight-text');
let reviewCount = null;
if (reviewSpan) {
const match = reviewSpan.textContent.trim().match(/\(?([\d,]+)\)?/);
if (match) reviewCount = parseInt(match[1].replace(/,/g, ''));
}
const priceEl = div.querySelector('.a-price .a-offscreen');
if (titleEl && ratingEl) {
const ratingText = ratingEl.textContent.trim();
const ratingMatch = ratingText.match(/([\d.]+)/);
results.push({
asin,
title: titleEl.textContent.trim().substring(0, 100),
rating: ratingMatch ? parseFloat(ratingMatch[1]) : null,
reviewCount,
price: priceEl ? priceEl.textContent.trim() : null
});
}
});
return results;
}

For each product ASIN, fetch its product page and extract the histogram. Batch the ASINs in groups of ~20 per call to avoid overwhelming the browser:
async () => {
const asins = [/* array of ASINs from step 1 */];
const results = {};
for (const asin of asins) {
try {
const resp = await fetch(`/dp/${asin}/`);
const html = await resp.text();
const matches = [...html.matchAll(/aria-label="(\d+) percent of reviews have (\d) star/g)];
if (matches.length >= 5) {
const hist = {};
matches.forEach(m => { hist[parseInt(m[2])] = parseInt(m[1]); });
results[asin] = hist;
}
} catch(e) { }
}
return results;
}

Run after Step 2. Computes the category-average distribution as a baseline, then checks each product against the 7 signals. Pass the same merged product array.
() => {
const products = [/* merged data from steps 1 and 2 */];
// Compute category-average 5-star rate as baseline
let totalReviews = 0;
const avgHist = {1:0,2:0,3:0,4:0,5:0};
products.forEach(p => {
totalReviews += p.reviews;
for (let s = 1; s <= 5; s++) avgHist[s] += p.hist[s] * p.reviews;
});
for (let s = 1; s <= 5; s++) avgHist[s] /= totalReviews;
const avgTotal = Object.values(avgHist).reduce((a,b) => a+b, 0);
for (let s = 1; s <= 5; s++) avgHist[s] = Math.round(avgHist[s] / avgTotal * 100);
return products.map(p => {
const signals = [];
let sus = 0;
const middle = p.hist[2] + p.hist[3] + p.hist[4];
if (middle < 10) { signals.push(`Missing middle: ${middle}% in 2-4 stars`); sus += (10 - middle) * 3; }
if (p.hist[5] >= 95) { signals.push(`${p.hist[5]}% five-star`); sus += 40; }
else if (p.hist[5] >= 85 && p.reviews < 100) { signals.push(`${p.hist[5]}% five-star with ${p.reviews} reviews`); sus += 20; }
if (p.hist[1] === 0 && p.reviews >= 50) { signals.push(`Zero 1-star across ${p.reviews} reviews`); sus += 15; }
if (p.hist[2] === 0 && p.hist[3] === 0 && p.reviews >= 20) { signals.push(`No 2 or 3-star reviews`); sus += 20; }
let entropy = 0;
for (let s = 1; s <= 5; s++) { const x = p.hist[s]/100; if (x > 0) entropy -= x * Math.log2(x); }
if (entropy < 0.8) { signals.push(`Very low entropy: ${entropy.toFixed(2)}`); sus += 25; }
else if (entropy < 1.0 && p.reviews >= 30) { signals.push(`Low entropy: ${entropy.toFixed(2)}`); sus += 10; }
const dev = p.hist[5] - avgHist[5];
if (dev > 15) { signals.push(`5-star rate ${dev}pp above category avg`); sus += dev - 10; }
if (p.hist[1] >= 8 && p.hist[2] <= 1 && p.reviews >= 30) { signals.push(`1-star cliff: ${p.hist[1]}% vs ${p.hist[2]}% two-star`); sus += 15; }
return {
asin: p.asin, title: p.title, reviews: p.reviews,
entropy: parseFloat(entropy.toFixed(2)),
suspicionScore: sus, signals,
risk: sus >= 40 ? 'HIGH' : sus >= 20 ? 'MEDIUM' : sus > 0 ? 'LOW' : 'CLEAN'
};
}).filter(p => p.suspicionScore > 0).sort((a, b) => b.suspicionScore - a.suspicionScore);
}

Bayesian scoring: run on the same merged product array, applying the formula above with alpha = 2 and z = 1.65.

() => {
const products = [/* merged data from steps 1 and 2 */];
const ALPHA = 2;
const Z = 1.65;
const scored = products.map(p => {
const n = p.reviews || 1;
const counts = {};
for (let s = 1; s <= 5; s++) {
counts[s] = (p.hist[s] / 100) * n + ALPHA;
}
const totalAdj = Object.values(counts).reduce((a, b) => a + b, 0);
let mean = 0;
for (let s = 1; s <= 5; s++) mean += s * (counts[s] / totalAdj);
let variance = 0;
for (let s = 1; s <= 5; s++) variance += (s - mean) ** 2 * (counts[s] / totalAdj);
const se = Math.sqrt(variance / totalAdj);
const score = mean - Z * se;
const satisfaction = ((counts[4] + counts[5]) / totalAdj) * 100;
return {
title: p.title, price: p.price, reviews: n,
avgRating: p.rating,
posteriorMean: Math.round(mean * 100) / 100,
score: Math.round(score * 1000) / 1000,
satisfaction: Math.round(satisfaction),
defectRate: Math.round((counts[1] / totalAdj) * 100)
};
});
scored.sort((a, b) => b.score - a.score);
return scored;
}

To build product links, collect each top-ranked ASIN's href from the search results page:

() => {
const asins = [/* top ASIN list */];
const results = {};
document.querySelectorAll('div[data-asin]').forEach(card => {
const asin = card.getAttribute('data-asin');
if (asins.includes(asin) && !results[asin]) {
for (const a of card.querySelectorAll('a')) {
const href = a.getAttribute('href') || '';
if (href.includes('/dp/')) { results[asin] = href; break; }
}
}
});
return results;
}

Construct full URLs: https://www.amazon.co.uk + the returned href path (strip query params for cleanliness).
Present results as a ranked table with columns: Rank, Score, Adjusted Avg, Reviews, Satisfaction%, Risk, Price, Product Name.
Always include:
- Top N overall (by score) — exclude HIGH risk products from recommendations
- Top N value picks (high score + low price)
- Flagged products — separate table listing HIGH and MEDIUM risk products with their signals
- Brief explanation of why high-rated low-review products rank lower
- Note if any top-ranked products have MEDIUM risk flags, and recommend the user check the review text manually
- The `aria-label` pattern for histogram extraction works on `.co.uk` and `.com` — the format is "X percent of reviews have Y star"
- Products with < 5 matched histogram entries may have a different page layout; skip or flag them
- Deduplicate by ASIN before scoring (Amazon shows the same product in multiple slots)
- The review count on the search page sometimes differs from the product page; the search page count is sufficient for scoring
- This approach works for any Amazon product category