Our Methodology

We believe you should know exactly how we arrive at our recommendations. This page explains — in full detail — how we rate products, how we evaluate evidence, how we make money, and how we keep those things separate.

If anything here is unclear, contact us. We mean it.


The Dual-Rating System

Every product we review receives two independent scores. These scores are assigned by different team members using different criteria, and they are never combined into a single number. This is deliberate.

Product Rating (1–10)

Our Product Rating evaluates the product as a product — independent of any developmental or educational claims. A simple wooden rattle with no scientific pretensions can earn a 10/10 if it’s beautifully crafted, delightful to use, and built to last.

The score is a weighted composite of seven criteria:

  • Play Value (25%): How engaging is this product? Does it sustain interest across multiple sessions? Does it invite open-ended play, or is it exhausted in one sitting?
  • Quality (20%): Materials, construction, finish. Does it feel like a considered object or a disposable one?
  • Durability (15%): Will it survive real use by real children? We assess material resilience, joint strength, and, for products targeting toddlers, throw resistance. [1]
  • Age-Appropriateness (15%): Does the manufacturer’s age range match reality? Is it frustrating at the low end or boring at the high end?
  • Value for Money (10%): Price relative to quality, play value, and competitive alternatives. We track prices over time to avoid evaluating during artificial spikes.
  • Safety (10%): CPSC compliance, ASTM standards, choking hazard assessment, material safety, recall history. Dr. Rachel Torres evaluates every product against our safety checklist. [2]
  • Parent Experience (5%): Assembly difficulty, storage, noise level, mess factor, battery requirements, and all the things product designers forget matter when you’re the one living with this thing.

Calibration: A score of 5 represents an average product in its category. A 7 is good — we’d recommend it without reservation to parents looking in this category. A 9 or above is exceptional and rare. We do not grade on a curve; if every product in a roundup happens to be mediocre, the scores will reflect that.

We do not award half-points. A score is an integer from 1 to 10.
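For readers who want the arithmetic spelled out, here is a minimal sketch of how a weighted composite like ours works. The weights are the ones listed above; the rounding step is our assumption, since we publish only integer scores, and the function names are illustrative rather than anything we actually run.

```python
# Sketch of a weighted composite Product Rating.
# Weights come from the criteria list above; rounding to the nearest
# integer is an assumption, since published scores are integers 1-10.

WEIGHTS = {
    "play_value": 0.25,
    "quality": 0.20,
    "durability": 0.15,
    "age_appropriateness": 0.15,
    "value_for_money": 0.10,
    "safety": 0.10,
    "parent_experience": 0.05,
}

def product_rating(scores: dict[str, int]) -> int:
    """Combine per-criterion scores (each 1-10) into one integer rating."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the seven criteria")
    composite = sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)
    return max(1, min(10, round(composite)))
```

One consequence of the weighting: a product that delights children (high Play Value) can absorb a weak Parent Experience score, but not the reverse.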

Evidence Rating

Our Evidence Rating evaluates the scientific support behind a product’s developmental claims. This is assessed by Dr. Priya Ramanathan, our developmental psychologist, through a systematic literature review for each product.

The rating has four levels:

None: No peer-reviewed research supports the specific developmental claims made about this product or its product type. This is not a negative judgment; many wonderful toys simply haven’t been studied. It means we found no relevant evidence in our review of the PubMed, Google Scholar, PsycINFO, and ERIC databases. [3]

Emerging: One or two relevant studies exist, but with meaningful limitations. Common limitations include small sample sizes (typically n < 50), manufacturer funding without independent replication, indirect relevance (the study examined a related but different product type), or methodological concerns noted in peer review.

Moderate: Multiple independent studies support the general developmental benefit claimed, though not necessarily for this specific product. For example, if a building toy claims to develop spatial reasoning, and several well-designed studies show that block play supports spatial reasoning in the relevant age group, but none study this particular product, that’s Moderate. The research supports the category claim, not the product claim.

Strong: Robust, replicated research from independent labs directly supports the developmental claims. Studies are well designed (adequate sample sizes, control groups, pre-registered hypotheses where applicable), the findings have been replicated, and the research addresses the specific type of play or product category with reasonable directness. [4]
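The four levels above can be paraphrased as a decision sketch. To be clear, this is a hypothetical illustration: the thresholds and flag names below are our shorthand for the definitions, not a published algorithm, and the real assessment is a qualitative literature review.

```python
# Hypothetical sketch of the four-level Evidence Rating.
# The thresholds paraphrase the level definitions; the actual
# assessment is a qualitative review, not an automated rule.

def evidence_rating(n_studies: int,
                    independent_replication: bool,
                    supports_category_only: bool,
                    has_major_limitations: bool) -> str:
    if n_studies == 0:
        return "None"      # no relevant peer-reviewed research found
    if n_studies <= 2 or has_major_limitations:
        return "Emerging"  # one or two studies, or meaningful limitations
    if supports_category_only or not independent_replication:
        return "Moderate"  # category-level support, not product-level
    return "Strong"        # robust, replicated, directly relevant
```

Note how the sketch encodes the key asymmetry: category-level evidence can never produce a Strong rating, no matter how many studies exist.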

Important distinctions we make:

  • Research on a product category (e.g., “block play”) is not the same as research on a specific product. We note this difference explicitly.
  • Manufacturer-funded studies are flagged. They are not automatically dismissed (some are well-designed), but the funding source is always disclosed. [5]
  • We distinguish between claims the manufacturer makes and claims commonly attributed to the product by reviewers or parents. A product is evaluated against its own marketing, not against folk wisdom.

How We Conduct Reviews

Product Acquisition

We acquire products through two channels:

  1. Retail purchase. We buy most products at retail price, through standard consumer channels. This is our default and preferred method.
  2. Review samples. We accept unsolicited review samples from manufacturers. When we do, this is noted in the review header with a standard disclosure: “This product was provided as a review sample. Our editorial coverage is not influenced by how products are acquired.”

We never accept review samples with editorial conditions attached. If a manufacturer requests approval of the review before publication, copy approval, or guaranteed positive coverage, we decline the sample and purchase at retail.

Testing Protocol

Products are evaluated through a combination of:

  • Expert evaluation by our team, assessing construction quality, design, safety, and age-appropriateness
  • Real-world testing with children in the target age range, conducted in naturalistic play settings with parental consent [6]
  • Literature review by Dr. Ramanathan, evaluating the evidence behind developmental claims
  • Safety review by Dr. Torres, checking compliance, recall history, and hazard assessment

Our testing period varies by product type. A board game might be tested across 8–10 play sessions over two weeks. A baby toy might be evaluated over a month to assess sustained engagement and durability.

Writing and Editing

Reviews are drafted by Sofia Marchetti based on the team’s evaluations and research briefs. Every review then passes through:

  1. Fact-checking by Helen Park — every factual claim is verified against source material, every citation is confirmed, and every rating is checked against the rubric
  2. Science review by Dr. Ramanathan — confirming the evidence section accurately represents the literature
  3. Final editorial review by Dr. Margot Chen

We do not publish reviews that have not completed this process. There are no exceptions.


How We Make Money

ScienceBasedKids.com generates revenue through affiliate links. When you click a link to a product on our site and make a purchase from a retailer, we may receive a commission from that retailer. The price you pay is not affected.

We currently participate in affiliate programs with Amazon and select specialty toy retailers.

What this means in practice:

  • Every review that contains affiliate links includes a disclosure in the first paragraph. Not the footer. Not a separate page. The first paragraph.
  • Affiliate commissions vary by retailer and product category. We do not select products for review based on commission rates.
  • We do not use affiliate links in our Evidence Rating sections. The science stands alone.
  • We regularly publish negative reviews of products that carry affiliate links. A bad review with an affiliate link still earns us nothing — because you won’t buy a product we told you not to.

We do not accept payment for reviews. We do not accept sponsored content. We do not run display advertising. If our revenue model changes in the future, we will update this page and notify our newsletter subscribers.


Editorial Independence

Our editorial and business operations are structurally separate. Specifically:

  • Product selection is determined by editorial criteria: search demand, category coverage, interesting developmental claims worth investigating, and reader requests. Commission rates are not a factor.
  • Ratings are determined by the evaluation process described above. They are never adjusted based on affiliate relationships or manufacturer requests.
  • Negative reviews are published. Approximately 30% of products we review receive a Product Rating of 5 or below. [7]
  • Corrections are handled transparently. If we make an error of fact, we correct it and note the correction with a date stamp at the top of the review.

No one outside our editorial team sees a review before it is published. Not manufacturers. Not affiliate partners. Not advertisers (we don’t have any).


Conflicts of Interest

We disclose the following potential conflicts of interest:

  • Dr. Margot Chen previously worked as a product development consultant for major toy companies. She has no ongoing financial relationships with any manufacturer whose products we review.
  • When a team member has a personal connection to a product or manufacturer, they recuse themselves from that review’s evaluation process.
  • We maintain a conflicts of interest log and will publish summary disclosures annually.

How to Hold Us Accountable

We built this methodology to be auditable. If you believe we have:

  • Misrepresented a study’s findings
  • Failed to disclose a conflict of interest
  • Published a factual error
  • Applied our rubric inconsistently

Please contact us. Correction requests are reviewed within 48 hours. Substantive corrections are noted at the top of the affected review. We take this seriously — our credibility is the only thing we have.


A Note on Certainty

Science is not a collection of settled facts. It is a process — ongoing, self-correcting, and frequently messy. A study that is “Strong” evidence today may be complicated by new findings tomorrow. An Evidence Rating of “None” may change as researchers turn their attention to a product category.

Our ratings represent our best assessment of the available evidence at the time of publication. We include the publication date and last-reviewed date on every review. When significant new research emerges, we update our evaluations.

We are not trying to tell you what to buy. We are trying to tell you what we know, what we don’t know, and how confident you should be in the claims being made. What you do with that information is — as it should be — entirely up to you.

Footnotes

  1. Yes, “throw resistance” is a real criterion. If a product is marketed for toddlers, it will be thrown. We test accordingly.

  2. Our safety evaluation supplements but does not replace CPSC testing. We are not a certified testing laboratory. If you have a safety concern about a product, report it directly to the CPSC at cpsc.gov.

  3. We search PubMed, Google Scholar, PsycINFO, and ERIC using standardized search terms for each product category. Our search protocol is documented and consistent across reviews. We also review the manufacturer’s cited sources when provided.

  4. “Strong” does not mean “proven.” It means the evidence is substantial, replicated, and relevant. All scientific findings are provisional and subject to revision. We use the GRADE framework as a loose guide for assessing evidence quality, adapted for our specific context.

  5. A 2017 analysis in PLOS ONE found that industry-funded studies in related health fields were significantly more likely to report favorable results than independently funded studies (Lundh et al., 2017). This pattern has not been systematically studied in toy development research, but we apply the precautionary principle.

  6. Testing with children is conducted informally in home settings with full parental awareness and consent. We do not operate a formal research lab and our observations are not presented as experimental data. They inform our Play Value and Age-Appropriateness assessments.

  7. This figure will be updated quarterly as our review corpus grows. At launch, sample sizes are small. We include this commitment as a benchmark for our editorial independence.