Our AI Tool Testing Methodology
Every review and comparison on this site follows a structured, repeatable testing process. Here is exactly how we evaluate AI tools — and why you can trust what we publish.
There are thousands of AI tool review sites. Most of them rewrite product descriptions and call it a review. We do not. Every tool featured on AICraftGuide has been installed, used on real tasks, pushed to its limits, and scored against a consistent rubric before a single word is written.
This page explains exactly how that process works — what we test, how long we test it for, how we score it, and what our conflicts of interest policy looks like. If you ever have a question about a specific review, you can contact us directly and we will show our working.
Who conducts the testing?
All primary testing on AICraftGuide is conducted by Ahmed personally. When a tool is complex enough to require specialist assessment — for example, a coding assistant reviewed from a developer's perspective, or a medical AI summariser reviewed from a clinician's perspective — we note this explicitly in the review and describe who contributed the specialist input.
What does our testing process look like?
Every tool review on AICraftGuide goes through five structured phases before publication. This is not a checklist we complete in an afternoon — for most tools, the full process takes between 7 and 14 days of active use.
1. Signup and onboarding (Day 1–2). We sign up with the same account type an ordinary user would use (free tier first, then paid if applicable). We document the onboarding experience, any friction points, and the learning curve for a non-technical user.
2. Core task testing (Day 2–6). We run the tool through 10–20 standardised real-world tasks relevant to its core use case. For a writing tool: drafting, editing, summarising, and tone-matching. For a research tool: source accuracy, citation quality, and hallucination rate.
3. Stress and edge-case testing (Day 6–9). We deliberately try to break the tool with ambiguous inputs, complex multi-step tasks, tasks outside its stated use case, and prompts designed to expose hallucination tendencies. This is where most tools reveal their real limitations.
4. Pricing and value assessment (Day 9–11). We evaluate the free tier honestly (what it actually lets you do vs what is locked), compare paid plan pricing to direct competitors, and calculate whether the cost is justifiable for the target user. We never promote a tool's paid plan unless we believe it is genuinely worth the price.
5. Scoring, writing, and fact-checking (Day 11–14). We score the tool across our 7-criterion rubric (see below), write the article from our notes across the first four phases, and perform a final accuracy check before publication.
How do we score AI tools?
| Criterion | What we are measuring | Max points | Weight |
|---|---|---|---|
| Output Quality | Accuracy, relevance, and usefulness of the AI's responses on real tasks | 10 | Highest weight |
| Ease of Use | How quickly a non-technical user can get results without a learning curve | 10 | |
| Reliability & Consistency | Does the tool produce similarly good results across repeated tests, or does quality vary wildly? | 10 | |
| Hallucination Rate | How often does the tool confidently produce false information? Tested with verifiable fact-check prompts. | 10 | |
| Privacy & Data Safety | How the tool handles user data, what is logged, and whether enterprise/opt-out options exist | 10 | |
| Pricing & Value | Fairness of the free tier and honest cost-per-value assessment of paid plans | 10 | |
| Practical Workflow Fit | Does this tool actually save time and integrate into how real professionals work? | 10 | |
A tool scoring 60–70 out of the 70-point maximum is exceptional. 45–59 is good with notable caveats. 30–44 has significant limitations. Below 30 means we would not recommend it for professional use. We publish the score breakdown for every reviewed tool, not just the final number, so you can see exactly where it excels and where it falls short.
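To make the arithmetic concrete, here is a minimal illustrative sketch of how seven 0–10 scores add up to a total out of 70 and map onto the bands above. The scores below are hypothetical, not taken from any real review, and the snippet is not our internal tooling; it simply mirrors the rubric described on this page.

```python
# Illustrative only: hypothetical scores for an imaginary tool,
# showing how seven 0-10 criterion scores (70 max) map to the
# verdict bands described above.

def verdict_band(total: int) -> str:
    """Map a 0-70 total to the published verdict bands."""
    if total >= 60:
        return "Exceptional"
    if total >= 45:
        return "Good, with notable caveats"
    if total >= 30:
        return "Significant limitations"
    return "Not recommended for professional use"

# Hypothetical score breakdown (one 0-10 score per criterion)
example_scores = {
    "Output Quality": 8,
    "Ease of Use": 9,
    "Reliability & Consistency": 7,
    "Hallucination Rate": 6,
    "Privacy & Data Safety": 7,
    "Pricing & Value": 8,
    "Practical Workflow Fit": 8,
}

total = sum(example_scores.values())   # 53 out of 70
print(total, verdict_band(total))      # -> 53 Good, with notable caveats
```

In this invented example the tool lands at 53 out of 70, which would publish as "good with notable caveats" alongside the full per-criterion breakdown.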
A real example: how we tested NotebookLM
To make the process concrete, here is a simplified version of the testing timeline we used for our NotebookLM vs YouMind comparison guide.
Day 1 — Setup & onboarding test
Created a fresh Google account with no prior NotebookLM history. Documented time-to-first-useful-output for a new user with no tutorial assistance.
Day 2–4 — Core feature testing with real documents
Uploaded 12 real-world documents (PDF research papers, Word reports, web articles) across different lengths and complexity levels. Tested the Q&A feature, Audio Overview generation, and source citation accuracy on each.
Day 5–6 — Hallucination testing
Submitted 15 prompts with verifiable answers; some were answerable from the uploaded documents and some deliberately were not. Tracked how often the tool fabricated sources or answers versus correctly citing them or declining to answer.
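To show how that tally becomes the hallucination-rate figure we feed into the rubric, here is a hypothetical sketch. The counts below are invented for illustration; they are not the actual NotebookLM results.

```python
# Hypothetical tally, not actual test results: 15 fact-check prompts,
# classified by how the tool responded.
responses = {
    "correct_with_citation": 9,   # answered correctly and cited a real source
    "correctly_declined": 3,      # answer not in the sources; the tool said so
    "fabricated": 3,              # confidently invented an answer or source
}

total_prompts = sum(responses.values())
hallucination_rate = responses["fabricated"] / total_prompts
print(f"Hallucination rate: {hallucination_rate:.0%}")  # -> 20%
```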
Day 7–8 — Edge case testing
Tested the 50-source notebook limit, cross-document question answering with conflicting information across sources, Arabic-language document upload, and very long PDF handling (>100 pages).
Day 9–10 — Comparison testing vs YouMind
Ran the exact same 10 tasks on both platforms, with screenshots of outputs taken within the same 2-hour window to ensure fair comparison (no update-induced differences).
Day 11–14 — Scoring, writing, fact-checking & publication
Applied the 7-criterion rubric, wrote the article, cross-checked all statistics against primary sources, and ran a final read-through for accuracy before publishing.
What we do not do — and why it matters
- We do not write reviews based on press releases or vendor-provided demo environments. Everything is tested in the same environment you would use.
- We do not accept payment, free upgrades, or gifts in exchange for positive coverage. If a tool gives us a complimentary account to review it, we disclose this in the article and score it identically to how we would score it if we had paid.
- We do not remove negative findings from a review because a company asked us to. If something is a real limitation, it stays in the article.
- We do not publish "Best AI tools of [year]" list articles that exist solely to earn affiliate clicks. If we link to a tool, it is because we tested it and genuinely believe it is useful for the specific audience we describe.
- We do not test tools using artificially easy prompts designed to produce impressive-looking outputs. Our test tasks reflect the messy, complex, real-world requests that professionals actually need help with.
- We do not compare tools using different account tiers without disclosing it. If we test Tool A on a Pro plan and Tool B on a free plan, we say so clearly.
The AICraftGuide Editorial Promise
Every article we publish lives up to four commitments. These are not aspirations; they are the standard we hold every piece of content to before it goes live.
- ✓ Tested first, written second
- ✓ Scores never adjusted for commercial reasons
- ✓ Limitations reported, not hidden
- ✓ Updated when tools change
How to read an AICraftGuide review
Every tool review on this site follows the same structure so you can find the information you need quickly, regardless of which article you are reading.
Standard review structure
- Who this tool is actually for — stated at the top of every review, because the right tool depends entirely on your use case and skill level.
- Testing conditions — which plan was tested, when the testing was conducted, and which version of the tool was active at the time.
- Score breakdown — all 7 criteria scored individually, with a brief explanation of each score.
- What it does well — specific, tested examples, not generic praise.
- Where it falls short — real limitations discovered during testing, not pulled from competitor marketing.
- Verdict and recommendation — a clear, direct answer to "should you use this tool?" for the specific audience described at the top.
- Last tested date — so you know how recent the assessment is. AI tools change fast.
📜 Editorial Independence Declaration
AICraftGuide is independently owned and operated by Ahmed Bahaa Eldin. No external investor, advertiser, or AI company has editorial control over the content published on this site.
Tool vendors are free to contact us to submit their products for review. Submission does not guarantee coverage, and coverage does not guarantee a positive assessment. We test what we receive on the same terms as any other tool.
Our scoring rubric is fixed. It does not change between reviews, and it is not adjusted based on a tool's popularity, market position, or commercial relationship with this site.
This methodology page was last updated in April 2026. If our testing process changes in a material way, we will update this page and note what changed and why.