
How SBIR Applications Are Scored: NIH vs. NSF vs. ARPA-H vs. DoD

By Nalin | Last updated: March 31, 2026

NIH scores your SBIR application on a 1-9 scale where 1 means "Exceptional." NSF uses the same 1-9 scale, but 9 means "Exceptional." ARPA-H doesn't use a panel at all -- a single Program Manager reads your submission and decides. DoD rejects your technology if it doesn't match the exact solicitation topic. These aren't minor differences. They determine whether you frame your innovation as hypothesis-driven research, high-risk R&D, a 10x health improvement, or an operational solution.

The complete comparison

Dimension | NIH | NSF | ARPA-H | DoD
Scoring scale | 1-9 (1 = best) | 1-9 (9 = best) | 1-9 (9 = best) | Varies by component
Who reviews | Panel of 15-20 scientists | Program Director + expert review | Single Program Manager | Technical evaluators
Top criterion | Approach | Innovation Classification | Non-Incremental Innovation (25%) | Topic alignment
What kills applications | Sequential aim dependencies | Tier C (engineering, not R&D) | Failing the 60-second test | Misaligned to solicitation topic
Innovation bar | Hypothesis-driven R&D | High-risk/high-reward R&D | 10x improvement required | Solves the defined problem
Preliminary data | Critical | Less formal; customer signals valued | Proof-of-concept | Varies by topic
Language culture | Scientific, hypothesis-driven | R&D-focused, national significance | Plain language, outcome-focused | Operational, mission-focused
Decision language | Fundable / Not Competitive | Invite / Decline | Encourage / Discourage | Select / Not Select

Each agency optimizes for a different question:

  • NIH: "Will this advance scientific knowledge and improve health?"
  • NSF: "Is this genuine high-risk/high-reward R&D with national significance?"
  • ARPA-H: "Can this solve a health problem in a way conventional approaches cannot?"
  • DoD: "Does this solve the specific operational problem we defined?"

How NIH scores: 5 criteria, study section review

NIH uses the most formalized review process. A study section panel (15-20 domain scientists) assigns 3 reviewers to each application.

The 5 criteria

Each reviewer scores on a 1-9 scale where 1 = Exceptional and 9 = Poor.

Criterion | Core Question | What Kills Applications
Significance | Does this address an important health problem? | Generic health burden instead of quantified data with CDC/WHO sources
Innovation | Does this challenge existing approaches? | Claiming "novel" without explaining what specifically is new
Approach | Is the methodology well-reasoned? | Sequential aim dependencies, missing potential problems section
Investigators | Is the PI suited for this work? | No preliminary data supporting the proposed hypothesis
Environment | Does the institution support success? | Missing equipment descriptions, no collaboration evidence

Overall Impact: the score that matters

Reviewers produce an Overall Impact score reflecting the likelihood of sustained influence on the field. This is NOT the average of the 5 criteria. A fatal flaw in one criterion (particularly Approach) can drive Overall Impact to unfundable levels even if the other four score well.

  • Scores 1-3: typically fundable
  • Scores 4-5: needs revision
  • Scores 6-9: not competitive

Half of applications get triaged

NIH triages the bottom half before the study section meeting. "Not Discussed" means your application never received a formal score. Triage triggers: no preliminary data, vague hypothesis, sequential aim dependencies, Phase II scope in a Phase I budget, missing potential problems section.

How NSF scores: innovation classification is everything

The screening gate

Before technical review, the Program Director applies 5 screening questions. Fail any one and your pitch is declined regardless of merit:

Screening Question | What They're Really Asking
Has this been done before? | Is there genuine R&D novelty?
Are there technical hurdles NSF R&D could overcome? | Is the risk technical (fundable) or business (not fundable)?
Could this disrupt the target market? | Is the impact nationally significant?
Is there evidence of product-market fit? | Real customer signals, not just a TAM slide?
Is there potential for broad societal impact? | Specific population and mechanism of benefit?

Innovation classification

NSF classifies your innovation before scoring:

  • Tier A -- New scientific principle: typically scores highest. This is what NSF wants.
  • Tier B -- Novel application of known science: competitive but not a slam dunk.
  • Tier C -- Engineering optimization: rarely scores well. Effectively a decline.

If reviewers can't tell whether your work is Tier A/B or Tier C, that ambiguity is itself a red flag. NSF's primary gate is whether you're doing genuine R&D versus product development dressed as research.

The 3 criteria

  1. Intellectual Merit -- potential to advance scientific knowledge
  2. Broader Impacts -- how the technology benefits society (for SBIR: specific population, mechanism, scale)
  3. Commercial Impact -- market need, scalability, whether NSF funding de-risks the technology

How ARPA-H evaluates: the 60-second test

ARPA-H has no peer review panels. A single Program Manager reads your 6-page Solution Summary and decides: Encourage or Discourage.

The PM should understand your concept in 60 seconds

If your opening requires domain-specific knowledge to parse, the PM assumes your thinking is unclear.

Fails the test: "We are developing a platform to improve cancer treatment."

Passes the test: "We are developing a [specific technology] that [mechanism] to [quantified outcome], which would [health impact] for [specific population]."

5 weighted criteria

Criterion | Weight | What the PM Looks For
Non-Incremental Innovation | 25% | 10x better, not 10%. New mechanism, not better implementation.
Health Impact and Scale | 25% | Quantified in patients/lives, not market size. Equity addressed.
Technical Feasibility | 20% | Measurable milestones with real Go/No-Go decisions.
Team and Execution | 15% | Three pillars: technical + clinical + commercialization.
Writing Quality | 15% | Passes 60-second test. Jargon-free. Quantified throughout.

Language matters: NIH vocabulary is a red flag at ARPA-H

NIH Language (avoid at ARPA-H) | ARPA-H Language (use instead)
"Hypothesis-driven" | "Will demonstrate"
"Specific aims" | "Milestones with Go/No-Go"
"Preliminary data suggests" | "Preliminary data demonstrates"
"Grantee" | "Performer"
"Market opportunity ($XB TAM)" | "Health impact (X million patients)"

How DoD scores: topic alignment is king

DoD SBIR is topic-driven. You respond to a specific solicitation topic, not your own research question.

Component | Format | Key Differentiator
Standard DoD (Navy, Army, SOCOM) | Topic-based proposals | Respond to explicit topic requirements
AFWERX | Open Topic + Specific Topic | Commercial viability weighted equally with technical merit
DARPA | BAA-specific | PM-directed; Proposers Day attendance matters

DoD awards are contracts, not grants. This means defined deliverables and milestones from the solicitation, not self-defined research aims. IP and patent strategy matter more at DoD than at civilian agencies.

Common DoD decline patterns

  1. Topic misalignment -- addresses a related but different problem than the solicitation specifies
  2. No operational context -- technology described in commercial terms without defense use case
  3. Missing IP strategy -- unclear data rights or IP protection plan
  4. Academic framing -- reads like an NIH grant instead of a defense contract response

Writing the same technology for different agencies

If you're applying to multiple agencies (recommended -- a portfolio approach improves your odds):

For NIH: lead with scientific significance. Hypothesis-driven. Quantify health burden. Structure aims as independent, testable hypotheses.

For NSF: lead with innovation classification. Demonstrate Tier A/B novelty. Frame Broader Impacts around populations and mechanisms, not revenue.

For ARPA-H: lead with the 10x improvement. 60-second clarity. Health impact in patients, not dollars. Use ARPA-H vocabulary.

For DoD: lead with topic alignment. Show your technology addresses the operational need. Emphasize IP protection and defense sector credibility.

For the full cross-agency proposal strategy, see how to win an SBIR grant. For agency-specific guides, see our NSF pitch guide, AFWERX guide, or DARPA BAA guide.

Want to know how your proposal would score?

We write proposals across NIH, NSF, ARPA-H, and DoD. If you're not sure which agency your technology is most competitive for, that's the first question to answer before investing 80+ hours. Our Strategy Review includes an agency-fit assessment specific to your technology.

Frequently Asked Questions

How does each agency score SBIR applications?
Each agency scores differently. NIH uses a study section panel scoring 5 criteria (Significance, Innovation, Approach, Investigators, Environment) on a 1-9 scale where 1 is best. NSF uses a Program Director + expert review on Intellectual Merit, Broader Impacts, and Commercial Impact (1-9 where 9 is best). ARPA-H uses a single Program Manager who decides 'Encourage' or 'Discourage.' DoD evaluates against specific solicitation topic alignment.

Do NIH and NSF use the same scoring scale?
No. NIH uses 1-9 where 1 = Exceptional (best). NSF and ARPA-H use 1-9 where 9 = Exceptional (best). This is the most common source of confusion for founders applying to multiple agencies: a score of 2 is near-perfect at NIH but near-bottom at NSF.

Which NIH criterion matters most?
Approach is empirically the strongest predictor of funding at NIH. It evaluates whether your methodology is well-reasoned, with specific experimental design, success criteria, potential problems, and alternative strategies. A fatal flaw in Approach can drive your Overall Impact score to unfundable levels even if other criteria score well.

How does NSF's innovation classification work?
Before scoring, NSF classifies your innovation into tiers. Tier A (new scientific principle) typically scores highest. Tier B (novel application of known science) is competitive. Tier C (engineering optimization) rarely scores above 4 out of 9 and is effectively a decline. If reviewers can't distinguish A/B from C, the ambiguity itself is a red flag.

How does ARPA-H evaluate applications?
Radically differently from the other agencies. A single Program Manager reads your 6-page Solution Summary and decides alone -- no peer review panel. The PM should understand your concept in 60 seconds. ARPA-H requires a 10x improvement metric over existing approaches, uses different vocabulary from NIH, and evaluates on Non-Incremental Innovation (25%), Health Impact (25%), Technical Feasibility (20%), Team (15%), and Writing Quality (15%).

Can I submit the same proposal to multiple agencies?
Technically yes, but it will score poorly everywhere. Each agency has different criteria, a different scoring direction, and different vocabulary. An NIH-style application sent to ARPA-H signals you don't understand how ARPA-H works. Budget 20-40 hours per agency-specific adaptation.

What gets an NIH application triaged?
NIH triages the bottom half of applications before the study section meeting -- they never get discussed or scored. Common triage triggers: no preliminary data for any aim, a vague or untestable central hypothesis, sequential aim dependencies (Aim 2 requires Aim 1 success), Phase II scope crammed into a Phase I budget, and a missing potential problems section.

Which agency is easiest to win?
The 'easiest' agency is the one where your technology best matches the review criteria -- not the one with the highest success rate. NIH Phase I success rates run 15-25% depending on the Institute. NSF is competitive, but its two-step pitch process gives early feedback. ARPA-H is newer and still establishing patterns. DoD success depends heavily on topic alignment.

Ready to explore your funding options?

We'll map your technology to the most relevant programs and tell you where to start. 15 minutes, no obligation.

Book Strategy Review