
How SBIR Applications Are Scored: NIH vs. NSF vs. ARPA-H vs. DoD

By Nalin | Last updated: March 31, 2026

NIH scores your SBIR application on a 1-9 scale where 1 means "Exceptional." NSF uses the same 1-9 scale, but 9 means "Exceptional." ARPA-H doesn't use a panel at all -- a single Program Manager reads your submission and decides. DoD rejects your technology if it doesn't match the exact solicitation topic. These aren't minor differences. They determine whether you frame your innovation as hypothesis-driven research, high-risk R&D, a 10x health improvement, or an operational solution.

The complete comparison

Dimension | NIH | NSF | ARPA-H | DoD
Scoring scale | 1-9 (1 = best) | 1-9 (9 = best) | 1-9 (9 = best) | Varies by component
Who reviews | Panel of 15-20 scientists | Program Director + expert review | Single Program Manager | Technical evaluators
Top criterion | Approach | Innovation Classification | Non-Incremental Innovation (25%) | Topic alignment
What kills applications | Sequential aim dependencies | Tier C (engineering, not R&D) | Failing the 60-second test | Misaligned to solicitation topic
Innovation bar | Hypothesis-driven R&D | High-risk/high-reward R&D | 10x improvement required | Solves the defined problem
Preliminary data | Critical | Less formal; customer signals valued | Proof-of-concept | Varies by topic
Language culture | Scientific, hypothesis-driven | R&D-focused, national significance | Plain language, outcome-focused | Operational, mission-focused
Decision language | Fundable / Not Competitive | Invite / Decline | Encourage / Discourage | Select / Not Select

Each agency optimizes for a different question:

  • NIH: "Will this advance scientific knowledge and improve health?"
  • NSF: "Is this genuine high-risk/high-reward R&D with national significance?"
  • ARPA-H: "Can this solve a health problem in a way conventional approaches cannot?"
  • DoD: "Does this solve the specific operational problem we defined?"

How NIH scores: 5 criteria, study section review

NIH uses the most formalized review process. A study section panel (15-20 domain scientists) assigns 3 reviewers to each application.

The 5 criteria

Each reviewer scores on a 1-9 scale where 1 = Exceptional and 9 = Poor.

Criterion | Core Question | What Kills Applications
Significance | Does this address an important health problem? | Generic health burden instead of quantified data with CDC/WHO sources
Innovation | Does this challenge existing approaches? | Claiming "novel" without explaining what specifically is new
Approach | Is the methodology well-reasoned? | Sequential aim dependencies, missing potential problems section
Investigators | Is the PI suited for this work? | No preliminary data supporting the proposed hypothesis
Environment | Does the institution support success? | Missing equipment descriptions, no collaboration evidence

Overall Impact: the score that matters

Reviewers produce an Overall Impact score reflecting the likelihood of sustained influence on the field. This is NOT the average of the 5 criteria. A fatal flaw in one criterion (particularly Approach) can drive Overall Impact to unfundable levels even if the other four score well.

  • Scores 1-3: typically fundable
  • Scores 4-5: needs revision
  • Scores 6-9: not competitive

Half of applications get triaged

NIH triages the bottom half before the study section meeting. "Not Discussed" means your application never received a formal score. Triage triggers: no preliminary data, vague hypothesis, sequential aim dependencies, Phase II scope in a Phase I budget, missing potential problems section.

How NSF scores: innovation classification is everything

The screening gate

Before technical review, the Program Director applies 5 screening questions. Fail any one and your pitch is declined regardless of merit:

Screening Question | What They're Really Asking
Has this been done before? | Is there genuine R&D novelty?
Are there technical hurdles NSF R&D could overcome? | Is the risk technical (fundable) or business (not fundable)?
Could this disrupt the target market? | Is the impact nationally significant?
Is there evidence of product-market fit? | Real customer signals, not just a TAM slide?
Is there potential for broad societal impact? | Specific population and mechanism of benefit?

Innovation classification

NSF classifies your innovation before scoring:

  • Tier A -- New scientific principle: typically scores highest. This is what NSF wants.
  • Tier B -- Novel application of known science: competitive but not a slam dunk.
  • Tier C -- Engineering optimization: rarely scores well. Effectively a decline.

If reviewers can't tell whether your work is Tier A/B or Tier C, that ambiguity is itself a red flag. NSF's primary gate is whether you're doing genuine R&D versus product development dressed as research.

The 3 criteria

  1. Intellectual Merit -- potential to advance scientific knowledge
  2. Broader Impacts -- how the technology benefits society (for SBIR: specific population, mechanism, scale)
  3. Commercial Impact -- market need, scalability, whether NSF funding de-risks the technology

How ARPA-H evaluates: the 60-second test

ARPA-H has no peer review panels. A single Program Manager reads your 6-page Solution Summary and decides: Encourage or Discourage.

The PM should understand your concept in 60 seconds

If your opening requires domain-specific knowledge to parse, the PM assumes your thinking is unclear.

Fails the test: "We are developing a platform to improve cancer treatment."

Passes the test: "We are developing a [specific technology] that [mechanism] to [quantified outcome], which would [health impact] for [specific population]."

5 weighted criteria

Criterion | Weight | What the PM Looks For
Non-Incremental Innovation | 25% | 10x better, not 10%. New mechanism, not better implementation.
Health Impact and Scale | 25% | Quantified in patients/lives, not market size. Equity addressed.
Technical Feasibility | 20% | Measurable milestones with real Go/No-Go decisions.
Team and Execution | 15% | Three pillars: technical + clinical + commercialization.
Writing Quality | 15% | Passes 60-second test. Jargon-free. Quantified throughout.

Language matters: NIH vocabulary is a red flag at ARPA-H

NIH Language (avoid at ARPA-H) | ARPA-H Language (use instead)
"Hypothesis-driven" | "Will demonstrate"
"Specific aims" | "Milestones with Go/No-Go"
"Preliminary data suggests" | "Preliminary data demonstrates"
"Grantee" | "Performer"
"Market opportunity ($XB TAM)" | "Health impact (X million patients)"

How DoD scores: topic alignment is king

DoD SBIR is topic-driven. You respond to a specific solicitation topic, not your own research question.

Component | Format | Key Differentiator
Standard DoD (Navy, Army, SOCOM) | Topic-based proposals | Respond to explicit topic requirements
AFWERX | Open Topic + Specific Topic | Commercial viability weighted equally with technical merit
DARPA | BAA-specific | PM-directed; Proposers Day attendance matters

DoD awards are contracts, not grants. This means defined deliverables and milestones from the solicitation, not self-defined research aims. IP and patent strategy matter more at DoD than at civilian agencies.

Common DoD decline patterns

  1. Topic misalignment -- addresses a related but different problem than the solicitation specifies
  2. No operational context -- technology described in commercial terms without defense use case
  3. Missing IP strategy -- unclear data rights or IP protection plan
  4. Academic framing -- reads like an NIH grant instead of a defense contract response

Writing the same technology for different agencies

If you're applying to multiple agencies (recommended -- a portfolio approach improves your odds):

For NIH: lead with scientific significance. Hypothesis-driven. Quantify health burden. Structure aims as independent, testable hypotheses.

For NSF: lead with innovation classification. Demonstrate Tier A/B novelty. Frame Broader Impacts around populations and mechanisms, not revenue.

For ARPA-H: lead with the 10x improvement. 60-second clarity. Health impact in patients, not dollars. Use ARPA-H vocabulary.

For DoD: lead with topic alignment. Show your technology addresses the operational need. Emphasize IP protection and defense sector credibility.

For the full cross-agency proposal strategy, see how to win an SBIR grant. For agency-specific guides, see our NSF pitch guide, AFWERX guide, or DARPA BAA guide.

Want to know how your proposal would score?

We write proposals across NIH, NSF, ARPA-H, and DoD. If you're not sure which agency your technology is most competitive for, that's the first question to answer before investing 80+ hours. Our Strategy Review includes an agency-fit assessment specific to your technology.

Frequently Asked Questions

How does each agency score SBIR applications?
Each agency scores differently. NIH uses a study section panel scoring 5 criteria (Significance, Innovation, Approach, Investigators, Environment) on a 1-9 scale where 1 is best. NSF uses a Program Director + expert review on Intellectual Merit, Broader Impacts, and Commercial Impact (1-9 where 9 is best). ARPA-H uses a single Program Manager who decides 'Encourage' or 'Discourage.' DoD evaluates against specific solicitation topic alignment.

Do NIH and NSF use the same scoring scale?
No. NIH uses 1-9 where 1 = Exceptional (best). NSF and ARPA-H use 1-9 where 9 = Exceptional (best). This is the most common source of confusion for founders applying to multiple agencies: a score of 2 is near-perfect at NIH but near-bottom at NSF.

Which NIH criterion matters most?
Approach is empirically the strongest predictor of funding at NIH. It evaluates whether your methodology is well-reasoned, with specific experimental design, success criteria, potential problems, and alternative strategies. A fatal flaw in Approach can drive your Overall Impact score to unfundable levels even if other criteria score well.

How does NSF's innovation classification work?
Before scoring, NSF classifies your innovation into tiers. Tier A (new scientific principle) typically scores highest. Tier B (novel application of known science) is competitive. Tier C (engineering optimization) rarely scores above 4 out of 9 and is effectively a decline. If reviewers can't distinguish A/B from C, the ambiguity itself is a red flag.

How does ARPA-H evaluate applications?
Radically differently from the other agencies. A single Program Manager reads your 6-page Solution Summary and decides alone -- no peer review panel. The PM should understand your concept in 60 seconds. ARPA-H requires a 10x improvement metric over existing approaches, uses different vocabulary from NIH, and evaluates on Non-Incremental Innovation (25%), Health Impact (25%), Technical Feasibility (20%), Team (15%), and Writing Quality (15%).

Can I submit the same proposal to multiple agencies?
Technically yes, but it will score poorly everywhere. Each agency has different criteria, a different scoring direction, and different vocabulary. An NIH-style application sent to ARPA-H signals you don't understand how ARPA-H works. Budget 20-40 hours per agency-specific adaptation.

What gets an NIH application triaged?
NIH triages the bottom half of applications before the study section meeting -- they never get discussed or scored. Common triage triggers: no preliminary data for any aim, a vague or untestable central hypothesis, sequential aim dependencies (Aim 2 requires Aim 1 success), Phase II scope crammed into a Phase I budget, and a missing potential problems section.

Which agency is easiest to win?
The 'easiest' agency is the one where your technology best matches the review criteria -- not the one with the highest success rate. NIH Phase I success rates run 15-25% depending on the Institute. NSF is competitive, but its two-step pitch process gives early feedback. ARPA-H is newer and still establishing patterns. DoD success depends heavily on topic alignment.

Ready to explore your funding options?

We'll map your technology to the most relevant programs and tell you where to start. 15 minutes, no obligation.

Book Strategy Review