Last updated: March 25, 2026 | Author: Nalin Bhatt, Cada
NIH scores your SBIR application on a 1-9 scale where 1 means "Exceptional." NSF uses the same 1-9 scale -- but 9 means "Exceptional." ARPA-H doesn't use a panel at all: a single Program Manager reads your 6-page summary and decides in 60 seconds. And at DOD, your technology gets rejected if it doesn't match the exact solicitation topic.
These aren't minor procedural differences. They determine whether you frame your innovation as hypothesis-driven research (NIH), high-risk R&D (NSF), a 10x health improvement (ARPA-H), or an operational solution (DOD). Get the framing wrong and you're dead on arrival.
Most founders write one application and submit it to multiple agencies. That's a reliable way to lose everywhere. Each agency has a different review culture, different criteria, different scoring direction, and different deal-breakers. This guide breaks down exactly how NIH, NSF, ARPA-H, and DOD evaluate SBIR applications -- based on Cada's experience writing across these agencies over the past two years.
Who Actually Reviews Your SBIR Application at Each Agency?
Before worrying about what reviewers score, understand who is reading your application. The review structure determines everything about how you should write.
| Agency | Who Reviews | Review Format | Decision Mechanism |
|---|---|---|---|
| NIH | Panel of 15-20 scientists | 3 assigned reviewers per application | Consensus score from study section |
| NSF | Program Director + technical expert | PD screens first, then merit review | PD decides based on screening + review |
| ARPA-H | Single Program Manager | PM reads and evaluates alone | PM decides: "Encourage" or "Discourage" |
| DOD (Standard SBIR) | Technical evaluators | Topic-based evaluation | Aligned to solicitation topic requirements |
| AFWERX | Panel evaluators | Pitch competition format | Presentation-based assessment |
| DARPA | Program managers | BAA-specific review | PM-driven, Proposers Day attendance matters |
At NIH, you're writing for a committee of scientists who will debate your application's merits. At ARPA-H, you're writing for one person who needs to understand your concept in 60 seconds. At NSF, you need to pass a 5-question screening gate before your application even reaches technical review. These structural differences should change how you write every section.
How NIH Scores SBIR Applications: 5 Criteria on a 1-9 Scale
NIH uses the most formalized review process of any SBIR agency. A study section panel -- typically 15-20 domain scientists -- assigns 3 reviewers to each application: a primary reviewer (clinician-scientist), a secondary reviewer (methods expert), and a discussant (commercialization/environment expert).
The 5 Review Criteria
Each reviewer scores all 5 criteria independently on a 1-9 scale where 1 = Exceptional and 9 = Poor. Yes, the scale runs opposite to what most people expect.
| Criterion | Core Question | What Kills Applications |
|---|---|---|
| Significance | Does this address an important health problem? | Generic health burden ("improves patient outcomes") instead of quantified burden with CDC/WHO data |
| Investigator(s) | Is the PI well-suited for this work? | No preliminary data, or data from a different system that doesn't support the proposed hypothesis |
| Innovation | Does this challenge existing approaches? | Claiming "novel" without explaining what specifically is new and why it matters scientifically |
| Approach | Is the methodology well-reasoned? | Sequential aim dependencies where Aim 2 fails if Aim 1 fails, or missing potential problems section |
| Environment | Does the institution support success? | Lack of collaboration evidence or missing equipment descriptions |
Overall Impact -- The Score That Actually Matters
Reviewers also produce an Overall Impact score reflecting the likelihood the project will have a sustained, powerful influence on the research field. This is NOT the average of the 5 criterion scores. A fatal flaw in one criterion -- say, aims that are sequential dependencies -- can drive Overall Impact to a 7 even if the other four criteria score 2-3.
Applications scoring Overall Impact 1-3 are typically "Fundable." Scores of 4-5 mean "Needs Revision." Scores of 6-9 are "Not Competitive." The funded percentile varies by Institute -- some ICs fund the top 20%, others the top 30% -- so check your target IC's payline before assuming a score of 3 is safe.
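To see why averaging criterion scores misleads applicants, here's a toy model of the "weakest link" dynamic. This is illustrative only -- NIH reviewers judge Overall Impact holistically and compute no formula; the function names and numbers are ours:

```python
# Toy model of the NIH "weakest link" dynamic -- NOT an official NIH formula.
# Reviewers score holistically; this sketch only shows why averaging your
# criterion scores gives a false sense of safety. (1 = best, 9 = worst.)

def naive_average(scores):
    """What applicants often assume Overall Impact is."""
    return sum(scores.values()) / len(scores)

def weakest_link_estimate(scores):
    """Illustrative: a single fatal flaw dominates the outcome."""
    return max(max(scores.values()), round(naive_average(scores)))

criteria = {
    "Significance": 2,
    "Investigators": 2,
    "Innovation": 3,
    "Approach": 7,   # fatal flaw: sequential aim dependencies
    "Environment": 2,
}

print(naive_average(criteria))          # 3.2 -- looks "fundable" on paper
print(weakest_link_estimate(criteria))  # 7   -- closer to how panels react
```

The point of the sketch: four scores of 2-3 cannot rescue an Approach score of 7, so fix the worst criterion before polishing the rest.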
Triage: Half of Applications Never Get Discussed
NIH triages the bottom half of applications before the study section meeting. "Not Discussed" means your application was triaged -- it never received a formal score or discussion. Any one of these weaknesses can trigger triage on its own:
- No preliminary data for any aim
- Central hypothesis is vague or untestable
- Aims are sequential dependencies (Aim 2 requires Aim 1 success)
- Phase I scope is really Phase II scope (too ambitious for 6-12 months)
- Missing potential problems and alternative strategies
That means half the applications submitted to NIH are eliminated before a single reviewer advocates for them in the room. If your application has any of these issues, fixing them before submission is the highest-ROI use of your time.
What Good Looks Like at NIH
Significance (target score: 1-3): Name the disease, quantify the burden with incidence/prevalence/mortality data, cite CDC or WHO sources. Frame within the target Institute's strategic priorities. Include 2-3 sentences on commercial potential framed as a healthcare delivery problem -- not revenue projections.
Approach (target score: 1-3): Each aim has rationale, preliminary data, experimental design, expected outcomes, potential problems with specific alternatives, and milestones. Success criteria are specific: "This aim will be considered successful if [metric] exceeds [threshold]." Include a rigor and reproducibility paragraph.
Innovation (target score: 1-3): Use a comparison table showing specific feature/metric differences vs. current scientific approaches (not commercial competitors by name). Explain what is new and why it matters -- "novel" alone is never sufficient.
How NSF Scores SBIR Pitches: Innovation Classification Is Everything
NSF SBIR review works differently from NIH in almost every way. The scoring direction is reversed (9 = Exceptional), the criteria are different, and there's a screening gate before your pitch reaches technical review.
The Program Director Screening Gate
Before any technical review, the Program Director applies 5 screening questions. Fail any one and your pitch is declined regardless of technical merit:
| Screening Question | What They're Really Asking |
|---|---|
| Has this been attempted/done before? | Is there genuine R&D novelty, or are you rebuilding something that exists? |
| Are there technical hurdles that NSF R&D could overcome? | Is the risk technical (fundable) or business/market risk (not fundable)? |
| Could this disrupt the targeted market segment? | Is the impact nationally significant or niche? |
| Is there evidence of product-market fit? | Do you have real customer signals, not just a TAM slide? |
| Is there potential for broad societal impact? | Can you name a specific population and mechanism of benefit? |
This screening gate is the first hurdle to clear at NSF. Your pitch can have world-class technology and still get declined at screening if the PD classifies your work as engineering optimization rather than R&D.
Innovation Classification: The Single Most Important Factor
Before scoring your pitch, NSF reviewers assess whether the work represents genuine R&D or incremental engineering. Based on Cada's analysis of NSF review outcomes, we classify innovation into three tiers that predict scoring outcomes:
- Tier A -- New scientific principle or method: Typically scores 7+ (out of 9). This is what NSF wants to fund.
- Tier B -- Novel application of known science to a new domain: Typically scores 5+. Competitive but not a slam dunk.
- Tier C -- Engineering optimization of existing approaches: Rarely scores above 4. This is effectively a decline.
If the reviewer can't clearly distinguish whether your work is Tier A/B or Tier C, that ambiguity is itself a red flag. NSF's primary gate is whether you're doing genuine high-risk/high-reward R&D versus product development dressed as research.
NSF Review Criteria
NSF uses 3 core criteria plus technical risk assessment:
- Intellectual Merit -- Potential to advance scientific or engineering knowledge
- Broader Impacts -- How the technology benefits society (for SBIR, this is NOT about education outreach or diversity programs -- it's about whether your technology itself has national significance)
- Commercial Impact -- Market need, scalability, and whether NSF funding meaningfully de-risks the technology
Broader Impacts for SBIR founders: This trips up applicants who've written academic NSF grants. For SBIR, Broader Impacts means naming a specific population that benefits, a mechanism of benefit, and a plausible scale. "This technology will benefit society and create jobs" fails. "If Phase I demonstrates 95% accuracy, the technology could reduce diagnostic time by 40% for the 15M patients annually in rural health systems" passes.
Common NSF Decline Patterns
The top 3 reasons NSF declines SBIR pitches:
- Incremental improvement, not R&D breakthrough -- "Better, faster, cheaper" without a technical leap gets classified as Tier C
- Niche market, not nationally significant -- NSF funds technologies with broad societal impact, not narrow vertical solutions
- Objectives describe product development, not R&D -- If a standard contractor could do the proposed work, it's not NSF-fundable
How ARPA-H Evaluates Applications: The 60-Second Test and PM Decision
ARPA-H is the newest health research agency, and its review process is radically different from NIH or NSF. There are no peer review panels. A single Program Manager reads your 6-page Solution Summary and decides whether to "Encourage" or "Discourage" you from submitting a full proposal.
The 60-Second Test
The PM should understand what your technology does and why it matters in under 60 seconds of reading your concept summary. If your opening section requires domain-specific knowledge to understand, the PM will assume your thinking is unclear. This is the single most important gate at ARPA-H.
A concept summary that fails the 60-second test:
"We are developing a platform to improve cancer treatment."
A concept summary that passes:
"We are developing a [specific technology] that [mechanism] to [quantified outcome], which would [health impact] for [specific population]."
The difference: the second version tells the PM exactly what, how, and for whom -- in one sentence.
5 Weighted Evaluation Criteria
Based on Cada's analysis of ARPA-H PM evaluation patterns, we model 5 weighted criteria that reflect what PMs reward:
| Criterion | Weight | What the PM Looks For |
|---|---|---|
| Non-Incremental Innovation | 25% | Is this genuinely 10x better, not 10%? A new mechanism, not a better implementation? |
| Health Impact and Scale | 25% | Health burden quantified in patients/lives/QALYs -- NOT market size. Equity addressed. |
| Technical Feasibility and Milestones | 20% | Measurable milestones with real Go/No-Go decisions. Honest about risks. |
| Team and Execution Capability | 15% | Three-pillar coverage: technical + clinical + commercialization expertise. |
| Writing Quality and PM Communication | 15% | Passes 60-second test. Jargon-free. Direct, outcome-focused. Quantified throughout. |
Note: ARPA-H does not publish a formal scoring rubric like NIH. These weights reflect Cada's model of what PM review consistently rewards, based on our experience with ARPA-H submissions.
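For concreteness, Cada's weighted model can be expressed as a simple weighted sum. The weights come from the table above; the 0-10 per-criterion scores and the `weighted_score` helper are our illustration, not an ARPA-H tool:

```python
# Sketch of Cada's weighted model of ARPA-H PM evaluation. The weights are
# Cada's estimates, not a published ARPA-H rubric; per-criterion scores use
# an arbitrary 0-10 scale purely for illustration.

ARPAH_WEIGHTS = {
    "non_incremental_innovation": 0.25,
    "health_impact_and_scale": 0.25,
    "technical_feasibility": 0.20,
    "team_and_execution": 0.15,
    "writing_and_pm_communication": 0.15,
}

def weighted_score(scores: dict) -> float:
    """Weighted sum over the five modeled criteria."""
    assert abs(sum(ARPAH_WEIGHTS.values()) - 1.0) < 1e-9  # weights sum to 1
    return sum(ARPAH_WEIGHTS[k] * scores[k] for k in ARPAH_WEIGHTS)

example = {
    "non_incremental_innovation": 9,   # clear 10x claim with sourced baseline
    "health_impact_and_scale": 8,
    "technical_feasibility": 7,
    "team_and_execution": 5,           # missing commercialization pillar
    "writing_and_pm_communication": 8,
}
print(round(weighted_score(example), 2))  # 7.6
```

Note how the two 25% criteria (innovation and health impact) move the modeled score twice as much as team or writing -- which is why leading with the 10x claim matters.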
The 10x Bar
ARPA-H explicitly requires non-incremental innovation. Your mandatory metrics comparison table must show at least one metric with >= 10x improvement over existing approaches. The table requires sourced baselines and year-by-year targets.
"Better, faster, cheaper" is not ARPA-H language. "10x reduction in diagnostic time enabled by [mechanism]" is.
Language Culture: NIH Vocabulary Is a Red Flag at ARPA-H
At ARPA-H, NIH vocabulary reads as a cultural signal of the wrong kind of thinking. Using the wrong vocabulary tells the PM you haven't read ARPA-H's own guidance -- and that's a credibility hit before they even evaluate your technology.
| NIH Language (Avoid at ARPA-H) | ARPA-H Language (Use Instead) |
|---|---|
| "Hypothesis-driven" | "Will demonstrate" |
| "Specific aims" | "Milestones with Go/No-Go" |
| "Preliminary data suggests" | "Preliminary data demonstrates" |
| "Grantee" | "Performer" |
| "Phase 1" | "Base period" |
| "Program officer" | "Program manager / PM" |
| "Pilot study" | "Proof-of-concept" |
| "Market opportunity ($XB TAM)" | "Health impact (X million patients)" |
Three-Pillar Team Requirement
ARPA-H expects your team to cover three pillars. Missing any one is a significant gap:
- Technical expertise -- the science/engineering behind the innovation
- Clinical expertise -- understanding of the health problem, patient needs, clinical workflow
- Commercialization/adoption expertise -- regulatory pathway, manufacturing, reimbursement
If you don't have all three in-house, acknowledge the gap and show active recruiting plans. Pretending a missing pillar doesn't exist is worse than naming it.
How DOD Components Score SBIR Applications: Topic Alignment Is King
DOD SBIR is structurally different from civilian agency SBIR. You don't propose your own research question -- you respond to a specific solicitation topic published by a DOD component. Topic alignment is the primary scoring factor.
Key DOD Components and Their Formats
| Component | Format | Key Differentiator |
|---|---|---|
| Standard DOD SBIR (Navy, Army, SOCOM, DEVCOM, DLA) | Topic-based proposals | Respond to explicit topic numbers with defined requirements |
| AFWERX | Pitch competition | Concise presentations, not traditional proposals |
| DARPA | BAA-specific | Respond to Broad Agency Announcements; Proposers Day attendance strongly recommended |
| DIU | Commercial solutions | Requires existing product at TRL 4+; NOT for early-stage R&D |
DOD vs. Civilian Agency Differences
DOD SBIR awards are contracts, not grants. This changes the accountability structure -- you have deliverables and milestones defined by the solicitation, not self-defined research aims.
IP and patent protection matter more at DOD than at civilian agencies. Companies without filed patents or IP are at a measurable disadvantage -- DOD evaluators view IP ownership as evidence that you can deliver and protect the technology for government use.
DOD review evaluates your technology against a specific operational need. The question isn't "Is this scientifically innovative?" (NIH) or "Is this 10x better?" (ARPA-H) -- it's "Does this solve the problem we defined in the solicitation topic?"
What Good Looks Like at DOD
Topic alignment: Your proposal directly addresses every requirement listed in the solicitation topic. DOD topics are specific -- "develop a lightweight sensor for X environment" -- and reviewers evaluate how precisely you respond. A brilliant technology that doesn't match the topic gets rejected regardless of quality.
Operational context: You demonstrate understanding of the operational environment where your technology will be deployed. Using military/defense terminology correctly signals that you understand the end user.
Prior defense experience: Companies with prior DOD SBIR awards, CRADA agreements, or partnerships with defense research labs have a measurable edge. If you don't have prior experience, a strong letter of intent from a defense end-user helps close the credibility gap.
Common DOD Decline Patterns
- Topic misalignment -- the proposal addresses a related but different problem than the solicitation topic specifies
- No operational context -- the technology is described in commercial terms without connecting to the defense use case
- Missing IP strategy -- no plan for protecting intellectual property or unclear data rights position
- Overly academic framing -- proposal reads like an NIH grant instead of a defense contract response
SBIR Review Criteria by Agency: NIH vs NSF vs ARPA-H vs DOD Side-by-Side
Federal SBIR review criteria vary significantly across agencies. The same technology pitched to NIH, NSF, ARPA-H, and DOD needs four different narratives because each agency evaluates through a different lens. Here's the complete comparison:
| Dimension | NIH | NSF | ARPA-H | DOD |
|---|---|---|---|---|
| Scoring scale | 1-9 (1 = best) | 1-9 (9 = best) | 1-9 (9 = best) | Varies by component |
| # of criteria | 5 | 3 + innovation classification | 5 (weighted) | Topic-dependent |
| Who reviews | Panel of 15-20 scientists | Program Director + expert | Single Program Manager | Technical evaluators |
| Top criterion | Approach | Innovation Classification | Non-Incremental Innovation (25%) | Topic alignment |
| What kills apps | Sequential aim dependencies | Tier C innovation classification | Failing 60-second test | Misaligned to topic |
| Innovation bar | Hypothesis-driven R&D | High-risk/high-reward R&D | 10x improvement required | Solves defined problem |
| Preliminary data | Required (higher than R21, lower than R01) | Less formal; customer signals valued | Proof-of-concept, not pilot study | Varies |
| Phase I award | Up to $314K (per NIH SBIR PA) | $305K (per NSF 23-515) | Varies by program, typically $1M-$5M | Varies by component |
| Review timeline | 4-5 months to summary statement | Varies | Rolling submissions | Solicitation-dependent |
| Decision language | Fundable / Not Competitive | Invite / Decline | Encourage / Discourage | Select / Not Select |
| Language culture | Scientific, hypothesis-driven | R&D-focused, national significance | Plain language, outcome-focused | Operational, mission-focused |
The Key Insight
Each agency optimizes its review process for a different question:
- NIH: "Will this advance scientific knowledge and improve health?"
- NSF: "Is this genuine high-risk/high-reward R&D with national significance?"
- ARPA-H: "Can this solve a health problem in a way that cannot be achieved through conventional approaches?"
- DOD: "Does this solve the specific operational problem we defined?"
The same therapeutic technology might score well at NIH by emphasizing the underlying biological mechanism, get classified as Tier C at NSF because it's an application of known science, receive an "Encourage" at ARPA-H because it shows 10x improvement in patient outcomes, and get passed over at DOD because there's no matching solicitation topic. Understanding which lens each agency uses is the difference between a competitive application and a wasted 80 hours.
Writing the Same Technology for Different Agencies
If you're applying to multiple agencies (which we recommend -- a portfolio approach improves your odds), here's how to adapt your narrative:
For NIH: Lead with scientific significance. Frame your technology as hypothesis-driven research. Quantify the health burden using CDC/WHO data. Structure aims as independent, testable hypotheses -- not a product development roadmap.
For NSF: Lead with your innovation classification. Demonstrate that your R&D is genuinely novel (Tier A or B), not engineering optimization (Tier C). Frame Broader Impacts around specific populations and mechanisms of benefit, not revenue.
For ARPA-H: Lead with the 10x improvement. Write your concept summary so a non-specialist understands it in 60 seconds. Frame impact in patients and lives -- never in market size. Use ARPA-H vocabulary (performer, base period, Go/No-Go).
For DOD: Lead with topic alignment. Show that your technology directly addresses the defined operational need. Emphasize IP protection and prior defense sector experience.
Before You Submit: 5-Point Checklist
- Have you verified which scoring direction the agency uses? (NIH: 1 = best; NSF/ARPA-H: 9 = best)
- Does your application use the agency's vocabulary? (Not NIH language at ARPA-H)
- Have you addressed the agency's top decline pattern?
- Is your innovation framed at the right level for the agency?
- Does your application match the agency's review structure? (Panel vs. PM vs. topic-based)
Frequently Asked Questions About SBIR Review Criteria
Do all agencies use the same scoring scale?
No. NIH uses 1-9 where 1 = Exceptional (best). NSF and ARPA-H use 1-9 where 9 = Exceptional (best). This is one of the most common sources of confusion for founders applying to multiple agencies. If you're used to NIH scoring and see a "2" at NSF, that's near the bottom -- not near the top.
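If you track scores across agencies, a small helper can put them on a common "higher is better" axis. This is a hypothetical utility built from the scale endpoints above, not any agency's tool:

```python
# Normalize raw agency scores to a common 0-1 "higher is better" axis,
# regardless of which direction the agency's scale runs. Hypothetical
# helper -- no agency publishes or uses such a conversion.

def normalize(score: int, best: int, worst: int) -> float:
    """Map a raw score to 0.0 (worst) .. 1.0 (best)."""
    return (worst - score) / (worst - best)

print(normalize(2, best=1, worst=9))  # NIH "2" -> 0.875 (near the top)
print(normalize(2, best=9, worst=1))  # NSF "2" -> 0.125 (near the bottom)
```

The same raw "2" lands at opposite ends of the axis -- exactly the confusion described above.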
Can I submit the same application to multiple agencies?
Technically, yes -- there's no rule against it. But an application written for NIH reviewers will score poorly at ARPA-H because it uses the wrong language, wrong framing, and wrong structure. Each agency needs a tailored narrative. Budget 20-40 hours per agency-specific adaptation, not 5.
Which agency is easiest to get funded by?
It depends on your technology and stage. NIH Phase I success rates run 20-25% (source: NIH RePORTER data). NSF's pitch-to-invitation stage is competitive, though comparable rates aren't published. ARPA-H is newer and still establishing patterns. The "easiest" agency is the one where your technology best matches the review criteria -- not the one with the highest success rate.
How long does review take at each agency?
NIH: 4-5 months from submission to summary statement, 9-12 months to award. NSF: varies by program. ARPA-H: rolling submissions with faster turnaround (typically 4-8 weeks to initial response). DOD: tied to solicitation timelines, typically 3-6 months.
What's the biggest mistake founders make with SBIR applications?
Writing one application and submitting it to every agency. Each agency has a different review culture, different criteria, and different deal-breakers. An NIH-style application sent to ARPA-H signals that you don't understand how ARPA-H works -- and that's an immediate credibility hit with the PM reading your submission.
Get Agency-Calibrated Review Before You Submit
Cada's grant writing services include agency-calibrated review simulations that model how your application would score at NIH, NSF, ARPA-H, or DOD. Each simulation uses the actual criteria, scoring rubrics, and reviewer personas for the target agency -- not a generic checklist.
If you're not sure which agency your technology is most competitive for, that's the first question to answer before investing 40-80 hours in an application. We do a free 15-minute assessment call that gives you a straight answer on agency fit. No pitch, no obligation.