Proposal: Handling Special Values in Infr
Status: Draft / Future Consideration
Date: 2026-03-21
R's Special Values
R has a rich set of special values, each with distinct semantics:
| Value | R Type | Meaning |
|---|---|---|
| NA | logical | Generic missing value |
| NA_real_ | double | Missing numeric |
| NA_integer_ | integer | Missing integer |
| NA_character_ | character | Missing character |
| NA_complex_ | complex | Missing complex |
| NaN | double | Not a Number (e.g., 0/0) |
| Inf / -Inf | double | Positive/negative infinity |
| NULL | NULL | Absence of a value |
Current State in Infr
All special values are fully supported as literals — they are lexed, parsed, type-inferred, and transpiled correctly. The type checker infers them to their base types:
- NA, NA_real_ → numeric
- NA_integer_ → integer
- NA_character_ → character
- NA_complex_ → complex
- NaN, Inf → numeric
- NULL → null
NULL has first-class type support: nullable types (T? as shorthand for T | NULL) and is.null() type narrowing in conditionals both work today.
NA, NaN, and Inf have no special type-level representation — they are simply members of their base type.
Precedent: TypeScript and JavaScript
TypeScript faced a similar problem with JavaScript's null and undefined. The parallels are instructive:
| R | JavaScript | TypeScript Solution |
|---|---|---|
| NULL | null | T \| null union type |
| NA | undefined (loosely) | T \| undefined union type |
| is.null() | === null | Control-flow narrowing |
| is.na() | === undefined | Control-flow narrowing |
| NaN, Inf | NaN, Infinity | No special type treatment |
What TypeScript did
- strictNullChecks (TS 2.0, 2016): Before this flag, null and undefined were assignable to every type. With it on, they became distinct types requiring explicit unions (string | null). This is widely considered TypeScript's most impactful strictness flag.
- Control-flow narrowing: if (x !== null) narrows string | null → string. No new syntax was needed; TypeScript recognized existing JS idioms as type guards.
- NaN and Infinity got no special types: They remain number. Number.isNaN() returns boolean but does not narrow. TypeScript decided the complexity wasn't worth it.
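The narrowing behavior and the lack of NaN narrowing described above can be seen in a small TypeScript sketch. The function names shout and safeRatio are illustrative only, and the narrowing in the first function is only enforced when strictNullChecks is enabled.

```typescript
// Assumes "strictNullChecks": true (implied by "strict": true).
// shout and safeRatio are illustrative names, not part of any API.
function shout(s: string | null): string {
  if (s !== null) {
    // Inside this branch s is narrowed from string | null to string,
    // so string methods are allowed without a cast.
    return s.toUpperCase();
  }
  return "";
}

function safeRatio(a: number, b: number): number {
  const x = a / b;
  // Number.isNaN returns a plain boolean. The type of x is number in
  // both branches: the check guards the value, not the type.
  return Number.isNaN(x) ? 0 : x;
}
```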
Key difference: NA is harder than undefined
TypeScript's approach maps well onto Infr's NULL handling (already implemented). But NA is a fundamentally different beast:
- undefined doesn't propagate: 1 + undefined produces NaN (a different value/type). NA propagates silently: 1 + NA produces NA (same type, infectious).
- undefined is a property of the variable: a variable is either undefined or it isn't. NA is a property of individual vector elements: a numeric vector can contain a mix of real values and NAs. This is an element-level concept with no JS equivalent.
This means full NA tracking is genuinely novel territory beyond what TypeScript attempted.
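To make the contrast concrete, here is a rough TypeScript model of the two behaviors. It is only an analogy: MaybeNum and addNA are hypothetical names, and null is used as a stand-in for an NA element purely for illustration (elsewhere in this document null corresponds to R's NULL, not NA).

```typescript
// JavaScript behavior: undefined does not survive arithmetic, it becomes NaN.
const u: any = undefined; // `any` is needed; strict TypeScript rejects 1 + undefined
const jsResult: number = 1 + u; // NaN, a different value from undefined

// Hypothetical model of an R numeric element that may be NA.
type MaybeNum = number | null;

// R-style addition: NA in either operand yields NA.
function addNA(a: MaybeNum, b: MaybeNum): MaybeNum {
  return a === null || b === null ? null : a + b;
}

// NA-ness is per element: the same vector holds both real values and NA.
const xs: MaybeNum[] = [1, null, 3];
const shifted: MaybeNum[] = xs.map((v) => addNA(v, 1)); // [2, null, 4]
```

Note that the element type MaybeNum says nothing about which positions are NA, which is the tracking problem the rest of this proposal has to confront.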
Proposal: Phased Approach
Phase 1: Document Current Behavior + na.rm Lint
Effort: Low | Value: Medium | Priority: Do now
No type-system changes. Two concrete deliverables:
- Spec update: Add a "Special Values" section to infr-spec.md formalizing that NA, NaN, Inf are valid literals inferred to their base types.
- na.rm diagnostic: In strict mode, warn when calling aggregate functions (mean(), sum(), max(), min(), etc.) without na.rm = TRUE on data that hasn't been explicitly filtered. This is a simple heuristic lint (no new types needed), and it catches one of the most common classes of R bugs.
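One possible shape for the na.rm heuristic, sketched in TypeScript purely for illustration. The Call record, the inputFiltered flag, and the naRmWarning function are all hypothetical: this proposal does not describe Infr's AST or how "explicitly filtered" would be detected, so the sketch assumes an earlier pass has already made that decision.

```typescript
// Hypothetical, simplified view of a function call in the checker.
interface Call {
  fn: string;                   // callee name, e.g. "mean"
  args: Record<string, string>; // named arguments as source text
  inputFiltered: boolean;       // assumption: set by an earlier pass
}

// Aggregates named in the proposal; a real list would be longer.
const AGGREGATES = new Set(["mean", "sum", "max", "min"]);

// Returns a warning message, or null when the call looks fine.
function naRmWarning(call: Call): string | null {
  if (!AGGREGATES.has(call.fn)) return null;
  // Only the literal TRUE is recognized in this sketch.
  if (call.args["na.rm"] === "TRUE") return null;
  if (call.inputFiltered) return null;
  return `${call.fn}() called without na.rm = TRUE on possibly-NA data`;
}
```

Because the check is purely syntactic plus one boolean flag, it needs no type-system changes, which matches the "no new types needed" framing of this phase.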
Phase 2A: is.na() / is.nan() Type Narrowing
Effort: Medium | Value: Medium | Priority: Next
Extend the existing narrowing infrastructure (which handles is.null() today) to recognize is.na(), is.nan(), is.finite(), and is.infinite() as type guards. Initially these would serve as documentation and intent markers without changing inferred types, since NA isn't a separate type yet.
if (!is.na(x)) {
# Checker recognizes this branch as NA-safe
}
This lays the groundwork for Phase 2B by establishing the control-flow patterns.
Phase 2B: strictNaChecks — Explicit NA Types
Effort: High | Value: High | Priority: Design carefully, implement later
Introduce an na type modifier, analogous to how NULL works today. Under a strictNaChecks flag (likely a strictness level in infr.toml):
- NA becomes its own type rather than collapsing into numeric.
- Vectors that might contain NA are typed as na | numeric (paralleling T | NULL).
- is.na() narrows na | numeric → numeric in the false branch.
- c(1, NA, 3) infers na | numeric instead of numeric.
x: na | numeric <- c(1, NA, 3)
if (!is.na(x)) {
x + 1 # OK: x is numeric here
}
x + 1 # Warning: x might be NA
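For comparison, the same narrowing pattern can already be written in TypeScript with a user-defined type guard. This is a scalar-level analogy only, not a description of how Infr would implement it: the NA symbol, the Na type, and the isNa and increment functions are hypothetical stand-ins.

```typescript
// Hypothetical stand-in for Infr's proposed `na` type.
const NA: unique symbol = Symbol("NA");
type Na = typeof NA;

// User-defined type guard playing the role of is.na() on a scalar.
function isNa(v: number | Na): v is Na {
  return v === NA;
}

function increment(x: number | Na): number | Na {
  if (!isNa(x)) {
    // x is narrowed to number here, so arithmetic type-checks.
    return x + 1;
  }
  // Propagate NA, mirroring R's NA + 1.
  return NA;
}
```

Writing x + 1 outside the guard would be rejected by the TypeScript compiler, which is the analogue of the warning on the last line of the Infr example.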
Open design questions:
- Element-level vs variable-level: In R, NA is a property of individual vector elements, not the variable. Should na | numeric mean "this vector contains at least one NA" or "this scalar might be NA"? The former is more accurate but much harder to track.
- Propagation rules: 1 + NA → NA. Should the checker model this? It would require tracking NA-ness through every arithmetic/comparison operator.
- Ergonomics: If every unfiltered data-frame column becomes na | T, annotation burden could be heavy. TypeScript mitigated this with ! (non-null assertion); Infr might need something similar for NA.
- Interaction with na.rm: Functions like mean(x, na.rm = TRUE) should strip na from their return type. This requires literal-value overload resolution (dispatching on na.rm = TRUE vs FALSE).
Phase 3: NA-Aware Function Signatures
Effort: Very High | Value: High | Priority: Future / experimental
If Phase 2B is implemented, declaration files could express NA behavior through overloads:
mean(x: na | numeric, na.rm: FALSE) -> na | numeric
mean(x: na | numeric, na.rm: TRUE) -> numeric
mean(x: numeric) -> numeric
This would enable the checker to trace NA-ness through data pipelines and warn only where it matters — a powerful capability, but one that requires significant investment in overload resolution and propagation tracking.
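TypeScript's overloads on literal types give a feel for what this kind of dispatch looks like in practice. The sketch below is an analogy under stated assumptions, not Infr syntax: the NA symbol, the Na type, and the naRm parameter name are hypothetical, and the implementation is a toy.

```typescript
// Hypothetical stand-in for Infr's proposed `na` type.
const NA: unique symbol = Symbol("NA");
type Na = typeof NA;

// Overloads mirror the three signatures above: the literal type of
// naRm selects whether Na can appear in the return type.
function mean(x: number[]): number;
function mean(x: (number | Na)[], naRm: true): number;
function mean(x: (number | Na)[], naRm?: false): number | Na;
function mean(x: (number | Na)[], naRm: boolean = false): number | Na {
  const present = x.filter((v): v is number => v !== NA);
  // Without naRm, a single NA element makes the whole result NA.
  if (!naRm && present.length < x.length) return NA;
  return present.reduce((a, b) => a + b, 0) / present.length;
}

const xs: (number | Na)[] = [1, NA, 3];
const cleaned: number = mean(xs, true); // typed as number
const raw: number | Na = mean(xs);      // typed as number | Na
```

The dispatch only works when the argument is a literal true or false; passing a boolean variable matches none of the overloads, which hints at the ergonomic cost this phase would carry.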
Summary
| Phase | Effort | Value | Depends On |
|---|---|---|---|
| Phase 1: Spec + na.rm lint | Low | Medium | Nothing |
| Phase 2A: Narrowing as intent markers | Medium | Medium | Phase 1 |
| Phase 2B: strictNaChecks with na type | High | High | Phase 2A |
| Phase 3: NA-aware function signatures | Very High | High | Phase 2B |
The key insight from TypeScript's experience: strictNullChecks was transformative but took years to design, ship, and migrate the ecosystem. Infr's NULL handling already mirrors it. strictNaChecks would be breaking genuinely new ground — worth pursuing, but deserving of careful, incremental design.