Proposal: Handling Special Values in Infr

Status: Draft / Future Consideration

Date: 2026-03-21

R's Special Values

R has a rich set of special values, each with distinct semantics:

| Value | R type | Meaning |
|---|---|---|
| NA | logical | Generic missing value |
| NA_real_ | double | Missing numeric |
| NA_integer_ | integer | Missing integer |
| NA_character_ | character | Missing character |
| NA_complex_ | complex | Missing complex |
| NaN | double | Not a Number (e.g., 0/0) |
| Inf / -Inf | double | Positive/negative infinity |
| NULL | NULL | Absence of a value |

Current State in Infr

All special values are fully supported as literals — they are lexed, parsed, type-inferred, and transpiled correctly. The type checker infers them to their base types:

  • NA, NA_real_ → numeric
  • NA_integer_ → integer
  • NA_character_ → character
  • NA_complex_ → complex
  • NaN, Inf → numeric
  • NULL → null

NULL has first-class type support: nullable types (T? as shorthand for T | NULL) and is.null() type narrowing in conditionals both work today.

NA, NaN, and Inf have no special type-level representation — they are simply members of their base type.
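In checker terms, the literal-to-type mapping above amounts to a lookup table. The TypeScript sketch below is illustrative only: `InfrType`, `inferLiteral`, and `desugarNullable` are hypothetical names, not Infr's actual internals, and handling `T?` as a string rewrite is a simplification of what a real parser would do.

```typescript
// Hypothetical sketch; names and representation are illustrative, not Infr's internals.
type InfrType = "numeric" | "integer" | "character" | "complex" | "null";

// Special-value literals and the base types the checker currently infers for them.
// Everything except NULL collapses into an ordinary base type.
const SPECIAL_LITERALS = new Map<string, InfrType>([
  ["NA", "numeric"],
  ["NA_real_", "numeric"],
  ["NA_integer_", "integer"],
  ["NA_character_", "character"],
  ["NA_complex_", "complex"],
  ["NaN", "numeric"],
  ["Inf", "numeric"],
  ["NULL", "null"],
]);

// Returns undefined for tokens that are not special-value literals.
function inferLiteral(token: string): InfrType | undefined {
  return SPECIAL_LITERALS.get(token);
}

// Nullable shorthand: `T?` is sugar for the union `T | NULL`.
function desugarNullable(typeText: string): string {
  return typeText.endsWith("?") ? `${typeText.slice(0, -1)} | NULL` : typeText;
}
```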

Precedent: TypeScript and JavaScript

TypeScript faced a similar problem with JavaScript's null and undefined. The parallels are instructive:

| R | JavaScript | TypeScript solution |
|---|---|---|
| NULL | null | T \| null union type |
| NA | undefined (loosely) | T \| undefined union type |
| is.null() | === null | Control-flow narrowing |
| is.na() | === undefined | Control-flow narrowing |
| NaN, Inf | NaN, Infinity | No special type treatment |

What TypeScript did

  1. strictNullChecks (TS 2.0, 2016) — Before this flag, null and undefined were assignable to every type. With it on, they became distinct types requiring explicit unions (string | null). This is widely considered TypeScript's most impactful strictness flag.

  2. Control-flow narrowing: if (x !== null) narrows string | null → string. No new syntax was needed; TypeScript recognized existing JS idioms as type guards.

  3. NaN and Infinity got no special types: they remain number. Number.isNaN() returns a plain boolean rather than acting as a type predicate, so it does not narrow. TypeScript simply does not model these values at the type level.
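Points 2 and 3 look like this in plain TypeScript, compiled with `"strict": true` (which enables strictNullChecks). The functions `nameLength` and `safeRatio` are illustrative examples, not taken from any library.

```typescript
// Compile with "strict": true (which includes strictNullChecks).

// null must be part of the declared type before a caller may pass it.
function nameLength(name: string | null): number {
  if (name !== null) {
    return name.length; // narrowed to string, so .length type-checks
  }
  return 0; // narrowed to null
}

// NaN and Infinity are just numbers; Number.isNaN is not a type guard.
function safeRatio(a: number, b: number): number {
  const r = a / b; // 0/0 is NaN, 1/0 is Infinity
  if (Number.isNaN(r)) {
    return 0; // r is still typed number here; nothing was narrowed
  }
  return r; // may still be Infinity
}
```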

Key difference: NA is harder than undefined

TypeScript's approach maps well onto Infr's NULL handling (already implemented). But NA is a fundamentally different beast:

  • undefined doesn't propagate: 1 + undefined produces NaN, a different value with a different type (number).
  • NA propagates silently: 1 + NA produces NA. The result keeps the base type and stays missing, so NA infects everything computed from it.
  • undefined is a property of the variable — a variable is either undefined or it isn't.
  • NA is a property of individual vector elements — a numeric vector can contain a mix of real values and NAs. This is an element-level concept with no JS equivalent.
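To make the element-level distinction concrete, here is a small TypeScript model of an R numeric vector in which each element is either a number or NA. This only mimics R's semantics (R itself encodes NA_real_ as a special NaN bit pattern); the `NA` sentinel, `RNumeric`, `addScalar`, and `countNA` are hypothetical names for illustration.

```typescript
// Hypothetical model of R semantics, not how R stores NA internally.
// NA is a unique sentinel, so each vector element is either a number or NA.
const NA: unique symbol = Symbol("NA");
type NA = typeof NA;
type RNumeric = Array<number | NA>;

// x + k, element by element: an NA operand yields NA for that element only.
function addScalar(x: RNumeric, k: number | NA): RNumeric {
  return x.map((v): number | NA =>
    typeof v === "number" && typeof k === "number" ? v + k : NA,
  );
}

// NA-ness is a per-element property, so it has to be counted, not just flagged.
function countNA(x: RNumeric): number {
  return x.filter((v) => v === NA).length;
}
```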

This means full NA tracking is genuinely novel territory beyond what TypeScript attempted.

Proposal: Phased Approach

Phase 1: Document Current Behavior + na.rm Lint

Effort: Low | Value: Medium | Priority: Do now

No type-system changes. Two concrete deliverables:

  1. Spec update — Add a "Special Values" section to infr-spec.md formalizing that NA, NaN, Inf are valid literals inferred to their base types.

  2. na.rm diagnostic: in strict mode, warn when calling aggregate functions (mean(), sum(), max(), min(), etc.) without na.rm = TRUE on data that hasn't been explicitly filtered for NA. This is a simple heuristic lint that needs no new types, and it targets a common class of R bugs: a single NA silently turning an aggregate's result into NA.
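A first cut of the lint can be purely syntactic. The sketch below assumes a simplified call representation (`Arg`, `Call`) and a `knownNaFree` set produced by some earlier, unspecified analysis of which variables have already been filtered; these, and the `naRmDiagnostic` name, are hypothetical. The aggregate list covers only the four functions named above.

```typescript
// Hypothetical, simplified call representation; a real Infr AST will differ.
interface Arg {
  name?: string; // undefined for positional arguments
  value: string; // source text of the argument expression
}
interface Call {
  fn: string;
  args: Arg[];
}

// The aggregates named in the proposal; the real list would be longer.
const AGGREGATES = new Set(["mean", "sum", "max", "min"]);

function naRmDiagnostic(call: Call, knownNaFree: ReadonlySet<string>): string | null {
  if (!AGGREGATES.has(call.fn)) return null;
  const naRm = call.args.find((a) => a.name === "na.rm");
  // Only the literal TRUE counts; T is an ordinary, reassignable variable in R.
  if (naRm !== undefined && naRm.value === "TRUE") return null;
  // Assumption: the first positional argument is the data being aggregated.
  const data = call.args.find((a) => a.name === undefined);
  if (data !== undefined && knownNaFree.has(data.value)) return null;
  return `${call.fn}() called without na.rm = TRUE on data that may contain NA`;
}
```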

Phase 2A: is.na() / is.nan() Type Narrowing

Effort: Medium | Value: Medium | Priority: Next

Extend the existing narrowing infrastructure (which handles is.null() today) to recognize is.na(), is.nan(), is.finite(), and is.infinite() as type guards. Initially these would serve as documentation and intent markers without changing inferred types, since NA isn't a separate type yet.

```
if (!is.na(x)) {
  # Checker recognizes this branch as NA-safe
}
```

This lays the groundwork for Phase 2B by establishing the control-flow patterns.
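Recognizing these guards is mostly syntactic: strip any `!` operators, then check for a single-argument call to one of the guard predicates. In the sketch below, the `Expr` shape, `GUARDS`, `Guard`, and `recognizeGuard` are hypothetical; in Phase 2A the returned guard would only be recorded as an intent marker, not used to change the variable's type.

```typescript
// Hypothetical condition AST: just enough to express `is.na(x)` and `!is.na(x)`.
type Expr =
  | { kind: "var"; name: string }
  | { kind: "call"; fn: string; args: Expr[] }
  | { kind: "not"; operand: Expr };

interface Guard {
  variable: string;
  predicate: string;
  negated: boolean;
}

// is.null() is already a guard today; Phase 2A would add the other four.
const GUARDS = new Set(["is.null", "is.na", "is.nan", "is.finite", "is.infinite"]);

function recognizeGuard(cond: Expr): Guard | null {
  let negated = false;
  let e: Expr = cond;
  // Peel off any number of `!` operators, tracking the overall polarity.
  while (e.kind === "not") {
    negated = !negated;
    e = e.operand;
  }
  if (e.kind !== "call" || !GUARDS.has(e.fn) || e.args.length !== 1) return null;
  const arg = e.args[0];
  if (arg === undefined || arg.kind !== "var") return null;
  return { variable: arg.name, predicate: e.fn, negated };
}
```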

Phase 2B: strictNaChecks — Explicit NA Types

Effort: High | Value: High | Priority: Design carefully, implement later

Introduce an na type modifier, analogous to how NULL works today. Under a strictNaChecks flag (likely a strictness level in infr.toml):

  • NA becomes its own type rather than collapsing into numeric.
  • Vectors that might contain NA are typed as na | numeric (paralleling T | NULL).
  • is.na() narrows na | numeric → numeric in the false branch.
  • c(1, NA, 3) infers na | numeric instead of numeric.
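```
x: na | numeric <- c(1, NA, 3)

# Caveat: x has length 3, and R >= 4.2 errors on an if() condition of length > 1;
# this is the element-level vs variable-level question raised below.
if (!is.na(x)) {
  x + 1  # OK: x is numeric here
}

x + 1  # Warning: x might be NA
```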
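Under strictNaChecks, the example above needs three checker operations: inferring a type for c(...), narrowing on is.na(), and deciding whether x + 1 deserves a warning. The sketch below models a union type as a set of member names; `Union`, `inferC`, `narrowIsNa`, `mightBeNa`, and `formatUnion` are hypothetical. It leaves the true branch unnarrowed because is.na() also returns TRUE for NaN, which is an ordinary numeric value.

```typescript
// Hypothetical: a union type modeled as a set of member names, e.g. {"na", "numeric"}.
type Union = ReadonlySet<string>;

// c(...) gets the union of its element types, so c(1, NA, 3) is na | numeric.
function inferC(elementTypes: string[]): Union {
  return new Set(elementTypes);
}

// Type of x inside a branch of `if (is.na(x))`, given which way the test went.
function narrowIsNa(t: Union, isNaResult: boolean): Union {
  if (!isNaResult) {
    // False branch: NA has been ruled out.
    return new Set(Array.from(t).filter((m) => m !== "na"));
  }
  // True branch: keep t unchanged, since is.na(NaN) is also TRUE and NaN is numeric.
  return t;
}

// Operations such as x + 1 warn when the operand may still be NA.
function mightBeNa(t: Union): boolean {
  return t.has("na");
}

// Render as source-like text, e.g. "na | numeric".
function formatUnion(t: Union): string {
  return Array.from(t).sort().join(" | ");
}
```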

Open design questions:

  • Element-level vs variable-level: In R, NA is a property of individual vector elements, not the variable. Should na | numeric mean "this vector contains at least one NA" or "this scalar might be NA"? The former is more accurate but much harder to track.
  • Propagation rules: 1 + NA → NA. Should the checker model this? It would require tracking NA-ness through every arithmetic and comparison operator.
  • Ergonomics: If every unfiltered data-frame column becomes na | T, annotation burden could be heavy. TypeScript mitigated this with ! (non-null assertion); Infr might need something similar for NA.
  • Interaction with na.rm: Functions like mean(x, na.rm = TRUE) should strip na from their return type. This requires literal-value overload resolution (dispatching on na.rm = TRUE vs FALSE).
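If the checker does model propagation, the rule can be uniform across operators: the operator determines the base result type, and na is added whenever either operand may be NA. The operator table and `binaryResult` below are a hypothetical sketch that ignores integer/numeric distinctions; comparisons are included because in R, NA > 1 is also NA.

```typescript
// Hypothetical: union types as sets of member names.
type Union = ReadonlySet<string>;

// Base result type per operator (a small, illustrative subset).
const RESULT_BASE = new Map<string, string>([
  ["+", "numeric"], ["-", "numeric"], ["*", "numeric"], ["/", "numeric"],
  ["<", "logical"], [">", "logical"], ["==", "logical"],
]);

function binaryResult(op: string, lhs: Union, rhs: Union): Union {
  const base = RESULT_BASE.get(op);
  if (base === undefined) throw new Error(`unsupported operator: ${op}`);
  const result = new Set<string>([base]);
  // NA is infectious: one possibly-NA operand makes the result possibly NA.
  if (lhs.has("na") || rhs.has("na")) result.add("na");
  return result;
}
```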

Phase 3: NA-Aware Function Signatures

Effort: Very High | Value: High | Priority: Future / experimental

If Phase 2B is implemented, declaration files could express NA behavior through overloads:

```
mean(x: na | numeric, na.rm: FALSE) -> na | numeric
mean(x: na | numeric, na.rm: TRUE) -> numeric
mean(x: numeric) -> numeric
```

This would enable the checker to trace NA-ness through data pipelines and warn only where it matters — a powerful capability, but one that requires significant investment in overload resolution and propagation tracking.
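Resolving the three mean() overloads listed above comes down to matching the argument's type and the literal value of na.rm. The sketch below is hypothetical (`Overload`, `MEAN_OVERLOADS`, `resolveMean`, and `formatUnion` are illustrative names) and makes two assumptions worth stating: an omitted na.rm is treated as FALSE, matching R's default for mean(), and the NA-free signature is tried first. With first-match resolution (the rule TypeScript uses for overloads), leaving it last would let the na.rm = FALSE overload capture plain numeric inputs and return na | numeric.

```typescript
// Hypothetical: union types as sets of member names, as in a Phase 2B checker.
type Union = ReadonlySet<string>;

function union(...members: string[]): Union {
  return new Set(members);
}

interface Overload {
  accepts: Union; // every member of x's type must appear here
  naRm?: boolean; // literal na.rm this overload requires; undefined means any
  returns: Union;
}

// The three signatures from the proposal, with the NA-free one moved first.
const MEAN_OVERLOADS: Overload[] = [
  { accepts: union("numeric"), returns: union("numeric") },
  { accepts: union("na", "numeric"), naRm: true, returns: union("numeric") },
  { accepts: union("na", "numeric"), naRm: false, returns: union("na", "numeric") },
];

// First matching overload wins; an omitted na.rm defaults to FALSE.
function resolveMean(x: Union, naRm = false): Union | null {
  for (const o of MEAN_OVERLOADS) {
    const typeOk = Array.from(x).every((m) => o.accepts.has(m));
    const naRmOk = o.naRm === undefined || o.naRm === naRm;
    if (typeOk && naRmOk) return o.returns;
  }
  return null; // no overload applies: a type error at the call site
}

function formatUnion(t: Union | null): string {
  return t === null ? "(no matching overload)" : Array.from(t).sort().join(" | ");
}
```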

Summary

| Phase | Effort | Value | Depends on |
|---|---|---|---|
| Phase 1: Spec + na.rm lint | Low | Medium | Nothing |
| Phase 2A: Narrowing as intent markers | Medium | Medium | Phase 1 |
| Phase 2B: strictNaChecks with na type | High | High | Phase 2A |
| Phase 3: NA-aware function signatures | Very High | High | Phase 2B |

The key insight from TypeScript's experience: strictNullChecks was transformative but took years to design, ship, and migrate the ecosystem. Infr's NULL handling already mirrors it. strictNaChecks would be breaking genuinely new ground — worth pursuing, but deserving of careful, incremental design.