
Type System

Part of the Data Contracts Technical Specification.

DQX defines its own contract type system designed for simplicity and broad data quality coverage. Rather than exposing raw storage types, the contract type system uses 12 human-readable types (int, float, bool, string, bytes, date, time, timestamp, decimal, list, struct, map) that cover the vast majority of data quality use cases. Each contract type accepts a range of compatible storage representations; the type reference table below and the compatibility reference at the end of this document map contract types to the storage formats they validate against.

Design Philosophy

The type system prioritizes validation flexibility over exact matching. Types accept compatible variations (e.g., int accepts every width from int8 to int64 and from uint8 to uint64), take parameters only when they are semantically necessary (e.g., an optional timezone for timestamp), and default to the simplest form.


Type Reference Table

| Type | YAML Format | Validates Against | Notes |
|------|-------------|-------------------|-------|
| Primitive types | | | |
| int | type: int | int8, int16, int32, int64, uint8, uint16, uint32, uint64 | Any integer width, signed or unsigned |
| float | type: float | float32, float64 | float16 is not accepted |
| bool | type: bool | bool | Exact match |
| string | type: string | string, utf8, large_string, large_utf8 | UTF-8 text |
| bytes | type: bytes | binary, large_binary | Binary data |
| Temporal types | | | |
| date | type: date | date32, date64 | Any date representation |
| time | type: time | time32(s/ms), time64(us/ns) | Any time representation |
| timestamp | type: timestamp, type: {kind: timestamp}, or type: {kind: timestamp, tz: "..."} | timestamp(any unit, tz per form) | Simple form accepts any timezone, including none; object form without tz accepts only timezone-naive columns (tz=None); explicit tz requires an exact match |
| Decimal type | | | |
| decimal | type: decimal | decimal128(any), decimal256(any) | Any precision/scale |
| Complex types | | | |
| list | type: {kind: list, value_type: T} | list<T>, large_list<T> | Recursive validation of the element type |
| struct | type: {kind: struct, fields: [...]} | struct<fields> | Recursive validation of field structure |
| map | type: {kind: map, key_type: K, value_type: V} | map<K, V> | Recursive validation of key and value types |
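The two YAML shapes in the table (a bare string for the simple form, a mapping with kind for the object form) can be normalized into one representation before validation. The following is a minimal Python sketch under stated assumptions: normalize_type, the ANY_TZ sentinel, and the output dict layout are hypothetical, and the input is assumed to be the already-parsed YAML value of a column's type key.

```python
from typing import Union

# Hypothetical sketch: normalize_type and ANY_TZ are illustrative names, not DQX's API.
# Input is assumed to be the parsed YAML value of a column's `type:` key.
SIMPLE_KINDS = {"int", "float", "bool", "string", "bytes",
                "date", "time", "timestamp", "decimal"}
ANY_TZ = object()  # sentinel: the simple form `type: timestamp` accepts any timezone


def normalize_type(spec: Union[str, dict]) -> dict:
    """Return a canonical dict for a simple-form (str) or object-form (dict) type."""
    if isinstance(spec, str):
        if spec not in SIMPLE_KINDS:
            # list, struct and map need parameters, so they only have an object form
            raise ValueError(f"type {spec!r} requires the object form")
        return {"kind": spec, "tz": ANY_TZ} if spec == "timestamp" else {"kind": spec}

    kind = spec["kind"]
    if kind == "timestamp":
        # Omitting tz in the object form means timezone-naive (tz=None)
        return {"kind": kind, "tz": spec.get("tz")}
    if kind == "list":
        return {"kind": kind, "value_type": normalize_type(spec["value_type"])}
    if kind == "map":
        return {"kind": kind,
                "key_type": normalize_type(spec["key_type"]),
                "value_type": normalize_type(spec["value_type"])}
    if kind == "struct":
        return {"kind": kind,
                "fields": [(f["name"], normalize_type(f["type"])) for f in spec["fields"]]}
    raise ValueError(f"object form not covered by this sketch: {kind!r}")
```

Using a sentinel for the simple-form timestamp, rather than None, keeps "any timezone" distinct from "timezone-naive", which is exactly the distinction the timestamp forms rely on.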

Primitive Types

The five primitive types use the simple form, and each accepts all of its compatible storage representations:

# Integer - accepts any width (int8-int64, uint8-uint64)
- name: user_id
  type: int
  nullable: false
  description: "User ID"

# Float - accepts float32 or float64
- name: price
  type: float
  description: "Product price"

# Boolean
- name: is_active
  type: bool
  nullable: false
  description: "Active flag"

# String (UTF-8 text)
- name: name
  type: string
  nullable: false
  description: "User name"

# Bytes (binary data)
- name: thumbnail
  type: bytes
  nullable: true
  description: "Image thumbnail bytes"

The validator checks the semantic type (for example, "is this column an integer?"), not the exact storage width or encoding.
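A minimal sketch of this semantic check for the five primitive types follows. It assumes storage types are modeled as plain strings named as in this document (int8, large_string, and so on) rather than as pyarrow DataType objects; PRIMITIVE_COMPAT and primitive_matches are hypothetical names.

```python
# Hypothetical sketch: storage types are modeled as strings named as in this document,
# not as pyarrow DataType objects.
PRIMITIVE_COMPAT = {
    "int": {"int8", "int16", "int32", "int64",
            "uint8", "uint16", "uint32", "uint64"},
    "float": {"float32", "float64"},  # float16 deliberately excluded
    "bool": {"bool"},
    "string": {"string", "utf8", "large_string", "large_utf8"},
    "bytes": {"binary", "large_binary"},
}


def primitive_matches(contract_type: str, storage_type: str) -> bool:
    """True if storage_type is an accepted representation of contract_type."""
    return storage_type in PRIMITIVE_COMPAT[contract_type]
```

Because each check is a set membership test, accepting another storage representation means adding one entry to the table; contracts that say type: int do not change.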


Temporal Types

Store timestamps in UTC for consistency. The type system offers three levels of timezone strictness.

# Date - accepts date32 or date64
- name: birth_date
  type: date
  nullable: false
  description: "Date of birth"

# Timestamp - three ways to use
# 1. Simple form (flexible - accepts any timezone or no timezone)
- name: event_time
  type: timestamp
  description: "Event timestamp"

# 2. Object form without tz (accepts only timezone-naive columns)
- name: created_at
  type:
    kind: timestamp
    # tz defaults to None (timezone-naive) when omitted
  nullable: false
  description: "Creation timestamp (timezone-naive)"

# 3. Object form with explicit timezone (exact match)
- name: created_at_ny
  type:
    kind: timestamp
    tz: "America/New_York"
  nullable: false
  description: "Creation timestamp in New York time"

# Time - accepts time32 or time64 with any unit
- name: daily_event_time
  type: time
  nullable: false
  description: "Time of day when event occurred"

Use the simple form when any timezone (or none) is acceptable, the object form without tz to require timezone-naive timestamps, and an explicit tz (e.g. "UTC") when the timezone must match exactly.
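The three timezone levels reduce to a small decision rule. This is a hypothetical Python sketch, not DQX's implementation: ANY_TZ is an assumed sentinel standing for the simple form, None stands for the object form without tz, and timezone names are compared as exact strings (an assumption; the text says only that the match is exact).

```python
# Hypothetical sketch; ANY_TZ and timestamp_matches are illustrative names.
ANY_TZ = object()  # stands for the simple form `type: timestamp`
UNITS = {"s", "ms", "us", "ns"}


def timestamp_matches(contract_tz, actual_unit: str, actual_tz) -> bool:
    """Check a storage timestamp(unit, tz) against a contract timezone setting.

    contract_tz is ANY_TZ (simple form), None (object form without tz),
    or a timezone name such as "UTC" (object form with explicit tz).
    """
    if actual_unit not in UNITS:
        return False  # not an Arrow timestamp unit; all four units are accepted by every form
    if contract_tz is ANY_TZ:
        return True  # level 1: any timezone, or none
    return actual_tz == contract_tz  # level 2: must be naive; level 3: exact name match
```

Note that an exact string comparison treats aliases such as "US/Eastern" and "America/New_York" as different timezones.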


Decimal Type

# Decimal - accepts any precision/scale
- name: amount
  type: decimal
  nullable: false
  description: "Transaction amount"

# Validates: decimal128(10, 2), decimal128(18, 4), decimal256(38, 6), etc.

The validator confirms the column is a decimal type regardless of precision or scale. Future versions may support precision constraints for strict financial validation.
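Because precision and scale are ignored, the decimal check only has to recognize the decimal family. The sketch below assumes storage types are modeled as strings such as "decimal128(10, 2)"; the regular expression and decimal_matches are hypothetical.

```python
import re

# Hypothetical sketch: decimal storage types are modeled as strings like "decimal128(10, 2)".
# Precision and scale are captured by the groups but intentionally not checked.
_DECIMAL_RE = re.compile(r"^decimal(128|256)\((\d+),\s*(\d+)\)$")


def decimal_matches(storage_type: str) -> bool:
    """True for any decimal128/decimal256 storage type, whatever its precision/scale."""
    return _DECIMAL_RE.match(storage_type) is not None
```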


Complex Types

List Type

# Simple list (primitive element type)
- name: tags
  type:
    kind: list
    value_type: string
  nullable: true
  description: "Product tags"

# List with integer elements (any integer width accepted)
- name: item_ids
  type:
    kind: list
    value_type: int
  nullable: false
  description: "Item IDs"

# List with complex element type (struct)
- name: events
  type:
    kind: list
    value_type:
      kind: struct
      fields:
        - name: timestamp
          type:
            kind: timestamp
          description: "Event timestamp"

        - name: event_type
          type: string
          description: "Event type"

        - name: value
          type: float
          description: "Event value"
  nullable: false
  description: "Event history"

Struct Type

# Simple struct (flat)
- name: location
  type:
    kind: struct
    fields:
      - name: latitude
        type: float
        description: "Latitude coordinate"

      - name: longitude
        type: float
        description: "Longitude coordinate"

      - name: label
        type: string
        description: "Location label"
  nullable: true
  description: "Geographic location"

# Nested struct
- name: address
  type:
    kind: struct
    fields:
      - name: street
        type: string
        description: "Street address"

      - name: city
        type: string
        description: "City name"

      - name: coordinates
        type:
          kind: struct
          fields:
            - name: lat
              type: float
              description: "Latitude"

            - name: lon
              type: float
              description: "Longitude"
        description: "GPS coordinates"
  nullable: false
  description: "Complete address"

Note: Nested struct fields and list elements have no nullable flag. Schema validation enforces nullability only on top-level columns; nullability of nested fields is not validated.
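The following sketch illustrates top-level-only enforcement, assuming the check runs over column values (the text does not say whether DQX inspects values or schema metadata). A column is modeled as a Python list of row values, with struct rows as dicts; check_top_level_nulls is a hypothetical name.

```python
from typing import Any, List


def check_top_level_nulls(name: str, values: List[Any], nullable: bool) -> List[str]:
    """Return violation messages for one column; nulls inside rows are never inspected.

    Hypothetical sketch: values holds one Python object per row
    (dicts for struct rows, lists for list rows).
    """
    if nullable:
        return []
    # Only a whole-row None counts; {"city": None} is a non-null struct row
    null_rows = [i for i, v in enumerate(values) if v is None]
    if null_rows:
        return [f"Column '{name}' is not nullable but has nulls at rows {null_rows}"]
    return []
```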

Map Type

# Simple map (string keys and values)
- name: properties
  type:
    kind: map
    key_type: string
    value_type: string
  nullable: true
  description: "Custom properties"

# Map with integer keys (any integer type accepted)
- name: item_quantities
  type:
    kind: map
    key_type: int
    value_type: int
  nullable: false
  description: "Item ID to quantity mapping"

# Map with complex value type (struct)
- name: metrics
  type:
    kind: map
    key_type: string
    value_type:
      kind: struct
      fields:
        - name: value
          type: float
          description: "Metric value"

        - name: unit
          type: string
          description: "Unit of measurement"
  nullable: false
  description: "Performance metrics"

Complex types nest arbitrarily (e.g., a list of structs containing maps), and the validator checks element, field, key, and value types recursively at every level.
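Recursive validation can be sketched as a function that walks the contract and the storage type in parallel. Assumptions, all hypothetical: contract types use the parsed-YAML shape shown above, storage types are strings for leaves and tuples for nested types, only int, float, and string leaves are included for brevity, and struct fields must match by name with no extras (the text does not specify field-order or extra-field rules).

```python
from typing import Union

# Hypothetical model of storage types: strings for leaves, tuples for nested types,
# e.g. ("list", "int32"), ("large_list", "string"), ("map", "utf8", "int64"),
# ("struct", (("lat", "float64"), ("lon", "float64"))).
LEAF_COMPAT = {
    "int": {"int8", "int16", "int32", "int64", "uint8", "uint16", "uint32", "uint64"},
    "float": {"float32", "float64"},
    "string": {"string", "utf8", "large_string", "large_utf8"},
}


def matches(contract: Union[str, dict], storage) -> bool:
    """Recursively check a storage type against a contract type."""
    if isinstance(contract, str):
        # Leaf: set-membership check against the accepted storage names
        return isinstance(storage, str) and storage in LEAF_COMPAT.get(contract, set())
    if not isinstance(storage, tuple):
        return False

    kind = contract["kind"]
    if kind == "list":
        # Both 32-bit and 64-bit offset lists are accepted
        return (len(storage) == 2 and storage[0] in ("list", "large_list")
                and matches(contract["value_type"], storage[1]))
    if kind == "map":
        return (len(storage) == 3 and storage[0] == "map"
                and matches(contract["key_type"], storage[1])
                and matches(contract["value_type"], storage[2]))
    if kind == "struct":
        if len(storage) != 2 or storage[0] != "struct":
            return False
        actual = dict(storage[1])
        expected = contract["fields"]
        # Assumption: same field names (order ignored), each field matching recursively
        return (set(actual) == {f["name"] for f in expected}
                and all(matches(f["type"], actual[f["name"]]) for f in expected))
    return False
```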


Type Mismatch Errors

When a column's actual type does not match the contract, the validator raises a SchemaValidationError describing the column, the expected type, and the actual type. The following example shows a contract that declares user_id as int but receives a string column:

Contract declares: type: int
Actual column type: pa.string()
Error: SchemaValidationError: Column 'user_id' type mismatch: expected int (int8-int64, uint8-uint64), got string

The error message names the column, states the full set of accepted physical types, and identifies the actual type found. This lets engineers locate the source of the mismatch without inspecting the schema manually.
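The message format above can be reproduced with a small helper. SchemaValidationError is defined locally as a stand-in for DQX's exception of the same name, and the ACCEPTED table and type_mismatch function are hypothetical; only the message layout is taken from the example.

```python
# Hypothetical sketch: reproduces the documented message layout only.
class SchemaValidationError(Exception):
    """Local stand-in for DQX's exception of the same name."""


# Human-readable summaries of accepted storage types (illustrative subset)
ACCEPTED = {
    "int": "int8-int64, uint8-uint64",
    "float": "float32, float64",
}


def type_mismatch(column: str, expected: str, actual: str) -> SchemaValidationError:
    """Build (but do not raise) an error naming the column, expected and actual types."""
    accepted = ACCEPTED.get(expected)
    expected_text = f"{expected} ({accepted})" if accepted else expected
    return SchemaValidationError(
        f"Column '{column}' type mismatch: expected {expected_text}, got {actual}"
    )
```

A caller would write raise type_mismatch("user_id", "int", "string"), which produces the message shown in the example.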


Compatibility Reference

Integer Type Compatibility

Contract type int validates against:

  • pa.int8() — 8-bit signed integer (-128 to 127)
  • pa.int16() — 16-bit signed integer (-32,768 to 32,767)
  • pa.int32() — 32-bit signed integer (-2^31 to 2^31-1)
  • pa.int64() — 64-bit signed integer (-2^63 to 2^63-1)
  • pa.uint8() — 8-bit unsigned integer (0 to 255)
  • pa.uint16() — 16-bit unsigned integer (0 to 65,535)
  • pa.uint32() — 32-bit unsigned integer (0 to 2^32-1)
  • pa.uint64() — 64-bit unsigned integer (0 to 2^64-1)
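The ranges above follow from the bit width alone: an n-bit signed (two's-complement) integer spans -2^(n-1) to 2^(n-1)-1, and an unsigned one spans 0 to 2^n-1. A short Python check of that arithmetic:

```python
def int_range(bits: int, signed: bool) -> tuple:
    """Inclusive (min, max) for a two's-complement signed or an unsigned integer."""
    if signed:
        return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return (0, 2 ** bits - 1)


# Print the ranges for every width accepted by `type: int`
for bits in (8, 16, 32, 64):
    print(f"int{bits}: {int_range(bits, True)}, uint{bits}: {int_range(bits, False)}")
```

One consequence: uint64 values above 2^63-1 do not fit in int64, so type: int guarantees that a column holds integers, not that its values fit any particular width.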

Float Type Compatibility

Contract type float validates against:

  • pa.float32() — 32-bit single precision (IEEE 754)
  • pa.float64() — 64-bit double precision (IEEE 754)

Contract type float does not validate against pa.float16().

Date Type Compatibility

Contract type date validates against:

  • pa.date32() — 32-bit signed integer, days since UNIX epoch
  • pa.date64() — 64-bit signed integer, milliseconds since UNIX epoch
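Both date representations decode to the same calendar date; they differ only in unit. The sketch below uses Python's datetime module, and the helper names are hypothetical. It floors milliseconds to whole days, since the Arrow specification expects date64 values to be multiples of 86,400,000.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
MS_PER_DAY = 86_400_000


def from_date32(days: int) -> date:
    """date32 stores whole days since the UNIX epoch."""
    return EPOCH + timedelta(days=days)


def from_date64(ms: int) -> date:
    """date64 stores milliseconds since the UNIX epoch, normally whole days."""
    return EPOCH + timedelta(days=ms // MS_PER_DAY)
```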

Time Type Compatibility

Contract type time validates against:

  • pa.time32('s') — 32-bit signed integer, seconds since midnight
  • pa.time32('ms') — 32-bit signed integer, milliseconds since midnight
  • pa.time64('us') — 64-bit signed integer, microseconds since midnight
  • pa.time64('ns') — 64-bit signed integer, nanoseconds since midnight
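All four time variants count units since midnight, so converting any of them to a time of day only requires the unit's scale. A hypothetical helper using Python's datetime.time, which has microsecond resolution, so nanoseconds are truncated:

```python
from datetime import time

# Number of units per second for each Arrow time unit
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def to_time_of_day(value: int, unit: str) -> time:
    """Convert a time32/time64 value (count of `unit` since midnight) to datetime.time."""
    per_second = UNITS_PER_SECOND[unit]
    seconds, remainder = divmod(value, per_second)
    microseconds = remainder * 1_000_000 // per_second  # truncates sub-microsecond parts
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return time(hours, minutes, secs, microseconds)
```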

Timestamp Type Compatibility

Simple form (type: timestamp):

  • Validates against any pa.timestamp(unit, tz) regardless of unit or timezone

Object form without tz (type: {kind: timestamp}):

  • Validates unit flexibility (accepts s, ms, us, ns)
  • Requires a timezone-naive column (tz=None)

Object form with explicit timezone (type: {kind: timestamp, tz: "America/New_York"}):

  • Validates unit flexibility (accepts s, ms, us, ns)
  • Validates that the timezone exactly matches the specified value

Decimal Type Compatibility

Contract type decimal validates against:

  • pa.decimal128(precision, scale) — Any precision/scale combination
  • pa.decimal256(precision, scale) — Any precision/scale combination

String Type Compatibility

Contract type string validates against:

  • pa.string() — UTF-8 encoded variable-length string
  • pa.utf8() — alias for string
  • pa.large_string() — large UTF-8 string (64-bit offsets, common in DuckDB)
  • pa.large_utf8() — alias for large_string

List Type Compatibility

Contract type list validates against:

  • pa.list_(value_type) — standard list with 32-bit offsets
  • pa.large_list(value_type) — large list with 64-bit offsets