API

Public

class pydiverse.colspec.ColSpec

Base class for all column specifications.

The base classes here are just for code completion support when working with Collection objects that store actual data or table references. They are removed at runtime by a metaclass.

class pydiverse.colspec.Collection

Base class for all collections of tables with a predefined column specification.

A collection comprises a set of members which are collectively “consistent”, meaning that the collection ensures that invariants hold across members. This differs from dataframely schemas, which only ensure invariants within individual members.

In order to properly ensure that invariants hold up across members, members must have a “common primary key”, i.e. there must be an overlap of at least one primary key column across all members. Consequently, a collection is typically used to represent “semantic objects” which cannot be represented in a single table due to 1-N relationships that are managed in separate tables.
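The “common primary key” requirement can be pictured as a set intersection over the primary key columns of each member. The following is a minimal sketch in plain Python, not the library's implementation; `common_primary_key` and the member and column names are hypothetical.

```python
def common_primary_key(members: dict[str, list[str]]) -> list[str]:
    """Return the primary key columns shared by every member.

    `members` maps a (hypothetical) member name to its primary key columns.
    """
    if not members:
        return []
    key_lists = list(members.values())
    shared = set(key_lists[0]).intersection(*key_lists[1:])
    # Keep the column order of the first member for readability.
    return [col for col in key_lists[0] if col in shared]


# One invoice has many line items (a 1-N relationship), so both tables
# share `invoice_id` while the line items need an extra key column.
members = {
    "invoices": ["invoice_id"],
    "line_items": ["invoice_id", "position"],
}
shared = common_primary_key(members)
if not shared:
    raise ValueError("members have no common primary key")
```

An empty result would mean that rows of different members cannot be matched to the same semantic object, which is why the overlap must contain at least one column.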

A collection may only carry type annotations whose type is a ColSpec with a known column specification.

In addition, it may define filters (cf. filter()) and arbitrary methods.

A colspec.Collection can also be instantiated and filled with pydiverse.transform Table objects, pipedag Table objects, or pipedag task outputs that reference a table. This yields a fairly intuitive syntax:

c = MyCollection.build()
c.first_member = pipdag_task1()
c.second_member = pipdag_task2()
pipdag_task3(c)

Attention

Do NOT use this class in combination with from __future__ import annotations as it requires the proper schema definitions to ensure that the collection is implemented correctly.
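The warning follows from how Python treats postponed annotations: with from __future__ import annotations, annotations are stored as plain strings instead of the annotated objects, so code that inspects them cannot see the actual classes. The sketch below shows this effect with plain stand-in classes; `FirstSpec` and `MyCollection` are hypothetical and do not use pydiverse.colspec. It compiles the same source once without and once with the future flag.

```python
import __future__

SOURCE = '''
class FirstSpec:
    pass

class MyCollection:
    first_member: FirstSpec
'''


def inspect_annotation(postponed: bool):
    """Compile SOURCE with or without postponed annotations and return
    the annotation stored for `first_member` plus the FirstSpec class."""
    flags = __future__.annotations.compiler_flag if postponed else 0
    code = compile(SOURCE, "<demo>", "exec", flags=flags, dont_inherit=True)
    namespace = {"__name__": "demo"}
    exec(code, namespace)
    annotation = namespace["MyCollection"].__annotations__["first_member"]
    return annotation, namespace["FirstSpec"]


eager, spec_cls = inspect_annotation(postponed=False)
postponed, _ = inspect_annotation(postponed=True)
print(eager is spec_cls)   # the annotation is the class object itself
print(repr(postponed))     # the annotation is only a string naming the class
```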

class pydiverse.colspec.Filter(logic_fn: collections.abc.Callable[[C], pydiverse.colspec.optional_dependency.ColExpr])

Internal class representing logic for filtering members of a collection.

class pydiverse.colspec.Rule(expr: pydiverse.colspec.optional_dependency.ColExpr)

Internal class representing validation rules.

class pydiverse.colspec.Column(*, nullable: bool = True, primary_key: bool = False, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

Abstract base class for data frame column definitions.

This class is merely supposed to be used in ColSpec definitions.

class pydiverse.colspec.Any(*, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, pydiverse.colspec.columns.any.Any] | None = None)

A column that can contain any type.

class pydiverse.colspec.Bool(*, nullable: bool = True, primary_key: bool = False, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of booleans.

class pydiverse.colspec.Date(*, nullable: bool = True, primary_key: bool = False, min: datetime.date | None = None, min_exclusive: datetime.date | None = None, max: datetime.date | None = None, max_exclusive: datetime.date | None = None, resolution: str | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of dates (without time).

class pydiverse.colspec.Datetime(*, nullable: bool = True, primary_key: bool = False, min: datetime.datetime | None = None, min_exclusive: datetime.datetime | None = None, max: datetime.datetime | None = None, max_exclusive: datetime.datetime | None = None, resolution: str | None = None, time_zone: str | datetime.tzinfo | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of datetimes.

class pydiverse.colspec.Decimal(precision: int | None = None, scale: int | None = None, *, nullable: bool = True, primary_key: bool = False, min: decimal.Decimal | float | int | None = None, min_exclusive: decimal.Decimal | float | int | None = None, max: decimal.Decimal | float | int | None = None, max_exclusive: decimal.Decimal | float | int | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of decimal values with given precision and scale.
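The reference does not spell out what precision and scale mean. Assuming the usual SQL-style reading (precision is the total number of digits, scale is the number of digits after the decimal point), a value can be checked against them as in the sketch below. The helper `fits` is hypothetical and is not part of pydiverse.colspec.

```python
from decimal import Decimal


def fits(value: Decimal, precision: int, scale: int) -> bool:
    """Check whether `value` fits a decimal type with `precision` total
    digits, `scale` of which sit after the decimal point (assumed meaning)."""
    if not value.is_finite():
        return False
    if value == 0:
        return True
    # normalize() drops trailing zeros, so 1.500 counts as one fractional digit.
    _, digits, exponent = value.normalize().as_tuple()
    fractional_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    return fractional_digits <= scale and integer_digits <= precision - scale


print(fits(Decimal("123.45"), precision=5, scale=2))  # 3 integer + 2 fractional digits
print(fits(Decimal("1234.5"), precision=5, scale=2))  # 4 integer digits exceed 5 - 2
```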

class pydiverse.colspec.Duration(*, nullable: bool = True, primary_key: bool = False, min: datetime.timedelta | None = None, min_exclusive: datetime.timedelta | None = None, max: datetime.timedelta | None = None, max_exclusive: datetime.timedelta | None = None, resolution: str | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of durations.

class pydiverse.colspec.Enum(categories: collections.abc.Sequence[str], *, nullable: bool = True, primary_key: bool = False, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of enum (string) values.

class pydiverse.colspec.Time(*, nullable: bool = True, primary_key: bool = False, min: datetime.time | None = None, min_exclusive: datetime.time | None = None, max: datetime.time | None = None, max_exclusive: datetime.time | None = None, resolution: str | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, metadata: dict[str, Any] | None = None)

A column of times (without date).

class pydiverse.colspec.Float(*, nullable: bool = True, primary_key: bool = False, min: float | None = None, min_exclusive: float | None = None, max: float | None = None, max_exclusive: float | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of floating-point numbers.

class pydiverse.colspec.Float32(*, nullable: bool = True, primary_key: bool = False, min: float | None = None, min_exclusive: float | None = None, max: float | None = None, max_exclusive: float | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of 32-bit floating-point numbers.

class pydiverse.colspec.Float64(*, nullable: bool = True, primary_key: bool = False, min: float | None = None, min_exclusive: float | None = None, max: float | None = None, max_exclusive: float | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of 64-bit floating-point numbers.

class pydiverse.colspec.Int8(*, nullable: bool = True, primary_key: bool = False, min: int | None = None, min_exclusive: int | None = None, max: int | None = None, max_exclusive: int | None = None, is_in: collections.abc.Sequence[int] | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of int8 values.

class pydiverse.colspec.Int16(*, nullable: bool = True, primary_key: bool = False, min: int | None = None, min_exclusive: int | None = None, max: int | None = None, max_exclusive: int | None = None, is_in: collections.abc.Sequence[int] | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of int16 values.

class pydiverse.colspec.Int32(*, nullable: bool = True, primary_key: bool = False, min: int | None = None, min_exclusive: int | None = None, max: int | None = None, max_exclusive: int | None = None, is_in: collections.abc.Sequence[int] | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of int32 values.

class pydiverse.colspec.Int64(*, nullable: bool = True, primary_key: bool = False, min: int | None = None, min_exclusive: int | None = None, max: int | None = None, max_exclusive: int | None = None, is_in: collections.abc.Sequence[int] | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of int64 values.

class pydiverse.colspec.Integer(*, nullable: bool = True, primary_key: bool = False, min: int | None = None, min_exclusive: int | None = None, max: int | None = None, max_exclusive: int | None = None, is_in: collections.abc.Sequence[int] | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of integers (with any number of bytes).
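The integer columns share the keywords min, min_exclusive, max, max_exclusive and is_in. The sketch below shows one plausible reading of them in plain Python, assuming that min and max are inclusive bounds, that the *_exclusive variants are strict, and that is_in restricts values to the given sequence. This reading is inferred from the parameter names only; `satisfies` is a hypothetical helper, not a library function.

```python
from collections.abc import Sequence
from typing import Optional


def satisfies(
    value: int,
    *,
    min: Optional[int] = None,
    min_exclusive: Optional[int] = None,
    max: Optional[int] = None,
    max_exclusive: Optional[int] = None,
    is_in: Optional[Sequence[int]] = None,
) -> bool:
    """Hypothetical checker mirroring the keyword names of the integer columns."""
    checks = [
        min is None or value >= min,                     # assumed inclusive lower bound
        min_exclusive is None or value > min_exclusive,  # assumed strict lower bound
        max is None or value <= max,                     # assumed inclusive upper bound
        max_exclusive is None or value < max_exclusive,  # assumed strict upper bound
        is_in is None or value in is_in,                 # assumed membership test
    ]
    return all(checks)


print(satisfies(0, min=0))            # boundary value under the inclusive reading
print(satisfies(0, min_exclusive=0))  # boundary value under the strict reading
```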

class pydiverse.colspec.UInt8(*, nullable: bool = True, primary_key: bool = False, min: int | None = None, min_exclusive: int | None = None, max: int | None = None, max_exclusive: int | None = None, is_in: collections.abc.Sequence[int] | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of uint8 values.

class pydiverse.colspec.UInt16(*, nullable: bool = True, primary_key: bool = False, min: int | None = None, min_exclusive: int | None = None, max: int | None = None, max_exclusive: int | None = None, is_in: collections.abc.Sequence[int] | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of uint16 values.

class pydiverse.colspec.UInt32(*, nullable: bool = True, primary_key: bool = False, min: int | None = None, min_exclusive: int | None = None, max: int | None = None, max_exclusive: int | None = None, is_in: collections.abc.Sequence[int] | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of uint32 values.

class pydiverse.colspec.UInt64(*, nullable: bool = True, primary_key: bool = False, min: int | None = None, min_exclusive: int | None = None, max: int | None = None, max_exclusive: int | None = None, is_in: collections.abc.Sequence[int] | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of uint64 values.

class pydiverse.colspec.String(*, nullable: bool = True, primary_key: bool = False, min_length: int | None = None, max_length: int | None = None, regex: str | None = None, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A column of strings.

class pydiverse.colspec.List(inner: pydiverse.colspec.columns._base.Column, *, nullable: bool = True, primary_key: bool = False, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, min_length: int | None = None, max_length: int | None = None, metadata: dict[str, Any] | None = None)

A list column.

class pydiverse.colspec.Struct(inner: dict[str, pydiverse.colspec.columns._base.Column], *, nullable: bool = True, primary_key: bool = False, check: collections.abc.Callable[[pydiverse.colspec.optional_dependency.ColExpr], pydiverse.colspec.optional_dependency.ColExpr] | None = None, alias: str | None = None, metadata: dict[str, Any] | None = None)

A struct column.