Design decisions
This section discusses the core implementation of visions.
This view helps build intuition about the performance and complexity of operations.
It is less suited to understanding the abstract concepts or the motivation behind the representations.
Short circuiting
TODO
Sampling
TODO
Memory usage
TODO
Operations are designed to be free of side effects: they return a new object rather than modifying their input. This may impact the performance of your program when you use large DataFrames, as a copy is made.
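The trade-off can be illustrated with plain pandas. This is a minimal sketch, not code from visions: `to_float_series` is a hypothetical helper, and it relies only on the fact that `Series.astype` returns a new object by default.

```python
import pandas as pd


def to_float_series(series: pd.Series) -> pd.Series:
    """Hypothetical transformation that leaves its input untouched."""
    # astype returns a new Series by default, so the caller's data is not modified
    return series.astype(float)


original = pd.Series([1, 2, 3])
converted = to_float_series(original)

# Both objects now exist side by side, which is where the extra memory goes
print(original.dtype, converted.dtype)
```

For a large DataFrame, the input and the result are both held in memory until the input is released.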
Dtypes
By staying close to pandas’ data types, we can use the dtype for type detection. Checking a dtype is O(1), whereas inspecting every value is O(n).
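The difference between the two approaches can be sketched as follows. The two helper functions are hypothetical and only illustrate the complexity argument; `pandas.api.types.is_integer_dtype` is the real pandas call used for the dtype check.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def is_integer_by_dtype(series: pd.Series) -> bool:
    # O(1): only the dtype metadata is inspected, never the values
    return is_integer_dtype(series.dtype)


def is_integer_by_values(series: pd.Series) -> bool:
    # O(n): every element is visited (bool is excluded, as it subclasses int)
    return all(
        isinstance(value, int) and not isinstance(value, bool)
        for value in series.tolist()
    )


series = pd.Series(range(1_000))
print(is_integer_by_dtype(series), is_integer_by_values(series))
```

Both functions agree on this input, but only the second one grows more expensive as the series gets longer.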
Constraint checking in tests
The constraint of mutual exclusivity is not checked at runtime; it is verified during testing instead.
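A test of this kind could look roughly like the sketch below. The membership predicates, the sample data and the `find_overlaps` helper are all hypothetical stand-ins, not the actual visions test suite; the point is only that the check runs once over sample data in tests rather than on every call.

```python
def is_boolean(values):
    return all(isinstance(v, bool) for v in values)


def is_integer(values):
    # bool subclasses int in Python, so it is excluded explicitly
    return all(isinstance(v, int) and not isinstance(v, bool) for v in values)


def is_string(values):
    return all(isinstance(v, str) for v in values)


# Hypothetical predicates standing in for type definitions
PREDICATES = {"Boolean": is_boolean, "Integer": is_integer, "String": is_string}

# Hypothetical sample data, one entry per test case
SAMPLES = [[1, 2, 3], [True, False], ["a", "b"]]


def find_overlaps(predicates, samples):
    """Return every sample that is claimed by more than one type."""
    overlaps = []
    for sample in samples:
        matches = [name for name, contains in predicates.items() if contains(sample)]
        if len(matches) > 1:
            overlaps.append((sample, matches))
    return overlaps


def test_types_are_mutually_exclusive():
    assert find_overlaps(PREDICATES, SAMPLES) == []
```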
Nullable types
All types are nullable by default. TODO: why (refer to goal)
Why don’t we use OOP inheritance?
You might wonder why, for example, the Image class does not inherit from the File class.
The short answer is that we tried: for our use cases, inheritance ultimately only added complexity to the solution.
Within the current abstraction, each type inherits from a base type, and the role that class inheritance would otherwise play is taken over by relations.
Class inheritance would be beneficial only where it reduces complexity. TODO: The End Of Object Inheritance & The Beginning Of A New Modularity
Note
The choice of not using OOP inheritance limits the use of built-in type hints that rely on covariance and contravariance. Read more in PEP 484.
Sampling in inference
TODO
Why are relations defined on the type?
The short answer is extensibility.
Recall that relations define mappings to a type. So, given two types A and B with a relation B -> A, that relationship is defined on A. Defining relationships in this way decouples types from each other.
This allows us to dynamically construct a relation graph based only on the types included in the typeset, without modifying any type-specific logic.
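The idea can be sketched in a few lines of plain Python. The classes, the `sources` attribute and `build_relation_graph` are hypothetical and do not mirror the visions API; they only show how declaring a relation on its target type lets the graph be assembled from whatever types a typeset happens to contain.

```python
class Generic:
    sources = ()


class Float:
    sources = (Generic,)


class Integer:
    # The relation Float -> Integer is declared here, on the target type;
    # Float itself knows nothing about Integer
    sources = (Generic, Float)


def build_relation_graph(typeset):
    """Collect (source, target) edges whose two ends are both in the typeset."""
    members = set(typeset)
    return {
        (source.__name__, target.__name__)
        for target in typeset
        for source in target.sources
        if source in members
    }


print(sorted(build_relation_graph([Generic, Float, Integer])))
print(sorted(build_relation_graph([Generic, Integer])))
```

Leaving Float out of the typeset simply drops its edges, and adding a new type only requires declaring its own incoming relations; no existing class has to change.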
Missing value bitmaps
Pandas upcasts certain types when adding missing values, unnecessarily increasing physical storage size.
This behaviour occurs for booleans and integers.
Pandas itself offers nullable integers.
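The upcasting is easy to observe directly. The snippet below is a small illustration using only standard pandas constructors; the variable names are arbitrary.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
ints_with_missing = pd.Series([1, 2, None])          # integers become floats
bools = pd.Series([True, False])
bools_with_missing = pd.Series([True, False, None])  # booleans become objects

# The nullable integer extension dtype keeps integers as integers
nullable_ints = pd.Series([1, 2, None], dtype="Int64")

for name, series in [
    ("ints", ints),
    ("ints_with_missing", ints_with_missing),
    ("bools", bools),
    ("bools_with_missing", bools_with_missing),
    ("nullable_ints", nullable_ints),
]:
    print(f"{name}: {series.dtype}")
```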
We implement nullable types as missing value bitmaps, in the same way pandas’ nullable integers work.
For each value, we keep one bit that specifies whether that value is null.
We use the convention that NaN is used when the type represents numbers, and None otherwise.
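A missing value bitmap with one bit per value can be sketched in pure Python as follows. This is an illustration of the data structure only, not the storage layout used by visions or pandas; `build_null_bitmap`, `is_null` and `missing_marker` are hypothetical names.

```python
import math


def build_null_bitmap(values):
    """Pack one 'is null' flag per value into a bytearray (8 flags per byte)."""
    bitmap = bytearray((len(values) + 7) // 8)
    for index, value in enumerate(values):
        if value is None or (isinstance(value, float) and math.isnan(value)):
            bitmap[index // 8] |= 1 << (index % 8)
    return bitmap


def is_null(bitmap, index):
    """Read the flag for a single position back out of the bitmap."""
    return bool((bitmap[index // 8] >> (index % 8)) & 1)


def missing_marker(is_numeric):
    # Convention from the text: NaN for numeric types, None otherwise
    return float("nan") if is_numeric else None


values = [1, None, 3, float("nan"), 5, 6, 7, 8, None]
bitmap = build_null_bitmap(values)
print([is_null(bitmap, i) for i in range(len(values))])
```

Nine values fit in two bytes of flags, so the values themselves can stay in their original, compact dtype.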
More information can be found here: pandas 2.0 design document