# Data Modelling in Python: Required, Optional, and Default with Pydantic, Dataclass

### TL;DR: Options for Defining Data Classes in Python

* `Plain class`: Full manual control, no automatic validation.
* `@dataclass`: Boilerplate-free data container; relies on *static* type checks; a field hinted `Optional` is still required at instantiation unless it has a default.
* `Pydantic BaseModel`: Built for *runtime* validation, parsing, and serialization; a field with a default may be omitted from input. In Pydantic v1 an `Optional` hint alone implied a default of `None`; in Pydantic v2 the field needs an explicit default such as `= None` before it can be omitted.

**Choose**: Plain for total control; `@dataclass` for simple, static-checked data; `Pydantic` for robust, *runtime*-validated data (especially from external sources).

### Introduction

When structuring data in Python, developers frequently have to decide which fields are mandatory, which are optional, and how to assign default values. Python offers several tools for this: `Pydantic`, the standard library's `@dataclass` decorator, and plain Python classes.

At its core, the distinction lies in whether a tool primarily focuses on static analysis (checking code *before* it runs), runtime validation (checking data *as* it runs), or simply providing boilerplate reduction.

### The Plain Python Class: Unadorned Control

A plain Python class, with its explicit `__init__` method, offers the most direct and granular control. ***Requiredness is dictated purely by whether an argument is present in the `__init__` signature without a default value. If a parameter lacks a default, it must be provided during instantiation***, or a `TypeError` will be raised. Default values are assigned directly within the `__init__` method's parameters.

*Absence of runtime type validation:* While type hints can be added to plain classes (e.g., `def __init__(self, id: str):`), Python itself does not enforce these hints at runtime. Passing an integer to a `str`-hinted parameter succeeds silently, and the mistake only surfaces later, when an operation on that attribute fails because of its unexpected type. For robust data validation, developers must write explicit checks within `__init__` or other methods.
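The behaviour described above can be seen in a minimal sketch. The `User` and `CheckedUser` classes and their fields are hypothetical names chosen for illustration; the second class shows one possible way to add a manual check, not the only one.

```python
class User:
    """Hypothetical plain class: `id` is required, `name` has a default."""

    def __init__(self, id: str, name: str = "anonymous"):
        self.id = id
        self.name = name


# Omitting a parameter that has no default raises TypeError.
try:
    User()
    missing_id_rejected = False
except TypeError:
    missing_id_rejected = True

# The `str` hint is not enforced: an int is stored without complaint.
user = User(id=42)

# The mismatch only surfaces when a str-only operation is attempted.
try:
    user.id.upper()
    late_failure = False
except AttributeError:
    late_failure = True


class CheckedUser(User):
    """Same fields, with a hand-written runtime type check."""

    def __init__(self, id: str, name: str = "anonymous"):
        if not isinstance(id, str):
            raise TypeError(f"id must be str, got {type(id).__name__}")
        super().__init__(id, name)
```

With the explicit check in place, `CheckedUser(42)` fails immediately at construction instead of at some later point of use.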

### The @dataclass Decorator: Streamlined Data Containers

The `@dataclass` decorator, part of Python's standard library, simplifies the creation of classes intended for holding data. It *generates common "boilerplate" methods* like `__init__`, `__repr__`, and `__eq__` from the class's type-hinted attributes.

In dataclasses, the concept of "required" fields for instantiation follows a straightforward rule: ***any attribute that does not have a default value in the class definition becomes a required parameter for the generated `__init__` method***. If such a field is omitted, a `TypeError` will occur.

For example, a field annotated as `Optional[str]` with no default value is still required at instantiation. The hint only states that `None` is an acceptable value; it does not supply one.

*Absence of runtime type validation:* As with a plain class, a class decorated with `@dataclass` does not enforce type hints at runtime. A `str` can still be passed to an `int` field.
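A short sketch of these dataclass rules follows. The `User` class and its fields are hypothetical; note that fields without defaults must be declared before fields with defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: int                   # no default: required
    nickname: Optional[str]   # Optional hint, but still required (no default)
    name: str = "anonymous"   # default: may be omitted


# Leaving out `nickname` raises TypeError despite the Optional hint.
try:
    User(id=1)
    nickname_required = False
except TypeError:
    nickname_required = True

# Passing None explicitly satisfies the generated __init__.
user = User(id=1, nickname=None)

# The generated __eq__ compares instances field by field.
same = user == User(id=1, nickname=None)

# Type hints are not enforced: a str is accepted for the int field.
wrong = User(id="not-an-int", nickname=None)
```

To make `nickname` omittable in a dataclass, give it a default, for example `nickname: Optional[str] = None`, and move it below the required fields.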

### Pydantic: Runtime Validation and Intelligent Parsing

Pydantic is an external library built upon Python's type hints to provide runtime data validation, parsing, and serialization. It checks data types at runtime and takes `Optional` hints into account when deciding what input to accept. The rules for required fields are simple:

1. If a field has a default value, it is optional. It can be omitted from input and takes the default value at instantiation.
2. An `Optional` hint allows `None` as a value. In Pydantic v1 the hint alone also implied a default of `None`; in Pydantic v2 the field stays required unless a default such as `= None` is given explicitly.
3. Runtime type checks are performed, and invalid input raises `ValidationError`.

*Runtime type validation:* Pydantic enforces type checks at runtime and, by default, coerces compatible input. So `"abc"` cannot be passed to an `int` field, while `"123"` is coerced to the `int` value `123`.
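The sketch below illustrates these rules, assuming `pydantic` is installed. The `User` and `Profile` models and the `is_rejected` helper are hypothetical names used only for this example. The last check is deliberately version-dependent: an `Optional` field with no default is expected to be required under Pydantic v2 and omittable under v1.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int                         # no default: required
    name: str = "anonymous"         # default: may be omitted
    nickname: Optional[str] = None  # explicit default: omittable in v1 and v2


class Profile(BaseModel):
    bio: Optional[str]              # Optional hint with no default


def is_rejected(model, **data) -> bool:
    """Return True if building `model` from `data` raises ValidationError."""
    try:
        model(**data)
    except ValidationError:
        return True
    return False


# The string "123" is coerced to the int 123 by default.
user = User(id="123")

# A missing required field and an uncoercible value are both rejected.
missing_id_rejected = is_rejected(User)
bad_id_rejected = is_rejected(User, id="abc")

# Version-dependent: expected True on Pydantic v2, False on v1.
bio_required = is_rejected(Profile)
```

Writing `Optional[str] = None` explicitly, as in `User.nickname`, keeps the field omittable regardless of which major version is installed.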

#### Key Differences Table

| Feature                 | Pydantic Class (`BaseModel`)                                                               | Dataclass (`@dataclass`)                                                                                     | Plain Class (`__init__`)                                 | Dictionary                                   |
| ----------------------- | ------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------- | -------------------------------------------- |
| Primary Use             | Data validation, parsing, serialization, API models                                        | Simple data containers, boilerplate reduction                                                                | General-purpose objects, manual control                  | Ad-hoc key-value storage                     |
| Requiredness Rule       | Field without a default is required; in v2, `Optional[X]` alone does not add a default.    | Parameter in `__init__` without a default value is required. `Optional` hint alone doesn't make it optional. | Parameter in `__init__` without a default is required.   | Key must be present to avoid `KeyError`.     |
| Default Values          | Supported. If present, attribute becomes "optional" for input.                             | Supported. If present, parameter becomes optional for `__init__`.                                            | Explicitly set as `__init__` parameter default or later. | Manual with `get()` or `setdefault()`.       |
| Runtime Type Validation | Yes, strong. Raises `ValidationError` if input types don't match hints.                    | No. Type hints are for static checkers. Python will accept incorrect types.                                  | No. Requires manual checks.                              | No. Entirely manual.                         |
| Type Coercion/Parsing   | Yes. Automatically converts compatible types (e.g., "123" to int).                         | No.                                                                                                          | No. Requires manual conversion.                          | No. Requires manual conversion.              |
| Boilerplate             | Low (auto `__init__`, `__repr__`, etc., plus validation).                                  | Very Low (auto `__init__`, `__repr__`, `__eq__`, etc.).                                                      | High (manual `__init__` and other dunder methods).       | N/A                                          |
| Dependencies            | External library (`pydantic`).                                                             | Standard Library.                                                                                            | Standard Library.                                        | Standard Library.                            |
| Immutability            | Optional (`model_config = ConfigDict(frozen=True)`).                                       | Optional (`frozen=True`).                                                                                    | Manual implementation.                                   | Mutable (read-only via `MappingProxyType`).  |
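
As a rough illustration of the immutability row, the following sketch covers only the standard-library cases: a frozen dataclass and a read-only view over a dictionary. The `Point` class and the `settings` dictionary are hypothetical.

```python
from dataclasses import FrozenInstanceError, dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Point:
    x: int
    y: int


# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
point = Point(1, 2)
try:
    point.x = 10
    dataclass_frozen = False
except FrozenInstanceError:
    dataclass_frozen = True

# A plain dict is mutable; MappingProxyType wraps it in a read-only view.
settings = {"debug": False}
read_only = MappingProxyType(settings)
try:
    read_only["debug"] = True
    view_read_only = False
except TypeError:
    view_read_only = True

# The view is not a copy: changes to the underlying dict show through it.
settings["debug"] = True
```

Both mechanisms are shallow: a mutable object stored inside a frozen dataclass or behind the view can still be modified in place.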

#### Choosing the Right Tool

The decision between these approaches hinges on your project's needs:

* Plain Python Class: Choose when you need maximum manual control, minimal dependencies, and are prepared to implement all validation logic yourself. Best for highly specialized internal data structures where performance is paramount and data is inherently trusted.
* Dataclass: Ideal for simple, well-defined data containers (immutable when `frozen=True`) where boilerplate reduction is desired and static type checking (via tools like mypy) is sufficient for catching type issues. Use when runtime validation is not a primary concern, or is handled elsewhere.
* Pydantic: The superior choice for any scenario involving external data, API development, configuration management, or when robust runtime validation, data parsing, and serialization are critical. Its intelligent handling of `Optional` and detailed error reporting makes it invaluable for ensuring data integrity at the application's boundaries.

In many modern Python projects, a common pattern emerges: dataclasses are used for internal, well-controlled data, while Pydantic takes center stage for handling all data ingress and egress, acting as a crucial guardian of data quality.
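
One way this pattern might look is sketched below, assuming `pydantic` is installed. The `OrderIn` model, the `Order` dataclass, and the `parse_order` function are hypothetical names, and real projects may structure the boundary differently.

```python
from dataclasses import dataclass
from typing import Optional

from pydantic import BaseModel, ValidationError


class OrderIn(BaseModel):
    """Hypothetical boundary model: validates and coerces external input."""

    order_id: int
    quantity: int = 1
    note: Optional[str] = None


@dataclass(frozen=True)
class Order:
    """Hypothetical internal type: trusted, immutable, no runtime checks."""

    order_id: int
    quantity: int
    note: Optional[str]


def parse_order(raw: dict) -> Order:
    """Validate raw input with Pydantic, then hand off a plain dataclass."""
    payload = OrderIn(**raw)
    return Order(
        order_id=payload.order_id,
        quantity=payload.quantity,
        note=payload.note,
    )


# String values from an external source are coerced at the boundary.
order = parse_order({"order_id": "7", "quantity": "3"})

# Input missing a required field never reaches the internal type.
try:
    parse_order({"quantity": 2})
    rejected = False
except ValidationError:
    rejected = True
```

Copying fields explicitly keeps the internal `Order` type independent of Pydantic, at the cost of repeating the field list.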

