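Data validation with pydantic (v2)

This article applies to pydantic v2.x.

Features

  • Thorough data validation built on Python type hints
  • Serialization to, and deserialization from, JSON strings and dict objects
  • High performance: the core validation logic is implemented in Rust
  • A rich ecosystem (FastAPI, LangChain, and Polars all use pydantic)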

Installation

Core package

Bash
pip install pydantic

Email support

Bash
pip install "pydantic[email]"

Settings support

Bash
pip install pydantic-settings

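Basic validation

Basic validation is built on the pydantic.BaseModel class. In the example below, each field of Employee is validated against its declared type; a value that cannot be converted to that type is rejected.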

Python
from datetime import date
from uuid import UUID, uuid4
from enum import Enum
from pydantic import BaseModel, EmailStr

class Department(Enum):
    HR = "HR"
    IT = "IT"

class Employee(BaseModel):
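    # UUID. Caution: uuid4() runs once, when the class is defined, so every
    # instance that omits employee_id shares this same value. The Field section
    # below uses default_factory=uuid4 to generate a fresh one per instance.
    employee_id: UUID = uuid4()
    name: str                    # string
    email: EmailStr              # email address
    date_of_birth: date          # date
    salary: float                # floating-point number
    department: Department       # enum
    elected_benefits: bool       # boolean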

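To instantiate Employee successfully, every required field must receive a value of its declared type (or one that pydantic can convert to it). For example: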

Python
# good
emp = Employee(
    name="John Doe",
    email="5hQp2@example.com",
    date_of_birth=date(1990, 1, 1),
    salary=100000.0,
    department=Department.IT,
    elected_benefits=True
)
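
To make the failure case concrete, here is a minimal sketch of what happens when the types do not match. Person is a hypothetical, reduced stand-in for Employee (it leaves out the email and enum fields so that it only needs the core package). It assumes pydantic's default lax mode, in which strings such as "1990-01-01" are converted to the declared type.

```python
from datetime import date

from pydantic import BaseModel, ValidationError


class Person(BaseModel):  # hypothetical, reduced version of Employee
    name: str
    date_of_birth: date
    salary: float


# Lax mode converts compatible input: an ISO date string and a numeric string.
person = Person(name="John Doe", date_of_birth="1990-01-01", salary="100000")

# Input that cannot be converted raises ValidationError, one entry per bad field.
bad_fields = []
try:
    Person(name="John Doe", date_of_birth="not a date", salary="lots")
except ValidationError as exc:
    bad_fields = sorted(err["loc"][0] for err in exc.errors())

print(person)
print(bad_fields)
```

All failing fields are reported in a single exception, so a caller can show every problem at once instead of fixing them one by one.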

Serialization and deserialization

The emp instance created above can be serialized and deserialized with the following methods.

Python
# to dict
emp_dict = emp.model_dump()
# from dict
emp2 = Employee.model_validate(emp_dict)

# to json
emp_json = emp.model_dump_json()
# from json
emp3 = Employee.model_validate_json(emp_json)
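
The difference between the dict and JSON forms is easiest to see on a small model. The sketch below uses a hypothetical Staff model (not part of the original example) to show that model_dump() keeps Python objects, that model_dump(mode="json") produces JSON-compatible values, and that a JSON round trip yields an equal instance.

```python
import json
from datetime import date
from enum import Enum

from pydantic import BaseModel


class Department(Enum):
    HR = "HR"
    IT = "IT"


class Staff(BaseModel):  # hypothetical model used only for this illustration
    name: str
    date_of_birth: date
    department: Department


staff = Staff(name="Jane", date_of_birth=date(1990, 1, 1), department=Department.IT)

as_python = staff.model_dump()               # date and Enum objects are preserved
as_jsonable = staff.model_dump(mode="json")  # date and Enum become plain strings
as_json = staff.model_dump_json()            # a JSON string

restored = Staff.model_validate_json(as_json)
print(as_python)
print(as_jsonable)
print(restored == staff)
```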

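Advanced validation: Field

Type hints alone are sometimes not enough. For example, the salary field above must be a float and must also be positive. For cases like this, pydantic.Field adds extra constraints and behavior to a field.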

Python
from datetime import date
from uuid import UUID, uuid4
from enum import Enum
from pydantic import BaseModel, EmailStr, Field

class Department(Enum):
    HR = "HR"
    IT = "IT"

class Employee(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=1, frozen=True)
    email: EmailStr = Field(pattern=r".+@example\.com$")
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)
    department: Department
    elected_benefits: bool

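Notes:

  • default_factory=uuid4: when no value is supplied, uuid4 is called to generate one for each new instance
  • frozen=True: the field cannot be reassigned after the instance is created
  • min_length=1: the string must contain at least 1 character
  • pattern=r".+@example\.com$": the value must also match this regular expression
  • repr=False: the field is left out of the instance's repr
  • alias="birth_date": input is read from the key birth_date, so callers pass birth_date=... rather than date_of_birth=.... Serialization uses the field name by default; model_dump(by_alias=True) or model_dump_json(by_alias=True) emits the alias instead. To build a model from a dict keyed by field names, MyModel.model_construct(**dict_data) works but skips validation entirely; setting model_config = ConfigDict(populate_by_name=True) on the model accepts both names and still validates
  • gt=0: the value must be strictly greater than 0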
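
The notes above can be checked on a small model. This is a sketch under stated assumptions: Payroll is a hypothetical model, and populate_by_name=True is added here (it is not in the original Employee class) so that both the alias and the field name are accepted as input.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Payroll(BaseModel):  # hypothetical model used only for this illustration
    # Assumption: accept the field name as well as the alias on input.
    model_config = ConfigDict(populate_by_name=True)

    name: str = Field(min_length=1, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)


by_alias = Payroll(name="Jane", compensation=5000)
by_name = Payroll(name="Jane", salary=5000)

dumped = by_alias.model_dump()                      # keyed by field name
dumped_alias = by_alias.model_dump(by_alias=True)   # keyed by alias

# frozen=True: reassigning the field raises ValidationError.
frozen_error = None
try:
    by_alias.name = "Someone Else"
except ValidationError as exc:
    frozen_error = exc.errors()[0]["type"]

# gt=0 is strict: zero itself is rejected.
zero_error = None
try:
    Payroll(name="Jane", compensation=0)
except ValidationError as exc:
    zero_error = exc.errors()[0]["type"]

print(repr(by_alias))  # salary is hidden because of repr=False
print(dumped, dumped_alias, frozen_error, zero_error)
```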

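Advanced validation: validators

Validating a single field

In the example below, the employee must be at least 18 years old according to date_of_birth; otherwise an exception is raised.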

Python
from datetime import date
from uuid import UUID, uuid4
from enum import Enum
from pydantic import BaseModel, EmailStr, Field, field_validator

class Department(Enum):
    HR = "HR"
    IT = "IT"

class Employee(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=1, frozen=True)
    email: EmailStr = Field(pattern=r".+@example\.com$")
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)
    department: Department
    elected_benefits: bool

    @field_validator("date_of_birth")
    @classmethod
    def check_valid_age(cls, date_of_birth: date) -> date:
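        today = date.today()
        # Age in whole years; subtract one if this year's birthday is still ahead.
        # Comparing (month, day) tuples avoids building an invalid date on Feb 29.
        had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
        age = today.year - date_of_birth.year - (0 if had_birthday else 1)

        if age < 18:
            raise ValueError("Employees must be at least 18 years old.")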

        return date_of_birth
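
To see how a field validator's error reaches the caller, here is a minimal sketch. Member is a hypothetical single-field model with the same kind of age check; the ValueError raised inside the validator is reported by pydantic as a ValidationError entry of type value_error.

```python
from datetime import date

from pydantic import BaseModel, ValidationError, field_validator


class Member(BaseModel):  # hypothetical model used only for this illustration
    date_of_birth: date

    @field_validator("date_of_birth")
    @classmethod
    def check_valid_age(cls, value: date) -> date:
        today = date.today()
        # Whole years elapsed; one less if this year's birthday is still ahead.
        had_birthday = (today.month, today.day) >= (value.month, value.day)
        age = today.year - value.year - (0 if had_birthday else 1)
        if age < 18:
            raise ValueError("Employees must be at least 18 years old.")
        return value


adult = Member(date_of_birth=date(1990, 1, 1))

error = None
try:
    Member(date_of_birth=date.today())  # born today, so clearly under 18
except ValidationError as exc:
    error = exc.errors()[0]

print(adult)
print(error["type"], error["loc"], error["msg"])
```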

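Validating multiple fields together

In the example below, an employee in the IT department must earn at least 10,000; otherwise an exception is raised.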

Python
from typing import Self  # Python 3.11+; on older versions use typing_extensions.Self
from datetime import date
from uuid import UUID, uuid4
from enum import Enum
from pydantic import (
    BaseModel,
    EmailStr,
    Field,
    field_validator,
    model_validator,
)

class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"

class Employee(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=1, frozen=True)
    email: EmailStr = Field(pattern=r".+@example\.com$")
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)
    department: Department
    elected_benefits: bool

    @field_validator("date_of_birth")
    @classmethod
    def check_valid_age(cls, date_of_birth: date) -> date:
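        today = date.today()
        # Age in whole years; subtract one if this year's birthday is still ahead.
        # Comparing (month, day) tuples avoids building an invalid date on Feb 29.
        had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
        age = today.year - date_of_birth.year - (0 if had_birthday else 1)

        if age < 18:
            raise ValueError("Employees must be at least 18 years old.")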

        return date_of_birth

    @model_validator(mode="after")
    def check_it_benefits(self) -> Self:

        if self.department == Department.IT and self.salary < 10000.0:
            raise ValueError("IT employees must earn at least $10,000.")
        return self
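
A reduced sketch of the same cross-field rule shows what the caller receives. Position is a hypothetical two-field model; Self is imported from typing_extensions (installed alongside pydantic) so that the snippet also runs on Python versions before 3.11. Because the check belongs to the model as a whole, the reported error location is empty rather than a field name.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError, model_validator
from typing_extensions import Self


class Department(Enum):
    HR = "HR"
    IT = "IT"


class Position(BaseModel):  # hypothetical model used only for this illustration
    department: Department
    salary: float

    @model_validator(mode="after")
    def check_it_salary(self) -> Self:
        # Runs after the individual fields have been validated.
        if self.department == Department.IT and self.salary < 10000.0:
            raise ValueError("IT employees must earn at least $10,000.")
        return self


hr = Position(department="HR", salary=5000)  # the rule applies to IT only

error = None
try:
    Position(department="IT", salary=5000)
except ValidationError as exc:
    error = exc.errors()[0]

print(hr)
print(error["type"], error["loc"])
```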

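Advanced validation: validate_call

validate_call enforces validation on a function or method. Plain Python type hints are not checked at runtime, whereas a function decorated with validate_call raises a ValidationError when an argument does not satisfy its annotation.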

Python
import time
from typing import Annotated
from pydantic import PositiveFloat, Field, EmailStr, validate_call

@validate_call
def send_invoice(
    client_name: Annotated[str, Field(min_length=1)],
    client_email: EmailStr,
    items_purchased: list[str],
    amount_owed: PositiveFloat,
) -> str:

    email_str = f"""
    Dear {client_name}, \n
    Thank you for choosing xyz inc! You
    owe ${amount_owed:,.2f} for the following items: \n
    {items_purchased}
    """

    print(f"Sending email to {client_email}...")
    time.sleep(2)

    return email_str
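
A shorter sketch makes it easy to try both a valid and an invalid call. invoice_total is a hypothetical function that keeps only two of the parameters above and drops EmailStr so that the core package is enough.

```python
from typing import Annotated

from pydantic import Field, PositiveFloat, ValidationError, validate_call


@validate_call
def invoice_total(
    client_name: Annotated[str, Field(min_length=1)],
    amount_owed: PositiveFloat,
) -> str:
    # hypothetical helper: formats the amount with thousands separators
    return f"{client_name} owes ${amount_owed:,.2f}"


ok = invoice_total(client_name="Jane", amount_owed=1234.5)

# Both arguments are invalid here: empty name, non-positive amount.
error_types = []
try:
    invoice_total(client_name="", amount_owed=-1)
except ValidationError as exc:
    error_types = sorted(err["type"] for err in exc.errors())

print(ok)
print(error_types)
```

The function body never runs for the invalid call, because validation happens before the call is dispatched.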

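Managing configuration

This feature is mostly used to validate user-supplied configuration, especially environment variables that change often, which makes it a good fit for containerized deployments such as Docker.

Reading environment variables directly

First, define the configuration class AppConfig in the application: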

Python
from pydantic import HttpUrl, Field
from pydantic_settings import BaseSettings

class AppConfig(BaseSettings):
    database_host: HttpUrl
    database_user: str = Field(min_length=5)
    database_password: str = Field(min_length=10)
    api_key: str = Field(min_length=20)

Next, export the environment variables in the shell. The following set is deliberately invalid:

Bash
export DATABASE_USER="usee"
export DATABASE_PASSWORD="asdf"
export API_KEY="ajf"

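Instantiating AppConfig when the application starts validates the configuration. Given only the variables above, it raises a ValidationError: all three exported values are shorter than the required minimum length, and DATABASE_HOST is not set at all.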

Python
from settings_management import AppConfig

AppConfig()

Reading environment variables from a file

Python
from pydantic import HttpUrl, Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class AppConfig(BaseSettings):
    model_config = SettingsConfigDict(
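        env_file=".env",            # read variables from the .env file
        case_sensitive=True,        # names must match the field names exactly, including case
        extra="forbid",             # reject entries that do not correspond to a field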
        env_file_encoding="utf-8",
    )

    database_host: HttpUrl
    database_user: str = Field(min_length=5)
    database_password: str = Field(min_length=10)
    api_key: str = Field(min_length=20)

References