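Data validation with pydantic (v2)
Features
- Thorough data validation built on Python's type hint mechanism
- Serialization to, and deserialization from, JSON and dict objects
- High performance: the core validation logic is implemented in Rust
- A rich ecosystem (FastAPI, LangChain, and Polars all use pydantic)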
Installation
Core package
The core package is installed with `pip install pydantic`.
Email component
```bash
pip install "pydantic[email]"
```
Settings component
```bash
pip install pydantic-settings
```
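Basic validation
Basic validation is built on the `pydantic.BaseModel` class. In the example below, every field of the `Employee` class is validated against its annotated type: a value that cannot be converted to that type is rejected with a `ValidationError`.
```python
from datetime import date
from enum import Enum
from uuid import UUID, uuid4

from pydantic import BaseModel, EmailStr


class Department(Enum):
    HR = "HR"
    IT = "IT"


class Employee(BaseModel):
    # uuid4() runs once, when the class is defined, so every instance created
    # without an explicit ID shares this value. Field(default_factory=uuid4)
    # generates a fresh UUID per instance instead.
    employee_id: UUID = uuid4()
    name: str  # string
    email: EmailStr  # email address (requires the email component)
    date_of_birth: date  # date
    salary: float  # floating-point number
    department: Department  # enum
    elected_benefits: bool  # boolean
```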
To create a valid `Employee` instance, pass a value of a compatible type for each required field. For example:
```python
# good
emp = Employee(
    name="John Doe",
    email="5hQp2@example.com",
    date_of_birth=date(1990, 1, 1),
    salary=100000.0,
    department=Department.IT,
    elected_benefits=True,
)
```
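As a rough sketch of what happens when values do or do not fit the annotations, the snippet below uses a hypothetical, trimmed-down `Worker` model (not part of the original example, and without `EmailStr`, so the email component is not needed). It assumes pydantic's default non-strict mode, in which compatible input such as a numeric string is converted to the annotated type.

```python
from datetime import date

from pydantic import BaseModel, ValidationError


class Worker(BaseModel):  # hypothetical model, for illustration only
    name: str
    date_of_birth: date
    salary: float


# Compatible input is converted: an ISO date string and a numeric string.
worker = Worker(name="John Doe", date_of_birth="1990-01-01", salary="100000")

# Incompatible input is rejected; all failing fields are reported together.
try:
    Worker(name="John Doe", date_of_birth="not a date", salary="a lot")
    invalid_fields = []
except ValidationError as exc:
    invalid_fields = sorted(str(err["loc"][0]) for err in exc.errors())

print(worker)
print(invalid_fields)
```

Collecting the failing field names from `exc.errors()` is one convenient way to turn a `ValidationError` into a message for the caller.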
Serialization / deserialization
The `emp` instance created above can be serialized and deserialized with the following methods.
```python
# to dict
emp_dict = emp.model_dump()
# from dict
emp2 = Employee.model_validate(emp_dict)
# to JSON
emp_json = emp.model_dump_json()
# from JSON
emp3 = Employee.model_validate_json(emp_json)
```
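The round trip can be checked on a small scale. The sketch below uses a hypothetical `Worker` model (an assumption for illustration, not the `Employee` class above) to show that `model_dump()` keeps Python objects such as `date`, while `model_dump_json()` produces a JSON string in which the date becomes an ISO 8601 string.

```python
import json
from datetime import date

from pydantic import BaseModel


class Worker(BaseModel):  # hypothetical model, for illustration only
    name: str
    date_of_birth: date
    salary: float


worker = Worker(name="John Doe", date_of_birth=date(1990, 1, 1), salary=100000.0)

as_dict = worker.model_dump()       # plain dict, values stay Python objects
as_json = worker.model_dump_json()  # JSON text, values become JSON types

# Both representations validate back into an equal model.
from_dict = Worker.model_validate(as_dict)
from_json = Worker.model_validate_json(as_json)

print(type(as_dict["date_of_birth"]).__name__)
print(json.loads(as_json)["date_of_birth"])
```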
Advanced validation: Field
Type hints alone are sometimes not enough. For example, the `salary` field above must be a `float` and must also be positive. `pydantic.Field` adds this kind of extra constraint to a field.
```python
from datetime import date
from enum import Enum
from uuid import UUID, uuid4

from pydantic import BaseModel, EmailStr, Field


class Department(Enum):
    HR = "HR"
    IT = "IT"


class Employee(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=1, frozen=True)
    email: EmailStr = Field(pattern=r".+@example\.com$")
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)
    department: Department
    elected_benefits: bool
```
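Notes:
- `default_factory=uuid4`: the field value is generated automatically by calling `uuid4` each time an instance is created without one.
- `frozen=True`: the field cannot be reassigned after the instance is created.
- `min_length=1`: the string must contain at least one character.
- `pattern=r".+@example\.com$"`: the value must match the regular expression.
- `repr=False`: the field is omitted when an `Employee` instance is printed or shown via `repr()`.
- `alias="birth_date"`: the field is populated from the key `birth_date` during validation (deserialization). On serialization the alias is used only when requested, for example with `model_dump(by_alias=True)`. `MyModel.model_construct(**dict_data)` builds a model from a dict keyed by field names rather than aliases, but it skips validation entirely, so it is only safe for data that is already trusted.
- `gt=0`: the value must be strictly greater than 0 (use `ge=0` to allow 0 itself).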
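The parameters listed above can be tried out on a smaller model. The following is a sketch only: `Staff` is a hypothetical class introduced for illustration, and `EmailStr` is left out so that the snippet runs without the email component.

```python
from datetime import date

from pydantic import BaseModel, Field, ValidationError


class Staff(BaseModel):  # hypothetical model, for illustration only
    name: str = Field(min_length=1, frozen=True)
    date_of_birth: date = Field(alias="birth_date", repr=False)
    salary: float = Field(alias="compensation", gt=0, repr=False)


# Aliased fields are populated through their alias.
staff = Staff(name="Ada", birth_date="1990-01-01", compensation=5000)

dumped_by_name = staff.model_dump()                # keys are field names
dumped_by_alias = staff.model_dump(by_alias=True)  # keys are aliases

# gt=0 is strict: a salary of exactly 0 is rejected.
try:
    Staff(name="Ada", birth_date="1990-01-01", compensation=0)
    zero_salary_rejected = False
except ValidationError:
    zero_salary_rejected = True

# frozen=True: reassigning the field raises a ValidationError.
try:
    staff.name = "Grace"
    rename_rejected = False
except ValidationError:
    rename_rejected = True

print(repr(staff))  # fields declared with repr=False are left out
```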
Advanced validation: Validators
Validating a single field
In the example below, `date_of_birth` must correspond to an age of at least 18; otherwise a validation error is raised.
```python
from datetime import date
from enum import Enum
from uuid import UUID, uuid4

from pydantic import BaseModel, EmailStr, Field, field_validator


class Department(Enum):
    HR = "HR"
    IT = "IT"


class Employee(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=1, frozen=True)
    email: EmailStr = Field(pattern=r".+@example\.com$")
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)
    department: Department
    elected_benefits: bool

    @field_validator("date_of_birth")
    @classmethod
    def check_valid_age(cls, date_of_birth: date) -> date:
        today = date.today()
        # Age in whole years: subtract one if this year's birthday has not
        # happened yet. Unlike date(today.year - 18, ...), this does not fail
        # when today is 29 February.
        birthday_pending = (today.month, today.day) < (
            date_of_birth.month,
            date_of_birth.day,
        )
        age = today.year - date_of_birth.year - birthday_pending
        if age < 18:
            raise ValueError("Employees must be at least 18 years old.")
        return date_of_birth
```
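To see a field validator fire, here is a small sketch with a hypothetical `Person` model (an assumption for illustration) that carries only the date field. A `ValueError` raised inside the validator is wrapped by pydantic into a `ValidationError`, and the original message is available from `exc.errors()`.

```python
from datetime import date

from pydantic import BaseModel, ValidationError, field_validator


class Person(BaseModel):  # hypothetical model, for illustration only
    date_of_birth: date

    @field_validator("date_of_birth")
    @classmethod
    def check_valid_age(cls, value: date) -> date:
        today = date.today()
        # Whole years elapsed since the date of birth.
        age = today.year - value.year - (
            (today.month, today.day) < (value.month, value.day)
        )
        if age < 18:
            raise ValueError("Must be at least 18 years old.")
        return value


adult = Person(date_of_birth=date(1990, 1, 1))

# Someone born today is clearly under 18, so validation fails.
try:
    Person(date_of_birth=date.today())
    minor_error = None
except ValidationError as exc:
    minor_error = exc.errors()[0]["msg"]

print(minor_error)
```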
Validating several fields together
In the example below, an employee in the IT department must earn at least 10,000; otherwise a validation error is raised.
```python
from datetime import date
from enum import Enum
from typing import Self  # requires Python 3.11+
from uuid import UUID, uuid4

from pydantic import (
    BaseModel,
    EmailStr,
    Field,
    field_validator,
    model_validator,
)


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=1, frozen=True)
    email: EmailStr = Field(pattern=r".+@example\.com$")
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)
    department: Department
    elected_benefits: bool

    @field_validator("date_of_birth")
    @classmethod
    def check_valid_age(cls, date_of_birth: date) -> date:
        today = date.today()
        # Age in whole years; safe even when today is 29 February.
        birthday_pending = (today.month, today.day) < (
            date_of_birth.month,
            date_of_birth.day,
        )
        age = today.year - date_of_birth.year - birthday_pending
        if age < 18:
            raise ValueError("Employees must be at least 18 years old.")
        return date_of_birth

    # mode="after": runs once all individual fields have been validated.
    @model_validator(mode="after")
    def check_it_benefits(self) -> Self:
        if self.department == Department.IT and self.salary < 10000.0:
            raise ValueError("IT employees must earn at least $10,000.")
        return self
```
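The cross-field rule can be exercised in isolation. The sketch below assumes a hypothetical `Payroll` model with just the two fields involved; the return annotation is written as a string so that the snippet does not depend on `typing.Self`.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError, model_validator


class Department(Enum):
    HR = "HR"
    IT = "IT"


class Payroll(BaseModel):  # hypothetical model, for illustration only
    department: Department
    salary: float

    @model_validator(mode="after")
    def check_it_salary(self) -> "Payroll":
        # Both fields are already validated here, so they can be compared.
        if self.department == Department.IT and self.salary < 10000.0:
            raise ValueError("IT employees must earn at least $10,000.")
        return self


it_ok = Payroll(department="IT", salary=12000)
hr_ok = Payroll(department="HR", salary=3000)  # the rule only applies to IT

try:
    Payroll(department="IT", salary=3000)
    it_error_count = 0
except ValidationError as exc:
    it_error_count = exc.error_count()

print(it_error_count)
```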
Advanced validation: validate_call
`validate_call` enforces validation on the arguments of a function or method. Plain Python type hints are not checked at runtime, whereas a function decorated with `validate_call` raises a `ValidationError` when an argument does not match its annotation.
```python
import time
from typing import Annotated

from pydantic import EmailStr, Field, PositiveFloat, validate_call


@validate_call
def send_invoice(
    client_name: Annotated[str, Field(min_length=1)],
    client_email: EmailStr,
    items_purchased: list[str],
    amount_owed: PositiveFloat,
) -> str:
    email_str = f"""
    Dear {client_name}, \n
    Thank you for choosing xyz inc! You
    owe ${amount_owed:,.2f} for the following items: \n
    {items_purchased}
    """
    print(f"Sending email to {client_email}...")
    time.sleep(2)  # simulate the time taken to send the email
    return email_str
```
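A shorter function makes the behaviour easy to observe. In this sketch, `format_invoice` is a hypothetical helper (not from the original example) that avoids `EmailStr` and the artificial delay, and the default non-strict mode is assumed.

```python
from typing import Annotated

from pydantic import Field, PositiveFloat, ValidationError, validate_call


@validate_call
def format_invoice(
    client_name: Annotated[str, Field(min_length=1)],
    amount_owed: PositiveFloat,
) -> str:
    # By the time the body runs, both arguments have passed validation.
    return f"{client_name} owes ${amount_owed:,.2f}"


# A numeric string is converted to float before the body runs.
message = format_invoice("Ada", "1250.5")

# An empty name and a negative amount are rejected before the body runs.
try:
    format_invoice("", -5)
    bad_call_errors = 0
except ValidationError as exc:
    bad_call_errors = exc.error_count()

print(message)
```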
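Managing configuration
This feature is mostly used to validate user-supplied configuration, especially environment variables that change often. It is particularly useful for validating environment variables in Docker environments.
Reading environment variables directly
First define the configuration class `AppConfig` in the application:
```python
from pydantic import Field, HttpUrl
from pydantic_settings import BaseSettings


class AppConfig(BaseSettings):
    database_host: HttpUrl
    database_user: str = Field(min_length=5)
    database_password: str = Field(min_length=10)
    api_key: str = Field(min_length=20)
```
Then set the environment variables in the shell. The following set is deliberately invalid: all three values are too short, and `DATABASE_HOST` is not set at all.
```bash
export DATABASE_USER="usee"
export DATABASE_PASSWORD="asdf"
export API_KEY="ajf"
```
When the app starts and validates its configuration, an exception is raised:
```python
# AppConfig is the class defined above, saved in settings_management.py
from settings_management import AppConfig

AppConfig()
```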
Reading environment variables from a file
```python
from pydantic import Field, HttpUrl
from pydantic_settings import BaseSettings, SettingsConfigDict


class AppConfig(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",  # read environment variables from the .env file
        case_sensitive=True,  # variable names must match field names exactly, including case
        extra="forbid",  # reject fields that are not declared on the class
        env_file_encoding="utf-8",
    )

    database_host: HttpUrl
    database_user: str = Field(min_length=5)
    database_password: str = Field(min_length=10)
    api_key: str = Field(min_length=20)
```