PriorLabs TabPFN

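TabPFN Quick Start: Interactive Notebook Tutorial 💡

Get started right away with our interactive Colab notebook! It is the best way to get a feel for TabPFN, and it walks you through installation, classification, and regression examples.

⚡ GPU recommended: For best performance, use a GPU (even an older one with about 8 GB of VRAM works well; some large datasets need 16 GB). On CPU, only small datasets (≲1000 samples) are feasible. No GPU? Use our free hosted inference through TabPFN Client.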
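
The hardware guidance above can be read as a small decision rule. The sketch below is illustrative only: `choose_backend`, its return labels, and the `cpu_sample_limit` default are hypothetical names introduced here, not part of the TabPFN API, and the 1000-sample cutoff is simply the rough figure quoted in the text.

```python
def choose_backend(n_samples: int, has_gpu: bool, cpu_sample_limit: int = 1000) -> str:
    """Suggest how to run TabPFN, following the GPU guidance in the text.

    cpu_sample_limit is an illustrative cutoff for "small" datasets
    (the text says roughly 1000 samples); it is not a library constant.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    if has_gpu:
        # A GPU is the recommended setup for local inference.
        return "local-gpu"
    if n_samples <= cpu_sample_limit:
        # Without a GPU, local CPU inference is only practical for small datasets.
        return "local-cpu"
    # Otherwise fall back to hosted inference via TabPFN Client.
    return "tabpfn-client"


print(choose_backend(n_samples=500, has_gpu=False))
print(choose_backend(n_samples=20_000, has_gpu=False))
```

In practice, `has_gpu` could come from a check such as `torch.cuda.is_available()` when PyTorch is installed.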

Installation

Official installation (pip)

pip install tabpfn

Or install from source

pip install "tabpfn @ git+https://github.com/PriorLabs/TabPFN.git"

Or install locally for development: first install uv (version 0.10.0 or later is recommended), which we use for development, then run:

git clone https://github.com/PriorLabs/TabPFN.git --depth 1
cd TabPFN
uv sync

Basic Usage

Use our default TabPFN-2.6 model, which was trained entirely on synthetic data:

from tabpfn import TabPFNClassifier, TabPFNRegressor

clf = TabPFNClassifier()
clf.fit(X_train, y_train)  # downloads the checkpoint on first use
predictions = clf.predict(X_test)

reg = TabPFNRegressor()
reg.fit(X_train, y_train)  # downloads the checkpoint on first use
predictions = reg.predict(X_test)

To use a different model version (for example, TabPFN-2.5):

from tabpfn import TabPFNClassifier, TabPFNRegressor
from tabpfn.constants import ModelVersion

classifier = TabPFNClassifier.create_default_for_version(ModelVersion.V2_5)
regressor = TabPFNRegressor.create_default_for_version(ModelVersion.V2_5)

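For complete examples, see tabpfn_for_binary_classification.py, tabpfn_for_multiclass_classification.py, and tabpfn_for_regression.py.

Usage Tips

Use batch prediction mode: every predict call recomputes the training set. Calling predict separately on each of 100 samples is almost 100 times slower and more expensive than a single call. If the test set is very large, split it into chunks of 1000 samples each.
Avoid data preprocessing: do not apply data scaling or one-hot encoding to the data you feed the model.
Use a GPU: TabPFN runs slowly on CPU. Make sure a GPU is available for better performance.
Mind the dataset size: TabPFN works best on datasets with fewer than 100,000 samples and fewer than 2000 features. For larger datasets, see the Large Datasets Guide.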
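
The batching tip can be sketched as a small helper that calls predict once per 1000-row chunk instead of once per sample. This is a minimal sketch, not part of the TabPFN API: `predict_in_chunks` and the `CountingModel` stand-in are hypothetical names, and the helper assumes the test set supports `len()` and row slicing (as NumPy arrays and lists of rows do) and that `predict` returns one prediction per row.

```python
from typing import Any, List


def predict_in_chunks(model: Any, X_test: Any, chunk_size: int = 1000) -> List[Any]:
    """Call model.predict once per chunk of rows and join the results.

    Assumes X_test supports len() and row slicing (X_test[a:b]) and that
    model.predict returns one prediction per input row.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    predictions: List[Any] = []
    for start in range(0, len(X_test), chunk_size):
        chunk = X_test[start:start + chunk_size]
        predictions.extend(model.predict(chunk))
    return predictions


class CountingModel:
    """Stand-in for a fitted estimator; records how often predict is called."""

    def __init__(self) -> None:
        self.calls = 0

    def predict(self, rows: Any) -> List[Any]:
        self.calls += 1
        return [sum(row) for row in rows]


demo_model = CountingModel()
demo_rows = [[i, 1] for i in range(2500)]
demo_predictions = predict_in_chunks(demo_model, demo_rows, chunk_size=1000)
# 2500 rows in chunks of 1000 means 3 predict calls rather than 2500.
print(len(demo_predictions), demo_model.calls)
```

With a fitted TabPFNClassifier or TabPFNRegressor in place of the stand-in, the same helper would return a plain Python list; converting it back to an array is left to the caller.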

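TabPFN Ecosystem

Choose the TabPFN implementation that fits your needs:

TabPFN Client: a simple API client for using TabPFN through cloud-based inference.
TabPFN Extensions: a powerful companion repository with advanced utilities, integrations, and features, and a great place to contribute:
interpretability: gain insights with SHAP-based explanations, feature importance, and feature selection tools.
unsupervised: tools for outlier detection and synthetic tabular data generation.
embeddings: extract and use TabPFN's internally learned embeddings for downstream tasks or analysis.
many_class: handle multi-class classification problems that exceed TabPFN's built-in class limit.
rf_pfn: combine TabPFN with traditional models such as random forests for hybrid approaches.
hpo: automated hyperparameter optimization tailored to TabPFN.
post_hoc_ensembles: boost performance by ensembling multiple TabPFN models after training.
To install: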

    git clone https://github.com/priorlabs/tabpfn-extensions.git
    pip install -e tabpfn-extensions

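TabPFN (this repo): the core implementation for fast local inference, with PyTorch and CUDA support.
TabPFN UX: a no-code graphical interface for exploring TabPFN's capabilities, well suited to business users and prototyping.

TabPFN Workflow at a Glance

Follow the decision tree below to build your model and choose the right extensions from our ecosystem. It walks you through the key questions about your data, hardware, and performance requirements to find the best solution for your use case.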

---
config:
  theme: 'default'
  themeVariables:
    edgeLabelBackground: 'white'
---
graph LR
    %% 1. Define color scheme and styles
    classDef default fill:#fff,stroke:#333,stroke-width:2px,color:#333;
    classDef start_node fill:#e8f5e9,stroke:#43a047,stroke-width:2px,color:#333;
    classDef process_node fill:#e0f2f1,stroke:#00796b,stroke-width:2px,color:#333;
    classDef decision_node fill:#fff8e1,stroke:#ffa000,stroke-width:2px,color:#333;
    style Infrastructure fill:#fff,stroke:#ccc,stroke-width:5px;
    style Unsupervised fill:#fff,stroke:#ccc,stroke-width:5px;
    style Data fill:#fff,stroke:#ccc,stroke-width:5px;
    style Performance fill:#fff,stroke:#ccc,stroke-width:5px;
    style Interpretability fill:#fff,stroke:#ccc,stroke-width:5px;

    %% 2. Define the graph structure
    subgraph Infrastructure
        start((Start)) --> gpu_check["GPU available?"];
        gpu_check -- Yes --> local_version["Use TabPFN
(local PyTorch)"];
        gpu_check -- No --> api_client["Use TabPFN-Client
(cloud API)"];
        task_type["What is
your task?"]
    end
    local_version --> task_type
    api_client --> task_type
    end_node((Workflow
Complete));

    subgraph Unsupervised
        unsupervised_type["Select
Unsupervised Task"];
        unsupervised_type --> imputation["Imputation"]
        unsupervised_type --> data_gen["Data
Generation"];
        unsupervised_type --> tabebm["Data
Augmentation"];
        unsupervised_type --> density["Outlier
Detection"];
        unsupervised_type --> embedding["Get
Embeddings"];
    end

    subgraph Data
        data_check["Data Checks"];
        model_choice["Samples > 50k or
Classes > 10?"];
        data_check -- "Table Contains Text Data?" --> api_backend_note["Note: API client has
native text support"];
        api_backend_note --> model_choice;
        data_check -- "Time-Series Data?" --> ts_features["Use Time-Series
Features"];
        ts_features --> model_choice;
        data_check -- "Purely Tabular" --> model_choice;
        model_choice -- "No" --> finetune_check;
        model_choice -- "Yes, 50k-100k samples" --> ignore_limits["Set
ignore_pretraining_limits=True"];
        model_choice -- "Yes, >100k samples" --> subsample["Large Datasets Guide
"];
        model_choice -- "Yes, >10 classes" --> many_class["Many-Class
Method"];
    end