pymllm.server.launch

pymllm HTTP server – RESTful API entry point.

This module implements a FastAPI-based HTTP server that wraps the pymllm Engine and exposes OpenAI-compatible and native REST endpoints.

Endpoints

  • GET  /health – liveness probe

  • GET  /v1/models – list served models (OpenAI-compatible)

  • POST /generate – native generate (streaming via SSE)

  • POST /v1/completions – OpenAI-compatible completions

  • POST /v1/chat/completions – OpenAI-compatible chat completions

  • GET  /model_info – model metadata

  • GET  /server_info – runtime config dump

  • POST /flush_cache – flush internal caches

  • POST /abort_request – cancel a running request

Attributes

Classes

GenerateRequest

Body for POST /generate.

ImageUrl

!!! abstract "Usage Documentation"

ContentPart

!!! abstract "Usage Documentation"

ChatMessage

!!! abstract "Usage Documentation"

StreamOptions

!!! abstract "Usage Documentation"

ToolFunction

!!! abstract "Usage Documentation"

Tool

!!! abstract "Usage Documentation"

ChatCompletionRequest

OpenAI POST /v1/chat/completions body.

CompletionRequest

OpenAI POST /v1/completions body.

AbortRequest

!!! abstract "Usage Documentation"

Functions

lifespan(app)

Startup / shutdown hooks for the FastAPI app.

http_exception_handler(request, exc)

health()

Liveness / readiness probe. Returns 503 if subprocesses died.

model_info()

Return basic model metadata.

server_info()

Dump runtime server configuration (sensitive fields redacted).

list_models()

OpenAI-compatible model listing.

retrieve_model(model_id)

OpenAI-compatible single model retrieval.

generate(obj, request)

Native generation endpoint. Supports SSE streaming.

openai_completions(obj, request)

OpenAI-compatible text completion endpoint.

openai_chat_completions(obj, request)

OpenAI-compatible chat completion endpoint with reasoning & tool-call parsing.

flush_cache()

Cache flush (not yet implemented).

abort_request(obj)

Abort a running request by rid.

launch_server()

Launch the pymllm Engine then start the uvicorn HTTP server.

main()

CLI entry point.

Module Contents

pymllm.server.launch.logger
class pymllm.server.launch.GenerateRequest(/, **data)

Bases: pydantic.BaseModel

Body for POST /generate.

Parameters:

data (Any)

text: List[str] | str | None = None
input_ids: List[List[int]] | List[int] | None = None
sampling_params: List[Dict[str, Any]] | Dict[str, Any] | None = None
image_data: Any | None = None
audio_data: Any | None = None
video_data: Any | None = None
return_logprob: List[bool] | bool | None = None
logprob_start_len: List[int] | int | None = None
top_logprobs_num: List[int] | int | None = None
lora_path: List[str | None] | str | None = None
session_params: List[Dict[str, Any]] | Dict[str, Any] | None = None
stream: bool = False
rid: List[str] | str | None = None
model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pymllm.server.launch.ImageUrl(/, **data)

Bases: pydantic.BaseModel

!!! abstract “Usage Documentation”

[Models](../concepts/models.md)

A base class for creating Pydantic models.

Parameters:

data (Any)

__class_vars__

The names of the class variables defined on the model.

__private_attributes__

Metadata about the private attributes of the model.

__signature__

The synthesized __init__ [Signature][inspect.Signature] of the model.

__pydantic_complete__

Whether model building is completed, or if there are still undefined fields.

__pydantic_core_schema__

The core schema of the model.

__pydantic_custom_init__

Whether the model has a custom __init__ function.

__pydantic_decorators__

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__

Metadata for generic models; contains data used for a similar purpose to __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__

The name of the post-init method for the model, if defined.

__pydantic_root_model__

Whether the model is a [RootModel][pydantic.root_model.RootModel].

__pydantic_serializer__

The pydantic-core SchemaSerializer used to dump instances of the model.

__pydantic_validator__

The pydantic-core SchemaValidator used to validate instances of the model.

__pydantic_fields__

A dictionary of field names and their corresponding [FieldInfo][pydantic.fields.FieldInfo] objects.

__pydantic_computed_fields__

A dictionary of computed field names and their corresponding [ComputedFieldInfo][pydantic.fields.ComputedFieldInfo] objects.

__pydantic_extra__

A dictionary containing extra values, if [extra][pydantic.config.ConfigDict.extra] is set to ‘allow’.

__pydantic_fields_set__

The names of fields explicitly set during instantiation.

__pydantic_private__

Values of private attributes set on the model instance.

url: str
detail: str | None = 'auto'
class pymllm.server.launch.ContentPart(/, **data)

Bases: pydantic.BaseModel

!!! abstract “Usage Documentation”

[Models](../concepts/models.md)

A base class for creating Pydantic models.

Parameters:

data (Any)

__class_vars__

The names of the class variables defined on the model.

__private_attributes__

Metadata about the private attributes of the model.

__signature__

The synthesized __init__ [Signature][inspect.Signature] of the model.

__pydantic_complete__

Whether model building is completed, or if there are still undefined fields.

__pydantic_core_schema__

The core schema of the model.

__pydantic_custom_init__

Whether the model has a custom __init__ function.

__pydantic_decorators__

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__

Metadata for generic models; contains data used for a similar purpose to __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__

The name of the post-init method for the model, if defined.

__pydantic_root_model__

Whether the model is a [RootModel][pydantic.root_model.RootModel].

__pydantic_serializer__

The pydantic-core SchemaSerializer used to dump instances of the model.

__pydantic_validator__

The pydantic-core SchemaValidator used to validate instances of the model.

__pydantic_fields__

A dictionary of field names and their corresponding [FieldInfo][pydantic.fields.FieldInfo] objects.

__pydantic_computed_fields__

A dictionary of computed field names and their corresponding [ComputedFieldInfo][pydantic.fields.ComputedFieldInfo] objects.

__pydantic_extra__

A dictionary containing extra values, if [extra][pydantic.config.ConfigDict.extra] is set to ‘allow’.

__pydantic_fields_set__

The names of fields explicitly set during instantiation.

__pydantic_private__

Values of private attributes set on the model instance.

type: str
text: str | None = None
image_url: ImageUrl | None = None
class pymllm.server.launch.ChatMessage(/, **data)

Bases: pydantic.BaseModel

!!! abstract “Usage Documentation”

[Models](../concepts/models.md)

A base class for creating Pydantic models.

Parameters:

data (Any)

__class_vars__

The names of the class variables defined on the model.

__private_attributes__

Metadata about the private attributes of the model.

__signature__

The synthesized __init__ [Signature][inspect.Signature] of the model.

__pydantic_complete__

Whether model building is completed, or if there are still undefined fields.

__pydantic_core_schema__

The core schema of the model.

__pydantic_custom_init__

Whether the model has a custom __init__ function.

__pydantic_decorators__

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__

Metadata for generic models; contains data used for a similar purpose to __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__

The name of the post-init method for the model, if defined.

__pydantic_root_model__

Whether the model is a [RootModel][pydantic.root_model.RootModel].

__pydantic_serializer__

The pydantic-core SchemaSerializer used to dump instances of the model.

__pydantic_validator__

The pydantic-core SchemaValidator used to validate instances of the model.

__pydantic_fields__

A dictionary of field names and their corresponding [FieldInfo][pydantic.fields.FieldInfo] objects.

__pydantic_computed_fields__

A dictionary of computed field names and their corresponding [ComputedFieldInfo][pydantic.fields.ComputedFieldInfo] objects.

__pydantic_extra__

A dictionary containing extra values, if [extra][pydantic.config.ConfigDict.extra] is set to ‘allow’.

__pydantic_fields_set__

The names of fields explicitly set during instantiation.

__pydantic_private__

Values of private attributes set on the model instance.

role: str
content: str | List[ContentPart] | None = None
name: str | None = None
tool_calls: List[Any] | None = None
tool_call_id: str | None = None
model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pymllm.server.launch.StreamOptions(/, **data)

Bases: pydantic.BaseModel

!!! abstract “Usage Documentation”

[Models](../concepts/models.md)

A base class for creating Pydantic models.

Parameters:

data (Any)

__class_vars__

The names of the class variables defined on the model.

__private_attributes__

Metadata about the private attributes of the model.

__signature__

The synthesized __init__ [Signature][inspect.Signature] of the model.

__pydantic_complete__

Whether model building is completed, or if there are still undefined fields.

__pydantic_core_schema__

The core schema of the model.

__pydantic_custom_init__

Whether the model has a custom __init__ function.

__pydantic_decorators__

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__

Metadata for generic models; contains data used for a similar purpose to __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__

The name of the post-init method for the model, if defined.

__pydantic_root_model__

Whether the model is a [RootModel][pydantic.root_model.RootModel].

__pydantic_serializer__

The pydantic-core SchemaSerializer used to dump instances of the model.

__pydantic_validator__

The pydantic-core SchemaValidator used to validate instances of the model.

__pydantic_fields__

A dictionary of field names and their corresponding [FieldInfo][pydantic.fields.FieldInfo] objects.

__pydantic_computed_fields__

A dictionary of computed field names and their corresponding [ComputedFieldInfo][pydantic.fields.ComputedFieldInfo] objects.

__pydantic_extra__

A dictionary containing extra values, if [extra][pydantic.config.ConfigDict.extra] is set to ‘allow’.

__pydantic_fields_set__

The names of fields explicitly set during instantiation.

__pydantic_private__

Values of private attributes set on the model instance.

include_usage: bool | None = False
continuous_usage_stats: bool | None = False
class pymllm.server.launch.ToolFunction(/, **data)

Bases: pydantic.BaseModel

!!! abstract “Usage Documentation”

[Models](../concepts/models.md)

A base class for creating Pydantic models.

Parameters:

data (Any)

__class_vars__

The names of the class variables defined on the model.

__private_attributes__

Metadata about the private attributes of the model.

__signature__

The synthesized __init__ [Signature][inspect.Signature] of the model.

__pydantic_complete__

Whether model building is completed, or if there are still undefined fields.

__pydantic_core_schema__

The core schema of the model.

__pydantic_custom_init__

Whether the model has a custom __init__ function.

__pydantic_decorators__

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__

Metadata for generic models; contains data used for a similar purpose to __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__

The name of the post-init method for the model, if defined.

__pydantic_root_model__

Whether the model is a [RootModel][pydantic.root_model.RootModel].

__pydantic_serializer__

The pydantic-core SchemaSerializer used to dump instances of the model.

__pydantic_validator__

The pydantic-core SchemaValidator used to validate instances of the model.

__pydantic_fields__

A dictionary of field names and their corresponding [FieldInfo][pydantic.fields.FieldInfo] objects.

__pydantic_computed_fields__

A dictionary of computed field names and their corresponding [ComputedFieldInfo][pydantic.fields.ComputedFieldInfo] objects.

__pydantic_extra__

A dictionary containing extra values, if [extra][pydantic.config.ConfigDict.extra] is set to ‘allow’.

__pydantic_fields_set__

The names of fields explicitly set during instantiation.

__pydantic_private__

Values of private attributes set on the model instance.

name: str
description: str | None = None
parameters: Dict[str, Any] | None = None
class pymllm.server.launch.Tool(/, **data)

Bases: pydantic.BaseModel

!!! abstract “Usage Documentation”

[Models](../concepts/models.md)

A base class for creating Pydantic models.

Parameters:

data (Any)

__class_vars__

The names of the class variables defined on the model.

__private_attributes__

Metadata about the private attributes of the model.

__signature__

The synthesized __init__ [Signature][inspect.Signature] of the model.

__pydantic_complete__

Whether model building is completed, or if there are still undefined fields.

__pydantic_core_schema__

The core schema of the model.

__pydantic_custom_init__

Whether the model has a custom __init__ function.

__pydantic_decorators__

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__

Metadata for generic models; contains data used for a similar purpose to __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__

The name of the post-init method for the model, if defined.

__pydantic_root_model__

Whether the model is a [RootModel][pydantic.root_model.RootModel].

__pydantic_serializer__

The pydantic-core SchemaSerializer used to dump instances of the model.

__pydantic_validator__

The pydantic-core SchemaValidator used to validate instances of the model.

__pydantic_fields__

A dictionary of field names and their corresponding [FieldInfo][pydantic.fields.FieldInfo] objects.

__pydantic_computed_fields__

A dictionary of computed field names and their corresponding [ComputedFieldInfo][pydantic.fields.ComputedFieldInfo] objects.

__pydantic_extra__

A dictionary containing extra values, if [extra][pydantic.config.ConfigDict.extra] is set to ‘allow’.

__pydantic_fields_set__

The names of fields explicitly set during instantiation.

__pydantic_private__

Values of private attributes set on the model instance.

type: str = 'function'
function: ToolFunction
class pymllm.server.launch.ChatCompletionRequest(/, **data)

Bases: pydantic.BaseModel

OpenAI POST /v1/chat/completions body.

Parameters:

data (Any)

model: str = ''
messages: List[ChatMessage]
temperature: float | None = None
top_p: float | None = None
top_k: int | None = None
max_tokens: int | None = None
max_completion_tokens: int | None = None
stream: bool = False
stream_options: StreamOptions | None = None
stop: str | List[str] | None = None
n: int = 1
frequency_penalty: float | None = None
presence_penalty: float | None = None
repetition_penalty: float | None = None
seed: int | None = None
logprobs: bool | None = None
top_logprobs: int | None = None
user: str | None = None
tools: List[Tool] | None = None
tool_choice: str | Dict[str, Any] | None = None
separate_reasoning: bool = True
stream_reasoning: bool = True
chat_template_kwargs: Dict[str, Any] | None = None
model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pymllm.server.launch.CompletionRequest(/, **data)

Bases: pydantic.BaseModel

OpenAI POST /v1/completions body.

Parameters:

data (Any)

model: str = ''
prompt: str | List[str]
temperature: float | None = None
top_p: float | None = None
top_k: int | None = None
max_tokens: int | None = None
stream: bool = False
stream_options: StreamOptions | None = None
stop: str | List[str] | None = None
n: int = 1
frequency_penalty: float | None = None
presence_penalty: float | None = None
repetition_penalty: float | None = None
seed: int | None = None
echo: bool = False
logprobs: int | None = None
user: str | None = None
model_config

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class pymllm.server.launch.AbortRequest(/, **data)

Bases: pydantic.BaseModel

!!! abstract “Usage Documentation”

[Models](../concepts/models.md)

A base class for creating Pydantic models.

Parameters:

data (Any)

__class_vars__

The names of the class variables defined on the model.

__private_attributes__

Metadata about the private attributes of the model.

__signature__

The synthesized __init__ [Signature][inspect.Signature] of the model.

__pydantic_complete__

Whether model building is completed, or if there are still undefined fields.

__pydantic_core_schema__

The core schema of the model.

__pydantic_custom_init__

Whether the model has a custom __init__ function.

__pydantic_decorators__

Metadata containing the decorators defined on the model. This replaces Model.__validators__ and Model.__root_validators__ from Pydantic V1.

__pydantic_generic_metadata__

Metadata for generic models; contains data used for a similar purpose to __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these.

__pydantic_parent_namespace__

Parent namespace of the model, used for automatic rebuilding of models.

__pydantic_post_init__

The name of the post-init method for the model, if defined.

__pydantic_root_model__

Whether the model is a [RootModel][pydantic.root_model.RootModel].

__pydantic_serializer__

The pydantic-core SchemaSerializer used to dump instances of the model.

__pydantic_validator__

The pydantic-core SchemaValidator used to validate instances of the model.

__pydantic_fields__

A dictionary of field names and their corresponding [FieldInfo][pydantic.fields.FieldInfo] objects.

__pydantic_computed_fields__

A dictionary of computed field names and their corresponding [ComputedFieldInfo][pydantic.fields.ComputedFieldInfo] objects.

__pydantic_extra__

A dictionary containing extra values, if [extra][pydantic.config.ConfigDict.extra] is set to ‘allow’.

__pydantic_fields_set__

The names of fields explicitly set during instantiation.

__pydantic_private__

Values of private attributes set on the model instance.

rid: str | None = None
async pymllm.server.launch.lifespan(app)

Startup / shutdown hooks for the FastAPI app.

Parameters:

app (fastapi.FastAPI)

pymllm.server.launch.app
async pymllm.server.launch.http_exception_handler(request, exc)
Parameters:
  • request (fastapi.Request)

  • exc (fastapi.HTTPException)

async pymllm.server.launch.health()

Liveness / readiness probe. Returns 503 if subprocesses died.

async pymllm.server.launch.model_info()

Return basic model metadata.

async pymllm.server.launch.server_info()

Dump runtime server configuration (sensitive fields redacted).

async pymllm.server.launch.list_models()

OpenAI-compatible model listing.

async pymllm.server.launch.retrieve_model(model_id)

OpenAI-compatible single model retrieval.

Parameters:

model_id (str)

async pymllm.server.launch.generate(obj, request)

Native generation endpoint. Supports SSE streaming.

Parameters:
async pymllm.server.launch.openai_completions(obj, request)

OpenAI-compatible text completion endpoint.

Parameters:
async pymllm.server.launch.openai_chat_completions(obj, request)

OpenAI-compatible chat completion endpoint with reasoning & tool-call parsing.

Parameters:
async pymllm.server.launch.flush_cache()

Cache flush (not yet implemented).

async pymllm.server.launch.abort_request(obj)

Abort a running request by rid.

Parameters:

obj (AbortRequest)

pymllm.server.launch.launch_server()

Launch the pymllm Engine then start the uvicorn HTTP server.

It first boots all engine subprocesses (tokenizer, scheduler, model-runner, detokenizer) and then hands off to uvicorn to serve HTTP traffic.

pymllm.server.launch.main()

CLI entry point.