Index

_ | A | B | C | D | E | F | G | I | K | L | M | N | O | P | Q | R | S | T | U | V | W | Z

_
__add__() (pymllm.ffi.Tensor method)
__class_vars__ (pymllm.service.network.ChatCompletionRequest attribute)
__div__() (pymllm.ffi.Tensor method)
__enter__() (pymllm.utils.adb.ShellContext method)
__exit__() (pymllm.utils.adb.ShellContext method)
__mul__() (pymllm.ffi.Tensor method)
__neg__() (pymllm.ffi.Tensor method)
__private_attributes__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_complete__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_computed_fields__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_core_schema__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_custom_init__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_decorators__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_extra__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_fields__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_fields_set__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_generic_metadata__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_parent_namespace__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_post_init__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_private__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_root_model__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_serializer__ (pymllm.service.network.ChatCompletionRequest attribute)
__pydantic_validator__ (pymllm.service.network.ChatCompletionRequest attribute)
__signature__ (pymllm.service.network.ChatCompletionRequest attribute)
__str__() (pymllm.ffi.Tensor method)
__sub__() (pymllm.ffi.Tensor method)

A
abs() (pymllm.ffi.Tensor method)
adb_path
    (pymllm.utils.adb.ADBToolkit attribute)
    (pymllm.utils.adb.ShellContext attribute)
ADBToolkit (class in pymllm.utils.adb)
app (in module pymllm.service.network)
arange() (in module pymllm.ffi)
ARGeneration (C++ class)
ARGeneration::categoricalSample (C++ function)
ARGeneration::do_sample_ (C++ member)
ARGeneration::eos_token_id_ (C++ member)
ARGeneration::forward (C++ function)
ARGeneration::generate (C++ function)
ARGeneration::getLastLogits (C++ function)
ARGeneration::max_length_ (C++ member)
ARGeneration::sampleFromDistribution (C++ function)
ARGeneration::sampleGreedy (C++ function)
ARGeneration::sampleTemperature (C++ function)
ARGeneration::sampleTopK (C++ function)
ARGeneration::sampleTopP (C++ function)
ARGeneration::streamGenerate (C++ function)
ARGeneration::trace (C++ function)

B
bfloat16 (in module pymllm.ffi)
bfloat16_() (in module pymllm.ffi)
build_cast2fp32_pipeline() (in module pymllm.quantize.pipeline)
build_w4a32_kai_pipeline() (in module pymllm.quantize.pipeline)
BUILTIN_QUANTIZE_PASS (in module pymllm.quantize.pipeline)
BUILTIN_QUANTIZE_PIPELINE (in module pymllm.quantize.pipeline)

C
Cast2Fp32QuantizePass (class in pymllm.quantize.cast2fp32_pass)
CausalMask (C++ class)
CausalMask::CausalMask::CausalMask (C++ function), [1], [2]
ChatCompletionRequest (class in pymllm.service.network)
cli_app (in module pymllm.service.tools)
clip() (pymllm.ffi.Tensor method)
clone() (pymllm.ffi.Tensor method)
close() (pymllm.utils.adb.ShellContext method)
compile_test() (in module tilelang_compile_test)
contiguous() (pymllm.ffi.Tensor method)
Conv3D (C++ class)
Conv3D::Conv3D::bias (C++ function)
Conv3D::Conv3D::Conv3D (C++ function), [1], [2]
Conv3D::Conv3D::weight (C++ function)
cpu (in module pymllm.ffi)
cpu() (pymllm.ffi.Tensor method)
cpu_() (in module pymllm.ffi)
create_session() (in module pymllm.service.models_hub)
cuda (in module pymllm.ffi)
cuda() (pymllm.ffi.Tensor method)
cuda_() (in module pymllm.ffi)

D
Device (class in pymllm.ffi)
device (pymllm.ffi.Tensor property)
device() (in module pymllm.ffi)
device_id (pymllm.utils.adb.ShellContext attribute)
download_mllm_model() (in module pymllm.service.models_hub)
DType (class in pymllm.ffi)
dtype (pymllm.ffi.Tensor property)

E
echo() (in module pymllm.ffi)
elementwise_add() (in module tilelang_compile_test)
Embedding (C++ class)
Embedding::Embedding::Embedding (C++ function), [1], [2]
Embedding::Embedding::weight (C++ function)
empty() (in module pymllm.ffi)
enable_faulthandler() (in module pymllm.utils.error_handler)
enable_thinking (pymllm.service.network.ChatCompletionRequest attribute)
execute() (pymllm.utils.adb.ShellContext method)
execute_command() (pymllm.utils.adb.ADBToolkit method)

F
file_handler (pymllm.convertor.model_file_v2.ModelFileV2 attribute)
file_path (pymllm.convertor.model_file_v2.ModelFileV2 attribute)
finalize() (pymllm.convertor.model_file_v2.ModelFileV2 method)
float16 (in module pymllm.ffi)
float16_() (in module pymllm.ffi)
float32 (in module pymllm.ffi)
float32_() (in module pymllm.ffi)
from_numpy() (in module pymllm.ffi)
from_torch() (in module pymllm.ffi)

G
GELU (C++ class)
GELU::GELU::GELU (C++ function), [1]
get_device_info() (pymllm.utils.adb.ADBToolkit method)
get_devices() (pymllm.utils.adb.ADBToolkit method)
get_download_model_path() (in module pymllm.service.models_hub)
get_response() (in module pymllm.service.rr_process)
get_shell_context() (pymllm.utils.adb.ADBToolkit method)

I
id (pymllm.service.network.ChatCompletionRequest attribute)
initialize_context() (in module pymllm.ffi)
inputs_dict (pymllm.quantize.quantize_pass.QuantizePlanPayload attribute)
inputs_num (pymllm.quantize.quantize_pass.QuantizePlanPayload attribute)
insert_session() (in module pymllm.service.rr_process)
install_apk() (pymllm.utils.adb.ADBToolkit method)
is_alive() (pymllm.utils.adb.ShellContext method)
is_contiguous() (pymllm.ffi.Tensor method)
is_numpy_available() (in module pymllm.ffi)
is_torch_available() (in module pymllm.ffi)

K
KVCache (C++ class)
KVCache::KVCache::KVCache (C++ function), [1], [2]
KVCache::KVCache::setLayerIndex (C++ function)

L
Layer (C++ class)
Layer::__fmt_print (C++ function)
Layer::__main (C++ function)
Layer::impl (C++ function)
Layer::Layer (C++ function), [1]
Layer::opType (C++ function)
Layer::refOptions (C++ function)
Layer::to (C++ function)
LayerNorm (C++ class)
LayerNorm::LayerNorm::LayerNorm (C++ function), [1], [2]
lifespan() (in module pymllm.service.network)
Linear (C++ class)
Linear::Linear::bias (C++ function)
Linear::Linear::Linear (C++ function), [1], [2]
Linear::Linear::weight (C++ function)
load_model() (in module pymllm.convertor)

M
magic (pymllm.convertor.model_file_v2.ModelFileV2Descriptor attribute)
main()
    (in module pymllm.service.tools)
    (in module pymllm.utils.mllm_convertor)
match()
    (pymllm.quantize.cast2fp32_pass.Cast2Fp32QuantizePass method)
    (pymllm.quantize.kai.w4a32.W4A32KAIQuantizePass method)
    (pymllm.quantize.quantize_pass.QuantizeBasePass method)
matmul() (in module pymllm.nn.functional)
matmul_impl_blas (in module pymllm.nn.functional)
matmul_impl_default (in module pymllm.nn.functional)
matmul_impl_gguf (in module pymllm.nn.functional)
matmul_impl_mllmblas (in module pymllm.nn.functional)
max() (pymllm.ffi.Tensor method)
mean() (pymllm.ffi.Tensor method)
messages (pymllm.service.network.ChatCompletionRequest attribute)
min() (pymllm.ffi.Tensor method)
mllm::__mllm_exception_main (C++ function)
mllm::__setup_signal_handler (C++ function)
mllm::__signal_handler (C++ function)
mllm::async::fork (C++ function)
mllm::async::wait (C++ function), [1]
mllm::cleanThisThread (C++ function)
mllm::initializeContext (C++ function)
mllm::isOpenCLAvailable (C++ function)
mllm::isQnnAvailable (C++ function)
mllm::load (C++ function)
mllm::memoryReport (C++ function)
mllm::nn::functional::clip (C++ function)
mllm::nn::functional::concat (C++ function)
mllm::nn::functional::flashAttention2 (C++ function)
mllm::nn::functional::interpolate (C++ function), [1]
mllm::nn::functional::log (C++ function)
mllm::nn::functional::matmul (C++ function)
mllm::nn::functional::max (C++ function)
mllm::nn::functional::mean (C++ function)
mllm::nn::functional::min (C++ function)
mllm::nn::functional::pad (C++ function)
mllm::nn::functional::softmax (C++ function)
mllm::nn::functional::split (C++ function), [1], [2], [3]
mllm::nn::functional::sum (C++ function)
mllm::nn::functional::topk (C++ function)
mllm::nn::functional::view (C++ function)
mllm::perf::warmup (C++ function)
mllm::print (C++ function)
mllm::save (C++ function)
mllm::setMaximumNumThreads (C++ function)
mllm::setPrintMaxElementsPerDim (C++ function)
mllm::setPrintPrecision (C++ function)
mllm::setRandomSeed (C++ function)
mllm::shutdownContext (C++ function)
mllm::signal_description (C++ function)
mllm::test::allClose (C++ function)
mllm::test::AllCloseResult (C++ class)
mllm::test::AllCloseResult::mllm::test::AllCloseResult::is_close (C++ member)
mllm::test::AllCloseResult::mllm::test::AllCloseResult::max_absolute_diff (C++ member)
mllm::test::AllCloseResult::mllm::test::AllCloseResult::max_relative_diff (C++ member)
mllm::test::AllCloseResult::mllm::test::AllCloseResult::mismatched_elements (C++ member)
mllm::test::AllCloseResult::mllm::test::AllCloseResult::total_elements (C++ member)
mllm::thisThread (C++ function)
MLLM_FIND_NUMPY_AVAILABLE (in module pymllm.ffi)
MLLM_FIND_SAFETENSORS_AVAILABLE (in module pymllm.convertor)
MLLM_FIND_TORCH_AVAILABLE (in module pymllm.ffi)
MLLM_LAYER_ANY_INPUTS_1_OUTPUTS_FORWARD (C macro)
MLLM_LAYER_ANY_INPUTS_2_OUTPUTS_FORWARD (C macro)
MLLM_LAYER_ANY_INPUTS_3_OUTPUTS_FORWARD (C macro)
MLLM_MAIN (C macro)
MLLM_MODEL_FILE_V2_MAGIC_NUMBER (in module pymllm.convertor.model_file_v2)
MLLM_MODEL_FILE_V2_MODEL_NAME_LENGTH (in module pymllm.convertor.model_file_v2)
MLLM_MODEL_FILE_V2_PARAMS_NAME_LENGTH (in module pymllm.convertor.model_file_v2)
MLLM_MODEL_FILE_V2_TENSOR_SHAPE_LENGTH (in module pymllm.convertor.model_file_v2)
MLLM_MODEL_FILE_V2_VERSION (in module pymllm.convertor.model_file_v2)
MLLM_TYPE_MAPPING (in module pymllm.convertor.mllm_type_mapping)
model (pymllm.service.network.ChatCompletionRequest attribute)
MODEL_HUB_LOOKUP_TABLE (in module pymllm.service.models_hub)
model_name
    (pymllm.convertor.model_file_v2.ModelFileV2 attribute)
    (pymllm.convertor.model_file_v2.ModelFileV2Descriptor attribute)
MODEL_SESSION_CREATED (in module pymllm.service.network)
ModelFileV2 (class in pymllm.convertor.model_file_v2)
ModelFileV2Descriptor (class in pymllm.convertor.model_file_v2)
ModelFileV2ParamsDescriptor (class in pymllm.convertor.model_file_v2)
module
    pymllm
    pymllm.compile
    pymllm.compile.mlir
    pymllm.convertor
    pymllm.convertor.mllm_type_mapping
    pymllm.convertor.model_file_v1
    pymllm.convertor.model_file_v2
    pymllm.ffi
    pymllm.ffi.base
    pymllm.nn
    pymllm.nn.functional
    pymllm.quantize
    pymllm.quantize.cast2fp32_pass
    pymllm.quantize.gguf
    pymllm.quantize.kai
    pymllm.quantize.kai.w4a32
    pymllm.quantize.pipeline
    pymllm.quantize.quantize_pass
    pymllm.quantize.solver
    pymllm.quantize.spinquant
    pymllm.service
    pymllm.service.models_hub
    pymllm.service.network
    pymllm.service.rr_process
    pymllm.service.tools
    pymllm.utils
    pymllm.utils.adb
    pymllm.utils.error_handler
    pymllm.utils.mllm_convertor
    service
    test_tensor
    tilelang_compile_test
Module (C++ class)
Module::__fmt_print (C++ function)
Module::__main (C++ function)
Module::__send_graph_begin (C++ function)
Module::__send_graph_end (C++ function)
Module::__trace (C++ function)
Module::forward (C++ function)
Module::getBuffer (C++ function)
Module::getModuleName (C++ function)
Module::impl (C++ function)
Module::load (C++ function)
Module::Module (C++ function), [1], [2]
Module::operator() (C++ function)
Module::params (C++ function)
Module::reg (C++ function)
Module::registerBuffer (C++ function)
Module::to (C++ function)
MultimodalRoPE (C++ class)
MultimodalRoPE::MultimodalRoPE::MultimodalRoPE (C++ function), [1], [2]

N
name
    (pymllm.convertor.model_file_v2.ModelFileV2ParamsDescriptor attribute)
    (pymllm.ffi.Tensor property)
num_params (pymllm.convertor.model_file_v2.ModelFileV2Descriptor attribute)
numel() (pymllm.ffi.Tensor method)

O
ones() (in module pymllm.ffi)
outputs_dict (pymllm.quantize.quantize_pass.QuantizePlanPayload attribute)
outputs_num (pymllm.quantize.quantize_pass.QuantizePlanPayload attribute)

P
Param (C++ class)
Param::Param::Param (C++ function), [1], [2]
Param::Param::weight (C++ function)
param_id (pymllm.convertor.model_file_v2.ModelFileV2ParamsDescriptor attribute)
param_offset (pymllm.convertor.model_file_v2.ModelFileV2ParamsDescriptor attribute)
param_size (pymllm.convertor.model_file_v2.ModelFileV2ParamsDescriptor attribute)
param_type (pymllm.convertor.model_file_v2.ModelFileV2ParamsDescriptor attribute)
params_desc_offset (pymllm.convertor.model_file_v2.ModelFileV2Descriptor attribute)
passes (pymllm.quantize.solver.QuantizeSolver attribute)
permute() (pymllm.ffi.Tensor method)
predict() (in module pymllm.service.network)
prepare()
    (pymllm.quantize.cast2fp32_pass.Cast2Fp32QuantizePass method)
    (pymllm.quantize.kai.w4a32.W4A32KAIQuantizePass method)
    (pymllm.quantize.quantize_pass.QuantizeBasePass method)
pull_file() (pymllm.utils.adb.ADBToolkit method)
push_file() (pymllm.utils.adb.ADBToolkit method)
pymllm module
pymllm.compile module
pymllm.compile.mlir module
pymllm.convertor module
pymllm.convertor.mllm_type_mapping module
pymllm.convertor.model_file_v1 module
pymllm.convertor.model_file_v2 module
pymllm.ffi module
pymllm.ffi.base module
pymllm.nn module
pymllm.nn.functional module
pymllm.quantize module
pymllm.quantize.cast2fp32_pass module
pymllm.quantize.gguf module
pymllm.quantize.kai module
pymllm.quantize.kai.w4a32 module
pymllm.quantize.pipeline module
pymllm.quantize.quantize_pass module
pymllm.quantize.solver module
pymllm.quantize.spinquant module
pymllm.service module
pymllm.service.models_hub module
pymllm.service.network module
pymllm.service.rr_process module
pymllm.service.tools module
pymllm.utils module
pymllm.utils.adb module
pymllm.utils.error_handler module
pymllm.utils.mllm_convertor module

Q
qnn (in module pymllm.ffi)
qnn_() (in module pymllm.ffi)
QuantizeBasePass (class in pymllm.quantize.quantize_pass)
QuantizePlanPayload (class in pymllm.quantize.quantize_pass)
QuantizeSolver (class in pymllm.quantize.solver)
QuickGELU (C++ class)
QuickGELU::QuickGELU::QuickGELU (C++ function), [1]

R
random() (in module pymllm.ffi)
rank (pymllm.ffi.Tensor property)
reboot_device() (pymllm.utils.adb.ADBToolkit method)
record_screen() (pymllm.utils.adb.ADBToolkit method)
register_pass() (pymllm.quantize.solver.QuantizeSolver method)
ReLU (C++ class)
ReLU::ReLU::ReLU (C++ function), [1]
repeat() (pymllm.ffi.Tensor method)
RMSNorm (C++ class)
RMSNorm::RMSNorm::RMSNorm (C++ function), [1], [2]
RMSNorm::RMSNorm::weight (C++ function)
run()
    (pymllm.quantize.cast2fp32_pass.Cast2Fp32QuantizePass method)
    (pymllm.quantize.kai.w4a32.W4A32KAIQuantizePass method)
    (pymllm.quantize.quantize_pass.QuantizeBasePass method)

S
send_request() (in module pymllm.service.rr_process)
service module
Session (class in pymllm.ffi)
session_qwen3() (in module pymllm.service.models_hub)
set_name() (pymllm.ffi.Tensor method)
shape
    (pymllm.convertor.model_file_v2.ModelFileV2ParamsDescriptor attribute)
    (pymllm.ffi.Tensor property)
shape_len (pymllm.convertor.model_file_v2.ModelFileV2ParamsDescriptor attribute)
ShellContext (class in pymllm.utils.adb)
shutdown_context() (in module pymllm.ffi)
SiLU (C++ class)
SiLU::SiLU::SiLU (C++ function), [1]
SIZE
    (pymllm.convertor.model_file_v2.ModelFileV2Descriptor attribute)
    (pymllm.convertor.model_file_v2.ModelFileV2ParamsDescriptor attribute)
Softmax (C++ class)
Softmax::Softmax::Softmax (C++ function), [1], [2]
squeeze() (pymllm.ffi.Tensor method)
start_service() (in module pymllm.service.rr_process)
static_write() (pymllm.convertor.model_file_v2.ModelFileV2 method)
STFT (C++ class)
STFT::STFT::STFT (C++ function), [1], [2]
stop_service() (in module pymllm.service.rr_process)
stream (pymllm.service.network.ChatCompletionRequest attribute)
stream_quantize() (pymllm.quantize.solver.QuantizeSolver method)
stream_quantize_params_size() (pymllm.quantize.solver.QuantizeSolver method)
streaming_write() (pymllm.convertor.model_file_v2.ModelFileV2 method)
sum() (pymllm.ffi.Tensor method)

T
T (pymllm.ffi.Tensor property)
take_screenshot() (pymllm.utils.adb.ADBToolkit method)
Tensor (class in pymllm.ffi)
Tensor::abs (C++ function)
Tensor::alloc (C++ function)
Tensor::allocExtraTensorView (C++ function)
Tensor::arange (C++ function)
Tensor::at (C++ function)
Tensor::bytes (C++ function)
Tensor::clip (C++ function)
Tensor::clone (C++ function)
Tensor::coffsettedPtr (C++ function)
Tensor::constAt (C++ function)
Tensor::contiguous (C++ function)
Tensor::copy2 (C++ function)
Tensor::cptrAt (C++ function)
Tensor::cpu (C++ function)
Tensor::cuda (C++ function)
Tensor::delete_ (C++ function)
Tensor::device (C++ function)
Tensor::dtype (C++ function)
Tensor::empty (C++ function)
Tensor::fromVector (C++ function)
Tensor::getExtraTensorViewInTensor (C++ function)
Tensor::isContiguous (C++ function)
Tensor::isContiguousN (C++ function)
Tensor::isNil (C++ function)
Tensor::max (C++ function)
Tensor::mean (C++ function)
Tensor::memType (C++ function)
Tensor::min (C++ function)
Tensor::name (C++ function)
Tensor::nil (C++ function)
Tensor::numel (C++ function)
Tensor::offsettedPtr (C++ function)
Tensor::ones (C++ function)
Tensor::operator bool (C++ function)
Tensor::operator delete (C++ function)
Tensor::operator* (C++ function), [1]
Tensor::operator+ (C++ function), [1]
Tensor::operator- (C++ function), [1], [2]
Tensor::operator/ (C++ function), [1]
Tensor::operator[] (C++ function), [1]
Tensor::permute (C++ function)
Tensor::ptr (C++ function)
Tensor::ptrAt (C++ function)
Tensor::random (C++ function)
Tensor::repeat (C++ function)
Tensor::reshape (C++ function)
Tensor::setMemType (C++ function)
Tensor::setName (C++ function)
Tensor::shape (C++ function)
Tensor::stride (C++ function)
Tensor::sum (C++ function)
Tensor::T (C++ function)
Tensor::to (C++ function), [1]
Tensor::topk (C++ function)
Tensor::transpose (C++ function)
Tensor::unsqueeze (C++ function)
Tensor::uuid (C++ function)
Tensor::view (C++ function)
Tensor::zeros (C++ function)
test_empty_tensor_create() (in module test_tensor)
test_is_torch_available() (in module test_tensor)
test_tensor module
tilelang_compile_test module
to() (pymllm.ffi.Tensor method)
to_pod()
    (pymllm.ffi.Device method)
    (pymllm.ffi.DType method)
tobytes() (pymllm.ffi.Tensor method)
transpose() (pymllm.ffi.Tensor method)

U
uninstall_app() (pymllm.utils.adb.ADBToolkit method)
unsqueeze() (pymllm.ffi.Tensor method)
update_mode (pymllm.convertor.model_file_v2.ModelFileV2 attribute)

V
v2_file_header (pymllm.convertor.model_file_v2.ModelFileV2 attribute)
v2_param_descriptor (pymllm.convertor.model_file_v2.ModelFileV2 attribute)
version (pymllm.convertor.model_file_v2.ModelFileV2Descriptor attribute)
view() (pymllm.ffi.Tensor method)
VisionRoPE (C++ class)
VisionRoPE::VisionRoPE::VisionRoPE (C++ function), [1], [2]

W
W4A32KAIQuantizePass (class in pymllm.quantize.kai.w4a32)

Z
zeros() (in module pymllm.ffi)