modeling_llama¶
Classes¶
Module Contents¶
- class modeling_llama.LlamaPreTrainedModel¶
Bases:
transformers.modeling_utils.PreTrainedModel
- config: transformers.models.llama.configuration_llama.LlamaConfig¶
- base_model_prefix = 'model'¶
- supports_gradient_checkpointing = True¶
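Because LlamaPreTrainedModel derives from transformers.PreTrainedModel, its subclasses get the usual PreTrainedModel machinery, including gradient checkpointing. A minimal sketch, assuming this module is importable as `modeling_llama` and using an illustrative tiny config:
```python
from transformers import LlamaConfig
from modeling_llama import LlamaModel  # any LlamaPreTrainedModel subclass works the same way

# Illustrative config values; a real deployment would load a full checkpoint.
config = LlamaConfig(hidden_size=256, num_hidden_layers=2, num_attention_heads=4,
                     intermediate_size=512, vocab_size=1000)
model = LlamaModel(config)               # the config is stored on model.config
model.gradient_checkpointing_enable()    # valid because supports_gradient_checkpointing = True
```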
- class modeling_llama.LlamaModel(config)¶
Bases:
LlamaPreTrainedModel
- Parameters:
config (transformers.models.llama.configuration_llama.LlamaConfig)
- padding_idx¶
- vocab_size¶
- embed_tokens¶
- layers¶
- norm¶
- rotary_emb¶
- gradient_checkpointing = False¶
- sin_embedding_input_qdq¶
- cos_embedding_input_qdq¶
- norm_input_qdq¶
- convert_rope_for_deploy()¶
- forward(input_ids=None, attention_mask=None, position_ids=None, past_key_values=None, inputs_embeds=None, cache_position=None, use_cache=None, **kwargs)¶
- Parameters:
input_ids (Optional[torch.LongTensor])
attention_mask (Optional[torch.Tensor])
position_ids (Optional[torch.LongTensor])
past_key_values (Optional[transformers.cache_utils.Cache])
inputs_embeds (Optional[torch.FloatTensor])
cache_position (Optional[torch.LongTensor])
use_cache (Optional[bool])
kwargs (transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs])
- Return type:
transformers.modeling_outputs.BaseModelOutputWithPast
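A short sketch of a forward pass through LlamaModel, assuming the module is importable as `modeling_llama`; the config values are illustrative only:
```python
import torch
from transformers import LlamaConfig
from modeling_llama import LlamaModel

config = LlamaConfig(hidden_size=64, num_hidden_layers=2, num_attention_heads=4,
                     intermediate_size=128, vocab_size=1000)
model = LlamaModel(config).eval()

input_ids = torch.randint(0, config.vocab_size, (1, 8))
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    # Returns a BaseModelOutputWithPast; last_hidden_state has shape (batch, seq_len, hidden_size).
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, use_cache=True)

print(outputs.last_hidden_state.shape)   # torch.Size([1, 8, 64])
```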
- class modeling_llama.LlamaForCausalLM(config)¶
Bases:
LlamaPreTrainedModel, transformers.generation.GenerationMixin
- model¶
- vocab_size¶
- lm_head¶
- mllm_qualcomm_max_length = None¶
- lm_head_input_qdq¶
- lm_head_output_qdq¶
- copy_lm_head_weight_from_embed_tokens()¶
- forward(input_ids=None, attention_mask=None, position_ids=None, past_key_values=None, inputs_embeds=None, labels=None, use_cache=None, cache_position=None, logits_to_keep=0, **kwargs)¶
Example:
```python
>>> from transformers import AutoTokenizer, LlamaForCausalLM

>>> model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
>>> tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

>>> prompt = "Hey, are you conscious? Can you talk to me?"
>>> inputs = tokenizer(prompt, return_tensors="pt")

>>> # Generate
>>> generate_ids = model.generate(inputs.input_ids, max_length=30)
>>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
"Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
```
- Parameters:
input_ids (Optional[torch.LongTensor])
attention_mask (Optional[torch.Tensor])
position_ids (Optional[torch.LongTensor])
past_key_values (Optional[transformers.cache_utils.Cache])
inputs_embeds (Optional[torch.FloatTensor])
labels (Optional[torch.LongTensor])
use_cache (Optional[bool])
cache_position (Optional[torch.LongTensor])
logits_to_keep (Union[int, torch.Tensor])
kwargs (transformers.processing_utils.Unpack[transformers.utils.TransformersKwargs])
- Return type:
transformers.modeling_outputs.CausalLMOutputWithPast
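In addition to generation, forward accepts labels for computing a language-modeling loss; the returned CausalLMOutputWithPast then carries both .loss and .logits. A minimal sketch, reusing the checkpoint name from the example above and assuming the module is importable as `modeling_llama`:
```python
from transformers import AutoTokenizer
from modeling_llama import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Hey, are you conscious?", return_tensors="pt")
# Passing labels makes forward compute the shifted next-token cross-entropy loss.
outputs = model(**inputs, labels=inputs["input_ids"])

print(outputs.loss)          # scalar LM loss
print(outputs.logits.shape)  # (batch, sequence_length, vocab_size)
```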
- class modeling_llama.LlamaForSequenceClassification¶
Bases:
transformers.modeling_layers.GenericForSequenceClassification, LlamaPreTrainedModel
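A sketch of the sequence-classification head, assuming the class is constructed from a LlamaConfig like the other classes here; num_labels and the other config values are illustrative:
```python
import torch
from transformers import LlamaConfig
from modeling_llama import LlamaForSequenceClassification

config = LlamaConfig(hidden_size=64, num_hidden_layers=2, num_attention_heads=4,
                     intermediate_size=128, vocab_size=1000,
                     num_labels=3, pad_token_id=0)
model = LlamaForSequenceClassification(config).eval()

input_ids = torch.randint(1, config.vocab_size, (1, 8))
with torch.no_grad():
    logits = model(input_ids=input_ids).logits

print(logits.shape)   # torch.Size([1, 3]): one logit vector per sequence
```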
- class modeling_llama.LlamaForQuestionAnswering¶
Bases:
transformers.modeling_layers.GenericForQuestionAnswering, LlamaPreTrainedModel
- base_model_prefix = 'transformer'¶
- class modeling_llama.LlamaForTokenClassification¶
Bases:
transformers.modeling_layers.GenericForTokenClassification, LlamaPreTrainedModel
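A sketch of the token-classification head, under the same assumptions as the sequence-classification example above (importable `modeling_llama` module, illustrative config values); it produces one logit vector per input token:
```python
import torch
from transformers import LlamaConfig
from modeling_llama import LlamaForTokenClassification

config = LlamaConfig(hidden_size=64, num_hidden_layers=2, num_attention_heads=4,
                     intermediate_size=128, vocab_size=1000, num_labels=5)
model = LlamaForTokenClassification(config).eval()

input_ids = torch.randint(0, config.vocab_size, (1, 8))
with torch.no_grad():
    logits = model(input_ids=input_ids).logits

print(logits.shape)   # torch.Size([1, 8, 5]): one logit vector per token
```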