Convert Vocabulary

Before converting, make sure Python 3.8 or later is installed, then install the required packages:

Terminal window
cd tools/convertor
pip install -r ./requirements.txt

To convert a tokenizer vocabulary to the mllm vocabulary format, run vocab.py with the input file, the output file, and the tokenizer type. Two tokenizer types are currently supported: Unigram and BPE.

Terminal window
python vocab.py --input_file=tokenizer.json --output_file=vocab.mllm --type=Unigram
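If you are unsure which value to pass to --type, the tokenizer file itself may tell you. The sketch below is not part of the mllm tools. It assumes tokenizer.json follows the Hugging Face tokenizers layout, where the top-level "model" object carries a "type" field such as "Unigram" or "BPE". The helper name detect_tokenizer_type is hypothetical, and the printed --type=BPE spelling is assumed by analogy with the Unigram command above.

```python
import json
import sys
from pathlib import Path
from typing import Optional


def detect_tokenizer_type(path: str) -> Optional[str]:
    """Return the model type recorded in a tokenizer.json file, or None.

    Assumes the Hugging Face tokenizers JSON layout:
    {"model": {"type": "BPE", ...}, ...}
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    model = data.get("model")
    if isinstance(model, dict):
        return model.get("type")
    return None


if __name__ == "__main__":
    # Default to the file name used in the command above.
    path = sys.argv[1] if len(sys.argv) > 1 else "tokenizer.json"
    kind = detect_tokenizer_type(path)
    if kind in ("Unigram", "BPE"):
        # Assumption: vocab.py accepts the type name spelled exactly like this.
        print(f"--type={kind}")
    else:
        print(f"Unsupported or unknown tokenizer type: {kind!r}")
```

Run it against your tokenizer file and, if it reports a supported type, pass that value to vocab.py. If the field is missing or names another tokenizer type, the file is probably not one of the two supported kinds.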