Convert Vocabulary
Before converting, make sure Python (3.8+) is installed, then install the required packages:
cd tools/convertor
pip install -r ./requirements.txt
To convert a vocabulary to the mllm vocabulary format, run vocab.py with the --type option set to match your tokenizer. Two tokenizer types are currently supported: Unigram and BPE.
python vocab.py --input_file=tokenizer.json --output_file=vocab.mllm --type=Unigram
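If you are not sure which value to pass to --type, one option is to inspect the input file first. The sketch below is not part of the mllm tools. It assumes tokenizer.json follows the Hugging Face tokenizers layout, where the tokenizer kind is typically stored under model.type. The helper name detect_tokenizer_type is hypothetical, and some files may not carry this field at all, in which case the function returns None.

```python
import json


def detect_tokenizer_type(path):
    """Return the "type" string stored under "model" in a tokenizer.json-style
    file, or None if the field is missing.

    Assumption: the file uses the Hugging Face tokenizers JSON layout.
    This helper is illustrative and is not shipped with vocab.py.
    """
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)

    model = data.get("model")
    if isinstance(model, dict):
        return model.get("type")
    return None
```

Calling detect_tokenizer_type("tokenizer.json") would return a string such as "Unigram" or "BPE" when the field is present. If the result is one of the two supported types, it can be passed as the --type value; any other result means the file likely needs a closer look before conversion.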