Local Ollama models
How to install and run open-source LLMs locally using Ollama, and integrate it with the VSCode editor for assisted code completion.
Ollama is a tool that lets you run open-source large language models (LLMs) locally on your own machine, giving you the flexibility to work with different models.
You can view a list of supported models here.
Running Ollama⌗
For me, running Ollama locally is as simple as executing the following in a terminal:
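As a minimal sketch, assuming the official Linux install script (macOS users can grab the desktop installer instead):

```bash
# Download and install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Start the Ollama server in the foreground
ollama serve
```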
This will download Ollama and start the server. If all is well at this point, you should see in the terminal output that it is `Listening on 127.0.0.1:11434`. If you open that URL in a browser, you should see `Ollama is running`.
To run a specific model, browse to the Ollama models library and pick one that suits your needs. For example:
Llama 2 is released by Meta Platforms, Inc. This model is trained on 2 trillion tokens, and by default supports a context length of 4096. Llama 2 Chat models are fine-tuned on over 1 million human annotations, and are made for chat.
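For instance, to pull and run Llama 2 (a sketch assuming the default `llama2` tag from the library):

```bash
# Download the model (if not already present) and start an interactive chat session
ollama run llama2
```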
Interaction⌗
After running the model as shown above, once the download finishes it will drop you into a prompt where you can start chatting:
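A session might look roughly like this (output abridged; wording will vary by model):

```bash
>>> Why is the sky blue?
The sky appears blue because shorter (blue) wavelengths of sunlight are
scattered by the atmosphere more strongly than longer (red) wavelengths...

>>> /bye
```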
There is also an API available that you can send requests to:
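For example, a completion request against the local server (a sketch assuming the `llama2` model pulled above and the default port):

```bash
# Ask the generate endpoint for a completion; stream=false returns a single JSON response
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```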
You can view the API docs here.
To install a different model, repeat the run command above with a different model name:
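For example, assuming you wanted to try Mistral (any other name from the models library works the same way):

```bash
# Pull and run a different model from the Ollama library
ollama run mistral
```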
To see a list of installed models:
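Use the `list` subcommand:

```bash
# List locally installed models with their tags, sizes and modification times
ollama list
```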
Integration with VSCode⌗
Install and configure the llama-coder extension from the VSCode marketplace.
Llama Coder is a better and self-hosted GitHub Copilot replacement for VS Code. Llama Coder uses Ollama and codellama to provide autocomplete that runs on your hardware.
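Since the extension expects a code-completion model served by Ollama, it can help to pull one ahead of time. A sketch; the exact model and tag the extension defaults to may differ:

```bash
# Pull a Code Llama variant tuned for code completion so the extension can use it
ollama pull codellama:7b-code
```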
I’ll do a follow-up post with my findings later, after I’ve had more time to use it and compare other models.