Learn how to self-host LLMs like Llama 3 and Mistral locally using open-source tools, and how to run and customize uncensored models locally.
Self-hosting Meta Llama 3
Llama 3 is the latest and most advanced Large Language Model (LLM) developed by Meta. Meta has released the model weights openly, so individuals are free to set it up and use it. A fully offline, customizable AI model running on a laptop sounds great in theory, but the biggest constraint is hardware.
Running it smoothly takes a powerful GPU and plenty of VRAM. I tried it on my laptop (16 GB RAM, 9th-gen i7, GTX 1650 with 4 GB VRAM), but it ignored the GPU entirely and ran on the CPU alone, which was painfully slow, around 1-2 tokens/s.
Next, I tried it on my MacBook, and it was much better: around 10-13 tokens/s (thanks to Apple's unified memory). That gave me a usable AI model running locally.
Installation steps for self-hosting LLMs locally.
Installation and setup are pretty simple. You need two things: the model file itself, and software that loads and runs it while providing a GUI or API access. I tried a few tools: GPT4All, Jan.ai, and LM Studio, among others.
GPT4All is an open-source, privacy-oriented tool whose code is available on GitHub. To install it, go to https://gpt4all.io, click the download button for your platform, and run the installer. Opening the tool shows a GUI with various options. Go to Downloads and search for a model; the results list each model's requirements, such as file size and RAM. Pick one that matches your system specs. After downloading, simply click the chat icon and load the model. It might take a few seconds. That's all; enjoy.
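The "pick according to your system specs" step above boils down to comparing a model's RAM requirement against your machine. A minimal sketch of that check, where the model names and RAM figures are illustrative examples, not GPT4All's actual catalog:

```python
# Hypothetical helper: choose the most capable model that still fits in RAM.
# The catalog entries below are made-up examples for illustration.

def pick_model(models, ram_gb):
    """Return the name of the most demanding model that fits in ram_gb."""
    fitting = [m for m in models if m["ram_gb"] <= ram_gb]
    if not fitting:
        return None  # nothing fits; try a smaller quantization
    return max(fitting, key=lambda m: m["ram_gb"])["name"]

catalog = [
    {"name": "Llama 3 8B Instruct (Q4_0)", "ram_gb": 8},
    {"name": "Mistral 7B Instruct (Q4_0)", "ram_gb": 8},
    {"name": "Phi-3 Mini (Q4_0)", "ram_gb": 4},
]

print(pick_model(catalog, 16))  # 16 GB machine: an 8 GB model fits
print(pick_model(catalog, 4))   # 4 GB machine: falls back to the smallest
```

In practice the GUI does this comparison for you, but the same logic applies when you download .gguf files by hand.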
Jan.ai is another tool that can run LLMs locally, and it is also free and open source. The UI is a bit clunky, and downloading models directly is not as smooth as in GPT4All, but it has an extensions feature: you can enter the API key of a popular commercial provider like ChatGPT or Groq and use their models without downloading anything.
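Jan can also expose a locally running model through an OpenAI-compatible HTTP server. Assuming the server is enabled and listening on port 1337 (check Jan's settings for the actual port, and use the model id you loaded; both are assumptions here), a request from the standard library might look like:

```python
import json
import urllib.request

# Assumption: Jan's local API server is running at this address with an
# OpenAI-compatible /v1/chat/completions endpoint.
BASE_URL = "http://localhost:1337/v1/chat/completions"

def build_chat_request(model, prompt):
    """Build an OpenAI-style chat completion payload as JSON bytes."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return json.dumps(payload).encode("utf-8")

def ask(model, prompt):
    """Send the prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        BASE_URL,
        data=build_chat_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# ask("llama3-8b-instruct", "Say hello in one sentence")  # needs Jan running
```

Because the endpoint mimics OpenAI's API shape, any client library that lets you override the base URL can talk to the local model the same way.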
Self-hosting uncensored LLMs locally.
I also tried running some uncensored LLMs, and they worked fine in every tool I tried. Uncensored models are not bound by the safety and ethical limitations of regular models, so use them with caution. The picture below shows one such model running in Jan.ai.
LM Studio is another option, but keep in mind it is a closed-source application, even though it has a more feature-rich UI with one-click model downloads.
Whichever application you use, the underlying model file (.gguf) is the same and can be loaded in any of them.
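Because every tool reads the same GGUF format, a quick way to sanity-check a downloaded model file is to look at its header: every GGUF file starts with the 4-byte magic b"GGUF" followed by a little-endian version number. A minimal sketch (the file written here is a synthetic header, just so the demo runs without a multi-gigabyte model):

```python
import struct

def read_gguf_header(path):
    """Check the GGUF magic bytes and return the format version."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))  # little-endian uint32
    return version

# Demo: write a fake 8-byte header claiming GGUF version 3, then read it back.
with open("fake.gguf", "wb") as f:
    f.write(b"GGUF" + struct.pack("<I", 3))

print(read_gguf_header("fake.gguf"))  # → 3
```

Running this check on a real download catches truncated or mislabeled files before you spend time loading them into GPT4All, Jan, or LM Studio.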