Private, Free, and Powerful: A Guide to Local AI
Step-by-step walkthroughs of common tasks with small language models
Small language models are powerful, free, and private. They can also be a nightmare to use—figuring out how to do any one task with a small LM means clearing several hurdles: downloading esoteric pieces of software, finding an LM that runs on your computer, and navigating the confluence of what’s possible with the tools, what’s possible with the LM, and what you’re actually trying to accomplish.
Underneath all this cruft is the tantalizing promise of local-first AI—tools that accomplish meaningful work without surrendering data or money to a large tech company. This is the driving philosophy behind Tiny Tools, a framework for software and AI control in newsrooms that I developed with Mandi Cai and Jeremy Gilbert. I’ve also experimented with leveraging local LMs over the past year, from SQL-based data analysis to deep research over document sets.
This work consistently shows that small language models can do productive journalism tasks, such as searching for documents, annotating data, or extracting information from images. This post is an attempt to complement that work with a practical guide, offering step-by-step instructions on how to set up and run small language models locally for a range of useful tasks. While the examples throughout draw from newsroom use cases, these techniques apply to anyone who values privacy, autonomy, and control over their AI tools.
How this guide works: We’ll start simple and gradually increase in complexity. First, we’ll set up basic chat with a local model (easy, 5 minutes). Then we’ll add document search and web capabilities (moderate, 20 minutes). Next, we’ll explore image and audio processing (uses the same tools). Finally, we’ll build automated workflows (moderate-advanced, 30+ minutes). You can stop at any point when you’ve achieved what you need—each section builds on the previous one, but you don’t have to complete them all.
Chat
To chat with a local language model, we’ll use a free, open source tool called Ollama.
Quick Setup:
Download: ollama.com
Installation time: ~5 minutes
Technical difficulty: Easy (standard app installation)
After downloading Ollama, we can open the application to access its UI. To the right of the text box, there’s a toggle that lets us select which model we want to interact with (and download models as needed). Gemma3:4b, Google’s small language model, is a good starting point. Once the model is installed, we can chat with it just like we would with ChatGPT.
Understanding model names: The “4b” in Gemma3:4b refers to 4 billion parameters—essentially the model’s size. Smaller models (1b-7b parameters) run faster and use less memory but are less capable. Larger models (8b-20b+ parameters) are smarter but slower and require more RAM.
How to choose a model: Browse available models in Ollama’s interface or at ollama.com/library. Start with smaller models and only move up if you need better quality.
If a model fails at your task: Try these steps: (1) Switch to a larger model—an 8b model might succeed where a 4b model struggles. (2) Improve your prompt—be more specific about what you want, provide more examples, be clear about the desired output format. (3) Try a different model family—some models excel at certain tasks. You can test multiple models quickly by switching between them in the UI.
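If you’re comfortable with a bit of scripting, there’s another quick way to compare models: Ollama also exposes a local HTTP API (by default at http://localhost:11434) that any script can call. Below is a minimal Python sketch that runs the same prompt against several models; the model names and the prompt are just examples, so swap in whatever you’ve actually downloaded.

```python
# pip install requests
# Assumes Ollama is running locally and each listed model has already been pulled.
import requests

MODELS = ["gemma3:4b", "llama3.2:3b", "gpt-oss:20b"]  # example names; use what you have
PROMPT = "Suggest three angles for a story about municipal broadband."

for model in MODELS:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    print(f"--- {model} ---")
    print(resp.json()["response"].strip(), "\n")
```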
For newsroom use, this basic chat functionality is useful for quick tasks like brainstorming story angles, drafting social media posts, or getting initial background on unfamiliar topics—all without sending sensitive story information to external servers.
Should you continue? If basic chat with a local model meets your needs, you can stop here. You now have a private, free alternative to ChatGPT. Continue to the next section if you want to work with your own documents, search the web, or reduce hallucinations by grounding the model in real information.
Using documents and web search
One of the weaknesses of small language models is that they tend to hallucinate more often. An effective way to counteract that tendency is by providing the model directly with trustworthy information in the form of documents and web searches, rather than depending on the information it learned during training.
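To make that idea concrete, here is what “providing the model with trustworthy information” looks like at its simplest: paste the document’s text into the prompt and instruct the model to answer only from it. The sketch below assumes Ollama is running locally; the file path, model name, and question are placeholders. The tools in this section do a smarter version of this for you, including retrieval over large collections.

```python
# pip install requests
# Bare-bones grounding: put the document itself in the prompt
# instead of relying on what the model memorized during training.
import requests

doc_text = open("meeting_minutes.txt", encoding="utf-8").read()  # placeholder path
question = "What budget changes were approved, and by what vote?"

prompt = (
    "Answer the question using ONLY the document below. "
    "If the answer is not in the document, say so.\n\n"
    f"DOCUMENT:\n{doc_text}\n\nQUESTION: {question}"
)

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma3:4b", "prompt": prompt, "stream": False},
    timeout=300,
)
print(resp.json()["response"])
```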
We can do both of these things with Ollama (documentation here and here), but there’s a more powerful tool we can use called Open WebUI. Think of this as an extension on top of Ollama, a program that uses the same underlying models but adds a ton of extra functionality.
Quick Setup:
Prerequisites: Docker Desktop - docker.com/products/docker-desktop (required before installing Open WebUI)
Download: docs.openwebui.com (installation instructions)
Installation time: ~15-20 minutes (including Docker setup)
Technical difficulty: Moderate (requires terminal commands and Docker)
The installation process for Open WebUI is more involved—you’ll need to install Docker Desktop first (see Quick Setup above). After Docker is installed and running, you can follow the terminal commands in the Open WebUI installation guide to install and launch Open WebUI, which will then be available in your browser at http://localhost:3000 (or whichever port you chose during setup).
Now we can start pulling in all sorts of external knowledge to our chatbot, depending on what we want to incorporate:
To ask questions about a single document, we can drag the file into the chat window and make a query.
To query multiple documents, we can first upload everything we want to use into a collection by going to Workspace > Knowledge > New Knowledge. Then, we can incorporate this collection into a query by referencing it with the # shortcut—for example, here I’m querying OpenAI’s gpt-oss:20b with a set of articles I’ve collected about the environmental impact of AI:
To ask questions about a specific webpage, we can attach it to our query using the + button > Attach Webpage, then dropping in a URL. Open WebUI will attempt to scrape the text of that webpage and include it in the model’s context.
To enable web search, we first need to click on the user badge in the top right > Admin Panel > Settings > Web Search, and toggle on the feature. Then, select your web search engine. Each engine has different requirements and functionality; I signed up for a free account on Tavily and copied its API key (a unique password that lets Open WebUI access the search service) into Open WebUI. After all this setup, the language model can finally search the web! Click the Integrations button in the chat window (the four circles under the text box) and toggle on Web Search. This will let the model retrieve sources from the internet and cite them in its responses:
Should you continue? This is a good stopping point if you primarily work with text documents and web sources. You now have a powerful local research assistant. Continue to the next section if you need to process images (like scanned documents or photos) or transcribe audio files. The good news: no additional software setup required, just different models.
Working with images and audio
Note: This section uses the same tools (Ollama and Open WebUI) from the previous sections, so no additional setup is required.
For journalists, working with images could mean extracting text from scanned documents, analyzing photos for story details, or getting descriptions of charts and infographics. Luckily, the tools we’ve explored so far can handle image inputs—you can drop an image file directly into Ollama or Open WebUI. All we need to do is select a model that can understand images. Some good options include Gemma 3 and Mistral Small 3.2. Then, we can load in our model and ask it questions about an image:
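Behind the drag-and-drop, vision-capable models accept images alongside the prompt, and Ollama’s local API exposes this directly: images are sent as base64-encoded strings. A rough sketch, where the file path is a placeholder and the model must be one that understands images:

```python
# pip install requests
# Send an image to a vision-capable local model via Ollama's API.
import base64
import requests

with open("scanned_page.png", "rb") as f:        # placeholder path
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma3:4b",                    # must be a model that understands images
        "prompt": "Transcribe all text visible in this image.",
        "images": [image_b64],
        "stream": False,
    },
    timeout=600,
)
print(resp.json()["response"])
```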
For audio, our best bet is transcription. While some small LMs can input and output audio (e.g., Qwen 2.5 Omni), right now they’re a headache to set up. Luckily, Open WebUI comes equipped with a transcription model, so we can just hit the microphone button in the chat window and talk to any model we’d like. In addition, we can drop in an audio file and get a transcript. This is useful for journalists working with recorded interviews or press conferences—you can transcribe audio locally without sending it to external services.
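If you’d rather transcribe from a script than through Open WebUI’s microphone button, the faster-whisper library mentioned in Appendix B is one lightweight path. A minimal sketch, with the model size, device, and file path as assumptions to adjust for your machine:

```python
# pip install faster-whisper
# Local transcription that never leaves your machine.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")  # adjust for your hardware
segments, info = model.transcribe("interview.mp3")                # placeholder file

print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:7.2f} -> {segment.end:7.2f}] {segment.text}")
```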
Should you continue? For most users, the tools you’ve set up so far (Ollama and Open WebUI) will handle the majority of daily AI tasks. Continue to the next section only if you want to build automated workflows that run without manual intervention—like monitoring RSS feeds, processing files on a schedule, or chaining multiple AI tasks together. This requires additional setup (Node.js and n8n) and is more technically involved.
Multi-step workflows
So far, we’ve looked at tasks that consist of only a single step. In practice, we often want to chain a language model together with other steps in an automated pipeline. It’s useful, for example, to upload a company financial filing to an LM and extract key results. It’s even more powerful to automate the whole process of looking for new filings, downloading them, extracting information with the LM, and outputting the results to a spreadsheet.
To build these kinds of pipelines, I often use a tool called n8n. There’s a paid cloud version of n8n, but with a few terminal commands we can also use the free, open source version locally.
Quick Setup:
Prerequisites: Node.js - nodejs.org (required before installing n8n)
Installation command:
npx n8n (run this in your terminal after Node.js is installed)
Installation time: ~10-15 minutes (including Node.js setup)
Technical difficulty: Moderate (requires terminal commands)
After running npx n8n, the tool will be available in your browser (at http://localhost:5678 by default).
N8n is a powerful workflow tool with many integrations—the n8n website has plenty of documentation on integrations, use cases, and examples. Among them is support for Ollama, which lets us plug in any of the small LMs we’ve downloaded so far. We just need to add the “Ollama Model” or “Ollama Chat Model” node, select or create a credential (a saved connection setting that tells n8n where to find your Ollama installation—the default settings should work, although you may need to change localhost to 127.0.0.1 in the “Base URL” setting), and select a model. From there, we can build a workflow around the LM.
One useful workflow, inspired by a recent C+J paper I led, is pulling in web pages and deciding whether they’re worth following up on. In that paper, we evaluated how well OpenAI models could assess the newsworthiness of generative AI use cases in newsrooms. Here, we can do something similar—let’s say we want to grab articles from the front page of Hacker News, then use a language model to filter the ones we’re not interested in. In n8n, we can point an RSS node at the HN front page feed, call an Ollama model with a prompt:
Below is the headline for a news article. I’m only interested in reading articles about 1) Python tools, libraries, or programming language updates, or 2) retro computing. Determine if this headline meets these criteria. Only respond with “Yes” or “No”.
{{ $json.title }}
And then save any articles that the model highlights. The full workflow looks like this:
From here, we could make all kinds of adjustments and extensions—having the workflow run on a set schedule, adding topic tags to the LM’s output, incorporating additional sources, or uploading the results to an Airtable or Google Sheet.
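If you prefer code to a visual editor, the same pipeline also fits in a short script. The sketch below uses Hacker News’ public RSS feed and a local Ollama model; the model name is a placeholder, and the prompt is the one from the workflow above.

```python
# pip install feedparser requests
# Roughly the n8n workflow above as a plain Python script: fetch the HN front-page
# feed, ask a local model whether each headline matches our interests, and keep
# only the headlines it approves.
import feedparser
import requests

FEED_URL = "https://news.ycombinator.com/rss"
PROMPT_TEMPLATE = (
    "Below is the headline for a news article. I'm only interested in reading articles "
    "about 1) Python tools, libraries, or programming language updates, or 2) retro "
    "computing. Determine if this headline meets these criteria. "
    'Only respond with "Yes" or "No".\n\n{title}'
)

def ask_model(title: str) -> str:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "gemma3:4b", "prompt": PROMPT_TEMPLATE.format(title=title), "stream": False},
        timeout=120,
    )
    return resp.json()["response"].strip()

feed = feedparser.parse(FEED_URL)
for entry in feed.entries:
    if ask_model(entry.title).lower().startswith("yes"):
        print(f"KEEP: {entry.title}\n      {entry.link}")
```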
Local tooling and user control
What you’ve built: If you’ve followed along, you now have a complete local AI toolkit. You can chat with language models without sending data to external servers. You can analyze your own documents and search the web while keeping sensitive information private. You can process images and transcribe audio locally. And if you went all the way, you can build automated workflows that run on your schedule, monitoring sources and processing information without manual intervention.
Why this matters: In 1975, if you wanted to use a computer, you needed access to a university mainframe or corporate data center. In 1985, you could have one on your desk. In 2025, we’re at a similar inflection point with AI. Right now, most people depend on corporate servers to access language models. But small LMs are changing that equation. Just as the shift from mainframes to personal computers democratized computing, local AI can democratize access to artificial intelligence.
For newsrooms in particular, this shift is critical. It means journalists can analyze sensitive documents, work with confidential sources, and build proprietary research tools without compromising privacy and independence. This is a vision of AI that is distributed, respects user autonomy, and keeps essential tooling free from the influence of profit motives.
Next steps: Local AI takes more work to set up than ChatGPT, but I hope this guide has demystified the process. To cement what you’ve learned, try this: Take a real task from your work this week—maybe analyzing a set of documents, transcribing an interview, or monitoring a source for updates—and do it entirely with local tools. You’ll discover both the capabilities and limitations firsthand, and you’ll be building skills that put you in control of your AI tools rather than dependent on them.
In the future, I’d like to write more guides like this—perhaps for local image and audio generation, or for a deeper dive into n8n workflows tailored to newsroom needs. If there are specific local AI tasks you’d like to see covered, I’d love to hear about them.
Appendix A: Technical Notes
Your computer’s specs will dramatically affect which models you can run. I ran all these walkthroughs on an M3 MacBook Air with 24 GB of system memory. As a rule of thumb, the M-series chips that MacBooks run on are generally strong for language models, and the more system memory you have available, the larger (and more capable) models you can run.
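If you want a rough sense of whether a given model will fit before downloading it, you can estimate from the parameter count. The figures below are ballpark assumptions (quantized models typically store somewhere around half a byte to one byte per parameter, plus runtime overhead), not exact numbers:

```python
# Back-of-the-envelope memory estimate for quantized local models.
# bytes_per_param is a rough assumption (~4-bit quantization), not an exact figure.
def estimate_weights_gb(params_billion: float, bytes_per_param: float = 0.6) -> float:
    """Approximate GB needed just to hold the weights."""
    return params_billion * bytes_per_param

for name, size_b in [("gemma3:4b", 4), ("gpt-oss:20b", 20)]:
    gb = estimate_weights_gb(size_b)
    print(f"{name}: roughly {gb:.0f}-{gb * 1.5:.0f} GB once runtime overhead is included")
```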
Appendix B: Alternative Approaches
Many different tools can do the tasks laid out in this guide. For the sake of clarity, the main text focuses on one workflow that I’ve had success with. Below are alternative tools for each category, with brief notes on why you might consider them.
Chat alternatives
The other widely-used application for this purpose is LM Studio, a user-friendly tool that has a chat interface and model selector built into one application. The biggest caveat with LM Studio is that it isn’t fully open source.
Document and web search alternatives
AnythingLLM is another local application geared toward chatting with documents. If you have an NVIDIA graphics card, ChatRTX also lets you pull your local documents and data into conversations with a language model.
Image and audio alternatives
If you prefer an all-in-one desktop app, LM Studio supports vision-capable models (e.g., qwen2-vl) and documents how to pass images to VLMs. For audio, two lightweight local ASR options are faster-whisper and whisper.cpp.
Multi-step workflow alternatives
Beyond n8n, Node-RED is an open-source, flow-based tool with a browser editor and rich node ecosystem. If you want something more developer-centric, Windmill turns scripts into APIs/UIs/cron jobs and composes them into fast, self-hostable workflows.