The Boston Planning Department this week approved a "Squares and Streets" plan for Roslindale Square that includes zoning changes to make it easier to add housing atop the square's one-story commercial buildings, engineering studies of realigning Washington and Poplar streets along Adams Park, and a redesign of the intersection of Belgrade Avenue with South and Roberts streets to give it a more plaza-like feel.
The plan also calls for investigating ways to keep Roslindale Square the home of small, locally focused businesses - including more shops and services catering to Black, Hispanic and immigrant groups. And it proposes an organized effort to "encourage the build-out of a small movie theater/flexible entertainment space with a local film operator as the tenant." Roslindale Square has not had a movie theater since the Rialto closed in 1973, when other local businesses were also closing up as people began patronizing the malls along Rte. 1 in Dedham.
In addition to making it easier for Square building owners to do what the owners of Wallpaper City and the Chilacates building have already done - restore residential floors torn down when Roslindale Square had become a desolate bypass on the way to somewhere else - the plan would ask developers of new units to include a higher percentage of units with two or more bedrooms as part of their affordable-housing requirements.
And the city will look to use money from its own affordable-housing fund to acquire lots or buildings around the square that could be sold at reduced cost to developers to put up affordable housing. In recent years, developers have put up a number of affordable-apartment buildings in Roxbury's Nubian Square on what were formerly city-owned vacant or parking lots. The city says it would also work with the non-profit Southwest Boston Community Development Corp. or other non-profits on housing proposals in the Roslindale Square area.
The plan calls for engineering studies to determine whether to return Washington Street between Corinth Street and Cummins Highway to two-way operation - which in turn would let the city turn Poplar Street by Adams Park into a "shared" resource that could be blocked off during the weekly farmers market and for public celebrations and other events - along with adding more trees and a permanent bike lane.
These changes would reduce traffic on Poplar St, simplify bus routing, reduce residential cut-through traffic, and improve operations at intersections. Bus stops would be relocated as needed to allow for passenger pick up and drop off along southbound Washington St.
If two-way operations are restored on Washington St, explore opportunities to shorten pedestrian crosswalks, create new separated bike connections, provide green infrastructure, and create space for community programming. Specific areas to consider include the intersections of Washington St/South St and Washington St/Poplar.
If two-way operations are restored on Washington St, explore expanding the Poplar St sidewalk along Adams Park and/or making Poplar St a shared street. A shared street along Adams Park would allow for pedestrian and bike travel, in addition to local vehicle travel and curbside parking/deliveries.
The plan estimates a formal engineering study would take up to two years. The complete re-make of Poplar Street into a "shared" resource, however, could take five to ten years, the city says.
The plan also calls for a re-do of the complex intersection of Belgrade Avenue with South and Roberts streets in a way that would enlarge the microscopic Alexander the Great park into a space large enough for public events - and create an area in front of the Square Root that would allow patio seating - but says that while "interim activation" - similar to the way a section of Beech Street was blocked off for a plaza - could be done within one to three years, a complete, permanent re-do could take up to a decade.
Also proposed:
Flip the one-way directions of Firth Rd and Bexley Rd to control left-turning vehicles more safely at an existing traffic signal. This change will improve overall operations and reduce conflicts between turning vehicles and people walking and biking.
Square off the Belgrade Ave intersections of both Pinehurst St and Amherst St to reduce crosswalk distances, slow turning vehicles, expand space for bus stops, and create green infrastructure opportunities.
The plan also calls for the city to work with the MBTA to figure out how to increase the frequency of trains on the Needham Line - and to reduce the fares for passengers who use the Roslindale Village station, something Mayor Wu first started fighting for while still a city councilor. And the city committed to working with the T to upgrade the dismal pedestrian underpass at the station with new lighting and a mural.
Also on the to-do list: Do something about flooding on the streets around Healy Field.
Dear America
International journalists are hearing echoes. From countries around the world that have witnessed the rise of autocratic and populist leaders, they are watching the U.S. and warning of a characteristic of wounded democracies everywhere: an endangered free press.
This issue of Nieman Reports features insights from nine journalists abroad, offering advice to American reporters on preparing for attacks on press freedom.
Open, small LLMs have gotten really good. Good enough that more people – especially non-technical people – should be running models locally. Doing so not only provides you with an offline, privacy-safe, helpful chatbot; it also helps you learn how LLMs work and appreciate the diversity of models being built.
Following the LLM field is complicated. There are plenty of metrics you can judge a model on and plenty of types of models that aren’t easily comparable to one another. Further, the metrics enthusiasts and analysts use to evaluate models might not matter to you.
But there is an overarching story across the field: LLMs are getting smarter and more efficient.
And while we continually hear about LLMs getting smarter, before the DeepSeek kerfuffle we didn’t hear so much about improvements in model efficiency. But models have been getting steadily more efficient, for years now. Those who keep tabs on these smaller models know that DeepSeek wasn’t a step-change anomaly, but an incremental step in an ongoing narrative.
These open models are now good enough that you – yes, you – can run a useful, private model for free on your own computer. And I’ll walk you through it.
Large language models are, in a nutshell, a collection of probabilities. When you download a model, that’s what you get: a file (or files) full of numbers. To use the model you need software to perform inference: inputting your text into a model and generating output in response to it. There are many options here, but we’re going to pick a simple, free, cross-platform app called Jan.
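Before we get to Jan, a quick aside for the curious: here is a minimal sketch of what inference software does under the hood, using the llama-cpp-python library. The model path is a placeholder, and you don't need any of this to follow along; Jan handles it all for you.

```python
# A minimal sketch of inference: load a model file, feed it text, get text back.
# Illustrative only; Jan does all of this for you. The model path is a placeholder.
from llama_cpp import Llama

# Load a quantized GGUF model from disk.
llm = Llama(model_path="models/Llama-3.2-3B-Instruct-Q8_0.gguf", n_ctx=4096)

# Run inference: turn a prompt into a generated response.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why run an LLM locally?"}],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```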
Jan is an open-source piece of software that manages and runs models on your machine and talks to hosted models (like the OpenAI or Anthropic APIs). If you’ve used ChatGPT or Claude, its interface will feel familiar:
Go download Jan and come back when it’s installed.
In the screenshot above we’re in the chat tab (see that speech bubble icon in the upper left?). Before we can get started, we’ll need to download a model. Click the “four square” icon under the speech bubble and go to the model “hub”.
In Jan’s model hub we’re presented with a list of models we can download to use on our machine.
We can see the model name, the file size of the model, a button to download the model, and a caret to toggle more details for a given model. Like these:
If you are new to this, I expect these model names to be very confusing. The LLM field moves very fast and has evolved language conventions on the fly which can appear impenetrable to the unfamiliar (don’t make me tap the sign!).
But we can clear this up. You don’t need to understand the following to run a local model, but knowing the language conventions here will help demystify the domain.
Let’s go from left to right:
Don’t worry too much about the quantization notation – you don’t need to know it. Jan very helpfully provides us with warnings if a model won’t run well on our machines:
When getting started, feel free to stick with the model recommendations I list below. But if you want to explore, download and try anything that can run on your machine. If it’s slow, nonsensical, or you just don’t like it: delete it and move on.
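If you are curious what those Q4/Q8 labels roughly mean: they indicate how many bits each model weight is stored in, which is why the quant level tracks file size. A back-of-the-envelope sketch (the numbers are approximations; real files come out a bit larger because of metadata and mixed-precision layers):

```python
# Back-of-the-envelope GGUF size estimate: parameters x bits-per-weight / 8.
# Approximate only; real files include metadata and keep some layers at
# higher precision, so they are somewhat larger than this.
def approx_size_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

print(approx_size_gb(3, 8))    # a 3B model at Q8: roughly 3 GB
print(approx_size_gb(3, 4.5))  # the same model at Q4-ish precision: under 2 GB
```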
The models listed here are curated by Jan. But Jan can run any text model hosted on Hugging Face (a website for sharing datasets and machine learning models) in the GGUF format (a file format for sharing models). And there are plenty of them.
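A side note for tinkerers: you can also fetch GGUF files straight from Hugging Face with the huggingface_hub Python library, outside of Jan entirely. The repo and filename below are just examples of the naming pattern, not a specific recommendation.

```python
# Illustrative only: download a GGUF file from Hugging Face programmatically.
# The repo_id and filename are examples; swap in whichever model you want.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bartowski/Llama-3.2-3B-Instruct-GGUF",  # example repository
    filename="Llama-3.2-3B-Instruct-Q8_0.gguf",      # example quantized file
)
print(f"Saved to {path}")
```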
But let’s put a pin in this.
For now, go back to Jan’s model hub and hit the “Download” button for “Llama 3.2 3B Instruct Q8”. If Jan says this model will be “Slow on your device,” download “Llama 3.2 1B Instruct Q8”. It’s smaller, but still great for its size. Come back when the download is complete.
When Llama 3.2 is downloaded, switch back to the chat pane by clicking the speech bubble in the upper-left. In the text field at the bottom, there’s a model selector. Check to make sure it’s set to the model you just downloaded:
As you download more models, you can switch between them here.
Typing text in the text field and hitting enter creates a new thread (just like ChatGPT or Claude).
Here’s what the 3B parameter model says when we ask, “Why might someone want to run a LLM on their laptop, rather than using a cloud provider?”
While running a Large Language Model (LLM) on a cloud provider can be convenient and scalable, there are some reasons why someone might prefer to run an LLM on their own laptop:
1. **Cost savings**: Cloud providers charge by the hour or usage, which can add up quickly. Running an LLM locally can save money in the long run, especially if you only need to use it occasionally.
2. **Data ownership and control**: When you host your LLM on a cloud provider, you may have limited control over how your data is stored, processed, and shared. Running it locally ensures that you maintain full control over your data and can ensure its privacy and security.
3. **Customization and flexibility**: By running an LLM locally, you can customize the environment to suit your specific needs without relying on cloud provider configurations or APIs. This can be particularly useful for research or development purposes where you need to experiment with different models or hyperparameters.
4. **Faster training times**: Training a large language model can take significant time and computational resources. Running it locally means that you don't have to worry about network latency, API timeouts, or other cloud-related issues that might slow down the training process.
5. **Offline access**: With a local LLM, you can work on projects without relying on an internet connection, making it ideal for situations where connectivity is limited or unreliable.
6. **Reduced dependency on cloud providers**: By running your own LLM locally, you reduce your dependence on cloud providers and their infrastructure. This can be beneficial if you're concerned about the stability of cloud services or prefer to maintain a more decentralized approach.
7. **Self-hosting and security**: Running an LLM locally allows you to implement additional security measures and ensure that your model is not exposed to external attacks or vulnerabilities.
However, it's worth noting that running a large language model on a laptop can be computationally intensive and may require significant resources (e.g., GPU power). If you're planning to run a large-scale LLM, cloud providers often offer more suitable solutions with optimized infrastructure and economies of scale.
And here’s the 1B parameter model response to the same question:
Running a Large Language Model (LLM) on a laptop can be beneficial for:
1. **Small projects or proof-of-concepts**: For smaller tasks, like experimenting with different models or hyperparameters, running an LLM locally might be sufficient.
2. **Research and development**: Researchers may need to run small-scale experiments or prototype their ideas without being tied to cloud infrastructure.
3. **Personal use cases**: Individuals who only occasionally work on language-related projects can benefit from local hosting, as it provides a cost-effective alternative to cloud providers.
However, running an LLM on a laptop has some limitations:
1. **Limited scalability**: As the model size or complexity increases, local hosting may become impractical due to computational constraints.
2. **Data storage and processing**: Running an LLM locally requires significant storage space for data and intermediate results, which can be challenging in laptops with limited disk capacity.
3. **Network latency and security concerns**: Local connections can introduce latency and compromise the model's security, especially if you're handling sensitive or proprietary data.
To overcome these limitations, researchers and developers often use:
1. **Cloud-based distributed training**: Training large models on a cloud provider, which allows for scalability and parallel processing.
2. **Serverless computing**: Using serverless platforms that provide a pay-as-you-go pricing model, enabling users to scale up or down as needed.
3. **Edge computing**: Deploying computations closer to the source of data, reducing latency and improving security.
Ultimately, whether running an LLM on a laptop or cloud provider depends on your specific needs, project requirements, and preferences.
Both are pretty good for such tiny models! The 3B-sized Llama does a better job, sticking to the question and providing numerous reasons. But this comes with a speed cost: on my Mac with 64GB of RAM, Llama 3B produces ~50 words-per-minute while Llama 1B produces ~113.
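An aside if you want to script these comparisons yourself: Jan can expose downloaded models through a local, OpenAI-compatible API server. Whether it is enabled, and which port it listens on, depends on your Jan settings; the host, port, and model id below are assumptions, so check yours before trying this rough timing sketch.

```python
# Rough sketch: time a local model through an OpenAI-compatible endpoint and
# report words per second. The base_url and model id are assumptions; check
# Jan's local API server settings for the real values.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1337/v1", api_key="not-needed")

start = time.time()
reply = client.chat.completions.create(
    model="llama3.2-3b-instruct",  # use whatever id Jan reports for your model
    messages=[{"role": "user", "content": "Why run an LLM on a laptop?"}],
)
elapsed = time.time() - start

words = len(reply.choices[0].message.content.split())
print(f"{words} words in {elapsed:.1f}s ({words / elapsed:.1f} words/sec)")
```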
Figuring out which model is right for you takes experimentation. But I have a few recommendations to get you started.
Unless otherwise noted, you can download these models in Jan by copying the URL from each model name link and pasting it into the model hub’s search box.
Mistral-Small-24B-Instruct is a bit of a revelation. While there have been open GPT-4o class models that fit on laptops (Llama 3.3 comes to mind), Mistral-Small is the first one I’ve used whose speed is comparable as well. For the last week, I’ve been using it as my first stop – before Claude or ChatGPT – and it’s performed admirably. Rarely do I need to try a hosted model. It’s that good.
If you can run the “Q4” version (it’s the 13.35GB option in Jan, when you paste in the link), I strongly recommend you do. This is the model that inspired me to write this post. Local models are good enough for non-nerds to start using them.
Also good: Microsoft’s Phi-4 model is a bit smaller (~8GB when you select the “Q4” quantized one) but also excellent. It’s great at rephrasing, general knowledge, and light reasoning. It’s designed to be a one-shot model (basically, it outputs more detail for a single question and isn’t designed for follow-ups), and excels as a primer for most subjects.
Yeah, yeah, yeah…let’s get to DeepSeek.
DeepSeek is very good, but you likely won’t be able to run the original model on your machine. However, you can probably run one of the distilled models the DeepSeek team prepared. Distillation is a strategy to create lightweight versions of large language models that are both efficient and effective. A smaller model ‘learns’ by reviewing the output from a larger model.
In this case, DeepSeek R1 was used to teach Qwen 2.5, an excellent set of smaller models, how to reason. (DeepSeek also distilled Llama models.) Getting this distilled model up and running in Jan requires an extra step, but the results are well worth it.
First, paste the DeepSeek-R1-Distill-Qwen14B model URL into Jan’s model hub search box. If you’re on Mac, grab the “Q4_1” version, provided you can run it. On Windows, grab the “Q4_K_M”. (If either of those is flagged as not being able to run on your machine, try the 7B version. It’s about 5GB.)
Once the model downloads, click the “use” button and return to the chat window. Click the little slider icon next to the model name (marked in red, below). This toggles the thread’s settings. Toggle the “Model Settings” dropdown (marked in blue) so that the “Prompt template” is visible.
Your prompt template won’t look like the one in the image above. But we’re going to change that. Paste the following in the prompt template field, replacing its current contents:
{system_message}
<|User|>
{prompt}
<|end▁of▁sentence|>
<|Assistant|>
Think of prompt templates as wrappers that format the text you enter into a format the model expects. Templates can vary, model by model. Thankfully, Jan will take care of most of this for you if you stick to their model library.
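To make that concrete, here is a rough sketch of the substitution an inference app performs with a template like the one above. The helper function is hypothetical, purely for illustration; Jan does this step internally.

```python
# Hypothetical illustration of what an app does with a prompt template:
# it swaps your text into the placeholders before handing the result to
# the model. Jan performs this step for you.
DEEPSEEK_TEMPLATE = """{system_message}
<|User|>
{prompt}
<|end▁of▁sentence|>
<|Assistant|>"""

def render_prompt(template: str, system_message: str, prompt: str) -> str:
    """Fill a chat template's placeholders with the actual conversation text."""
    return template.format(system_message=system_message, prompt=prompt)

print(render_prompt(DEEPSEEK_TEMPLATE,
                    "You are a helpful assistant.",
                    "Why might someone run an LLM locally?"))
```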
This change we’ve made, though, will let you see this model “think”. For example, here’s how it replies to our question about why one might want to use local LLMs. The reasoning is bracketed by <think> and </think> tokens and prefaces the final answer.
There are giant, leading LLMs available for cheap or free. Why should one bother with setting up a chatbot on their PC?
Llama 3.2 3B already answered this for us, but…
So much of the coverage around LLMs focuses on raising the ceiling: the improved capabilities of the largest models. But beneath this noise the floor is being raised. There’s been incredible progress: models you can run on a moderately powered laptop now perform as well as the largest models did this time last year. It’s time to try a local model.
I’ve been working on editors again: tweaking CodeMirror to make it work better in Val Town. It’s really hard work, unfortunately the kind of hard work that seems like it should be easy. It has me thinking a bit about why front-end engineering is difficult.
In fancy terms, it’s multivariate and extended over time. Frontends, backends, browsers, and users are all variable, unpredictable actors.
There are no single-player web applications: the simplest model of frontend software is a user interacting with a webpage, but the user and the webpage have similar capabilities. Your frontend application can respond to and produce events. It can modify the webpage.
So can the user: they can modify webpages at any point. Maybe they load every page in a new session, breaking your assumptions about how a website will persist state. Or maybe they never create a new session - they have 80+ tabs open that never get closed, so they will continue using versions of your frontend application that you released months ago, which now break when talking to an updated backend server.
But those aren’t the only players. Browsers are active participants in the game: a browser might translate a webpage and modify the DOM in a way that makes React crash. It might disable an API that the application expects, like localStorage, in a way that makes its behavior unpredictable. Ad-blockers might prevent some of the application from loading.
Distribution is probably the single most compelling attribute of the web over native applications. There’s no app store, no mandated release process, no installation step. But just as you as the application developer have more power, so do browsers and users.
The interplay between these actors is what makes things difficult, and difficult in a way that’s underemphasized in both Computer Science dogma and hacker culture. It isn’t some algorithmic challenge or hard performance problem, or anything that can be solved by switching programming languages. It’s an understanding of uncertainty in a deployment target that is constantly shifting and has a lot of unknowns.
After working in this realm, it’s a treat to work on ‘hard’ problems that at least give you more certainty about the environment. Optimizing SQL queries, for example – it can be challenging, but at least I know that I’m writing SQL for Postgres 16. Frontend engineering can feel like optimizing a query without knowing whether it’ll run in DuckDB, Postgres, or MySQL, or even whether you can trust that the database schema will still be the same when it runs.
It’s hard work. It’s not cool. And I am nowhere near mastering it. I totally get it when people have an aversion to it, or think that the methods used today are wild. But mostly the methods are wild because the problem is, too. A straight-line solution to the problems of the front-end would be lovely, but those problems don’t really permit one.