Journalist/developer. Storytelling developer @ USA Today Network. Builder of @HomicideWatch. Sinophile for fun. Past: @frontlinepbs @WBUR, @NPR, @NewsHour.
2109 stories · 45 followers

Moving on from 18F. — ethanmarcotte.com

chrisamico · 2 days ago · Boston, MA

Roslindale Square gets official city plan calling for more housing, possible realignment of key roads and a movie theater

The Boston Planning Department this week approved a "Squares and Streets" plan for Roslindale Square that includes zoning changes to make it easier to add housing atop the square's one-story commercial buildings, along with engineering studies of realigning Washington and Poplar streets along Adams Park and of giving the intersection of Belgrade Avenue with South and Roberts streets a more plaza-like feel.

The plan also calls for investigating ways to keep Roslindale Square the home of small, locally focused businesses - including more shops and services catering to Black, Hispanic and immigrant groups. And it proposes an organized effort to "encourage the build-out of a small movie theater/flexible entertainment space with a local film operator as the tenant." Roslindale Square has not had a movie theater since the Rialto closed in 1973, when other local businesses were also closing up as people began patronizing the malls along Rte. 1 in Dedham.

In addition to making it easier for Square building owners to do what the owners of Wallpaper City and the Chilacates building have already done - restore residential floors torn down when Roslindale Square had become a desolate bypass on the way to somewhere else - the plan would ask developers of new units to include a higher percentage of units with two or more bedrooms as part of their affordable-housing requirements.

And the city will look to use money from its own affordable-housing fund to acquire lots or buildings around the square that could be sold at reduced cost to developers to put up affordable housing. In recent years, developers have put up a number of affordable-apartment buildings in Roxbury's Nubian Square on what were formerly city-owned vacant or parking lots. The city says it would also work with the non-profit Southwest Boston Community Development Corp. or other non-profits on housing proposals in the Roslindale Square area.

The plan calls for engineering studies to determine whether to restore two-way traffic on Washington Street between Corinth Street and Cummins Highway - which in turn would let the city turn Poplar Street by Adams Park into a "shared" resource that could be blocked off during the weekly farmers market and for public celebrations and other events - along with adding more trees and a permanent bike lane.

These changes would reduce traffic on Poplar St, simplify bus routing, reduce residential cut-through traffic, and improve operations at intersections. Bus stops would be relocated as needed to allow for passenger pick up and drop off along southbound Washington St.

If two-way operations are restored on Washington St, explore opportunities to shorten pedestrian crosswalks, create new separated bike connections, provide green infrastructure, and create space for community programming. Specific areas to consider include the intersections of Washington St/South St and Washington St/Poplar.

If two-way operations are restored on Washington St, explore expanding the Poplar St sidewalk along Adams Park and/or making Poplar St a shared street. A shared street along Adams Park would allow for pedestrian and bike travel, in addition to local vehicle travel and curbside parking/deliveries.

The plan estimates a formal engineering study would take up to two years. The complete re-make of Poplar Street into a "shared" resource, however, could take five to ten years, the city says.

The plan also calls for a re-do of the complex intersection of Belgrade Avenue with South and Roberts streets in a way that would enlarge the microscopic Alexander the Great park into a space large enough for public events and create an area in front of the Square Root that would allow patio seating. It says that while "interim activation" - similar to the way a section of Beech Street was blocked off for a plaza - could be done within one to three years, a complete, permanent re-do could take up to a decade.

Also proposed:

Flip the one-way directions of Firth Rd and Bexley Rd to control left-turning vehicles more safely at an existing traffic signal. This change will improve overall operations and reduce conflicts between turning vehicles and people walking and biking.

Square off the Belgrade Ave intersections of both Pinehurst St and Amherst St to reduce crosswalk distances, slow turning vehicles, expand space for bus stops, and create green infrastructure opportunities.

The plan also calls for the city to work with the MBTA to figure out how to increase the frequency of trains on the Needham Line - and to reduce the fares for passengers who use the Roslindale Village station, something Mayor Wu first started fighting for while still a city councilor. And the city committed to working with the T to upgrade the dismal pedestrian underpass at the station with new lighting and a mural.

Also on the to-do list: Do something about flooding on the streets around Healy Field.

Complete proposal.

chrisamico · 5 days ago · Boston, MA

Dear America - Nieman Reports

Dear America

International journalists are hearing echoes. From countries around the world that have witnessed the rise of autocratic and populist leaders, they are watching the U.S. and warning of a characteristic of wounded democracies everywhere: an endangered free press.

This issue of Nieman Reports features insights from nine journalists abroad, offering advice to American reporters on preparing for attacks on press freedom.

chrisamico · 8 days ago · Boston, MA

Nieman Foundation curator Ann Marie Lipinski to step down - Nieman Foundation

chrisamico · 8 days ago · Boston, MA

A Gentle Intro to Running a Local LLM

How to Run a Chatbot on Your Laptop (and Why You Should)

Open, small LLMs have gotten really good. Good enough that more people – especially non-technical people – should be running models locally. Doing so not only provides you with an offline, privacy-safe, helpful chatbot; it also helps you learn how LLMs work and appreciate the diversity of models being built.

Following the LLM field is complicated. There are plenty of metrics you can judge a model on and plenty of types of models that aren't easily comparable to one another. Further, the metrics by which enthusiasts and analysts evaluate models might not matter to you.

But there is an overarching story across the field: LLMs are getting smarter and more efficient.

And while we continually hear about LLMs getting smarter, before the DeepSeek kerfuffle we didn’t hear so much about improvements in model efficiency. But models have been getting steadily more efficient, for years now. Those who keep tabs on these smaller models know that DeepSeek wasn’t a step-change anomaly, but an incremental step in an ongoing narrative.

These open models are now good enough that you – yes, you – can run a useful, private model for free on your own computer. And I’ll walk you through it.

Installing Software to Run Models

Large language models are, in a nutshell, a collection of probabilities. When you download a model, that's what you get: a file (or a few files) full of numbers. To use the model you need software to perform inference: feeding your text into the model and generating output in response. There are many options here, but we're going to pick a simple, free, cross-platform app called Jan.
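
If you're curious what that inference step looks like outside a GUI, here's a minimal sketch using the llama-cpp-python library (my aside, not part of Jan's workflow; the GGUF file path is a placeholder for whichever model you download):

    # pip install llama-cpp-python
    # Jan does all of this for you behind its interface; this is the same idea
    # made explicit. The model path below is a placeholder.
    from llama_cpp import Llama

    llm = Llama(model_path="Llama-3.2-1B-Instruct-Q8_0.gguf", n_ctx=2048)

    response = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Why run an LLM locally?"}],
        max_tokens=256,
    )
    print(response["choices"][0]["message"]["content"])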

Jan is an open-source piece of software that manages and runs models on your machine and talks to hosted models (like the OpenAI or Anthropic APIs). If you’ve used ChatGPT or Claude, its interface will feel familiar:

Jan in use

Go download Jan and come back when it’s installed.

In the screenshot above, we're in the chat tab (see that speech bubble icon in the upper left?). Before we can get started, we'll need to download a model. Click the "four square" icon under the speech bubble and go to the model "hub".

Making Sense of Models

In Jan’s model hub we’re presented with a list of models we can download to use on our machine.

Jan's model hub

We can see the model name, the file size of the model, a button to download the model, and a caret to toggle more details for a given model. Like these:

Details about Llama 3.2 1B Instruct Q8

If you are new to this, I expect these model names to be very confusing. The LLM field moves very fast and has evolved language conventions on the fly which can appear impenetrable to the unfamiliar (don’t make me tap the sign!).

But we can clear this up. You don’t need to understand the following to run a local model, but knowing the language conventions here will help demystify the domain.

Breaking down the components of a model name

Let’s go from left to right:

  1. Family: Models come in families, which helps you group them by the teams that make them and their intended use (usually). Think of this as the model’s brand name. Here, it’s “Llama”, a family of open models produced by Meta.
  2. Generation: The generation number is like a version number. A larger number means a more recent model, with the value to the left of the decimal indicating major generations. The number to the right of the decimal might indicate an incremental update or signify a variant for a specific use case. Here we’re looking at 3.2, a generation of Llama models which are smaller sized and designed to run “at the edge” (aka, on a device like a phone or PC, not a remote server).
  3. Parameters: As I said, models are essentially collections of probabilities. The parameter count is the number of probabilities contained in a given model. This count loosely correlates to a model’s performance (though much less so these days). A model with more parameters will require a more powerful computer to run. Parameter count also correlates with the amount of space a model takes up on your computer. This model has 1 billion parameters and clocks in at 1.23GB. There is a 3 billion parameter option in the Llama 3.2 generation as well, weighing in at 3.19GB.
  4. Variant: Different models are built for different purposes. The variant describes the task for which a model is built or tuned. This model is made to follow instructions, hence, “Instruct.” You will also see models with “Chat” or “Code” (both self-explanatory). “Base” we tend to see less of these days, but it refers to models that have yet to be tuned for a specific task.
  5. Quantization: Quantization is a form of model compression. We keep the same number of parameters, but we reduce the detail of each. In this case, we're converting the numbers representing the probabilities in the model from highly detailed numbers with plenty of decimal places to 8-bit integers: whole numbers between -128 and 127. The "Q8" here says the weights in the model have been converted to 8-bit integers. We're saving plenty of space in exchange for some potential model wonkiness during usage. (A rough size estimate from parameter count and bits per weight is sketched just after this list.)
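
To make the parameters-and-quantization math concrete, here's a back-of-envelope calculation (a sketch only; real GGUF files carry extra metadata and some mixed-precision layers, so actual sizes run a bit higher):

    def approx_size_gb(parameters, bits_per_weight):
        # parameters * bits, converted to bytes, then to (decimal) gigabytes
        return parameters * bits_per_weight / 8 / 1e9

    for params, label in [(1e9, "Llama 3.2 1B"), (3e9, "Llama 3.2 3B")]:
        for bits, quant in [(16, "FP16"), (8, "Q8"), (4, "Q4")]:
            print(f"{label} at {quant}: ~{approx_size_gb(params, bits):.2f} GB")

    # Llama 3.2 1B at Q8 works out to roughly 1 GB, in the same ballpark as
    # the 1.23 GB file Jan reports above.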

Don’t worry too much about the quantization notation – you don’t need to know it. Jan very helpfully provides us with warnings if a model won’t run well on our machines:

Jan tells us when models will run slow on our PC

When getting started, feel free to stick with the model recommendations I list below. But if you want to explore, download and try anything that can run on your machine. If it's slow, nonsensical, or you just don't like it: delete it and move on.

The models listed here are curated by Jan. But Jan can run any text model hosted on Hugging Face (a website for sharing datasets and machine learning models) in the GGUF format (a file format for sharing models). And there are plenty of them.
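
(For the curious: you can also fetch GGUF files directly with the huggingface_hub Python library; the repository and file names below are placeholder examples, not recommendations.)

    # pip install huggingface_hub
    # Repo and filename are placeholders; swap in whatever GGUF model you want.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id="some-user/Llama-3.2-1B-Instruct-GGUF",
        filename="Llama-3.2-1B-Instruct-Q8_0.gguf",
    )
    print(local_path)  # a local file that GGUF-aware software can load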

But let’s put a pin in this.

For now, go back to Jan's model hub and hit the "Download" button for "Llama 3.2 3B Instruct Q8". If Jan says this model will be "Slow on your device," download "Llama 3.2 1B Instruct Q8". It's smaller, but still great for its size. Come back when the download is complete.

Chatting with Jan

When Llama 3.2 is downloaded, switch back to the chat pane by clicking the speech bubble in the upper-left. In the text field at the bottom, there’s a model selector. Check to make sure it’s set to the model you just downloaded:

Jan's model selector in action

As you download more models, you can switch between them here.

Typing text in the text field and hitting enter creates a new thread (just like ChatGPT or Claude).

Here's what the 3B parameter model says when we ask, "Why might someone want to run an LLM on their laptop, rather than using a cloud provider?"

While running a Large Language Model (LLM) on a cloud provider can be convenient and scalable, there are some reasons why someone might prefer to run an LLM on their own laptop:

1. **Cost savings**: Cloud providers charge by the hour or usage, which can add up quickly. Running an LLM locally can save money in the long run, especially if you only need to use it occasionally.
2. **Data ownership and control**: When you host your LLM on a cloud provider, you may have limited control over how your data is stored, processed, and shared. Running it locally ensures that you maintain full control over your data and can ensure its privacy and security.
3. **Customization and flexibility**: By running an LLM locally, you can customize the environment to suit your specific needs without relying on cloud provider configurations or APIs. This can be particularly useful for research or development purposes where you need to experiment with different models or hyperparameters.
4. **Faster training times**: Training a large language model can take significant time and computational resources. Running it locally means that you don't have to worry about network latency, API timeouts, or other cloud-related issues that might slow down the training process.
5. **Offline access**: With a local LLM, you can work on projects without relying on an internet connection, making it ideal for situations where connectivity is limited or unreliable.
6. **Reduced dependency on cloud providers**: By running your own LLM locally, you reduce your dependence on cloud providers and their infrastructure. This can be beneficial if you're concerned about the stability of cloud services or prefer to maintain a more decentralized approach.
7. **Self-hosting and security**: Running an LLM locally allows you to implement additional security measures and ensure that your model is not exposed to external attacks or vulnerabilities.

However, it's worth noting that running a large language model on a laptop can be computationally intensive and may require significant resources (e.g., GPU power). If you're planning to run a large-scale LLM, cloud providers often offer more suitable solutions with optimized infrastructure and economies of scale.

And here’s the 1B parameter model response to the same question:

Running a Large Language Model (LLM) on a laptop can be beneficial for:

1. **Small projects or proof-of-concepts**: For smaller tasks, like experimenting with different models or hyperparameters, running an LLM locally might be sufficient.
2. **Research and development**: Researchers may need to run small-scale experiments or prototype their ideas without being tied to cloud infrastructure.
3. **Personal use cases**: Individuals who only occasionally work on language-related projects can benefit from local hosting, as it provides a cost-effective alternative to cloud providers.

However, running an LLM on a laptop has some limitations:

1. **Limited scalability**: As the model size or complexity increases, local hosting may become impractical due to computational constraints.
2. **Data storage and processing**: Running an LLM locally requires significant storage space for data and intermediate results, which can be challenging in laptops with limited disk capacity.
3. **Network latency and security concerns**: Local connections can introduce latency and compromise the model's security, especially if you're handling sensitive or proprietary data.

To overcome these limitations, researchers and developers often use:

1. **Cloud-based distributed training**: Training large models on a cloud provider, which allows for scalability and parallel processing.
2. **Serverless computing**: Using serverless platforms that provide a pay-as-you-go pricing model, enabling users to scale up or down as needed.
3. **Edge computing**: Deploying computations closer to the source of data, reducing latency and improving security.

Ultimately, whether running an LLM on a laptop or cloud provider depends on your specific needs, project requirements, and preferences.

Both are pretty good for such tiny models! The 3B-sized Llama does a better job, sticking to the question and providing numerous reasons. But this comes with a speed cost: on my Mac with 64GB of RAM, Llama 3B produces ~50 words-per-minute while Llama 1B produces ~113.

Figuring out which model is right for you takes experimentation. But I have a few recommendations to get you started.

Unless otherwise noted, you can download these models in Jan by copying the URL behind each model name link and pasting it into Jan's model hub.

For people with (at least) a Mac with 32GB of RAM or an NVIDIA RTX 4090:

Mistral-Small-24B-Instruct is a bit of a revelation. While there have been open GPT-4o class models that fit on laptops (Llama 3.3 comes to mind), Mistral-Small is the first one I’ve used whose speed is comparable as well. For the last week, I’ve been using it as my first stop – before Claude or ChatGPT – and it’s performed admirably. Rarely do I need to try a hosted model. It’s that good.

If you can run the “Q4” version (it’s the 13.35GB option in Jan, when you paste in the link), I strongly recommend you do. This is the model that inspired me to write this post. Local models are good enough for non-nerds to start using them.

Also good: Microsoft’s Phi-4 model is a bit smaller (~8GB when you select the “Q4” quantized one) but also excellent. It’s great at rephrasing, general knowledge, and light reasoning. It’s designed to be a one-shot model (basically, it outputs more detail for a single question and isn’t designed for follow-ups), and excels as a primer for most subjects.

For people who want to see a model “reason”:

Yeah, yeah, yeah…let’s get to DeepSeek.

DeepSeek is very good, but you're probably not going to be able to run the original model on your machine. However, you likely can run one of the distilled models the DeepSeek team prepared. Distillation is a strategy for creating lightweight versions of large language models that are both efficient and effective: a smaller model 'learns' by reviewing the output from a larger model.
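
In code-shaped terms, the idea looks something like this (a conceptual sketch only; teacher_generate and finetune_student are hypothetical stand-ins, not real library calls, and DeepSeek's actual recipe is more involved):

    # Distillation, conceptually: the big "teacher" model answers prompts,
    # and those answers become training data for the small "student" model.
    # Both functions passed in here are hypothetical placeholders.
    def distill(teacher_generate, finetune_student, prompts):
        training_pairs = [(p, teacher_generate(p)) for p in prompts]
        return finetune_student(training_pairs)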

In this case, DeepSeek R1 was used to train Qwen 2.5, an excellent set of smaller models, how to reason. (DeepSeek also distilled Llama models.) Getting this distilled model up and running in Jan requires an extra step, but the results are well worth it.

First, paste the DeepSeek-R1-Distill-Qwen14B model URL into Jan's model hub search box. If you're on Mac, grab the "Q4_1" version, provided you can run it. On Windows, grab the "Q4_K_M". (If either of those is flagged as unable to run on your machine, try the 7B version. It's roughly 5GB.)

Once the model downloads, click the “use” button and return to the chat window. Click the little slider icon next to the model name (marked in red, below). This toggles the thread’s settings. Toggle the “Model Settings” dropdown (marked in blue) so that the “Prompt template” is visible.

Accessing the thread's settings in Jan

Your prompt template won't look like the one in the image above. But we're going to change that. Paste the following in the prompt template field, replacing its current contents:

{system_message}
<|User|>
{prompt}
<|end▁of▁sentence|>
<|Assistant|>

Think of prompt templates as wrappers that format the text you enter into a format the model expects. Templates vary from model to model. Thankfully, Jan will take care of most of this for you if you stick to its model library.
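
As a concrete illustration, here's roughly what that wrapping looks like in code (a sketch using the DeepSeek-style template above; the exact special tokens differ from model to model):

    # Filling the prompt template with a system message and a user prompt.
    TEMPLATE = (
        "{system_message}\n"
        "<|User|>\n"
        "{prompt}\n"
        "<|end▁of▁sentence|>\n"
        "<|Assistant|>"
    )

    def apply_template(prompt, system_message=""):
        return TEMPLATE.format(system_message=system_message, prompt=prompt)

    print(apply_template("Why might someone run an LLM on their laptop?"))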

This change we’ve made, though, will let you see this model “think”. For example, here’s how it replies to our question about why one might want to use local LLMs. The reasoning is bracketed by <think> and </think> tokens and prefaces the final answer.
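
If you ever want to strip that reasoning out programmatically, it's a one-liner (a sketch; the sample string here is made up):

    # Drop the <think>...</think> block and keep only the final answer.
    import re

    raw = "<think>The user cares about privacy and offline use...</think>Running locally keeps your data on your machine."
    answer = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
    print(answer)  # -> "Running locally keeps your data on your machine."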

Why Bother?

There are giant, leading LLMs available for cheap or free. Why bother setting up a chatbot on your own PC?

Llama 3.2 3B already answered this for us, but…

  1. It’s free: These models work with the PC you have and require no subscriptions. Your usage is only limited by the speed of the model.
  2. It’s 100% privacy-safe: None of your questions or answers leave your PC. Go ahead, turn off your WiFi and start prompting – everything works perfectly.
  3. It works offline: The first time I used a local model to help with a coding task while flying on an airplane without WiFi, it felt like magic. There’s something crazy about the amount of knowledge these models condense into a handful of gigabytes.
  4. It's customizable: We only scratched the surface here by changing our prompt template. But unfold the "Inference Settings" tab and take a look at the levers waiting to be pulled. Discussing all of these is beyond the scope of this article, but here's a quick tip: the "Temperature" setting effectively controls how much randomness is added during inference. Try setting it to each extreme and see how it changes your responses. (A tiny sketch of the math behind temperature follows this list.)
  5. It’s educational: This is the main reason you should bother with local LLMs. Merely grabbing a few models and trying them out demystifies the field. This exercise is an antidote to the constant hype the AI industry fosters. By getting your hands just slightly dirty, you’ll start to understand the real-world trajectory of these things. And hey, maybe the next DeepSeek won’t be so surprising when it lands.
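
Here's that temperature sketch (assuming the standard approach of dividing the model's raw token scores by the temperature before turning them into probabilities; I'm not asserting this is exactly how Jan implements it):

    import math

    def softmax_with_temperature(logits, temperature):
        # Scale the raw token scores, then normalize them into probabilities.
        scaled = [x / temperature for x in logits]
        peak = max(scaled)
        exps = [math.exp(x - peak) for x in scaled]
        total = sum(exps)
        return [e / total for e in exps]

    logits = [2.0, 1.0, 0.5]  # raw scores for three candidate tokens
    print(softmax_with_temperature(logits, 0.1))  # low temp: almost always the top token
    print(softmax_with_temperature(logits, 2.0))  # high temp: flatter, more random sampling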

So much of the coverage around LLMs focuses on raising the ceiling: the improved capabilities of the largest models. But beneath this noise the floor is being raised. There's been incredible progress: models you can run on moderately powered laptops now perform as well as the largest models from this time last year. It's time to try a local model.


chrisamico · 14 days ago · Boston, MA

The web is already multiplayer

I’ve been working on editors again: tweaking CodeMirror to make it work better in Val Town. It’s really hard work, unfortunately the kind of hard work that seems like it should be easy. It has me thinking a bit about why front-end engineering is difficult.

In fancy terms, it’s multivariate and extended over time. Frontends, backends, browsers, and users are all variable, unpredictable actors.

There are no single-player web applications: the simplest model of frontend software is a user interacting with a webpage, but the user and the webpage have similar capabilities. Your frontend application can respond to and produce events. It can modify the webpage.

So can the user: they can modify webpages at any point. Maybe they load every page in a new session, breaking your assumptions about how a website will persist state. Or maybe they never create a new session - they have 80+ tabs open that never get closed, so they will continue using versions of your frontend application that you released months ago, which now break when talking to an updated backend server.

But those aren’t the only players. Browsers are active participants in the game: a browser might translate a webpage and modify the DOM in a way that makes React crash. It might disable an API that the application expects, like localStorage, in a way that makes its behavior unpredictable. Ad-blockers might prevent some of the application from loading.

Distribution is probably the single most compelling attribute of the web over native applications. There’s no app store, no mandated release process, no installation step. But just as you as the application developer have more power, so do browsers and users.

The interplay between these actors is what makes things difficult, and difficult in a way that’s underemphasized in both Computer Science dogma and hacker culture. It isn’t some algorithmic challenge or hard performance problem, or anything that can be solved by switching programming languages. It’s an understanding of uncertainty in a deployment target that is constantly shifting and has a lot of unknowns.

After working in this realm, it’s a treat to work on ‘hard’ problems that at least give you more certainty about the environment. Optimizing SQL queries, for example – it can be challenging, but at least I know that I’m writing SQL for Postgres 16. Frontend engineering can feel like optimizing a query and not knowing whether it’ll run in DuckDB, Postgres, or MySQL, and even whether you can trust that the database schema will still be the same when it runs.

It’s hard work. It’s not cool. And I am nowhere near mastering it. I totally get it when people have an aversion to it, or think that the methods used today are wild. But mostly the methods are wild because the problem is, too. A straight-line solution to the problems of the front-end would be lovely, but those problems don’t really permit one.

chrisamico · 14 days ago · Boston, MA