Journalist/developer. Storytelling developer @ USA Today Network. Builder of @HomicideWatch. Sinophile for fun. Past: @frontlinepbs @WBUR, @NPR, @NewsHour.

Back at school, Braintree special ed. student begs to go home

“Yesterday, we just give the best shot we can,” Samantha’s mother, Alicja Frechon, a native Polish speaker, said Tuesday evening from her home in Braintree. “Today, she didn’t trust no one. Not them, not me.”

Samantha’s enrollment at the 1,000-student East Middle School, which marked her first time in a regular public school in six years, came on the orders of a Norfolk County Superior Court judge following the eighth-grader’s months-long absence from any type of schooling. That absence, Judge Catherine Ham ruled last week, had caused Samantha “irreparable harm.”

Samantha’s fate had landed in Ham’s hands following a prolonged dispute between Frechon and Braintree Public Schools about the appropriate setting for the special education student, the Globe previously reported. Samantha, who has autism and crippling anxiety, among other disabilities, had last year attended a private school that specialized in one-on-one instruction, an environment where she thrived, she and her mother said.

The cost of that school, Fusion Academy in Hingham, was paid for by the Braintree school district under a provision of federal special education law that requires districts to cover private tuition when they cannot provide an appropriate education in-house. Braintree, though, chose not to enroll Samantha at Fusion this school year. (She had been one of roughly 9,500 Massachusetts special education students who were enrolled in separate public or private day schools at public expense in 2022, while about 800 more were educated at residential facilities, federal data show.) The district, in the midst of a budget crisis, said it would not pay the tuition for the private school, saying the placement at the Hingham school, which does not employ special educators, was only temporary. Superintendent Jim Lee previously has said the district’s financial situation played no role in its educational decisions.

Frechon sought the court’s help, alleging in court documents that Braintree had violated the girl’s “stay put” rights, a key tenet of both federal and state special education law. Under the provision, special education students have the right to stay in their current placement while administrators, teachers, therapists, and their parents sort out a disagreement over a potential change.

But Ham, who at a previous court hearing said she was unfamiliar with the intricacies of special education law, instead ruled Samantha should return to a traditional public school setting, a learning environment the teen hadn’t been taught in since she was 8. After being out of school all year, Samantha went with her mother to the middle school on Monday.

The transition was grueling.

Pulling behind an idling school bus, Alicja Frechon turned her head toward the passenger seat, and asked her daughter if she was OK.

“Mm-mm,” Samantha murmured, her face pale from fear, staring blankly ahead, as John Mellencamp crooned “Ain’t That America” on the car radio.

Shortly after entering the school, Samantha, overwhelmed by the number of adults there to assist her, nearly fainted and required attention from the school nurse, Frechon said.

(A Globe reporter and photographer shadowed Samantha on her way to school, then waited with her mother at the house until the school day was done.)

Shortly after drop-off, while Frechon sat at her dining room table, the first call from the school came.

Samantha, switching back and forth between English and their native Polish, begged her mom to come get her, her voice growing more insistent with each desperate plea: “Please come get me. I want to go. Please come get me. ... I don’t feel safe here, and I want to leave. ... Please come get me. I’ve already been here long enough. They know I tried.”

It was the first of eight calls in three hours.

Frechon, dabbing tears from her eyes with a paper napkin, told the girl she couldn’t. “I cannot just go get you,” Frechon said. “You’re asking me something I can’t do.”

Frechon said she worried the district might portray her as not wanting Samantha in school or not complying with the judge’s order.

Frechon questioned whether the school could provide her daughter what she needs. Samantha spent nearly the entire day in an empty conference room with two behavioral therapists. She did not interact with other students.

At times during the calls home, East Middle School Assistant Principal Andrew Curran spoke with Frechon.

Curran, maintaining an upbeat tone to his voice, explained the school would be setting Samantha up with a Chromebook so she could start accessing schoolwork. Still, Samantha begged to go home.

“I did tell her, ‘Honey, I can’t. I can’t overrule, you know, your mom and the judges and the lawyers,’” he said. “We knew the first day, it was gonna be tough.”

But Tuesday was even worse.

Samantha, panicked with nerves, again required the school nurse’s attention shortly after she entered the building. Unwilling to remain at the school as her mother sought to leave, Samantha then went to the parking lot, where she refused to move away from Frechon’s car.

Frechon questioned whether it’d be better for her to leave so the school officials could bring Samantha back into the school.

Collins Fay-Martin, a special education attorney advising Frechon pro bono, soon arrived at the school, where, according to a video recorded by Fay-Martin and viewed by the Globe, a school official threatened to call the Department of Children and Families if Frechon were to leave the school while Samantha remained in the parking lot.

“Are you going to leave a child here who doesn’t want to come into the school?” the official said.

“You’re court-ordered to have her,” Fay-Martin responded.

Soon after, officers from the Braintree Police Department arrived, at the school’s request.

At one point, a school official started recording the incident on her phone, riling Frechon, who began yelling at the school officials.

Samantha fell to the ground, vomiting, saying she couldn’t breathe. Her mom called 911, prompting an ambulance to arrive and take the teen to the hospital.

“You’re damned if you do, damned if you don’t,” Fay-Martin said, referring to Frechon’s position — take her daughter, but run afoul of the judge’s ruling, or leave her daughter, and let her suffer.

Superintendent Lee did not respond to an email requesting comment about Samantha’s return to school and Tuesday’s events.


Mandy McLaren can be reached at mandy.mclaren@globe.com. Follow her @mandy_mclaren.


No one buys books

In 2022, Penguin Random House wanted to buy Simon & Schuster. The two publishing houses made up 37 percent and 11 percent of the market share, according to the filing, and combined they would have condensed the Big Five publishing houses into the Big Four. But the government intervened and brought an antitrust case against Penguin to determine whether that would create a monopoly. 

The judge ultimately ruled that the merger would create a monopoly and blocked the $2.2 billion purchase. But during the trial, the head of every major publishing house and literary agency got up on the stand to speak about the publishing industry and give numbers, giving us an eye-opening account of the industry from the inside. All of the transcripts from the trial were compiled into a book called The Trial. It took me a year to read, but I’ve finally summarized my findings and pulled out all the compelling highlights.

I think I can sum up what I’ve learned like this: The Big Five publishing houses spend most of their money on book advances for big celebrities like Britney Spears and franchise authors like James Patterson, and this is the bulk of their business. They also sell a lot of Bibles, repeat best sellers like Lord of the Rings, and children’s books like The Very Hungry Caterpillar. These two market categories (celebrity books and repeat bestsellers from the backlist) make up the entirety of the publishing industry and even fund their vanity project: publishing all the rest of the books we think about when we think about book publishing (which make no money at all and typically sell fewer than 1,000 copies).

But let’s dig into everything they said in detail.

In my essay “Writing books isn’t a good idea” I wrote that, in 2020, only 268 titles sold more than 100,000 copies, and 96 percent of books sold fewer than 1,000 copies. That’s still the vibe.

Q. Do you know approximately how many authors there are across the industry with 500,000 units or more during this four-year period?

A. My understanding is that it was about 50.

Q. 50 authors across the publishing industry who during this four-year period sold more than 500,000 units in a single year?

A. Yes.

, CEO, Penguin Random House US

The DOJ’s lawyer collected data on 58,000 titles published in a year and discovered that 90 percent of them sold fewer than 2,000 copies and 50 percent sold fewer than a dozen copies.

In my essay “No one will read your book,” I said that publishing houses work more like venture capitalists. They invest small sums in lots of books in hopes that one of them breaks out and becomes a unicorn, making enough money to fund all the rest.

Turns out, they agree!

Every year, in thousands of ideas and dreams, only a few make it to the top. So I call it the Silicon Valley of media. We are angel investors of our authors and their dreams, their stories. That’s how I call my editors and publishers: angels… It’s rather this idea of Silicon Valley, you see 35 percent are profitable; 50 on a contribution basis. So every book has that same likelihood of succeeding.

— Markus Dohle, CEO, Penguin Random House

Those unicorns happen every five to 10 years or so.

We’re very hit driven. When a book is successful, it can be wildly successful. There are books that sell millions and millions of copies, and those are financial gushes for the publishers of that book, sometimes for years to come… A gusher is once in a decade or something. For instance, I don’t know if you know the Twilight series of books? Hachette published the Twilight series of books, and those made hundreds of millions of dollars over the course of time.

Right now the novels of Colleen Hoover are topping the bestseller lists in really, really huge numbers and the publishers of those books are making a lot of money. You probably remember The Girl With the Dragon Tattoo… Or the Fifty Shades of Grey series. So once every five years, ten years, those come along for the whole industry and become the industry driver that’s drawing people into bookstores because there is such a commotion about them. 

— Michael Pietsch, CEO, Hachette

They spent a lot of the trial talking about books that made an advance of more than $250,000—they called these “anticipated top-sellers.” According to Nicholas Hill, a partner at Bates White Economic Consulting, 2 percent of all titles earn an advance over $250,000.

Publishers Marketplace says it’s even lower.

Top-selling authors were defined as those receiving advances (i.e., guaranteed money) in excess of $250,000. Far fewer than 1 percent of authors receive advances over that mark; Publishers Marketplace, which tracks these things, recorded 233 such deals in all of 2022.

, Publisher at Sutherland House

Hill says titles that earn advances over $250,000 account for 70 percent of advance spending by publishing houses. At Penguin Random House, it’s even more. The bulk of their advance spending goes to deals worth $1 million or more, and there are about 200 of those deals a year. Of the roughly $370 million they say PRH accounts for, $200 million goes to advance deals worth $1 million or more.

Most of those are deals with celebrities. And Penguin gets most of them.

Books by the Obamas sold so many copies they had to be removed from the charts as statistical anomalies.

There are giant celebrities like Michelle Obama where you know it’s going to be a top seller.

— Jennifer Rudolph Walsh, Literary Agent

Because they are so lucrative, Gallery Books Group focuses its efforts on trying to get celebrities to write books.

75 percent [of our] acquisitions come from approaching celebrities, politicians, athletes, the “celebrity adjacent,” etc. That way, we can control the content…. We are approaching authors and celebrities and politicians and athletes for ideas. So it’s really we are on the look out. We are scouts in a lot of ways…

— Jennifer Bergstrom, SVP, Gallery Books Group

Bergstrom said her biggest celebrity sale was Amy Schumer who received millions of dollars for her advance.

We’ve had a lot of success publishing musicians, I mentioned Bruce Springsteen. We’ve also published Bob Dylan and Linda Ronstadt, a lot of entertainers through the years… There was a political writer, Ben Shapiro, who has a very popular podcast and a large following. We also competed with HarperCollins for that.

— Jonathan Karp, CEO, Simon & Schuster

Penguin Random House US has guidelines for who gets what advance:

  • Category 1: Lead titles with a sales goal of 75,000 units and up

    • Advance: $500,000 and up

  • Category 2: Titles with a sales goal of 25,000-75,000 units

    • Advance: $150,000-$500,000

  • Category 3: Titles with a sales goal of 10,000-25,000 units

    • Advance: $50,000- $150,000

  • Category 4: Titles with a sales goal of 5,000 to 10,000 units

    • Advance: $50,000 or less

Is anyone else alarmed that the top tier is book sales of 75,000 units and up? One post on Substack could get more views than that...

Franchise authors are the other big category. Walsh says James Patterson and John Grisham get advances in the “many millions.” Putnam makes most of its money from repeat authors like John Sandford, Clive Cussler, Tom Clancy, Lisa Scottoline, and others.

Q. Putnam typically publishes about 60 books a year. Correct?

A. 60, 65, sort of on average… I will say of those 65, though, a good portion of those are repeat authors… franchise authors that we regularly publish every year, sometimes twice a year.

— Sally Kim, SVP and Publisher, Putnam

The advantage of publishing celebrity books is that they have a built-in audience.

In some of the cases, the reason they are paying big money is because the person has a big platform. And if that platform is there for the advertising, then the spend might be lower.

— Jennifer Rudolph Walsh, former Agent

Macmillan agrees.

Q. Would you agree that those type of authors, meaning the ones with the built-in audience, are also authors who would command a high advance if they went to a traditional publisher like Macmillan or PRH? 

A. That’s a broad brush. But, yes…

Q. And you’re willing to pay more if they have a significant following? 

A. Yes.

— Donald Weisberg, CEO, Macmillan Publishers

They give some examples:

The Butcher and the Wren… this particular author has a big following, and with a single post on Instagram, she presold over 40,000 books. So, I mean, that’s just staggering from a per copy perspective, and it pretty much guarantees a number one spot on the New York [Times] best seller list when it’s published in September.

— Jennifer Rudolph Walsh, former Agent

These big advances, the authors have quite a bit of their own infrastructure with them. They have their own publicists. They have their own social media people. They have their own newsletters. So they actually are able—we are able to offload a good amount of the work, not all the time, but that is actually a factor in why we sometimes pay these big advances, because the authors are actually capable of helping us a lot.

— Jonathan Karp, CEO, Simon & Schuster

For example:

Q. Who is the best selling Simon & Schuster author currently? 
A. Right now it’s Colleen Hoover. 
Q. Does she have the highest marketing budget that Simon & Schuster pays? 
A. No. 
Q. Why is that? 
A. She’s the queen of TikTok, and so she has a huge following on TikTok.

— Jonathan Karp, CEO, Simon & Schuster

Related:

[One author wrote] paranormal, so it’s sexy vampires. This book was probably her 21st book. So she’s what I would call a franchise author. She’s very established. Though we spent $1.2 million on the book, we spent about $62,000 on the marketing and publicity because she had such an established fan base…

[Another author is] a celebrity-adjacent author, but also her platform was on social media. So we paid $450,000 for her book, and we spent $36,000 on the marketing and publicity. We didn’t need to spend more than that because she already booked at that point on Good Morning America, The Today Show. So publicity drove that, and that didn’t cost us.

— Jennifer Bergstrom, SVP, Gallery Books Group

Just goes to show that the main thing an author gets from a publishing house is an advance!

Every second book in America, ballpark, is being sold via e-commerce… Amazon.com has 50 million books available. A bookstore, a good independent bookstore, has around 50,000 different books available… an algorithm decides what is being presented and made visible and discoverable for an end consumer online. It makes a huge difference.

— Markus Dohle, CEO, Penguin Random House

Publishing houses try to game the algorithm and even pay to get ahead of it.

Q.  Penguin Random House has hired data scientists to try and figure out these algorithms so that its books get better presented on Amazon than its competitors’ books? 

A. One of the many efforts that we pursue, correct.

Q. And Penguin Random House pays Amazon to improve its search results? 

A. There is something that is available to our publishers, it’s called Amazon Marketing Services, AMS, and all publishers can spend money and give it to Amazon to have hopefully better search results.

— Markus Dohle, CEO, Penguin Random House

Ayesha Pande, president of Ayesha Pande Literary, says that 20 percent of her authors earn out their advance, if she’s being generous.

The single most important contract term is the advance…Because in a large number of cases, it may be the only compensation that the author will receive for their work.

— Ayesha Pande, President, Ayesha Pande Literary

Even celebrity books flop.

There are plenty of books that we spend $1 million on the advance and published them last year and they did not even make the top 1,000 on BookScan… Less than 45 percent of those books [that we spend a million dollars on] end up on that thousand best seller list.

— Madeline McIntosh, CEO, Penguin Random House US

Just because the publisher pays $250,000 or $500,000 or $1 million for a book does not guarantee that a single person is going to buy it. A lot of what we do is unknowable and based on inspiration and optimism.

— Michael Pietsch, CEO, Hachette

Even celebrities, though sometimes you think it’s going to be a big best seller, it flops. It happens…  I mean, Andrew Cuomo’s book was sold at the height of his being America’s governor during the COVID crisis. I mean, that book was sold for $5 million, I believe. I don’t know for a fact. But by the time it came out, the nursing home scandal had happened, the Me Too issues, and the book didn’t do any business.

Sometimes it’s just a timing issue, like Marie Kondo. She did a book about Joy at Work, about making your office sparked with joy because it’s not cluttered. It published in March of 2020.

— Jennifer Rudolph Walsh, Literary Agent

Having a lot of social media followers or fame doesn’t guarantee a book will sell. The singer Billie Eilish, despite her 97 million Instagram followers and 6 million Twitter followers, sold only 64,000 copies within eight months of publishing her book. The singer Justin Timberlake sold only 100,000 copies in the three years after he published his book. Snoop Dogg’s cookbook saw a boost during the pandemic, but he still only sold 205,000 copies in 2020.

Here are a few more:

Representative Ilhan Omar, a Democrat from Minnesota, is no global pop star, but she has a significant social-media presence, with 3 million Twitter followers and another 1.3 million on Instagram. Yet her book, This Is What America Looks Like: My Journey from Refugee to Congresswoman, which was published in May 2020, has sold just 26,000 copies across print, audio and e-book formats, according to her publisher.

Tamika D. Mallory, a social activist with over a million Instagram followers, was paid over $1 million for a two-book deal. But her first book, State of Emergency, has sold just 26,000 print copies since it was published in May, according to BookScan.

The journalist and media personality Piers Morgan had a weaker showing in the United States. Despite his followers on Twitter (8 million) and Instagram (1.8 million), Wake Up: Why the World Has Gone Nuts has sold just 5,650 U.S. print copies since it was published a year ago, according to BookScan.

The New York Times

It’s pretty common.

The worst day of a life of an agent and an author is when they’ve gotten a large advance and you go on BookScan and you see their first few months’ of sales and it says 4,000 copies or something like that. It happens. It happens more than any of us would like.

— Gail Ross, Literary Agent

If I look at the top 10 percent of books… that 10 percent level gets you to about 300,000 copies sold in that year. And if you told me I’m definitely going to sell 300,000 copies in a year, I would spend many millions of dollars to get that book.

— Madeline McIntosh, CEO, Penguin Random House US

Publishing houses pay millions of dollars for a book that sells only 300,000 copies??? Well, because books don’t sell a lot of copies, they don’t make a lot of money. According to Hill, 85 percent of the books with advances of $250,000 and up never earn out their advance. (Meaning the royalties earned never covered the cost of the advance.)

Very, very frequently, the winning bid in our calculation is a money loser.

— Michael Pietsch, CEO, Hachette

Markus Dohle, CEO, Penguin Random House, says the top 4 percent of titles drive 60 percent of the profitability. That goes for the rest of them too:

It would be just a couple of books in every hundred are driving that degree of profit… twoish books account for the lion’s share of profitability.

— Madeline McIntosh, CEO, Penguin Random House US

Around half the books we publish make a profit of some kind.

— Michael Pietsch, CEO, Hachette

About half of the books we publish make money, and a much lower percentage of them earn back the advance we pay.

— Jonathan Karp, CEO, Simon & Schuster

Many publishers have realized that maybe those big advances aren’t worth it.

We have a report that we colloquially call ‘The Ones That Got Away.’ And it’s a report on the books where we bid $500,000 or more as an advance and did not succeed in acquiring the book… this report stands as a kind of caution against the high risk of big advances because the lesson we take away again and again is: Thank goodness we stopped bidding when we did because even at the advance we offered, we would have lost money… Very frequently, the winning bid in our calculation is a money loser.  

— Michael Pietsch, CEO, Hachette

New books typically don’t sell well, which is why publishing houses lean on their backlist for revenue.

I would actually expect a book that is selling 300,000 units in a year is probably going to sell at least 400,000 or 500,000 over its life once you get backlist in there too.

Our backlist brings in about a third of our annual revenues, so $300 million a year roughly, a little less.

— Michael Pietsch, CEO, Hachette

The backlist includes all of the books that have ever come out. Brian Murray, CEO of HarperCollins, points out that their backlist includes Bibles (an $80 million business), coloring books, dictionaries, encyclopedias, magic trick books, calendars, puzzles, and SAT study guides. It also includes perennial bestsellers like Don Quixote, Stephen King’s Carrie, and Tolkien’s Lord of the Rings. These books continue to sell year after year.

Popular children’s books are cash cows, selling huge numbers of copies year after year and generation after generation.

Sometimes children’s books will be three generations, people have been buying them over and over again, and so that backlist catalog is really, really important to pay for the overhead of your publishing teams and then also to take the risks on the new books. So without a backlist I think it’s very hard to compete with these big books.

— Brian Murray, CEO, HarperCollins

For instance, Penguin Random House owns Eric Carle’s Very Hungry Caterpillar intellectual property. The book has been on Publishers Weekly’s bestseller list every week for 19 years.

Children’s books comprised 27 percent of PRH’s sales in 2021. That’s about $725 million—so roughly double the size of Scholastic’s trade division, and more or less equal on its own to all of Macmillan or HBG. Christian books accounted for 2 percent.

The Trial

Backlist titles like The Bible and Very Hungry Caterpillar and Lord of the Rings make up a disproportionately large percentage of the publishing industry.

Q. Are you concerned that Amazon will favor Penguin Random House Simon & Schuster in terms of promotion and distribution and discoverability? 

A. Yes.

— Donald Weisberg, CEO, Macmillan Publishers

With Amazon’s data, they could immediately beat out all the publishing houses if they wanted to.

I think Amazon as a publisher of books is underestimated. They have about 50 editors… Obviously, given the number of people searching on Amazon for products, that gives them a huge advantage because when people go onto Amazon, they—if the book isn’t there for what they are searching for, they could create that book. That’s one theory I have. But even if that doesn’t happen, they know what people are buying and they have access to that data. Their bestseller list, in my view, is more important than The New York Times best seller list because it’s in realtime. It’s hourly. And I look at that Amazon best seller list regularly, every day.

— Jonathan Karp, CEO, Simon & Schuster

Wouldn’t it be great if you could pay $9.99 a month and read all of the books you want? Just like you get all the movies you want from Netflix? Or all the music you want from Spotify?

Technically, it does exist. Kindle Unlimited is the largest, followed by Scribd. Audible isn’t quite all-access, but then Spotify got into audiobooks and made them so. But none of these players have quite taken off the way Netflix or Spotify has. That’s for one reason: The Big Five publishing houses refuse to let their authors participate. 

Q. No books are found on Kindle Unlimited? Because you think that’ll be bad for the industry?

A. We think it’s going to destroy the publishing industry.

— Markus Dohle, CEO, Penguin Random House

He’s right. No one would purchase a book again.

We all know about Netflix, we all know about Spotify and other media categories, and we also know what it has done to some industries… The music industry has lost, in the digital transformation, approximately 50 percent of its overall revenue pool.

— Markus Dohle, CEO, Penguin Random House

There’s one reason.

Around 20 to 25 percent of the readers, the heavy readers, account for 80 percent of the revenue pool of the industry of what consumers spend on books. It’s the really dedicated readers. If they got all-access, the revenue pool of the industry is going to be very small. Physical retail will be gone—see music—within two to three years. And we will be dependent on a few Silicon Valley or Swedish internet companies that will actually provide all-access.

— Markus Dohle, CEO, Penguin Random House

The publishing industry would die, that’s for sure. But I’d be willing to bet writers would get their books read way more.

And I think it’s on its way. Spotify has already started publishing audiobooks, and my money is on Substack for eventually publishing written books!

If publishing houses make minimal investment in marketing their authors and focus largely on celebrity books and their backlist, authors who can’t snag a large advance might have better luck building their own audience and publishing elsewhere.

I think really from the advent of online—really, once the internet became popular, you know, we heard the phrase disintermediation. And I don’t understand why that wouldn’t be a possible prospect for any best selling author, to just disintermediate, to go straight to the internet and sell directly if you have a following… Colleen Hoover has published with both Amazon and Simon & Schuster. And her Amazon book was on the independent book sellers’ best seller list. So what that says to me is that a Rubicon has been crossed.

— Jonathan Karp, CEO, Simon & Schuster

The romance category has already gone independent.

Many of those heavy readers of romance novels at that time switched to self-published stories. A very different price point. 99 cents, $1.99, away from what we call mass-market trade paperbacks… The mass-market trade paperback is the sort of small-format mass-market book, like it is a trade paperback, but a smaller format. It has been declining for the last 25 years. But we had a step change around ’14, ’15, with this trend that so many consumers went away from mass-market books into electronic ebooks in particular and self-published books.

— Markus Dohle, CEO, Penguin Random House

Gallery author moved to self-publishing (though Todd began her career writing on Wattpad, and recently returned to set up an imprint at Wattpad Books).

— Jennifer Bergstrom, SVP, Gallery Books Group

And of course, we have to talk about Kickstarter MVP Brandon Sanderson.

There is a New York Times best selling author in the science fiction and fantasy category. His name is Brandon Sanderson. I believe he’s published by both Macmillan and Penguin Random House. He went onto Kickstarter and announced that he would be offering four of his novels to anybody who wanted them if they wanted to donate to Kickstarter. And he raised over $42 million…

I have subsequently become aware of Good Night Stories for Rebel Girls, which is a series of books. It’s now actually become a whole company. And these are stories to give young girls confidence. And it’s been very successful, and it’s actually resulted in an entire company.

— Jonathan Karp, CEO, Simon & Schuster

After the judge blocked the merger, Penguin went through a massive round of layoffs and Simon & Schuster was sold to a private equity company instead.

Private equity tends to have one game plan: buy a company, load it with debt, wring out costs to improve its financials, sell at a profit. Dealing Simon & Schuster to private equity, The New Republic warned at the time with some slight hyperbole of its own, would mean “absolute devastation and wholesale job loss.”

The publishing houses may live to see another day, but I don’t think their model is long for this world. Unless you are a celebrity or franchise author, the publishing model won’t provide a whole lot more than a tiny advance and a dozen readers. If you are a celebrity, you’ll still have a much bigger reach on Instagram than you will with your book!

Personally, I could not be more grateful to skip the publishing houses altogether and write directly for my readers here, being supported by those who read this newsletter rather than by a publishing advance that won’t ultimately translate to people reading my work.

But I’d love to know your thoughts 👇🏻

Thank you for reading and being here,

P.S. If you enjoyed this post please consider sharing it. That’s how I meet new people and earn a living as a writer! ✨


Read the whole story
chrisamico
21 hours ago
reply
Boston, MA
Share this story
Delete

Colorado Sun politics reporter kicked out of GOP state assembly


Israel must be held to account for the targeting and killing of journalists


Protest in Tel Aviv against the Netanyahu government last June. Photo (cc) 2023 by RG TLV.

CNN media reporter Oliver Darcy wrote an important analysis last week about journalists who have been killed by Israeli forces in the Gaza war. Citing figures from the Committee to Protect Journalists, Darcy observes that at least 95 journalists have been killed since Hamas’ terrorist attack on Israel last Oct. 7, and that all but five of those journalists are Palestinian. That is the highest death toll for members of the press since CPJ began tracking such casualties in 1992.

In addition to deaths that might be attributed to the fog of war, there have also been killings that Israel carried out despite what appear to be clear indications that it was targeting media workers. Darcy writes that the United Nations recently finished a report showing that Reuters journalist Issam Abdallah had been killed in southern Lebanon after a tank fired at a group of “clearly identified journalists.” Israeli officials responded to the U.N. that it “does not deliberately shoot at civilians, including journalists.”

In addition, The Washington Post last week found that a Jan. 7 missile attack resulting in the deaths of two Al Jazeera journalists and two freelancers in southern Gaza may have lacked any military justification. The Israeli military claimed it had “identified and struck a terrorist who operated an aircraft that posed a threat to IDF troops” — but the Post found that the “aircraft” was a drone apparently being used for reporting purposes.

Darcy includes accounts of Palestinian journalists who have allegedly been abused by Israeli forces as well, a topic that is the subject of a new report from CPJ, which “found multiple kinds of incidents of journalists being targeted while carrying out their work in Israel and the two Palestinian territories, Gaza and the West Bank” as well as the deaths of journalists’ families.

CPJ has posted an open letter signed by 36 leaders of top U.S. and international news organizations calling on Israel to end its attacks on journalists. Among the Americans who signed the letter are Julie Pace, the executive editor of The Associated Press; Mark Thompson, the chair and CEO of CNN; A.G. Sulzberger, the publisher of The New York Times; Sally Buzbee, the executive editor of The Washington Post; Kim Godwin, the president of ABC News; and Rebecca Blumenstein, the president of editorial at NBC News. Significantly, the international news leaders signing the letter include Aluf Benn, the editor-in-chief of the Israeli newspaper Haaretz. The letter includes this:

Journalists are civilians and Israeli authorities must protect journalists as noncombatants according to international law. Those responsible for any violations of that longstanding protection should be held accountable. Attacks on journalists are also attacks on truth. We commit to championing the safety of journalists in Gaza, which is fundamental for the protection of press freedom everywhere.

This weekend, as NPR reports, tens of thousands of Israelis demonstrated against the government of Prime Minister Benjamin Netanyahu, calling for a deal with Hamas to release the more than 100 hostages the terrorist group is still believed to be holding.

The horrendous situation in the Middle East began with Hamas’ attacks, claiming some 1,200 lives and leading to Israel’s invasion of Gaza, which has killed more than 30,000 people, mostly civilians. Starvation looms. President Biden has ever-so-slowly been backing away from the Netanyahu government, allowing a U.N. Security Council resolution calling for a cease-fire and the release of the hostages to take effect.

Israel’s targeting of media workers is a small part of a much larger picture — a horrendous problem that would seem to have no good solution. But let’s start with this: Journalists are the world’s eyes and ears. They need to be able to tell us what is taking place on the ground without fear of being killed.


Running OCR against PDFs and images directly in your browser


30th March 2024

I attended the Story Discovery At Scale data journalism conference at Stanford this week. One of the perennial hot topics at any journalism conference concerns data extraction: how can we best get data out of PDFs and images?

I’ve been having some very promising results with Gemini Pro 1.5, Claude 3 and GPT-4 Vision recently—I’ll write more about that soon. But those tools are still inconvenient for most people to use.

Meanwhile, older tools like Tesseract OCR are still extremely useful—if only they were easier to use as well.

Then I remembered that Tesseract runs happily in a browser these days thanks to the excellent Tesseract.js project. And PDFs can be processed using JavaScript too thanks to Mozilla’s extremely mature and well-tested PDF.js library.

So I built a new tool!

tools.simonwillison.net/ocr provides a single page web app that can run Tesseract OCR against images or PDFs that are opened in (or dragged and dropped onto) the app.

Crucially, everything runs in the browser. There is no server component here, and nothing is uploaded. Your images and documents never leave your computer or phone.

Here’s an animated demo:

First an image file is dragged onto the page, which then shows that image and accompanying OCR text. Then the drop zone is clicked and a PDF file is selected - that PDF is rendered a page at a time down the page with OCR text displayed beneath each page.

It’s not perfect: multi-column PDFs (thanks, academia) will be treated as a single column, illustrations or photos may result in garbled ASCII-art and there are plenty of other edge cases that will trip it up.

But... having Tesseract OCR available against PDFs in a web browser (including in Mobile Safari) is still a really useful thing.

How I built this

For more recent examples of projects I’ve built with the assistance of LLMs, see Building and testing C extensions for SQLite with ChatGPT Code Interpreter and Claude and ChatGPT for ad-hoc sidequests.

I built the first version of this tool in just a few minutes, using Claude 3 Opus.

I already had my own JavaScript code lying around for the two most important tasks: running Tesseract.js against an image and using PDF.js to turn a PDF into a series of images.

The OCR code came from the system I built and explained in How I make annotated presentations (built with the help of multiple ChatGPT sessions). The PDF to images code was from an unfinished experiment which I wrote with the aid of Claude 3 Opus a week ago.

I composed the following prompt for Claude 3, where I pasted in both of my code examples and then added some instructions about what I wanted it to build at the end:

This code shows how to open a PDF and turn it into an image per page:

<!DOCTYPE html>
<html>
<head>
  <title>PDF to Images</title>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.9.359/pdf.min.js"></script>
  <style>
    .image-container img {
      margin-bottom: 10px;
    }
    .image-container p {
      margin: 0;
      font-size: 14px;
      color: #888;
    }
  </style>
</head>
<body>
  <input type="file" id="fileInput" accept=".pdf" />
  <div class="image-container"></div>

  <script>
    const desiredWidth = 800;
    const fileInput = document.getElementById('fileInput');
    const imageContainer = document.querySelector('.image-container');

    fileInput.addEventListener('change', handleFileUpload);

    pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.9.359/pdf.worker.min.js';

    async function handleFileUpload(event) {
      const file = event.target.files[0];
      const imageIterator = convertPDFToImages(file);

      for await (const { imageURL, size } of imageIterator) {
        const imgElement = document.createElement('img');
        imgElement.src = imageURL;
        imageContainer.appendChild(imgElement);

        const sizeElement = document.createElement('p');
        sizeElement.textContent = `Size: ${formatSize(size)}`;
        imageContainer.appendChild(sizeElement);
      }
    }

    async function* convertPDFToImages(file) {
      try {
        const pdf = await pdfjsLib.getDocument(URL.createObjectURL(file)).promise;
        const numPages = pdf.numPages;

        for (let i = 1; i <= numPages; i++) {
          const page = await pdf.getPage(i);
          const viewport = page.getViewport({ scale: 1 });
          const canvas = document.createElement('canvas');
          const context = canvas.getContext('2d');
          canvas.width = desiredWidth;
          canvas.height = (desiredWidth / viewport.width) * viewport.height;
          const renderContext = {
            canvasContext: context,
            viewport: page.getViewport({ scale: desiredWidth / viewport.width }),
          };
          await page.render(renderContext).promise;
          const imageURL = canvas.toDataURL('image/jpeg', 0.8);
          const size = calculateSize(imageURL);
          yield { imageURL, size };
        }
      } catch (error) {
        console.error('Error:', error);
      }
    }

    function calculateSize(imageURL) {
      const base64Length = imageURL.length - 'data:image/jpeg;base64,'.length;
      const sizeInBytes = Math.ceil(base64Length * 0.75);
      return sizeInBytes;
    }

    function formatSize(size) {
      const sizeInKB = (size / 1024).toFixed(2);
      return `${sizeInKB} KB`;
    }
  </script>
</body>
</html>

This code shows how to OCR an image:

async function ocrMissingAltText() {
    // Load Tesseract
    var s = document.createElement("script");
    s.src = "https://unpkg.com/tesseract.js@v2.1.0/dist/tesseract.min.js";
    document.head.appendChild(s);

    s.onload = async () => {
      const images = document.getElementsByTagName("img");
      const worker = Tesseract.createWorker();
      await worker.load();
      await worker.loadLanguage("eng");
      await worker.initialize("eng");
      ocrButton.innerText = "Running OCR...";

      // Iterate through all the images in the output div
      for (const img of images) {
        const altTextarea = img.parentNode.querySelector(".textarea-alt");
        // Check if the alt textarea is empty
        if (altTextarea.value === "") {
          const imageUrl = img.src;
          var {
            data: { text },
          } = await worker.recognize(imageUrl);
          altTextarea.value = text; // Set the OCR result to the alt textarea
          progressBar.value += 1;
        }
      }

      await worker.terminate();
      ocrButton.innerText = "OCR complete";
    };
  }

Use these examples to put together a single HTML page with embedded HTML and CSS and JavaScript that provides a big square which users can drag and drop a PDF file onto and when they do that the PDF has every page converted to a JPEG and shown below on the page, then OCR is run with tesseract and the results are shown in textarea blocks below each image.

I saved this prompt to a prompt.txt file and ran it using my llm-claude-3 plugin for LLM:

llm -m claude-3-opus < prompt.txt

It gave me a working initial version on the first attempt!

A square dotted border around the text Drag and drop PDF file here

Here’s the full transcript, including my follow-up prompts and their responses. Iterating on software in this way is so much fun.

First follow-up:

Modify this to also have a file input that can be used—dropping a file onto the drop area fills that input

make the drop zone 100% wide but have a 2em padding on the body. it should be 10em high. it should turn pink when an image is dragged over it.

Each textarea should be 100% wide and 10em high

At the very bottom of the page add a h2 that says Full document—then a 30em high textarea with all of the page text in it separated by two newlines

Here’s the interactive result.

A PDF file is dragged over the box and it turned pink. The heading Full document displays below

And then:

get rid of the code that shows image sizes. Set the placeholder on each textarea to be Processing... and clear that placeholder when the job is done.

Which gave me this.

I noticed that it didn’t demo well on a phone, because you can’t drag and drop files in a mobile browser. So I fired up ChatGPT (for no reason other than curiosity to see how well it did) and got GPT-4 to add a file input feature for me. I pasted in the code so far and added:

Modify this so jpg and png and gif images can be dropped or opened too; they skip the PDF step and get appended to the page and OCRd directly. Also move the full document heading and textarea above the page preview and hide it until there is data to be shown in it
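
The routing that prompt asks for can be sketched in a few lines. This is only an illustration, not the code GPT-4 produced: `routeFile` and `IMAGE_TYPES` are hypothetical names, and the sketch assumes the browser reports a standard MIME type in the file's `type` property.

```javascript
// Hypothetical helper: decide how a dropped or selected file should be handled.
// Assumes `file.type` holds the MIME type reported by the browser.
const IMAGE_TYPES = new Set(["image/jpeg", "image/png", "image/gif"]);

function routeFile(file) {
  if (file.type === "application/pdf") {
    return "pdf"; // render each page to an image first, then OCR
  }
  if (IMAGE_TYPES.has(file.type)) {
    return "image"; // skip the PDF step and OCR the image directly
  }
  return "unsupported";
}
```

Checking the MIME type rather than the file extension is a design choice made for this sketch; either approach would satisfy the prompt.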

Then I spotted that the Tesseract worker was being created multiple times in a loop, which is inefficient—so I prompted:

Create the worker once and use it for all OCR tasks and terminate it at the end
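
A minimal sketch of that fix, assuming a worker object with `recognize()` and `terminate()` methods like the one in the Tesseract.js snippet earlier in this post. The `ocrAll` helper and the injected `createWorker` factory are hypothetical names used for illustration, not code from the tool. The factory is passed in because worker setup differs between Tesseract.js versions.

```javascript
// Hypothetical helper: OCR many images with ONE shared worker.
// `createWorker` must return (or resolve to) an object that has
// recognize(image) and terminate() methods.
async function ocrAll(createWorker, images) {
  const worker = await createWorker(); // created once, outside the loop
  const texts = [];
  try {
    for (const image of images) {
      const { data: { text } } = await worker.recognize(image);
      texts.push(text);
    }
  } finally {
    await worker.terminate(); // released once, even if recognition fails
  }
  return texts;
}
```

The `try`/`finally` is an addition for this sketch: it makes sure the worker is terminated even when one of the images throws.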

I’d tweaked the HTML and CSS a little before feeding it to GPT-4, so now the site had a title and rendered in Helvetica.

Here’s the version GPT-4 produced for me.

A heading reads OCR a PDF or Image - This tool runs entirely in your browser. No files are uploaded to a server. The dotted box now contains text that reads Drag and drop a PDF, JPG, PNG, or GIF file here or click to select a file

Rather delightfully it used the neater pattern where the file input itself is hidden but can be triggered by clicking on the large drop zone, and it updated the copy on the drop zone to reflect that—without me suggesting those requirements.
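
That pattern looks roughly like the following. This is a sketch under assumptions rather than the code GPT-4 wrote: `wireDropZone` and `onFiles` are hypothetical names, and the two elements are assumed to be an ordinary block element and an `<input type="file">` hidden with CSS.

```javascript
// Hypothetical wiring for a clickable drop zone backed by a hidden file input.
// `dropZone` and `fileInput` are DOM elements; `onFiles` receives the chosen files.
function wireDropZone(dropZone, fileInput, onFiles) {
  // Clicking the large drop zone opens the hidden input's file picker.
  dropZone.addEventListener("click", () => fileInput.click());
  fileInput.addEventListener("change", () => onFiles(fileInput.files));

  // Cancelling dragover is what allows the drop event to fire on this element.
  dropZone.addEventListener("dragover", (event) => event.preventDefault());
  dropZone.addEventListener("drop", (event) => {
    event.preventDefault(); // stop the browser from opening the file itself
    onFiles(event.dataTransfer.files);
  });
}
```

Because both paths end in the same `onFiles` callback, phones (which only get the file picker) and desktops (which also get drag and drop) share one code path.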

Manual finishing touches

Fun though it was iterating on this project entirely through prompting, I decided it would be more productive to make the finishing touches myself. You can see those in the commit history. They’re not particularly interesting:

  • I added Plausible analytics (which I like because they use no cookies).
  • I added better progress indicators, including the text that shows how many pages of the PDF have been processed so far.
  • I bumped up the width of the rendered PDF page images from 800 to 1000. This seemed to improve OCR quality: in particular, the Claude 3 model card PDF now has fewer OCR errors than it did before.
  • I upgraded both Tesseract.js and PDF.js to the most recent versions. Unsurprisingly, Claude 3 Opus had used older versions of both libraries.
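
The width change comes down to the scale factor used when rendering each page. Here is a rough sketch of that arithmetic, mirroring the `desiredWidth / viewport.width` calculation in the prompt code earlier in this post. `renderSize` is a hypothetical helper name, rounding the height is a choice made for this sketch, and the 612 by 792 page is simply US Letter in PDF points, used as an example.

```javascript
// Hypothetical helper: work out the render scale and canvas size for a page.
// pageWidth and pageHeight are the page's dimensions at scale 1.
function renderSize(pageWidth, pageHeight, desiredWidth) {
  const scale = desiredWidth / pageWidth;
  return {
    scale,
    width: desiredWidth,
    height: Math.round(pageHeight * scale), // rounded to whole pixels
  };
}

// A US Letter page rendered at the old and new widths:
console.log(renderSize(612, 792, 800).height);  // 1035
console.log(renderSize(612, 792, 1000).height); // 1294
```

Going from 800 to 1000 pixels wide scales both dimensions by 1.25, so each page image has about 56% more pixels for the OCR engine to work with.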

I’m really pleased with this project. I consider it finished—it does the job I designed it to do and I don’t see any need to keep on iterating on it. And because it’s all static JavaScript and WebAssembly I expect it to continue working effectively forever.


Eclipse 2024 - Andy Woodruff, cartographer


Eclipse and elevation data by NASA. Base map data and land cover colors from Natural Earth/Tom Patterson.
