One of the things I’ve seen while working on and studying the 4-day week movement is that different topics get hot for a while. This season, it’s been the intersection of AI and the 4-day week– and in particular, whether AI will accelerate adoption of a shorter workweek.

When you work with enough 4-day week trials, you see that details matter: the particular ways organizations design their time, use technology, automate repetitive tasks, etc., are all important. AI is a very broad category of technology, and it never seemed to me that it would inevitably lead to a shorter week, or deskill workers, or throw people out of work; instead, the impacts would depend on how the technology is deployed, who has power over it, and who gets to decide how the benefits are distributed between labor and capital.

As a result, it felt necessary for me to actually work with AI: train some chatbots, understand how much work it takes to make them, see what they can do and can’t do, and have a better sense of their limits and possibilities. And I haven’t done this just once: I’m currently running four different tools, for different purposes and different audiences.

Why so many?

First of all, what have I been using these systems for– and why in the world would I have multiple systems?

The simple answer to the second question is that it’s a chance to learn about different products, and that no single system does everything I might need.

Mem.ai: The Research and Writing Assistant

I use Mem.ai as a research tool to support my work on the 4-day week. Mem is like an AI-forward version of Evernote or Notion: you have lots of relatively short documents (a couple thousand pieces related to the 4-day week, in my case), which you can tag or organize into collections, and then the AI assistant can help you with various tasks related to that material.

For me, Mem’s greatest strength is as a writing assistant. I use it to generate first drafts of company case studies, to produce 500 words on the 4-day week in a particular industry or country, or to handle some other writing task that is well-defined and where there are plenty of existing examples the AI has already been trained on.

Another significant convenience for me is that it has excellent Zapier integration, so I can continue to use Evernote for basic note-taking and have those new notes automagically added to Mem.
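For the curious, here’s a rough sketch of what that handoff amounts to in code: take a note’s title and body and POST it to Mem as a new “mem.” The endpoint and auth header shown are assumptions based on my reading of Mem’s API docs, so double-check them before building on this.

```python
import requests

MEM_API_URL = "https://api.mem.ai/v0/mems"  # assumed endpoint; confirm against Mem's current API docs
MEM_API_KEY = "YOUR_API_KEY"                # placeholder

def forward_note_to_mem(title: str, body: str) -> None:
    """Push one Evernote note into Mem as a new mem (roughly what the Zap does for me)."""
    response = requests.post(
        MEM_API_URL,
        headers={"Authorization": f"ApiAccessToken {MEM_API_KEY}"},  # assumed auth scheme
        json={"content": f"# {title}\n\n{body}"},
        timeout=30,
    )
    response.raise_for_status()

# forward_note_to_mem("Trial notes: Company X", "Moved to a 32-hour week in March...")
```

In practice Zapier handles all of this without any code; the sketch is just to show how little plumbing the integration actually involves.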

This is definitely labor-saving for me, but there are two important caveats. First, these are definitely DRAFTS, not ready-to-publish work. If you want a piece that opens with an interesting anecdote that will catch the reader, or has a well-placed illustrative quote, you’ll need to add those yourself; the AI isn’t going to. Second, I control this technology, and therefore I can use it to assist my work and augment my productivity. I can imagine a boss with lower editorial standards looking at this tool and thinking, “I can fire my whole staff.”

For me, the one downside of Mem is that there’s no feature to let other people query my assistant: I can’t, for example, let journalists “interview” it, or let clients ask it questions. This is what led me to my second system.

Coachvox.ai: The Consultant

Coachvox.ai is a tool for building an online AI coach or trainer. It’s very conversation-forward, and is meant to help you automate a coaching service, turn an advice book into an interactive tool, or create other sorts of human-like conversations. So unlike Mem.ai, Coachvox.ai is meant to be a public tool that leverages / extends your presence in the world, or in the lives of your clients.

It also differs in that the training set consists of “prompts and completions,” which is a fancy term for questions and answers. This reflects its therapeutic / coaching origins, I think. Fortunately, a lot of what I do involves answering clients’ questions, and between information sessions, AMAs, notes from calls and coaching, etc., I have a LOT of questions and answers.
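To make “prompts and completions” concrete, here’s a small, invented sample of the kind of question-and-answer pairs I feed the system. The field names and the wording of the answers are illustrative, not Coachvox.ai’s actual import format or my exact training data.

```python
# Invented examples of prompt/completion (question/answer) pairs.
# Field names are illustrative, not Coachvox.ai's official schema.
training_pairs = [
    {
        "prompt": "How long should a 4-day week trial run?",
        "completion": "Most organizations run a roughly six-month trial: long enough to get past "
                      "the novelty, short enough to keep a sense of urgency.",
    },
    {
        "prompt": "Do we have to cut salaries to move to a 4-day week?",
        "completion": "No. The model I work with keeps pay at 100% while reducing time, "
                      "and focuses on redesigning work to maintain output.",
    },
]
```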

Coachvox.ai has proved to be really good at answering 95% of clients’ questions, and because it stores transcripts of its exchanges, I can see what questions it gets and what answers it gives– and use that material to further refine the system by editing answers to make them sharper or more accurate (essentially telling the system how to do a better job).

If Mem.ai is trying to be an assistant and thought partner, Coachvox.ai is trying to be a professional. But for me, there’s one place where it falls short: I can only use it for clients, not the general public. You can’t specify a piece of the training set and say, “These are prompts and completions from publicly-available sources; use only these when answering questions from the general public,” and categorize a second training set as “for use with clients only.” And this illustrates one of the ways in which the “AI as professional” model falls short. Being a professional isn’t just about having a store of expert knowledge: one of the things you learn to do as a professional is to answer questions from different audiences. If you’re a pediatrician, you don’t talk to your 10-year-old patient the same way you’d talk to her parents or to a colleague; for each audience you have a mental model of the person you’re speaking to, and you adjust your conversation accordingly. This is not incidental to professionalism; it’s a key part of it. But the current version of Coachvox has a pretty “flat” model of what it means to be a professional, which limits what the system can do.

Dante.ai: The Free Chatbot

This creates the need for my third system: a chatbot that can answer questions about the 4-day week for the general public. That’s where the Dante AI tool comes in. It’s not as powerful and the maximum training set size is smaller, but that’s okay. (Honestly, someone with a little patience could navigate a FAQ and learn all the same things, but then you wouldn’t have the cool experience of talking to an AI.)

NotebookLM: The Research Notebook

The fourth system is the most experimental: it’s built on Google’s NotebookLM, and is supporting a new book I’m writing. I was intrigued by NotebookLM when it first came out, but the limitations on the training set seemed, well, limiting; after hearing Steven Johnson talk about it, though, and discovering that it now allows for a lot of content, it felt like it was worth a try. It’s also designed specifically to be more like a research assistant / notebook that’s come to life (sort of like in Harry Potter, though that’s more a cautionary tale given who turned out to be inhabiting that journal!), which felt promising for the research mode I’m in.

The main downside of NotebookLM is that while your documents can be quite big, you can only have 50 of them; and so I have to concatenate my hundreds of Evernote notes into some bigger documents. This also means that it doesn’t update continuously, the way Mem.ai can.
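For anyone facing the same constraint, here is a minimal sketch of that concatenation step: it merges a folder of plain-text note exports into at most 50 combined files. The folder names are made up for the example, and the 50-document cap reflects the limit I ran into, so adjust it to whatever NotebookLM allows when you read this.

```python
from pathlib import Path

MAX_SOURCES = 50                      # NotebookLM's cap on the number of documents (at time of writing)
NOTES_DIR = Path("evernote_exports")  # hypothetical folder of plain-text note exports
OUT_DIR = Path("notebooklm_sources")

def concatenate_notes() -> None:
    """Merge hundreds of small notes into at most MAX_SOURCES combined documents."""
    notes = sorted(NOTES_DIR.glob("*.txt"))
    if not notes:
        return
    OUT_DIR.mkdir(exist_ok=True)
    # Round up, so every note lands in one of the MAX_SOURCES buckets.
    per_file = -(-len(notes) // MAX_SOURCES)
    for i in range(0, len(notes), per_file):
        chunk = notes[i : i + per_file]
        combined = "\n\n---\n\n".join(n.read_text(encoding="utf-8") for n in chunk)
        (OUT_DIR / f"notes_{i // per_file + 1:02d}.txt").write_text(combined, encoding="utf-8")

if __name__ == "__main__":
    concatenate_notes()
```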

You might ask yourself (and I did), “Why not use Mem for this project too? You’ve already got the Evernote integration set up, etc.” Simple answer: This is a totally different project, and it wasn’t clear to me that, even with good tagging and so on, Mem could keep the two projects straight. I could be underestimating the power of the system, but NotebookLM is also free, so the costs of experimentation were just my time and figuring out the workflow for importing and updating content.

So Which Is Best?

If all this sounds like a complicated patchwork of different abilities, implementations, technical limits– you’re exactly right! Each of these systems does somewhat different things, has strengths and weaknesses, and requires you to think about what you need it for. It’s not like comparing 4-door cars from Toyota, Hyundai, and Ford; it’s more like comparing a Dodge Ram, a Fiat Spider, and a bulldozer. None of them is a bad vehicle; they’re just very very different.

What I’ve Learned

From all this experience, it’s clear that there are some things that anyone who wants to use this technology needs to think about.

Be Prepared to Think a Lot About Your Work

One of the things you have to do when you build an AI system is think about how you work: what specific steps you take to complete a task you want to automate, or what knowledge you draw on to answer a question you want to train a system to answer, too.

This is not at all a bad thing, but it is a piece of invisible labor that will probably require more time and energy than you expect. It’s a bit like laying down the tarps and tape before you start painting a room: nobody sees that work when you’re done, but they sure would notice if you DIDN’T do it first!

Every LLM is Different

I’ve experimented with Dante.ai (where my free chatbot lives), Mem.ai for my own notes, Coachvox.ai for the coach and client assistant, and Google’s NotebookLM for my new book. It’s not like Word vs WordPerfect vs WordStar back in the day, where all three systems did similar things. These systems are all designed to do different things well, and have different combinations of features, maximum training set sizes, token limits, customization options, etc.

Different services are not yet competing on the same things, the way mobile phone companies all compete on coverage, price, etc.; there’s no consensus in the market yet about what standards you should use in choosing one over another. Nobody has every feature you might want, so you need to think about what you really want, and how to adapt to the limits of a system.

Save Your Work

It’s not necessary for your content to be locked in with any single system. Even though there’s some training that you can do within a system, 99% of your content should be in Google Docs, Evernote, PDFs, giant text files, or whatever, on your own system. This will make it easier to update your training sets, compare and contrast different systems, and migrate from one service to another if you need to.

LLMs Do Average Work By Design

By their nature, large language models are designed to produce what you might think of as a summary or consensus view of a body of knowledge. This is a function of their working like “stochastic parrots,” using training sets (either the information you give your model, or in the case of ChatGPT, big parts of the Internet) to identify statistically likely combinations of text, and then using that analysis as a guide to create texts that are the most likely answers to questions.
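A toy illustration of what “statistically likely combinations of text” means in practice: the sketch below counts which word tends to follow which in a tiny made-up corpus, then always picks the most common continuation, which is exactly why the output gravitates toward the consensus phrasing of its training data. Real LLMs use neural networks over tokens rather than word-count tables, so treat this strictly as an analogy.

```python
from collections import Counter, defaultdict

# A tiny, made-up "training set" about the 4-day week.
corpus = (
    "the 4-day week improves wellbeing . "
    "the 4-day week improves productivity . "
    "the 4-day week improves retention and wellbeing ."
).split()

# Count which word tends to follow which (a crude stand-in for what an LLM learns).
next_word_counts: dict[str, Counter] = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def most_likely_continuation(start: str, length: int = 5) -> str:
    """Greedily pick the statistically most common next word at each step."""
    words = [start]
    for _ in range(length):
        options = next_word_counts.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(most_likely_continuation("the"))  # e.g. "the 4-day week improves wellbeing ."
```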

The results can seem pretty magical if you’re asking questions about something that you’re not really familiar with; but if you ask an artificial intelligence about something in an area where you’re expert, you’ll find that the results are usually not wrong, but also not particularly insightful or sparkling.

For example: As a writer, one of the things you always look for is the vivid anecdote or story that catches the reader and also introduces some big theme. One of the things I work hardest at in my books is coming up with interesting chapter openings: using a story of biologist Barbara McClintock solving a big problem while walking on the Stanford campus as an introduction to the idea of walking as a source of inspiration– and setting up a later section where I walk around the Stanford campus with scientist Marily Oppezzo, talking about her research on walking and creativity. An LLM won’t do that; it’s not meant to.

Scott Galloway describes this as the “All chip, no salsa” problem– LLMs generate results that are solid but bland. So don’t expect counterintuitive insights, or radical ideas, or novel combinations; at heart these systems are reliable rule-quoting bureaucrats, not genre-bending artists.

You Need to Learn Your System’s Limits

In my experience, a large language model performs about as well as an enthusiastic intern who wants to do a good job but doesn’t know a lot and needs really clear instructions to succeed. What’s that mean?

For one thing, if you have a complex task, you have to decide if it’s worth it to take the time to think through the steps, formalize the tacit knowledge that lets you do the job, and then coach the AI through the process– or just do it yourself. Sometimes it actually will be easier to just do it yourself.

For another, LLMs tend to add a bit of generic verbiage that doesn’t add much insight– e.g., closing lines that restate the main thesis, that sort of thing. This is a fairly minor irritation, not really a problem, but after the thousandth time, it gets a bit on one’s nerves.

Will AI Bring About a 4-Day Week?

Will AI systems make a 4-day week inevitable? The simple answer is, no they won’t.

For one thing, no technology operates in an independent, autonomous way, generating predictable outcomes no matter what. Technological inevitability– the claim that a new technology’s triumph will happen no matter what we do and therefore we might as well surrender to it, and that good outcomes flow as naturally as 4 flows from 2 and 2– is a myth. And while AI feels more human or human-like, it’s just another technology.

And whether it will create more or less work will depend not on the internal capabilities of the technology, or how large the training sets are, or what the reinforcement is like, etc. It’ll depend on how the technology is deployed, who gets to control it, and who gets to say where and how the benefits are distributed. If workers get to say how it gets deployed, they’re going to use it to increase their productivity, automate boring work, and move up the value chain; and they might use those productivity gains to reduce working hours, or to increase total salaries.

In contrast, if managers and executives and capital are in charge of it, they’re going to use it to kill jobs. We’ve had decades of Silicon Valley execs seeing job elimination as both the natural consequence of automation, and a cool thing that the big boys do: Travis Kalanick was admired for destroying the taxi industry, and Elon Musk’s elimination of 80% of Twitter’s workforce was seen as an alpha male baller move, not a reckless dumpster fire that helps explain why its current valuation has dropped 80% since Elon bought it.

So right now, AI is more spellcheck than Shakespeare. And yes, AI could help us get to a 4-day week. But it’ll depend more on how it’s deployed and how the benefits are distributed.