In late 2023, I wrote an overview of the state of AI called Beyond ChatGPT. That report covered the state of the technology as well as the state of adoption and reactions across the university sector. The state of Artificial Intelligence in Higher Education is much the same as last year, but the capabilities of generative AI have progressed at a rapid pace.
Since then, there have been many developments, but I believe six trends have not only characterised the development of AI since 2023 but will also be the most consequential for what academic practice looks like in the future.
- Multimodality: In just two years, we have moved on from models that can hold a text chat to models with which we can have a voice conversation and which can understand images. We started with technology that could convert text to speech; now we can turn text into a full podcast in our own voice, in almost any language and accent, and with a personality. It is hard to overestimate how consequential this is.
- Interfaces and user experience: ChatGPT started with an interface innovation - chat. Since then, the range of interactions both within the chat and entirely outside it has been transformed. OpenAI introduced Advanced Data Analysis and Canvas, Anthropic created Artifacts in Claude, Elicit have introduced Notebooks, Google created NotebookLM, Perplexity and others offer desktop apps, and Cursor have transformed coding. Each of these innovations shows that we are moving far beyond chat, and beyond our accustomed expectations of what software is like.
- Long context windows: When ChatGPT was released, users could paste in a long newspaper article. By the end of the summer of 2023, users of Anthropic's Claude could ask it questions about a short novel thanks to the model's context window of 100,000 tokens. Then, in early 2024, Google announced that its Gemini model would have a one-million-token context window, with more to come. This means the model can "see" several dozen academic papers, or a lifetime's worth of notes, at once. As context windows continue to grow, the scope of what Large Language Models can do will increase in unpredictable ways, and it is sure to transform academic practice radically.
- Small local language models: When ChatGPT was released it ran on a model with 175 billion parameters, and the latest frontier models are estimated to have 500 billion or more. This was thought to spell doom for Open Source and for the idea of AI running on a user's own computer. But at the same time, new techniques appeared for making models smaller and more efficient. This effort was spurred by Meta releasing an Open Source model called Llama in early 2023. Since then, capable small models (with 1-8 billion parameters) have become commonplace, and both Microsoft and Apple have released on-device models built into their operating systems. Small local language models will never fully replace the best LLMs, but they open up a range of possibilities whose shape we can now start to see.
- Reasoning models: Introduced only in September 2024 with OpenAI's release of o1-preview, reasoning models have become a key trend in the development of Large Language Models, overcoming many of the problems faced by earlier models. In particular, they can solve more complex problems that require more deliberation. Reasoning models build on the popular Chain of Thought technique, in which models are known to produce better results in some areas when they write out the whole process of deriving a solution before giving the final answer. Reasoning models are finetuned to produce much longer chains of thought in the background before answering. They do not replace or even outperform "traditional" LLMs in all areas, but they are ideal for complex programming and other STEM-related tasks. They are a key driver of the recent progress of models on complex tasks, and since the release of o1-preview, OpenAI have released the full o1 and announced the forthcoming o3 series. Google have also released a reasoning model, and there are two Open Source reasoning models: QwQ from Qwen and R1 from DeepSeek.
- Agents: Agents are the most speculative and most exciting development in generative AI. Until now, most of what people do with AI could be achieved with a series of prompts in a chat session. Over the last year, many new techniques have appeared for extending what Large Language Models can do by creating "agent systems", in which the model not only responds with an outline of a plan of action but can also start new threads that perform the actions in the plan. This could mean writing a complete software application or translating a whole book. So far we are only seeing glimpses of the potential, and it is not clear what the limits are, but there is no doubt that this will be the most significant trend in the year to come.
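To make the long-context figures above concrete, here is a back-of-envelope sketch of what a one-million-token window can hold. The 0.75 words-per-token ratio is a common rule of thumb for English text, and the 8,000-word paper length is an illustrative assumption, not a measured average.

```python
# Back-of-envelope arithmetic for a one-million-token context window.
# The words-per-token ratio is a rule of thumb, not an exact figure.
WORDS_PER_TOKEN = 0.75
CONTEXT_TOKENS = 1_000_000

words_in_context = int(CONTEXT_TOKENS * WORDS_PER_TOKEN)

# Assume a typical academic paper of roughly 8,000 words (illustrative).
WORDS_PER_PAPER = 8_000
papers = words_in_context // WORDS_PER_PAPER

print(f"~{words_in_context:,} words, or roughly {papers} papers at once")
```

Under these assumptions, the window holds about 750,000 words - comfortably "several dozen" papers, with room to spare.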
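The small-models trend comes down to memory arithmetic: a model's weights must fit on the device. The sketch below estimates weight storage only (activations and the inference cache add more), under the standard assumption of 2 bytes per parameter at 16-bit precision and 0.5 bytes per parameter when quantised to 4 bits.

```python
# Rough memory needed just to store a model's weights. Real usage is
# higher (activations, caches), so treat these as lower bounds.
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Gigabytes of weight storage for a model of the given size."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# A 175-billion-parameter model at 16-bit precision (2 bytes per weight):
print(weight_memory_gb(175, 2.0))  # hundreds of GB - data-centre territory
# An 8-billion-parameter model quantised to 4 bits (0.5 bytes per weight):
print(weight_memory_gb(8, 0.5))    # a few GB - feasible on a laptop or phone
```

This is why quantisation and the 1-8 billion parameter range matter: they bring the weight footprint down from data-centre scale to something an ordinary laptop can hold in memory.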
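The Chain of Thought technique behind reasoning models can be illustrated with nothing more than prompt text. Below, the same question is phrased directly and with a step-by-step instruction; the exact wording is illustrative, not a fixed recipe.

```python
# Chain of Thought prompting: the same question asked directly versus
# with an instruction to write out the reasoning before the answer.
question = (
    "A lecture theatre seats 180. If 15% of seats are empty, "
    "how many students attended?"
)

direct_prompt = f"{question}\nAnswer:"

cot_prompt = (
    f"{question}\n"
    "Think step by step: write out your reasoning first, "
    "then give the final answer on its own line."
)

print(cot_prompt)
```

Reasoning models are finetuned to produce this intermediate reasoning automatically, and at much greater length, without being asked.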
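The agent pattern above can be sketched as a simple loop: a planning call proposes steps, and each step invokes a tool whose result feeds into the next. Everything here is a stand-in - `plan_with_llm` and the toy tools are hypothetical placeholders for real model calls and real tools.

```python
# A toy sketch of an "agent system": the model proposes a plan, and a
# loop executes each step with a tool, carrying the result forward.
from typing import Callable

def plan_with_llm(goal: str) -> list[str]:
    # Stand-in for a model call that returns a plan as a list of steps.
    return ["search", "summarise", "write"]

TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda ctx: ctx + " -> found 3 sources",
    "summarise": lambda ctx: ctx + " -> summarised",
    "write": lambda ctx: ctx + " -> draft written",
}

def run_agent(goal: str) -> str:
    context = goal
    for step in plan_with_llm(goal):
        context = TOOLS[step](context)  # execute the step, feed result onward
    return context

print(run_agent("literature review on AI in assessment"))
```

Real agent systems add error handling, tool selection by the model itself, and loops that revise the plan mid-run, but the plan-then-execute shape is the same.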