
Highlights from the Open Data Science Conference

“Last year I was a data scientist; this year, we’re all AI engineers.”

In April 2024, the Cambridge Open Data team attended ODSC East, where this year’s theme was AI (Artificial Intelligence), and the buzzwords heard everywhere were RAG, agentic, and genAI. ODSC East was a great chance to meet other people in the data science world, learn more about AI, and get new ideas to bring back to Open Data in Cambridge.

GenAI is just a shorter, in-the-know way of saying “generative artificial intelligence”: AIs like ChatGPT, which you may already be familiar with, that can generate text, images, data, and more using patterns they’ve learned from large sets of training data. GenAIs that generate text, like ChatGPT, are also known as LLMs, or Large Language Models, because they’re trained on very large sets of textual data.

RAG stands for retrieval-augmented generation: a generative AI is allowed to retrieve data from outside sources while generating its output. For example, a genAI with access to your email and calendar could help you plan your week by retrieving information about your meetings and appointments, or a genAI with access to Google search could draw on up-to-date news and data.
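For readers curious about what “retrieval” looks like in practice, here is a minimal Python sketch of the calendar example. It is only an illustration under simplifying assumptions: a word-overlap score stands in for the embedding search real RAG systems typically use, the calendar entries are made up, and the final step of sending the prompt to an LLM is left out. The functions retrieve and build_prompt are hypothetical names, not part of any particular library.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split into words; a stand-in for a real embedding model.
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents that share at least one word with the query."""
    query_words = Counter(tokenize(query))

    def score(doc: str) -> int:
        doc_words = Counter(tokenize(doc))
        return sum(min(query_words[w], doc_words[w]) for w in query_words)

    ranked = sorted(documents, key=score, reverse=True)
    return [doc for doc in ranked[:k] if score(doc) > 0]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Augment the user's question with the retrieved context."""
    context = retrieve(query, documents, k)
    lines = ["Answer using only the context below.", "Context:"]
    lines += [f"- {item}" for item in context]
    lines.append(f"Question: {query}")
    return "\n".join(lines)


# Hypothetical calendar entries playing the role of the "outside source".
calendar = [
    "Monday 10am: dentist appointment",
    "Tuesday 2pm: team meeting with Open Data",
    "Friday: submit budget report",
]

print(build_prompt("When is my team meeting?", calendar))
```

The printed prompt contains only the Tuesday meeting, because it is the only entry that shares words with the question. That augmented prompt, rather than the bare question, is what a RAG system would pass to the language model.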

Agentic AI (or an AI agent) is AI that can carry out tasks beyond simply generating text or images, and can do so without much human involvement. Rather than only performing predetermined or repetitive tasks like simpler AI models, agentic AI models can perform multiple subtasks depending on conditions and make decisions at various points throughout the process.
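To make “making decisions at various points” a little more concrete, here is a toy Python sketch of an agent loop for the week-planning example mentioned above. It is a hypothetical illustration: in a real agentic system an LLM would usually decide the next step, while here a hand-written rule (choose_action) stands in for it, and the tools fetch_meetings and find_conflicts, along with the meeting data, are invented for the example.

```python
from collections import Counter


def fetch_meetings():
    # Hypothetical "tool"; a real agent might call a calendar API here.
    return [("Mon", 10, "Dentist"), ("Mon", 10, "Team sync"), ("Wed", 14, "Budget review")]


def find_conflicts(meetings):
    # Two meetings conflict if they share the same day and hour.
    counts = Counter((day, hour) for day, hour, _ in meetings)
    return [m for m in meetings if counts[(m[0], m[1])] > 1]


def choose_action(state):
    """Pick the next subtask based on the current state (stand-in for an LLM's decision)."""
    if state["meetings"] is None:
        return "fetch_meetings"
    if state["conflicts"] is None:
        return "check_conflicts"
    if state["conflicts"]:
        return "reschedule"
    return "finish"


def run_agent(max_steps=10):
    state = {"meetings": None, "conflicts": None, "log": []}
    for _ in range(max_steps):  # cap the number of steps so the loop always ends
        action = choose_action(state)
        state["log"].append(action)
        if action == "fetch_meetings":
            state["meetings"] = fetch_meetings()
        elif action == "check_conflicts":
            state["conflicts"] = find_conflicts(state["meetings"])
        elif action == "reschedule":
            # Move the last conflicting meeting one hour later, then force a re-check.
            day, hour, title = state["conflicts"][-1]
            i = state["meetings"].index((day, hour, title))
            state["meetings"][i] = (day, hour + 1, title)
            state["conflicts"] = None
        else:  # "finish"
            break
    return state


result = run_agent()
print(result["log"])
```

The sequence of steps is not fixed in advance: the agent only reschedules because it found a conflict, and it checks again afterward to confirm the fix worked before finishing.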

As the Open Data team learned more about genAI, a theme emerged: AI has a massive appetite. 

AI is hungry for data: Large language models have already “eaten” the whole internet, but they’re still hungry. That is, they’ve already been trained on almost everything written online, but they still need more training. To address this, engineers are developing more “data-efficient” ways for LLMs to learn, so that they can learn more from smaller samples.

Furthermore, a large proportion of the internet is written in English, but there is a great need for non-English AI. This chart shows how differently languages are represented in “real life” versus on the internet (chart from Trevor Back’s talk at ODSC):

AI is hungry for energy:
AI uses a huge amount of energy. At the current rate of growth, AI would start using more energy than the world produces by around 2040, roughly 15 years from now (see chart below). One idea for less energy-intensive genAI is to use smaller, more targeted AI systems rather than a one-AI-fits-all model (such as ChatGPT). In a system of smaller AIs, a query could be sent through a router AI and assigned to the smaller, specialized AI that best fits it, using much less energy.

This chart shows a few projections of how much energy AI could use (chart from Kate Soule’s talk at ODSC):
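The router idea described above could be sketched roughly as follows. This is a hypothetical Python illustration, not how any system presented at ODSC works: the router here is a simple keyword matcher rather than an AI, and the route names, keywords, and placeholder “models” are all made up. The point is only the shape of the design, where one lightweight step picks a specialist so the large general-purpose model is used only when nothing more specific fits.

```python
import re

# Hypothetical specialized models, each described by keywords it handles well.
ROUTES = {
    "math": {"calculate", "sum", "percent", "average"},
    "code": {"python", "function", "bug", "error"},
    "translation": {"translate", "spanish", "french"},
}


def route(query: str, default: str = "general") -> str:
    """Return the name of the specialized model whose keywords best match the query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    best, best_hits = default, 0
    for name, keywords in ROUTES.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = name, hits
    return best


def answer(query: str) -> str:
    # Placeholder: a real system would call the chosen (smaller) model here.
    return f"[{route(query)} model] {query}"


print(answer("Translate this sentence into Spanish"))
print(answer("Tell me a story"))
```

Queries that match no specialist fall through to the “general” route, which in a real system would be the larger, more energy-hungry model.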

Page was posted on 5/29/2024 3:10 PM
Page was last modified on 5/29/2024 3:29 PM