Earlier this month, I attended the AI Conference in San Francisco, Sep. 10-11, 2024. It was exciting to meet and talk with some of the leading AI practitioners in the field, attend sessions, visit the exhibitor booths, and learn about the latest developments.
Here is a brief overview of the main sessions I attended along with some of my key takeaways.
Open Source AI Comes of Age
Open source models are gaining traction and are now competitive with closed models, both in terms of performance and the robustness of their ecosystems. Both Meta (Llama) and IBM (Granite) had interesting talks describing their respective capabilities.
Meta’s Llama stack is quite well developed and includes API calls for inference, safety, memory, agents, evaluation, post-training, synthetic data generation, and reward scoring (each API is a collection of REST endpoints).
Their philosophy is that changing models has gotten too complex, so a layer of abstraction is needed to simplify design and maintenance.
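To make that concrete, here’s a rough sketch of what calling a REST-style inference endpoint could look like from Python. The endpoint path, port, and payload shape are my own illustrative assumptions, not the actual Llama Stack specification.

```python
# Hypothetical sketch of calling a REST-style inference endpoint.
# The URL, port, and payload fields below are illustrative assumptions,
# not the actual Llama Stack API specification.
import requests

def chat(prompt: str, model: str = "llama-3.1-8b-instruct") -> str:
    response = requests.post(
        "http://localhost:5000/inference/chat-completion",  # assumed endpoint
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    response.raise_for_status()
    # Assumed response shape: {"completion": {"content": "..."}}
    return response.json()["completion"]["content"]

if __name__ == "__main__":
    print(chat("Summarize the benefits of an abstraction layer over models."))
```

The appeal of the abstraction is that swapping models or providers should only change configuration, not application code.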
IBM’s strategy with Granite is to create a family of smaller, open models with data and build transparency tailored for specific domains. The focus is on enabling enterprise use cases.
Some early predictions about open source AI that turned out to be wrong:
1. Open source AI will lag behind proprietary systems in capability. [FALSE]
2. Open source AI is dangerous and closed models are safe. [FALSE]
3. Enterprises will be slow to adopt open source AI until (tech/legal/policy) issues settle down. [FALSE]
We Need Philosophers, Not Engineers!
Some key takeaways from an interesting fireside chat with Peter Norvig (Director of Research at Google and Education Fellow at the Stanford Institute for Human-Centered Artificial Intelligence (HAI)) and Alfred Spector (visiting scholar at MIT; formerly with Google and CMU):
- How do we know we can trust the results from AI? The applications need to be understandable: results must be explainable to users, and we need to understand how decisions were made. Maybe the systems should have transferability, so they can also teach humans and we’re not left behind.
- We will continue to chip away at the cost of the models, making them more affordable.
- We’re figuring out how to apply these things in many different domains; the only limitation is the data.
- It used to be that there were only a small number of experts who could work with the models; now a domain expert with a small amount of knowledge can harness them.
- The AI part in the middle is the easy part; figuring out what we really want is what’s hard.
- The focus should be on needs, but we don’t have a good way to express goals like ending hunger or achieving world peace.
- Now that we’re building systems that behave more like people, we need to consider more difficult societal issues. Maybe instead of engineers, we need philosophers!
What’s next?
- (Peter) Education is a huge opportunity. Consider AI 1-on-1 tutors vs. human teachers with a 1:30 teacher-to-student ratio.
- Then the question becomes what should we teach, what do we want to learn?
RAG Is All the RAGe
RAG stands for Retrieval Augmented Generation. This is a technique for adding domain-specific contextual information to a prompt before sending it to an LLM. The idea is that the LLM will use this information as additional context to produce more accurate results with fewer “hallucinations”.
The additional context is generally stored in a vector database (where similarity relationships are preserved) or a graph database (where hierarchical relationships are preserved based on a knowledge graph). Hybrid approaches that use both techniques have also been developed.
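To make the retrieval step concrete, here’s a minimal sketch of the RAG flow. The toy bag-of-words embedding and in-memory document list stand in for a real embedding model and vector database, and `call_llm` is a stub for an actual model call.

```python
# Minimal RAG sketch: toy embeddings + cosine similarity stand in for a real
# embedding model and vector database; call_llm is a stub for an actual LLM call.
import math
from collections import Counter

DOCUMENTS = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Premium support is available 24/7 for enterprise customers.",
    "The warranty covers manufacturing defects for two years.",
]

def embed(text: str) -> Counter:
    # Toy embedding: a bag of lowercase words (a real system would use a neural embedding).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank stored documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def call_llm(prompt: str) -> str:
    # Stub: in practice, send the augmented prompt to the LLM of your choice.
    return f"[LLM response to a {len(prompt)}-character augmented prompt]"

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Use only the context below to answer.\n\nContext:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

if __name__ == "__main__":
    print(answer("How long do I have to return a product?"))
```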
Some of the companies present at the conference with RAG-related tools, databases or services include:
- Marqo
- MongoDB
- Neo4j
- Ragie
- SingleStore
- Vectara
RAG might sound a lot like fine-tuning; the two have similar objectives but are different approaches. RAG adds domain-specific, contextual data to the prompt, while fine-tuning is a form of training that incorporates the additional knowledge into the model itself.
There are pros and cons to each approach, but RAG is best suited for data that’s updated frequently, while fine-tuning is better suited to stable, foundational knowledge. It’s also possible to combine the two, as they’re not mutually exclusive.
Agents Are Your Friends
The whole area of AI agents is still evolving and the exact definition of “agent” remains a little fuzzy, but it has generally come to mean an entity that uses one or more LLMs as its knowledge base and can interact with its environment, make decisions, and perform tasks autonomously.
There were several sessions on agents, covering both use cases and tools. CrewAI gave an interesting talk on their multi-agent platform. Zapier also talked about their tools for building agents.
Here are some key takeaways from the talks:
- Until now, model development has been the driving force in AI; now agents are becoming more important.
- Agents can be constrained or unconstrained. Letting agents run amok without constraints can lead to problems, but as LLMs become more reliable, agents can be made more autonomous.
- A key question is how we improve agentic reasoning and output resolution.
- How do you model what humans want to do, and then how do you get the LLM to do it?
- We can use a small model to generate the steps necessary to implement a plan.
- Function calling is key to getting an agent going.
- The agent typically needs to call the LLM multiple times, once for each step of the plan (see the sketch after this list).
- Don’t try to implement too much functionality in a single agent. Better to build a system using multiple agents.
- Multiple agents communicating with each other is similar to a microservices architecture in the software domain.
- Fine-tuning is important for agents as it helps to control their behavior more precisely.
- Going forward, we need to build more awareness into agents; otherwise a single agent could make thousands of LLM calls. We also need to leverage smaller models, or the cost will become prohibitive.
- The future is in multi-agent systems solving more complex tasks.
- Like humans, you need to give your agents space/time to process.
- Build safety into an agent by including guard rails; assume it knows nothing initially, iterate, and then give it feedback.
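To tie several of these points together (function calling, one LLM call per plan step, guard rails on the number of steps), here’s a minimal agent-loop sketch. The “LLM” is a stub that picks a tool, and the tools are hypothetical; this isn’t any particular framework’s API.

```python
# Minimal agent-loop sketch: a stubbed "LLM" chooses a tool (function calling),
# the agent executes it, and the result is fed back for the next step.
# The tools and planner below are illustrative, not any specific framework's API.
from typing import Callable

def search_web(query: str) -> str:
    return f"[search results for '{query}']"

def write_summary(text: str) -> str:
    return f"[summary of {len(text)} characters]"

TOOLS: dict[str, Callable[[str], str]] = {
    "search_web": search_web,
    "write_summary": write_summary,
}

def llm_decide(task: str, history: list[str]) -> tuple[str, str] | None:
    # Stub for an LLM "function calling" response: returns (tool_name, argument),
    # or None when the task is considered done.
    if not history:
        return ("search_web", task)
    if len(history) == 1:
        return ("write_summary", history[-1])
    return None

def run_agent(task: str, max_steps: int = 5) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):  # guard rail: bound the number of LLM calls
        decision = llm_decide(task, history)
        if decision is None:
            break
        tool_name, argument = decision
        history.append(TOOLS[tool_name](argument))
    return history

if __name__ == "__main__":
    for step, result in enumerate(run_agent("latest developments in open source AI"), 1):
        print(step, result)
```

A multi-agent system is essentially several of these loops, each with its own tools, passing results to one another, much like the microservices analogy above.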
What the VCs Are Watching
Key takeaways from this roundtable:
- Translating technology into business use. Example: a doctor-assistant application. Doctors today spend two hours a day transcribing their notes; having an AI assistant handle this chore would be a huge plus. Many similar applications exist in other verticals (copilot vs. autopilot).
- Verticals – how can AI fundamentally change these industries?
- Applications across industries and enterprises.
- What can we re-imagine in terms of the enterprise when the model is operating at a PhD level?
- Funding different kinds of intelligences.
- Encourage founders to look beyond LLMs.
- Products that are not screen based.
Brain and Brain, What Is Brain?
There were a couple of interesting talks about how we can apply learnings from neuroscience and biology to AI.
Numenta has been focusing on this area and has an interesting initiative underway called the Thousand Brains Project.
Their mission is to “map our neuroscience discoveries to today’s AI systems and create a new type of AI for the future.”
This is an area that I personally find fascinating. While nuclear reactors are literally being harnessed to train and operate LLMs, our brains run just fine on the energy equivalent of an ordinary lightbulb. There is some kind of a disconnect here!
Key takeaways from these talks:
- Neural tissue computation is sparse, in both connectivity and activity: performing a specific task in the brain engages relatively few neurons, with low activity and high locality of reference.
- By comparison, LLMs need to be fully activated to process even the simplest queries (see the toy comparison below).
- Can we mimic this behavior in silicon or do we need another technology?
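As a toy illustration of that dense-vs-sparse point (my own, not from the talks), compare the multiply-accumulate work of a dense layer with a version that keeps only the top-k most active units:

```python
# Toy illustration (not from the talks): compare the multiply-accumulate work of a
# dense layer against a version that keeps only the top-k most active units.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, k = 1024, 1024, 20          # k active units, ~2% of the layer

x = rng.standard_normal(n_in)
W = rng.standard_normal((n_out, n_in))

# Dense: every one of the n_out * n_in weights participates.
dense_out = W @ x
dense_ops = n_out * n_in

# Sparse activity: keep only the k strongest pre-activations, zero the rest.
top_k = np.argsort(np.abs(dense_out))[-k:]
sparse_out = np.zeros_like(dense_out)
sparse_out[top_k] = dense_out[top_k]
# If the active units were known in advance (as the brain effectively arranges),
# only k rows of W would need to be touched.
sparse_ops = k * n_in

print(f"dense multiply-adds:  {dense_ops:,}")
print(f"sparse multiply-adds: {sparse_ops:,} ({100 * sparse_ops / dense_ops:.1f}% of dense)")
```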
Brain vs silicon:
[Compute]
- Human Cortex has ~16 billion neurons
- Cortical neurons spike ~0.16 times per second
- Each neuron has ~7,000 synapses; ~700 connections/neuron is the figure used in this estimate
- 16B x 0.16 x 700 = ~2 trillion ops/sec
- By comparison, iPhone capable of ~5 trillion ops/sec
[Memory Size]
- Human Cortex has ~300 trillion synapses
- Equivalent connectome graph size: ~300T x 4 bytes = 1.2 PB
- By comparison, iPhone memory size 16-32 GB
Brain vs Silicon summary:
- [brain] Cortex = ~2 trillion ops/sec (~iPhone throughput), ~1 PB of memory <- what we need
- [silicon] GPU = ~100 trillion ops/sec, 16-32 GB of memory (iPhone size) <- what we are building
⇒ Why doesn’t silicon behave like a brain??
Because we’re building the wrong thing!
⇒ Why?
Because we don’t know the algorithm!
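For the curious, the back-of-the-envelope numbers above are easy to reproduce:

```python
# Back-of-the-envelope check of the brain-vs-silicon numbers above.
neurons = 16e9            # cortical neurons
spike_rate = 0.16         # spikes per neuron per second
connections = 700         # effective connections per neuron (figure used in the talk)
ops_per_sec = neurons * spike_rate * connections
print(f"cortex ops/sec: ~{ops_per_sec:.1e}")              # ~1.8e12, i.e. ~2 trillion

synapses = 300e12         # cortical synapses
bytes_per_synapse = 4
memory_bytes = synapses * bytes_per_synapse
print(f"connectome size: ~{memory_bytes / 1e15:.1f} PB")  # ~1.2 PB
```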
Regulating AI: SB 1047 and Beyond
This was an overview of the legislation, which addresses how we should regulate AI, presented by its lead author, California State Senator Scott Wiener, followed by a lively and engaging discussion with Prof. Ion Stoica (EECS, UC Berkeley) on some of its more controversial aspects.
On Sept. 29, 2024, Governor Newsom vetoed the bill, citing specific concerns, but at the same time established a task force to continue moving the process forward.
Even though the bill is officially dead, I’d highly recommend that anyone interested in the topic check out the video of the presentation, which should be available soon on the AI Conference YouTube channel; the discussion covers a wide range of relevant issues that remain unresolved.