Pioneer Park
Pioneer Park Podcast
Unstructured play and personal tutors with Cinjon Resnick

John and Bryan interview Cinjon Resnick, an AI researcher at NYU. Cinjon is interested in developing unstructured, discovery-oriented play games and experiences.

Topics

  • Ender’s Game and its inspiring concept of the Mind Game as an education system that deeply understands the state of the student.

  • The present and future of AI-driven tutors (see Cinjon’s longer post here).

    • The virtues of personal tutors trace back to Socrates and Rousseau; AI may make this broadly accessible.

    • Currently LingoStar and Speak offer digital tutors for language learning.

    • These solutions still lack the empathy of a real human tutor, but that should become possible.

    • This is a potential solution to Bloom’s 2 Sigma Problem.

  • Cinjon’s experience with circus, and his interactions with instructors including Victor Fomine

  • The inspiration for Cinjon’s hackathon project, Animate.

  • Cinjon’s love of social deception games, and the challenges for developing algorithms that can effectively play those sorts of games with humans.

  • Impact of Generative AI on gaming

  • Natural language APIs

  • Meeting the challenges of hallucination

  • The limits and potential of AI-driven storytelling

  • Why voice processing is critical to building machines with human-like empathy

  • The importance of real-time processing, for example simultaneous translation

  • The future of multi-agent RL models

  • Causality Research as an underrated field

  • Cinjon’s recommendations

    • The Diamond Age by Neal Stephenson [goodreads], Ender’s Game by Orson Scott Card [goodreads]

    • Rousseau on education: Emile [goodreads]

    • Martin Arjovsky’s thesis [arxiv]

  • What’s next for Cinjon? He is starting to think deeply about childhood companions.

Transcript

[00:00:00] Hi, I'm Bryan and I'm John. And we are hosting the Pioneer Park Podcast where we bring you in-depth conversations with some of the most innovative and forward-thinking creators, technologists, and intellectuals. We're here to share our passion for exploring the cutting edge of creativity and technology. And we're excited to bring you along on the journey. Tune in for thought-provoking conversations with some of the brightest minds of Silicon Valley and beyond.

Bryan Davis: Hey there. Welcome to the Pioneer Park Podcast. Today we are having an interview with Dr. Cinjon Resnick, who likes to be at the forefront of tech and research. He has worked at startups, spent time as a fellow at Google Brain, and recently wrapped up a PhD in machine learning from NYU. He's an alum of South Park Commons and is currently working on ideas related to AI-powered experiences, games, and companionship.

Cinjon, welcome.

Cinjon Resnick: Thank you. Appreciate it guys.

Bryan Davis: So I wanna dive right in. Tell me about Ender's Game, your relationship with the book, and the cool game that was featured there. I [00:01:00] guess that's called the Mind Game in the book. What is your memory of reading that, and what inspired you about that narrative?

Cinjon Resnick: Ender's Game is wonderful. If you haven't read it, I highly recommend it. It's a story about a whole family. This family is special, but in particular one boy ends up taking on the mantle of savior of humanity through a simulated adventure experience. I'll leave it at that and let the audience go and read it.

But in particular, inside of Ender's Game, there is a story, and the story is guided by this thing called the Mind Game. In it, the character goes on an adventure. It's not meant to be beaten; it's meant to test different aspects of the character. There are some fun things that happen, but it was a companion, a friend.

It was an interactive companion. It was similar to The Diamond Age, where you have this concept of the Primer. The Primer is like an interactive Socrates for the child.

Bryan Davis: So we're dealing with a set of [00:02:00] technologies now that are potentially opening up the doors toward unstructured, discovery-oriented play, games, and experiences.

And it sounds like that's a lot of the area you're interested in developing in your work. What do you think are the experiences that are opened up by the current generative AI technologies?

Cinjon Resnick: Yeah, it's a good question, because if you think about what life was like for a child in the 1700s or 1800s, either you had nothing, or you were very wealthy and you had teachers, and those teachers were one-on-one personal tutors.

So Rousseau talked about this in Émile, this idea of a personal tutor that you would have just for you. And Socrates was famously this for the children of wealthy patrons. What would be really cool is to provide this for every kid. Are we there yet?

Probably not exactly. This document that I wrote goes through what exactly we need to fix and what the remaining problems are. But we're pretty close, and we're getting closer each day. And so the opportunity now [00:03:00] is to gear an AI, if you will, toward a child, so that it is a personalized experience for them: what they want to learn, where they want to go, the stories they want to play out for themselves, and to really have that AI be not just a great experience for the child, but something that parents want.

We're close. We're close. And so I think today is the time to really start thinking about that and maybe even building towards it, finding some initial wedges.

John McDonnell: Are there some early applications or milestones on that journey that you're most excited about, that we could potentially be close to delivering?

Cinjon Resnick: Yes. I think one very clear one is actually everything that's happening in language learning. We see companies like LingoStar, and I think Speak is gonna go this way as well, developing applications that you could imagine just plugging a child into, or yourself as an adult language learner, and figuring out, okay, this is how you understand French in the context of actually talking about it.[00:04:00]

And from there, the interactive experience actually looks pretty similar to the interactive experience when you're language learning with a human. There are two things that are pretty clearly missing, though. One is the empathy to know where the student is on their journey.

And also, on a day-to-day basis, how they're feeling, how they're doing, do I up the capabilities of this AI or decrease them, et cetera. And the second one is just having a curriculum for the student, because it's not just this interactive experience; especially with adult language learners, there's also the idea that at some point you are teaching them, you have some objective in mind for where they're trying to go.

That's missing. But I think we're on a path to get there. Speak already has tons of users in, I believe, South Korea, and LingoStar is trying the same idea. They're demonstrating the capabilities of this.

Bryan Davis: Have you heard of Bloom's two Sigma?

Yeah. Yeah.

Which, for anyone listening, is the idea that language learning, or [00:05:00] any kind of learning, that takes place with a single tutor, the impact of having a personal tutor, is two sigma greater than the base case. It basically makes the case for individualized tutors and individualized education. Bloom's Two Sigma is a pretty prominent result in educational theory, and I think it speaks to the fact that these personalized Aristotles, these personalized coaches, have a big impact on education. And so I suppose what you're proposing is that we might be at a point where we can make these personalized tutors and achieve much more significant educational results for a much larger proportion of the population.

Cinjon Resnick: That's right. That's right. So I have some background in some of this stuff. I've not been a teacher per se, but I do regular personal tutoring in circus. I have a teacher in the circus apparatus that I train, and it's just night and day whether I'm working with one of my coaches or not. Similarly, [00:06:00] I ran a nonprofit called Depth First Learning for a while, where the whole goal was to try and figure out a different way to learn from a structured base.

I'm not gonna go into the exact details of how it works; it's at depthfirstlearning.com if you wanna check it out. But what was interesting there was that when you put people in a group, rather than having them just be alone, it works out much better. Why is that? I think a big reason is that you get to learn from what other people are doing, where they're going, where their knowledge is coming from.

And so if you have a particular entity who's geared up to understand the topic better than you, and to be able to go on this journey with you, but is also tuned to knowing where you're at, it's very powerful. And this is a lot of what you're pointing at with Bloom, et cetera. So one thing that's really cool about doing this with language learning is that it's largely just about talking.

And so the subject matter is really easy; it's hard to get it wrong. A lot of these machines will hallucinate answers today, and if they get it wrong, say in history or in finance, that's a problem. [00:07:00] Another area that I think is very primed for this concept is early childhood interactive experiences, because getting it wrong just doesn't matter to a kid that's three to five to seven years old, getting something wrong in the story of their day or talking about a big dog that is drooling in the park.

It doesn't matter if it's slightly off. But the experience of making a companion that can actually have empathy with this child, those are the things you start to be able to build there. And that's actually the direction I've mostly been looking into.

Bryan Davis: I'm curious if you've had any mentors or teachers that you feel have been really effective in your life in cultivating that experience for you, that's made you so interested in this.

Cinjon Resnick: What exactly are you asking, whether a teacher of mine has effectively been a one-on-one tutor?

Bryan Davis: Exactly. Whether or not you've had a really significant relationship with somebody who was a mentor or a tutor that you felt really opened up a new field. It sounds like to some degree circus has been that for you.

Yeah.

Cinjon Resnick: Yes. Yeah, I can definitely answer that. So [00:08:00] in athletic movement I have two coaches in Montreal that I train with. One is this guy Victor Fomine, who's a world-famous coach; I'm really lucky to be able to work with him. Another guy is Sergey, and I go to them for different things. Victor also doesn't speak English, so the opportunity to work with them is just fantastic, because they understand so much of how the body should move.

And so the whole experience with that is getting cues, figuring out, okay, this is what we should be doing, then this. And just the cue, at the right time, to look at the ground, look at your feet, hearing that over and over again is fantastic.

It's just so useful. But then sometimes I can go there and I'll see him train, I'll see him teaching people who are much less skilled or even much more skilled, and he changes his coaching for those people, right? There's an empathy for understanding where they're at, but still this drive. I think one of the best parts of working with a tutor who's able to adapt to you is that if you give more, then they give more.[00:09:00]

And if it's a day where you just can't give that much, they recognize that. Being able to build that into this next generation of machines is gonna be so important for getting this tutoring experience right.

Bryan Davis: One last aside on that, what is your circus skill?

Cinjon Resnick: I do straps. It's like artistic rings.

John McDonnell: What was the inspiration for your recent hackathon project that you did at South Park Commons?

Cinjon Resnick: Animate. You're talking about Animate, and the idea here, just to sum it up for the audience: I wanted to understand the state of the art in a wide variety of systems. There's a wide variety of APIs that we could use to have an interactive experience with AI.

And additionally, I also wanted to test two things. One, is it fun to be read a story to? And two, is it interesting for a language learner to be read a story in a different language, the language that they're [00:10:00] thinking about? At that point I hadn't yet come across any app that could do the second thing; I have in the time since.

Animate, then, was: we took a chapter of Alice in Wonderland and turned it into a visual story, so you could see a scene with it, and there's a narrator and two characters. And then we wanted to have each of those characters talking out their role. So in other words, we turned it into a play.

Yeah. And the whole experience was taking the story, turning it into a play, and then animating the play so that you have the characters with their mouths moving. Then there's the language switching and the interactive experience. So the main goal was really to test: is it interesting to be read a story to, what would make it interesting, and is there something around language learning there that it can tap into? So we built all that out.

It was actually rather quick to build, considering the technology today is very good; it's progressed to the point where [00:11:00] you can do all these things. The goal at the end was to put it before some children and see what they liked. And when I did that, the one thing that stood out over and over again was the ability to change the scene.

It's just fun. It's a fun experience when you edit the scene and you go from something plain, a canvas. It's a creatively constrained canvas, because it's characters sitting in front of a fireplace with a chessboard in between them. But then you say, oh, I wanna put a monkey on the chessboard, or I want to change this chair to be a giraffe.

And what you get back is just fun. It's surprising, it's creative, it's interesting, it's hilarious. And it was engaging, watching the kids actually play with this.

Bryan Davis: I'm curious, you are very interested in games and play. Do you play any games?

Cinjon Resnick: I do. I really like social deception games, so famously Secret Hitler and Coup, those kinds of games.

But I also played Braid, and I used to play a lot of Diablo II when I was a kid, and [00:12:00] Warcraft III and those kinds of things.

Bryan Davis: Do you spend any of your time now deeply invested in any kind of computer worlds, or are most of them sort of social deception?

Cinjon Resnick: I don't really play any computer games these days.

I am gonna play Diablo IV when it comes out; I have childhood nostalgia around it. But I have not invested in any of the ones that I've noticed my friends playing. I never got into Factorio, for example.

Bryan Davis: Do you feel like that's the next generation of games? Factorio is a great example of a game that is algorithmically generated and has a lot of randomness embedded in the way the game is played.

It's very famous for being replayable time and again; every experience will be different. How do you think generative AI is poised to impact gaming?

Cinjon Resnick: That's a good question. I think there are some obvious answers here, things you can point to like AI Dungeon; you could talk about storytelling and whatnot.

I think there are two things that are maybe less obvious. One direction is around motion generation. We're starting [00:13:00] to see, really just this past year, motion generation start to work. What I mean by this, examples are PhysDiff or the motion diffusion model. These point to a place where you can just say, I want this character to move like Beyoncé in a rainstorm with jazz music in the background.

And then it does some interesting thing, because it has some concept of Beyoncé, rainstorms, and jazz music associated with the movement of the human body. We're not at a point where we can do this with shapes that aren't really the human body, unless you can slap a faux human body on it.

But what that means is that if I were to just draw something that had some resemblance to the human pose, you could imagine turning that thing into its own shape. And so this could have a huge effect on UGC content. Suddenly UGC content can come alive. So I've seen a couple of startups working along those directions.

Not in the direction of taking the [00:14:00] motion generation and putting it in yet, but being ready for when that's possible. So that's one area. Another area which I think is really promising, and I know is actually being worked on in places like EA,

is defining difficulty differently. A lot of times in game difficulty, what you see is a computer getting better or stronger just based off of bonuses, like how much gold it collects when it accumulates something; it's just hacks. But what if instead you can define it in terms of how capable it is as an agent, and that capability adapts to what your strengths are? So if you're doing really well,

they'll just keep upping the difficulty until you're in that sweet spot of just a little bit past what you can do, but if you strive hard, you'll get there. I think that's gonna come pretty soon.

John McDonnell: Yeah. So I play online chess, and [00:15:00] I think the thing that is very cool about it is that you have your Elo rating and you're always paired with people who are a good match.

And it was always disappointing. I remember as a kid when I realized that in Civilization, deity mode or whatever, all that was happening was that the bots could just build every building in three turns or something. I was really disappointed. I thought, oh, the bot's gonna be super intelligent.

It's gonna outsmart me. And that's not actually how it works; you just kinda have to figure out what hacks you can do to exploit its dumb behavior. It's a completely different idea to actually make it really be smarter.

Cinjon Resnick: Yeah. Yeah. I think places you'll see this first are things like FPSs.

In chess you could do this right now; you could train an AlphaGo-style system to have any Elo rating you want, and then just park it on the server and have it be available to play. I dunno if anyone's actually done that, though.

John McDonnell: I actually think so. I've for sure heard about people building chess bots that are intended to have a certain Elo rating.

And I believe that on chess.com, some of those bots are actually specifically trained to behave like a human would behave if the human had that Elo rating.

Cinjon Resnick: Yeah.

Bryan Davis: I think some Go servers also have [00:16:00] similar bots that are out there at different levels, with different sorts of training and background.

So it's very interesting to think about. I recently read the Cicero paper from Facebook about Diplomacy, which was talking about the integration of large language models into this sort of social strategic game. And that was one of the most fascinating examples that I've run into in recent history of integrating a very complex social game with a strategic engine.

I'm very fascinated to think about what the next version of this is. Are there environments where we could let these things loose to learn proactively? A lot of these things, especially the strategic engines like AlphaGo and these other game engines that are winning these strategy games, rely on self-play.

Hundreds and hundreds of games iterated upon in the background, playing each other. And I'm curious, do you think that in a social game, or a game that is almost dependent on relationships with humans, we run into an issue where self-play becomes ineffective [00:17:00] because we can't actually mimic human behavior, we can't mimic human adaptability?

Cinjon Resnick: What is the world, what is the game you're thinking of where you need to do that? And I'll point out my example of AlphaGo: once it passed human capability, it kept getting better because it was now competing against its own population.

Bryan Davis: So the example of Diplomacy, I think, is somewhat interesting because it is reliant on human communication.

It's reliant on interpretation and on alliances being effectively formed. And so perhaps there's a category of games that have this sort of unbounded social nature and self-play. When they used self-play in the context of Diplomacy, they found that, first of all, a large part of Diplomacy takes place over messaging, basically convincing people to ally with you or to invade another country on your behalf.

And so that requires that you're able to be persuasive. And when they instituted self-play in this system, they found that there was a tremendous amount of semantic drift, where one system and another were [00:18:00] communicating with each other and beginning to use nonsensical language. And so that seems to be a limiting factor on how well a computer can do in a social setting, or a setting where a computer needs to be persuasive. It seems like there needs to be some sort of anchor to the real world.

Cinjon Resnick: Yeah. Yeah. So it's been a while since I've been involved in this research direction.

I would say the thing that comes to mind is called Other-Play. If you haven't seen that, I would look into it. Other-Play is work out of, I think it was also originally FAIR, but I associate it more with Jakob Foerster and his lab. The idea behind Other-Play is that you want to train agents that can work not just with themselves, but with other agents.

And so the goal the whole time was to be able to train agents that play Hanabi with humans at a very high level. And so the algorithms that they come up with around this, even though they're trained with self-play, need to [00:19:00] be able to work with humans too. And they actually do a pretty darn good job. So I think there's a lot of room for algorithmic improvement if you go in those directions.

The challenging part is always going to be keeping the human connection available there. I think, though, there's another question built into what you're saying, which is: can you make an algorithm that doesn't actually work with humans, that is, agents getting better and better, but is still human-interpretable as to what they're talking about, right?

So it doesn't necessarily need to be playing with humans, but it needs to be talking in a way that humans can understand, or that's what we would want. And one question here is whether it's even possible, because maybe what they're saying looks interpretable to humans but actually has codes underneath all of it.

And so that's, I think, an open question. I don't actually know of research that has addressed any of that, but I would expect that to happen: once it surpasses the human, trying to understand the strategy [00:20:00] involved is too difficult. And at some point it's going to be so difficult that we're just going to let it happen anyway.

We're gonna let it happen because the results are so good. And you can take that as far as voting, even. I'm not going to go there.

Bryan Davis: One of my favorite conjectures that John has about the future is this world in which there are just natural language APIs to the universe. So basically, every sort of site or service has a natural language API where you state your intent and it is able to perform the actions.

And you can obviously have these APIs beginning to interact with each other, like a large API server, but interacting in natural language. What I think is interesting in that context is what happens when natural language ceases to mean what it means to us, when these bots are beginning to communicate in their own version of our language that comes to mean very different things.
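A minimal sketch of that conjecture, assuming a hypothetical service with a single intent endpoint; the function names and the keyword matching that stands in for a language model are illustrative only:

```python
from dataclasses import dataclass

# Hypothetical sketch: a service whose only public endpoint accepts an intent
# stated in plain English. A real system would hand the utterance to a
# language model; here a keyword lookup stands in for that step.

@dataclass
class Action:
    name: str
    request: str

def parse_intent(utterance: str) -> Action:
    """Stand-in for an LLM mapping free-form text to a structured action."""
    text = utterance.lower()
    if "book" in text or "reserve" in text:
        return Action("book_reservation", utterance)
    if "cancel" in text:
        return Action("cancel_reservation", utterance)
    return Action("unknown", utterance)

def natural_language_endpoint(utterance: str) -> str:
    """The entire API surface: state your intent, get back a result."""
    action = parse_intent(utterance)
    # In the conjectured world, downstream services also speak natural
    # language, so this reply could itself be piped into another endpoint.
    return f"{action.name}: handled {action.request!r}"

print(natural_language_endpoint("Book a table for two at 7pm on Friday"))
```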

Cinjon Resnick: But I love that direction, emergent communication. One of the reasons I wanted to do a PhD was to study that area. And I think there are [00:21:00] a lot of fanciful things that we can come up with in that domain.

It's just hard to then ground it in an actually useful thing to do. And we've seen now the rise of agents that we can talk with. Three years ago we used to call these chatbots, and now we don't; we've forgotten the word chatbot.

Instead we just say ChatGPT or GPT-3 or the coming one from Google, whatever. But they work now, and they work in a way that eschewed all of the purpose that was going on with emergent communication. Maybe it's time to bring it back in. I would love that; I think that'd be amazing. I also wanna bring up something else I think is interesting in this direction.

And that's its connection with hallucination in these big language models. My friend Colin has this interesting take. He says it's pretty hard to imagine these neural nets being more compressive than bzip, and bzip is roughly N over four in terms of using float32; if you take the size of your language down to [00:22:00] about 25%, that's roughly bzip.

Let's just say that's a floor, okay? We're gonna move on from what Colin's point is there, but that's our floor. Now, if you imagine all of the internet that's been generated, it's much larger than the size of these models. So if you imagine stuffing all of that into these models, it's not gonna be able to fit,

in the same way that you and I, when we go around the world, can't stuff everything into our heads. We have to compress a lot, we have to figure out how to make it compositional, but it's not gonna get below what bzip is doing.

John McDonnell: You can do it if you're lossy, right?

Cinjon Resnick: Yes, that's exactly it. You can do it if you're lossy, and that's where hallucination comes in, because we don't give it any faculty for knowing that it doesn't know something, and we require that it generates something. The only answer is that it's a lossy hallucination. It has to be. You would instead have to figure out some faculty, like a reliable communication channel that [00:23:00] lets it say, I don't know what this is.

If you don't do that, you're not gonna fix this problem.
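A rough back-of-the-envelope sketch of this compression framing. The text sample, parameter count, and corpus size below are illustrative assumptions, not figures from the conversation:

```python
import bz2

# Illustrative only: compare bz2's lossless compression of some text with the
# fixed "storage budget" of a model's parameters.
text = ("The quick brown fox jumps over the lazy dog. " * 2000).encode("utf-8")
compressed = bz2.compress(text)
# Note: this highly repetitive sample compresses far better than real prose,
# which tends to land closer to the ~25-30% ratio mentioned above.
print(f"raw text:       {len(text):>9,} bytes")
print(f"bz2 compressed: {len(compressed):>9,} bytes "
      f"({len(compressed) / len(text):.0%} of original)")

# A model storing float32 weights spends 4 bytes per parameter, so a
# hypothetical 1-billion-parameter model has a fixed budget of ~4 GB,
# while the text it is trained on can run to many terabytes.
params = 1_000_000_000
budget_bytes = params * 4
corpus_bytes = 10 * 10**12  # assume a ~10 TB corpus, purely for illustration
print(f"parameter budget: {budget_bytes:,} bytes")
print(f"training corpus:  {corpus_bytes:,} bytes")
print("The corpus cannot fit losslessly, so the model must compress lossily;")
print("that lossiness is one way to frame where hallucination comes from.")
```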

Bryan Davis: So what do you see as solutions to hallucination in the short and medium term?

Cinjon Resnick: I think the first question is to ask yourself if you need a solution, because a lot of times maybe you don't need a solution.

Yes, you're gonna need one if you're trying to do something with legal obligations, if you require that this thing is airtight in that domain. But in many places you don't, so really ask yourself if you do. The second answer is that at some point we need to teach it, or it needs to emerge, because that seems to be the flavor of the day: to have a property emerge of understanding what it doesn't know.

And there are places where people are working on that, but even the directions that involve, say, knowledge bases inside these things, people have been doing this for a while. It's not like the last year and a half was the first time we started to understand that this stuff can hallucinate; people have been working on summarization for decades, extractive or abstractive summarization.

This is not a [00:24:00] new topic. We don't have an answer even if you include a knowledge base, because the model may have a concept that this knowledge base exists without being able to actually point to the fact that caused it to understand something.

In other words, I don't think there is a solution right now. And I think you have to just decide how much of it you can tolerate, and then otherwise form the right gates, the right playpen, for your users or whatever to play in.

John McDonnell: Yeah, it's really interesting, your point about the fact that it can't say that it doesn't know. It makes me wonder if you could instruction-tune them or something to handle the follow-up question, how confident are you, or are you sure about that, and to have it reliably give you a reasonable response, that kind of thing.

Cinjon Resnick: It's unclear, though, that that would help, right? Because it's the same problem. At its core, it just doesn't have the ability to do this, again unless something emerges that's different. But we haven't seen that. Instead, I've seen really good evidence to suggest that [00:25:00] what it has figured out is the ability to follow your intent.

The conversational partner's intent. So if Bryan's talking to this agent, it knows what he's looking for is some answer along these lines: why do you not know what you know? Is it because you are trying to track this fact? Oh yes, that's why. And then you roll the dice again and it says, oh no, that's not why, it's actually this.

But it seems to have some understanding of your intent and where you're going with it.

Bryan Davis: Yeah. That's interesting. The way I'd phrase it is that they seem to be very agreeable. And I suppose that's because, to your point earlier, these are trained on data that exists, not data that just denies existence.

And I think it's perhaps an interesting point to think about: the negative case of not knowing is not represented very well in the data it's trained on, because [00:26:00] overwhelmingly the internet is full of information. Even if that information is false, it's not full of people saying they don't know.

Or I guess we have an underrepresentation of questions not being answered, because the questions that it's being trained on have content. So it's almost as if we have this bias towards the things that do exist in the training sample, and perhaps there would be a benefit to generating false statements or negations as part of its training sample.

One other strategy in this domain that I'm curious to hear feedback on is relatively annotation-heavy. It would require basically taking the input of something like Wikipedia and annotating it as requiring citation, basically labeling a particular statement as needing a source, or as being an example of a timely, factual piece of information, and thereby [00:27:00] perhaps teaching a model in its process of training that it needs to inject some citation or some sort of timely fact.

Knowing that, it could then output that as part of its response, to be filled in later. So we can imagine, for instance, a tag that indicates a timely fact, or a sort of 'citation needed,' that's actually in the data as it's ingested. That's just one strategy to throw out there. I understand it might underlie some of the experimentation with the Flan model from Google.
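A minimal sketch of what that annotation pass could look like; the tag name and the regex heuristics for spotting "timely facts" are hypothetical, purely illustrative:

```python
import re

# Toy sketch of the annotation-heavy idea: pre-tag spans that look like
# timely or citation-worthy facts before the text enters a training corpus.
CITATION_TAG = "<citation-needed>{}</citation-needed>"

# Very rough heuristics: years, large formatted numbers, and percentages
# often mark claims that should be sourced rather than memorized.
PATTERNS = [
    r"\b(19|20)\d{2}\b",        # years
    r"\b\d{1,3}(,\d{3})+\b",    # large formatted numbers
    r"\b\d+(\.\d+)?%",          # percentages
]

def annotate(text: str) -> str:
    """Wrap every matching span in the citation-needed tag."""
    for pattern in PATTERNS:
        text = re.sub(pattern, lambda m: CITATION_TAG.format(m.group(0)), text)
    return text

sample = "The city had a population of 8,336,817 in 2022, up 1.3% from 2020."
print(annotate(sample))
```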

But I'm curious if you have any reactions to that sort of strategy, or others.

Cinjon Resnick: I think it's a great strategy for targeting it towards your use case. But lemme put it this way: the model size isn't changing, so you still have this limitation, this more meta limitation, around whether you can actually put all the data you need into [00:28:00] this thing.

It's comparable to you as a human. Actually, it even has fewer parameters and abilities right now than you as a human, but at some point it'll be comparable in terms of parameters in its head. And you yourself can't remember everything that happened; there's just too much data out there.

And so the answer must be that it has to compress it in compositional ways, and then use those compositions to meld into new concepts and explain them. But even when we do that, we still don't remember facts, because there's too much information there and it's too long.

It's too long-tail. I don't anticipate that changing. Why would it change?

Bryan Davis: Sorry. What do you see as the limits of AI-driven storytelling? What's the boundary in its capacity to proactively create?

Cinjon Resnick: That's interesting. In the long term, I don't think there is a [00:29:00] bound; it's not bounded. I think it'll gain all the faculties that we want it to have. Today, one way that it's bounded is definitely in the empathy and understanding of what's going on. If you try and say, okay, you're playing the role of a teacher for a five-year-old, it's not gonna remember the entire time that it's playing with a five-year-old child.

That's one thing that comes up. But at some point the child's gonna say some set of information, and the model is not gonna know that tone means something. There's no ability to take that in. If the kid was saying it excitedly, is it gauging that the right way?

There's a lot of lossy information there in how humans receive and give empathy to get where the child is at. And I expect that'll be true for adults as well. To put it back on adult language learners: when I've taken language classes [00:30:00] or had one-on-one experiences with teachers, they have this ability to automatically slow down the way that they speak when they figure out that you're not thinking about the right thing, or they can stop and say, oh, you didn't get that word, did you?

Or on the flip side, they can speed up when they recognize, oh, you're fine, you've got this, let's go faster. It's not gonna be able to do that automatically. So there's gonna be this little bit of extra friction every time you use it, where you need to account for that with design.

And I think it's possible. And it's a very interesting journey in the next 10 years, getting from here to the next step.

John McDonnell: How would you try to get it there?

Cinjon Resnick: The answers that come to mind feel like a combination of getting the data and getting the right design today. And also, we just need to reduce the latency in things like speech conversations. In a past life I worked a lot with audio data. If it's 16 kilohertz, then you're talking a tremendous number of samples per second. Models today can deal with that.

[00:31:00] But it's a whole other modality compared to text, which is far fewer words per second, because each of them contains a lot more information. So if you want the experience that we are having right now, where I'm talking and you immediately understand it, there can't be extra steps from taking this audio to text, to reasoning, and then reasoning back out to text.

That pathway needs to be smoothed. And there is some really interesting work going from audio to audio, but most of the big labs are not focused on that, because they're seeing so much power right now go into these straight-up text-to-text models that they're gonna focus on them for a while.

If you wanna get to a place where it feels like real-time understanding, and maybe even real-time empathy, it's possible that emerges from the text, but it's gonna do it in a medium that doesn't feel the same as it does between you and me right now, or what happens when you play with a child. And so I do believe you [00:32:00] almost surely have to go to an audio-to-audio experience to get there.

John McDonnell: It feels like there's almost a multimodality to this, where you can think of the text itself as one mode, and then all the meta-textual audio information about the way the person's talking, or their speed of speech, or their accent or whatever,

is this other stream that you're actually gonna want to co-process as you're making judgments.

Cinjon Resnick: I agree. Yeah, I agree. I think there are so many interesting questions around prosody that we don't really understand, and sure, there are other fields that look at the effect that comes with different prosody, but bringing that into the end-to-end experience that is today's advances,

we don't have good answers yet. There are teams working on this, teams at Google or Facebook that I'm familiar with, and they're not even private about it; they're fairly public about the fact that they're trying this, because it's all early research.

John McDonnell: So Bryan and I, for our hackathon project, made a voice [00:33:00] chatbot that you could call on the phone.

Cinjon Resnick: That's awesome.

John McDonnell: And honestly, I think the coolest thing about this project was having that experience of talking to a bot by voice and then seeing how it's cool and also how it's broken. When we first turned it on, we used Curie as the model, and the response time was a couple hundred milliseconds.

And it really felt like Curie was listening to us and then answering back. It is really magical just to have the bot be talking in a conversational flow and cadence that matches yours, at least a little bit. But then Curie was difficult, because Curie hallucinates a lot.

It's fun that Curie hallucinates, actually, but I had weird conversations where it told me there was a terrorist attack going on and stuff, and it's like, OK, I can't really ship that. And it was creepy; there's a freakiness to that.

But then, to get reliability, we switched to Davinci, and then it was a three-second lag time or something, and you just kinda feel like you're giving instructions to [00:34:00] Alexa or something. Less magical. And of course, to your point, if you could get that Curie speed and then also have the bot be attentive to your prosody and the other aspects of your speech that reflect your state of mind,

I could imagine that, even if it was not very smart, being really magical feeling.

Cinjon Resnick: Yeah, it really would. And that doesn't even count the TTS and ASR on the other side. So the text-to-speech and the speech recognition on both sides of that probably add another few hundred milliseconds each way, at least.
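A back-of-the-envelope budget for the pipeline being described (speech recognition, then the language model, then text-to-speech); every number below is an illustrative assumption, not a measurement from the hackathon project:

```python
# Rough latency budget for a phone-call voice bot: ASR -> LLM -> TTS.
PIPELINE_MS = {
    "ASR (speech to text)": 300,
    "LLM response":         800,   # varies wildly with model choice
    "TTS (text to speech)": 300,
    "network round trips":  200,
}

CONVERSATIONAL_GAP_MS = 500  # pauses much longer than this start to feel robotic

total = sum(PIPELINE_MS.values())
for stage, ms in PIPELINE_MS.items():
    print(f"{stage:<22} {ms:>5} ms")
print(f"{'total':<22} {total:>5} ms")
if total > CONVERSATIONAL_GAP_MS:
    print(f"roughly {total - CONVERSATIONAL_GAP_MS} ms over a conversational budget")
else:
    print("within a conversational budget")
```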

Bryan Davis: This is, I think, evidence of how amazing it is to be a social animal, and to have a brain that is capable of interacting in real time, perceiving, understanding, reacting, all happening so quickly. It speaks to the fact that there's some amazing compression: our representations of the world are extremely efficient, both from a memory standpoint and from a computation standpoint.

And that seems to me, from where I'm standing, to be one of the main limitations in our current understanding of how AI will progress: we are very far away from being able to represent the world in an efficient way that will allow for real-time communication, real-time speech, real-time video, that sort of thing.

Yeah.

Cinjon Resnick: You're right, and it's tempting to make predictions that this is very far away, but we've seen things move fairly fast sometimes. The stuff that's coming out with respect to music is really incredible, but it's also not real-time. And maybe one benchmark to consider here is whether you can do simultaneous translation.

People care about having simultaneous translation. There are big companies that care about it because it means you don't have to take translators with you. There are large organizations that care about it because the UN could then have a much more efficient experience [00:36:00] on its floor. But wow, is that a hard problem.

The idea that I can be talking right now and there's someone, with only half a second or maybe a second of delay, translating the concepts that I'm saying, that's extraordinarily different from what machine translation does.

Machine translation is trying to do almost a sentence-by-sentence experience, but here it's more a conceptual experience, in order to make it fast enough. And we do not have any good solutions for this. I think this is probably akin to all of the problems we've described here with respect to understanding, empathy, et cetera.

Bryan Davis: What do you believe are the constraints on solving that problem? Do you think it's our understanding of model architecture? Is it a hardware issue? Do you feel like any of these things will be breakthrough points?

Cinjon Resnick: I think it's largely data. We have tremendous amounts of data for doing machine [00:37:00] translation, but not for doing simultaneous translation.

We have the UN; the UN data might actually be the right kind of stuff for this. I don't know how much there is.

Bryan Davis: Tens of thousands of hours of recorded UN simultaneous translations. For context here, I used to be a translator. I was never a simultaneous translator; I was the other variety, whatever it's called, I can't recall. But basically I took part in meetings with lawyers and translated back and forth between lawyers and clients. This was a profession I was pursuing, and I was fairly close with some people who did become simultaneous translators.

And it is amazing to think of the sort of computational training they're doing, enhancing specifically one part of their brain to be able to instantly code-switch in their heads at the speed of human language. It's very unique, and it requires years of training to get right.

Cinjon Resnick: Yeah, it's wild what happens when they're working on it themselves.

Bryan Davis: It's practice. It's like somebody trying to become a concert pianist: they [00:38:00] just perform and perform. And of course, they're also working to close gaps in any vocabulary that they might be missing,

and to become domain experts in the fields in which they really want to concentrate, whether that be politics or economics or a specific business area. So there's vocabulary acquisition that goes along with that training, but a lot of it is sitting in a booth and doing the work over and over.

That's cool.

Cinjon Resnick: Yeah, I really respect that a lot. Tens of thousands of hours sounds like enough, but I don't know; I've not worked on the problem. I haven't really thought about translation seriously as a research endeavor in four years, but I do perceive that it hasn't reached a place where people could say, hey, this is almost ready to tip over.

Let's now just add compute to it. There's no service that offers this. There's nothing that's good enough.

John McDonnell: Switching gears a little bit, one thing that I really wanted to ask you about was this world of multi-agent RL models. You've done some work in this, right? [00:39:00] It's funny, now in the Bay Area there's all this excitement about LLMs, and OpenAI's brand name is just infinitely high.

And of course they started off doing a lot of these multi-agent models. My impression of how this went was that when the transformer paper came out, they built GPT, and then they realized, oh, this is amazing, we're gonna pivot until we're really focusing on this.

But I guess, what's become of that multi-agent work? What were they hoping to get out of it, did it just not work, and are other people achieving those aims? What's the state of that field?

Cinjon Resnick: Yeah. Also a good question.

I don't know their motivations in particular. I will say that there was a long period where people thought that the way to get to general intelligence was through RL, that the reward function was the most important thing, et cetera, and that you can learn everything through the reward function.

[00:40:00] Theoretically, that remains true, but in practice it appears to not be as important as having tons of data and a simple enough objective that still works for what you need, which is what the language models do. With respect to where the multi-agent stuff is happening, FAIR is still doing quite a bit of it, as you can see with Cicero, led by Noam and team.

Then you have DeepMind, which of course has a bunch of people who still work on this stuff all the time. I saw they recently put out a paper that was really interesting, all about adaptive learning and being able to do it with a small number of samples. A lot of this work is now building on top of foundation models and then adding RL to it, which is what happens with RLHF as well.

I think that for the near future, a lot of the core multi-agent RL type stuff is gonna be relegated more to academic [00:41:00] labs. I don't know how much is gonna happen, because everything is just super hot right now in working with foundation models and then pushing on that.

Yeah, and there's also this dynamic in academia where more and more people do the thing that's hot. It's pretty common. And then every once in a while you're gonna have this offshoot that comes around that pushes things forward. It's gonna be surprising, it's gonna work, and then it's gonna take over a little bit more.

There are labs I can point to that will continue on this path, because it's not going to be run over by the computational steamroller so much. There are important problems to think about there, say around cooperation, like involving humans. It's rare that you're going to be run over by the computational steamroller if you have to involve humans in the loop.

It's just too hard to do it then. Maybe RLHF will lead to some route where you can have people who are [00:42:00] updating something every second, but you're gonna have to have a huge team doing that. And there's a bet here; even the Foerster Lab in Oxford is kind of making a bet that this is going to continue to be the case and that these are important problems to solve.

Frankly, right now it just looks like all of research is being dominated by this stuff. My old advisors at NYU certainly see that too. And every once in a while there's something on Twitter, or a comic or something, about how all of ML research is now being taken over by these things.

You say that, but I think there's one direction, though, which maybe answers your question positively, and that's robotics. OpenAI doesn't really do robotics anymore. They stopped, and they stopped because it's a different use of resources that isn't gonna scale as well as everything they're doing right now.

There are teams that are focusing on robotics: a robotics team at DeepMind, a team at Brain. There are teams all over the world focusing on robotics still, and they have to bring together so many different parts of the stack. So they're [00:43:00] starting to use LLMs to guide the progress of the robot, in order to make it do things that are human-controllable. The paper called SayCan is an example.

And they're also starting to do a lot more with immense amounts of simulation, using all that data and figuring out how to do it in a proper way. So we're gonna start seeing many more papers come out with a high amount of simulation and then doing a little bit of sim-to-real, using the fact that these LLMs have so much understanding of the real world.

And that's really cool; that's happening as well. So in terms of multi-agent RL, I think you're gonna start seeing it seep into robotics more than you have, because some of the other problems they've been focusing on will be easier to address given the LLMs.

John McDonnell: It does actually remind me of how, so I'm so old that I had the opportunity to take Yann LeCun's class in like 2010. And I remember thinking, [00:44:00] oh, I know neural nets aren't cool, and he's just so obsessed with neural nets. I'm just gonna go take a class that's doing PAC learnability with SVMs and stuff.

And that's what I did. Huge regrets; it was obviously, in retrospect, a dumb decision. But Yann really had to fight through, and Bengio, these people who kept working on neural nets. The field really abandoned that direction, other stuff got trendier, and they had to just say, I'm gonna work on this anyway.

And in the end they were right. I do wonder if a lot of this multi-agent or RL work is like that: the trend is going away, but you can see what the potential is, and some of the people who just really stick with it might end up being right.

Cinjon Resnick: That's causality today. Yeah. Oh my gosh.

I think of causality as being that today. Look, Facebook just dropped their causality teams recently; it was part of the layoffs. If you wanna have something that will push these things to the next level, it's having causal understanding. But we don't have really good ideas of how to bring causal [00:45:00] understanding into deep nets.

There's a lot of work on it.

Bryan Davis: Cinjon, can you take a moment and define what causality is as a field of research?

Cinjon Resnick: Oh, just ask me to define causality. So, causality as a field: I'll start by saying that there are really good conferences for causality. There's also a part of it that shows up in the fairness conferences; it collides somewhat with that direction.

And I'll also put a quick pitch here for Jonas Peters' work. He's amazing, a professor in Zurich who's been doing this stuff for a while. Christina Heinze-Deml, Martin Arjovsky, David Lopez-Paz, these are really good researchers here. What you're looking for is the ability to give the model a causal understanding of what it's doing.

And there are some toy examples here you could throw out. One of them is a dataset with a really weird distortion, say women mostly have blonde hair and men mostly have brunette hair, [00:46:00] and then in your test set it's flipped around. The models will tend to learn a correlation there.

And if they see something with blonde hair, they'll predict based on that correlation, whereas what you really want is for it to have a causal understanding, an understanding that hair does not predict sex. And the way to think about this is that the causal link is broken there.

In terms of the graphical model, it would look different if sex were predictable by hair color. So that's the toy problem that people oftentimes use for this. And you can make it even more of a toy just by using some Gaussian models and then making predictions about that.

And we just really don't have good ways to scale this up. For that exact toy problem I can point to a solution; Martin's got a great one in his thesis of how to solve it. But in terms of scaling it up to full datasets, [00:47:00] real data, et cetera, it doesn't work. And going from A to B on this is really important if you want these things to actually have some sort of core understanding of what they're doing.
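A minimal sketch of the toy problem, assuming synthetic data: a spurious feature that agrees with the label in training but flips at test time, and a plain logistic regression that leans on it:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_split(n, p_agree):
    """Toy version of the hair-colour example: x_spurious agrees with the
    label with probability p_agree, while x_causal carries a weaker but
    genuine signal."""
    y = rng.integers(0, 2, n)
    s = 2 * y - 1                                  # label in {-1, +1}
    x_causal = 0.5 * s + rng.standard_normal(n)    # noisy but real signal
    agree = rng.random(n) < p_agree
    x_spurious = np.where(agree, s, -s) + 0.1 * rng.standard_normal(n)
    return np.stack([x_causal, x_spurious], axis=1), y

Xtr, ytr = make_split(5000, 0.95)   # training: the shortcut almost always works
Xte, yte = make_split(5000, 0.05)   # test: the correlation is flipped

# Minimal logistic regression via gradient descent (both features are
# centred around zero, so no bias term is needed).
w = np.zeros(2)
for _ in range(3000):
    p = 1 / (1 + np.exp(-Xtr @ w))
    w -= 0.1 * Xtr.T @ (p - ytr) / len(ytr)

acc = lambda X, y: ((X @ w > 0) == y).mean()
print("weights [causal, spurious]:", np.round(w, 2))
print(f"train accuracy: {acc(Xtr, ytr):.2f}")   # high: the model rides the shortcut
print(f"test accuracy:  {acc(Xte, yte):.2f}")   # collapses when the shortcut flips
```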

So I think there's this general hope right now in the field that when you go from 200 billion parameters to maybe 2 trillion parameters, it just solves it. It just happens. But there's no scientific reason to think that's true. And so I think a lot of this has been forgotten, because the work is just working so well right now.

Sorry, not forgotten; I just mean it's been put to the side.

Bryan Davis: Yep. So what we're talking about here is perhaps an embedded notion of how the world works, perhaps an idea of internalized physics, or an understanding of the structure of the environment in which it's being trained,

rather than just [00:48:00] correlations about entities. Which, to be fair, LLMs really seem amazing at, but at their core they are really just predicting the next token.

Cinjon Resnick: Yes. And what you're pointing at is something a little different as well. What you're pointing at is having this embedded world model and being able to condition on some world model. That's different from having a causal understanding learned from data.

There are different things we can point to and say which is better, which is worse. The RL world will oftentimes say that actually what you want is these world models. That's not always true either; you're talking about model-free versus model-based there. But in terms of causal understanding, I think what you're saying is a great next step, if we can get to the point where we can use these world models in a reliable way.

Involving physics in things, a plus, awesome. But what we ultimately want is for it to be able to causally learn from data. When you go about the world as a human, you can learn that this mirror sits upon this desk; if the desk moves out of the way, the mirror will fall.

It's not [00:49:00] clear at all that we bring into that experience any sense of this world model of physics. Instead, we just have some causal understanding that this desk is holding up this mirror.

John McDonnell: Is it even conceptually possible? This actually seems philosophically difficult, right? Was it Hume who had that thing about how causality just can't be inferred from data? Because you're really reading into your data. You're saying, oh, I've seen this correlational structure before, but really there are these kinds of rules underlying it. And so some things that I see are because of the rules, and some things that I see are due to randomness in the environment, and I'm gonna go through and decide which things are which, and infer this rule set, which I'm then gonna believe. But even my rule set might have problems, so I also have to have uncertainty about my rule set, right? This almost seems like a philosophical problem.

Bryan Davis: I guess to some degree it's also empirical, because I do believe there's evidence that some sort of core understanding of physics is baked in, pre-baked [00:50:00] into the model.

So it's not being derived actively from interaction with the world. It's very interesting to me to think about what elements of this are hardwired in the circuitry and what elements are learned through interacting and getting feedback from the world.

Cinjon Resnick: . Yeah.

These are the questions. And with respect to the philosophical thing, another one you could ask is: do we even need it to be causal, or can you just have strong enough correlative things that it actually ends up being fine and there's no real issue? We don't have the answers to this.

And that's because we also don't have the answers to what humans are doing here. I have a suspicion that if you want to get to a place where you can have reliable answers come out of a model as to what it knows and what it doesn't know, you want it to have some underlying facility for doing this.

And perhaps that facility is not purposefully done; it's not built in as a prior in the model. Perhaps it just emerges. But you need to be confident that it exists, and it doesn't exist today. [00:51:00] The research into how it could exist, that's what I mean by the field of causality.

Yes. That's a really interesting question.

Bryan Davis: Perhaps we could wrap up with a question about a recommendation for listeners, something that you've read, or are reading, or have been watching, or a game that you're interested in sharing. What's a takeaway from our conversation that you'd recommend to somebody?

Cinjon Resnick: I think there are some clear book recommendations, things like The Diamond Age and Ender's Game, which we discussed earlier. Another one along those lines, which I was reminded of by a friend recently, is Rousseau's take on education, Émile. Those are some pretty clear ones if you wanna think about this direction.

With respect to the causality stuff, Martin Arjovsky's thesis is fantastic. Very approachable. Yes, there's a lot of math in it, but if you want to ignore the math, it's very approachable regardless. [00:52:00]

John McDonnell: Those are great. Yeah. And I also just wanted to ask, what's next for you? What do you want to build or research next?

Cinjon Resnick: Yeah. It's fun. I'm figuring out right now exactly what I'm gonna commit to, but I'm spending a lot of time thinking about childhood companions and how to use the modern tooling to really make something that can grow with a kid. Let's just imagine you have a child, and when they're five or six you get them to love an experience, an interactive companion, and it grows with them over time.

I think this will properly ride the wave of research, the coming gains around understanding, more memory, more empathy. I think what you would have from this is the ability to form a lifelong companion, and if the kid forms a lifelong companion, it's something that can really help them a lot.

And all the tooling is on its way. It's very cool.

John McDonnell: I really love that idea. I feel like I want a lifelong companion.[00:53:00]

Bryan Davis: Thanks so much for being part of Pioneer Park.

Cinjon Resnick: Pleasure. Thanks. Appreciate it.

John McDonnell: Thanks so much.
