Monkeys Surfing on the Moon: Andrej Karpathy on Cyberpunk Reality - Plus, What are GANs?!

The following is a transcript of our YouTube video, for your reference and convenience. We hope you enjoy it, and if you do, please follow the link, and like and subscribe to our Channel. Thanks!


Hey, this is John and Ben. Welcome to Artimatic's five plus five. Here at Artimatic, we're pushing the boundaries of AI and art. Today we're going to be talking about GANs (Generative Adversarial Networks) and some tweets by Andrej Karpathy.





The deal with this is we have a five-minute timer. So we're going to start with Ben first. I'm going to press the timer and he's going to get five minutes to talk about GANs, or generative adversarial networks. So, ready? Here we go. Let's see... go!

In the field of AI, I really like things that reflect real life, the things we can see on an everyday basis, and I think generative adversarial networks are just one of these things. They're a type of neural network architecture that relies on two different neural networks that try to trick one another. They've been hugely successful over the past five years and really pushed a lot of the boundaries of creative AI forward. There's some new stuff that's been happening lately that I'll talk about in some upcoming episodes, but for now, yeah, I'm going to dig into GANs.

So there are two neural networks, like I said: there's a generator and a discriminator. The generator is what is generating the art, or whatever you're trying to create. The discriminator is there to try to tell if whatever's coming into it is from the generator or if it's real data. So they go back and forth, with the generator producing something that starts out pretty bad; you can imagine an image that might just be complete static. The discriminator takes in two inputs: it takes in a real picture, for example, and the generator's output, and it tries to tell which one of these is real and which one's fake. In the beginning it might actually have a tough time telling whether a real image of, say, a dog or a picture of complete noise is real. But they slowly go back and forth, both networks improve through their gradient updates, and eventually you end up with some pretty amazing art.
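The back-and-forth between the two networks can be made a little more concrete by looking at what each one is scored on. The snippet below is a minimal sketch, not anything shown in the video: it assumes the discriminator outputs a probability that its input is real, and it uses the standard binary cross-entropy loss for the discriminator and the common "non-saturating" loss for the generator. The function names and the example probabilities are made up for illustration.

```python
import math

def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy for one real and one generated sample.

    d_real: discriminator's probability that the real sample is real.
    d_fake: discriminator's probability that the generated sample is real.
    The loss is low when d_real is near 1 and d_fake is near 0.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake: float) -> float:
    """Non-saturating generator loss: low when the fake is judged real."""
    return -math.log(d_fake)

# Early in training the discriminator is unsure about everything
# (illustrative numbers, not measurements).
early = discriminator_loss(d_real=0.5, d_fake=0.5)
# Later it separates real from generated samples more confidently,
# so its own loss drops...
later = discriminator_loss(d_real=0.9, d_fake=0.1)
# ...and the generator's loss is high until its samples get more convincing.
caught = generator_loss(d_fake=0.1)
fooled = generator_loss(d_fake=0.5)
print(early, later, caught, fooled)
```

In a real GAN these two losses are minimized in alternation with gradient descent, which is the "go back and forth" part of the description.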




They're a type of neural network architecture that relies on two different neural networks that try to trick one another.


Some of the biggest and most important GANs have been StyleGAN and Progressive GAN. My favorite's Progressive GAN. It works by starting with small images of just, like, 16 by 16 pixels, and they go back and forth until this really small image, if you can imagine, is close enough to a very scaled-down version of the image you're going for. And once it's good, they pop up to 32 by 32, keep training, 64 by 64, and you can keep doing this all the way up to whatever image size your GPU can handle.

So basically it starts itty bitty and it just keeps getting bigger and bigger?

Yeah, exactly. Yeah, and it's had some great success, but it tends to top out at 128, 256, 512 image sizes, and the 1024-resolution images have just proven to be a little too much, so it's been iterated on. StyleGAN is something that's a little too much for this video, but I would implore you to go check it out.

Right, and if there's enough interest we can always have, like, another one just on StyleGAN. So, yeah, explain it in five minutes? Yeah, good luck. I've got one and a half minutes left, so go for it.

I guess one metaphor that I really like that summarizes GANs is the metaphor of the art forger and an art police, I guess. I don't know what to call it.

A critic? An art critic.

Yeah, yeah. So this art forger starts off as, let's say, like a 10-year-old, and they're trying to create some art; they're trying to copy the Mona Lisa. And this critic's also a 10-year-old, and it doesn't really know at first; it can only guess. And they go back and forth as they age, and each one of them gets better. This art forger keeps practicing forging the Mona Lisa, and it keeps getting caught, or some forgeries actually pass the critic, and this kid learns what is important: what are the features of the painting that really make it the Mona Lisa and make it passable.
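The "start small and keep doubling" idea can be sketched as a simple resolution schedule. This is only an illustration: the starting size of 16 follows the description here (the starting size is a parameter, and published setups may begin smaller), the function name is hypothetical, and the actual training at each stage is left as a placeholder comment.

```python
from typing import List

def resolution_schedule(start: int = 16, final: int = 256) -> List[int]:
    """Return the image sizes to train at, doubling from start up to final."""
    if start <= 0 or final < start:
        raise ValueError("need 0 < start <= final")
    sizes = []
    size = start
    while size <= final:
        sizes.append(size)
        size *= 2  # each stage doubles the width and height
    return sizes

schedule = resolution_schedule(start=16, final=256)
for size in schedule:
    # Placeholder: train the generator and discriminator at this size
    # until the results are good enough, then move to the next stage.
    print(f"train at {size}x{size}")
```

The largest `final` value that is practical depends on GPU memory, which is the limit mentioned above.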


It's important to note that there are two things, and both of them are learning: the generator and the discriminator are both learning at the same time. And it's actually one of the hard parts about training them, to get them to learn at the same time.
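One simple way to picture this balancing act is a rule that pauses whichever network is getting too far ahead. The sketch below is an assumption for illustration, not the method used by anyone in the video: it throttles updates based on the discriminator's recent accuracy, and the thresholds `upper` and `lower` and the function name are hypothetical.

```python
from typing import Tuple

def choose_updates(d_accuracy: float,
                   upper: float = 0.8,
                   lower: float = 0.55) -> Tuple[bool, bool]:
    """Decide which networks to update on the next training step.

    d_accuracy: fraction of recent real/fake guesses the discriminator
    got right (0.5 is chance level for a balanced batch).
    Returns (update_discriminator, update_generator).
    """
    if d_accuracy > upper:
        # The critic "always knows": pause it so the forger can catch up.
        return (False, True)
    if d_accuracy < lower:
        # The critic is barely better than guessing: let it catch up.
        return (True, False)
    # Roughly balanced: both networks keep learning together.
    return (True, True)

print(choose_updates(0.95))
print(choose_updates(0.70))
print(choose_updates(0.50))
```

Other balancing tricks exist (different learning rates, several updates of one network per update of the other); this is just the easiest one to read.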




Yeah, and I think it's important to note that there are two things, and both of them are learning: the generator and the discriminator are both learning at the same time. And it's actually one of the hard parts about training them, to get them to learn at the same time. Like, if the discriminator, if the art police, gets good too fast, then it'll always know, and the other one won't have a chance to learn.

Yeah, yeah, it's a balancing act of making sure that one doesn't learn too quickly and surpass the other.

Exactly. Uh-oh, we are at one second and... oh, it didn't make a sound. Okay, anyway, I was waiting for it to go ding ding ding, but yes, we are out of time. So hopefully that was good for people. Obviously, ask questions in the comments if you have them, because we can always do another one that dials in more specifically on individual things.

All right, so I'm going to start your timer, John. You ready?

I'm ready, let's do it. All right, I did a timer on my end too, just so I would know and I'd be able to cheat.

Okay, so I want to talk about a couple of tweets from Andrej Karpathy, just in the past few days. I think it's really cool because he's talking about something becoming a reality that in the 1980s, in William Gibson's Neuromancer and other novels (I'm thinking of Neal Stephenson's Snow Crash and stuff also), was science fiction: immersive virtual reality worlds. And Karpathy is talking about the possibility that this could actually become real. So I think it's pretty cool and worth at least thinking about a little bit.

All right, so Karpathy, and by the way, if you don't know, he was the lead of AI at Tesla until he very recently took a sabbatical and then, I don't know, decided to become unemployed and do other things for a while. He's actually doing really cool stuff, so you should follow his YouTube channel and his Twitter account for sure. Anyway, he says: vision may be a high enough throughput input to the brain that is also sufficiently connected to its reward modules that AI-assisted generative art may converge to wireheading. Probably nothing.

Now, he kind of backs off at the end of this, but the wireheading is definitely a reference to things from the cyberpunk genre, which is basically: imagine Neuralink or another company like that, where you can plug electrodes directly into your brain. What that would do is give you access to the potential of some of the really, really cool stuff that's going on with DALL-E and Stable Diffusion, and we'll talk about these in future episodes, obviously. He's saying that we could plug that sort of world directly into our brain. So forget about computer monitors, like the one I'm looking at right now, or forget about VR glasses; let's plug this straight into our brains. If this stuff can get fast enough and generate images quickly enough, we could have a world that's way cooler than the real world, and it would plug straight into our brains, and we wouldn't even have to actually use our eyes for anything.


[Andrej Karpathy] says vision may be a high enough throughput input to the brain that is also sufficiently connected to its reward modules that AI-assisted generative art may converge to wireheading.




So, very much along the lines of what people in the 80s and 90s were talking about for cyberpunk. We then get something from Ben Poole here that says, you know, the real reason so many of us have been addicted to research in generative models for so long is nothing beats the reward of a batch of fresh samples. Which is very, very cool. Ben, I think you can agree with that, right?

Oh yeah, yeah. Getting output you didn't expect, it's so rewarding. It's very cool.

It's, like, super cool. And so then Karpathy says: nothing beats the reward of a batch of fresh samples; now how would you like them at 60 hertz, in 4K, in a cool pattern, and personalized? So the idea is, how far can we push these things? You know, we've got transformers, we've got diffusion models, we've got the GANs that Ben talked about. With all of these things, if we just keep pushing the technology, both the hardware and the software, we could eventually get to something where we could be at 30 to 60 frames a second, just like you're watching on your television set or on your computer monitor, but we're getting all artificial intelligence all the time. So it's not taking minutes or hours to render out these images, but fractions of a second. And if that happens, that is a really, really different world.

And then the final tweet from him is: it would feel like tripping on a fully immersive AV (audio-video) or VR experience that you can't or don't want to pull yourself away from. And that was the thing that really struck me with all of this, because I was like, that was the whole point of... I think it's Ready Player One also, if I'm remembering correctly; that was also, like, an immersive reality thing. So anyway, there are a lot of novels that have fictionalized this, and they've talked about this as being a really cool possibility for science fiction, but we are actually getting kind of close to the possibility of doing this in real time, in reality instead of in science fiction. Now, at present, what you could do, of course, is you could spend hours rendering out, in fact...


If you could shrink that down to where it's doing 20, 30, 40, 50, 60 frames a second, and at the same time you can be requesting what it's doing next...
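To get a feel for how big the jump from overnight rendering to real-time generation is, here is some back-of-the-envelope arithmetic. The numbers plugged in at the bottom (an 8-hour render producing a 30-second clip at 30 frames a second) are assumptions chosen only to match the rough "hours for 20 or 30 seconds of video" description; the function names are hypothetical.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to generate one frame at a given frame rate."""
    return 1000.0 / fps

def required_speedup(render_hours: float, clip_seconds: float,
                     clip_fps: float, target_fps: float) -> float:
    """How many times faster generation must get to run live at target_fps."""
    frames = clip_seconds * clip_fps
    seconds_per_frame = render_hours * 3600 / frames
    # Real time at target_fps allows 1 / target_fps seconds per frame.
    return seconds_per_frame * target_fps

budget = frame_budget_ms(60)
# Assumed example: 8 hours to render a 30-second clip at 30 fps.
speedup = required_speedup(render_hours=8, clip_seconds=30,
                           clip_fps=30, target_fps=60)
print(f"{budget:.1f} ms per frame at 60 fps")
print(f"about {speedup:.0f}x faster needed for live 60 fps")
```

Under those assumed numbers, each frame takes about half a minute today, against a budget of under 17 milliseconds at 60 frames a second.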


Like I said, you should look at Andrej's YouTube channel, and I'll put a link to that in the description, by the way. But you can spend hours, like overnight, rendering out 20 or 30 seconds of video, and it's really, really cool, and then you can watch it on a monitor or you could put on VR glasses. But what if we can keep compressing this, so instead of hours to render this out, you can do it in basically real time? Even at one frame a second it would be pretty cool. But then if you could shrink that down to where it's doing 20, 30, 40, 50, 60 frames a second, and at the same time you can be requesting what it's doing next, and at the same time it's not something that you're watching on a pair of goggles or on a monitor, but it's actually in your brain... There's a reasonable chance, if that comes to pass, that there will be a large group of people, especially if we get the... (thank you, I didn't notice that) especially if we get something like the Tesla Bot that does all the work for us, just imagine, people are going to be like, well, I don't have to be in reality, I'm just going to plug this stuff into my head. You know, just like The Matrix: plug it into the back of your brain and go, and just enjoy the ride and don't deal with reality. So there's a positive and a negative, a light side and a dark side to this, but it's really, really interesting.

Oh, all right, so there we go. That was two topics in five and five minutes. I think Ben actually has some comments on what I said. We're doing this in real time, by the way, folks; neither one of us has seen what the other person is doing, so it's a real, real-time reaction. So what's your reaction to what I did?

Yeah, that's a lot of stuff. Clifton StrengthsFinder has this, like, 52 strengths, and one of them is futurism, and man, you just totally intuited all of your futurism right into there.

Well, I have to say I'm channeling other people. Like I said, it's very cyberpunk, a collection of other people's ideas.

And, you know, what you believe in too. Yeah, I mean, I've been using DALL-E and playing around with that, and it's just single frames that you generate, but I've been getting, like, addicted to just that. I can only imagine if we can increase the speed of generation.

Right. Well, I mean, even forgetting about the plug-in aspect of it, if you have VR that has natural language processing, so you could say, like, "Now generate me images of, you know, monkeys surfing on the moon," or something like that, right? And so it interprets that and then can reproduce that at close to real time. But then, holy crap, if you can get that actually plugged directly into your brain, I can't imagine a drug that would be more powerful. It would be a little bit scary.

Yeah, I might have to stick to VR goggles for a while. It might be a little bit much.

Yeah, you're going to be the old generation. It'll be like, you know, your kids' generation will be like, "Dad, come on, get the latest plug-in," and you're like, "No, no, no, no thank you. Back in my day we had to strap goggles to our face."


You could say, "Now generate me images of monkeys surfing on the moon," or something like that, and so it interprets that and then can reproduce that at close to real time.


Image generated by DALL-E 2, used under license


That'll be for sure the thing that happens. So, all right. And then, of course, one of the enabling technologies for stuff like this is GANs, although, sadly enough, I feel like GANs have kind of taken a little bit of a back seat in the last, I mean, like, months. This stuff happens so rapidly, it's absolutely crazy.

Yeah, we'll see. I mean, with the GAN idea, there's a lot you can do with it and expand off of it. StyleGAN's one. Where you have multiple, where you have more than two neural networks, and if space complexity and time complexity aren't as much of an issue, you can have, like, 20 neural networks doing this GAN-style try-to-fake-it, try-to-catch-it, there's a lot you can still do with that.

Right, right. And the other thing is that a lot of these architectures potentially could be combined with each other too, so there's always that possibility as well.

Definitely. So, I mean, I could see a GAN that uses a diffusion model as kind of the basis for the generator.

So, yeah, I've talked about an intense amount of compute power that you would need for all of that stuff to work. But anyway, we'll definitely talk about other things, but please, in the comments, let us know what you're interested in as well, because we're certainly happy to talk about that. And we're trying to keep these reasonably short, you know, bite-sized things, so if we don't go into enough detail, just ask and we can always do another one with more detail.

And check out our website, artimatic.io.

Absolutely. And in the meantime, we'll see you all next week!

--John & Ben
