notes on AI state of the art, with a focus on generative and large language models. These are the "raw materials" for the https://lspace.swyx.io/ newsletter.
This repo used to be called https://github.com/sw-yx/prompt-eng, but was renamed because Prompt Engineering is Overhyped.
This Readme is just the high level overview of the space; you should see the most updates in the OTHER markdown files in this repo:
IMAGE_GEN.md
- the most developed file, with the heaviest emphasis notes on Stable Diffusion, and some on midjourney and dalle.TEXT.md
- text generation, mostly with GPT3CODE.md
- codegen models, like Copilot- stubs - very small/lightweight proto pages
AGENTS.md
- tracking "agentic AI"AUDIO.md
- tracking audio (transcription + generation)
Table of Contents
- images
- video
- img2img of famous movie scenes (lalaland)
- img2img transforming actor with ebsynth + koe_recast
- virtual fashion (karenxcheng)
- seamless tiling images
- evolution of scenes (xander)
- outpainting https://twitter.com/orbamsterdam/status/1568200010747068417?s=21&t=rliacnWOIjJMiS37s8qCCw
- webUI img2img collaboration https://twitter.com/_akhaliq/status/1563582621757898752
- image to video with rotation https://twitter.com/TomLikesRobots/status/1571096804539912192
- "prompt paint" https://twitter.com/1littlecoder/status/1572573152974372864
- audio2video animation of your face https://twitter.com/siavashg/status/1597588865665363969
- music videos
- video killed the radio star, colab This uses OpenAI's Whisper speech-to-text, allowing you to take a YouTube video & create a Stable Diffusion animation prompted by the lyrics in the YouTube video
- Stable Diffusion Videos generates videos by interpolating between prompts and audio
- direct text2video project
- img2img of famous movie scenes (lalaland)
- text-to-3d https://twitter.com/_akhaliq/status/1575541930905243652
- text products
- Jasper
- gpt3 email https://github.com/sw-yx/gpt3-email
- gpt3() in google sheet 2020, 2022 - sheet
- https://www.summari.com/ Summari helps busy people read more
- sequoia market map https://twitter.com/sonyatweetybird/status/1584580362339962880
- base10 market map https://twitter.com/letsenhance_io/status/1594826383305449491
- game assets - emad thread https://twitter.com/EMostaque/status/1591436813750906882
The more advanced GPT3 reads have been split out to https://github.com/sw-yx/prompt-eng/blob/main/GPT.md
- https://www.gwern.net/GPT-3#prompts-as-programming
- https://learnprompting.org/
- beginner
- openAI prompt tutorial https://beta.openai.com/docs/quickstart/add-some-examples
- DALLE2 prompt writing book http://dallery.gallery/wp-content/uploads/2022/07/The-DALL%C2%B7E-2-prompt-book-v1.02.pdf
- https://medium.com/nerd-for-tech/prompt-engineering-the-career-of-future-2fb93f90f117
- https://wiki.installgentoo.com/wiki/Stable_Diffusion overview
- https://www.reddit.com/r/StableDiffusion/comments/x41n87/how_to_get_images_that_dont_suck_a/
- https://mpost.io/best-100-stable-diffusion-prompts-the-most-beautiful-ai-text-to-image-prompts/
- https://andymatuschak.org/prompts/
- for nontechnical
- Intermediate
- DALLE2 asset generation + inpainting https://twitter.com/aifunhouse/status/1576202480936886273?s=20&t=5EXa1uYDPVa2SjZM-SxhCQ
- suhail journey https://twitter.com/Suhail/status/1541276314485018625?s=20&t=X2MVKQKhDR28iz3VZEEO8w
- composable diffusion - "AND" instead of "and" https://twitter.com/TomLikesRobots/status/1580293860902985728
- img2img https://andys.page/posts/how-to-draw/
- quest for photorealism https://www.reddit.com/r/StableDiffusion/comments/x9zmjd/quest_for_ultimate_photorealism_part_2_colors/
- settings tweaking https://www.reddit.com/r/StableDiffusion/comments/x3k79h/the_feeling_of_discovery_sd_is_like_a_great_proc/
- seed selection https://www.reddit.com/r/StableDiffusion/comments/x8szj9/tutorial_seed_selection_and_the_impact_on_your/
- minor parameter parameter difference study (steps, clamp_max, ETA, cutn_batches, etc) https://twitter.com/KyrickYoung/status/1500196286930292742
- Generative AI: Autocomplete for everything https://noahpinion.substack.com/p/generative-ai-autocomplete-for-everything?sd=pf
- Advanced
- Eugene Yan explanation of the Text to Image stack https://eugeneyan.com/writing/text-to-image/
- VQGAN/CLIP https://minimaxir.com/2021/08/vqgan-clip/
- 10 years of Image generation history https://zentralwerkstatt.org/blog/ten-years-of-image-synthesis
- Vision Transformers (ViT) Explained https://www.pinecone.io/learn/vision-transformers/
- negative prompting https://minimaxir.com/2022/11/stable-diffusion-negative-prompt/
- https://creator.nightcafe.studio/vqgan-clip-keyword-modifier-comparison VQGAN+CLIP Keyword Modifier Comparison We compared 126 keyword modifiers with the same prompt and initial image. These are the results.
- Google released PartiPrompts as a benchmark: https://parti.research.google/ "PartiPrompts (P2) is a rich set of over 1600 prompts in English that we release as part of this work. P2 can be used to measure model capabilities across various categories and challenge aspects."
- Video tutorials
- Misc
- StableDiffusion Discord https://discord.com/invite/stablediffusion
- https://reddit.com/r/stableDiffusion
- Akhaliq Discord: https://discord.gg/nYqfg4gnBt
- Deforum Discord https://discord.gg/upmXXsrwZc
- Lexica Discord https://discord.com/invite/bMHBjJ9wRh
- Midjourney's discord
- how to use midjourney v4 https://twitter.com/fabianstelzer/status/1588856386540417024?s=20&t=PlgLuGAEEds9HwfegVRrpg
- https://stablehorde.net/
This list will be out of date but will get you started. My live list of people to follow is at: https://twitter.com/i/lists/1585430245762441216
- Researchers/Developers
- https://twitter.com/_jasonwei
- https://twitter.com/johnowhitaker/status/1565710033463156739
- https://twitter.com/altryne/status/1564671546341425157
- https://twitter.com/SchmidhuberAI
- https://twitter.com/nearcyan
- https://twitter.com/karinanguyen_
- https://twitter.com/abhi_venigalla
- https://twitter.com/advadnoun
- https://twitter.com/polynoamial
- https://twitter.com/vovahimself
- https://twitter.com/sarahookr
- https://twitter.com/shaneguML
- https://twitter.com/MaartenSap
- https://twitter.com/ethanCaballero
- https://twitter.com/ShayneRedford
- https://twitter.com/seb_ruder
- https://twitter.com/rasbt
- https://twitter.com/wightmanr
- https://twitter.com/GaryMarcus
- https://twitter.com/ylecun
- https://twitter.com/karpathy
- https://twitter.com/pirroh
- https://twitter.com/eerac
- News/Aggregators
- https://twitter.com/ai__pub
- https://twitter.com/WeirdStableAI
- https://twitter.com/multimodalart
- https://twitter.com/LastWeekinAI
- https://twitter.com/paperswithcode
- https://twitter.com/DeepLearningAI_
- https://twitter.com/dl_weekly
- https://twitter.com/slashML
- https://twitter.com/_akhaliq
- https://twitter.com/aaditya_ai
- https://twitter.com/bentossell
- https://twitter.com/johnvmcdonnell
- Founders/Builders/VCs
- https://twitter.com/levelsio
- https://twitter.com/goodside
- https://twitter.com/c_valenzuelab
- https://twitter.com/Raza_Habib496
- https://twitter.com/sharifshameem/status/1562455690714775552
- https://twitter.com/genekogan/status/1555184488606564353
- https://twitter.com/levelsio/status/1566069427501764613?s=20&t=camPsWtMHdSSEHqWd0K7Ig
- https://twitter.com/amanrsanger
- https://twitter.com/ctjlewis
- https://twitter.com/sarahcat21
- https://twitter.com/jackclarkSF
- https://twitter.com/alexandr_wang
- https://twitter.com/rameerez
- https://twitter.com/scottastevenson
- Stability
- OpenAI
- HuggingFace
- Artists
- Other
- Bots and Apps
- Whisper
- https://huggingface.co/spaces/sensahin/YouWhisper YouWhisper converts Youtube videos to text using openai/whisper.
- https://twitter.com/jeffistyping/status/1573145140205846528 youtube whipserer
- multilingual subtitles https://twitter.com/1littlecoder/status/1573030143848722433
- video subtitles https://twitter.com/m1guelpf/status/1574929980207034375
- you can join whisper to stable diffusion for reasons https://twitter.com/fffiloni/status/1573733520765247488/photo/1
- known problems https://twitter.com/lunixbochs/status/1574848899897884672 (edge case with catastrophic failures)
- textually guided audio https://twitter.com/FelixKreuk/status/1575846953333579776
- Codegen
- pdf to structured data https://www.impira.com/blog/hey-machine-whats-my-invoice-total
- text to Human Motion diffusion https://twitter.com/GuyTvt/status/1577947409551851520
- abs: https://arxiv.org/abs/2209.14916
- project page: https://guytevet.github.io/mdm-page/
- Narrow, tedium domain usecases https://twitter.com/WillManidis/status/1584900092615528448?s=20&t=aV0Np-2Sx-zq-TQNn2y5AQ
- antihype https://twitter.com/alexandr_wang/status/1573302977418387457
- things stablediffusion struggles with https://opguides.info/posts/aiartpanic/
- New Google
- New Powerpoint
- via emad
- Appending prompts by default in UI
- DALLE: https://twitter.com/levelsio/status/1588588688115912705?s=20&t=0ojpGmH9k6MiEDyVG2I6gg
- bananadev cold boot problem https://twitter.com/erikdunteman/status/1584992679330426880?s=20&t=eUFvLqU_v10NTu65H8QMbg
- replicate.com
- banana.dev
- huggingface.co
- lambdalabs.com
- astriaAI
- cost of chatgpt - https://twitter.com/tomgoldsteincs/status/1600196981955100694
- A 3-billion parameter model can generate a token in about 6ms on an A100 GPU
- a 175b param it should take 350ms secs for an A100 GPU to print out a single word
- You would need 5 80Gb A100 GPUs just to load the model and text. ChatGPT cranks out about 15-20 words per second. If it uses A100s, that could be done on an 8-GPU server (a likely choice on Azure cloud)
- On Azure cloud, each A100 card costs about $3 an hour. That's $0.0003 per word generated.
- The model usually responds to my queries with ~30 words, which adds up to about 1 cent per query.
- If an average user has made 10 queries per day, I think it’s reasonable to estimate that ChatGPT serves ~10M queries per day.
- I estimate the cost of running ChatGPT is $100K per day, or $3M per month.
- NSFW filter https://vickiboykis.com/2022/11/18/some-notes-on-the-stable-diffusion-safety-filter/
- On "AI Art Panic" https://opguides.info/posts/aiartpanic/
- Yannick influencing OPENRAIL-M https://www.youtube.com/watch?v=W5M-dvzpzSQ
- art schools accepting AI art https://twitter.com/DaveRogenmoser/status/1597746558145265664